Do Workers Feel Entitled to High Wages? Evidence from a Field Experiment

Matthieu Chemin
October 25, 2014
Abstract: We present a field experiment to test for wage entitlement in reciprocity-based models of efficiency wages; i.e. the tendency of workers to adopt their existing wage as the reference point when deciding on work effort. Fieldworkers administering a household survey were exposed to a sequence of wage increases and wage cuts. Reciprocity was measured with a dimension of effort that fieldworkers perceived as unmonitored. Panel estimates provide evidence of a negative reciprocity response to wage cuts and of wage entitlement. Wage entitlement is essential for reciprocity-based models to explain why firms rarely cut wages, a phenomenon known as Downward Wage Rigidity.

Keywords: Field experiment; Reciprocity; Wage entitlement; Downward Wage Rigidity

JEL Classification: C93, D03, J30
We thank Ernst Fehr, Luke Taylor, Matthew Weinberg as well as seminar participants at The Wharton School, McGill University, Paris School of Economics and Toulouse School of Economics for valuable comments. Chemin thanks the Social Sciences and Humanities Research Council of Canada (SSHRC) and the Fonds de Recherche sur la Société et la Culture (FQRSC) for financial support. Kurmann gratefully acknowledges the hospitality of The Wharton School where part of this project was completed. Contact information: [email protected]; [email protected]
"...Employees usually have little notion of a fair or market value for their services and quickly come to believe that they are entitled to their existing wage, no matter how high it may be..." (Bewley, 2002, page 7).
One of the most enduring questions in economics is why firms rarely cut the wages of their workers, even during severe recessions.[1] A commonly proposed explanation for this observation, called Downward Wage Rigidity (DWR), is reciprocity in labor relations.[2] Introduced into modern economics under the names of "partial gift exchange" and "fair wage hypothesis" by Solow (1979) and Akerlof (1982), the theory posits that due to the inherent incompleteness of contracts, firms rely at least to some extent on workers' propensity to reciprocate a wage perceived as fair with a commensurate level of effort. For this theory to explain why firms rarely cut wages, workers need to take their own existing wage as a reference point when evaluating the fairness of a new wage offer; i.e. what Bewley (2002) calls wage entitlement in the above quote. To see why, suppose instead that the worker's fair wage reference depended primarily on the market wage the worker can expect to make outside of the firm. But then, as job opportunities worsen during a downturn, the expected market wage and with it the worker's reference point would fall, making it optimal for firms to cut wages. This would lower expected market wages further, leading to even further wage cuts. Likewise, if the worker's fair wage reference was primarily a function of peer wages inside the firm, the firm would find it optimal to simultaneously cut all wages, resulting again in the absence of DWR. By contrast, if workers take their own existing wage as an important reference point, then firms may refrain from cutting wages even during severe recessions.[3]

In this paper, we conduct a field experiment that allows us to formally test for wage entitlement in a real-world employment situation. The paper contributes to a growing literature using field experiments to test for reciprocity in labor relations.[4] Prominent examples
Footnote 1: More specifically, the distribution of wage changes in many countries is asymmetric, with a remarkable absence of wage cuts and a spike at zero wage change. See Dickens et al. (2007) for a summary of the evidence. Also see Fallick et al. (2011) and Daly et al. (2012) for evidence in U.S. data during the recent Great Recession.

Footnote 2: See Kahneman, Knetsch and Thaler (1986); Blinder and Choi (1990); Levine (1993); Agell and Lundborg (1995, 1999); Campbell and Kamlani (1997); Bewley (1999); and the surveys by Bewley (2002), Howitt (2002), and Rotemberg (2006).

Footnote 3: The consequences of different reference points are analyzed in a modern dynamic macro context by Danthine and Donaldson (1990), Collard and De la Croix (2000), Danthine and Kurmann (2004, 2010) and Elsby (2009). These papers show that if the reference point equals the expected market wage, the model implies a complete absence of wage rigidity, contradicting the empirical evidence. The same result would arise in the labor search environment with reciprocity-based preferences of Eliaz and Spiegler (2013). Also see Kaur (2013) for novel evidence with Indian data that is consistent with the implications of the above models.

Footnote 4: There is also a large literature testing for reciprocity in laboratory settings. See Fehr, Kirchsteiger and Riedl (1993); Fehr and Falk (1999); Hannan, Kagel and Moser (2002); Charness, Frechette and Kagel (2004)
include Gneezy and List (2006), Bellemare and Shearer (2009), Hennig-Schmidt, Sadrieh and Rockenbach (2010) and Cohn, Fehr and Goette (2013), who report mixed evidence of wage raises being associated with higher effort; and Kube, Marechal and Puppe (2013) and Cohn, Fehr, Herrmann and Schneider (2014), who find robust evidence of wage cuts leading to lower work effort. The principal novelty of our paper is that we expose the same workers to a sequence of positive and negative wage changes over an extended period of time. This allows us to test not only for asymmetries in the worker's effort response but also for wage entitlement; i.e. whether a wage raise subsequently leads to an increase in the reference point considered as fair. By contrast, the field experiments in the above-mentioned literature are short-term, exposing workers to one single wage change. This makes it impossible to evaluate the importance of wage entitlement and therefore the ability of fairness-based theories to explain DWR.[5]

The field experiment took place in rural Kenya, where we employed fieldworkers over a 12-week period to administer a household survey of approximately 900 questions that was initially designed to evaluate a development project. Fieldworkers were paid per survey at a rate that was several times higher than the going market wage. After six weeks of work at a constant rate, the wage was exogenously increased by 45%. Three weeks later, the wage was reduced back to the original rate for one week. Finally, for the last two weeks, the wage was cut by 27% relative to the original rate.

To assess the effect of the wage changes on reciprocal behavior, we require a measure of effort that fieldworkers considered as unmonitored. Otherwise, it is impossible to disentangle the effects of reciprocal behavior from explicit incentives such as firing threats or career motives that naturally arise in longer-term employment situations (e.g. Shapiro and Stiglitz, 1984; MacLeod and Malcomson, 1989, 1998).
Footnote 4 (continued): or Charness and Kuhn (2007), among many others; and Fehr and Gaechter (2000a) and Fehr, Gaechter and Zehnder (2009) for extensive surveys. Also see Levitt and List (2007) and Al-Ubaydli and List (2012), who question the generalizability of laboratory findings to real-world situations.

Footnote 5: Kube, Marechal and Puppe (2013), for example, employ workers for a 6-hour data entry task, paying them either a higher or lower hourly wage than advertised prior to employment. Workers with the higher wage show no evidence of higher productivity, whereas workers with the lower wage display a negative reaction. Cohn, Fehr, Herrmann and Schneider (2014) hire teams of two workers to sell promotional cards for two consecutive weekends. The second weekend, hourly wages are randomly lowered for either one or both workers of some teams. The wage cuts lead to a significant decline in the number of cards sold, which is more than twice as large for workers whose team member's wage is not cut, illustrating the importance of peer wages in the worker's reference. Neither experiment allows one to test the extent to which the workers' fair wage reference was affected by the workers' existing wage, simply because there is no employment relationship with variations in wages prior to the experiment.

We overcome this challenge by measuring (the inverse of) effort through the rate of "inconsistencies" across answers to the different questions of the survey. Fieldworkers were expected to turn in surveys of good quality, but surveys were never checked for inconsistencies during the employment relationship for the simple reason that we as principal investigators had not established a list of possible inconsistencies at the time of data collection. Only more than a year later, after the survey
answers had been entered into an electronic database, did we compile such a list and compute the rate of inconsistencies for each survey via a computer algorithm. Given the complete absence of monitoring, fieldworkers therefore had no reason to expect inconsistencies to be monitored, nor were they aware that such a measure would be computed ex-post.[6],[7]

The sample consists of 2,864 survey observations, collected by 11 fieldworkers. We use panel regressions to estimate the effects of the different wage changes on inconsistencies, controlling for time-, fieldworker- and survey-specific effects. To correct for possible correlation of residuals within fieldworkers and weeks, standard errors are computed using the wild cluster bootstrap-t method of Cameron, Gelbach and Miller (2008), which takes into account the small number of fieldworkers. To further address concerns about statistical inference in small samples, we follow Bloom et al. (2013) and perform permutation tests that are independent of sample size. We also exploit the 72 days of data per fieldworker to implement the testing procedure of Ibragimov and Mueller (2010, 2014), which relies on large T (instead of large N) asymptotics. All results are robust to these tests.

The main results are as follows. The 45% increase in the wage does not have a significant effect on inconsistencies, our inverse measure of unmonitored effort. By contrast, the wage cut back to baseline after the 3-week period of higher wages leads to a significant increase in the rate of inconsistencies of about 30% relative to the rate before the wage increase, even though the wage after this decrease was exactly the same as before the wage increase. The wage cut below the initial wage rate during the last two weeks of data collection results in an additional significant increase in inconsistencies. The estimates provide clear support for Bewley's (2002) observation that workers quickly come to feel entitled to their wage, no matter how high it may be.
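The logic of the permutation tests mentioned above can be sketched as follows. This is a minimal illustration with made-up inconsistency rates and hypothetical variable names, not the paper's actual estimation code (which also conditions on time-, fieldworker- and survey-specific controls):

```python
import random

def permutation_test(pre, post, n_perm=10_000, seed=0):
    """One-sided permutation test: the share of label reshufflings whose
    mean difference is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = sum(post) / len(post) - sum(pre) / len(pre)
    pooled = list(pre) + list(post)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_post, perm_pre = pooled[:len(post)], pooled[len(post):]
        diff = sum(perm_post) / len(perm_post) - sum(perm_pre) / len(perm_pre)
        if diff >= observed:
            hits += 1
    return hits / n_perm

# Made-up daily inconsistency rates before vs. after a wage cut
pre_cut = [0.10, 0.12, 0.09, 0.11, 0.10, 0.08]
post_cut = [0.14, 0.16, 0.15, 0.13, 0.17, 0.15]
p_value = permutation_test(pre_cut, post_cut)
```

Because the null distribution is generated by reshuffling the data themselves, the test's validity does not rest on large-sample approximations, which is what makes it attractive with only 11 fieldworkers.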
Footnote 6: Moreover, fieldworkers knew from the beginning that the employment relationship would be of finite duration and had no reason to believe that there would be other employment opportunities with us in the future (there were none). For the sceptical reader, we further note that if, despite the complete absence of monitoring, fieldworkers expected inconsistencies to be measured later on and, despite our communication to the contrary, fieldworkers expected future employment with us, then our results about the negative reciprocity effects of wage cuts would simply represent a conservative estimate.

Footnote 7: To illustrate the importance of measuring reciprocal behavior through a measure of effort considered as unmonitored, we also consider a second (inverse) measure of effort in our experiment, called "blanks and mistakes", which occurred if a fieldworker left a survey field empty or made an obvious error. The crucial difference to inconsistencies is that blanks and mistakes were explicitly monitored, with the clear understanding that insufficient performance in this dimension would lead to dismissal. Our estimates indicate that blanks and mistakes do not react significantly to any of the wage changes, consistent with the implication of a standard shirking model (e.g. Shapiro and Stiglitz, 1984).

Moreover, the estimates confirm the recent finding in the literature that reciprocal behavior is asymmetric, with wage cuts eliciting a large negative reaction in effort.

Our strategy of identifying the causal effect of wage changes on work effort by following fieldworkers through time and switching wage treatments simultaneously for all fieldworkers at an exogenously chosen time is close in spirit to the strategy adopted in other contexts by Bandiera, Barankay and Rasul (2005, 2007, 2009). Three considerations motivate this strategy. First, and as emphasized above, exposing fieldworkers to a sequence of wage raises and
wage cuts is essential to test for wage entitlement. Second, following the same fieldworkers through time allows us to control for time-invariant sources of unobservable heterogeneity across fieldworkers, thus increasing statistical power. This is especially relevant here since ability and perceptions of what constitutes a fair wage may differ substantially across fieldworkers (e.g. Cohn, Fehr and Goette, 2013). Third, simultaneously administering the same wage changes to all fieldworkers ensures that the experiment is not contaminated by social comparison effects. As discussed in Bandiera, Barankay and Rasul (2011) and illustrated by the results in Shi (2010) and Cohn, Fehr, Herrmann and Schneider (2014), such social comparison effects are a first-order issue in field experiments within firms, in which it is almost by definition impossible to isolate treatment from control groups in order to avoid information spillover.

A potential concern with our identification strategy is that the estimates are biased because of shocks that coincided with the exogenous wage changes but were not captured by variations in the time-, fieldworker- and survey-specific controls. We address this concern by basing all estimates on week-long averages so as to minimize the effects of random noise.[8] Moreover, to control for potential seasonal factors, we exploit a similar survey collection done with the same households three years later, in which (different) fieldworkers were paid the same wage over the entire survey period. The resulting counterfactual shows no change in inconsistencies, and all of our results survive a difference-in-differences specification. Finally, we observe that the rate of inconsistencies follows a secular downward trend, interrupted by upward jumps at the time of the wage cuts. This rules out fatigue, end-of-employment, and selection effects as potential alternative explanations, because all of these explanations would imply a gradual upward trend in the rate of inconsistencies.
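The difference-in-differences logic of the 2010 counterfactual can be sketched with illustrative numbers (not the paper's data; the actual specification is a panel regression with fixed effects):

```python
from statistics import mean

def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences on group means: the change in the
    treated group net of the change in the control group."""
    return (mean(treat_post) - mean(treat_pre)) - (mean(ctrl_post) - mean(ctrl_pre))

# Illustrative weekly inconsistency rates: the 2007 fieldworkers around
# the wage cut vs. the 2010 counterfactual, where the wage never changed.
did = did_estimate(
    treat_pre=[0.10, 0.11], treat_post=[0.14, 0.15],  # 2007, wage cut
    ctrl_pre=[0.12, 0.12], ctrl_post=[0.12, 0.12],    # 2010, constant wage
)
# Any seasonal movement common to both years would drop out of `did`.
```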
Footnote 8: The results are robust to using 3-day averages instead of week-long averages.

The paper expands on the existing literature in several important ways. First, and as already discussed, our experimental design of exposing workers to a sequence of exogenous wage raises and wage cuts allows us to test not only for asymmetries in the effort response but also for the importance of wage entitlement. Both of these features are essential for reciprocity-based theories to explain DWR. The experimental design also helps in drawing a sharper contrast to theories of inequity aversion (e.g. Fehr and Schmidt, 1999; Bolton and Ockenfels, 2000) that offer another rationale for results obtained in recent wage experiments. In the present context, inequity aversion would be consistent with the observed drop in effort in response to the first wage cut only under the assumption that this wage cut led workers to increase their evaluation of the firm's payoff. But since the first wage cut simply brought the wage rate back to the initial level, this assumption seems hard to defend, suggesting that a reciprocity-based theory combined with wage entitlement offers a more plausible explanation for the observed results.

Second, the rate of inconsistencies that we use as the (inverse) measure of unmonitored effort is very different from the commonly observed quantity measures that the existing literature has used as proxies for effort in short-term experiments. In our opinion, our measure captures the notion of work morale that the literature typically associates with reciprocal behavior and that firms are concerned with because of the inherent difficulty of controlling it; i.e., a cooperative attitude "...whereby gaps are filled, initiative is taken, and judgement is exercised..." (Williamson, 1985) and a willingness to make voluntary sacrifices for the company (Bewley, 2002). It is then interesting to see that despite this difference, our experiment leads to similarly asymmetric effort reactions as the ones reported in recent papers based on piece rates (e.g. Kube, Marechal and Puppe, 2013; Cohn, Fehr, Herrmann and Schneider, 2014).[9]

Third, the focus on wage entitlement relates our paper to another literature built on the idea that expectations act as a reference point (e.g. Koszegi and Rabin, 2006). Our results suggest that these expectations are importantly influenced by existing wages, consistent with Hart and Moore (2008), who argue that existing contracts serve as a natural reference point.[10]

The remainder of the paper proceeds as follows. Section 2 describes the context, measures of effort, and the theory behind the experiment. Section 3 provides details of the wage experiment. Sections 4 and 5 present the econometric methodology and results. Section 6 discusses potential alternative explanations. Section 7 concludes.
Context, measures of effort, and theory
This section describes the context in which the wage experiment was conducted, the measures of effort used to quantify the effects of the experiment, and a theoretical framework motivating the experimental design.
The experiment was conducted in the context of a household survey that took place in a rural part of Kenya in 2007. Data from a second survey conducted in 2010 is used as a counterfactual. The primary purpose of both surveys was not the wage experiment but the collection of socioeconomic information for development projects in which the households participated.

We first provide details about the 2007 survey. The survey consisted of an average of about 900 questions per survey (depending on the size and activities of the household). The number of households to be surveyed was initially targeted at 2,500, with an estimated duration of 8 to 10 weeks. As explained below, the survey collection was later extended to a total of 12 weeks.

To administer the surveys, we hired 12 individuals from the local community who were selected based on a competitive interview process. The hired fieldworkers, 7 women and 5 men, were aged between 19 and 37, with a median age of 24. All were economically average residents, all spoke English but none had a university education, and previous work experience was limited to occasional low-paid employment and/or home production (e.g. farming). From the beginning, fieldworkers were told that the employment relationship would be limited to this particular survey collection. Indeed, in 2007 we had no plans to conduct another survey in 2010, and none of the 2007 fieldworkers were hired again in 2010.

Prior to the start of the survey collection, fieldworkers participated in a 4-day training camp that was held at a secluded lodge to ensure full focus on the training and to foster a sense of team spirit. The workers received a specially designed T-shirt, and they were informed that upon successful completion of the survey collection, they would be invited to a weekend retreat in another community in Kenya. Furthermore, the PIs promised to organize a CV workshop and to provide a letter of recommendation. All of these perks were offered in an effort to generate a cooperative work environment and should, if anything, dampen any negative reciprocity response to wage cuts.

After the 4-day training camp and a final performance assessment, the fieldworkers started administering the surveys. They were supervised by two students with previous survey experience who assigned fieldworkers to households and collected the completed surveys at the end of each workday. During the first two weeks of work, one of the authors was present to help the two students in supervising and fine-tuning the survey collection. Thereafter, regular work without our direct presence started.

Footnote 9: In the context of an experiment on multi-tasking, Al-Ubaydli, Andersen, Gneezy and List (2012) also distinguish between an observed quantity and an unobserved quality measure of effort. The relation of our experiment to theories of multi-tasking is briefly discussed at the end of Section 6.

Footnote 10: A well-known application of reference point theory is income targeting (e.g. Camerer et al., 1997, or Abeler et al., 2011). Interestingly, we find no evidence in our experiment that workers change the number of administered surveys in response to the exogenous changes in the wage rate, as income targeting would suggest.
In the beginning, fieldworkers typically administered between two and three surveys per day, six days a week. As the survey collection became more efficiently organized, fieldworkers increased their workload but were explicitly discouraged from administering more than 4 surveys per day. This target was generally well respected throughout the entire experiment, with the average number of surveys per fieldworker per day equalling 3.8 from week 4 onward.[11]

The 2010 survey was organized very similarly. It was conducted with a subset of the households interviewed in 2007 and consisted of an average of about 700 questions per survey. We hired 15 new fieldworkers (all different from the workers hired in 2007) of comparable socioeconomic background and age. They received the same training, and their work was supervised by two students as in 2007.
Footnote 11: Some fieldworkers occasionally exceeded, and one fieldworker consistently exceeded, the limit of 4 surveys per day. All of the results reported below are robust to whether we consider only the first four surveys per fieldworker per day; and to whether we exclude the fieldworker who consistently exceeded the limit of 4 surveys per day. Also, our regressions always control for the number of surveys administered per day and for fieldworker fixed effects.
The principal (inverse) measure of effort we consider is the rate of inconsistencies per survey. An inconsistency occurs when two or more answers recorded in a survey contradict each other. For example, one respondent answered in the occupation section of the survey that he/she was not farming but indicated in the time-use section that he/she spent time farming. We argue that the rate of inconsistencies per survey is a good measure of reciprocal behavior of the fieldworker towards the employer because resolving inconsistencies was (i) costly for the fieldworker; (ii) beneficial for the employer; and (iii) perceived as unmonitored by the fieldworker.

First, detecting and resolving inconsistencies meant that fieldworkers needed to pay extra attention to potentially conflicting answers when administering the survey; flip back and forth through the 20 pages of the survey when an inconsistency was detected; ask the respondent to clarify his/her answers; and resolve the inconsistency. This was an onerous and time-consuming process, especially because respondents were often household heads who command substantial respect in their community.

Second, while we as employers did not explicitly mention inconsistencies during either the training camp or the actual survey collection, we repeatedly emphasized that we needed "good data" to rigorously evaluate the development project that the survey was intended to cover. Inconsistencies represent an obvious example of "bad data" and thus, fieldworkers likely perceived resolving inconsistencies as beneficial for the employer.

Third, neither we nor the supervisors monitored inconsistencies in any way during the survey collection, for the simple reason that we had not established a list of possible inconsistencies at that time. Nor did anyone know during the survey collection that such a measure would be computed ex-post.
Only more than a year after the survey collection had ended and the different survey answers had been manually entered into an electronic database, did we identify 93 possible inconsistencies and compile the number of inconsistencies for each survey via a computer algorithm (see the appendix for a list of the inconsistencies). For all intents and purposes of this experiment, inconsistencies therefore constitute an (inverse) measure of effort that fieldworkers perceived as unmonitored.[12]

Since fieldworkers were paid per survey and did not receive any direct or indirect reward for turning in surveys with fewer inconsistencies, inconsistencies reflect how much fieldworkers identified with the survey collection and how willing they were to "go the extra mile" for the employer. In our opinion, inconsistencies therefore provide an (inverse) measure of work morale that the literature typically associates with reciprocal behavior and that firms are concerned with because of the inherent difficulty of controlling it; i.e., a cooperative attitude "...whereby gaps are filled, initiative is taken, and judgement is exercised" (Williamson, 1985) and a willingness to make voluntary sacrifices for the company (Bewley, 2002).

Footnote 12: Fieldworkers knew from the beginning that the employment relationship would be of finite duration and had no reason to believe that there would be other employment opportunities with us in the future (there were none). Hence, even in the unlikely event that fieldworkers attributed some non-zero probability to the possibility that we would devise a method to check for inconsistencies later on, this would not change the fact that fieldworkers considered inconsistencies as unmonitored during the employment relationship. For the sceptical reader, we further note that if, despite the complete absence of controls, fieldworkers expected inconsistencies to be monitored during the employment relationship, then our results would simply represent a lower bound on the reciprocity effects of wage changes.
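To make the measure concrete, the ex-post algorithm can be sketched as a set of contradiction rules applied to each survey record. The field names below are hypothetical, and the single rule mirrors the farming example above; the actual algorithm applies 93 such rules (listed in the appendix):

```python
def count_inconsistencies(survey, rules):
    """Return the number of consistency rules a survey record violates.
    Each rule is a predicate that is True when two answers contradict."""
    return sum(1 for rule in rules if rule(survey))

# Hypothetical rule: the respondent reports no farming occupation but
# positive hours of farming in the time-use section.
rules = [
    lambda s: s["occupation_farming"] == "no" and s["timeuse_farming_hours"] > 0,
]

flagged = count_inconsistencies(
    {"occupation_farming": "no", "timeuse_farming_hours": 12}, rules)
clean = count_inconsistencies(
    {"occupation_farming": "yes", "timeuse_farming_hours": 12}, rules)
```

The per-survey rate of inconsistencies is then this count, normalized by the number of checks applicable to that survey.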
To illustrate the importance of measuring reciprocal behavior through a measure of effort considered as unmonitored, we consider a second (inverse) measure of effort, called "blanks and mistakes". As explained below, we will also use this monitored measure as a control variable for potential changes in the environment. A blank or mistake occurred if a survey field was left empty (i.e., the fieldworker omitted to ask the question or pencil in the answer) or the field contained a clear error (e.g. reporting zero members in a household). In contrast to inconsistencies, fieldworkers were explicitly trained to avoid blanks and mistakes. In addition, the supervisors randomly checked each day between 40% and 100% of all incoming surveys for blanks and mistakes. We therefore label blanks and mistakes as "monitored errors". If a survey with too many blanks and mistakes was detected, the fieldworker was given a warning and, in case of repeated subpar performance, risked dismissal. This threat of dismissal was real. In fact, during the first two weeks of employment, one fieldworker consistently made numerous avoidable mistakes. Despite further training, performance did not improve, and the fieldworker was laid off.
To motivate the experimental design, we present a stylized model of efficiency wages. The core of the model consists of a reduced-form effort function that Akerlof and Yellen (1990) proposed as a characterization of the fair wage hypothesis. We use this effort function to illustrate important implications of wage entitlement. Similar implications obtain for alternative, more micro-founded formulations of effort. As the appendix shows, for example, a version of Rabin's (1993) two-player game with reciprocity, as adapted to a modern efficiency wage setting by Danthine and Kurmann (2008, 2010), delivers the same results as the ones presented here.

The model consists of a firm that hires workers to perform a particular task (e.g. administer a household survey). Per completed unit of the task, the worker is paid wage rate w. The quality of a completed unit (e.g. the proportion of consistent answers per survey) is given by

y = f(e, θ),     (1)

where the function f is increasing in both arguments; e denotes the amount of effort the worker puts into completing the task (e.g. resolving inconsistencies); and θ is an idiosyncratic
factor affecting quality that is outside of the worker's control (e.g. the proportion of questions that are consistently answered in the first place).[13] Due to the difficulty in observing e and θ, the firm cannot directly contract on effort. Following Akerlof and Yellen (1990), we assume that in the absence of explicit incentives, a worker supplies effort according to

e = γ min(w/w*, 1),     (2)
where w* denotes the reference wage the worker perceives as fair. For γ = 0, the worker has no reciprocity concerns and supplies e = 0 for any wage offer (the normalization to zero is without loss of generality). For γ > 0, the worker cares about reciprocating a given wage offer with a proportional level of effort up to some full effort level equal to γ. Whether the effort function is non-differentiable at w = w* or not is unimportant for our test of wage entitlement; but it is consistent with our estimation results.[14]

The literature has made different assumptions about the determinants of the fair wage reference w*, ranging from expected wages outside of the firm (Akerlof, 1982) over the firm's ability to pay (Kahneman, Knetsch and Thaler, 1986) and wages of peer workers in the same firm (Akerlof and Yellen, 1990) to the worker's own existing wage (Bewley, 1999). Since the focus of our experiment is on wage entitlement, we express the fair wage reference as

w* = λw₋₁ + (1 − λ)w̃,     (3)
where w₋₁ denotes the worker's own existing wage; w̃ all determinants that are external to the worker; and λ ∈ [0, 1] the relative importance of w₋₁ for w* (i.e. wage entitlement).

Equations (2) and (3) have important implications for the design of our experiment, as illustrated in Figure 1. To begin, consider Panel A, which shows the effort function in (2) for two different wage references w* (solid line) and w*′ > w* (dashed line). Wage entitlement can be tested by exposing workers to a sequence of wage changes: first a wage raise from some w (point A) to some w⁺ > w (point B), followed by a wage cut back to the initial w. If workers have wage entitlement, i.e. λ > 0 in equation (3), then the fair wage reference increases from w* to w*′, causing the effort function to pivot downward, so that effort after the return of the wage to its initial level is lower (point C).[15]

Panel B shows tests of reciprocity in the existing literature. As discussed in the introduction, the experiments in this literature consist of one single wage cut (e.g. from w to w⁻) or one single wage increase (e.g. from w to w⁺) in employment relationships of much shorter duration than ours (typically a few hours). By design, these experiments can only provide information about the shape of the effort function but not about the importance of wage entitlement; i.e. whether a wage raise subsequently leads to an increase in the reference wage and therefore a shift in the effort function.[16]

Footnote 13: To simplify notation, we abstract from unit and worker subscripts; i.e. the quality for task completion j performed by worker i is y_ij = f(e_i, θ_j).

Footnote 14: In the micro-founded model of the appendix, the non-differentiability with e = γ for w > w* arises either if the marginal disutility of providing effort becomes infinite above the maximum effort level γ or if the worker believes that effort above γ does not lead to further increases in the firm's payoff.

Footnote 15: The location of the initial w above w* is arbitrary. It implies that w̃ < w and λ < 1. All results would apply for w̃ > w or λ = 1 (in which case w* = w initially).

We close the theory section with two important points concerning the empirical implementation of our experiment. First, suppose as in Shapiro and Stiglitz's (1984) shirking model of efficiency wages that firms are able to monitor effort with some probability and that, if effort is below some threshold level ē, the worker is fired. As shown formally in the appendix, the result is an incentive compatibility constraint (ICC) providing a lower bound for effort supply e = ē as long as the wage remains above a threshold level w̄ that makes the worker indifferent between providing zero effort and risking dismissal, and providing effort e = ē. Panel C of Figure 1 depicts this constraint together with the effort function arising from reciprocity concerns. As long as ē > γ, wage changes that keep the wage above w̄ have no effect on effort even if the worker has reciprocity concerns. This illustrates the importance of having an unmonitored measure of effort when testing for reciprocal behavior in ongoing employment relationships. Otherwise, it is impossible to disentangle the effects of reciprocity concerns from explicit incentives.

Second, as indicated by equation (1), measuring effort through a quality variable y may be subject to considerable noise, depending on the importance of the idiosyncratic factor θ. Likewise, y may vary independently of the wage if there are idiosyncratic differences in γ across workers and time, for example due to variations in the worker's propensity to reciprocate or in the disutility of providing effort (see the model in the appendix for details).
When testing for reciprocity, it is therefore crucial to have a large number of observations to average out differences in ε or θ. This can be accomplished either by having a large cross-section of workers (large N) or by having many repeated observations per worker over time (large T). The latter has the advantage that one can also control for permanent differences in θ across workers, thus increasing statistical power. We will return to this point below when discussing our empirical methodology.
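To make the mechanics of the wage-entitlement test concrete, the following sketch simulates the raise-then-cut sequence under the reference-updating rule of equation (3). The kinked functional form, the parameter values, and the names (`effort`, `update_reference`, `lam`, `kappa`) are our illustrative assumptions, not the micro-founded specification of the appendix.

```python
def effort(w, w_ref, e_max=1.0, kappa=0.8):
    """Stylized effort function in the spirit of equation (2): maximum
    effort whenever the wage is at or above the fair-wage reference,
    proportionally lower effort below it. The kink form and the slope
    parameter kappa are illustrative assumptions."""
    if w >= w_ref:
        return e_max
    return max(0.0, e_max - kappa * (w_ref - w) / w_ref)

def update_reference(w_prev, w_external, lam):
    """Fair-wage reference as a weighted average of the worker's own
    previous wage and external determinants (equation (3))."""
    return lam * w_prev + (1 - lam) * w_external

# Raise-then-cut sequence: wage w (point A) -> w_plus (point B) -> back to w (point C)
w, w_plus, w_external, lam = 150.0, 200.0, 100.0, 0.8

ref_A = update_reference(w, w_external, lam)       # 140: reference starts below w
effort_A = effort(w, ref_A)                        # maximum effort at point A

ref_C = update_reference(w_plus, w_external, lam)  # 180: reference rose after the raise
effort_C = effort(w, ref_C)                        # effort at point C, same wage w as A

assert ref_C > ref_A        # wage entitlement (lam > 0) shifts the reference up
assert effort_C < effort_A  # ... so effort is lower after the cut back to w
```

The two assertions capture the testable implication used below: after the raise-then-cut sequence, effort at the original wage is strictly lower than it was initially, even though the wage itself is identical.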
In both 2007 and 2010, fieldworkers were paid per administered survey. At any given time, the wage rate was the same for all workers. The experiment took place during the 2007 survey and consists of a sequence of exogenous changes in the wage rate. The 2010 survey
16 Panel B also illustrates that tests of asymmetry of the effort function depend importantly on the position of the initial wage w relative to the reference wage w*. If w ≥ w*, as shown in Panel B, then a wage increase does not lead to an increase in effort. If w < w*, then a wage increase does lead to an increase in effort. To our knowledge, this point has been ignored so far and may help explain why the existing literature has found mixed results about the effects of wage increases on effort.
was conducted without changes in the wage rate and serves as a counterfactual.
Figure 2 summarizes the different wage treatments that occurred over the 12 weeks of employment of the 2007 survey. Work weeks started on Wednesdays; hence, the weeks in Figure 2 represent intervals from Wednesday through the following Tuesday. During the first 6 weeks of employment, fieldworkers were paid 150 Ksh per survey for each of the first three surveys per day and 100 Ksh per survey for every additional survey per day. We call this scheme the "150/100 treatment" from here on. The rate beyond the third survey was set lower to reduce disappointment on days when only 3 surveys were possible. Since fieldworkers turned in approximately 4 surveys a day from week 4 onward, daily earnings amounted to about 550 Ksh – three to four times more than what a fieldworker could hope to earn elsewhere.17 In the beginning of work week 7, the wage rate was raised to 200 Ksh per survey, including for surveys beyond the third survey of the day. This new "200/200 treatment" represents an increase in daily earnings of about 45%. The announcement came without any information on whether the raise was permanent or not. The 200/200 treatment continued for three weeks. In the beginning of week 10, compensation reverted back to the initial 150/100 treatment (i.e., 150 Ksh for each of the first three daily surveys and 100 Ksh for every additional survey). A week later, in the beginning of week 11, the piece rate was cut to 100 Ksh per survey for all surveys. This "100/100 treatment", which represented a cut of about 27% in daily earnings, remained in effect for the last two weeks of the experiment. The fieldworkers did not know in advance about any of the wage changes, nor did they know that they were taking part in an experiment that would be used for research. The experiment is therefore a natural field experiment as defined by Harrison and List (2004).
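The earnings figures above follow directly from the piece-rate schedules. Assuming the typical four surveys per day, a quick check (the helper name `daily_earnings` is ours) reproduces the roughly 45% raise and 27% cut:

```python
def daily_earnings(rate_first3, rate_rest, surveys_per_day=4):
    """Daily pay under the two-tier piece rate: rate_first3 Ksh for each
    of the first three surveys of the day, rate_rest for each additional
    survey. Four surveys per day is the typical output reported above."""
    return min(surveys_per_day, 3) * rate_first3 + max(surveys_per_day - 3, 0) * rate_rest

base = daily_earnings(150, 100)  # 150/100 treatment: 3*150 + 1*100 = 550 Ksh
high = daily_earnings(200, 200)  # 200/200 treatment: 4*200 = 800 Ksh
low = daily_earnings(100, 100)   # 100/100 treatment: 4*100 = 400 Ksh

raise_pct = (high - base) / base  # ~ +45% increase in daily earnings
cut_pct = (low - base) / base     # ~ -27% cut in daily earnings
```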
Communication of wage changes
All wage changes were communicated through the supervisors, either by reading an email (for the wage raise) or by playing a pre-recorded video from us (for the two wage cuts).18 The exact wording of the wage announcements is available in the Appendix. None of the wage changes was preannounced, and none came with any information about the length of the new wage treatment. In theory, to measure reciprocity effects, no justification should be given for either a wage increase or a wage cut. In practice, employees expect at least some justification for a wage
17 At the time of the surveys, 550 Ksh were worth about US$7.40. In this rural community, no other data collection or equivalent development work was going on at the same time.
18 The wage cuts were communicated by video so as to avoid any suspicion on the part of the workers that the supervisors had embezzled the money destined to pay wages.
change. Otherwise, they may believe that the project is mismanaged, which could adversely affect performance not because of reciprocal behavior but because of workers' beliefs about the quality of the employer. To address this communication issue, we chose to justify the wage cuts in a way that was both minimal and designed to dampen possible negative reciprocity effects. Any negative reciprocity effects that we find should therefore be considered a lower bound. Specifically, the first wage cut, back to the original 150/100 treatment in the beginning of week 10, was justified on the grounds that our budget did not allow us to continue the higher 200/200 treatment. This justification conveyed a reduction in the firm's ability to pay, which by itself would have a positive impact on reciprocal behavior if the workers' fair wage reference were affected by rent-sharing motives (see Kahneman, Knetsch and Thaler, 1986). The second wage cut, to the 100/100 treatment at the end of week 10, was justified by the fact that we had reached the planned objective of 2,500 surveys and therefore the end of the originally agreed upon employment contract. In the beginning of week 11, we reminded fieldworkers of this agreement and at the same time offered them the opportunity to collect additional surveys for three more weeks at a lower rate. Since 100 Ksh per survey was still well above the best available outside option, this 3-week extension can be considered, if anything, an unanticipated opportunity to earn more money. Indeed, all fieldworkers decided to stay on for the 3 additional weeks of work even though they were free not to participate (without losing out on any of the promised perks after the end of survey collection). Finally, so as to avoid possible end-of-employment effects, we informed workers at the beginning of week 13 (i.e., one week before the planned end of the additional employment) that since the target number of households had been reached, survey collection would halt immediately.
Fieldworkers continued to be paid 400 Ksh per day for the last week without work so as to honor the promised employment contract.
While workers did not know at any point during or after the employment relationship that they were taking part in an experiment, the different wage changes and related justifications respected the ethical principles of no breach of promise and beneficence (see Bandiera, Barankay and Rasul, 2011).19 First, as employers, we respected or exceeded all agreed upon contracts. In week 7, we increased the wage with no information about its duration. Reverting back to the initial wage in week 10 therefore did not represent a breach of promise. In week 11, after the end of the initial data collection, we offered a new employment relationship. Even though this new relationship came with a lower wage, it therefore did not represent a breach of promise either. Moreover, the justification given for the wage cuts (a limited budget) was true: the original budget allowed collection of exactly 2,500 surveys at the initial
19 All ethical approvals from the relevant authorities were obtained and are available upon request.
150/100 treatment. Only the extra financial assistance from one of the PIs for the explicit purpose of the wage experiment made it possible to increase wages in weeks 7-9 and to extend employment for 3 weeks. Second, the experiment did not cause any decrease in total compensation. To the contrary, the experiment allowed fieldworkers to make more money, first because surveys in weeks 7-9 were paid at a higher rate and second because employment was extended for 3 weeks. Fieldworkers were free to terminate employment at any time, but everyone chose to keep working through the entire experiment.
Small sample concerns
The small number of fieldworkers in the experiment raises two issues. The first issue is that the sample size may be too small to identify significant effects of wage changes on work effort. The second issue is that standard statistical inference is inadequate in this context. We discuss each issue and describe our attempts to address them.
3.4.1
Significance of results
Even though our experiment contained only 11 fieldworkers, the results presented below show statistically significant responses to wage cuts. There are three reasons for this. First, the wage changes to which we exposed the fieldworkers were large, increasing the chances of detecting significant treatment effects conditional on such effects being present. Second, each fieldworker collected approximately four surveys per day, six days a week, for 12 weeks, thereby providing us with a large number of observations over the course of the different treatments. This allows us to control for time-invariant sources of unobservable heterogeneity across fieldworkers, thus increasing statistical power. Third and related, taking multiple measurements of outcomes at relatively short time intervals allows us to average out idiosyncratic shocks. As shown by McKenzie (2012), this reduces the sample size needed to detect a given treatment effect, especially if there is low autocorrelation in the outcome. This is true in our case: the correlation of the rate of inconsistencies within fieldworkers is only 0.15. The low correlation should not come as a surprise. As illustrated by the above model, the rate of inconsistencies depends not only on a fieldworker's effort response to a given wage treatment but also on other factors, including the quality of the household's answers to the survey, the difficulty of getting to the survey location, and daily variations in the fieldworker's general level of motivation to perform work. Taking a single measurement of the outcome variable would therefore be unlikely to accurately capture the fieldworker's effort response. In this context, observing many surveys for few fieldworkers (large T, small N) can deliver a more powerful test than observing few surveys for many fieldworkers (small T, large N).
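The power gain from repeated measurement can be illustrated with a back-of-the-envelope calculation. Assuming, for simplicity, equally correlated measurements with pairwise correlation ρ (a stylization of the setting in McKenzie, 2012, not his exact formula), the variance of a worker's average outcome over T measurements is σ²(1 + (T − 1)ρ)/T:

```python
def variance_of_worker_mean(sigma2, T, rho):
    """Variance of the mean of T equally correlated measurements with
    common variance sigma2 and pairwise correlation rho. With rho = 1,
    extra measurements add nothing; with low rho, they average out
    idiosyncratic noise. Equicorrelation is a simplifying assumption."""
    return sigma2 * (1 + (T - 1) * rho) / T

one_shot = variance_of_worker_mean(1.0, T=1, rho=0.15)   # 1.0
repeated = variance_of_worker_mean(1.0, T=72, rho=0.15)  # ~ 0.16

# With the low within-fieldworker correlation of 0.15, 72 daily
# observations cut the variance of each worker's average outcome
# by roughly a factor of six.
```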
The small number of fieldworkers in the experiment implies that standard tests based on cross-sectional asymptotics to justify a normal approximation are inadequate. To address this issue, we follow Bloom et al. (2013), who are confronted with a similar small-sample problem in another context, and pursue three alternative approaches. First, we compute standard errors using the wild cluster bootstrap-t method of Cameron, Gelbach and Miller (2008), which takes into account the small number of fieldworkers.20 Second, since we have 72 days of data for each fieldworker, we can implement the testing procedure of Ibragimov and Mueller (2010, 2014), which relies on large-T (instead of large-N) asymptotics. Third, we perform permutation tests that are valid regardless of sample size. Details of all three approaches are provided in the Appendix.
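As one illustration of the inference methods, a minimal permutation test for a difference in mean inconsistency rates can be sketched as follows. The data and sample sizes here are synthetic, and this is our simplified sketch; the exact implementations of all three procedures are described in the Appendix.

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_pvalue(y_a, y_b, n_perm=5000):
    """Two-sided permutation p-value for the difference in means between
    two groups of surveys (e.g. a treatment week vs. the reference week).
    Validity does not rest on a large number of fieldworkers."""
    observed = abs(y_a.mean() - y_b.mean())
    pooled = np.concatenate([y_a, y_b])
    n_a = len(y_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of surveys across groups
        if abs(pooled[:n_a].mean() - pooled[n_a:].mean()) >= observed:
            hits += 1
    return hits / n_perm

# Synthetic example: a genuine jump in the rate of inconsistencies
reference_week = rng.normal(4.5, 1.0, size=40)
cut_week = rng.normal(6.0, 1.0, size=40)
p = permutation_pvalue(cut_week, reference_week)  # small p: jump is detected
```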
Table 1 reports descriptive statistics for the rate of inconsistencies – our (inverse) measure of unmonitored effort – and the rate of blanks and mistakes – our (inverse) measure of monitored effort – in the 2007 survey. The table also shows the statistics for the rate of inconsistencies in the 2010 survey that we use as a counterfactual. For 2007, a total of 2,864 surveys were administered, with an average of 4.65 percent of inconsistencies per survey (out of an average of 93.8 possible inconsistencies) and an average rate of blanks and mistakes of 1.31 percent per survey (out of an average of 911.6 possible blanks and mistakes). For 2010, a total of 438 surveys were administered, with an average of 1.08 percent inconsistencies per survey (out of an average of 153 possible inconsistencies per survey). The difference in the rate of inconsistencies between 2007 and 2010 is due to the fact that the two surveys, while similar in nature, were composed of different questions. The estimations below take this difference into account with fixed effects. As the standard deviations and extreme values in Table 1 indicate, there is considerable variation in the rate of inconsistencies across surveys. Closer inspection reveals that the majority of this variation is idiosyncratic at the survey level and not systematically associated with particular fieldworkers, consistent with the low intra-fieldworker correlation of the rate of inconsistencies noted above. Figure 3 reports the evolution of the different effort measures over time. To smooth out idiosyncratic variation, we take weekly averages of each measure. As shown in Panel A, the average rate of inconsistencies in the 2007 survey exhibits a secular downward trend
20 If fieldworkers influenced each other during the experiment, in particular when the unexpected wage changes occurred, this may also lead to correlated residuals within weeks. We find, however, that the intra-week correlation of the rate of inconsistencies is 0.008 and not significantly different from zero, contradicting this hypothesis. Nonetheless, we alternatively clustered standard errors at the week level and found that none of the inference changes (results are available upon request).
over the course of the employment relationship, suggesting that fieldworkers became more efficient over time at detecting and resolving inconsistencies (i.e. learning-by-doing). This trend is interrupted by upward jumps at the time of the wage cuts (weeks 10 and 11), suggesting a negative reciprocity effect that led to an increase in the rate of inconsistencies. Conversely, there is a relatively large drop in the rate of inconsistencies in week 7, at the start of the 200/200 regime, suggesting a positive reciprocity effect. To assess the significance of these effects, and more particularly to make comparisons between the original 150/100 treatment in weeks 1 to 6 and the return to the 150/100 treatment in week 10, it is crucial to control for the secular downward trend as well as all other potentially confounding factors. This motivates the panel estimation of the next section. Panel B of Figure 3 displays the weekly average rates of blanks and mistakes in the 2007 survey. In contrast to the rate of inconsistencies, there is no secular downward trend. To the contrary, apart from an initial drop, the rate of blanks and mistakes trends upwards over time.21 This is a priori consistent with a shirking model as in Shapiro and Stiglitz (1984): the less time remains in the employment relationship, the smaller is the expected cost of getting caught shirking and therefore the lower is the amount of monitored effort that the worker optimally provides.22 Panel C of Figure 3, finally, displays the weekly average rates of inconsistencies in the 2010 survey. As described above, the 2010 survey was collected over a longer period of time than the 2007 survey. We can therefore consider different 12-week periods. Since we want to use the 2010 survey primarily to control for possible seasonal effects, we focus here on the same 12-week period between mid-May and mid-August as in 2007.
As the figure shows, the rate of inconsistencies in 2010 exhibits a smaller downward trend than in 2007, which is consistent with the fact that the survey had already been underway for several weeks at this point and that learning-by-doing effects are strongest in the beginning.23 Similar to 2007, the 2010 data show a drop in the rate of inconsistencies in week 7, suggesting a potential seasonal effect. Importantly, however, the 2010 data do not exhibit an increase in the rate of inconsistencies in weeks 10 and 11, indicating that the jumps in the rate of inconsistencies during the same weeks of 2007 are not due to seasonal factors; otherwise, similar jumps would be present in the 2010 data.
21 The initial drop in the rate of blanks and mistakes is explained by a beginning-of-work effect during which the fieldworkers learned that supervisors were indeed monitoring blanks and mistakes. It is during this initial phase that one of the workers was laid off for insufficient performance on the blanks and mistakes dimension (see the above description of context).
22 See the model in the appendix for a formal exposition of this point. Also note that this effect is not present for inconsistencies because workers perceive inconsistencies as unmonitored.
23 Indeed, the beginning of the 2010 survey shows a similar secular downward trend in the average rate of inconsistencies as the 2007 survey.
Effect of wage changes on unmonitored effort
We estimate the effect of wage changes on unmonitored effort with the following panel regression:

rate_ijt = α_i + β′D^wage + γ′X_ij + δ1 t + δ2 t² + u_ijt,
where rate_ijt denotes the rate of inconsistencies for survey j collected by fieldworker i on survey day t, our inverse measure of unmonitored effort. The coefficient α_i captures a worker fixed effect; D^wage is a vector of dummy variables for each of the wage treatments (described in detail below); and X_ij represents a set of observable controls that change systematically across surveys and fieldworkers.24 The term δ1 t + δ2 t² captures secular trends, due for example to learning-by-doing as discussed in the previous section. We specify this trend in quadratic form so as to give the estimation flexibility to accommodate effects that either die out slowly over time or manifest themselves only over time. As shown in Section 7, all results are robust to linear time trends and to time trends of order three or higher.25 Finally, u_ijt denotes the residual; its properties are investigated in the next section. The key coefficients of interest are contained in the vector β and measure the effect that the different wage dummies in D^wage have on the rate of inconsistencies. In defining these dummies, we face a choice of time interval per dummy. We choose to define one separate dummy per week. This is a natural benchmark because all wage changes occurred on Wednesdays and because it averages out random noise while keeping time intervals sufficiently small to capture jumps in the rate of inconsistencies due to wage changes.26 We define week 6 as the reference week; this is the last week of the initial 150/100 treatment before the increase to the 200/200 treatment.27 The vector D^wage therefore contains eleven dummies taking on the value of 1 for the respective week and 0 otherwise, and the coefficients in β = [β1, ..., β5, β7, ..., β12] capture the impact of each week relative to the omitted reference week 6.
Remembering the timing of the wage changes described in Figure 2, β7 captures the average rate of inconsistencies of the 200/200 treatment in week 7, as opposed to the 150/100 treatment during reference week 6; β10 captures the impact of returning to the 150/100 treatment in week 10 relative to the initial 150/100 treatment during the reference period in week 6; and so forth. In what follows, we present results with the full set of controls, using all surveys of the different fieldworkers. In Section 7 below, we show that the results are robust to dropping different controls and to using only the first three surveys per fieldworker per day.
24 Specifically, X_ij includes indicators for the area in which the interview took place, the relationship of the interview respondent to the household head, and the number of surveys done per day by the fieldworker.
25 Note that the time trend is identified separately from the wage dummies in D^wage because we make it a function of survey day t.
26 Results are robust to using shorter 3-day regimes and are available from the authors upon request.
27 As shown in Section 7, all results are robust to choosing week 5 as the reference week.
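The specification can be sketched on synthetic data. The design below includes only the worker fixed effects, the weekly wage dummies, and the quadratic trend (the controls X_ij are omitted for brevity), and all magnitudes are invented for illustration; this is our sketch, not the paper's estimation code.

```python
import numpy as np

def panel_design(worker, week, day, ref_week=6, n_workers=11, n_weeks=12):
    """Design matrix: one dummy per worker (fixed effects alpha_i, no
    constant), one dummy per week except the reference week (the beta
    coefficients), and a quadratic trend in survey day t."""
    fe = np.eye(n_workers)[worker]
    week_dummies = [np.where(week == k, 1.0, 0.0)
                    for k in range(1, n_weeks + 1) if k != ref_week]
    return np.column_stack([fe] + week_dummies + [day, day**2])

# Synthetic panel: 11 workers, 72 days, downward trend plus a jump in weeks 10-12
rng = np.random.default_rng(1)
worker = np.repeat(np.arange(11), 72)
day = np.tile(np.arange(1.0, 73.0), 11)
week = np.ceil(day / 6).astype(int)
rate = 5.0 - 0.02 * day + 1.5 * (week >= 10) + rng.normal(0, 0.5, worker.size)

X = panel_design(worker, week, day)
coef, *_ = np.linalg.lstsq(X, rate, rcond=None)
# Columns 0-10: worker fixed effects; 11-21: weeks 1-5, 7-12; 22-23: trend
beta_week10 = coef[19]  # positive estimate of the week-10 jump (true value 1.5 here)
```

Note that the trend terms are identified separately from the weekly dummies because day t varies within weeks, mirroring footnote 25.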
Column (1) of Table 2 displays the estimates for the wage dummies on the rate of inconsistencies. Robust standard errors clustered at the fieldworker level are reported in parentheses below each estimate. Stars indicating significance at 10%, 5% or 1% are calculated with the wild cluster bootstrap-t method (Cameron, Gelbach and Miller, 2008), described in more detail in the appendix, which explicitly addresses the low number of clusters. The first five coefficients (β1 to β5) show that there is no significant difference in the rate of inconsistencies between reference week 6 and the first five weeks, during which compensation is at the initial 150/100 treatment. The next three coefficients (β7 to β9) capture the effect of the increase in compensation to the 200/200 treatment in weeks 7 to 9. None of these effects is significant either. The last three coefficients (β10 to β12) show the effect of the wage cut back to the initial 150/100 treatment in week 10 and the further wage cut to the 100/100 treatment in weeks 11 and 12. Relative to week 6, the rate of inconsistencies jumps significantly, by 1.38 percentage points in week 10, and increases further, by 2 to 2.4 percentage points, in weeks 11 and 12. This represents an increase of 31 percent and about 50 percent, respectively. As the p-values for the differences in coefficients β10 − β9 and β11 − β10 indicate, the increase in the rate of inconsistencies is significant not only with respect to reference week 6 but also with respect to the weeks directly preceding the wage cuts. Columns (2) and (3) of Table 2 report p-values for the large-T procedure of Ibragimov and Mueller (2010, 2014), which exploits asymptotics in the time series instead of the cross-section, and for permutation tests, which have exact size. Both tests indicate that the difference in the rate of inconsistencies in weeks 10, 11 and 12 relative to week 6 is highly significant. The difference between weeks 9 and 10 and the difference between weeks 10 and 11 are also significant.
None of the other wage dummy coefficients is significantly different from the one in week 6. These tests strongly confirm the panel estimates: the return to the 150/100 treatment in week 10 and the reduction to the 100/100 treatment in week 11 led to highly significant increases in the rate of inconsistencies. Several key implications come out of these results. First, as discussed in Section 2, the absence of a significant response in unmonitored effort to the 45% increase in daily earnings in weeks 7-9 is consistent with the type of effort function presented in (2), provided that the initial wage is equal to or above the fair wage reference. Given that the initial 150/100 treatment amounted to a daily compensation three to four times higher than the going market compensation, this is a distinct possibility.28 Second, the significant increase in the rate of inconsistencies in week 10 relative to week 6, even though the wage treatment was exactly the same, provides clear evidence of wage
28 An alternative explanation for the absence of a significant response in the rate of inconsistencies in weeks 7-9 is that positive reciprocity effects from the wage increase are very short-lived and therefore average out over the week-long interval. This would be consistent with Gneezy and List (2006), who find in their field experiment that positive reciprocity disappears after only a few hours of work. As we show in the appendix, however, there is no evidence of such a short-lived drop in the rate of inconsistencies.
entitlement. This result is all the more striking because the daily compensation implied by this wage treatment was several times higher than the going market wage; we as the employer went to great lengths to foster a cooperative work environment; and the wage cut back to the initial treatment was justified with budget limitations. Our estimates thus offer strong support for Bewley's (2002) quote from the beginning of the paper. Workers quickly adapted to the higher 200/200 treatment in weeks 7-9 and used it as the new reference point against which to assess the fairness of new wage offers. Third, the significant increase in the rate of inconsistencies in week 10 relative to week 9, and in week 11 relative to week 10, confirms the recent findings of negative reciprocity by Kube, Marechal and Puppe (2013) and Cohn, Fehr, Herrmann and Schneider (2014). This is all the more interesting because, as discussed above, our experimental design and our measure of effort are both very different from the ones used in these papers. The increase in the rate of inconsistencies between weeks 10 and 11 further suggests that reciprocity is not a 0-1 decision, as it is sometimes modeled, but a continuous choice, as described by the theory presented above.
Since wage changes in our experiment were administered simultaneously to all fieldworkers, identification of the causal effects of wage changes on effort occurs through the comparison of fieldworkers' performance over time. As emphasized above, this design ensures that the experiment is not contaminated by social comparison effects. At the same time, a concern with our identification is that the estimates could be biased because of shocks that coincided with the exogenous wage changes but were not captured by variations in the time-, fieldworker- and survey-specific controls. Note first that all of our estimates are based on week-long averages, which considerably reduces the effects of random noise.29 Moreover, extensive debriefing with the supervisors did not reveal any obvious longer-lasting shocks, such as an extended period of rain, changes to the macroeconomic environment, or festivities and holidays, that could have adversely affected work performance during weeks 10-12 when wages were reduced. Nevertheless, we cannot rule out other unobserved seasonalities that coincided with the wage cuts. To address this concern, we use the 2010 survey that was administered to a subset of the same households but during which no change in wages occurred. This counterfactual allows us to identify the causal effect of wage changes on the rate of inconsistencies under the assumption that the rate of inconsistencies would have evolved in the same way in 2007 and 2010 had wages remained unchanged in 2007. As described above, since the primary objective here is to control for unobserved seasonal effects, we select the 12-week period in the 2010 survey during which the 2007 survey was
29 The results are robust to using 3-day averages instead of week-long averages.
run. Section 7 below shows that the results reported here are robust to alternative 12-week periods. We first estimate the rate of inconsistencies in the 2010 data separately on weekly placebo wage dummies, conditioning on the same set of fieldworker fixed effects, observable controls and time trend specification as for the 2007 estimation in Table 2. In strong contrast to the 2007 estimates, the results (not shown here) indicate that none of the wage dummies is significantly different from the week-6 reference. If anything, there is a slight decrease in the rate of inconsistencies during weeks 10, 11 and 12 of the 2010 survey. Next, we stack the 2007 and 2010 data and estimate the effect of the wage changes in 2007 as a difference-in-difference between the two surveys. Column (1) of Table 3 shows the panel estimate with wild cluster bootstrap-t standard errors. Columns (2) and (3) report the p-values for the Ibragimov–Mueller large-T procedure and the permutation test applied to the difference between the two samples for a given coefficient.30 As is readily apparent, the placebo wage dummies in 2010 have no significant effect on the rate of inconsistencies, and the increase in the rate of inconsistencies in weeks 10, 11 and 12 of the 2007 sample, when the wage cuts occurred, remains significant. In fact, controlling with the 2010 sample leads to a small increase in the magnitude of the point estimates in weeks 10-12 relative to the estimates reported in Table 2. The counterfactual confirms the above results, adding weight to a causal interpretation: if the week dummies spuriously captured seasonal factors, the placebo wage dummies should have similar effects in the 2010 data collection.
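The stacking logic can be illustrated with a means-only difference-in-difference sketch on synthetic data (no fixed effects, controls, or trends; all numbers invented). A seasonal dip common to both years is differenced out, while the 2007-only jump survives:

```python
import numpy as np

def did_effect(y_2007, y_2010, cut_weeks):
    """Difference-in-difference of mean inconsistency rates: the 2007
    change between wage-cut weeks and all other weeks, minus the same
    placebo change in the 2010 counterfactual."""
    d07 = y_2007[cut_weeks].mean() - y_2007[~cut_weeks].mean()
    d10 = y_2010[cut_weeks].mean() - y_2010[~cut_weeks].mean()
    return d07 - d10

rng = np.random.default_rng(2)
week = np.repeat(np.arange(1, 13), 30)   # 30 synthetic surveys per week
cut_weeks = week >= 10                   # wage cuts in weeks 10-12 of 2007
season = -0.4 * (week == 7)              # seasonal dip common to both years

y_2007 = 4.5 + season + 1.5 * cut_weeks + rng.normal(0, 0.5, week.size)
y_2010 = 1.1 + season + rng.normal(0, 0.5, week.size)

est = did_effect(y_2007, y_2010, cut_weeks)  # recovers the 2007-only jump (~1.5)
```

The identifying assumption mirrors the text: absent the wage cuts, the rate of inconsistencies would have evolved in parallel in the two years, so any common seasonality cancels in the double difference.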
Effect of wage changes on monitored effort
As discussed towards the end of the theory section, if wages are sufficiently high, explicit constraints imposed by monitoring may prevent workers from reciprocating a wage considered unfair with lower effort. Hence the importance of having a measure of effort that workers consider unmonitored. To illustrate this point, we consider our second (inverse) measure of effort, blanks and mistakes. As explained above, blanks and mistakes were checked regularly by the supervisors, and one fieldworker was fired in the beginning of the employment relationship because of unsatisfactory performance along this dimension. It is therefore reasonable to assume that fieldworkers perceived blanks and mistakes as monitored, with the potential for punishment when caught shirking. Table 4 shows the panel estimates for the rate of blanks and mistakes from the 2007 survey. The estimates are conditional on the same set of control variables as above, and standard errors and p-values are computed using the same procedures. As the estimates
30 See the appendix for details on the permutation test as applied in this case.
show, the point estimates on the wage dummies in weeks 10-12, when the wage cuts occurred, are small and insignificant, suggesting that wages were indeed high enough for the no-shirking constraint to outweigh workers' negative reciprocity with respect to blanks and mistakes. This underlines the importance of having an unmonitored measure of effort when testing for reciprocity in ongoing employment relationships.31 At the same time, we observe a significant change in the rate of blanks and mistakes in week 5, although the point estimate is small and the p-values associated with the Ibragimov–Mueller procedure and the permutation test are above the 10% mark. Nevertheless, this suggests that monitored effort may in fact be sensitive to changes in the environment. Under the assumption that the effort required to resolve inconsistencies and the effort required to avoid blanks and mistakes are related, this makes blanks and mistakes a potential alternative control for unobserved shocks that coincided with the wage cuts. We therefore perform a difference-in-difference estimation of inconsistencies and blanks and mistakes. As Table 5 shows, all of the results reported in Table 2 are robust to this check, providing further evidence against the hypothesis that the increase in the rate of inconsistencies during the weeks of the wage cuts was driven by coincidental adverse shocks.
Alternative explanations and robustness checks
In this section, we consider a number of alternative explanations of our results and report a series of robustness checks.
Coincidental changes specific to resolving inconsistencies
We first consider coincidental adverse shocks that would have affected fieldworker effort. Thereafter, we consider coincidental adverse shocks that would have affected the quality of households' answers.
7.1.1
Fatigue and end-of-employment effects
A first alternative explanation is that fieldworkers' effort in resolving inconsistencies decreased over time because of fatigue or end-of-employment effects. This would imply a gradual increase in the rate of inconsistencies towards the end of the data collection that is unrelated to the wage cuts. As shown in the appendix, however, this alternative explanation
31 Another interesting implication of the absence of significant changes in the rate of blanks and mistakes in response to the wage changes is that fieldworkers in our case did not trade off unmonitored effort spent on resolving inconsistencies against monitored effort spent avoiding blanks and mistakes, as the theory of Holmstrom and Milgrom (1991) would predict. One reason for this absence of a trade-off may again be that the wage was sufficiently high for the no-shirking constraint to bind at all times during the experiment. This type of non-differentiability constitutes a potentially interesting extension of the Holmstrom and Milgrom theory.
is contradicted by a basic characteristic of the data: the rate of inconsistencies gradually decreases over time, interrupted by jumps at the time of the wage cuts. Likewise, inspection of the 2010 data shows no evidence of an upward trend in the rate of inconsistencies throughout the entire collection period. To the contrary, the first few weeks show clear evidence of a downward trend, as in 2007.32 Based on these results, we conclude that fatigue and end-of-employment effects are not a plausible explanation for our results. Instead, the gradual decrease in the rate of inconsistencies in both surveys suggests that workers benefitted from learning-by-doing effects. This in turn implies that the rate of inconsistencies is actively influenced by the fieldworkers' ability and willingness to resolve inconsistencies, as assumed in the theory presented above.
7.1.2
Loss of conÖdence in employer
A related alternative explanation is that the sequence of exogenous wage changes made fieldworkers lose confidence in the ability of the employer to manage the project. This could have led fieldworkers to believe that correcting surveys for inconsistencies to assure "good quality" is not important. This alternative explanation is unlikely for the same reason as above; i.e. it would imply that the rate of inconsistencies gradually increased starting in week 7, when the first wage announcement occurred, and remained high until the end. This is contradicted by the data, which show a gradual decrease in the rate of inconsistencies over time, interrupted by jumps at the time of the wage cuts. Furthermore, as explained in Section 3, we were careful to thoroughly justify the wage cuts so as to prevent fieldworkers from thinking that the project was mismanaged. Debriefing with some of the fieldworkers after the end of the survey collection confirmed that they did not think the project was mismanaged. Based on these arguments, we conclude that loss of confidence is not responsible for the increase in inconsistencies either.

7.1.3 Coincidental changes in the number of surveys completed
A third alternative explanation is that during the three weeks of the 200/200 wage treatment, fieldworkers formed a new reference income target that they tried to maintain after the return to the 150/100 treatment and then the 100/100 treatment by increasing the number of surveys per day. If true, such an increase in surveys per day could have led fieldworkers to put less effort into resolving inconsistencies. As already noted in Section 2, however, the average daily number of surveys per fieldworker remained stable throughout the experiment. We confirm this formally with a panel regression of the number of daily surveys per fieldworker on the different wage dummies and other control variables in (4), and find all coefficients to be far from significant. Moreover, all of our panel estimates control for the number of daily surveys per fieldworker. Column (1) of Table 6 confirms that estimates without this control are very similar to the baseline results in Table 2.33,34 Based on these results, we conclude that there were no coincidental changes in the number of surveys.

32 We also run difference-in-difference regressions using either the first 12 weeks or the last 12 weeks of the 2010 survey to formally control for fatigue and end-of-employment effects, respectively. All results are robust to these alternative difference-in-difference specifications.
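To make the specification concrete, here is a minimal sketch (in Python with NumPy, on simulated data, not the study's data) of a panel regression of this form: week-of-treatment dummies with week 6 as the reference category, fieldworker fixed effects, and standard errors clustered at the fieldworker level. The "true" jumps for weeks 10 to 12 are set near the paper's estimates purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: 11 fieldworkers observed over 12 weeks.
n_workers, n_weeks, per_week = 11, 12, 20
true_week_effects = np.zeros(n_weeks)
true_week_effects[9:] = [1.4, 2.0, 2.4]   # jumps after the wage cuts (weeks 10-12)

worker = np.repeat(np.arange(n_workers), n_weeks * per_week)
week = np.tile(np.repeat(np.arange(n_weeks), per_week), n_workers)
worker_fe = rng.normal(0, 1, n_workers)
y = true_week_effects[week] + worker_fe[worker] + rng.normal(0, 1, worker.size)

# Design: week dummies (reference = week 6, i.e. index 5) plus worker dummies.
week_dummies = (week[:, None] == np.arange(n_weeks)[None, :]).astype(float)
week_dummies = np.delete(week_dummies, 5, axis=1)   # drop the reference week
worker_dummies = (worker[:, None] == np.arange(n_workers)[None, :]).astype(float)
X = np.hstack([week_dummies, worker_dummies])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Cluster-robust (fieldworker-level) covariance: sandwich with
# within-cluster sums of scores.
resid = y - X @ beta
XtX_inv = np.linalg.inv(X.T @ X)
meat = np.zeros((X.shape[1], X.shape[1]))
for g in range(n_workers):
    sg = X[worker == g].T @ resid[worker == g]
    meat += np.outer(sg, sg)
V = XtX_inv @ meat @ XtX_inv
se = np.sqrt(np.diag(V))

# After dropping week 6, the week 10-12 coefficients sit at indices 8, 9, 10.
print([round(float(b), 2) for b in beta[8:11]])
```

A production implementation would typically use a library such as statsmodels or linearmodels, which also provide small-sample cluster corrections.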
7.1.4 Coincidental changes to frequency of inconsistent answers
Another alternative explanation for our results is that the rate of inconsistencies increased in weeks 10-12 because of a coincidental drop in the quality of households' answers. Since the order in which the households were interviewed in 2007 is not the same as in 2010, it is conceivable that the difference-in-difference estimates with the 2010 data do not adequately control for this possibility. For example, suppose fieldworkers systematically interviewed first the households for which it was easy to meet with the household head (the preferred individual to interview) and only later interviewed the households for which the household head was unavailable and a next of kin provided the answers instead. As next of kin may not be aware of as many details about the household as household heads, this could have resulted in a drop in the quality of respondents' answers towards the end of the survey. To address this concern, we control in all regressions for 18 dichotomous variables that code the relationship of the respondent to the household head. Furthermore, robustness checks reported in Column (2) of Table 6 show that the results are very similar independent of whether these respondent controls are included or not. Alternatively, one might be concerned that households became systematically harder to interview over time because fieldworkers chose to interview the "easiest" respondents at the beginning of the data collection and kept the "hardest" ones for the end. This is unlikely for a number of reasons. First, the list of households to interview throughout the experiment was organized according to sublocations, and within a sublocation, supervisors assigned households randomly to fieldworkers. Second, we made sure that the different wage changes did not coincide with a change in sublocation. Third, sublocations were chosen randomly. Fourth, all of our regressions control for sublocation fixed effects.
Column (3) of Table 6 confirms that the results are not dependent on the inclusion of these controls. Based on these considerations, we conclude that a coincidental drop in the quality of households' answers is unlikely to provide an alternative explanation for our results.

33 For space reasons, Table 6 does not show the coefficients for the initial weeks of the wage experiment. These coefficients are, however, included in all regressions and turn out insignificant, in line with the results reported in Table 2.

34 P-values for the Ibragimov-Mueller procedure and permutation tests for the different coefficients are reported in the appendix. They confirm all the results reported in Table 6.
Other robustness checks
Columns (4)-(7) of Table 6 report a number of additional robustness checks for our panel regression in Table 2. Column (4) shows that none of the results change when the reference week is changed to week 5. Additionally, this column shows no significant change in inconsistencies for week 6, indicating that fieldworkers did not anticipate the wage increase in week 7. Column (5) shows that none of the results change when the two training weeks prior to the regular work relationship are included (the weeks when one of the PIs was present). Column (6) shows that the main conclusions of the paper are robust to replacing the quadratic time trend with a linear time trend. With a linear time trend, the estimates are somewhat less strong, since the linear trend picks up less of the secular decrease in inconsistencies that follows the jump in response to the wage cuts. By contrast, all results are robust to introducing a time trend of order three or more. Column (7), finally, checks the robustness of our results by using only the first three surveys of each day. All results are confirmed: (i) there is no significant reaction in week 7 when the wage per survey is increased to the 200/200 treatment; (ii) inconsistencies increase significantly as the wage returns to the baseline 150/100 treatment in week 10; and (iii) inconsistencies increase even further as the wage drops to the 100/100 treatment in week 11.

Table 7 further explores the particular wage structure of the survey. The first part tests whether, during weeks 1 to 6 when the initial 150/100 treatment was in place, wage changes within each day affected the difference between inconsistencies and blanks and mistakes. As the estimates show, there is no significant difference between the third survey (paid 150 Ksh) and the fourth survey and beyond (paid 100 Ksh). Hence, the negative reciprocity effects found for wage changes across time in Table 2 do not apply to wage changes within each day. This suggests that workers' reciprocity depends on changes in the wage contract as opposed to the details of a given contract, lending further support to Bewley's (2002) conclusion that employees have little notion of a fair or market value in absolute terms. The second part of Table 7 shows that there is also a strong and significant increase in inconsistencies for the first three surveys per day in week 11, paid 100 Ksh each, relative to the fourth survey per day in weeks 1 to 6, even though this fourth survey was paid the same 100 Ksh and was administered at the end of the day. This test provides further confirmation of the wage entitlement effect discussed above.
This paper tests for wage entitlement in reciprocity-based models of efficiency wages by performing a field experiment in a real-world employment situation. The novelty of the paper relative to existing field experiments in the literature on reciprocity in labor relations is that we follow workers over a 12-week period and estimate the effects of a sequence of wage increases and wage cuts. To disentangle reciprocity from the explicit incentives that naturally arise in such longer-term relationships, we construct a measure of effort that workers perceived as truly unmonitored. The two main results coming out of our experiment are that (i) workers quickly adapted their fair-wage reference to a new, higher level in response to the wage increase, which subsequently influenced the effort response to a given wage offer, i.e. what Bewley (2002) calls wage entitlement; and (ii) workers exhibited a pronounced negative effort response to wage cuts, even though wages throughout the entire experiment were several times higher than the going market wage.

As discussed in the introduction, wage entitlement is a necessary condition for reciprocity-based models of efficiency wages to explain why firms are typically reluctant to cut wages, a phenomenon known as DWR. As Bewley (1999) argues: "...resistance to pay reduction comes primarily from employers, not from workers or their representatives, though it is anticipation of negative employee reactions that makes employers oppose pay cutting. The claim that wage rigidity gives rise to unexploited gains from trade is invalid, because a firm would lose more money from the adverse effects of cutting pay than it would gain from lower wages and salaries." (pages 430-31). Viewed in this way, the field experiment represents an example of what a firm should not do, with the negative reaction of workers to the wage cuts confirming Bewley's point.
References

Abeler, J., A. Falk, L. Goette and D. Huffman, 2011. Reference Points and Effort Provision. American Economic Review 101, 470-492.
Agell, J. and P. Lundborg, 1995. Theories of Pay and Unemployment: Survey Evidence from Swedish Manufacturing Firms. Scandinavian Journal of Economics 97, 295-307.
Agell, J. and P. Lundborg, 1999. Survey Evidence on Wage Rigidity: Sweden in the 1990s. FIEF Working Paper 154.
Akerlof, G. and J. Yellen, 1990. The Fair-Wage Effort Hypothesis and Unemployment. Quarterly Journal of Economics 105, 255-283.
Akerlof, G. A., 1982. Labor Contracts as Partial Gift Exchange. Quarterly Journal of Economics 97, 543-569.
Al-Ubaydli, O. and J. A. List, 2012. On the Generalizability of Experimental Results in Economics, in Frechette, G. and A. Schotter (eds), The Methods of Modern Experiments. Oxford University Press.
Al-Ubaydli, O., S. Andersen, U. Gneezy and J. A. List, 2012. Carrots That Look Like Sticks: Toward an Understanding of Multitasking Incentive Schemes. Southern Economic Journal, forthcoming.
Anderson, M., 2008. Multiple Inference and Gender Differences in the Effects of Early Intervention: A Reevaluation of the Abecedarian, Perry Preschool, and Early Training Projects. Journal of the American Statistical Association 103(484), 1481-1495.
Bandiera, O., I. Barankay and I. Rasul, 2005. Social Preferences and the Response to Incentives: Evidence from Personnel Data. Quarterly Journal of Economics 120(3), 917-962.
Bandiera, O., I. Barankay and I. Rasul, 2007. Incentives for Managers and Inequality Among Workers: Evidence from a Firm-Level Experiment. Quarterly Journal of Economics 122(2), 729-773.
Bandiera, O., I. Barankay and I. Rasul, 2009. Social Connections and Incentives in the Workplace: Evidence from Personnel Data. Econometrica 77(4), 1047-1094.
Bandiera, O., I. Barankay and I. Rasul, 2011. Field Experiments with Firms. Journal of Economic Perspectives 25(3), 63-82.
Bellemare, C. and B. Shearer, 2009. Gift Giving and Worker Productivity: Evidence from a Firm-Level Experiment. Games and Economic Behavior 67, 233-244.
Bewley, T. F., 1999. Why Wages Don't Fall During a Recession. Cambridge: Harvard University Press.
Bewley, T. F., 2002. Fairness, Reciprocity, and Wage Rigidity. Cowles Foundation Discussion Paper No. 1383.
Blinder, A. S. and D. H. Choi, 1990. A Shred of Evidence on Theories of Wage Stickiness. Quarterly Journal of Economics 105, 1003-1015.
Bloom, N., B. Eifert, A. Mahajan, D. McKenzie and J. Roberts, 2013. Does Management Matter? Evidence from India. Quarterly Journal of Economics 128(1), 1-51.
Bolton, G. E. and A. Ockenfels, 2000. A Theory of Equity, Reciprocity and Competition. American Economic Review 90, 166-193.
Camerer, C., L. Babcock, G. Loewenstein and R. Thaler, 1997. Labor Supply of New York City Cabdrivers: One Day at a Time. Quarterly Journal of Economics 112(2), 407-441.
Cameron, C., J. Gelbach and D. Miller, 2008. Bootstrap-Based Improvements for Inference with Clustered Errors. Review of Economics and Statistics 90(3), 414-427.
Campbell, C. and K. Kamlani, 1997. The Reasons for Wage Rigidity: Evidence from a Survey of Firms. Quarterly Journal of Economics 112, 759-789.
Charness, G., G. R. Frechette and J. H. Kagel, 2004. How Robust Is Laboratory Gift Exchange? Experimental Economics 7, 189-205.
Charness, G. and P. Kuhn, 2007. Does Pay Inequality Affect Worker Effort? Experimental Evidence. Journal of Labor Economics 25, 693-723.
Cohn, A., E. Fehr and L. Goette, 2013. Fair Wages and Effort: Evidence from a Field Experiment. Management Science, forthcoming.
Cohn, A., E. Fehr, B. Herrmann and F. Schneider, 2014. Social Comparison in the Workplace: Evidence from a Field Experiment. Journal of the European Economic Association 12(4), 877-898.
Collard, F. and D. De la Croix, 2000. Gift Exchange and the Business Cycle: The Fair Wage Strikes Back. Review of Economic Dynamics 3, 166-193.
Daly, M., B. Hobijn and B. Lucking, 2012. Why Has Wage Growth Stayed Strong? Federal Reserve Bank of San Francisco Economic Letter 2012-10.
Danthine, J.-P. and J. B. Donaldson, 1990. Efficiency Wages and the Real Business Cycle. European Economic Review 34, 1275-1301.
Danthine, J.-P. and A. Kurmann, 2010. The Business Cycle Implications of Reciprocity in Labor Relations. Journal of Monetary Economics 57(7), 837-850.
Danthine, J.-P. and A. Kurmann, 2008. The Macroeconomic Consequences of Reciprocity in Labor Relations. Scandinavian Journal of Economics 109(4), 857-881.
Danthine, J.-P. and A. Kurmann, 2004. Fair Wages in a New Keynesian Model of the Business Cycle. Review of Economic Dynamics 7, 107-142.
Dickens, W., L. Goette, E. Groshen, S. Holden, J. Messina, M. Schweitzer, J. Turunen and M. Ward, 2007. How Wages Change: Micro Evidence from the International Wage Flexibility Project. Journal of Economic Perspectives 21(2), 195-214.
Elsby, M., 2009. Evaluating the Economic Significance of Downward Nominal Wage Rigidity. Journal of Monetary Economics.
Eliaz, K. and R. Spiegler, 2013. Reference Dependence and Labor Market Fluctuations. NBER Macroeconomics Annual 28(1), 159-200.
Falk, A. and U. Fischbacher, 2006. A Theory of Reciprocity. Games and Economic Behavior 54(2), 293-315.
Fallick, B., M. Lettau and W. Wascher, 2011. Downward Nominal Wage Rigidity in the United States during the Great Recession. Working paper.
Fehr, E., G. Kirchsteiger and A. Riedl, 1993. Gift Exchange and Ultimatum in Experimental Markets. Vienna Economics Papers 9301, University of Vienna, Department of Economics.
Fehr, E. and A. Falk, 1999. Wage Rigidity in a Competitive Incomplete Contract Market. Journal of Political Economy 107, 106-134.
Fehr, E. and S. Gächter, 2000a. Fairness and Retaliation: The Economics of Reciprocity. Journal of Economic Perspectives 14, 159-181.
Fehr, E. and K. M. Schmidt, 1999. A Theory of Fairness, Competition, and Cooperation. Quarterly Journal of Economics 114, 817-868.
Gneezy, U. and J. A. List, 2006. Putting Behavioral Economics to Work: Field Evidence of Gift Exchange. Econometrica 74, 1365-1384.
Hannan, R. L., J. H. Kagel and D. V. Moser, 2002. Partial Gift Exchange in an Experimental Labor Market: Impact of Subject Population Differences, Productivity Differences, and Effort Requests on Behavior. Journal of Labor Economics 20, 923-951.
Harrison, G. W. and J. A. List, 2004. Field Experiments. Journal of Economic Literature 42, 1009-1055.
Hart, O. and J. Moore, 2008. Contracts as Reference Points. Quarterly Journal of Economics 123, 1-48.
Howitt, P., 2002. Looking Inside the Labor Market: A Review Article. Journal of Economic Literature 40, 125-138.
Holmstrom, B. and P. Milgrom, 1991. Multitask Principal-Agent Analyses: Incentive Contracts, Asset Ownership, and Job Design. Journal of Law, Economics, and Organization 7, 24-52.
Ibragimov, R. and U. K. Müller, 2010. t-Statistic Based Correlation and Heterogeneity Robust Inference. Journal of Business and Economic Statistics 28(4), 453-468.
Ibragimov, R. and U. K. Müller, 2014. Inference with Few Heterogeneous Clusters. Working paper.
Kahneman, D., J. L. Knetsch and R. Thaler, 1986. Fairness as a Constraint on Profit Seeking: Entitlements in the Market. American Economic Review 76, 728-741.
Kaur, S., 2013. Nominal Wage Rigidity in Village Labor Markets. Working paper.
Kim, M. and R. Slonim, 2010. The Effect of Gift Exchange with Multidimensional Effort: Evidence from a Hybrid Field-Lab Experiment.
Kőszegi, B. and M. Rabin, 2006. A Model of Reference-Dependent Preferences. Quarterly Journal of Economics 121(4), 1133-1165.
Kube, S., M. Maréchal and C. Puppe, 2013. Do Wage Cuts Damage Work Morale? Evidence from a Natural Field Experiment. Journal of the European Economic Association 11, 853-870.
Levine, D. I., 1993. Fairness, Markets, and Ability to Pay: Evidence from Compensation Executives. American Economic Review 83(5), 1241-1259.
Levitt, S. D. and J. A. List, 2007. What Do Laboratory Experiments Measuring Social Preferences Reveal About the Real World? Journal of Economic Perspectives 21(2), 153-174.
MacLeod, B. W. and J. Malcomson, 1989. Implicit Contracts, Incentive Compatibility, and Involuntary Unemployment. Econometrica 57, 447-480.
MacLeod, B. W. and J. Malcomson, 1998. Motivation and Markets. American Economic Review 99, 112-145.
McKenzie, D., 2012. Beyond Baseline and Follow-up: The Case for More T in Experiments. Journal of Development Economics 99, 210-221.
Rabin, M., 1993. Incorporating Fairness into Game Theory and Economics. American Economic Review 83, 1281-1302.
Rotemberg, J., 2006. Altruism, Reciprocity and Cooperation in the Workplace, in S.-C. Kolm and J. Mercier Ythier (eds), Handbook on the Economics of Giving, Reciprocity and Altruism, vol. 2. North Holland, 1371-1407.
Shapiro, C. and J. E. Stiglitz, 1984. Equilibrium Unemployment as a Worker Discipline Device. American Economic Review 74, 433-444.
Shi, L., 2010. Incentive Effect of Piece-Rate Contracts: Evidence from Two Small Field Experiments. The B.E. Journal of Economic Analysis & Policy 10(1), Article 61.
Solow, R. M., 1979. Another Possible Source of Wage Rigidity. Journal of Macroeconomics 1, 79-82.
Williamson, O., 1985. The Economic Institutions of Capitalism: Firms, Markets, Relational Contracting. New York: Free Press.
Figure 1: Testing reciprocity and wage entitlement
Figure 2: Timing of wage changes.
Figure 3, Panel A: Average rate of inconsistencies per week in 2007 survey.
Figure 3, Panel B: Average rate of blanks and mistakes per week in 2007 survey.
Figure 3, Panel C: Average rate of inconsistencies per week in exact same weeks of 2010 survey.
(Each panel marks week 7 (200/200), week 10 (150/100), and week 11 (100/100).)
Table 1: Descriptive statistics

                                           Inconsistencies  Blanks and mistakes  Inconsistencies
                                           2007             2007                 2010
Average total potential number per survey  93.8             911.6                153
Average rate across surveys                4.44             1.31                 1.08
Standard deviation                         2.44             2.03                 1.24
Maximum rate                               23               33.56                8
Minimum rate                               0                0.09                 0
Table 2: Effect of wage changes on rate of inconsistencies (Reference period: Week 6; 150/100 treatment)

Column (1): Wild cluster bootstrap-t
Week 1; 150/100 treatment (β1):   -0.207 (1.122)
Week 2; 150/100 treatment (β2):   -0.354 (0.949)
Week 3; 150/100 treatment (β3):   -0.338 (0.702)
Week 4; 150/100 treatment (β4):   -0.311 (0.422)
Week 5; 150/100 treatment (β5):   -0.088 (0.288)
Week 7; 200/200 treatment (β7):   -0.058 (0.356)
Week 8; 200/200 treatment (β8):    0.171 (0.345)
Week 9; 200/200 treatment (β9):    0.559 (0.392)
Week 10; 150/100 treatment (β10):  1.375 (0.543)**
Week 11; 100/100 treatment (β11):  1.996 (0.630)***
Week 12; 100/100 treatment (β12):  2.412 (0.982)**
β10 - β9 (p-value): 0.02
β11 - β10 (p-value): 0.06
Fieldworker fixed effects: Yes. Sublocation of interview fixed effects: Yes. Respondent controls: Yes. Time trend, time trend squared: Yes. Number of surveys per day: Yes.
Observations: 2864. R-squared: 0.17.

Column (2): Ibragimov-Mueller (p-value): 0.43
Column (3): Permutation test (p-value): 0.45

OLS regressions. * significant at 10%; ** significant at 5%; *** significant at 1% according to the Wild cluster bootstrap-t method. Robust standard errors in parentheses, clustered at the fieldworker level. The dependent variable is the rate of inconsistencies (number of inconsistencies in a survey divided by the total number of potential inconsistencies, multiplied by 100). The reference category is the 6th week, when the wage was set at 150. A time trend and a time trend squared are included to take into account learning effects. Fieldworker fixed effects are included. Respondents' controls (sublocation fixed effects, and relationship to household head) are included. β10 - β9 reports the p-value of the Wild cluster bootstrap-t method comparing β10 to β9 (see appendix for greater details). Column (2) presents the Ibragimov-Mueller p-values: we compute the fieldworker-by-fieldworker parameter estimates, treating the estimates as draws from independent (but not identically distributed) normal distributions, and report the p-value of a one-sample t-test. To compare β10 to β9, we report the p-value of a two-sample t-test of the 11 β10 estimates compared to the 11 β9 estimates. Column (3) presents the permutation-test p-values for testing the null hypothesis that the wage of 150 Ksh in week 10 has no effect on the inconsistency rate relative to the wage of 150 Ksh in week 6, by constructing a permutation distribution of the 11 estimates for each fieldworker in week 10 compared to the 11 estimates for each fieldworker in week 6 (equal to zero), stratifying on the fieldworkers (to allow permutation of data only within fieldworkers). To compare β10 to β9, we permute the 11 β10 and the 11 β9 estimates, stratifying on the fieldworkers. These tests have exact finite-sample size.
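The wild cluster bootstrap-t procedure used for inference can be sketched as follows. This is an illustrative implementation on simulated data with 11 clusters (echoing the 11 fieldworkers), using Rademacher weights and imposing the null hypothesis; it is not the authors' actual code.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: 11 clusters, one regressor of interest with true slope 0.5.
n_clusters, per_cluster = 11, 40
cluster = np.repeat(np.arange(n_clusters), per_cluster)
x = rng.normal(size=cluster.size)
y = 0.5 * x + rng.normal(0, 1, n_clusters)[cluster] + rng.normal(size=cluster.size)

def cluster_t(y, x, cluster):
    """OLS slope with cluster-robust (CR0) standard error and t-statistic."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ beta
    A = np.linalg.inv(X.T @ X)
    meat = sum(np.outer(X[cluster == g].T @ u[cluster == g],
                        X[cluster == g].T @ u[cluster == g])
               for g in np.unique(cluster))
    se = np.sqrt((A @ meat @ A)[1, 1])
    return beta[1], se, beta[1] / se

beta1, se1, t_obs = cluster_t(y, x, cluster)

# Impose the null (slope = 0): restricted residuals are y minus its mean.
u0 = y - y.mean()
B = 999
t_star = np.empty(B)
for b in range(B):
    w = rng.choice([-1.0, 1.0], size=n_clusters)   # one Rademacher weight per cluster
    y_star = y.mean() + w[cluster] * u0            # resampled outcome under the null
    t_star[b] = cluster_t(y_star, x, cluster)[2]

p_value = np.mean(np.abs(t_star) >= abs(t_obs))
```

With as few as 11 clusters, conventional cluster-robust t-statistics tend to over-reject; flipping the sign of each cluster's residuals as a block preserves the within-cluster dependence, which is the motivation for this method.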
Table 3: Difference-in-difference estimation with 2010 survey data (Reference period: Week 6; 150/100 treatment)

Column (1): Wild cluster bootstrap-t
Week 7; 200/200 treatment (β7) × Data 2007:   0.345 (0.420)
Week 8; 200/200 treatment (β8) × Data 2007:   0.612 (0.694)
Week 9; 200/200 treatment (β9) × Data 2007:   0.811 (0.676)
Week 10; 150/100 treatment (β10) × Data 2007: 2.049 (0.930)*
Week 11; 100/100 treatment (β11) × Data 2007: 2.831 (1.177)*
Week 12; 100/100 treatment (β12) × Data 2007: 3.399 (1.631)*
(β10 - β9) × Data 2007 (p-value): 0.10
(β11 - β10) × Data 2007 (p-value): 0.07
Week 7; 200/200 treatment (β7):   -0.404 (0.228)*
Week 8; 200/200 treatment (β8):   -0.441 (0.604)
Week 9; 200/200 treatment (β9):   -0.252 (0.553)
Week 10; 150/100 treatment (β10): -0.674 (0.759)
Week 11; 100/100 treatment (β11): -0.835 (0.998)
Week 12; 100/100 treatment (β12): -0.986 (1.309)
Data 2007 fixed effect: Yes. Fieldworker fixed effects: Yes. Sublocation of interview fixed effects: Yes. Respondent controls: Yes. Time trend squared, interacted with 2007: Yes. Number of surveys per day, interacted with 2007: Yes.
Observations: 3302. R-squared: 0.37.

Column (2): Ibragimov-Mueller (p-value): 0.16
Column (3): Permutation test (p-value): 0.27

OLS regressions. * significant at 10%; ** significant at 5%; *** significant at 1% according to the Wild cluster bootstrap-t method. Robust standard errors in parentheses, clustered at the fieldworker level. The dependent variable is the rate of inconsistencies (number of inconsistencies in a survey divided by the total number of potential inconsistencies, multiplied by 100). The data are from 2007 and the exact same weeks in 2010. A "Data 2007 fixed effect", i.e. a dichotomous variable equal to 1 if the data was collected in 2007, is included. The wage dummies, time trend, time trend squared, and number of surveys per day are interacted with the "Data 2007 fixed effect". The reference category is the 6th week, when the wage was set at 150. Fieldworker fixed effects are included. Respondents' controls (sublocation fixed effects, and relationship to household head) are included. β10 - β9 reports the p-value of the Wild cluster bootstrap-t method comparing β10 to β9 (see appendix for greater details). Column (2) presents the Ibragimov-Mueller p-values: we compute the fieldworker-by-fieldworker parameter estimates, treating the estimates as draws from independent (but not identically distributed) normal distributions, and report the p-value of a two-sample t-test between the set of coefficients estimated in 2007 versus the set of coefficients estimated in 2010. To compare β10 to β9, we report the p-value of a two-sample t-test of the 11 β10 estimates compared to the 11 β9 estimates in 2007. Column (3) reports the p-values for testing the null hypothesis that the wage of 150 Ksh in week 10 of 2007 has no effect on the inconsistency rate relative to the same week in 2010, by constructing a permutation distribution of the 11 estimates for each fieldworker in week 10 of 2007 compared to the 10 estimates for each fieldworker in week 10 of 2010. To compare β10 to β9, we permute the 11 β10 and the 11 β9 estimates, stratifying on the fieldworkers. These tests have exact finite-sample size.
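The stratified permutation test described in these notes can be sketched on simulated data: week labels are permuted only within each fieldworker (the strata), and the observed within-fieldworker difference is compared against the resulting permutation distribution. Everything below (sample sizes, effect size) is an assumption for illustration, simplified to a comparison of two weeks.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated surveys from two weeks (week 6 and week 10) for 11 fieldworkers.
n_workers, per_week = 11, 20
worker = np.repeat(np.arange(n_workers), 2 * per_week)
is_week10 = np.tile(np.r_[np.zeros(per_week), np.ones(per_week)], n_workers)
worker_fe = rng.normal(4.5, 1.0, n_workers)
y = worker_fe[worker] + 1.4 * is_week10 + rng.normal(0, 1.5, worker.size)

def stat(labels):
    # Mean within-fieldworker difference (week 10 minus week 6).
    return np.mean([y[(worker == g) & (labels == 1)].mean()
                    - y[(worker == g) & (labels == 0)].mean()
                    for g in range(n_workers)])

obs = stat(is_week10)
n_perm = 2000
perm = np.empty(n_perm)
for b in range(n_perm):
    shuffled = is_week10.copy()
    for g in range(n_workers):          # permute labels only within each stratum
        idx = np.where(worker == g)[0]
        shuffled[idx] = rng.permutation(shuffled[idx])
    perm[b] = stat(shuffled)

p_value = np.mean(np.abs(perm) >= abs(obs))
```

Stratifying on fieldworkers means worker-specific inconsistency levels (the fixed effects) cannot drive the test statistic, only the within-worker contrast between the two weeks can.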
Table 4: Effect of wage changes on rate of blanks and mistakes (Reference period: Week 6; 150/100 treatment)

Column (1): Wild cluster bootstrap-t
Week 1; 150/100 treatment (β1):   0.879 (0.713)
Week 2; 150/100 treatment (β2):   0.104 (0.473)
Week 3; 150/100 treatment (β3):  -0.019 (0.374)
Week 4; 150/100 treatment (β4):   0.143 (0.336)
Week 5; 150/100 treatment (β5):   0.377 (0.114)**
Week 7; 200/200 treatment (β7):  -0.049 (0.237)
Week 8; 200/200 treatment (β8):   0.055 (0.364)
Week 9; 200/200 treatment (β9):  -0.085 (0.361)
Week 10; 150/100 treatment (β10): -0.096 (0.620)
Week 11; 100/100 treatment (β11): -0.159 (0.686)
Week 12; 100/100 treatment (β12): -0.203 (0.865)
β10 - β9 (p-value): 0.96
β11 - β10 (p-value): 0.83
Fieldworker fixed effects: Yes. Sublocation of interview fixed effects: Yes. Respondent controls: Yes. Time trend, time trend squared: Yes. Number of surveys per day: Yes.
Observations: 2864. R-squared: 0.08.

Column (2): Ibragimov-Mueller (p-value): 0.35
Column (3): Permutation test (p-value): 0.36

OLS regressions. * significant at 10%; ** significant at 5%; *** significant at 1% according to the Wild cluster bootstrap-t method. The dependent variable is the rate of blanks and mistakes. Columns (2) and (3) present the Ibragimov-Mueller and permutation tests, similar to Table 2.
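The Ibragimov-Mueller procedure used in column (2) of these tables can be sketched as follows: a coefficient is estimated separately for each fieldworker, and a one-sample t-test is then run on the resulting estimates. The data, effect size, and sample sizes below are simulated assumptions, not the study's.

```python
import numpy as np

rng = np.random.default_rng(2)

# Estimate a week-10 effect separately for each of 11 simulated fieldworkers.
n_workers, n_obs = 11, 60
estimates = []
for g in range(n_workers):
    week10 = (rng.random(n_obs) < 0.5).astype(float)  # 1 if survey done in week 10
    y = 1.5 * week10 + rng.normal(0, 2, n_obs)        # true effect set to 1.5
    X = np.column_stack([np.ones(n_obs), week10])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    estimates.append(beta[1])                         # worker-specific estimate

estimates = np.array(estimates)
# Treat the 11 estimates as independent draws and t-test them against zero.
t_stat = estimates.mean() / (estimates.std(ddof=1) / np.sqrt(n_workers))
# Two-sided 5% critical value of the t distribution with 10 df is 2.228.
reject_at_5pct = abs(t_stat) > 2.228
```

The appeal of the procedure with few, heterogeneous clusters is that only the 11 worker-level estimates enter the test, so arbitrary dependence within each fieldworker's surveys is irrelevant.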
Table 5: Effect of wage changes on difference between inconsistencies and blanks and mistakes (Reference period: Week 6; 150/100 treatment)

Column (1): Wild cluster bootstrap-t
Week 1; 150/100 treatment (β1):  -1.086 (1.529)
Week 2; 150/100 treatment (β2):  -0.458 (1.245)
Week 3; 150/100 treatment (β3):  -0.319 (0.902)
Week 4; 150/100 treatment (β4):  -0.454 (0.574)
Week 5; 150/100 treatment (β5):  -0.465 (0.341)
Week 7; 200/200 treatment (β7):  -0.009 (0.275)
Week 8; 200/200 treatment (β8):   0.116 (0.361)
Week 9; 200/200 treatment (β9):   0.645 (0.457)
Week 10; 150/100 treatment (β10): 1.471 (0.732)*
Week 11; 100/100 treatment (β11): 2.154 (0.833)**
Week 12; 100/100 treatment (β12): 2.615 (1.282)*
β10 - β9 (p-value): .09
β11 - β10 (p-value): .04
Fieldworker fixed effects: Yes. Sublocation of interview fixed effects: Yes. Respondent controls: Yes. Time trend, time trend squared: Yes. Number of surveys per day: Yes.
Observations: 2864. R-squared: 0.15.

Column (2): Difference, Ibragimov-Mueller (p-value): 0.96
Column (3): Permutation test (p-value): 0.97

OLS regressions. * significant at 10%; ** significant at 5%; *** significant at 1% according to the Wild cluster bootstrap-t method. The dependent variable is the difference between the rate of inconsistencies and the rate of blanks and mistakes. Columns (2) and (3) present the Ibragimov-Mueller and permutation tests, similar to Table 2.
Table 6: Robustness checks (Reference period: Week 6; 150/100 treatment)

Each column reports the week coefficients (standard errors), the p-values for β10 - β9 and β11 - β10, and the observations and R-squared of one alternative specification.

Column (1), no number of surveys per day:
Week 7 (β7): -0.108 (0.343); Week 8 (β8): 0.106 (0.329); Week 9 (β9): 0.488 (0.393); Week 10 (β10): 1.247 (0.537)**; Week 11 (β11): 1.841 (0.632)**; Week 12 (β12): 2.215 (0.989)**. β10 - β9 (p): .02; β11 - β10 (p): .03. Observations: 2864; R-squared: 0.14.

Column (2), no respondent controls:
Week 7: -0.030 (0.332); Week 8: 0.200 (0.333); Week 9: 0.610 (0.369); Week 10: 1.431 (0.520)**; Week 11: 2.047 (0.611)**; Week 12: 2.474 (0.953)**. β10 - β9 (p): .01; β11 - β10 (p): .03. Observations: 2873; R-squared: 0.14.

Column (3), no sublocation controls:
Week 7: -0.077 (0.217); Week 8: 0.127 (0.225); Week 9: 0.460 (0.315); Week 10: 1.219 (0.491)**; Week 11: 1.892 (0.645)**; Week 12: 2.130 (0.944)**. β10 - β9 (p): .01; β11 - β10 (p): .06. Observations: 2864; R-squared: 0.15.

Column (4), other reference category (week 5):
Week 6 (β6): 0.088 (0.288); Week 7: 0.029 (0.395); Week 8: 0.257 (0.503); Week 9: 0.645 (0.549); Week 10: 1.462 (0.680)**; Week 11: 2.076 (0.768)**; Week 12: 2.492 (1.111)**. β10 - β9 (p): .01; β11 - β10 (p): .03. Observations: 2864; R-squared: 0.15.

Column (5), with first two weeks:
Week 7: -0.059 (0.356); Week 8: 0.169 (0.344); Week 9: 0.557 (0.392); Week 10: 1.374 (0.543)**; Week 11: 1.988 (0.630)***; Week 12: 2.404 (0.982)**. β10 - β9 (p): .01; β11 - β10 (p): .03. Observations: 3012; R-squared: 0.14.

Column (6), no time trend squared:
Week 7: -0.002 (0.357); Week 8: 0.219 (0.346); Week 9: 0.522 (0.391); Week 10: 1.169 (0.504)**; Week 11: 1.540 (0.592)**; Week 12: 1.603 (0.842)*. β10 - β9 (p): .02; β11 - β10 (p): .23. Observations: 2864; R-squared: 0.15.

Column (7), first three surveys:
Week 7: 0.287 (0.293); Week 8: 0.675 (0.443); Week 9: 1.155 (0.615)*; Week 10: 2.283 (0.921)**; Week 11: 3.121 (1.087)**; Week 12: 3.449 (1.346)**. β10 - β9 (p): 0; β11 - β10 (p): 0.08. Observations: 2100; R-squared: 0.17.

OLS regressions. * significant at 10%; ** significant at 5%; *** significant at 1% according to the Wild cluster bootstrap-t method. Robust standard errors in parentheses, clustered at the fieldworker level. The dependent variable in all columns is the rate of inconsistencies. Fixed effects for previous weeks are included, as are fieldworker fixed effects, sublocation of interview fixed effects, respondent controls, the time trend(s), and the number of surveys per day, except where a column indicates that a control is excluded. In column (1), the number of surveys per day is excluded. In column (2), the respondents' controls (relationship to household head) are excluded. In column (3), the sublocation fixed effects are excluded. In column (4), the reference category is the 5th week, when the wage was also set at 150. In column (5), the initial two weeks of training are included. In column (6), the time trend squared is excluded. In column (7), the sample is restricted to the first three surveys of each day.
Table 7: Robustness checks (Reference period: Week 6; 150/100 treatment)

Column (1). Sample: third and fourth surveys per day in weeks 1 to 6.
Fourth survey per day: 0.444 (0.284)
Ibragimov-Mueller (p-value): 0.23
Permutation test (p-value): 0.14

Column (2). Reference category: first three surveys per day in weeks 1 to 6.
Fourth survey per day in first six weeks: 0.424 (0.285)
First three surveys per day in week 11: 1.983 (0.526)***
Wild cluster bootstrap-t test of equality (p-value): .04
Ibragimov-Mueller (p-value): 0.01
Permutation test (p-value): 0.01

Both columns include fieldworker fixed effects, sublocation of interview fixed effects, respondent controls, a time trend and time trend squared, and the number of surveys per day.

OLS regressions. * significant at 10%; ** significant at 5%; *** significant at 1% according to the Wild cluster bootstrap-t method. Robust standard errors in parentheses, clustered at the fieldworker level. The dependent variable in both columns is the rate of inconsistencies. In column (1), the sample is restricted to the third and fourth surveys of the first six weeks. The variable of interest is "Fourth survey", a dichotomous variable equal to 1 if the survey is the fourth one of the day, 0 otherwise. All other control variables of Table 2 are included. For the Ibragimov-Mueller test, we compute the fieldworker-by-fieldworker parameter estimates of "Fourth survey" and report the p-value of a one-sample t-test. For the permutation test, we construct a permutation distribution of the 11 estimates for each fieldworker for the fourth survey compared to the 11 estimates for each fieldworker for the third survey (equal to zero), stratifying on the fieldworkers (to allow permutation of data only within fieldworkers). In column (2), the reference category is the first three surveys in weeks 1 to 6. We include "Fourth survey in first six weeks", a dichotomous variable equal to 1 if the survey is the fourth of the day in the first six weeks, 0 otherwise. We replace the "Week 11; 100/100 treatment" dichotomous variable by two dichotomous variables: "First three surveys in week 11" and "Fourth survey in week 11". All other variables from Table 2 are included. The Wild cluster bootstrap-t p-value indicates the significance of the difference between these two coefficients. For the Ibragimov-Mueller test, we compute the fieldworker-by-fieldworker parameter estimates of "Fourth survey in first six weeks" and "First three surveys in week 11" and report the p-value of a two-sample t-test. For the permutation test, we construct a permutation distribution of the 11 estimates for each fieldworker for "Fourth survey in first six weeks" compared to the 11 estimates for each fieldworker for "First three surveys in week 11", stratifying on the fieldworkers.