A Practitioner’s Guide to Randomization Inference∗

Serkan Aglasan, North Carolina State University
Raymond Guiteras†, North Carolina State University
Giordano Palloni, IFPRI

Preliminary and Incomplete
March 2017

Abstract: We describe tools for randomization inference in cluster-randomized trials, including the construction of confidence intervals, estimation with covariates, quantile regression and nonlinear estimators.

Keywords: randomization inference, cluster-randomized trials, quantile regression

∗ We are grateful to our collaborators on the previous paper that led to this work: Ariel Ben Yishay, Neil Buddy Shah, Stuart Shirrell, and Paul Wang. We received helpful comments from Thomas Braun, Ivan Canay, Doug Miller, Brian Quistorff, Dylan Small and Chris Udry. Thomas Braun and Dylan Small graciously shared Splus and R code from their papers. All code and data used in this paper are posted at http://go.ncsu.edu/rpguiter.ri.

† Corresponding author. Mail: Department of Agricultural and Resource Economics; North Carolina State University; Raleigh, NC 27695-8109 USA; email: [email protected]; tel: 919-515-4542.

Introduction

The use of randomization inference (RI) in economics is increasingly popular. However, its use has largely been limited to estimating p-values for raw differences in means between treatment and control groups.[1] In this paper, we provide an exposition of the basic RI procedure as used in economics papers to date, and then present a broader set of tools extending RI in additional directions: construction of confidence intervals,[2] adding covariates as controls, and estimators other than mean regression (quantile regression, nonlinear estimators, instrumental variables). Our exposition is based on Braun and Feng (2001), Rosenbaum (2002), Small et al. (2008), Rosenbaum (2010) and Imbens and Rubin (2015), to which we refer the interested reader for more detail.

The intuition for randomization inference dates back to Fisher (1935): the researcher specifies a sharp null hypothesis, imposes this null hypothesis on the data, generates the distribution of the test statistic of interest under this null over all or many combinations of treatment assignments, and then compares the actual, observed value of the test statistic to its generated distribution to estimate how extreme the observed value is when the null is true (Rosenbaum 2010; Imbens and Rubin 2015). This procedure provides p-values directly when the null hypothesis is one of no effect. To obtain confidence intervals of level α, we test a series of null hypotheses (in our application, effects of −0.40, −0.39, . . . , +0.80) and construct the confidence interval as all values that are not rejected at the α level.

For our application, we use a cluster-randomized trial of the effect of micro-loans on willingness-to-pay (WTP) for sanitation in rural Cambodia. The main experimental results are reported in Ben Yishay et al. (2017).

[1] For example, Bloom et al. (2006), Cohen and Dupas (2010), Bloom et al. (2013), and Gertler et al. (2014).

[2] Accurate confidence intervals are important for policymakers because policy decisions usually depend on the magnitude of an effect, not just whether an effect exists. However, the use of randomization inference to construct confidence intervals is rare in economics. Applications in statistics and political science include Ho and Imai (2006), Small et al. (2008), and Hansen and Bowers (2009). To our knowledge, the only economics papers that use randomization inference to construct confidence intervals are Barrios et al. (2012) and Quistorff (2015), both observational studies.


In Section 1, we summarize the setting and the experiment. We begin our exposition in Section 2, with the simplest case of randomization inference (RI) for mean effects. This simple case is useful for discussing the intuition for RI as well as describing the mechanics in some detail. In Section 3, we discuss how to add control variables when calculating mean effects. In Section 4, we describe RI for nonlinear estimators, using logit as a working example. In Section 5, we describe and implement RI for effects on quantiles of WTP. Finally, in Section 6, we describe the use of RI in the case of instrumental variables, in particular estimation of treatment effects with partial compliance. In each case, we compare our p-values and confidence intervals to those obtained from more standard cluster-robust variance matrices as well as bootstrap methods (Cameron and Miller 2015; Kline and Santos 2012; Hagemann 2016).

1 Background

1.1 Description of Experiment

The data in this paper were obtained from a cluster-randomized trial of microfinance for sanitation in rural Cambodia (Ben Yishay et al. 2017). The randomization was conducted at the village level, with 15 treatment villages (households could finance their latrine with a micro-loan) and 15 control villages (households were required to pay cash on delivery). The main outcome of interest was household willingness to pay (WTP). [To come.]

1.2 Summary Statistics

Table 1 presents baseline summary statistics and measures of balance for the sample. Column 1 presents means for the entire sample, with standard deviations shown below in parentheses. Columns 2 and 3 present the means for Non-financing and Financing households, again with standard deviations shown in parentheses below each mean. Column 4 presents the difference in means between the Non-financing and Financing groups, with standard errors presented in brackets below each difference. Finally, Column 5 presents the normalized difference $(\bar{X}_1 - \bar{X}_0)/\sqrt{s_0^2 + s_1^2}$ between the two means (Imbens and Wooldridge 2009).

81% of respondents are female, and nearly 50% live in a household with a child five years old or younger (25% live in a household with a child two years old or younger). By design, just under 30% of households are IDPoor, and mean household monthly income is just over $120. Many households had been exposed to microfinance prior to the study: 41% had taken out a loan from a formal source in the previous year. Open defecation is extremely common among sample households: 70% of all individuals and 90% of children under the age of five primarily defecated in the open over the fifteen days preceding the survey. Despite this, households are clearly familiar with sanitation options: nearly 95% had previously considered purchasing a latrine.

While the Financing and Non-financing groups appear generally well balanced in terms of baseline characteristics, there are a few significant differences. Episodes of diarrhea over the week preceding the survey are significantly lower in the Financing group relative to the control group, both among all individuals and among children age five and younger. The BDM latrine offer price is, on average, $2 higher for Financing households than for Non-financing households. To the extent that these baseline characteristics are predictive of successful latrine purchase through the BDM procedure, we would expect them to bias our results against finding any significant impact of Financing; that is, higher latrine offer prices should decrease latrine purchases in the Financing group relative to the control group. Similarly, the lower frequency of diarrhoeal episodes in Financing households may suggest they have smaller expected health returns to improved sanitation. Section 3 explores whether our main results are sensitive to these baseline differences.
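For reference, a minimal sketch of the normalized-difference calculation used in Column 5 is below. The function name and the use of NumPy are our own illustrative choices, not part of the paper's posted code.

```python
# Normalized difference between two samples, as reported in Column 5 of Table 1
# (Imbens and Wooldridge 2009). Illustrative helper, not the paper's code.
import numpy as np

def normalized_difference(x1, x0):
    """(mean(x1) - mean(x0)) / sqrt(s1^2 + s0^2), using sample variances."""
    return (np.mean(x1) - np.mean(x0)) / np.sqrt(np.var(x1, ddof=1) + np.var(x0, ddof=1))
```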


2 Randomization Inference for Mean Differences

For simplicity, we will refer to Financing villages as treatment villages ($T_v = 1$) and Non-financing villages ($T_v = 0$) as controls. Let $y_{iv}$ be the WTP of household $i$ in village $v$. The estimated treatment effect $\hat{\beta} = \bar{y}_1 - \bar{y}_0$ is the difference in mean WTP between households in treatment villages ($\bar{y}_1$) and households in control villages ($\bar{y}_0$). We weight households equally, but nothing substantial changes if we weight villages equally. For inference, we turn to a potential outcomes framework. Let the potential outcome when treated be $y_{iv1}$ and the potential outcome when not treated be $y_{iv0}$, so the individual-specific treatment effect is $\beta_{iv} = y_{iv1} - y_{iv0}$. There was no stratification in our randomization; see Bugni et al. (2016) for a discussion of the complications that can arise with stratification.

2.1 Testing

We first consider testing the null hypothesis of no treatment effect, i.e., $H_0: \beta_{iv} = 0$ for all $i$ and $v$.[3] By specifying a "sharp" null hypothesis, we can write down potential outcomes in both the observed and counterfactual states under the null hypothesis. To illustrate, consider Table 2. The first three columns identify the village, household and village-level treatment status. The fourth column lists the observed outcome variable, $y_{iv}$, for each household. The fifth and sixth columns list potential outcomes for each household in the treated state, $y_{iv1}$, and untreated state, $y_{iv0}$, respectively. For households in treatment villages ($T_v = 1$), the potential outcome corresponding to the treated state is equal to the observed outcome, while the potential outcome corresponding to the untreated state is unknown. That is, for all households in villages with $T_v = 1$, $y_{iv1} = y_{iv}$, while $y_{iv0}$ is unobserved. For households in control villages ($T_v = 0$), the situation is the reverse: $y_{iv0} = y_{iv}$, while $y_{iv1}$ is unobserved.

Under the sharp null hypothesis of no treatment effect, we can fill in the unobserved values in Table 2: if the null hypothesis is true, then the potential outcome in the unobserved state is simply equal to the observed outcome, since under the null the treatment had no effect on anyone. In Table 3, we impose this null hypothesis, so $y_{iv} = y_{iv1} = y_{iv0}$ for all households.

Next, we implement $R = 100{,}000$ repetitions of the following placebo experiment.[4] In each repetition $r$, we generate a random treatment assignment vector $T^r$, in which we randomly assign 15 villages to $T^r_v = 1$ and 15 villages to $T^r_v = 0$, as in Table 4. We then compute the observed treatment effect for the repetition as $\hat{\beta}^r = \bar{y}^r_1 - \bar{y}^r_0$, the difference in mean WTP across the randomly assigned placebo treatment and control groups. We then collect the simulated test statistics $\{\hat{\beta}^r\} = \{\hat{\beta}^1, \ldots, \hat{\beta}^R\}$ over all repetitions. This simulates the distribution of estimates we would expect to see if the null hypothesis were true, since it is true in our simulation. We then assess the plausibility of the null hypothesis by observing where the actual observed value $\hat{\beta}$ falls in this distribution, i.e., the share of repetitions $r$ with $\hat{\beta}^r > \hat{\beta}$. This gives us an empirical p-value for the null $H_0: \beta_0 = 0$. Figure 1 shows the distribution of $\{\hat{\beta}^r\}$ from our data, as well as the observed $\hat{\beta} = 21.9$. Clearly, the observed $\hat{\beta}$ is highly unlikely under the null hypothesis of no treatment effect; in fact, none of the 100,000 placebo experiments produced a $\hat{\beta}^r$ as large as 21.9.

[3] Note that this is stronger than the standard hypothesis of no average treatment effect, i.e., $E[\beta_{iv}] = 0$. Randomization inference provides exact p-values in finite samples for the stronger null, and provides consistent p-values for the weaker null of no average treatment effect (Lehmann and Romano 2005; Chung and Romano 2013).

[4] Computing true exact p-values would require enumerating all $\binom{30}{15} \approx 1.551 \times 10^8$ possible assignment vectors. With $R$ random draws, the standard error of the estimated p-value, $se(\hat{p})$, is less than or equal to $1/(2\sqrt{R})$ (Imbens and Rubin 2015). With $R = 100{,}000$, $se(\hat{p}) \leq 0.0016$, which is tolerable.
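To make the mechanics concrete, the following is a minimal sketch of this placebo-experiment procedure in Python, using simulated household-level data in place of the Cambodia WTP data. The sample sizes, outcome values, number of repetitions, and all variable names are illustrative assumptions; this is not the paper's posted Stata/Mata code.

```python
# Placebo-experiment test of the sharp null of no effect, as in Section 2.1.
import numpy as np

rng = np.random.default_rng(0)

# Illustrative cluster-randomized data: 30 villages of 50 households each.
n_villages, hh_per_village = 30, 50
village = np.repeat(np.arange(n_villages), hh_per_village)   # village id for each household
T_village = np.zeros(n_villages, dtype=int)
T_village[:15] = 1                                           # 15 treatment, 15 control villages
T = T_village[village]                                       # household-level treatment indicator
y = 20 + 10 * T + rng.normal(0, 15, size=T.size)             # illustrative WTP outcomes

def diff_in_means(outcome, treat):
    """Difference in mean outcomes, treated minus control (households weighted equally)."""
    return outcome[treat == 1].mean() - outcome[treat == 0].mean()

beta_hat = diff_in_means(y, T)

# Placebo experiments: re-randomize the village-level assignment R times and
# recompute the statistic; under the sharp null of no effect, y is unchanged.
R = 10_000
beta_r = np.empty(R)
for r in range(R):
    placebo_village = rng.permutation(T_village)      # placebo village-level assignment
    beta_r[r] = diff_in_means(y, placebo_village[village])

# Empirical p-value: share of placebo statistics at least as large as the
# observed one (the one-sided comparison reported in Figure 1).
p_value = np.mean(beta_r >= beta_hat)
print(f"observed diff = {beta_hat:.2f}, RI p-value = {p_value:.4f}")
```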

2.2 Confidence Intervals

To construct confidence intervals, we test the observed treatment effect against a set of sharp null hypotheses $\{H_0^m: \beta_0 = m\}$. A $1-\alpha$ confidence interval is the set of hypotheses that are not rejected at the $\alpha$ level, that is, the set of hypotheses that produce p-values greater than $\alpha$.

For each value $m$ of the null hypothesis, we can generate potential outcomes in the treated and untreated state under the null $\beta_0 = m$; that is, we can again fill in the missing data in Table 2 for the counterfactual state. Consider first potential outcomes when treated, $y_{iv1}$. In treatment villages, these are equal to the observed outcome, i.e., $y_{iv1} = y_{iv}$ if $T_v = 1$. In control villages, these are equal to the observed outcome plus the value of the null, i.e., $y_{iv1} = y_{iv} + m$ if $T_v = 0$. Potential outcomes when not treated, $y_{iv0}$, are the opposite: in treatment villages, the potential outcome is the observed outcome minus the value of the null ($y_{iv0} = y_{iv} - m$ if $T_v = 1$); in control villages, the potential outcome is the observed outcome ($y_{iv0} = y_{iv}$ if $T_v = 0$). Table 5 illustrates for the null hypothesis $\beta_0 = +10$.

Having imposed the null hypothesis $\beta_0 = m$ by filling in outcomes in the counterfactual state, we obtain a p-value for this null hypothesis, $p(m)$, using the same procedure as in Section 2.1 above. Again, we implement $R$ placebo experiments. In each repetition $r$, we randomly assign placebo treatments and calculate the value of the test statistic, here the simple difference in means $\hat{\beta}^r = \bar{y}^r_1 - \bar{y}^r_0$, where the "observed" outcome for each unit, $y^r_{iv}$, corresponds to the potential outcome for the placebo, rather than the actual, assignment. (See Table 6 for an illustration.) We collect these simulated test statistics $\{\hat{\beta}^r\}$ and compare our observed test statistic $\hat{\beta}$ to this distribution. Again, the intuition is that we are assessing how surprising or unusual our observed test statistic $\hat{\beta}$ would be if the null hypothesis were true, since by construction it is true for the simulated $\{\hat{\beta}^r\}$.

Figure 2 shows the distribution of $\{\hat{\beta}^r\}$ under the null hypothesis $\beta_0 = +10$ and shows the place of the observed $\hat{\beta}$ in that distribution. The p-value, i.e., the share of $\hat{\beta}^r$ greater than $\hat{\beta}$, is 0.010.

Finally, having tested a set of null hypotheses $\{H_0^m: \beta_0 = m\}$ as above and obtained the corresponding p-values $\{p(m)\}$, we construct a $1-\alpha$ confidence interval by retaining those null hypotheses that are not rejected at the $\alpha$ level. That is, the confidence interval is $\{m: p(m) > \alpha\}$. Figure 3 illustrates using our data. The figure plots p-values for each null $m = 0, 0.1, \ldots, 39.9, 40$. The bounds of the 95% confidence interval (15.3, 28.6) are found at the intersection of this curve with the horizontal line at $\alpha = 0.05$.
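A sketch of this test-inversion construction is below, building on the variables defined in the previous sketch (y, T, T_village, village, diff_in_means, rng). The grid of nulls, the repetition count, and the use of a two-sided p-value (the form stated explicitly in Appendix A) are illustrative choices, not the paper's exact settings.

```python
# Test inversion for a confidence interval, as in Section 2.2: impose the sharp
# null beta = m by filling in both potential outcomes, rerun the placebo
# experiments, and keep every null that is not rejected at the 5% level.
def ri_pvalue(m, R=1_000):
    y1 = np.where(T == 1, y, y + m)            # potential outcome if treated, under the null
    y0 = np.where(T == 1, y - m, y)            # potential outcome if untreated, under the null
    beta_obs = diff_in_means(y, T)
    beta_r = np.empty(R)
    for r in range(R):
        placebo = rng.permutation(T_village)[village]
        y_r = np.where(placebo == 1, y1, y0)   # outcome "observed" under the placebo draw
        beta_r[r] = diff_in_means(y_r, placebo)
    return np.mean(np.abs(beta_r - m) >= np.abs(beta_obs - m))   # two-sided p-value

grid = np.arange(0.0, 20.01, 0.25)             # candidate values of the sharp null
pvals = np.array([ri_pvalue(m) for m in grid])
kept = grid[pvals > 0.05]                      # 95% CI: nulls not rejected at the 5% level
print(f"95% RI confidence interval: [{kept.min():.2f}, {kept.max():.2f}]")
```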


2.3 Computation: Adjusted Response

In the previous two subsections, we have explicitly written down both vectors of potential outcomes under the null hypothesis and created our simulated results using these two vectors, $y_1$ and $y_0$. This is useful for transparency and for developing intuition. However, in this subsection, we discuss a slightly different procedure that, while somewhat less intuitive, is computationally more convenient, generalizes more readily to, e.g., including covariates or quantile estimation, and in fact produces identical results.

First, we introduce some notation. Let $y$ be the observed outcome data and $T$ be the vector of actual treatment assignments. The null hypothesis is $H_0: \beta = \beta_0$. For subject $i$ in village $v$, define the adjusted response as $A^{\beta_0}_{iv} = y_{iv} - \beta_0 T_v$, i.e., the potential outcome in the untreated state under the null. Note that this is equal to $y_{iv0}$ in all the examples above, but the notation makes explicit the dependence on the null hypothesis $\beta_0$. Let $A^{\beta_0} = y - \beta_0 T$ be the vector of adjusted responses for the study population, i.e., $A^{\beta_0} = \left(A^{\beta_0}_{iv}\right)$. If $\beta_0 = 0$ (i.e., testing the null of no effect), then $A^{\beta_0} = y$.

Let $t(A^{\beta_0}, Z)$ be our test statistic, which is a function of the adjusted responses $A^{\beta_0}$ and the assignment vector $Z$, which may be the actual treatment assignments $T$ or a placebo draw $T^r$. In our simple case this is the difference in means between treated and control. Let $\hat{t}_{\beta_0} = t(A^{\beta_0}, T)$ be the observed value of the test statistic using the actual assignment $T$. Note that we are using only the adjusted responses, i.e., potential outcomes in the untreated state. If the null hypothesis is correct, then $\hat{t}_{\beta_0}$ should be small. In other words, if the null hypothesis is true, then there should be no systematic difference in adjusted responses between units that were actually treated and the controls; by adjusting the data we have pulled out any effect of treatment. Randomization inference allows us to assess how "small" an observed $\hat{t}_{\beta_0}$ is by comparing it to its randomization distribution.

Operationally, we generate $R$ placebo assignments $\{T^r\}$ and for each placebo assignment $T^r$ we compute the value of the test statistic $\hat{t}^r_{\beta_0} = t(A^{\beta_0}, T^r)$. This yields the randomization distribution $\{\hat{t}^r_{\beta_0}\}$, and we compare the observed $\hat{t}_{\beta_0}$ to this distribution to obtain a p-value for $\beta_0$. Since by construction there is no systematic relationship between adjusted response and (placebo) treatment in the simulated experiments, the place of the observed $\hat{t}_{\beta_0}$ in $\{\hat{t}^r_{\beta_0}\}$ tells us how surprising $\hat{t}_{\beta_0}$ would be if the null hypothesis were true. As above, we construct $1-\alpha$ confidence intervals by testing a series of null hypotheses and retaining those with p-values greater than $\alpha$.

Finally, the value of the null hypothesis that brings the test statistic closest to zero is called the Hodges-Lehmann point estimate, which we denote $\hat{\beta}_{HL} = \arg\min_\beta \left|t(A^{\beta}, T)\right|$. In the case of differences in means, as well as other method-of-moments-type estimators, the Hodges-Lehmann point estimate will typically be equal to the usual estimator (here, $\hat{\beta}$), up to numerical precision. While this is not necessarily the case for other classes of estimators (e.g., quantiles as in Section 5 below), we have found, at least in our application, the differences to be small.
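The sketch below illustrates the adjusted-response formulation and the Hodges-Lehmann point estimate, again reusing the simulated variables from the first sketch; the grid of null values is an illustrative assumption.

```python
# Adjusted responses and the Hodges-Lehmann point estimate, as in Section 2.3.
def adjusted_response(beta0):
    """A^{beta0} = y - beta0 * T: potential outcome in the untreated state under the null."""
    return y - beta0 * T

def t_stat(A, assignment):
    """Test statistic t(A, Z): difference in mean adjusted responses, treated minus control."""
    return A[assignment == 1].mean() - A[assignment == 0].mean()

grid = np.arange(0.0, 20.001, 0.05)
t_obs = np.array([t_stat(adjusted_response(b0), T) for b0 in grid])
beta_hl = grid[np.argmin(np.abs(t_obs))]   # null value that brings the statistic closest to zero
print(f"Hodges-Lehmann estimate: {beta_hl:.2f}; "
      f"simple difference in means: {diff_in_means(y, T):.2f}")
```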

2.4 Other Computational Issues

In this subsection, we collect several points that can affect precision and computation time.

Choice of test statistic: The choice of test statistic can be important. In the current example, a number of test statistics are available, including the difference in means, the difference in mean ranks, and the studentized difference in means (i.e., the t-statistic). In our current context, the difference in means is the most intuitive test statistic, especially given its link to regression. Imbens and Rubin (2015) provide simulations showing that the difference in mean ranks between treatment and control is more robust to outliers (see especially their Chapter 5.6). Finally, Chung and Romano (2013) emphasize that studentizing the test statistic (e.g., using the t-statistic rather than the raw difference in means) is useful when testing non-sharp hypotheses, i.e., average treatment effects rather than uniform treatment effects. In our case, we found results to be very similar whether we used the simple difference in means, the difference in mean ranks, or the studentized difference in means. For simplicity, our exposition and the results presented here primarily use differences in means, but our posted code includes analysis using the other test statistics.

Reducing computation time: Randomization inference can be computationally demanding, and a few tricks are available that can reduce computation time. First, matrix operations are typically faster than loops, so thinking of ways to "vectorize" code can increase speed dramatically. For example, we cut our computation time by more than a factor of 10 by computing a large set of regressions using a few matrix operations in Mata rather than looping over Stata commands. Second, when constructing confidence intervals it is typically not necessary to test every null hypothesis with maximal precision. For example, suppose that the upper and lower bounds of the confidence interval are known to be somewhere in the interval [0, 40]. It is not necessary to conduct 100,000 repetitions to obtain a precisely estimated p-value at every $H_0: \beta_0 \in \{0, 0.1, \ldots, 39.9, 40\}$. Instead, the researcher could first test on a rougher grid, e.g., $0, 1, \ldots, 39, 40$, possibly also using fewer repetitions at each point, and then examine a finer grid with the full 100,000 repetitions only in a neighborhood of points with p-values close to the critical value for the $1-\alpha$ confidence interval.[5] Similarly, the researcher might use results from cluster-robust regression to obtain an initial neighborhood over which to search. For example, if cluster-robust regression gives an upper bound of $UB_{CRVE} = \hat{\beta} + 1.96\,\hat{s}_{CRVE}$, then the researcher might initially search in the neighborhood $UB_{CRVE} \pm \hat{s}_{CRVE}$, and similarly for the lower bound. Finally, in some cases it is not necessary to explicitly recompute the test statistic at every possible value of the null hypothesis. For example, holding the treatment assignment fixed, the difference in means is a linear function of the value of the null hypothesis, so it is only necessary to compute the test statistic twice for each of the $R$ random assignments, e.g., at $\beta_0 = 0$ and $\beta_0 = 1$, and then the value of the test statistic can be extrapolated immediately for any other null hypothesis. A sketch of these two shortcuts appears below. In our code, because computation times are reasonable (usually under 2 hours on a standard laptop), we have tended to use brute force, but in other contexts the ability to speed up computation may be more important.

[5] A more sophisticated approach would be to use a root-finding algorithm, e.g., bisection.
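The sketch below illustrates the two shortcuts just described, under the same simulated data as the earlier sketches: computing all placebo statistics with a single matrix product, and extrapolating the difference-in-means statistic linearly in the null value from evaluations at two nulls. The matrix layout and repetition count are illustrative, and the code is Python rather than the Mata used in the paper.

```python
# Vectorization: stack R placebo household-level assignments into an R x n
# matrix and compute every placebo difference in means with one matrix product.
R = 2_000
placebo_matrix = np.stack([rng.permutation(T_village)[village] for _ in range(R)])

def all_placebo_stats(A, P):
    """Difference in means of A (length n) for every placebo assignment (rows of P)."""
    n_treat = P.sum(axis=1)
    n_ctrl = P.shape[1] - n_treat
    return (P @ A) / n_treat - ((1 - P) @ A) / n_ctrl

# Linearity: for a fixed placebo assignment, the difference-in-means statistic is
# linear in the hypothesized null beta0, so evaluating it at beta0 = 0 and
# beta0 = 1 pins down its value at any other null without recomputation.
t_at_0 = all_placebo_stats(y - 0.0 * T, placebo_matrix)
t_at_1 = all_placebo_stats(y - 1.0 * T, placebo_matrix)
slope = t_at_1 - t_at_0

def placebo_stats(beta0):
    """Extrapolated placebo statistics for any null beta0."""
    return t_at_0 + beta0 * slope
```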

3 Randomization Inference with Covariates

In this section, we discuss adding covariates, again in the simple context of testing a uniform effect using the difference in mean WTP across treatments. Our discussion is based on Rosenbaum (2002) and Small et al. (2008). The intuition is familiar from the Frisch–Waugh theorem: first, we partial out the association of the outcome variable with the covariates of interest; then, we use the residuals from this partialling out in essentially the same procedure as in the previous section.

Specifically, suppose our null hypothesis is $H_0: \beta = \beta_0$, our adjusted responses are $A^{\beta_0} = y - \beta_0 T$, and we wish to control for a set of covariates $X$. Let $P_X = X(X'X)^{-1}X'$ and $M_X = I - P_X$ be the usual least-squares projection and residualizer matrices, and let $e = M_X A^{\beta_0}$ be the residuals from the least-squares projection of the adjusted responses $A^{\beta_0}$ on the covariates $X$.[6] We now proceed exactly as in Section 2.3 but using $e$ as our data instead of $A^{\beta_0}$. That is, we obtain the observed value of the test statistic $\hat{t}_{\beta_0} = t(e, T)$ and compare this to the randomization distribution $\{\hat{t}^r_{\beta_0}\} = \{t(e, T^r)\}$. This provides a p-value for $H_0: \beta = \beta_0$, and we can obtain confidence intervals and the Hodges-Lehmann point estimate through the same inversion method as in Section 2.3.

If the test statistic of interest is computed via regression (i.e., the difference in means or the t-statistic testing the equality of means), then maintaining an exact analogy with the Frisch–Waugh theorem requires partialling out the association of the treatment with the covariates. That is, in addition to using the residuals $e = M_X A^{\beta_0}$ on the left-hand side of the regression, we use $\tilde{T} = M_X T$ on the right-hand side.[7] Of course, this is not sensible if the test statistic of interest is the difference in mean ranks, since $\tilde{T}$ is no longer a binary variable. Also, it is important to note that we compute $\tilde{T}$ only once, using the vector of actual treatment assignments $T$, and take each placebo draw $T^r$ from the randomization distribution of assignments; that is, we do not generate a placebo draw $T^r$ and then partial out $X$ from $T^r$.

[6] Rosenbaum (2002) and Small et al. (2008) discuss other means of obtaining residuals, such as a lowess smoother, m-estimates, GLM and projection pursuit. We use least squares for simplicity.

[7] This is the approach of Imbens and Rubin (2015); see especially their equation 5.12 and the preceding discussion.
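A sketch of the residualization step is below, reusing the simulated variables from the first sketch. The covariates are simulated, and the test statistic is the simple difference in mean residuals (the first case discussed above), not a regression-based statistic with a residualized treatment.

```python
# Covariate adjustment via residualization, as in Section 3: partial the
# covariates X out of the adjusted responses by least squares, then compare the
# treated/control difference in mean residuals to its randomization distribution.
X = np.column_stack([np.ones(y.size), rng.normal(size=(y.size, 2))])  # illustrative covariates

def residualize(A, X):
    """Residuals e = M_X A from the least-squares projection of A on X."""
    coef, *_ = np.linalg.lstsq(X, A, rcond=None)
    return A - X @ coef

def ri_pvalue_with_covariates(beta0, R=2_000):
    e = residualize(y - beta0 * T, X)     # residualized adjusted responses
    t_obs = diff_in_means(e, T)
    t_r = np.empty(R)
    for r in range(R):
        placebo = rng.permutation(T_village)[village]
        t_r[r] = diff_in_means(e, placebo)
    return np.mean(np.abs(t_r) >= np.abs(t_obs))   # two-sided p-value

print(f"p-value for H0: beta = 0, adjusting for X: {ri_pvalue_with_covariates(0.0):.4f}")
```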

4 Nonlinear Estimators

In this section, we describe how to adapt the RI methods described above to nonlinear estimators. In Ben Yishay et al. (2017), our interest was in the share of households purchasing at a given price. In that paper, we collapsed our data to village-level shares of households purchasing, and then proceeded with RI for mean differences as in Section 2 above. We provide details on these procedures in Appendix A below. However, to take advantage of household-level covariates, it is natural to use a binary choice estimator such as logit or probit. In our example, we use logit, but nothing substantial changes if the researcher wishes to use probit instead.

It is important to acknowledge that using a model for outcomes, such as logit in this case, rather than a pure difference in observed means, comes at a cost. The procedure we describe here is not pure randomization inference, which, strictly speaking, may use only the null hypothesis and the experimental randomization, without any assumption about the data-generating process. See Small et al. (2008) for further discussion.

The tools we use were developed in Braun and Feng (2001) for generalized linear mixed models (GLMM). We use their quasi-score statistic, which applies to linear exponential families (e.g., logit, probit, Poisson). Braun and Feng (2001) also provide a more general methodology that is not limited to the linear exponential family but is more computationally demanding. See their paper, especially Section 2.2.2, for details. [To come.]
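Pending that discussion, the sketch below illustrates randomization inference with a nonlinear estimator in a deliberately simplified form: it permutes the village-level assignment and uses the logit coefficient on treatment as the test statistic, rather than the quasi-score statistic of Braun and Feng (2001). The simulated binary outcome and the use of statsmodels are our own assumptions.

```python
# Permutation test with a logit-based test statistic (simplified illustration,
# not the Braun-Feng quasi-score statistic used in the paper).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_villages, hh = 30, 50
village = np.repeat(np.arange(n_villages), hh)
T_village = np.zeros(n_villages, dtype=int)
T_village[:15] = 1
T = T_village[village]
purchase = rng.binomial(1, 0.3 + 0.2 * T)                 # illustrative purchase indicator

def logit_treatment_coef(outcome, treat):
    """Coefficient on treatment from a logit of the outcome on a constant and treatment."""
    X = sm.add_constant(treat.astype(float))
    return sm.Logit(outcome, X).fit(disp=0).params[1]

t_obs = logit_treatment_coef(purchase, T)
R = 1_000
t_r = np.array([logit_treatment_coef(purchase, rng.permutation(T_village)[village])
                for _ in range(R)])
p_value = np.mean(np.abs(t_r) >= np.abs(t_obs))           # two-sided permutation p-value
print(f"logit coefficient = {t_obs:.3f}, RI p-value = {p_value:.3f}")
```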

5 Quantiles of WTP

In this section, we discuss randomization inference for estimates of the effect of finance on quantiles of WTP, following Small et al. (2008). First, the theory requires that the treatment be rank-preserving: for any two units $i$ and $j$, if $y_{i0} > y_{j0}$, then, on average, $y_{i1} > y_{j1}$. That is, quantile treatment effects can be increasing, constant, or even decreasing, but cannot be so sharply decreasing that quantile expectation functions cross.[8] We can assess this condition by examining quantile plots of WTP among treated (Financing) and untreated (Non-financing) subjects, as in Figure 4a, as well as by plotting the estimated quantile treatment effects for deciles $\tau = 0.1, 0.2, \ldots, 0.9$, as in Figure 4b. Quantile treatment effects appear to be increasing, although perhaps not monotonically.

Small et al. (2008) show that, for rank-preserving treatments, we can test the hypothesis $H_0: \beta(\tau) = \beta_0$ as follows. Let $A^{\beta_0} = y - \beta_0 T$ be the adjusted responses given the hypothesis $\beta(\tau) = \beta_0$. Let $\rho$ be the $\tau$-quantile of $A^{\beta_0}$, i.e., $\rho: \hat{F}_{A^{\beta_0}}(\rho) = \tau$, where $\hat{F}_{A^{\beta_0}}$ is the empirical CDF of the adjusted responses $A^{\beta_0}$. Let $q^{\beta_0}_v$ be the share of subjects in village $v$ with adjusted response greater than or equal to $\rho$. Given an assignment vector $Z$, let the statistic $H = \bar{q}^{\beta_0}(Z = 1) - \bar{q}^{\beta_0}(Z = 0)$ be the mean difference in shares $q^{\beta_0}_v$ between villages with $Z = 1$ and $Z = 0$.[9] We denote $\hat{H}$ as the observed value of the test statistic for the true assignment vector $T$, and $H^r$ for an arbitrary placebo assignment vector $T^r$.

Given this setup, randomization inference is conducted exactly as in the previous sections. To obtain a p-value for $H_0: \beta(\tau) = \beta_0$, we compare $\hat{H}$ to the randomization distribution $\{H^r\}$. Testing $H_0: \beta(\tau) = 0$, i.e., the hypothesis of no effect at the $\tau$ quantile, consists of obtaining a p-value with $\beta_0 = 0$. To construct a $1-\alpha$ confidence interval for the effect at the $\tau$ quantile, we test a series of non-zero nulls (in our application, $\beta_0 = -20, -19.9, \ldots, +59.9, +60$) and retain those with p-values greater than $\alpha/2$. Finally, we can obtain a Hodges-Lehmann point estimate $\hat{\beta}_{HL}(\tau)$ as the value of $\beta$ that produces the $H$-statistic closest to zero, since, when the null hypothesis is true, $H$ should be zero up to randomness. This minimizing value of $\beta$ may not be unique, in particular if the grid of $\beta$ values is very fine, in which case we define $\hat{\beta}_{HL}(\tau)$ as the midpoint of the set of minimizing values. Figure 5 shows how we obtain a 95% confidence interval for the $\tau = 0.6$ quantile.

Our quantile treatment effect estimates are plotted in Figure 6. The point estimates from quantile regression, plotted as the solid line, are positive at all deciles and are generally increasing. The long-dashed lines (labeled "RI") show the pointwise 95% confidence interval from randomization inference, as described above. For comparison, we also plot a 95% confidence interval using the wild gradient bootstrap (short-dashed lines, labeled "WGB") method of Hagemann (2016). Reassuringly, the results are generally similar between the two methods.

[8] Small et al. (2008) assume a "dilated effect," i.e., a nondecreasing quantile treatment effect, but this is stronger than necessary to preserve ranks. All that is required is that the quantile treatment effect is not so strongly decreasing that units at lower quantiles in the untreated state systematically overtake higher-quantile units when both are treated. (This is the "rank similarity" condition of Chernozhukov and Hansen (2005).) Note that the key condition of Small et al. (2008) on their page 276, $\mathrm{sign}\{R_{ski} - (1 - Z_{ski})\Delta\rho - \rho\} = \mathrm{sign}\{r_{Tski} - \rho\}$ in their notation, or $\mathrm{sign}\{y_{iv} - T_v \beta(\tau) - \rho\} = \mathrm{sign}\{y_{iv0} - \rho\}$ in the notation developed below (we abstract from stratification, hence the absence of the strata index $s$), holds even under the weaker assumption of rank preservation. See Appendix B for a proof.

[9] In our empirical analysis, we weight these village-level shares by the number of households, so that each household receives equal weight, but the results are very similar if we weight villages equally.
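A sketch of this quantile procedure is below, reusing the simulated variables from the first sketch. The choice of τ = 0.6, the grid of nulls, the repetition count, and the use of a two-sided p-value compared against α are illustrative simplifications of the construction described above; villages are equal-sized in the simulated data, so weighting by households and weighting villages equally coincide.

```python
# Randomization inference for a quantile effect, following the H-statistic
# construction of Section 5 (Small et al. 2008). Illustrative settings only.
def h_statistic(A, village_assignment, village, tau):
    """Treated/control difference in village shares of adjusted responses >= the tau-quantile."""
    rho = np.quantile(A, tau)                        # tau-quantile of the adjusted responses
    q = np.array([(A[village == v] >= rho).mean()    # share at or above rho in each village
                  for v in range(village_assignment.size)])
    return q[village_assignment == 1].mean() - q[village_assignment == 0].mean()

def ri_pvalue_quantile(beta0, tau, R=500):
    A = y - beta0 * T                                # adjusted responses under beta(tau) = beta0
    h_obs = h_statistic(A, T_village, village, tau)
    h_r = np.array([h_statistic(A, rng.permutation(T_village), village, tau)
                    for _ in range(R)])
    return np.mean(np.abs(h_r) >= np.abs(h_obs))     # two-sided p-value

grid = np.arange(0.0, 20.01, 0.5)
pvals = np.array([ri_pvalue_quantile(m, tau=0.6) for m in grid])
kept = grid[pvals > 0.05]
print(f"95% RI CI for the 0.6-quantile effect: [{kept.min():.1f}, {kept.max():.1f}]")
```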

6 Partial Compliance and Instrumental Variables

[To come.]

7 Discussion

[To come.]


References

Thomas Barrios, Rebecca Diamond, Guido W. Imbens, and Michal Kolesár. Clustering, Spatial Correlations, and Randomization Inference. Journal of the American Statistical Association, 107(498):578–591, June 2012. doi: 10.1080/01621459.2012.682524.

Ariel Ben Yishay, Andrew Fraker, Raymond Guiteras, Giordano Palloni, Neil Buddy Shah, Stuart Shirrell, and Paul Wang. Microcredit and willingness to pay for environmental quality: Evidence from a randomized-controlled trial of finance for sanitation in rural Cambodia. Journal of Environmental Economics and Management, 2017. doi: 10.1016/j.jeem.2016.11.004.

Erik Bloom, Indu Bhushan, David Clingingsmith, Rathavuth Hong, Elizabeth King, Michael Kremer, Benjamin Loevinsohn, and J. Brad Schwartz. Contracting for health: Evidence from Cambodia. 2006.

N. Bloom, B. Eifert, A. Mahajan, D. McKenzie, and J. Roberts. Does Management Matter? Evidence from India. The Quarterly Journal of Economics, 128(1):1–51, February 2013. doi: 10.1093/qje/qjs044.

Thomas M. Braun and Ziding Feng. Optimal Permutation Tests for the Analysis of Group Randomized Trials. Journal of the American Statistical Association, 96(456):1424–1432, December 2001. doi: 10.1198/016214501753382336.

Federico Bugni, Ivan Canay, and Azeem Shaikh. Inference under covariate-adaptive randomization. cemmap Working Paper CWP21/16, May 2016.

A. Colin Cameron and Douglas L. Miller. A Practitioner's Guide to Cluster-Robust Inference. Journal of Human Resources, 50(2):317–372, March 2015. doi: 10.3368/jhr.50.2.317.

Victor Chernozhukov and Christian Hansen. An IV Model of Quantile Treatment Effects. Econometrica, 73(1):245–261, 2005.

EunYi Chung and Joseph P. Romano. Exact and asymptotically robust permutation tests. The Annals of Statistics, 41(2):484–507, April 2013. doi: 10.1214/13-AOS1090.

Jessica Cohen and Pascaline Dupas. Free Distribution or Cost-Sharing? Evidence from a Randomized Malaria Prevention Experiment. Quarterly Journal of Economics, 125(1):1–45, February 2010. doi: 10.1162/qjec.2010.125.1.1.

Ronald A. Fisher. Design of Experiments. Hafner, New York, 1935.

Paul J. Gertler, James Heckman, Rodrigo Pinto, Arianna Zanolini, Christel Vermeersch, Susan Walker, Susan M. Chang, and Sally Grantham-McGregor. Labor market returns to an early childhood stimulation intervention in Jamaica. Science, 344(6187):998–1001, May 2014. doi: 10.1126/science.1251178.

Andreas Hagemann. Cluster-Robust Bootstrap Inference in Quantile Regression Models. Journal of the American Statistical Association, pages 1–30, February 2016. doi: 10.1080/01621459.2016.1148610.

Ben B. Hansen and Jake Bowers. Attributing Effects to a Cluster-Randomized Get-Out-the-Vote Campaign. Journal of the American Statistical Association, 104(487):873–885, September 2009. doi: 10.1198/jasa.2009.ap06589.

Daniel E. Ho and Kosuke Imai. Randomization Inference With Natural Experiments: An Analysis of Ballot Effects in the 2003 California Recall Election. Journal of the American Statistical Association, 101(475):888–900, September 2006. doi: 10.1198/016214505000001258.

Guido W. Imbens and Donald B. Rubin. Causal Inference in Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press, 2015.

Guido W. Imbens and Jeffrey M. Wooldridge. Recent Developments in the Econometrics of Program Evaluation. Journal of Economic Literature, 47(1):5–86, February 2009. doi: 10.1257/jel.47.1.5.

Patrick Kline and Andres Santos. A Score Based Approach to Wild Bootstrap Inference. Journal of Econometric Methods, 1(1):23–41, 2012. doi: 10.1515/2156-6674.1006.

E.L. Lehmann and Joseph P. Romano. Testing Statistical Hypotheses. Springer Texts in Statistics. Springer, New York, NY, 2005.

Brian Quistorff. Capitalitis? Effects of the 1960 Brazilian Capital Relocation. University of Maryland Working Paper, March 2015. doi: 10.2139/ssrn.2588620.

Paul R. Rosenbaum. Covariance Adjustment in Randomized Experiments and Observational Studies. Statistical Science, 17(3):286–327, August 2002. doi: 10.1214/ss/1042727942.

Paul R. Rosenbaum. Design of Observational Studies. Springer Series in Statistics. Springer, New York, NY, 2010.

Dylan S. Small, Thomas R. Ten Have, and Paul R. Rosenbaum. Randomization Inference in a Group-Randomized Trial of Treatments for Depression. Journal of the American Statistical Association, 103(481):271–279, March 2008. doi: 10.1198/016214507000000897.


Figure 1: Testing H0: β = 0

[Kernel density plot of the randomization distribution of the difference in means (Epanechnikov kernel, bandwidth 0.4681). Subtitle: "H0: β = 0; Observed diff. in means: 21.9; p-value: 0.000." Horizontal axis: difference in means; vertical axis: density.]

Notes: This figure illustrates the method of testing the null hypothesis of no effect using randomization inference. The graph is a kernel density plot of 100,000 simulated test statistics $\{\hat{\beta}^r\}$ as in Table 4. The vertical line marks the observed test statistic (difference in means) of 21.9. The RI p-value, 0.000, is the share of $\hat{\beta}^r$ greater than or equal to the observed $\hat{\beta}$.


Figure 2: Testing H0: β = 10

[Kernel density plot of the randomization distribution of the difference in means (Epanechnikov kernel, bandwidth 0.4681). Subtitle: "H0: β = 10; Observed diff. in means: 21.9; p-value: 0.010." Horizontal axis: difference in means; vertical axis: density.]

Notes: This figure illustrates the method of testing the hypothesis that β = 10 using randomization inference. The graph is a kernel density plot of 100,000 simulated test statistics $\{\hat{\beta}^r\}$ as in Table 6. The vertical line marks the observed test statistic (difference in means) of 21.9. The RI p-value, 0.010, is the share of $\hat{\beta}^r$ greater than or equal to the observed $\hat{\beta}$.


Figure 3: Confidence Interval from Randomization Inference, Effect of Financing on WTP

[Plot of RI p-values (vertical axis, 0.0 to 1.0) against the hypothesized effect β0 on WTP (USD, NPV; horizontal axis, 0 to 40), with a vertical line at the point estimate (βest) and a horizontal line at p = 0.05.]

Notes: This figure illustrates the method of constructing a randomization inference confidence interval for the effect of finance on willingness to pay (WTP). The vertical line indicates the point estimate, 21.9. The horizontal line at 0.05 indicates the cutoff p-value for rejecting the hypothesis β = β0 with 95% confidence. The dashed line plots, for each value of the sharp null hypothesis β0 = 0, 0.1, . . . , 39.9, 40, the randomization inference p-value for the test β = β0. The 95% confidence interval (15.3, 28.6) consists of the set of β0 that are not rejected, i.e., all β0 for which the dashed line is at or above the horizontal line.


Figure 4: Quantiles of WTP and Quantile Treatment Effects

(a) Quantiles of WTP by Treatment Arm
[Plot of WTP (USD, NPV) against quantiles 0.1 to 0.9, with separate series for the Financing and No financing arms.]

(b) Quantile Treatment Effects on WTP
[Plot of the effect on WTP (USD, NPV) against quantiles 0.1 to 0.9.]

Figure 5: Quantile regression confidence interval by randomization inference, τ = 0.6

[Plot of RI p-values (vertical axis, 0.0 to 1.0) against the hypothesized effect β0 on WTP (USD, NPV; horizontal axis, 0 to 60).]

Notes: This graph plots randomization inference (RI) p-values for a series of null hypotheses β0. The solid vertical line at 25.0 indicates the Hodges-Lehmann point estimate $\hat{\beta}_{HL}$. The thin vertical lines at 13.3 and 28.3 indicate the 95% confidence interval from randomization inference: β0 inside the 95% CI have RI p-values ≥ 0.050, and so are not rejected.


Figure 6: Impacts on Quantiles of WTP

[Plot of the effect on WTP (USD, NPV) by decile (0.1 to 0.9): the quantile regression point estimate (solid line) with pointwise 95% confidence intervals from randomization inference ("RI", long dashes) and the wild gradient bootstrap ("WGB", short dashes).]

Notes: This graph plots quantile treatment effects, by decile, with 95% confidence intervals from randomization inference (long dashes, "RI") and the wild gradient bootstrap (short dashes, "WGB"). Randomization inference is conducted using the procedure described in Section 5. Households are weighted equally; i.e., the village-level shares used in the H test statistic are weighted by the number of households.


Table 1: Summary Statistics and Balance

Variable | (1) All households | (2) Non-financing | (3) Financing | (4) Diff. | (5) Norm. diff.
Female respondent | 0.811 (0.391) | 0.806 (0.396) | 0.816 (0.388) | 0.010 [0.027] | 0.018
Household Size | 4.382 (1.781) | 4.294 (1.783) | 4.467 (1.776) | 0.173 [0.170] | 0.069
Number of women in household | 2.243 (1.139) | 2.178 (1.125) | 2.305 (1.149) | 0.126 [0.090] | 0.079
Any children under age five | 0.453 (0.498) | 0.473 (0.500) | 0.433 (0.496) | -0.040 [0.028] | -0.056
Any children under age two | 0.252 (0.434) | 0.253 (0.435) | 0.251 (0.434) | -0.002 [0.028] | -0.003
# members who earn income | 1.693 (1.165) | 1.781 (1.229) | 1.607 (1.093) | -0.175 [0.130] | -0.106
Total monthly household income (USD) | 122.815 (431.070) | 134.800 (560.843) | 111.029 (243.357) | -23.772 [37.409] | -0.039
Household owns livestock | 0.825 (0.380) | 0.826 (0.380) | 0.825 (0.380) | -0.001 [0.040] | -0.002
Household grows crops | 0.833 (0.373) | 0.865 (0.342) | 0.806 (0.396) | -0.058 [0.054] | -0.111
Any formal loan in past year | 0.414 (0.493) | 0.405 (0.491) | 0.423 (0.494) | 0.018 [0.051] | 0.026
Any informal loan in past year | 0.655 (0.475) | 0.643 (0.480) | 0.667 (0.472) | 0.024 [0.032] | 0.036
Any current formal savings | 0.017 (0.131) | 0.019 (0.137) | 0.015 (0.123) | -0.004 [0.009] | -0.021
Any current informal savings | 0.841 (0.366) | 0.795 (0.404) | 0.886 (0.318) | 0.091 [0.069] | 0.178
ID poor household | 0.275 (0.447) | 0.273 (0.446) | 0.277 (0.448) | 0.004 [0.031] | 0.006
Likelihood <2 USD a day | 25.922 (21.698) | 24.833 (21.373) | 26.970 (21.971) | 2.138 [1.806] | 0.070
Primarily defecate in the open | 0.703 (0.457) | 0.688 (0.464) | 0.718 (0.450) | 0.030 [0.074] | 0.046
# of diarrhoeal episodes | 0.232 (0.285) | 0.274 (0.296) | 0.201 (0.273) | -0.072*** [0.020] | -0.180
Children <=5 defecate in the open | 0.904 (0.295) | 0.894 (0.309) | 0.914 (0.280) | 0.021 [0.035] | 0.050
Children <=5 # of diarrhoeal episodes | 0.390 (0.465) | 0.472 (0.476) | 0.323 (0.446) | -0.149*** [0.039] | -0.229
Has considered latrine purchase | 0.946 (0.226) | 0.942 (0.234) | 0.950 (0.218) | 0.008 [0.014] | 0.025
Group sales meeting | 0.769 (0.422) | 0.787 (0.410) | 0.753 (0.431) | -0.034 [0.037] | -0.057
Latrine offer price (USD) | 35.764 (10.062) | 34.734 (11.012) | 36.752 (8.956) | 2.018*** [0.617] | 0.142

Notes: Table displays summary statistics for the whole sample (Column 1) and by treatment arm (Columns 2 and 3). Column 4 displays the difference between the means in the Non-financing and Financing arms, while Column 5 displays the normalized difference between the two means, $(\bar{X}_1 - \bar{X}_0)/\sqrt{s_0^2 + s_1^2}$. Standard deviations appear in parentheses; standard errors appear in brackets. *** p<=0.01, ** p<=0.05, * p<=0.10. Child age cutoffs are inclusive of the cutoff age. The number of individuals who contribute income and total household income include non-resident members. The likelihood that the household lives on less than $2 per day is defined using the 2011 Progress out of Poverty Index (PPI). For variables in the PPI that were not included in the baseline survey, we impute the mean value from the 2008 census in Cambodia for rural households in Kampong Thom province.


Table 2: Randomization Inference for No Effect on WTP – Actual Treatment Assignment, Observed Outcome and Potential Outcomes

Village | Household | Tv | yiv | yiv1 | yiv0
1   | 1-01  | 1 | 26 | 26 | ?
1   | 1-02  | 1 | 17 | 17 | ?
...
1   | 1-50  | 1 | 19 | 19 | ?
2   | 2-01  | 1 | 14 | 14 | ?
2   | 2-02  | 1 | 16 | 16 | ?
...
2   | 2-50  | 1 | 22 | 22 | ?
...
29  | 29-01 | 0 | 10 | ?  | 10
29  | 29-02 | 0 | 22 | ?  | 22
...
29  | 29-50 | 0 | 11 | ?  | 11
30  | 30-01 | 0 | 12 | ?  | 12
30  | 30-02 | 0 | 21 | ?  | 21
...
30  | 30-50 | 0 | 6  | ?  | 6

Notes: We only observe the potential outcome corresponding to the actual treatment status. The potential outcome corresponding to the other, counterfactual treatment status is unobserved and unknown.


Table 3: Randomization Inference – Null Hypothesis β0 = 0; Actual Treatment Assignment, Observed Outcome and Potential Outcomes

Village | Household | Tv | yiv | yiv1 | yiv0
1   | 1-01  | 1 | 26 | 26 | 26
1   | 1-02  | 1 | 17 | 17 | 17
...
1   | 1-50  | 1 | 19 | 19 | 19
2   | 2-01  | 1 | 14 | 14 | 14
2   | 2-02  | 1 | 16 | 16 | 16
...
2   | 2-50  | 1 | 22 | 22 | 22
...
29  | 29-01 | 0 | 10 | 10 | 10
29  | 29-02 | 0 | 22 | 22 | 22
...
29  | 29-50 | 0 | 11 | 11 | 11
30  | 30-01 | 0 | 12 | 12 | 12
30  | 30-02 | 0 | 21 | 21 | 21
...
30  | 30-50 | 0 | 6  | 6  | 6

Notes: Under the sharp null hypothesis of no treatment effect, we can specify the potential outcome for each unit in both the actual and counterfactual treatment state.


Table 4: Randomization Inference – Null Hypothesis β0 = 0; Placebo Experiment. Actual Treatment Assignment, Observed Outcome, Potential Outcomes and "Observed" Outcomes with Placebo Randomized Treatment

Village | Household | Tv | yiv | yiv1 | yiv0 | Tvr | yivr
1   | 1-01  | 1 | 26 | 26 | 26 | 1 | yiv1 = 26
1   | 1-02  | 1 | 17 | 17 | 17 | 1 | yiv1 = 17
...
1   | 1-50  | 1 | 19 | 19 | 19 | 1 | yiv1 = 19
2   | 2-01  | 1 | 14 | 14 | 14 | 0 | yiv0 = 14
2   | 2-02  | 1 | 16 | 16 | 16 | 0 | yiv0 = 16
...
2   | 2-50  | 1 | 22 | 22 | 22 | 0 | yiv0 = 22
...
29  | 29-01 | 0 | 10 | 10 | 10 | 1 | yiv1 = 10
29  | 29-02 | 0 | 22 | 22 | 22 | 1 | yiv1 = 22
...
29  | 29-50 | 0 | 11 | 11 | 11 | 1 | yiv1 = 11
30  | 30-01 | 0 | 12 | 12 | 12 | 0 | yiv0 = 12
30  | 30-02 | 0 | 21 | 21 | 21 | 0 | yiv0 = 21
...
30  | 30-50 | 0 | 6  | 6  | 6  | 0 | yiv0 = 6

Notes: In each repetition r, we generate a placebo random assignment and assign each unit the potential outcome corresponding to its placebo assignment. We then calculate the placebo treatment effect, $\hat{\beta}^r = \bar{y}^r_1 - \bar{y}^r_0$. This is repeated R times, simulating the distribution of estimates we would expect to see when the null hypothesis is true.


Table 5: Randomization Inference – Null Hypothesis β0 = +10; Actual Treatment Assignment, Observed Outcome and Potential Outcomes

Village | Household | Tv | yiv | yiv1 | yiv0
1   | 1-01  | 1 | 26 | 26 | 26 - 10 = 16
1   | 1-02  | 1 | 17 | 17 | 17 - 10 = 7
...
1   | 1-50  | 1 | 19 | 19 | 19 - 10 = 9
2   | 2-01  | 1 | 14 | 14 | 14 - 10 = 4
2   | 2-02  | 1 | 16 | 16 | 16 - 10 = 6
...
2   | 2-50  | 1 | 22 | 22 | 22 - 10 = 12
...
29  | 29-01 | 0 | 10 | 10 + 10 = 20 | 10
29  | 29-02 | 0 | 22 | 22 + 10 = 32 | 22
...
29  | 29-50 | 0 | 11 | 11 + 10 = 21 | 11
30  | 30-01 | 0 | 12 | 12 + 10 = 22 | 12
30  | 30-02 | 0 | 21 | 21 + 10 = 31 | 21
...
30  | 30-50 | 0 | 6  | 6 + 10 = 16  | 6

Notes: Under a sharp null hypothesis (here, β0 = +10), we can specify the potential outcome for each unit in both the actual and counterfactual treatment state.


Table 6: Randomization Inference – Null Hypothesis β0 = +10; Placebo Experiment. Actual Treatment Assignment, Observed Outcome, Potential Outcomes and "Observed" Outcomes with Placebo Randomized Treatment

Village | Household | Tv | yiv | yiv1 | yiv0 | Tvr | yivr
1   | 1-01  | 1 | 26 | 26 | 26 - 10 = 16 | 1 | yiv1 = 26
1   | 1-02  | 1 | 17 | 17 | 17 - 10 = 7  | 1 | yiv1 = 17
...
1   | 1-50  | 1 | 19 | 19 | 19 - 10 = 9  | 1 | yiv1 = 19
2   | 2-01  | 1 | 14 | 14 | 14 - 10 = 4  | 0 | yiv0 = 4
2   | 2-02  | 1 | 16 | 16 | 16 - 10 = 6  | 0 | yiv0 = 6
...
2   | 2-50  | 1 | 22 | 22 | 22 - 10 = 12 | 0 | yiv0 = 12
...
29  | 29-01 | 0 | 10 | 10 + 10 = 20 | 10 | 1 | yiv1 = 20
29  | 29-02 | 0 | 22 | 22 + 10 = 32 | 22 | 1 | yiv1 = 32
...
29  | 29-50 | 0 | 11 | 11 + 10 = 21 | 11 | 1 | yiv1 = 21
30  | 30-01 | 0 | 12 | 12 + 10 = 22 | 12 | 0 | yiv0 = 12
30  | 30-02 | 0 | 21 | 21 + 10 = 31 | 21 | 0 | yiv0 = 21
...
30  | 30-50 | 0 | 6  | 6 + 10 = 16  | 6  | 0 | yiv0 = 6

Notes: In each repetition r, we generate a placebo random assignment and assign each unit the potential outcome corresponding to its placebo assignment. We then calculate the placebo treatment effect, $\hat{\beta}^r = \bar{y}^r_1 - \bar{y}^r_0$. This is repeated R times, simulating the distribution of estimates we would expect to see when the null hypothesis is true.


A Differences in Shares

In this Appendix, we describe how we adapt the RI methods described above to produce confidence intervals for the inverse demand curves that comprise the main results of Ben Yishay et al. (2017). We begin by aggregating our WTP data to the village level, creating an outcome variable $y_v(p)$ that reflects the share of households in village $v$ who would purchase the latrine at price $p$. Using this village-price level dataset, we implement the following procedure.

Given a price $p$, we want to compute a $1-\alpha$ confidence interval for the treatment effect at that price. Let

$$\hat{\beta}(p) = \bar{y}_1 - \bar{y}_0 = \sum_{v: T_v = 1} w_v y_v(p) - \sum_{v: T_v = 0} w_v y_v(p) \qquad (1)$$

be the observed treatment effect, where $\bar{y}_1$ represents the mean share of households in treatment villages purchasing at the given price, and $\bar{y}_0$ is the share in control villages. (The weights $w_v$ indicate that we are weighting each village by its number of households.) We test this observed treatment effect against a set of sharp null hypotheses $\{H_0^m: \beta_0 = m\}$, $m = -0.40, -0.39, \ldots, +0.79, +0.80$. To obtain confidence intervals, we impose these sharp nulls and use the distribution of placebo treatment effects under these nulls, as follows.

For each value $m$ of the null hypothesis, we generate potential outcomes in the treated and untreated state under the null $\beta_0 = m$. Potential outcomes when treated, $y_{v1}$, are equal to the observed share of households purchasing at the given price in treatment villages (i.e., $y_{v1} = s_v$ if $T_v = 1$). For control villages, they are equal to the observed share plus the value of the null ($y_{v1} = s_v + m$ if $T_v = 0$). Potential outcomes when not treated, $y_{v0}$, are the opposite: in treatment villages, the potential outcome is the observed share minus the value of the null ($y_{v0} = s_v - m$ if $T_v = 1$); in control villages, the potential outcome is the observed share ($y_{v0} = s_v$ if $T_v = 0$).[10] See Tables A1-A2 for an illustrative example.

Next, we implement 100,000 repetitions of a placebo experiment. In each repetition $r$, we generate a random treatment assignment vector $T^r$, in which we randomly assign 15 villages to $T^r_v = 1$ and 15 villages to $T^r_v = 0$. We then assign each village the potential outcome corresponding to its (placebo) random assignment. That is, if village $v$ is randomized to $T^r_v = 1$, it is assigned outcome $y^r_v = y_{v1}$, and if it is randomized to $T^r_v = 0$, it is assigned $y^r_v = y_{v0}$. We then compute the observed treatment effect for the repetition as $\hat{\beta}^r = \bar{y}^r_1 - \bar{y}^r_0$ (as in Equation 1 above), the difference in the mean shares of households purchasing across the randomly assigned placebo treatment and control groups. Table A3 continues the illustrative example of Tables A1-A2.

We then collect the placebo treatment effects $\{\hat{\beta}^r\} = \{\hat{\beta}^1, \ldots, \hat{\beta}^R\}$ over all repetitions. This simulates the distribution of estimates we would expect to see if the null hypothesis were true, since it is true in our simulation. We then assess the plausibility of this null hypothesis by observing where the actual observed value $\hat{\beta}$ falls in this distribution. For a two-sided test of the hypothesis $\beta_0 = m$, we calculate the share of repetitions $r$ with $|\hat{\beta}^r - \beta_0| > |\hat{\beta} - \beta_0|$. This gives us an empirical p-value for the null $H_0^m: \beta_0 = m$.

Now for each $H_0^m$ we have an associated p-value $p(m)$, i.e., $p(-0.40), p(-0.39), \ldots, p(+0.79), p(+0.80)$. The $1-\alpha$ confidence interval for $\hat{\beta}$ is the set of $m$ with p-values greater than $\alpha$. That is, the lower bound of the CI is $m_{LB} = \min\{m: p(m) > \alpha\}$ and the upper bound is $m_{UB} = \max\{m: p(m) > \alpha\}$; this is the set of values of $\beta$ that are not rejected at the $\alpha$ level. Figure A1 illustrates the method.

[10] We restrict all shares to the interval [0, 1].
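A sketch of this village-level shares procedure is below, with simulated shares and household counts in place of the BDM purchase data; the effect size, weights, grid, and repetition count are illustrative assumptions.

```python
# Randomization inference on village-level purchase shares, as in Appendix A.
import numpy as np

rng = np.random.default_rng(2)
n_villages = 30
T_village = np.zeros(n_villages, dtype=int)
T_village[:15] = 1
weights = rng.integers(40, 60, size=n_villages).astype(float)        # households per village
share = np.clip(0.2 + 0.3 * T_village + rng.normal(0, 0.1, n_villages), 0, 1)

def weighted_diff(s, z, w):
    """Household-weighted difference in mean shares, treated minus control villages."""
    return np.average(s[z == 1], weights=w[z == 1]) - np.average(s[z == 0], weights=w[z == 0])

def ri_pvalue_share(m, R=2_000):
    y1 = np.clip(np.where(T_village == 1, share, share + m), 0, 1)    # shares restricted to [0, 1]
    y0 = np.clip(np.where(T_village == 1, share - m, share), 0, 1)
    b_obs = weighted_diff(share, T_village, weights)
    b_r = np.empty(R)
    for r in range(R):
        z = rng.permutation(T_village)
        b_r[r] = weighted_diff(np.where(z == 1, y1, y0), z, weights)
    return np.mean(np.abs(b_r - m) >= np.abs(b_obs - m))              # two-sided p-value

grid = np.arange(-0.40, 0.801, 0.01)
pvals = np.array([ri_pvalue_share(m) for m in grid])
kept = grid[pvals > 0.05]
print(f"95% CI for effect on purchase share: ({kept.min():.2f}, {kept.max():.2f})")
```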

B Rank Preservation

Small et al. (2008, "STHR" hereafter) state that their method of randomization inference for quantile regression applies to "dilated treatment effects," i.e., treatment effects that are increasing across quantiles. In fact, the method applies more broadly, to any "rank-preserving" treatment. Treatment effects could be constant or even decreasing across quantiles, as long as they are not so strongly decreasing that low-ranked units systematically overtake higher-ranked units.

Formally, rank preservation, or rank similarity in Chernozhukov and Hansen (2005), is defined as

$$y_{i0} > y_{j0} \iff y_{i1} > y_{j1},$$

which is equivalent to $\mathrm{sign}(y_{i0} - y_{j0}) = \mathrm{sign}(y_{i1} - y_{j1})$. In words, ranks are the same in the untreated (LHS) and treated (RHS) states. Note that $y_{j0}$ here is written as $\rho$ in STHR.

Let $\beta_i = y_{i1} - y_{i0}$ and $\beta_j = y_{j1} - y_{j0}$. Let $A_i^{\beta_j} = y_i^{obs} - Z_i \beta_j$ be the adjusted response for unit $i$ under the hypothesis $\beta_j$. (Note that $\beta_j$ is $\Delta\rho$ in STHR, although there are some differences in sign because the STHR effect is a reduction.) We want to show that rank preservation implies

$$\mathrm{sign}\left\{\left(y_i^{obs} - Z_i \beta_j\right) - y_{j0}\right\} = \mathrm{sign}\left(y_{i0} - y_{j0}\right).$$

(This is equivalent to STHR's $\mathrm{sign}\{R_{ski} - (1 - Z_{ski})\Delta\rho - \rho\} = \mathrm{sign}\{r_{Tski} - \rho\}$ on their page 276, again keeping in mind that some signs are flipped.)

If $Z_i = 0$, then $(y_i^{obs} - Z_i \beta_j) - y_{j0} = (y_{i0} - 0) - y_{j0} = y_{i0} - y_{j0}$, so the equality holds trivially.

If $Z_i = 1$, then $(y_i^{obs} - Z_i \beta_j) - y_{j0} = (y_{i1} - \beta_j) - y_{j0} = y_{i1} - (y_{j0} + \beta_j) = y_{i1} - y_{j1}$, and, by the assumption of rank preservation, $\mathrm{sign}(y_{i1} - y_{j1}) = \mathrm{sign}(y_{i0} - y_{j0})$, which completes the proof.


Figure A1: Confidence Interval from Randomization Inference, Effect of Financing on Share of Households Purchasing at USD 45

[Plot of RI p-values (vertical axis, 0.0 to 1.0) against the hypothesized effect β0 (horizontal axis, −0.5 to 1.0), with a vertical line at the point estimate (βest) and a horizontal line at p = 0.025.]

Notes: This figure illustrates the method of constructing a confidence interval via randomization inference. The vertical line indicates the point estimate, 0.440, for the price of 45. The horizontal line at 0.025 indicates the cutoff p-value for rejecting the hypothesis β = β0 with 95% confidence, i.e., p < 0.025 for a two-sided test. The dashed line plots, for each value of the sharp null hypothesis β0 = −0.40, −0.39, . . . , +0.79, +0.80, the randomization inference p-value for the test β = β0. The 95% confidence interval consists of the set of β0 that are not rejected, i.e., all β0 for which the dashed line is at or above the horizontal line. For this price, this is the set (0.34, 0.53).


Table A1: Randomization Inference – Actual Treatment Assignment, Observed Outcome and Potential Outcomes

Village | Tv | sv | yv1 | yv0
1   | 1 | 0.5 | 0.5 | ?
2   | 1 | 0.6 | 0.6 | ?
...
15  | 1 | 0.4 | 0.4 | ?
16  | 0 | 0.3 | ?   | 0.3
17  | 0 | 0.1 | ?   | 0.1
...
30  | 0 | 0.2 | ?   | 0.2

Notes: We only observe the potential outcome corresponding to the actual treatment status. The potential outcome corresponding to the other, counterfactual treatment status is unobserved and unknown.


Table A2: Randomization Inference – Hypothesis β0 = 0.2; Actual Treatment Assignment, Observed Outcome and Potential Outcomes

Village | Tv | sv | yv1 | yv0
1   | 1 | 0.5 | 0.5 | 0.5 - 0.2 = 0.3
2   | 1 | 0.6 | 0.6 | 0.6 - 0.2 = 0.4
...
15  | 1 | 0.4 | 0.4 | 0.4 - 0.2 = 0.2
16  | 0 | 0.3 | 0.3 + 0.2 = 0.5 | 0.3
17  | 0 | 0.1 | 0.1 + 0.2 = 0.3 | 0.1
...
30  | 0 | 0.2 | 0.2 + 0.2 = 0.4 | 0.2

Notes: Under a sharp null hypothesis (here, β0 = 0.2), we can specify the potential outcome for each unit in both the actual and counterfactual treatment state.


Table A3: Randomization Inference – Hypothesis β0 = 0.2; Placebo Randomized Treatment. Actual Treatment Assignment, Observed Outcome, Potential Outcomes, and Observed Outcomes with Placebo Randomized Treatment

Village | Tv | sv | yv1 | yv0 | Tvr | yvr
1   | 1 | 0.5 | 0.5 | 0.5 - 0.2 = 0.3 | 0 | 0.3
2   | 1 | 0.6 | 0.6 | 0.6 - 0.2 = 0.4 | 1 | 0.6
...
15  | 1 | 0.4 | 0.4 | 0.4 - 0.2 = 0.2 | 0 | 0.2
16  | 0 | 0.3 | 0.3 + 0.2 = 0.5 | 0.3 | 0 | 0.3
17  | 0 | 0.1 | 0.1 + 0.2 = 0.3 | 0.1 | 1 | 0.3
...
30  | 0 | 0.2 | 0.2 + 0.2 = 0.4 | 0.2 | 1 | 0.4

Notes: In each repetition r, we generate a placebo random assignment and assign each unit the potential outcome corresponding to its placebo assignment. We then calculate the placebo treatment effect, $\hat{\beta}^r = \bar{y}^r_1 - \bar{y}^r_0$. This is repeated R times, simulating the distribution of estimates we would expect to see when the null hypothesis is true.


A Practitioner's Guide to Randomization Inference - NCSU Go Links

in Section 2, with the simplest case of randomization inference (RI) for mean effects. This simple case is useful for ...... the plausibility of this null hypothesis by observing where the actual observed value. ˆ β falls in this distribution. For a two-sided test of the hypothesis β0 = m, we calculate the share of repetitions r with. ∣. ∣.

512KB Sizes 6 Downloads 167 Views

Recommend Documents

A Practitioner's Guide to Randomization Inference - NCSU Go Links
and 90% of children under the age of five primarily defecated in the open over the fifteen days preceding the survey. Despite this, households are clearly familiar with sanitation ..... to an early childhood stimulation intervention in Jamaica. Scien

Randomization Inference in the Regression ...
Download Date | 2/19/15 10:37 PM .... implying that the scores can be considered “as good as randomly assigned” in this .... Any test statistic may be used, including difference-in-means, the ...... software rdrobust developed by Calonico et al.

pdf-1312\mastering-attribution-in-finance-a-practitioners-guide-to ...
Retrying... Download. Connect more apps... Try one of the apps below to open or edit this item. pdf-1312\mastering-attribution-in-finance-a-practitione ... investment-returns-financial-times-series-by-andrew.pdf. pdf-1312\mastering-attribution-in-fin

pdf-1870\disciplined-agile-delivery-a-practitioners-guide-to-agile ...
... apps below to open or edit this item. pdf-1870\disciplined-agile-delivery-a-practitioners-guide-to-agile-software-delivery-in-the-enterprise-ibm-press.pdf.

pdf-1875\practitioners-guide-to-psychoactive-drugs-for-children-and ...
... the apps below to open or edit this item. pdf-1875\practitioners-guide-to-psychoactive-drugs-for-children-and-adolescents-sciences-300-from-springer.pdf.

man-117\the-practitioners-guide-to-investment-banking-mergers ...
... EBOOK PDF FROM OUR ONLINE LIBRARY. Page 3 of 6. man-117\the-practitioners-guide-to-investment-banking-mergers-acquisitions-corporate-finance.pdf.

pdf-1874\the-practitioners-guide-to-psychoactive-drugs-by ...
... the apps below to open or edit this item. pdf-1874\the-practitioners-guide-to-psychoactive-drug ... n-j-gelenberg-ellen-l-bassuk-stephen-c-schoonover.pdf.

pdf-175\practitioners-guide-to-empirically-supported-measures-of ...
... apps below to open or edit this item. pdf-175\practitioners-guide-to-empirically-supported-me ... lence-abct-clinical-assessment-series-by-george-f-r.pdf.

pdf-14108\a-cbt-practitioners-guide-to-act-how-to-bridge ...
... the apps below to open or edit this item. pdf-14108\a-cbt-practitioners-guide-to-act-how-to-bri ... al-therapy-acceptance-commitment-therapy-pb2008-f.pdf.

What a Way to Go!
life when many adults slip into neutral and glide toward the sunset, Abraham stayed engaged, remarrying after Sarah died, starting a new family, and gener- ously using his resources to encourage the next generation. Enthusiastic participation in life

What a Way to Go!
Aging Abraham kept active right up to the end of his journey. In a season of life when ... 137 years old when his dear wife passed and the light of his life went out.