Bounding Average Treatment Effects using Linear Programming

Lukáš Lafférs
Department of Economics, NHH Norwegian School of Economics

March 13, 2013

This paper is about identification. What can we learn from data and assumptions? What drives our results? Which assumptions are important, and which are not?

Here I present a general framework for studying bounds on average treatment effects, and I show how to relax various assumptions to gauge their identifying strength. I demonstrate the framework in an application: the effect of mother's schooling on child's schooling.

Setup and Notation

• Individual $i$: response function $y_i(\cdot): T \to Y$ (individuals do not interact)
• (Potential) treatment: $t \in T$ (treatments are mutually exclusive and exhaustive)
• (Potential) outcome: $y_i(t) \in Y$
• Observed treatment: $z_i \in T$
• Observed outcome: $y_i \equiv y_i(z_i) \in Y$
• Monotone instrument: $v_i \in V$
• Fundamental problem: $y_i(t)$ is not observed for $t \neq z_i$.
• The distribution of $(y_i, z_i, v_i)$ is observed.

Our Goal

Learn about the probability distribution of counterfactual outcomes, $P(y(t_1), y(t_2), \dots, y(t_m))$.

What are we interested in?
• $E[y(t)]$: average treatment response
• $E[y(t)] - E[y(s)]$: average treatment effect

Examples
• Effect of parental schooling on child's schooling
• Effectiveness of a labor participation program
• Effect of a medical intervention

Assumptions have to be made in order to learn anything about the properties of an unobserved counterfactual distribution. These assumptions may or may not be strong enough to point identify the quantity of interest. If only weak assumptions are made, the quantity of interest may be partially identified.

Examples (Manski)

Say we are interested in $E[y(t)]$. By the law of total expectation,
$$E[y(t)] = E[y \mid z = t]\,P(z = t) + E[y(t) \mid z \neq t]\,P(z \neq t).$$
The quantities $E[y \mid z = t]$, $P(z = t)$ and $P(z \neq t)$ are observed; $E[y(t) \mid z \neq t]$ is unobserved.

Example (exogenous selection)

If we assume that $E[y(t) \mid z = t] = E[y(t) \mid z \neq t]$, then
$$\begin{aligned}
E[y(t)] &= E[y \mid z = t]\,P(z = t) + E[y(t) \mid z \neq t]\,P(z \neq t) \\
&= E[y \mid z = t]\,P(z = t) + E[y \mid z = t]\,P(z \neq t) \\
&= E[y \mid z = t].
\end{aligned}$$

Under this assumption, $E[y(t)]$ is point identified.

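Example (bounded support)

Suppose that $y_{\min} \le y_i(t) \le y_{\max}$. Define
$$LB_{E[y(t)]} = E[y \mid z = t]\,P(z = t) + y_{\min}\,P(z \neq t), \qquad UB_{E[y(t)]} = E[y \mid z = t]\,P(z = t) + y_{\max}\,P(z \neq t).$$
Since $E[y(t)] = E[y \mid z = t]\,P(z = t) + E[y(t) \mid z \neq t]\,P(z \neq t)$ and $y_{\min} \le E[y(t) \mid z \neq t] \le y_{\max}$, it follows that
$$LB_{E[y(t)]} \le E[y(t)] \le UB_{E[y(t)]}.$$
Under this assumption, $E[y(t)]$ is partially identified, and the interval $[LB_{E[y(t)]}, UB_{E[y(t)]}]$ is called the identified set.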
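The bounded-support bounds take one line each to compute. Below is a minimal sketch, and the function name is illustrative. Take a binary outcome ($y_{\min} = 0$, $y_{\max} = 1$) and the application's rounded cell probabilities, which give $P(z = 1) \approx 0.191$ and $P(y = 1, z = 1) \approx 0.143$. Under those inputs, the identified set for $E[y(1)]$ is about $[0.143, 0.952]$.

```python
def worst_case_bounds(mean_y_given_t, p_t, y_min, y_max):
    """Identified set for E[y(t)] when only y_min <= y(t) <= y_max is assumed."""
    observed = mean_y_given_t * p_t  # E[y | z = t] P(z = t), identified from the data
    p_other = 1.0 - p_t              # P(z != t): y(t) is never observed for these units
    return observed + y_min * p_other, observed + y_max * p_other


# Usage with rounded application numbers: E[y | z = 1] = 0.143 / 0.191, P(z = 1) = 0.191
lb, ub = worst_case_bounds(0.143 / 0.191, 0.191, 0.0, 1.0)
print(f"E[y(1)] lies in [{lb:.3f}, {ub:.3f}]")
```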

Schooling example

Suppose we are interested in the effect of mother's education on child's education (de Haan 2012).
• Outcome, college degree of child $i$: response function $y_i(\cdot): \{0, 1\} \to \{0, 1\}$
• (Potential) treatment, mother's college: $t \in \{0, 1\}$
• (Potential) outcome, child's college: $y_i(t) \in \{0, 1\}$
• Observed treatment, observed mother's college: $z_i \in \{0, 1\}$
• Observed outcome, observed child's college: $y_i \equiv y_i(z_i) \in \{0, 1\}$
• Monotone instrument, father's schooling level: $v_i \in \{1, 2, 3, 4\}$
• Data: Wisconsin Longitudinal Study

Different assumptions

• Monotone Treatment Response (MTR) assumption
$$\forall i,\ t_2 \ge t_1: \quad y_i(t_2) \ge y_i(t_1)$$

• Monotone Treatment Selection (MTS) assumption
$$\forall t,\ z_2 \ge z_1: \quad E[y(t) \mid z = z_2] \ge E[y(t) \mid z = z_1]$$

• Conditional Monotone Treatment Selection (cMTS) assumption
$$\forall t,\ z_2 \ge z_1,\ \forall m: \quad E[y(t) \mid z = z_2, v = m] \ge E[y(t) \mid z = z_1, v = m]$$

• Monotone Instrumental Variable (MIV) assumption
$$\forall t,\ v_2 \ge v_1: \quad E[y(t) \mid v = v_2] \ge E[y(t) \mid v = v_1]$$

Analytical bounds on $E[y(t)]$ under MTR, MTS, MIV, MTR+MTS and MTR+cMTS+MIV are available. These then translate into bounds on $E[y(t)] - E[y(s)]$.
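For a binary treatment, the cases with the simplest closed forms can be sketched as below: no assumptions, MTR, MTS and MTR+MTS. The cMTS and MIV bounds are not reproduced, and the function name and argument layout are illustrative. The inputs are $E[y \mid z = 1] \approx 0.143/0.191$, $E[y \mid z = 0] \approx 0.311/0.809$ and $P(z = 1) \approx 0.191$, taken from the rounded cell probabilities of the application. With these inputs the sketch gives about $[-0.359, 0.641]$, $[0, 0.641]$, $[-0.359, 0.364]$ and $[0, 0.364]$, in line with the corresponding rows of the Results table up to rounding.

```python
def analytic_ate_bounds(m1, m0, p1, y_min=0.0, y_max=1.0):
    """Closed-form bounds on E[y(1)] - E[y(0)] for a binary treatment.

    m1 = E[y | z = 1], m0 = E[y | z = 0], p1 = P(z = 1).
    """
    p0 = 1.0 - p1
    ey = m1 * p1 + m0 * p0  # E[y]

    def ate(b1, b0):
        # (lower of E[y(1)] - upper of E[y(0)], upper of E[y(1)] - lower of E[y(0)])
        return (b1[0] - b0[1], b1[1] - b0[0])

    # No assumptions: unobserved counterfactual means anywhere in [y_min, y_max]
    ey1 = (m1 * p1 + y_min * p0, m1 * p1 + y_max * p0)
    ey0 = (m0 * p0 + y_min * p1, m0 * p0 + y_max * p1)
    out = {"none": ate(ey1, ey0)}
    # MTR: E[y(1) | z = 0] >= m0 and E[y(0) | z = 1] <= m1
    out["MTR"] = ate((ey, ey1[1]), (ey0[0], ey))
    # MTS: E[y(1) | z = 0] <= m1 and E[y(0) | z = 1] >= m0
    out["MTS"] = ate((ey1[0], m1), (m0, ey0[1]))
    # MTR + MTS: both counterfactual means lie in [m0, m1]; jointly refuted if m0 > m1
    if m0 <= m1:
        out["MTR+MTS"] = ate((ey, m1), (m0, ey))
    return out


for name, (lo, hi) in analytic_ate_bounds(0.143 / 0.191, 0.311 / 0.809, 0.191).items():
    print(f"{name:8s} [{lo:.3f}, {hi:.3f}]")
```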

Results

Bounds on the effect of mother's college education on the probability that the child has a college degree

Assumptions           [Lower bound, Upper bound]
No assumptions        [-0.358, 0.641]
MTS                   [-0.358, 0.365]
cMTS                  [-0.358, 0.214]
MTR                   [0, 0.641]
MTR + MTS             [0, 0.365]
MTR + cMTS            [0, 0.214]
MTR + MTS + MIV       [0, 0.365]
MTR + cMTS + MIV      [0, 0.214]

Note: Estimates are not bias corrected; n = 16912.

How did I calculate these numbers?

The Joint Support of $(y(0), y(1), y, z, v)$

[Figure: a grid of the joint support. Rows: unobserved $(y(0), y(1)) \in \{(1,1), (1,0), (0,1), (0,0)\}$. Columns: observed $(y, z, v) \in \{0,1\} \times \{0,1\} \times \{1,2,3,4\}$, ordered from (0,0,1) through (1,1,4).]

Compatibility with observed data: $\forall i, t: \ z_i = t \Rightarrow y_i = y_i(t)$

[Figure: the same grid, marking the points compatible with the observed data and the points not compatible with the observed data.]

Monotone Treatment Response: $\forall i: \ y_i(0) \le y_i(1)$

[Figure: the same grid, additionally marking the points ruled out by the MTR assumption, i.e. the row $(y(0), y(1)) = (1,0)$.]

Monotone Treatment Selection: $E[y(t) \mid z = 1] \ge E[y(t) \mid z = 0]$

[Figure: the same grid, with the columns grouped into $z = 0$ and $z = 1$.]

Monotone Instrumental Variable: $E[y(t) \mid v = 2] \ge E[y(t) \mid v = 1]$

[Figure: the same grid, with the columns grouped into $v = 1$ and $v = 2$.]

In the empirical application: the joint distribution attaining the maximal upper bound on the ATE under MTR+MTS+MIV (= 0.3646)

Rows are the observed $(y, z, v)$. The middle columns are the unobserved $(y(0), y(1))$, and the last column is the observed probability $P(y, z, v)$. Cells where the observed and unobserved components are not compatible carry zero probability. So does the column $(1,0)$, which is ruled out by the MTR assumption.

(y, z, v)   (1,1)   (1,0)   (0,1)   (0,0)   P(y, z, v)
(0,0,1)     0       0       0.196   0.2     0.397
(0,0,2)     0       0       0.052   0.003   0.055
(0,0,3)     0       0       0.029   0       0.029
(0,0,4)     0       0       0.017   0       0.017
(0,1,1)     0       0       0       0.013   0.013
(0,1,2)     0       0       0       0.01    0.01
(0,1,3)     0       0       0       0.013   0.013
(0,1,4)     0       0       0       0.012   0.012
(1,0,1)     0.155   0       0       0       0.155
(1,0,2)     0.055   0       0       0       0.055
(1,0,3)     0.054   0       0       0       0.054
(1,0,4)     0.047   0       0       0       0.047
(1,1,1)     0.017   0       0       0       0.017
(1,1,2)     0       0       0.018   0       0.018
(1,1,3)     0.001   0       0.042   0       0.043
(1,1,4)     0.055   0       0.01    0       0.065
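The joint distribution on this slide can be checked mechanically. The sketch below transcribes the rounded entries, and its names are illustrative. It confirms that every positive entry lies in a cell compatible with both the observed data and MTR. It then compares each row sum with the observed $P(y, z, v)$, allowing for the three-decimal rounding. Finally, it computes the ATE as the total mass on $(y(0), y(1)) = (0,1)$. With these entries the ATE comes to about 0.364, which matches the slide's 0.3646 up to rounding.

```python
COLUMNS = [(1, 1), (1, 0), (0, 1), (0, 0)]  # unobserved (y(0), y(1))
TABLE = {
    # (y, z, v): (mass on (1,1), (1,0), (0,1), (0,0), observed P(y, z, v))
    (0, 0, 1): (0.0, 0.0, 0.196, 0.200, 0.397),
    (0, 0, 2): (0.0, 0.0, 0.052, 0.003, 0.055),
    (0, 0, 3): (0.0, 0.0, 0.029, 0.0, 0.029),
    (0, 0, 4): (0.0, 0.0, 0.017, 0.0, 0.017),
    (0, 1, 1): (0.0, 0.0, 0.0, 0.013, 0.013),
    (0, 1, 2): (0.0, 0.0, 0.0, 0.010, 0.010),
    (0, 1, 3): (0.0, 0.0, 0.0, 0.013, 0.013),
    (0, 1, 4): (0.0, 0.0, 0.0, 0.012, 0.012),
    (1, 0, 1): (0.155, 0.0, 0.0, 0.0, 0.155),
    (1, 0, 2): (0.055, 0.0, 0.0, 0.0, 0.055),
    (1, 0, 3): (0.054, 0.0, 0.0, 0.0, 0.054),
    (1, 0, 4): (0.047, 0.0, 0.0, 0.0, 0.047),
    (1, 1, 1): (0.017, 0.0, 0.0, 0.0, 0.017),
    (1, 1, 2): (0.0, 0.0, 0.018, 0.0, 0.018),
    (1, 1, 3): (0.001, 0.0, 0.042, 0.0, 0.043),
    (1, 1, 4): (0.055, 0.0, 0.010, 0.0, 0.065),
}


def check_table(table, tol=0.0015):
    """Validate support restrictions and marginals; return the implied ATE."""
    ate = 0.0
    for (y, z, v), (*masses, observed) in table.items():
        for (y0, y1), mass in zip(COLUMNS, masses):
            if mass == 0.0:
                continue
            # Compatibility: the observed outcome must equal y(z)
            assert y == (y1 if z == 1 else y0), f"incompatible cell {(y0, y1, y, z, v)}"
            assert y0 <= y1, f"MTR violated in cell {(y0, y1, y, z, v)}"
            ate += (y1 - y0) * mass
        # Summing over (y(0), y(1)) should reproduce the observed probability
        assert abs(sum(masses) - observed) <= tol, f"marginal mismatch at {(y, z, v)}"
    return ate


print(f"ATE implied by the table: {check_table(TABLE):.3f}")
```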

Corresponding Linear Program
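The linear program on this slide is not reproduced in the text. What follows is a minimal sketch of one such program, not the author's code. It assumes SciPy's `linprog` with the HiGHS solver and the rounded observed cell probabilities from the joint-distribution slide. The unknowns are the 64 probabilities $\pi(y(0), y(1), y, z, v) \ge 0$ on the joint support, and equality constraints make them reproduce the observed $P(y, z, v)$. Cells incompatible with the data are fixed at zero, and so are cells with $y(0) > y(1)$ when MTR is imposed. MTS and MIV are linear in $\pi$ because $P(z)$ and $P(v)$ are known. The ATE is also linear in $\pi$, so the program minimizes and then maximizes it. The function name `ate_bounds` is illustrative. With these inputs the sketch should return about $[-0.359, 0.641]$ with no assumptions, $[0, 0.641]$ under MTR and $[0, 0.364]$ under MTR+MTS. The MIV option is included, but because the inputs are rounded its result may differ slightly from the slide's.

```python
import itertools

import numpy as np
from scipy.optimize import linprog

# Observed P(y, z, v) from the application slide, rounded to three decimals.
P_OBS = {
    (0, 0, 1): 0.397, (0, 0, 2): 0.055, (0, 0, 3): 0.029, (0, 0, 4): 0.017,
    (0, 1, 1): 0.013, (0, 1, 2): 0.010, (0, 1, 3): 0.013, (0, 1, 4): 0.012,
    (1, 0, 1): 0.155, (1, 0, 2): 0.055, (1, 0, 3): 0.054, (1, 0, 4): 0.047,
    (1, 1, 1): 0.017, (1, 1, 2): 0.018, (1, 1, 3): 0.043, (1, 1, 4): 0.065,
}
V = (1, 2, 3, 4)
# Joint support: cells (y0, y1, y, z, v) with y0 = y(0), y1 = y(1).
CELLS = list(itertools.product((0, 1), (0, 1), (0, 1), (0, 1), V))


def cond_mean(t, in_group, prob):
    """Coefficients of E[y(t) | group] as a linear function of the cell masses."""
    return np.array([cell[t] / prob if in_group(cell) else 0.0 for cell in CELLS])


def ate_bounds(mtr=False, mts=False, miv=False):
    c = np.array([y1 - y0 for (y0, y1, _, _, _) in CELLS], dtype=float)
    # Incompatible cells (y != y(z)) and, under MTR, cells with y0 > y1 get zero mass.
    bounds = []
    for y0, y1, y, z, _ in CELLS:
        ok = y == (y1 if z == 1 else y0) and not (mtr and y0 > y1)
        bounds.append((0.0, None) if ok else (0.0, 0.0))
    # Summing over the unobserved (y0, y1) must reproduce the observed distribution.
    A_eq = [[1.0 if cell[2:] == key else 0.0 for cell in CELLS] for key in P_OBS]
    b_eq = list(P_OBS.values())
    p_z = {z: sum(p for k, p in P_OBS.items() if k[1] == z) for z in (0, 1)}
    p_v = {m: sum(p for k, p in P_OBS.items() if k[2] == m) for m in V}
    A_ub, b_ub = [], []
    for t in (0, 1):
        if mts:  # E[y(t) | z=0] - E[y(t) | z=1] <= 0
            A_ub.append(cond_mean(t, lambda cell: cell[3] == 0, p_z[0])
                        - cond_mean(t, lambda cell: cell[3] == 1, p_z[1]))
            b_ub.append(0.0)
        if miv:  # E[y(t) | v=m] - E[y(t) | v=m+1] <= 0
            for m in V[:-1]:
                A_ub.append(cond_mean(t, lambda cell, m=m: cell[4] == m, p_v[m])
                            - cond_mean(t, lambda cell, m=m: cell[4] == m + 1, p_v[m + 1]))
                b_ub.append(0.0)
    out = []
    for sign in (1.0, -1.0):  # minimize the ATE, then maximize it
        res = linprog(sign * c, A_ub=A_ub or None, b_ub=b_ub or None,
                      A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
        if res.status != 0:
            raise RuntimeError(res.message)
        out.append(sign * res.fun)
    return tuple(out)


print(ate_bounds(), ate_bounds(mtr=True), ate_bounds(mtr=True, mts=True))
```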

Pencil and Paper vs Computer?

With $Y = \{0,1\}$, $T = \{0,1,2,3\}$ and $V = \{1,2,3\}$:

[Figure: the analogous grid with 16 rows, the unobserved $(y(0), y(1), y(2), y(3)) \in \{0,1\}^4$, and 24 columns, the observed $(y, z, v)$ from (0,0,1) through (1,3,3). It marks the points compatible with the observed data, the points not compatible with the observed data, and the points ruled out by the MTR assumption.]
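The point of this slide is that the joint support grows quickly with the number of treatment levels. The sketch below counts the cells of the joint support for binary outcomes, $|T|$ ordered treatment levels and $|V|$ instrument values. It also counts the cells compatible with the observed data, with and without MTR. The function name is illustrative. In the binary application this gives 64 cells: 32 are compatible with the data, and 24 remain under MTR. In the four-level case on this slide the counts are 384, 192 and 60.

```python
import itertools


def support_counts(n_treat, n_inst, mtr=False):
    """Count cells of the joint support of (y(0), ..., y(n_treat - 1), y, z, v).

    Outcomes are binary, z takes n_treat ordered values and v takes n_inst values.
    Returns (all cells, cells compatible with the observed data [and with MTR]).
    """
    total = compatible = 0
    for response in itertools.product((0, 1), repeat=n_treat):
        # MTR: the response vector must be non-decreasing in t
        monotone = all(a <= b for a, b in zip(response, response[1:]))
        for y, z, _v in itertools.product((0, 1), range(n_treat), range(n_inst)):
            total += 1
            # Compatibility: the observed outcome equals y(z)
            if response[z] == y and (monotone or not mtr):
                compatible += 1
    return total, compatible


print(support_counts(2, 4), support_counts(2, 4, mtr=True))  # -> (64, 32) (64, 24)
print(support_counts(4, 3), support_counts(4, 3, mtr=True))  # -> (384, 192) (384, 60)
```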

How robust are these results?

By using weaker assumptions, we aim to obtain more robust results. But are they robust? I relax different assumptions to see their identifying strength:
• Mis-measurement of Outcomes or Treatments (MOT)
• Relaxed Monotone Treatment Response (rMTR)
• Relaxed Monotone Treatment Selection (rMTS)
• Relaxed Monotone Instrumental Variable (rMIV)
• Missing Data

Mis-measurement of Outcomes or Treatments (MOT)

$$P[z_i = t \Rightarrow y_i = y_i(t)] \ge 1 - \alpha_{MOT}$$

The data were collected mostly through phone interviews. Instead of assuming that every individual's outcome and treatment were recorded correctly, we assume that no more than $100\,\alpha_{MOT}\%$ of them could have been recorded incorrectly.


Relaxed Monotone Treatment Response (rMTR)

$$P[t_2 \ge t_1 \Rightarrow y_i(t_2) \ge y_i(t_1)] \ge 1 - \alpha_{MTR}$$

Behrman and Rosenzweig (AER, 2002) suggest that more educated women spend less time with their children. We therefore allow that, for at most $100\,\alpha_{MTR}\%$ of children, potential education may decrease with mother's education.


Relaxed Monotone Treatment Selection (rMTS)

$$\forall t,\ z_2 \ge z_1: \quad E[y(t) \mid z = z_1] - E[y(t) \mid z = z_2] \le \delta_{MTS}$$

We assume that children of less educated women may have higher potential schooling, but by no more than $\delta_{MTS}$.

Relaxed Monotone Instrumental Variable (rMIV)

$$\forall t,\ v_2 \ge v_1: \quad E[y(t) \mid v = v_1] - E[y(t) \mid v = v_2] \le \delta_{MIV}$$

We assume that children of less educated men may have higher potential schooling, but by no more than $\delta_{MIV}$.
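In the linear-programming formulation, each relaxation stays linear in the unknowns, so the bounds are still the solution of a linear program. What follows is one way to write the relaxations for the binary case, and the notation is introduced here for illustration only. Let $\pi(y_0, y_1, y, z, v) \ge 0$ be the probability of a cell of the joint support, and let $y_t$ be the cell's coordinate $y_0$ or $y_1$. Let $\mathcal{C}$ be the set of cells with $y = y_z$, that is, the cells compatible with the observed data. The probabilities $P(z)$ and $P(v)$ are known from the data.

$$\begin{aligned}
\text{MOT:}\quad & \sum_{(y_0, y_1, y, z, v) \notin \mathcal{C}} \pi(y_0, y_1, y, z, v) \le \alpha_{MOT}, \\
\text{rMTR:}\quad & \sum_{y_0 > y_1} \pi(y_0, y_1, y, z, v) \le \alpha_{MTR}, \\
\text{rMTS:}\quad & \frac{1}{P(z = 0)} \sum_{z = 0} y_t\, \pi(y_0, y_1, y, z, v) - \frac{1}{P(z = 1)} \sum_{z = 1} y_t\, \pi(y_0, y_1, y, z, v) \le \delta_{MTS}, \quad t \in \{0, 1\}, \\
\text{rMIV:}\quad & \frac{1}{P(v = m)} \sum_{v = m} y_t\, \pi(y_0, y_1, y, z, v) - \frac{1}{P(v = m + 1)} \sum_{v = m + 1} y_t\, \pi(y_0, y_1, y, z, v) \le \delta_{MIV}, \quad m = 1, 2, 3.
\end{aligned}$$

Because $y_1 - y_0 \ge -1$, with equality only on cells where $y_0 > y_1$, rMTR alone limits how far the lower bound on the ATE can fall:
$$E[y(1)] - E[y(0)] = \sum (y_1 - y_0)\,\pi \;\ge\; -\sum_{y_0 > y_1} \pi \;\ge\; -\alpha_{MTR}.$$
This is consistent with the lower bounds of -1% and -5% reported for $\alpha_{MTR} = 0.01$ and $0.05$ in the robustness results.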

Missing Data

Response rates are very good, around 90%. But:
• The data are not missing at random.
• There is systematic non-response in our dataset (Hauser 2005).

Hence missingness cannot be ignored. We remain agnostic about the process that drives it.

Results - Robustness

Bounds on the effect of mother's college education on the probability that the child has a college degree.
Baseline, MTR+cMTS+MIV: [Lower bound, Upper bound] = [0, 21.44%]

Relaxation      Bound affected   Optimistic (parameter: bound)   Pessimistic (parameter: bound)
αMTR            lower            0.01: -1%                       0.05: -5%
αMOT            upper            0.001: 23.36%                   0.01: 35.66%
δMTS            upper            0.01: 22.44%                    0.05: 26.44%
δMIV            upper            0.01: 21.44%                    0.05: 21.44%
αMISS           upper            0.01: 27.31%                    0.10: 38.15%
All relaxations                  [-1%, 28.62%]                   [-5%, 44.10%]

Note: Estimates are not bias corrected; n = 16912.

What is the source of identification?

cMTS: $\forall t,\ z_2 \ge z_1,\ \forall m: \ E[y(t) \mid z = z_2, v = m] \ge E[y(t) \mid z = z_1, v = m]$

Bounds on the ATE:
MTR + cMTS           [0, 21.44%]
MTR + cMTS + MIV     [0, 21.44%]

If cMTS holds only for $v \in \{2, 3, 4\}$, the bounds on the ATE are:
MTR + cMTS           [0, 46.71%]
MTR + cMTS + MIV     [0, 27.54%]
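Without MIV, MTR+cMTS separates across the values of $v$. Within a cell $v = m$, the upper bound is the MTR+MTS bound $E[y \mid z = 1, v = m] - E[y \mid z = 0, v = m]$. A cell where cMTS is dropped keeps only the MTR upper bound, $P(y = 1, z = 1 \mid v = m) + P(z = 0 \mid v = m) - P(y = 1, z = 0 \mid v = m)$, which shows how much of the identifying power comes from $v = 1$. The sketch below assumes binary outcomes and the rounded cell probabilities from the joint-distribution slide, and its function name is illustrative. The optional `delta` applies an rMTS-style relaxation inside each $v$ cell; that choice is an assumption about how the relaxation is applied. With these inputs the sketch gives about 0.218 with cMTS in every cell and about 0.465 when cMTS is dropped for $v = 1$. The reported figures are 21.44% and 46.71%, and the remaining gaps presumably reflect the three-decimal rounding of the inputs.

```python
# Observed P(y, z, v), rounded to three decimals (binary y and z, v in 1..4).
P_OBS = {
    (0, 0, 1): 0.397, (0, 0, 2): 0.055, (0, 0, 3): 0.029, (0, 0, 4): 0.017,
    (0, 1, 1): 0.013, (0, 1, 2): 0.010, (0, 1, 3): 0.013, (0, 1, 4): 0.012,
    (1, 0, 1): 0.155, (1, 0, 2): 0.055, (1, 0, 3): 0.054, (1, 0, 4): 0.047,
    (1, 1, 1): 0.017, (1, 1, 2): 0.018, (1, 1, 3): 0.043, (1, 1, 4): 0.065,
}


def mtr_cmts_upper(delta=0.0, cmts_values=(1, 2, 3, 4)):
    """Upper bound on E[y(1)] - E[y(0)] under MTR + cMTS (without MIV).

    delta relaxes cMTS inside each v cell; cells not in cmts_values keep only MTR.
    """
    total = 0.0
    for m in (1, 2, 3, 4):
        p = {(y, z): P_OBS[(y, z, m)] for y in (0, 1) for z in (0, 1)}
        p_v = sum(p.values())                     # P(v = m)
        p_z1 = (p[(0, 1)] + p[(1, 1)]) / p_v      # P(z = 1 | v = m)
        p_z0 = 1.0 - p_z1
        m1 = p[(1, 1)] / (p[(0, 1)] + p[(1, 1)])  # E[y | z = 1, v = m]
        m0 = p[(1, 0)] / (p[(0, 0)] + p[(1, 0)])  # E[y | z = 0, v = m]
        if m in cmts_values:
            if m0 - delta > m1:
                raise ValueError(f"MTR and (relaxed) cMTS are incompatible at v = {m}")
            ey1_z0 = min(m1 + delta, 1.0)  # largest admissible E[y(1) | z = 0, v = m]
            ey0_z1 = max(m0 - delta, 0.0)  # smallest admissible E[y(0) | z = 1, v = m]
        else:
            ey1_z0, ey0_z1 = 1.0, 0.0      # MTR alone leaves these at the extremes
        total += p_v * (p_z1 * (m1 - ey0_z1) + p_z0 * (ey1_z0 - m0))
    return total


print(round(mtr_cmts_upper(), 4), round(mtr_cmts_upper(cmts_values=(2, 3, 4)), 4))
```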

What is the source of identification? (2)

Binding constraints under MTR+cMTS+MIV, with their Lagrange multipliers:

cMTS                                              Multiplier
E[y(0) | z=1, v=1] ≥ E[y(0) | z=0, v=1]           0.0303
E[y(1) | z=1, v=1] ≥ E[y(1) | z=0, v=1]           0.5505
E[y(0) | z=1, v=2] ≥ E[y(0) | z=0, v=2]           0.0282
E[y(1) | z=1, v=2] ≥ E[y(1) | z=0, v=2]           0.1106
E[y(0) | z=1, v=3] ≥ E[y(0) | z=0, v=3]           0.0554
E[y(1) | z=1, v=3] ≥ E[y(1) | z=0, v=3]           0.0823
E[y(0) | z=1, v=4] ≥ E[y(0) | z=0, v=4]           0.0766
E[y(1) | z=1, v=4] ≥ E[y(1) | z=0, v=4]           0.0637

Non-binding constraints:

MIV                                               Multiplier
E[y(0) | v=2] ≥ E[y(0) | v=1]                     0
E[y(1) | v=2] ≥ E[y(1) | v=1]                     0
E[y(0) | v=3] ≥ E[y(0) | v=2]                     0
E[y(1) | v=3] ≥ E[y(1) | v=2]                     0
E[y(0) | v=4] ≥ E[y(0) | v=3]                     0
E[y(1) | v=4] ≥ E[y(1) | v=3]                     0

What is the source of identification? (3)

Binding constraints under MTR+cMTS+MIV, with cMTS imposed only for $v \in \{2, 3, 4\}$, and their Lagrange multipliers:

cMTS                                              Multiplier
E[y(0) | z=1, v=2] ≥ E[y(0) | z=0, v=2]           0.0282
E[y(1) | z=1, v=2] ≥ E[y(1) | z=0, v=2]           0.5768
E[y(0) | z=1, v=3] ≥ E[y(0) | z=0, v=3]           0.0554
E[y(1) | z=1, v=3] ≥ E[y(1) | z=0, v=3]           0.0823
E[y(0) | z=1, v=4] ≥ E[y(0) | z=0, v=4]           0.0766
E[y(1) | z=1, v=4] ≥ E[y(1) | z=0, v=4]           0.0637

MIV
E[y(1) | v=2] ≥ E[y(1) | v=1]                     0.5821

Non-binding constraints:

MIV                                               Multiplier
E[y(0) | v=2] ≥ E[y(0) | v=1]                     0
E[y(0) | v=3] ≥ E[y(0) | v=2]                     0
E[y(1) | v=3] ≥ E[y(1) | v=2]                     0
E[y(0) | v=4] ≥ E[y(0) | v=3]                     0
E[y(1) | v=4] ≥ E[y(1) | v=3]                     0
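Binding constraints and their multipliers come straight out of an LP solver. The sketch below uses a toy two-variable problem, not the paper's program. It assumes SciPy's `linprog` with the HiGHS method, which reports the slack of each inequality in `res.ineqlin.residual` and its sensitivity in `res.ineqlin.marginals`. A constraint binds when its slack is zero, and a non-binding constraint has a zero multiplier, as in the MIV rows above. SciPy reports the derivative of the minimized objective with respect to the right-hand side, so for a ≤ constraint the marginals are non-positive. Sign conventions for multipliers therefore differ between presentations.

```python
from scipy.optimize import linprog

# Toy problem: maximize x + y, written as minimize -x - y
c = [-1.0, -1.0]
A_ub = [
    [1.0, 2.0],  # x + 2y <= 4
    [1.0, 0.0],  # x      <= 3
    [0.0, 1.0],  # y      <= 10 (never reached)
]
b_ub = [4.0, 3.0, 10.0]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)], method="highs")

slack = res.ineqlin.residual         # b_ub - A_ub @ x at the optimum
multipliers = res.ineqlin.marginals  # d(objective) / d(b_ub)
binding = [abs(s) < 1e-9 for s in slack]
for row, (is_binding, mult) in enumerate(zip(binding, multipliers)):
    print(f"constraint {row}: binding={is_binding}, multiplier={mult:.3f}")
```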

Statistical Inference

This is work in progress. If the bounds are smooth functions of the observed probabilities:
• Bootstrap: Efron and Tibshirani (1993)
• Normal approximation: Imbens and Manski (2004)
• Bayes: Moon and Schorfheide (2012), Kitagawa (2012)
• ... and many others
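For bounds that are smooth functions of the observed probabilities, the bootstrap is the simplest of the options listed. The sketch below is a naive percentile bootstrap for one such endpoint, the MTS (and MTR+MTS) upper bound $E[y \mid z = 1] - E[y \mid z = 0]$. That endpoint is smooth as long as both treatment groups are non-empty. The cell counts are an assumption, not taken from the data. They are reconstructed by multiplying $n = 16912$ by the rounded probabilities $P(y, z)$, aggregated over $v$ from the application slide. The resulting interval covers the estimated endpoint only; it is not a confidence set for the identified set.

```python
import numpy as np

N = 16912
# P(y, z) aggregated over v from the rounded application cells (assumed, not raw data)
P_YZ = {(0, 0): 0.498, (0, 1): 0.048, (1, 0): 0.311, (1, 1): 0.143}
KEYS = list(P_YZ)


def mts_upper(counts):
    """E[y | z = 1] - E[y | z = 0], the MTS (and MTR+MTS) upper bound on the ATE."""
    c = dict(zip(KEYS, counts))
    m1 = c[(1, 1)] / (c[(0, 1)] + c[(1, 1)])
    m0 = c[(1, 0)] / (c[(0, 0)] + c[(1, 0)])
    return m1 - m0


def percentile_bootstrap(n_boot=2000, level=0.95, seed=0):
    rng = np.random.default_rng(seed)
    counts = np.array([round(P_YZ[k] * N) for k in KEYS])
    # Resample the whole table of (y, z) counts from the empirical multinomial
    draws = rng.multinomial(counts.sum(), counts / counts.sum(), size=n_boot)
    stats = np.array([mts_upper(d) for d in draws])
    alpha = 1.0 - level
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return mts_upper(counts), lo, hi


point, lo, hi = percentile_bootstrap()
print(f"estimate {point:.4f}, 95% percentile interval [{lo:.4f}, {hi:.4f}]")
```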

If the bounds are not smooth:
• Intersection bounds: Chernozhukov, Lee and Rosen (2013)
• Subsampling: Romano and Shaikh (2012)
• ...

My attempt: optimize across the set of observed probabilities that would not be rejected by a non-parametric test of equality of distributions.

Thank you for your attention. Any comments are very welcome.

http://sites.google.com/site/lukaslaffers
