Dipartimento di Politiche Pubbliche e Scelte Collettive – POLIS Department of Public Policy and Public Choice – POLIS

Working paper n. 88 April 2007

The Propensity Score method in public policy evaluation: a survey Michela Bia

UNIVERSITA’ DEL PIEMONTE ORIENTALE “Amedeo Avogadro” ALESSANDRIA Periodico mensile on-line "POLIS Working Papers" - Iscrizione n.591 del 12/05/2006 - Tribunale di Alessandria

The Propensity Score Method in Public Policy Evaluation: a Survey♣

Michela Bia*



* University of Florence – G. Parenti Statistics Department – Florence, Italy. University of Eastern Piedmont, Department of Public Policy and Public Choice - POLIS, Alessandria, Italy. E-mail: [email protected]; [email protected]

The author wishes to thank F. Mealli, A. Mattei, D. Bondonio, A. Martini, G. Imbens.

Abstract

Recently, in the field of causal inference, nonparametric techniques that use matching procedures based, for example, on the propensity score (Rosenbaum and Rubin, 1983) have received growing attention. In this paper we focus on propensity score methods, introduced by Rosenbaum and Rubin (1983). The key result underlying this methodology is that, given the ignorability assumption, treatment assignment and the potential outcomes are independent conditional on the propensity score. Much of the work on propensity score analysis has focused on the case where the treatment is binary, but in many cases of interest the treatment takes on more than two values. In this paper we examine an extension of the propensity score method to a setting with a continuous treatment.


Introduction

Recently, in the field of causal inference, nonparametric techniques that use matching procedures based, for example, on the propensity score (Rosenbaum and Rubin, 1983) have received growing attention. When the data, the type of intervention and the assignment criterion allow it, a quasi-experimental design can be assumed, such as the regression discontinuity design (Thistlethwaite and Campbell, 1960; Battistin and Rettore, 2004). Another assumption, which leads to a different quasi-experimental design and may be reasonable in some observational studies, is that treatment assignment is unconfounded with potential outcomes conditional on a sufficient set of covariates or pre-treatment variables. The unconfoundedness assumption allows us to compare treated and control units with the same value of the covariates. Given unconfoundedness, various methods have been proposed for estimating causal effects. In this paper we focus on propensity score methods, introduced by Rosenbaum and Rubin (1983). The key result underlying this methodology is that, given the ignorability assumption, treatment assignment and the potential outcomes are independent conditional on the propensity score. Thus, adjusting for the propensity score removes the bias associated with differences in the observed covariates between the treated and control groups. To estimate propensity scores, which are the conditional probabilities of being treated given a vector of observed covariates, we must specify the distribution of the treatment indicator given the pre-treatment variables.


Much of the work on propensity score analysis has focused on the case where the treatment is binary, but in many cases of interest the treatment takes on more than two values (for example, a drug applied in different doses or a treatment applied over different time periods). In this paper we examine an extension of the propensity score method to a setting with a continuous treatment. The first section introduces the standard propensity score analysis (Rosenbaum and Rubin, 1983), that is, the case where the treatment is binary. The second section reviews the propensity score methodology with multiple treatments. The third section deals with the propensity score method when the treatment is continuous.


1 The evaluation of public policies: some statistical methods

1.1 Introduction

The evaluation of policies carried out by using quantitative tools is a tangible answer to the need to express “judgements empirically based on achievement accomplished by a public policy when facing a particular collective problem”. By “collective problem” we mean a situation that is socially perceived as inadequate and, as such, worthy of change and eventually worthy of public contribution. Think about pollution in city centres, assistance to old people or the lack of competitiveness among small and medium enterprises: these are all problems which require public involvement through the allocation of funds. When a problem is addressed by an intervention, we speak of a public policy (Martini et al. 2005). The statistical field of reference is causal inference, which provides and develops appropriate quantitative methods for policy effect evaluation.

The starting point of a policy effect evaluation is the identification of the object of analysis, i.e., referring to the potential outcome approach to causal inference (Neyman, 1923; Rubin, 1974), a characteristic of the distribution of the difference between two potentially observable outcomes: Y0 (a post-intervention variable observed on a unit, an individual or a firm, in the absence of an intervention) and Y1 (a post-intervention variable observed on a unit in the presence of an intervention). Identification and estimation of such parameters present some relevant problems: a) only one of the two potential outcomes is observed on a single unit, the other representing the counterfactual situation; b) the assignment to the treatment is usually not random, so estimation is based on observational data; c) it is necessary to isolate the effect of the intervention from the effects of other factors, which can influence access and results. Appropriate estimation methods (parametric or nonparametric) should be based on sensible hypotheses about the assignment rule, which allow the causal effects to be identified (even partially; Manski, 1995, 2003). In observational studies, a usual starting point consists in constructing a control group (units not receiving the treatment, but similar to units receiving it), under the unconfoundedness assumption (Rosenbaum and Rubin, 1983). In this section we describe the basic principles of such an approach, continuing with a more formal discussion.

1.2 Potential results and the Rubin Causal model

The basic idea of causal inference is that of an action (or a treatment) applied to a unit, where unit means a person or a company, at a specific point in time. As a result, in the binary treatment case, each unit has two potential results: one referring to the value of the outcome variable in the event of treatment, and the other in the event of non-treatment. The causal effect is the result of a comparison between the two potential results. The adjective ‘potential’ is motivated by the impossibility of observing the outcome both with and without treatment for the same unit. This is known as the fundamental problem of causal inference (Holland, 1986). It is therefore very useful to have information about several units, analysing the distribution of the treatment effect and concentrating on summary measures of that distribution, for example the average treatment effects. In order to obtain correct estimates of such quantities, it is crucial to define the assignment mechanism, described below.

We now introduce some notation. Consider a population of N units. Each unit i is characterised by a K-dimensional vector of covariates $X_i$, two potential results $Y_i(0)$ and $Y_i(1)$, and a variable $Z_i \in \{0,1\}$, which denotes assignment ($Z_i = 1$) or not ($Z_i = 0$) to the treatment. $X$ indicates the $N \times K$ matrix of the units' characteristics, $Y(0)$ and $Y(1)$ the vectors of the potential results and $Z$ the vector of assignments to treatment 1. The link between the vectors of potential results $(Y(0), Y(1))$ and the treatments $Z$ gives two distinct relationships for the observed and unobserved results, denoted by $Y_i^{\mathrm{obs}}$ and $Y_i^{\mathrm{mis}}$ respectively:

$$Y_i^{\mathrm{obs}} = Z_i \cdot Y_i(1) + (1 - Z_i) \cdot Y_i(0)$$

$$Y_i^{\mathrm{mis}} = (1 - Z_i) \cdot Y_i(1) + Z_i \cdot Y_i(0)$$

where $Y_i^{\mathrm{obs}}$ and $Y_i^{\mathrm{mis}}$ represent the i-th elements of the vectors $Y^{\mathrm{obs}}$ and $Y^{\mathrm{mis}}$. In order to identify and define causal effects, it is necessary to make some assumptions. An important assumption, which reduces the number of potential results, is the following (Rubin, 1978a, 1980):

Assumption 1.1 Stable Unit Treatment Value Assumption (SUTVA)

The potential outcomes $Y_i(Z_i)$ for the i-th unit depend only on the treatment that the i-th unit received. That is, there is “no interference between units” and there are “no versions of treatments”.
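The two relationships between potential, observed and missing results can be illustrated with a minimal sketch. The data and the function name `observed_and_missing` are hypothetical; both potential results are listed only for illustration, since in real data one of them is never seen.

```python
# Hypothetical toy data: both potential results are shown only for illustration;
# in an actual study exactly one of them is observed for each unit.
y0 = [10.0, 12.0, 9.0, 11.0]   # Y_i(0)
y1 = [13.0, 12.5, 12.0, 15.0]  # Y_i(1)
z = [1, 0, 0, 1]               # Z_i, assignment to treatment


def observed_and_missing(z, y0, y1):
    """Apply Y_obs = Z*Y(1) + (1-Z)*Y(0) and Y_mis = (1-Z)*Y(1) + Z*Y(0) unit by unit."""
    y_obs = [zi * b + (1 - zi) * a for zi, a, b in zip(z, y0, y1)]
    y_mis = [(1 - zi) * b + zi * a for zi, a, b in zip(z, y0, y1)]
    return y_obs, y_mis


y_obs, y_mis = observed_and_missing(z, y0, y1)
print(y_obs)  # [13.0, 12.0, 9.0, 15.0]
print(y_mis)  # [10.0, 12.5, 12.0, 11.0]
```

Each unit contributes its treated result to the observed vector when $Z_i = 1$ and its untreated result otherwise; the missing vector collects the counterfactuals.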


It should be emphasized that this assumption is neither testable nor removable, and that its plausibility rests entirely on the experience of the researcher. Identifying treatment effects relies on further assumptions about the assignment mechanism, that is, the mechanism that determines which units get which treatments, formally defined as follows:

Definition 1.1 Assignment mechanism

Given a population of N units, the assignment mechanism is a row-exchangeable function $p(Z; X, Y(0), Y(1))$ of the assignment vector $Z \in \{0,1\}^N$, taking values in $[0,1]$ and such that

$$\sum_{Z \in \{0,1\}^N} p(Z; X, Y(0), Y(1)) = 1$$

for each $X$, $Y(0)$, $Y(1)$.

The probability of unit assignment is defined as follows:

Definition 1.2 Unit assignment probability

The probability of assignment to treatment for unit i is given by:

$$p_i(X, Y(0), Y(1)) = \sum_{Z : Z_i = 1} p(Z; X, Y(0), Y(1)).$$

Let $X_{(i)}$ indicate the $(N-1) \times K$ matrix obtained by removing the i-th row from $X$, and analogously for $Y_{(i)}(0)$ and $Y_{(i)}(1)$. The exchangeability of the assignment mechanism allows us to rewrite the N functions $p_i(\cdot)$ in terms of a common function $q(\cdot)$ that depends on the covariates and potential results of unit i and on the covariates and potential results of all other units:

$$p_i(X, Y(0), Y(1)) = q(X_i, Y_i(0), Y_i(1), X_{(i)}, Y_{(i)}(0), Y_{(i)}(1)) \quad \text{for any } i = 1, \dots, N$$

Strictly connected to the concept of assignment probability is the propensity score, defined as follows:


Definition 1.3 Propensity score

Given a population of N units, the propensity score at covariate value $x$ is defined as:

$$e(X, Y(0), Y(1)) = \frac{1}{N_x} \sum_{i : X_i = x} p_i(X, Y(0), Y(1))$$

with $N_x$ equal to the number of units with $X_i = x$. For each $x$ resulting in $N_x = 0$, the propensity score is not defined (Imbens, 2002). This definition of the propensity score, which will be examined in detail in later sections, will be useful later on for analysing our case study, where the treatment turns out to be a continuous variable and a generalized propensity score is defined to allow for treatment effect estimation with a non-binary treatment. The definition of probabilistic assignment follows:

Definition 1.4 Probabilistic assignment

An assignment mechanism is referred to as probabilistic if for every i the assignment probability is strictly between 0 and 1, that is:

$$0 < p_i(X, Y(0), Y(1)) < 1$$

This assumption requires that each unit has a non-zero probability of being treated and, at the same time, that no unit is treated with probability equal to 1.
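Definitions 1.1 to 1.4 can be checked by brute-force enumeration on a very small population. The sketch below assumes a hypothetical mechanism (independent coin flips whose probability depends on one binary covariate); the data, the function names `p` and `propensity`, and the probabilities 1/4 and 3/4 are illustrative choices, not part of the original text.

```python
from fractions import Fraction
from itertools import product

# Hypothetical covariate values for N = 4 units (one binary covariate).
x = [0, 0, 1, 1]
N = len(x)


def p(z_vec):
    """Assumed assignment mechanism: independent coin flips with
    P(Z_i = 1) = 1/4 if X_i = 0 and 3/4 if X_i = 1 (it depends on X only)."""
    prob = Fraction(1)
    for zi, xi in zip(z_vec, x):
        pi = Fraction(1, 4) if xi == 0 else Fraction(3, 4)
        prob *= pi if zi == 1 else 1 - pi
    return prob


all_z = list(product([0, 1], repeat=N))

# Definition 1.1: the probabilities of all 2^N assignment vectors sum to one.
total = sum(p(zv) for zv in all_z)

# Definition 1.2: p_i sums p(Z) over the assignment vectors with Z_i = 1.
p_unit = [sum(p(zv) for zv in all_z if zv[i] == 1) for i in range(N)]


def propensity(x_value):
    """Definition 1.3: average p_i over the N_x units with X_i = x.
    Raises ZeroDivisionError when N_x = 0, where the score is undefined."""
    idx = [i for i in range(N) if x[i] == x_value]
    return sum(p_unit[i] for i in idx) / len(idx)


# Definition 1.4: probabilistic assignment, 0 < p_i < 1 for every unit.
probabilistic = all(0 < pi < 1 for pi in p_unit)

print(total, p_unit, propensity(0), propensity(1), probabilistic)
```

Exact fractions are used so that the normalisation in Definition 1.1 can be verified without rounding error; enumeration is only feasible for very small N.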


1.3 Causal effects and identifying assumptions

Inferences about the effects of treatments involve speculation about the effect that one treatment would have had on a unit which actually received another treatment (Rosenbaum and Rubin, 1983a). In the binary treatment case, the i-th of the N units under study has both a response $Y_i(1)$, which would have resulted if it had received treatment 1, and a response $Y_i(0)$, which would have resulted if it had received treatment 0. Causal effects are therefore comparisons of $Y_i(1)$ and $Y_i(0)$ (for example, a difference $Y_i(1) - Y_i(0)$ or a ratio $Y_i(1) / Y_i(0)$). Estimating the causal effects of treatments is evidently a missing data problem, since either $Y_i(1)$ or $Y_i(0)$ is missing. In causal inference in general, and in policy evaluation in particular, a quantity of primary interest is the average treatment effect (ATE), defined as follows:

$$ATE = E[\tau_i] = E[Y_i(1) - Y_i(0)] = E[Y_i(1)] - E[Y_i(0)]$$

The average treatment effect for the subpopulation (SATE) having received treatment level z, with z = 0, 1, is equal to:

$$SATE = \tau_T = E[\tau_i \mid i : Z_i = z]$$

and when z = 1 the SATE is usually known as the ATT. In policy evaluation we are particularly interested in the ATT because it allows us to assess how much the intervention changed a given condition or behaviour of the policy beneficiaries.

In randomized experiments it has been shown (Neyman, 1923) that the ATT can be easily estimated: under SUTVA, an unbiased estimate of the average causal effect is obtained by directly comparing the average results of the two treatment groups, whose units are similar with respect to every characteristic, including potential results, thanks to randomisation. In an observational study¹, however, such direct comparisons may be misleading because the units exposed to one treatment generally differ systematically from the units exposed to the other treatment. Specifically, whereas in experimental situations one can obtain control and treatment groups that are homogeneous with respect to the observable characteristics X, this is not possible in nonexperimental studies, since assignment to a treatment is likely to depend on observable as well as unobservable characteristics². This leads to a self-selection process which makes the two groups potentially different even before the policy is carried out.

A possible way to address this complication is to use the randomized experiment as a template for the analysis of an observational (i.e., nonrandomized) study. This means thinking about the underlying randomized experiment that could have been done. In the randomized experiment underlying an observational study, the probabilities of assignment to treatments are not equal but are functions of the covariates, so the template is actually an unconfounded assignment mechanism. To do this we make the strong ignorability or unconfoundedness assumption.

¹ A study is considered observational when the treatment assignment mechanism is not known.

² With random assignment, homogeneity of the control and treatment groups with respect to the unobservable characteristics is also guaranteed if the size of the groups is sufficiently large.

Assumption 1.2 Strong Unconfoundedness assumption (Rosenbaum and Rubin, 1983)

Generally, we shall say treatment assignment is strongly ignorable given a vector of covariates X if

$$(Y(0), Y(1)) \perp Z \mid X \quad \text{and} \quad 0 < \mathrm{prob}(Z = 1 \mid X) < 1 \quad \text{(common support)}$$

referring, from now on, to $Y(0), Y(1)$ instead of $Y_i(0), Y_i(1)$ for the potential results of the i-th individual. For brevity, when treatment assignment is strongly ignorable given the observed covariates X, we shall simply say that treatment assignment is strongly ignorable. The strong ignorability assumption asserts that, conditional on observed covariates, the probability of assignment to a treatment does not depend on the potential outcomes. In other words, within subpopulations defined by values of the covariates, we have random assignment. This assumption rules out any role for unobservable variables. The issue of unobserved covariates should be addressed using models for sensitivity analysis (e.g., Rosenbaum and Rubin, 1983b) or nonparametric bounds for treatment effects (Manski, 1990; Manski et al., 1992). Of course, if the goal is to identify only the Average Treatment Effect for the Treated (ATT), a weaker assumption can be made:


Assumption 1.3 Weak Unconfoundedness assumption (Rosenbaum and Rubin, 1983)

$$Y(0) \perp Z \mid X \quad \text{and} \quad \mathrm{prob}(Z = 1 \mid X) < 1$$

That is, the unconfoundedness assumption can be relaxed, requiring only that Y(0) is independent of Z given X. The overlap condition can also be relaxed, so that the support of X for the treated units is a subset of the support of X for the untreated units. Both the unconfoundedness assumption and the overlap condition may be controversial in applications. The first requires that all variables that affect both the outcome and the likelihood of receiving the treatment are observed, or that all the others are perfectly collinear with the observed ones. This assumption is not testable and it is very strong, so it need not hold in general. Clearly, selection may also take place on the basis of unobservable characteristics. However, any alternative approach that does not rely on unconfoundedness, while allowing for consistent estimation of the causal effects of interest, must make other untestable assumptions. Whereas the unconfoundedness assumption implies that the best matches are units that differ only in their treatment status but are otherwise identical, alternative assumptions implicitly match units that differ in their pre-treatment characteristics. Often such assumptions are even more difficult to justify. For instance, the technique of instrumental variables is sometimes considered as an alternative to assuming unconfoundedness (Heckman, 1979; Heckman and Hotz, 1989), but a disadvantage of these methodologies is their high sensitivity to distributional hypotheses. A possible solution is a non- or semi-parametric approach through the selection of instrumental variables (Angrist et al., 1996). However, since the identification of these variables is often extremely difficult, the unconfoundedness assumption may be a natural starting point, after comparing average outcomes for treated and control units, to adjust for observable pre-treatment differences.
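To see why an assumption of this kind is needed before treated and control averages are compared, the observed difference in means can be split into the ATT and a selection term. The decomposition below is a standard identity rather than a result specific to this survey, and it uses only the notation introduced above.

```latex
\begin{aligned}
E[Y \mid Z=1] - E[Y \mid Z=0]
  &= E[Y(1) \mid Z=1] - E[Y(0) \mid Z=0] \\
  &= \underbrace{E[Y(1) - Y(0) \mid Z=1]}_{\tau_T}
   + \underbrace{E[Y(0) \mid Z=1] - E[Y(0) \mid Z=0]}_{\text{selection bias}}
\end{aligned}
```

The same identity holds within each cell $X = x$. Under Assumption 1.3 the second term is then zero, because $E[Y(0) \mid Z=1, X=x] = E[Y(0) \mid Z=0, X=x]$, so the within-cell comparison recovers the effect on the treated.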

The strong ignorability assumption validates the comparison of treated and control units with the same value of covariates; in fact the average treatment effect (ATE) can be written as:

$$\begin{aligned}
\tau &= E[Y(1) - Y(0)] = E[E(Y(1) - Y(0) \mid X)] \\
&= E[E(Y(1) \mid Z = 1, X = x)] - E[E(Y(0) \mid Z = 0, X = x)] \\
&= E[E(Y \mid Z = 1, X = x)] - E[E(Y \mid Z = 0, X = x)]
\end{aligned}$$

while the average treatment effect on the treated (ATT) may be rewritten as follows:

$$\begin{aligned}
\tau_T &= E_{x \mid Z=1}[E[Y(1) - Y(0) \mid Z = 1, X = x]] \\
&= E_{x \mid Z=1}[E[Y(1) \mid Z = 1, X = x]] - E_{x \mid Z=1}[E[Y(0) \mid Z = 1, X = x]] \\
&= E_{x \mid Z=1}[E[Y \mid Z = 1, X = x]] - E_{x \mid Z=1}[E[Y \mid Z = 0, X = x]]
\end{aligned}$$

Note that in both $\tau$ and $\tau_T$, due to unconfoundedness, what is not known,

$$E[E[Y(1) \mid Z = 0, X = x]] \text{ and } E[E[Y(0) \mid Z = 1, X = x]] \text{ for } \tau, \qquad E_{x \mid Z=1}[E[Y(0) \mid Z = 1, X = x]] \text{ for } \tau_T,$$

can be substituted with what can actually be observed:

$$E[E[Y \mid Z = 1, X = x]] \text{ and } E[E[Y \mid Z = 0, X = x]] \text{ for } \tau, \qquad E_{x \mid Z=1}[E[Y \mid Z = 0, X = x]] \text{ for } \tau_T.$$

Typically, there are many background characteristics that need to be controlled for when estimating the average causal effect, and adjusting the estimation for all these covariates can be infeasible in practice. Propensity score technology, introduced by Rosenbaum and Rubin (1983a), addresses this situation by reducing the entire collection of background characteristics to a single “composite” characteristic that appropriately summarizes the collection. In the following sections, we will focus on common variants of this method.
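With a single discrete covariate, the substitution of observed cell means for the unknown conditional expectations can be sketched directly. The records, the function names and the cell structure below are hypothetical; the sketch assumes every covariate cell contains both treated and control units (overlap), otherwise a division by zero occurs.

```python
from collections import defaultdict

# Hypothetical records (x, z, y): one binary covariate, treatment flag, observed outcome.
data = [
    (0, 1, 5.0), (0, 0, 3.0), (0, 0, 3.0), (0, 0, 3.0),
    (1, 1, 10.0), (1, 1, 10.0), (1, 1, 10.0), (1, 0, 6.0),
]


def cell_means(data):
    """Return {x: (mean of Y among treated, mean of Y among controls)}."""
    sums = defaultdict(lambda: {0: [0.0, 0], 1: [0.0, 0]})
    for x, z, y in data:
        sums[x][z][0] += y
        sums[x][z][1] += 1
    return {x: (s[1][0] / s[1][1], s[0][0] / s[0][1]) for x, s in sums.items()}


def adjusted_effects(data):
    """ATE weights cell differences by P(X = x); ATT weights them by P(X = x | Z = 1)."""
    means = cell_means(data)
    n = len(data)
    n_treated = sum(z for _, z, _ in data)
    ate = att = 0.0
    for x, (m1, m0) in means.items():
        n_x = sum(1 for xi, _, _ in data if xi == x)
        n_x_treated = sum(1 for xi, zi, _ in data if xi == x and zi == 1)
        ate += (m1 - m0) * n_x / n
        att += (m1 - m0) * n_x_treated / n_treated
    return ate, att


def naive_difference(data):
    """Unadjusted difference between treated and control means."""
    treated = [y for _, z, y in data if z == 1]
    control = [y for _, z, y in data if z == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


ate, att = adjusted_effects(data)
naive = naive_difference(data)
print(ate, att, naive)  # 3.0 3.5 5.0
```

In this toy sample the treated are concentrated in the cell with higher outcomes, so the unadjusted difference (5.0) overstates both the ATE (3.0) and the ATT (3.5). With many covariates the number of cells grows quickly, which is the dimensionality problem the propensity score is meant to address.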

1.4 Propensity score: definition and properties

Theoretically, if the unconfoundedness assumption is valid, the expression for the propensity score can be rewritten as follows:

$$e(X, Y(0), Y(1)) = e(X)$$

Formally, the unit propensity score is the conditional probability that a unit is assigned to treatment given the pre-treatment variables:

$$e(X) = p(Z = 1 \mid X = x)$$

The propensity score is a balancing score: among units with the same propensity score, the distribution of covariates is the same for treated and control units. Formally we can write (Rosenbaum and Rubin, 1983):

Lemma 1.1 Balancing of pre-treatment variables given the propensity score

$$X \perp Z \mid e(X)$$

In particular, the propensity score is the coarsest balancing score, i.e., any balancing score b(X) must satisfy the relation e(X) = f(b(X)) for some function f (Rosenbaum and Rubin, 1983a). The key feature of propensity score methodology is that, given the strong ignorability assumption, treatment assignment and the potential outcomes are independent conditional on the propensity score:

$$(Y(0), Y(1)) \perp Z \mid e(X) \quad \text{and} \quad 0 < p(Z = 1 \mid e(X)) < 1$$

Thus, adjusting for the propensity score removes the bias associated with differences in the observed covariates between the treated and control groups. As a result, given the strong ignorability assumption, if the propensity score e(X) is known, it follows that:

$$\begin{aligned}
\tau &= E[Y(1) - Y(0)] \\
&= E[E(Y(1) - Y(0) \mid e(X))] \\
&= E[E(Y(1) \mid Z = 1, e(X))] - E[E(Y(0) \mid Z = 0, e(X))]
\end{aligned}$$

and

$$\begin{aligned}
\tau_T &= E_{x \mid Z=1}[E[Y(1) - Y(0) \mid Z = 1, e(X)]] \\
&= E_{x \mid Z=1}[E[Y(1) \mid Z = 1, e(X)]] - E_{x \mid Z=1}[E[Y(0) \mid Z = 0, e(X)]]
\end{aligned}$$

where the outer expectation is over the distribution of e(X).
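The balancing property of Lemma 1.1 can be illustrated on a small discrete example. In the sketch below the propensity score is estimated, as an assumption of this illustration, by the treated share in each covariate cell; the sample and the function names are hypothetical. Two different cells are built to share the same score, so that balance within a score value is not trivial.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical sample: covariate cell x = (x1, x2) and treatment flag z.
# Cells (0, 0) and (1, 1) are built to share the same treated share.
sample = (
    [((0, 0), 1)] * 1 + [((0, 0), 0)] * 3 +
    [((1, 1), 1)] * 2 + [((1, 1), 0)] * 6 +
    [((1, 0), 1)] * 3 + [((1, 0), 0)] * 1
)


def estimate_propensity(sample):
    """Estimate e(x) = P(Z = 1 | X = x) by the treated share in each covariate cell."""
    n_x = Counter(x for x, _ in sample)
    n_x_treated = Counter(x for x, z in sample if z == 1)
    return {x: Fraction(n_x_treated[x], n_x[x]) for x in n_x}


def covariate_distribution(sample, e_hat, score, z_value):
    """Distribution of X among units with e(X) = score and Z = z_value."""
    cells = [x for x, z in sample if e_hat[x] == score and z == z_value]
    counts = Counter(cells)
    return {x: Fraction(c, len(cells)) for x, c in counts.items()}


e_hat = estimate_propensity(sample)
quarter = Fraction(1, 4)
dist_treated = covariate_distribution(sample, e_hat, quarter, 1)
dist_control = covariate_distribution(sample, e_hat, quarter, 0)
print(e_hat)
print(dist_treated == dist_control)  # True: X is balanced given e(X) = 1/4
```

Among units with score 1/4, one third come from cell (0, 0) and two thirds from cell (1, 1) in both the treated and the control group, even though the two groups differ in size. With continuous covariates a model for the treatment indicator would be needed instead of cell frequencies.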

1.5 Matching and propensity score based methods

In what follows we will concentrate on the estimation of the ATT, although the techniques can be easily modified and used for the estimation of the ATE. As already mentioned, the quantity

$$E_{x \mid Z=1}[E[Y \mid Z = 0, X = x]]$$

which, by the unconfoundedness assumption, is used to estimate the unknown quantity

$$E_{x \mid Z=1}[E[Y(0) \mid Z = 1, X = x]]$$

may be computed using different procedures. The most appropriate way is to use the information about the untreated units, taking into account any differences in observable characteristics between the two sub-populations of treated and untreated individuals. The most common methodologies in use are regression and matching techniques. The former are based on the specification of a model for the outcome variable, for example simple linear regression or more complex models. However, correct specification of the model is crucial for a correct causal interpretation. By contrast, matching techniques do not need any a priori specification of the functional form linking the dependent and independent variables and, in this sense, they are more robust (Rubin, 1973a). We will now describe the most common variants of matching, which, as already mentioned, can be used together with the propensity score.

1.5.1 Matching types

A wide literature on matching procedures is available (see for example Rubin, 1973a, b; Abadie and Imbens, 2004). These methods match each treated unit to one or more control units according to different procedures. In general, suppose we have a dataset on a population or sample of N units. For each of the N units we observe $(Y_i^{\mathrm{obs}}, Z_i, X_i)$, which represent, respectively, the observed potential outcome, the treatment indicator and the vector of the k covariates. Because the ATT is our causal estimand, $Y_i(1)$ is observed for every treated unit, whereas $Y_i(0)$, the counterfactual outcome, must somehow be estimated. Matching finds, in the control group, a value for $Y_i(0)$ identified on the basis of the pre-treatment variables $X_i$. We denote by $T_0$ the group of untreated units, by $z_{ij}$ the weight given to untreated unit j in the match for treated unit i, and by $A_i = \{j \in T_0 : X_j \in C(X_i)\}$ the subgroup of untreated units used to estimate $Y_i(0)$ following criterion $C(X_i)$. Each type of matching is characterised by its choice of $A_i$ and $z_{ij}$, which gives the following definitions:

i) Exact matching:

Control units with the same observed characteristics as the treated unit are sought out:

$$A_i = \{j \in T_0 : X_i = X_j\}$$

The greatest problem with this type of matching is that the control group may contain no unit with these characteristics. The probability of such an event increases with the number of covariates, when covariates are continuous and when the sample is small.

ii) Caliper matching:

This type of matching is a generalization of exact matching. Instead of requiring perfect equality of the covariates, the characteristics of the treated and untreated units are required to be “not too distant”. This may be formalized as follows:

$$A_i = \{j \in T_0 : |X_i^{(m)} - X_j^{(m)}| < c^{(m)}, \; m = 1, \dots, k\}$$

In this case the problem is choosing the threshold $c^{(m)}$ for each covariate.

iii) Nearest neighbors:

This type of method allows us to overcome the multidimensionality problem. According to this procedure, a suitable metric reduces the differences between covariate vectors to a single distance, and the unit or units that are nearest under that distance function are chosen:

$$A_i = \{j \in T_0 : \min_j \lVert X_i - X_j \rVert\}$$

In this case the matching depends on the chosen metric, for example the Euclidean distance, the Mahalanobis distance (Rubin, 1980a), a metric based on the variance-covariance matrix (Abadie and Imbens, 2002), etc. Another solution could be to include more than one unit in $A_i$, varying the weights $z_{ij}$ in the estimator appropriately.
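The three definitions of $A_i$ can be compared on one treated unit. The covariate vectors, the control labels and the function names in this sketch are hypothetical, and the Euclidean distance is used only as one possible metric.

```python
import math

# Hypothetical covariate vectors (k = 2) for one treated unit and the untreated group T0.
x_treated = (1.0, 2.0)
controls = {"a": (1.0, 2.0), "b": (1.2, 2.5), "c": (3.0, 0.0), "d": (0.9, 2.1)}


def exact_match(xi, controls):
    """A_i = {j in T0 : X_j = X_i}."""
    return sorted(j for j, xj in controls.items() if xj == xi)


def caliper_match(xi, controls, c):
    """A_i = {j in T0 : |X_i(m) - X_j(m)| < c(m) for every covariate m}."""
    return sorted(
        j for j, xj in controls.items()
        if all(abs(a - b) < cm for a, b, cm in zip(xi, xj, c))
    )


def nearest_neighbor(xi, controls, exclude=()):
    """A_i = the control minimising the Euclidean distance ||X_i - X_j||."""
    candidates = {j: xj for j, xj in controls.items() if j not in exclude}
    return min(candidates, key=lambda j: math.dist(xi, candidates[j]))


print(exact_match(x_treated, controls))                    # ['a']
print(caliper_match(x_treated, controls, (0.3, 0.3)))      # ['a', 'd']
print(nearest_neighbor(x_treated, controls))               # a
print(nearest_neighbor(x_treated, controls, exclude="a"))  # d
```

Exact matching finds a partner only because control "a" happens to coincide with the treated unit; the caliper admits "d" as well, and the nearest neighbor rule always returns a match, however distant, as long as $T_0$ is not empty.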

1.6 Use of propensity score in matching techniques and matching estimators

Matching methods, applied in connection with propensity scores, remove the covariate multidimensionality problem. As previously mentioned, one of the most important properties of the propensity score is that it is a one-dimensional summary of the multidimensional covariates X, such that when the propensity scores are balanced across the treatment and control groups, the distributions of all the covariates X are balanced in expectation across the two groups. Rosenbaum and Rubin (1983) showed that, for a specific value of the propensity score, the difference between the treatment and control means for all units with that value of the propensity score is an unbiased estimate of the average treatment effect at that propensity score, if the treatment assignment is strongly ignorable given the covariates. Thus matching or regression (covariance) adjustment on the propensity score tends to produce unbiased estimates of the treatment effects when treatment assignment is strongly ignorable. Here the basic matching techniques (Rosenbaum and Rubin, 1984; Dehejia and Wahba, 1999) and the estimators based on propensity score methods are presented.

i) Stratification matching

According to this method, the range of the propensity score is divided into blocks so that within each block the covariates are balanced and the assignment to treatment can be considered random. Once a stratification with these properties is obtained, the treatment effect is estimated in two steps. First, within each block, we compute the difference between the means of the observed potential outcomes for treated and untreated units, obtaining a conditional effect estimate for that block. Second, we estimate the ATT by weighting each difference according to the distribution of treated units across blocks (see the stratification matching estimator in Section 1.6.1).

ii) Nearest neighbor matching

This procedure matches each treated unit to the untreated unit that has the nearest propensity score:

$$A_i = \{j \in T_0 : \min_j |e(X_i) - e(X_j)|\} \quad \text{with} \quad \sum_{j \in A_i} z_{ij} = 1$$

The comparison group for each treated unit consists of just one control unit, and the selection is usually made with replacement, so the same control can be matched several times to different treated units. As a result, the number of control units used for the intervention effect estimation may be lower than the number of treated units. With this method a treated unit may be matched to a control unit with a very different propensity score, simply because it is the nearest among those available. For this reason a maximum tolerated distance between the two propensity scores needs to be set. The group $A_i$ may also be redefined so as to include more than one neighbour for each treated unit (the number being fixed beforehand).

iii) Radius matching

Each treated unit is matched to the control units whose propensity score lies within a certain “radius” δ of its own, and the number of controls to be used for the identification of $Y_i(0)$ is not fixed in advance:

$$A_i = \{j \in T_0 : e(X_i) - \delta \le e(X_j) \le e(X_i) + \delta\} \quad \text{with} \quad \sum_{j \in A_i} z_{ij} = 1$$

This procedure differs from the previous one in two basic respects: some treated units may be discarded because no untreated unit has a propensity score within the defined interval, and more than one untreated unit can be matched to a single treated unit when several untreated units have a propensity score inside the interval. The choice of the radius δ is a compromise between two requirements. If the radius is very small, some treated units will be lost, but comparisons will be made between “very similar” units; vice versa, a wide radius yields a higher number of controls, but these will be “less similar” to the treated units.

iv) Kernel matching

Each treated unit is “matched” to all untreated units ($A_i = T_0$), with weights inversely related to the distance of their propensity score from the propensity score of the treated unit. We use this type of weighting system:

$$z_{ij} = \frac{\frac{1}{h}\, k\!\left(\frac{e(x_j) - e(x_i)}{h}\right)}{\sum_{j \in T_0} \frac{1}{h}\, k\!\left(\frac{e(x_j) - e(x_i)}{h}\right)}$$

where $k(\cdot)$³ is a density function and h is the bandwidth parameter.

³ For example the Gaussian kernel density function:

$$k_j = \frac{1}{\sqrt{2\pi}\, h} \exp\left\{-\frac{1}{2}\left(\frac{e(x_j) - e(x_i)}{h}\right)^2\right\}$$

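The weights $z_{ij}$ implied by nearest neighbor, radius and kernel matching on the propensity score can be sketched for a single treated unit as follows. The scores and function names are hypothetical; equal weights inside the radius are an assumption of this sketch (the text only requires the weights to sum to one), and the Gaussian kernel is one possible choice of density.

```python
import math

# Hypothetical estimated propensity scores for one treated unit and four controls.
e_treated = 0.60
e_controls = {"a": 0.58, "b": 0.65, "c": 0.30, "d": 0.61}


def nn_weights(e_i, e_controls):
    """Nearest neighbor: all weight on the control with the closest score."""
    j_star = min(e_controls, key=lambda j: abs(e_controls[j] - e_i))
    return {j_star: 1.0}


def radius_weights(e_i, e_controls, delta):
    """Radius: equal weights (assumed) on controls with |e_j - e_i| <= delta.
    An empty result means the treated unit has no match and is discarded."""
    inside = [j for j, e_j in e_controls.items() if abs(e_j - e_i) <= delta]
    return {j: 1.0 / len(inside) for j in inside}


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel k(.)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_weights(e_i, e_controls, h):
    """Kernel: every control gets weight proportional to k((e_j - e_i) / h) / h."""
    raw = {j: gaussian_kernel((e_j - e_i) / h) / h for j, e_j in e_controls.items()}
    total = sum(raw.values())
    return {j: w / total for j, w in raw.items()}


print(nn_weights(e_treated, e_controls))            # {'d': 1.0}
print(radius_weights(e_treated, e_controls, 0.03))  # {'a': 0.5, 'd': 0.5}
print(kernel_weights(e_treated, e_controls, 0.1))
```

With a radius of 0.03 two controls are retained, while a much smaller radius would leave the treated unit unmatched. The kernel weights are positive for every control but decline with the distance in scores, so control "c" contributes very little when h = 0.1.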

1.6.1 Matching Estimators

We list the formulas for the matching estimators introduced in the previous section and their variances.

Nearest neighbor and radius matching: The average effect on the treated under nearest neighbor or radius matching is given by the following formula (where the superscript n stands for either nearest neighbor or radius matching and the number of units in the treated group is denoted by $N^T$):

$$\tau^n = \frac{1}{N^T} \sum_{i \in T} \left[ Y_i(1) - \sum_{j \in A_i} z_{ij} Y_j(0) \right] \quad \text{with} \quad \sum_{j \in A_i} z_{ij} = 1$$

The variance is derived assuming fixed weights and independent outcomes across units:

$$\begin{aligned}
Var(\tau^n) &= \frac{1}{(N^T)^2} \left[ \sum_{i \in T} Var(Y_i(1)) + \sum_{j \in T_0} z_j^2\, Var(Y_j(0)) \right] \\
&= \frac{1}{(N^T)^2} \left[ N^T\, Var(Y_i(1)) + \sum_{j \in T_0} z_j^2\, Var(Y_j(0)) \right] \\
&= \frac{1}{N^T} Var(Y_i(1)) + \frac{1}{(N^T)^2} \sum_{j \in T_0} z_j^2\, Var(Y_j(0))
\end{aligned}$$

where $T_0$ denotes the control subgroup selected by the matching procedure and $z_j = \sum_i z_{ij}$. Standard errors are obtained analytically using the above formula, or using the bootstrap method, although the latter appears to be controversial for nearest neighbor matching, since bootstrap standard errors seem to be inconsistent for this procedure (Imbens, 2004).
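The nearest neighbor estimator and its analytical variance can be computed by hand on a few units. The sketch below uses hypothetical data and a hypothetical function `nn_att`; as a further assumption, $Var(Y_i(1))$ and $Var(Y_j(0))$ are replaced by the sample variances of the treated and control outcomes, which is one simple plug-in choice and not necessarily the one used in applied software.

```python
from statistics import variance

# Hypothetical data: (estimated propensity score, observed outcome).
treated = [(0.60, 10.0), (0.55, 7.0), (0.80, 12.0)]                # (e_i, Y_i(1))
controls = [(0.58, 8.0), (0.35, 6.0), (0.90, 9.0), (0.63, 7.0)]    # (e_j, Y_j(0))


def nn_att(treated, controls):
    """Single nearest neighbor matching with replacement on the propensity score."""
    n_t = len(treated)
    z = [0.0] * len(controls)  # z_j = sum over i of z_ij
    effect_sum = 0.0
    for e_i, y_i in treated:
        j = min(range(len(controls)), key=lambda k: abs(controls[k][0] - e_i))
        z[j] += 1.0  # z_ij = 1 for the matched control, 0 otherwise
        effect_sum += y_i - controls[j][1]
    att = effect_sum / n_t
    # Plug-in variance (assumption of this sketch): common Var(Y(1)) and Var(Y(0))
    # estimated by sample variances, with fixed weights and independent outcomes.
    var_y1 = variance([y for _, y in treated])
    var_y0 = variance([y for _, y in controls])
    var_att = var_y1 / n_t + sum(zj ** 2 for zj in z) * var_y0 / n_t ** 2
    return att, var_att, z


att, var_att, z = nn_att(treated, controls)
print(att, var_att, z)
```

The first control is matched twice, so $z_j = 2$ for that unit and its outcome variance enters the formula with weight $z_j^2 = 4$: reusing the same control lowers the number of distinct comparison units and raises the variance.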

Stratification matching: By construction, the propensity score is divided in blocks so that in each layer the covariates are balanced and the assignment to treatment can be considered random. As a result, the difference between the means of the observed potential outcomes for treated and untreated units, is equal to:

τ qs =



i∈ q

Yi (1)

N 1q





j∈ q

Y j (0)

N q0

where N 1q and N q0 denote the number of treated and control units inside each block q. The estimator of the ATT is computed weighting each differences τ qs according to treated units distribution in each block. Q

τ = ∑τ s

q =1

s q

N 1q NT

21

where Q is the number of layers and NT is the total treated units. Assuming independence of outcomes across units, the variance τ s is computed by: Q N1 N1 1 q q 0 Var (τ ) = [Var (Yi (1)) + ∑ Var (Y j )] 0 NT N N q =1 T q s

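Standard errors for the stratification estimator are obtained analytically from the variance formula for $\tau^S$, or by bootstrap.

Kernel matching

The kernel matching estimator is given by:

$$\tau^K = \frac{1}{N^T} \sum_{i \in T} \Big[ Y_i(1) - \sum_{j \in I_0} z_{ij} Y_j(0) \Big]$$

where $I_0$ denotes the set of control units and the weights $z_{ij}$ are computed as:

$$z_{ij} = \frac{\frac{1}{h} \, k\!\left( \frac{e(x_j) - e(x_i)}{h} \right)}{\sum_{l \in I_0} \frac{1}{h} \, k\!\left( \frac{e(x_l) - e(x_i)}{h} \right)}$$

with $k(\cdot)$ a kernel function and $h$ a bandwidth. In this case standard errors can easily be obtained by bootstrap.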
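The kernel weights can be illustrated with the following sketch, which assumes a Gaussian kernel and a user-chosen bandwidth `h` (both choices are assumptions of the example, not prescriptions of the text). Each treated unit is compared with a weighted average of all controls, with weights that decline with the distance in propensity score.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kernel_matching_att(y, z, pscore, h, kernel=gaussian_kernel):
    """ATT by kernel matching on the propensity score with bandwidth h."""
    y, z, pscore = map(np.asarray, (y, z, pscore))
    treated = z == 1
    controls = z == 0
    # u[i, j] = (e(x_j) - e(x_i)) / h for treated unit i and control unit j
    u = (pscore[controls][None, :] - pscore[treated][:, None]) / h
    k = kernel(u) / h
    weights = k / k.sum(axis=1, keepdims=True)   # z_ij, each row sums to one
    counterfactual = weights @ y[controls]
    return (y[treated] - counterfactual).mean(), weights

att, weights = kernel_matching_att(
    y=[5.0, 7.0, 2.0, 2.0, 2.0], z=[1, 1, 0, 0, 0],
    pscore=[0.5, 0.6, 0.45, 0.55, 0.7], h=0.1,
)
print(att)
```

A bootstrap standard error would be obtained by resampling units with replacement and recomputing `att` on each resample.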

1.7 Alternative estimation methods

This section outlines alternatives to the matching methodology. We focus on the difference-in-differences (DID) method and on the Heckman selection model.

DID methods for estimating the causal effect of policy interventions are widely used in economics, in particular when outcomes are measured in both the treatment and the control group before and after the policy intervention. In the standard DID model we have N individuals (usually a random sample from the population), observed in time periods $T_i = (t-m), \dots, (t-1), (t), (t+1), \dots, (t+k)$, with $(t-m), \dots, (t-1)$ and $(t+1), \dots, (t+k)$ denoting the pre- and post-intervention periods respectively, while the error terms $\varepsilon_i$ are assumed to be additive and constant over time. To account for time trends unrelated to the treatment, the change experienced by the group subject to the intervention (treatment group) is adjusted by the change experienced by the non-beneficiary group (control group). Meyer (1995), Angrist and Krueger (2000) and Blundell and MaCurdy (2000) describe many applications of this methodology.

In the field of program evaluation, the difference-in-differences method (Moffitt, 1991; Heckman and Robb, 1985) involves the use of panel data to better define the control group and reduce selection bias. A large number of observations before the programme intervention at time t, $Y_{i,t-1}, Y_{i,t-2}, Y_{i,t-3}, \dots, Y_{i,t-m}$, can (potentially) be included in the model. This makes it possible to highlight systematic pre-existing differences between the treated and untreated groups. Taking these differences into account allows us to obtain unbiased estimates of the treatment effect, since they could influence the outcome independently of the programme. A larger number of pre-programme observations, which capture differences in time trends before the policy is implemented, improves the estimate of the unobservable counterfactual.
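In its simplest form, with two groups and two periods, the adjustment described above reduces to a difference of differences in cell means. The sketch below illustrates only that two-by-two case; the variable names and the toy numbers are hypothetical.

```python
import numpy as np

def did_estimate(y, treated, post):
    """Change in the treatment group minus change in the control group."""
    y, treated, post = map(np.asarray, (y, treated, post))

    def cell_mean(group, period):
        return y[(treated == group) & (post == period)].mean()

    change_treated = cell_mean(1, 1) - cell_mean(1, 0)
    change_control = cell_mean(0, 1) - cell_mean(0, 0)   # common time trend
    return change_treated - change_control

y       = [10, 12, 13, 15, 20, 22, 28, 30]
treated = [0, 0, 0, 0, 1, 1, 1, 1]
post    = [0, 0, 1, 1, 0, 0, 1, 1]
print(did_estimate(y, treated, post))
```

In this toy example the control group changes by 3 and the treatment group by 8, so the estimate attributes 5 to the intervention.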

Note that the interpretation of the standard DID estimator depends on what is assumed about how the treatment effect varies across individuals. The effect is often assumed to be constant across individuals; more generally it may differ across individuals, in which case the standard DID estimator gives the average effect of the intervention on the treatment group.

Recently Imbens and Athey (2005) proposed an approach that differs from the standard DID method. They allow the effects of both time and the intervention to differ systematically across individuals (e.g., a new medical technology that differentially benefits sicker patients). The setting they consider is that of repeated cross-sections of individuals observed in a treatment group and a control group, before and after the treatment, although they also apply their model to panel data. They propose an estimator for the entire counterfactual distribution of treatment effects on the treatment group, as well as for the distribution of treatment effects on the control group, where the two distributions may differ from each other in arbitrary ways.

First, they propose a new model that relates outcomes to an individual's group, time and unobservable characteristics. Groups can differ in arbitrary ways (in particular, the treatment group might include individuals who experience a high benefit from the treatment). In the DID method the mean of individual outcomes in the absence of the treatment can vary by group and by time. In their model, in contrast, time periods and groups are treated asymmetrically.

Second, they provide conditions under which the model is nonparametrically identified and propose an estimation strategy based on the identification result. They use the entire outcome distributions of the control group, before and after the intervention, to estimate nonparametrically the change that occurred in that group. Assuming that, without the intervention, the outcome distribution of the treatment group would have undergone the same change, they estimate the counterfactual distribution for the treatment group in the second period. Comparing this counterfactual distribution with the actual post-intervention distribution of the treatment group yields an estimate of the distribution of treatment effects for the treated units. A similar strategy defines the treatment effect on the control units.

In other words, to figure out what would have happened to a treated unit with first-period outcome Y, they look at the units in the first-period control group with the same outcome Y. Under a weak monotonicity assumption, the distribution of the second-period outcomes of such units can be derived and used to obtain the counterfactual distribution of the second-period treated units in the absence of the policy intervention. In this way it is possible to address a range of economic questions raised by policy analysis, for example which part of the distribution benefits most from a policy intervention, always on the basis of a consistent economic model of the outcomes.

The proposed changes-in-changes (CIC) model has several advantages. It allows the distribution of unobservable characteristics to vary across groups in arbitrary ways. It allows the distribution of outcomes to change over time, in both mean and variance, even without a policy intervention. Moreover, it makes it possible to estimate the effects of a policy on the mean and variance of the treatment group's distribution relative to the original time trend in these moments. The standard DID model can be seen as a special case of the CIC model.

One common worry (Besley and Case, 2000) is that the effects identified by DID may not be correct if the policy occurred in a "field" that derives atypical benefits from the policy intervention. This implies that the treatment group may differ from the control group not just in the distribution of outcomes in the absence of the treatment, but also in the effects of the treatment. Athey and Imbens' model allows for both of these eventualities, because the effect of the treatment may vary with the unobservable characteristics of an individual and the distribution of the unobservables may vary across groups.
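The counterfactual construction described above can be sketched as follows, assuming continuous outcomes and using empirical distribution and quantile functions. Each first-period treated outcome is mapped to its rank in the first-period control distribution and then to the outcome with the same rank in the second-period control distribution. This is a simplified illustration of the idea, not the authors' full estimator; all names are hypothetical.

```python
import numpy as np

def ecdf(sample, y):
    """Empirical distribution function of `sample` evaluated at y."""
    s = np.sort(sample)
    return np.searchsorted(s, y, side="right") / s.size

def equantile(sample, q):
    """Empirical quantile function: smallest value whose ECDF is at least q."""
    s = np.sort(sample)
    idx = np.ceil(q * s.size).astype(int) - 1
    return s[np.clip(idx, 0, s.size - 1)]

def cic_att(y00, y01, y10, y11):
    """y_gt: outcomes of group g (0 control, 1 treated) in period t (0 pre, 1 post)."""
    counterfactual = equantile(y01, ecdf(y00, y10))   # treated outcomes absent the policy
    return y11.mean() - counterfactual.mean()

y00 = np.array([1.0, 2.0, 3.0, 4.0])       # control group, before
y01 = np.array([11.0, 12.0, 13.0, 14.0])   # control group, after
y10 = np.array([2.0, 3.0])                 # treatment group, before
y11 = np.array([15.0, 16.0])               # treatment group, after
print(cic_att(y00, y01, y10, y11))
```

Replacing the means in the last step by quantiles of the actual and counterfactual samples would give effects at different points of the distribution.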

Another model commonly used to relax the assumption of selection on observables (the unconfoundedness assumption) is the Heckman selection model (Heckman, 1974), which in its simplest form can be specified as follows:

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \varepsilon_{1i}$$

$$Z_i^* = \gamma_0 + \gamma_1 X_i + \varepsilon_{2i}$$

that is, the model includes a latent dependent variable. $Y_i$ is the outcome and $Z_i^*$ is the latent variable underlying the treatment indicator $Z_i$. $X$ is a $((N_1 + N_0) \times h)$ matrix, where $h$ is the number of time-constant characteristics of the i-th unit measured before the policy intervention. The error components $\varepsilon_1, \varepsilon_2$ are assumed to be jointly bivariate normal conditional on $X$, with zero mean vector and variance matrix $\Sigma$, so that:

$$(\varepsilon_1, \varepsilon_2) \mid X \sim N(0, \Sigma) \quad \text{with} \quad \operatorname{corr}(\varepsilon_1, \varepsilon_2) = \rho$$

The bivariate normality assumption on the errors can be relaxed in the following cases: by maintaining the monotonicity assumption when an instrumental variable is available (semiparametric Heckman selection models; Deaton, 1989; Hausman and Newey, 1995) or, if no instrumental variable is available, by introducing nonparametric bounds (Lee, 2005). Most recent studies, however, aim to develop semiparametric versions of selection models (Newey and Vella, 2003) while keeping some of the previous assumptions: Powell (1987), Newey (1988), Ahn and Powell (1993) and Honoré and Powell (2001).
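The role of the correlation ρ can be illustrated with a small simulation of the two equations above. The sketch assumes the usual convention that Z = 1 when the latent variable Z* is positive, unit error variances, and hypothetical parameter values. To stay short it plugs the true selection coefficients into the correction term, whereas in practice they would be estimated by a first-step probit.

```python
import math
import numpy as np

def normal_pdf(v):
    return np.exp(-0.5 * v ** 2) / math.sqrt(2.0 * math.pi)

def normal_cdf(v):
    return 0.5 * (1.0 + np.vectorize(math.erf)(v / math.sqrt(2.0)))

def ols(columns, y):
    """Least-squares coefficients with an intercept in first position."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n, rho = 50_000, 0.7
beta0, beta1, beta2 = 1.0, 0.5, 2.0     # outcome equation (hypothetical values)
gamma0, gamma1 = 0.3, 1.0               # selection equation (hypothetical values)

x = rng.normal(size=n)
eps = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
z = (gamma0 + gamma1 * x + eps[:, 1] > 0).astype(float)   # Z = 1 if Z* > 0
y = beta0 + beta1 * x + beta2 * z + eps[:, 0]

# Naive OLS ignores corr(eps1, eps2) and is biased for beta2 when rho != 0.
naive = ols([x, z], y)[2]

# Control-function term: E[eps1 | X, Z] = rho * lam under bivariate normality.
index = gamma0 + gamma1 * x
lam = np.where(z == 1,
               normal_pdf(index) / normal_cdf(index),
               -normal_pdf(index) / (1.0 - normal_cdf(index)))
corrected = ols([x, z, lam], y)[2]
print(naive, corrected)
```

With a positive ρ, units with favourable unobservables are more likely to be treated, so the naive coefficient on Z overstates the effect, while adding the correction term brings the estimate back towards the value used in the simulation.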

2 Multivalued treatment

2.1 Introduction

The Rubin causal model is usually presented for binary treatments, although in many cases of interest the treatment takes more than two values. There are many examples: a drug administered in different doses, a treatment applied over different time periods, or labour market programmes that require a more complex framework reflecting the actual choice set of individuals, which typically contains more than two options. In all these cases the standard propensity score methodology must be modified in a non-trivial way. Methods have therefore been developed to extend the conventional two-treatment framework to the estimation of average causal effects with multiple mutually exclusive treatments. Imbens (1999) and Lechner (2000) made the major methodological contributions in this respect. They refine identification using strong and weak unconfoundedness assumptions for the case of more than two treatments. In the following sections we present and compare both approaches.

2.2 The basic framework

In order to extend the propensity score from binary treatments to arbitrary treatment regimes, we first restate the basic assumptions of the binary case that can usefully be generalized to multiple treatments. Let us summarize the conventional Rubin Causal Model. We have a binary treatment, that is $Z_i \in \{0, 1\}$ (see the assumptions in the previous section on potential outcomes). Associated with each unit i = 1, 2, …, N and each value z of the treatment there is a potential outcome $Y_i(z)$. We are interested in the average outcome $E[Y(z)]$, and particularly in the average causal effect of exposing units to the treatment rather than not: $E[Y_i(1) - Y_i(0)]$. A key assumption for the identification of causal effects, which we now restate in its strong and weak forms, is the unconfoundedness assumption.

Assumption 2.1 Strong unconfoundedness assumption

Assignment to treatment Z is strongly ignorable, given pre-treatment variables X, if

$$\{Y(z)\}_{z \in \{0,1\}} \perp Z \mid X$$

In order to state the weaker version of unconfoundedness, we define $D_i(z)$ to be the indicator, for unit i, of receiving treatment z:

$$D_i(z) = \begin{cases} 1 & \text{if } Z_i = z \\ 0 & \text{otherwise} \end{cases}$$

The weak unconfoundedness assumption is then defined as follows.

Assumption 2.2 Weak unconfoundedness assumption

Assignment to treatment Z is weakly ignorable, given pre-treatment variables X, if

$$Y(z) \perp D(z) \mid X \quad \text{for all } z \in \{0, 1\}.$$

Strong unconfoundedness, as formulated by Rosenbaum and Rubin, requires the treatment Z to be independent of the entire set of potential outcomes, while weak unconfoundedness requires only pairwise independence of the treatment indicator with each of the potential outcomes. Moreover, weak unconfoundedness requires only a local independence of the potential outcome Y(z) with respect to the treatment level considered: independence of the level indicator D(z), rather than of the treatment variable Z itself. In the binary treatment case the two conditions coincide, since D(1) = Z and D(0) = 1 - Z.

The relative importance of the two versions of ignorability depends on what we are interested in estimating. In particular, the weak unconfoundedness concept is linked to the missing data problem of causal inference. More often the concern is, in fact, with the average of $Y_i(z)$ in the sub-sample with $D_i(z) = 1$. Units with $D_i(z) = 0$ did not receive treatment level z, and the other potential outcomes (such as $Y_i(0)$ when z = 1) are never observed for the units with $D_i(z) = 1$, so they can play no role in any adjustment procedure based on defining subpopulations. This lack of relevance is reflected in the weak ignorability assumption. In addition we report Lemma 2.1 and Lemma 2.2 below. Letting $e(X)$ be the propensity score in the binary treatment case, we have:

Lemma 2.1 Balancing property of pre-treatment variables given the propensity score (Rosenbaum and Rubin, 1983)

$$Z \perp X \mid e(X)$$

Lemma 2.2 Weak unconfoundedness given the propensity score with binary treatments (Rosenbaum and Rubin, 1983)

$$Y(z) \perp D(z) \mid e(X) \quad \text{for all } z \in \{0, 1\}$$

According to this result, it is sufficient to condition on the propensity score instead of the entire set of covariates (Imbens, 1999). We also report the following theorem, which will be useful in section 2.4 to introduce the estimation of average treatment effects in the multi-valued treatment case.

Theorem 2.1 Adjustment for the propensity score given the weak unconfoundedness assumption

i) $\mu(z, e) \equiv E[Y(z) \mid e(X) = e] = E[Y(z) \mid Z = z, e(X) = e]$

ii) $E[Y(z)] = E[\mu(z, e(X))]$

for all $z \in \{0, 1\}$.
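The steps behind Theorem 2.1 can be spelled out as follows. The derivation is a sketch that relies on Lemma 2.2 and on the usual observation rule Y = Y(Z), which is taken here as an assumption.

```latex
\begin{align*}
\mu(z, e) &= E[\, Y(z) \mid e(X) = e \,] \\
          &= E[\, Y(z) \mid D(z) = 1,\ e(X) = e \,] && \text{(Lemma 2.2)} \\
          &= E[\, Y(z) \mid Z = z,\ e(X) = e \,]    && (D(z) = 1 \iff Z = z) \\
          &= E[\, Y \mid Z = z,\ e(X) = e \,]       && (Y = Y(Z)) \\[4pt]
E[Y(z)]   &= E\big[\, E[\, Y(z) \mid e(X) \,] \,\big] = E[\, \mu(z, e(X)) \,] && \text{(iterated expectations)}
\end{align*}
```

The last line of part i) involves only observable quantities, which is what makes the adjustment usable for estimation.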

2.3 Multiple treatment

From now on, we allow the treatment variable to take integer values between 0 and k. Let T be the treatment variable in the multi-valued case, taking values in $\mathcal{T} = \{0, 1, \dots, k\}$, and let $X_i$ be the vector of covariates, with $X \in \chi$. Each individual i = 1, 2, …, N is assumed to be assigned to one specific treatment. We are interested in population average treatment effects and, particularly, in the average causal effect of exposing units to treatment t rather than to treatment s:

$$ATE_{ts} = E[Y(t) - Y(s)]$$

which denotes the ATE of treatment t relative to treatment s for a participant drawn randomly from the population. The average effect of treatment t relative to treatment s for the sub-population that received treatment level t is defined as:

$$ATT_{ts} = E[Y(t) - Y(s) \mid T = t]$$

Imbens and Lechner refer to different versions of the unconfoundedness assumption, according to the type of treatment effect to be identified and estimated. The following ignorability assumptions can be introduced:

Assumption 2.3 Weak unconfoundedness given pre-treatment variables X (version 1) (Imbens, 1999)

$$Y(t) \perp D(t) \mid X \quad \forall\, t \in \mathcal{T} = \{0, 1, \dots, k\}$$

Assumption 2.4 Weak ignorability assumption (version 2)

$$Y(s) \perp \{D(t), D(s)\} \mid X$$

Assumption 2.5 Strong ignorability assumption (Lechner, 2000)

$$\{Y(t), Y(s)\} \perp T \mid X = x$$

Assumption 2.6 Weak ignorability assumption (Lechner, 2000)

$$Y(s) \perp T \mid X = x,\ T \in \{s, t\}$$

In summary, the average treatment effects identified under each of the previous assumptions are:

$$ATE_{ts} = E[Y(t) - Y(s)]$$

under Assumption 2.3 and Assumption 2.5, and

$$ATT_{ts} = E[Y(t) - Y(s) \mid D(t) = 1]$$

under Assumption 2.4 and Assumption 2.6.

Again, many background characteristics need to be controlled for when estimating the average causal effect, and adjusting the estimation for all these covariates can be unfeasible. In this sense, a propensity score generalized to arbitrary treatment regimes is very useful, since it summarizes the information on the background characteristics in a single score. We therefore need to modify the standard definition of the propensity score to obtain a generalized propensity score (Imbens, 1999):

Definition 2.1 Generalized propensity score

The generalized propensity score (GPS) is the conditional probability of receiving a particular level of the treatment given the pre-treatment variables:

$$r(t, x) \equiv \Pr(T = t \mid X = x) = E[D(t) \mid X = x]$$

According to this notation, the propensity score in the binary treatment case is equivalent to:

$$e(x) = r(1, x)$$
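Definition 2.1 can be illustrated with a multinomial logit, one possible model for a treatment with three levels. In the sketch below the coefficients are hypothetical and taken as given rather than estimated; the point is the distinction between the whole family r(t, X), one column per treatment level, and the single score r(T, X) evaluated at the treatment actually received.

```python
import numpy as np

def generalized_propensity_scores(x, coefs):
    """N x (k+1) matrix with entries r(t, x_i) under a multinomial logit.

    coefs has one row per treatment level: (intercept, slopes...)."""
    x = np.asarray(x, dtype=float)
    design = np.column_stack([np.ones(x.shape[0]), x])
    scores = design @ coefs.T
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(scores)
    return probs / probs.sum(axis=1, keepdims=True)

coefs = np.array([[0.0, 0.0],      # level 0 (reference), hypothetical values
                  [0.5, 1.0],      # level 1
                  [-0.5, 2.0]])    # level 2
x = np.array([[-1.0], [0.0], [1.0]])    # one covariate, three units
t_obs = np.array([0, 2, 1])             # treatment actually received

r_family = generalized_propensity_scores(x, coefs)        # r(t, X_i) for every t
r_received = r_family[np.arange(len(t_obs)), t_obs]       # r(T_i, X_i)
print(r_family)
print(r_received)
```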

Hence, i) the GPS defines a single random variable as a transformation of the two random variables T and X, namely r(T, X); ii) it defines a family of random variables indexed by t as a transformation of X alone, namely r(t, X) for all $t \in \mathcal{T}$, where $\mathcal{T} = \{0, 1, \dots, k\}$ is the set of treatment levels. The GPS also satisfies the balancing property, like the conventional propensity score:

Lemma 2.3 Balancing property of the generalized propensity score

$$D(t) \perp X \mid r(t, X) \quad \text{for all } t \in \mathcal{T}.$$

Proof (Imbens, 1999). First we have

$$\Pr[D(t) = 1 \mid X, r(t, X)] = E[D(t) \mid X] = r(t, X)$$

since by definition $r(t, X) = E[D(t) \mid X]$. Second,

$$\Pr[D(t) = 1 \mid r(t, X)] = E[D(t) \mid r(t, X)] = E\big[ E[D(t) \mid X, r(t, X)] \mid r(t, X) \big] = E[r(t, X) \mid r(t, X)] = r(t, X)$$

Hence

$$\Pr[D(t) = 1 \mid X, r(t, X)] = \Pr[D(t) = 1 \mid r(t, X)],$$

that is, conditionally on r(t, X), the treatment indicator D(t) and the pre-treatment variables are independent.

It is important to note that the conditioning argument changes with the level of the treatment. As a result, to guarantee conditional independence of the multi-valued treatment T and the covariates X, we need to condition on the entire set $\{r(t, X)\}_{t \in \mathcal{T}}$. It is only in the binary treatment case that conditioning on this set is identical to conditioning on a single score e(X).

All the previous unconfoundedness assumptions can therefore be restated in terms of the generalized propensity score. In fact, if the strong or weak ignorability assumptions hold given the covariates, then:

Theorem 2.2 Weak unconfoundedness given the GPS (Imbens, 1999)

Suppose assignment to treatment T is weakly unconfounded given pre-treatment variables X (version 1). Then

$$Y(t) \perp D(t) \mid r(t, X) \quad \forall\, t \in \mathcal{T}$$

Proof

$$\Pr[D(t) = 1 \mid Y(t), r(t, X)] = E[D(t) \mid Y(t), r(t, X)] = E\big[ E[D(t) \mid Y(t), X, r(t, X)] \mid Y(t), r(t, X) \big] = E[r(t, X) \mid Y(t), r(t, X)] = r(t, X)$$

where the third equality uses weak unconfoundedness given X. Moreover, as shown in the proof of Lemma 2.3, $\Pr[D(t) = 1 \mid r(t, X)] = r(t, X)$. Hence

$$\Pr[D(t) = 1 \mid Y(t), r(t, X)] = \Pr[D(t) = 1 \mid r(t, X)],$$

so, conditionally on r(t, X), D(t) and Y(t) are independent.

Assumption 2.7 Weak unconfoundedness given the GPS (version 2)

$$Y(s) \perp \{D(t), D(s)\} \mid r(t, X), r(s, X)$$

Following Lechner's approach, the strong and weak ignorability assumptions stated above given the pre-treatment variables (Assumptions 2.5 and 2.6) can be rewritten in terms of the GPS as follows:

Assumption 2.8 Strong unconfoundedness given the GPS (Lechner, 2000)

If

$$\{Y(t), Y(s)\} \perp T \mid X = x$$

and $0 < \Pr(T = j \mid X = x) < 1$ hold for all $x \in \chi$ and for all $j = 0, 1, \dots, k$, it follows that

$$\{Y(t), Y(s)\} \perp T \mid \big[ \Pr(T = t \mid X) = \Pr(T = t \mid X = x),\ \Pr(T = s \mid X) = \Pr(T = s \mid X = x),\ \dots,\ \Pr(T = k \mid X) = \Pr(T = k \mid X = x) \big]$$

Assumption 2.9 Weak unconfoundedness given the GPS (Lechner, 2000)

If

$$Y(s) \perp T \mid X = x,\ T \in \{s, t\}$$

and $0 < \Pr(T = j \mid X = x) < 1$ hold for all $x \in \chi$ and for $j = t, s$, it follows that

$$Y(s) \perp T \mid \big[ \Pr\nolimits^{s \mid t,s}(X) = \Pr\nolimits^{s \mid t,s}(x),\ T \in \{t, s\} \big]$$

where

$$\Pr\nolimits^{s \mid t,s}(x) \equiv \Pr(T = s \mid T \in \{t, s\}, X = x) = \frac{\Pr(T = s \mid X = x)}{\Pr(T = s \mid X = x) + \Pr(T = t \mid X = x)}.$$

2.4 Implementation of the GPS in multi-valued treatments

Since the GPS has properties analogous to those of the propensity score used with binary treatments, we now use it in place of the covariates to estimate $ATE_{ts}$ and $ATT_{ts}$. In the binary treatment case the propensity score is usually estimated with a logistic regression. In the multi-valued case multinomial logit or nested logit models can be applied (the latter with ordered treatment levels, for example the dose of a drug or the time over which a treatment is applied). Given the generalized propensity score, average outcomes can be estimated by conditioning solely on the GPS. According to Theorem 2.2, and imposing smoothness of the expectation function where appropriate, the conditional expectation of the outcome given the treatment t and the probability of receiving the treatment actually received can be estimated (Imbens, 1999) by applying the following theorem:

Theorem 2.3 Estimation of average potential outcomes given the generalized propensity score

Suppose assignment to treatment is weakly unconfounded given the pre-treatment variables. Then

i) $\beta(t, r) \equiv E[Y(t) \mid r(t, X) = r] = E[Y \mid T = t, r(T, X) = r]$

ii) $E[Y(t)] = E[\beta(t, r(t, X))]$

for all treatment levels t.

Proof. We prove part i), since part ii) follows by iterated expectations:

$$E[Y \mid T = t, r(T, X) = r] = E[Y(t) \mid T = t, r(T, X) = r] = E[Y(t) \mid D(t) = 1, r(t, X) = r]$$

which by weak unconfoundedness given the GPS (Theorem 2.2) is equal to $E[Y(t) \mid r(t, X) = r]$.
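The two parts of Theorem 2.3 translate into a two-step computation, sketched below for a GPS that takes only a few distinct values, so that β(t, r) can be estimated by a simple cell mean. With a continuous score a smoother would be needed instead. The GPS matrix is taken as known, and all names and data are hypothetical.

```python
import numpy as np

def average_potential_outcome(y, t_obs, gps, t, decimals=8):
    """Estimate E[Y(t)] as the sample average of beta(t, r(t, X_i)).

    gps[i, s] holds r(s, x_i). beta(t, r) is the mean outcome of units that
    received t and whose score r(t, X) equals r (discrete scores assumed)."""
    y = np.asarray(y, dtype=float)
    t_obs = np.asarray(t_obs)
    score_t = np.round(np.asarray(gps, dtype=float)[:, t], decimals)  # r(t, X_i), all units
    received_t = t_obs == t
    # Step 1: beta(t, r) from the units with T = t.
    beta = {r: y[received_t & (score_t == r)].mean()
            for r in np.unique(score_t[received_t])}
    # Step 2: average beta(t, r(t, X_i)) over ALL units. Overlap is required:
    # a score value never seen among units with T = t raises a KeyError.
    return np.mean([beta[r] for r in score_t])

x = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
t_obs = np.array([0, 0, 1, 2, 0, 1, 1, 2, 2])
y = t_obs + 2 * x                                   # toy outcomes: Y(t) = t + 2X
gps = np.where(x[:, None] == 0, [0.5, 0.25, 0.25], [0.2, 0.4, 0.4])

print(average_potential_outcome(y, t_obs, gps, t=1))   # adjusted estimate of E[Y(1)]
print(y[t_obs == 1].mean())                            # unadjusted mean among T = 1
```

In this toy data set the adjusted estimate equals 1 + 2·mean(X) = 19/9, whereas the unadjusted mean among units with T = 1 is 7/3, because units with X = 1 are over-represented among them.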

Note that to obtain the population average (which, as we will show in the continuous case, is the quantity with a causal interpretation) we need to apply iterated expectations to $\beta(t, r)$, i.e. $E[Y(t)] = E[\beta(t, r(t, X))]$. Consider the subpopulations obtained by stratifying the population on the GPS. The average outcome of the units with treatment t and r(T, X) = r is an unbiased estimate of the average of Y(t) in the subpopulation with r(t, X) = r, because for units with T = t the condition r(T, X) = r coincides with r(t, X) = r. Similarly, the average of Y(s) for the units with T = s and r(T, X) = r is unbiased for the average of Y(s) in the subpopulation with r(s, X) = r, which is in general a different subpopulation from the one with r(t, X) = r. Hence no causal comparison is possible within the subpopulation defined by r(T, X) = r, and the regression of the observed Y on the treatment level T and the GPS r(T, X) has no causal interpretation. Formally, consider the difference

$$\beta(t, r) - \beta(s, r) = E[Y(t) \mid T = t, r(T, X) = r] - E[Y(s) \mid T = s, r(T, X) = r]$$

By the weak ignorability assumption (version 1) this is equal to

$$E[Y(t) \mid r(t, X) = r] - E[Y(s) \mid r(s, X) = r]$$

but the comparison conditional on the value of the GPS has no causal interpretation, because the conditioning sets differ:

$$\{x \mid r(t, x) = r\} \neq \{x \mid r(s, x) = r\}$$

In order to obtain a causal interpretation, we need to condition on the intersection of the two conditioning sets, that is:

$$E[Y(t) \mid T = t, r(t, X), r(s, X)] - E[Y(s) \mid T = s, r(t, X), r(s, X)] = E[Y(t) - Y(s) \mid r(t, X), r(s, X)]$$

If, however, the researcher is interested in the dose-response of a specific sub-population, or in the average effect of one specific treatment versus another, the average should be computed over the distribution of the pre-treatment variables in that particular sub-population. For example, we can estimate the expected (average) effect of treatment t relative to treatment s for the sub-population that received treatment level t. In particular, under weak unconfoundedness (version 2), the $ATT_{ts}$ is equal to:

$$ATT_{ts} = E[Y(t) - Y(s) \mid D(t) = 1] = E[Y(t) \mid D(t) = 1] - E[Y(s) \mid D(t) = 1] = E_{X \mid D(t) = 1}\big[ E[Y(t) \mid D(t) = 1, X] \big] - E_{X \mid D(t) = 1}\big[ E[Y(s) \mid D(t) = 1, X] \big]$$

which by weak unconfoundedness is equal to:

$$E_{X \mid D(t) = 1}\big[ E[Y(t) \mid D(t) = 1, X] \big] - E_{X \mid D(t) = 1}\big[ E[Y(s) \mid D(s) = 1, X] \big]$$

and, given the generalized propensity score, it can be rewritten as:

$$E_{r(t, X) \mid D(t) = 1}\big[ E[Y(t) \mid r(t, X) = r] \big] - E_{r(s, X) \mid D(t) = 1}\big[ E[Y(s) \mid r(s, X) = r] \big]$$

where the outer expectation is over the units that received treatment level t.

Since the treatment can take more than two values, it is important to make sure that there is sufficient overlap in the distribution of the pre-treatment variables across the treatments of interest. The procedure is to compare, for each value of t, the univariate distribution of r(t, X) conditional on T = t with its distribution conditional on T ≠ t. If the two distributions are similar, all adjustment methods can be expected to perform well.

Of course, other procedures can be applied to estimate average treatment effects. For example, we can use matching techniques, assuming that each individual is assigned to one specific treatment and that, for any participant, only one component of $\{Y(t)\}_{t \in \mathcal{T}}$ can be observed, while the remaining outcomes are counterfactuals. We introduce a pairwise comparison of treatments t and s according to the following equations (Lechner, 2000):

$$ATE_{ts} = E[Y(t) - Y(s)] = E[Y(t)] - E[Y(s)]$$

which denotes the ATE of treatment t relative to treatment s for a participant drawn randomly from the population. Note that $ATE_{ts}$ can be rewritten as:

$$ATE_{ts} = E[Y(t) - Y(s)] = \sum_{j=0}^{k} \big[ E(Y(t) \mid T = j) - E(Y(s) \mid T = j) \big] P(T = j)$$

The strong unconfoundedness condition identifies all the counterfactuals $E(Y(t) \mid T = j)$ and $E(Y(s) \mid T = j)$, because it implies

$$E(Y(t) \mid X = x, T = j) = E(Y(t) \mid X = x, T = t) \quad \text{and} \quad E(Y(s) \mid X = x, T = j) = E(Y(s) \mid X = x, T = s) \quad \forall\, j = 0, 1, \dots, k.$$

As a result, $ATE_{st}$, $ATE_{ts}$, $ATT_{ts}$ and $ATT_{st}$ are identified. The expected effect for an individual randomly drawn from the population of participants in treatment t is instead equal to:

$$ATT_{ts} = E[Y(t) - Y(s) \mid T = t] = E(Y(t) \mid T = t) - E(Y(s) \mid T = t)$$

If the participants in treatments t and s differ in a non-random way, this can influence the outcome values, so that $ATT_{ts} \neq ATT_{st}$: the two effects are not symmetric.

The weak unconfoundedness condition identifies only the counterfactual $E(Y(s) \mid T = t)$, which is what is needed to compute the $ATT_{ts}$. Note that this last assumption follows from the fact that independence of outcomes and assignment in the population implies independence in any subpopulation defined by treatment participation categories. However, a stronger ignorability assumption on treatment assignment (with respect to Assumption 2.5) can also be adopted for arbitrary treatment regimes, in order to model T without conditioning on the potential outcomes (Van Dyk and Imai, 2003). We postpone to the continuous treatment case the discussion of the generalization of the propensity score under the strong ignorability assumption, where we also compare the approach of Van Dyk and Imai (2003) with the treatment of the propensity score method for the estimation of treatment effects in Hirano and Imbens (2004).

3 Continuous treatment

3.1 Introduction

We showed how, under specific assumptions such as strongly ignorable treatment assignment, multivariate adjustment methods based on the propensity score reduce the bias that arises in observational studies. In this project we implement an extension of the propensity score method to a setting with a continuous treatment, that is, we refer to the generalized propensity score already introduced for the multiple treatment case. We make an unconfoundedness assumption (Rosenbaum and Rubin, 1983) and adjust for the generalized propensity score (a function of the covariates) to remove all the bias associated with differences in the covariates. The generalized propensity score is a generalization of the binary treatment propensity score (Hirano and Imbens, 2004; Van Dyk and Imai, 2003) and shares many of its characteristics, including the balancing property, which is essential for assessing whether the score is correctly specified.

We estimate and make inference on the causal effects of interest in a parametric way (although a nonparametric version is possible). We apply this methodology to the public contributions (the treatment variable) granted to enterprises in Piedmont during the years 2001-2003. Because of the variety of funds set up by public policies, the treatment turns out to be a continuous variable. We are interested in the effect of the amount of the contribution on the level of employment. We estimate the average effect of the contribution, adjusting for differences in background characteristics with the propensity score methodology, and compare the results with those of conventional regression-based methods. According to the empirical evidence (Dehejia and Wahba, 1999; Imai, 2004), the former methodology often leads to more robust results than the latter, or than other estimation methods such as the DID or the selection model presented in section 1.7.

3.2 Framework

We consider a sample of units i = 1, 2, …, N and, for each unit, a set of potential outcomes $Y_i(t)$ for $t \in \tau$. In the binary treatment case $\tau = \{0, 1\}$, while in the continuous case $\tau \subset [t_0, t_1]$. We are interested in the average dose-response function $\mu(t) = E[Y_i(t)]$. For each unit we observe the vector of covariates $X_i$, the level of the treatment received $T_i$ and the corresponding outcome $Y_i = Y_i(T_i)$. We assume that $\{Y(t)\}_{t \in \tau}$, T and X are defined on a common probability space, that T is continuously distributed with respect to Lebesgue measure on τ, and that Y = Y(T) is a well-defined random variable, i.e. that Y(·) is suitably measurable. We are interested in estimating average causal effects, which can be computed from the dose-response function µ(t), and in particular in the ATE and the ATT:

$$ATE_{\Delta t, t} = E[Y(t + \Delta t) - Y(t)]$$

$$ATT_{\Delta t, t} = E[Y(t + \Delta t) - Y(t) \mid T = t]$$

That is, in the continuous treatment case we may be interested in marginal treatment effects, for example with respect to a specific treatment level t. Hirano and Imbens (2004) generalize to the continuous case the unconfoundedness assumption available for binary treatments (Rosenbaum and Rubin, 1983), which is crucial for the estimation of the above quantities.

Assumption 3.1 Strong ignorability of treatment assignment (Van Dyk and Imai, 2003)

$$\{Y(t)\}_{t \in \tau} \perp T \mid X$$

Assumption 3.2 Weak unconfoundedness assumption (Hirano and Imbens, 2004)

$$Y(t) \perp T \mid X \quad \text{for all } t \in \tau.$$

Assumption 3.2 requires conditional independence for each value of the treatment $t \in [t_0, t_1]$, not joint independence of all potential outcomes $\{Y(t)\}_{t \in \tau}$. As already underlined, many characteristics need to be controlled for when estimating average treatment effects. The generalized propensity score reduces the entire collection of background characteristics to a single "composite" variable that appropriately summarizes them. The GPS is defined as follows.

Definition 3.1 Generalized propensity score

Let r(t, x) be the conditional density of the treatment given the covariates,

$$r(t, x) = f_{T \mid X}(t \mid x)$$

such that $R = r(T, X)$ and $r(t, X)$, for every $t \in \tau$, are well-defined random variables.

The conditional density $f_{T \mid X}(t \mid x)$ must be modeled and its unknown parameters estimated, for example by maximum likelihood. One possibility is a normal distribution for the treatment given the covariates, $T_i \mid X_i \sim N(h(X_i; \beta), \sigma^2)$, where $h(X_i; \beta)$ is a known function of the covariates that depends on the parameter vector β to be estimated and $\sigma^2$ is the unknown common variance of the errors. Misspecification of the model for the propensity score is possible and generally leads to biased causal inference. Hence, care must be taken to identify as many covariates as possible, as well as to check for model misspecification (Drake, 1993).

The generalized propensity score can also be defined through a propensity function,

$$f_{T \mid X, \psi}(t \mid x) = r_\psi(t, x)$$

whose distribution is assumed to be parameterized by ψ (Van Dyk and Imai, 2003). Within this analytical framework it is possible to derive theoretical results that extend those in Rosenbaum and Rubin (1983b). Van Dyk and Imai (2003) show that the propensity score is a balancing score even with a non-binary treatment, so that it can be applied to arbitrary treatment regimes, while also reducing the dimensionality of X enough to allow for the application of efficient estimation techniques. Formally we have:

Lemma 3.1 Balancing of pre-treatment variables given the generalized propensity score (Van Dyk and Imai, 2003)

Within strata with the same value of r(t, X), the probability that T = t does not depend on the value of X:

$$X \perp 1\{T = t\} \mid r(t, X)$$

This result does not require unconfoundedness. The following theorem establishes that the potential outcomes and the treatment assignment are conditionally independent given the generalized propensity score.

Theorem 3.1 Strong unconfoundedness given the generalized propensity score (Van Dyk and Imai, 2003)

$$f(\{Y(t)\}_{t \in \tau} \mid T, r(\cdot, X)) = f(\{Y(t)\}_{t \in \tau} \mid r(\cdot, X))$$

Proof: see Van Dyk and Imai (2003).

Theorem 3.2 Weak unconfoundedness given the generalized propensity score (Van Dyk and Imai, 2003)

If assignment to the treatment is (weakly) unconfounded given the pre-treatment variables X, then

$$f_T(t \mid r(t, X), Y(t)) = f_T(t \mid r(t, X))$$

for each value of t.

Proof: see Van Dyk and Imai (2003).

In other words, if the balancing hypothesis of Lemma 3.1 is satisfied, observations with the same GPS must have the same distribution of observable characteristics, independently of the value of the treatment. So, just as for the standard propensity score, exposure to treatment is random and treated and control units should be on average identical. Hence, since the generalized propensity score has properties equivalent to those of the propensity score for a binary treatment, it can be used, instead of the covariates, as a one-dimensional score summarizing the information on the background characteristics, leading to more efficient estimation of average treatment effects. The difference here is that the conditional density of the treatment at level t corresponds to the generalized propensity score evaluated at the same t: there are as many propensity scores as treatment levels, and only one of them is used at a time. In particular, using the GPS in connection with smoothing techniques we have:

Theorem 3.3 Bias removal with the Generalized Propensity Score (Imbens and Hirano, 2004). Suppose that assignment to the treatment is weakly unconfounded given the pre-treatment variables X, as in Theorem 3.2. Then:

(i) $\beta(t,r) = E[Y(t) \mid r(t,X) = r] = E[Y \mid T = t, R = r]$

(ii) $\mu(t) = E[\beta(t, r(t,X))] = E[E[Y(t) \mid r(t,X)]] = E[Y(t)]$ (by iterated expectations)

Theorem 3.3 implies that the dose-response function $\mu(t)$ can be estimated in two steps. First, the conditional expectation of the outcome, $E[Y \mid T = t, R = r]$, is estimated as a function of the treatment level T = t and of the value of the GPS R = r. Second, the dose-response function, $\mu(t) = E[\beta(t, r(t,X))]$, is estimated by averaging this conditional expectation over the score $r(t,X)$, evaluated at the treatment level t of interest. As already underlined, it should be clear that $\beta(t,r)$ does not have a causal


interpretation. In fact, we need to average the conditional expectation over the marginal distribution of $r(t,X)$, $E[E[Y(t) \mid r(t,X)]]$, to estimate the causal effect.

Proof of Theorem 3.3 (Imbens and Hirano, 2004): Let $f_{Y(t) \mid T, r(t,X)}(y \mid t, r)$ denote the conditional density of Y(t) given T = t and r(t,X) = r. Then, applying Bayes' rule and Theorem 3.2, we get:

$$f_{Y(t) \mid T, r(t,X)}(y \mid t, r) = \frac{f_T(t \mid Y(t) = y, r(t,X) = r)\, f_{Y(t) \mid r(t,X)}(y \mid r)}{f_T(t \mid r(t,X) = r)} = f_{Y(t) \mid r(t,X)}(y \mid r)$$

So we can write

$$E[Y(t) \mid T = t, r(t,X) = r] = E[Y(t) \mid r(t,X) = r]$$

but we also have

$$E[Y \mid T = t, R = r] = E[Y(t) \mid T = t, r(T,X) = r] = E[Y(t) \mid T = t, r(t,X) = r] = E[Y(t) \mid r(t,X) = r] = \beta(t,r)$$

Hence $E[Y_i(t) \mid r(t,X_i) = r] = \beta(t,r)$, which is part (i) of Theorem 3.3. Then we have

$$E[\beta(t, r(t,X))] = E[E[Y(t) \mid r(t,X)]] = E[Y(t)] = \mu(t).$$


Moreover, if we are interested in estimating the marginal treatment effect at treatment level t, we can write:

$$ATE_{\Delta t, t} = E[Y(t + \Delta t) - Y(t)] = E[Y(t + \Delta t)] - E[Y(t)]$$

which denotes the ATE of treatment level $t + \Delta t$ relative to treatment level t for a participant drawn randomly from the population N. Another quantity of primary interest is the treatment effect in a specific sub-population, the ATT:

$$ATT_{\Delta t, t} = E[Y(t + \Delta t) - Y(t) \mid T = t] = E[Y(t + \Delta t) \mid T = t] - E[Y(t) \mid T = t] = E_{X \mid T = t}\big[E[Y(t + \Delta t) \mid T = t, X]\big] - E_{X \mid T = t}\big[E[Y(t) \mid T = t, X]\big]$$

which, by weak unconfoundedness, is equal to:

$$E_{r(t + \Delta t, X) \mid T = t}\big[E[Y(t + \Delta t) \mid r(t + \Delta t, X)]\big] - E_{r(t, X) \mid T = t}\big[E[Y(t) \mid r(t, X)]\big]$$

The $ATT_{\Delta t, t}$ denotes the expected effect for an individual randomly drawn from the population of participants having treatment level t, while $r(t,X)$ is measurable with respect to the sigma-algebra generated by X. The procedure of Imbens and Hirano (2004) for estimating the dose-response function, which relies on the previous assumptions and theorems, is based on regression on the propensity score. We will apply an extension of it, since it is a valid strategy for empirical studies. One way of using the propensity score is to estimate the conditional expectation of Y given T and $r(t,X)$. First the GPS is estimated through the conditional


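distribution of the treatment variable given the covariates, assuming a specific functional form, for example a normal linear model:²

$$T_i \mid X_i \sim N(h(X_i;\beta), \sigma^2)$$

or

$$\log T_i \mid X_i \sim N(h(X_i;\beta), \sigma^2)$$

with the estimated GPS equal to:

$$\widehat{gps}_i = \phi(T_i; X_i)$$

where $\phi$ is understood here as the fitted normal density of the treatment given the covariates, evaluated at the observed treatment $T_i$.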
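The estimation step just described can be illustrated with a minimal sketch, assuming a linear mean $h(X_i;\beta) = \beta_0 + X_i'\beta_1$ and a common variance. The function name `estimate_gps`, the simulated data, and the coefficient values are hypothetical and serve only as an illustration; least squares is used because it coincides with the Gaussian maximum likelihood estimate of the mean parameters.

```python
import numpy as np

def estimate_gps(T, X):
    """Fit T | X ~ N(b0 + X b1, sigma^2) and return the fitted pieces.

    Least squares gives the Gaussian ML estimate of the mean parameters;
    sigma^2 is estimated by the mean squared residual (the ML estimate).
    """
    Z = np.column_stack([np.ones(len(T)), X])       # design matrix with intercept
    beta, *_ = np.linalg.lstsq(Z, T, rcond=None)
    sigma2 = np.mean((T - Z @ beta) ** 2)

    def r_hat(t, X_new):
        """Estimated GPS r(t, x): normal density of treatment level t given x."""
        mean = np.column_stack([np.ones(len(X_new)), X_new]) @ beta
        return np.exp(-(t - mean) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

    return beta, sigma2, r_hat

# Simulated data (assumed values, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
T = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(scale=0.8, size=500)

beta, sigma2, r_hat = estimate_gps(T, X)
gps = r_hat(T, X)          # GPS evaluated at each unit's observed treatment
```

Calling `r_hat(t, X)` with a scalar `t` returns the score of every unit at that common treatment level, which is the quantity needed later when averaging over units.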

To verify whether this specification is suitable, we investigate how it affects the balance of the covariates. We first divide the range of the treatment into an arbitrary number of intervals; we then define blocks of the GPS, $r(t, X_i)$, computed at a given treatment level. Next, within each GPS block, we examine the balance of each covariate by testing whether its mean in one treatment interval differs from its mean in the other treatment intervals combined. We repeat this for each treatment interval, computing a t-test for each covariate and treatment interval. The precise steps of the GPS implementation will be shown in the next chapter, in our empirical case study. After having specified and estimated the GPS, we need to model the conditional expectation of the outcome on the treatment variable and the

² We may use more general models, such as mixtures of normals or heteroskedastic normal distributions, with the variance modeled as a parametric function of the covariates.


score, $E[Y \mid T = t, R = r]$, as a flexible function of its two arguments.³ For example, we can use a linear regression or a quadratic approximation, such as:

$$E[Y_i \mid T_i, R_i] = \beta_0 + \beta_1 T_i + \beta_2 T_i^2 + \beta_3 R_i + \beta_4 R_i^2 + \beta_5 T_i R_i$$

We estimate the parameters of the model, e.g. by ordinary least squares, using the estimated GPS $\hat{R}_i$ among the regressors. We then estimate the average potential outcome at treatment level t, $\hat{E}[Y(t)]$, and repeat this for each treatment level of interest to obtain an estimate of the entire dose-response function:⁴

$$\hat{E}[Y(t)] = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{\beta}_0 + \hat{\beta}_1 t + \hat{\beta}_2 t^2 + \hat{\beta}_3 \hat{r}(t, X_i) + \hat{\beta}_4 \hat{r}(t, X_i)^2 + \hat{\beta}_5 \hat{r}(t, X_i)\, t \right)$$
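The three steps (GPS estimation, outcome regression, averaging) can be sketched as follows. This is a minimal illustration on simulated data, assuming a single covariate, a normal linear model for the treatment, and the quadratic specification above; the data-generating values and the helper names `normal_gps`, `design`, and `dose_response` are hypothetical. Because the quadratic form is only an approximation to $E[Y \mid T, R]$, the estimate is not guaranteed to match the true dose-response function of the simulation.

```python
import numpy as np

# Simulated data (assumed values): the true E[Y(t)] is 1 + 2 t here.
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=n)
T = 0.5 * X + rng.normal(size=n)              # treatment depends on the covariate
Y = 1.0 + 2.0 * T + X + rng.normal(size=n)    # outcome depends on T and X

def normal_gps(t, mean, sigma2):
    """Normal density of treatment level t with the given mean and variance."""
    return np.exp(-(t - mean) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# Step 1: estimate the GPS under T | X ~ N(b0 + b1 X, sigma^2).
Z = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(Z, T, rcond=None)
mean_T = Z @ b
sigma2 = np.mean((T - mean_T) ** 2)
R = normal_gps(T, mean_T, sigma2)             # R_i-hat = r-hat(T_i, X_i)

# Step 2: quadratic regression of Y on T and the estimated GPS.
def design(t, r):
    t = np.broadcast_to(t, r.shape)
    return np.column_stack([np.ones_like(r), t, t ** 2, r, r ** 2, t * r])

coef, *_ = np.linalg.lstsq(design(T, R), Y, rcond=None)

# Step 3: for each t, evaluate the score at t for every unit and average.
def dose_response(t):
    r_t = normal_gps(t, mean_T, sigma2)       # r-hat(t, X_i) for all i
    return float(np.mean(design(t, r_t) @ coef))

grid = np.array([-1.0, 0.0, 1.0])
mu_hat = np.array([dose_response(t) for t in grid])
```

Note that step 3 evaluates the score at the common level `t` for every unit, not at each unit's observed treatment; this is what distinguishes the dose-response estimate from the fitted values of the regression in step 2.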

In this last step the estimated regression function is averaged over the score function evaluated at the desired level of t. Rather than reporting the dose-response function itself, we can report its derivative. In economics, this represents the marginal propensity (Imbens and Hirano, 2004) with respect to the quantity we are interested in estimating. As we will show in our observational study, this is a useful alternative way of summarizing the dose-response function: it allows us to compute estimates at a specific treatment level and to compare the marginal propensities at different levels of intervention.

³ Remember that there is no causal interpretation for the conditional expectation of the outcome.

⁴ It is convenient to use bootstrap methods to compute standard errors and form confidence intervals.


3.3 The sub-classification procedure

The GPS can also be used with sub-classification and matching procedures, although these are usually more cumbersome than in the binary treatment case. Van Dyk and Imai (2003) implement analysis techniques, mostly based on sub-classification, that balance a high-dimensional covariate by adjusting for a low-dimensional propensity score. In the sub-classification technique, they first model the conditional distribution of the treatment given the covariates, $f^{\psi}_{T \mid X}(t \mid x)$, where $\psi$ parameterises the distribution. Second, they compute the estimate $\hat{\psi}$ of $\psi$. As a result, the parametric model defines the generalized propensity score as $r_{\hat{\psi}}(t, X)$.⁵ Third, they compute $r_{\hat{\psi}}(t, X)$ for each observation and sub-classify observations with the same or similar values of $\hat{r}$ into a number of sub-classes of equal size. Within each sub-class they model the outcome distribution given the treatment, $f_{Y(t) \mid T, \hat{r}(t,X)}(y \mid t, \hat{r})$, e.g. by regressing Y(t) on both t and $\hat{r}$. To further reduce the bias, Robins and Rotnitzky (2001) have suggested including other covariates in the regression. The average causal effect can be computed as a weighted average of the within-sub-class effects, with weights equal to the relative sizes of the sub-classes. Formally, the average treatment effect can be approximated in the following way:

⁵ We can think of the Gaussian density function $T_i \mid X_i \sim N(h(X_i;\beta), \sigma^2)$, where $\psi = (\beta, \sigma^2)$ and $\beta$, $\sigma^2$ can be estimated by maximum likelihood.




$$E[Y(t)] \approx \sum_{s=1}^{S} E[Y(t) \mid T = t, r_s]\, W_s$$

where S is the number of sub-classes and $W_s$ is the relative size of sub-class s, estimated by the proportion of observations falling into sub-class s. Since results may be sensitive to the number and choice of sub-classes, Van Dyk and Imai (2003) suggest conducting a sensitivity analysis with different sub-classifications.
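A minimal sketch of the weighted-average formula follows, under assumptions that go beyond the text: the treatment model is Gaussian with a linear mean, units are sub-classified on the fitted conditional mean of the treatment (which, under that model with common variance, indexes the estimated propensity function), and the within-sub-class model is a simple linear regression of Y on T whose slope is taken as the within-sub-class effect. The function name `subclassification_effect` and the simulated data are hypothetical.

```python
import numpy as np

def subclassification_effect(Y, T, X, n_classes=5):
    """Weighted average of within-sub-class slopes of Y on T."""
    n = len(Y)
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, T, rcond=None)
    score = Z @ beta                              # fitted mean of T given X
    order = np.argsort(score)
    slopes, weights = [], []
    for idx in np.array_split(order, n_classes):  # sub-classes of (near) equal size
        A = np.column_stack([np.ones(len(idx)), T[idx]])
        c, *_ = np.linalg.lstsq(A, Y[idx], rcond=None)
        slopes.append(c[1])                       # within-sub-class effect of T
        weights.append(len(idx) / n)              # W_s: relative sub-class size
    return float(np.dot(slopes, weights))

# Simulated data (assumed values): a unit increase in T raises Y by 2,
# while X confounds the relationship.
rng = np.random.default_rng(2)
n = 5000
X = rng.normal(size=n)
T = X + rng.normal(size=n)
Y = 2.0 * T + 3.0 * X + rng.normal(size=n)

naive = np.polyfit(T, Y, 1)[0]                    # unadjusted slope, confounded by X
adjusted = subclassification_effect(Y, T, X, n_classes=10)
```

Some bias remains within each sub-class because units in the same sub-class still differ in X, most visibly in the extreme sub-classes. Re-running with different values of `n_classes` is a simple form of the sensitivity analysis mentioned above.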

3.3.1 Lu’s matching technique applying the GPS

In contrast to the sub-classification method, Lu et al. (2001) suggest matching pairs of units on $\hat{r}$. To implement this procedure in the continuous treatment case, we need to divide the range of the treatment variable into blocks and apply matching inside each stratum before estimating the average treatment effect. However, in this context matching turns out to be more difficult to apply than with a binary treatment, because the matched pairs should not only have similar $\hat{r}$ but also different treatment levels (this is not a problem in the binary treatment case, since each pair consists of a unit from the treatment group and a unit from the control group). Lu et al. (2001) propose a distance measure that decreases as the propensity scores become more similar and the received treatments become more dissimilar. The treatment effect can be evaluated by examining the difference in response between the "high" and the "low" treatment and, to take the difference in treatment into account, they also suggest regressing the difference in response on the difference in treatment. These techniques are clearly difficult to generalize to continuous treatment variables. In this sense sub-classification and smoothing techniques represent more powerful strategies, since they allow a simpler implementation of more complex causal effect analyses.
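To make the stated property of the distance concrete, the sketch below uses one simple ratio form: squared difference in scores (plus a small constant) divided by squared difference in doses. This particular form, the constant `eps`, and the function name `dose_distance` are assumptions of the illustration and are not claimed to reproduce the exact measure of Lu et al. (2001); the sketch only shows a distance that shrinks when scores are close and doses are far apart.

```python
import numpy as np

def dose_distance(score, dose, eps=1e-3):
    """Pairwise distance: small when scores are similar and doses differ."""
    ds = (score[:, None] - score[None, :]) ** 2   # squared score differences
    dd = (dose[:, None] - dose[None, :]) ** 2     # squared dose differences
    with np.errstate(divide="ignore"):            # equal doses give infinite distance
        D = (ds + eps) / dd
    np.fill_diagonal(D, np.inf)                   # a unit is never matched to itself
    return D

# Three hypothetical units: 0 and 1 have similar scores but different doses,
# 0 and 2 have different scores and nearly the same dose.
score = np.array([0.0, 0.1, 2.0])
dose = np.array([1.0, 3.0, 1.2])
D = dose_distance(score, dose)
best_match_for_0 = int(np.argmin(D[0]))
```

A matching routine would then pair units so as to minimise the total distance, and the within-pair response differences could be regressed on the within-pair dose differences as described above.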

4 Conclusion

Propensity score methods have become one of the most important tools for analyzing causal effects in observational studies. Although the original work of Rosenbaum and Rubin (1983) considered applications with binary treatments, it can also be extended to multivalued and continuous treatments. We have discussed some of the issues involved in handling multiple and continuous treatments and emphasized how the propensity score methodology can be applied to “arbitrary” cases.


Bibliography

[Abadie and Imbens, 2002] Abadie, A., Imbens, G. (2002) Simple and bias-corrected matching estimators for average treatment effects. NBER technical working paper series, 283. http://www.nber.org/papers/T0283.

[Abadie and Imbens, 2004] Abadie, A., Imbens, G. (2004) Large sample properties of matching estimators for average treatment effects. Working paper, 283. http://elsa.berkeley.edu/.

[Ahn and Powell, 1993] Ahn, H. and J.L. Powell (1993), Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism, Journal of Econometrics, 58, 3-29.

[Angrist and Krueger, 2000] Angrist, J., Krueger, A. (2000) Empirical Strategies in Labor Economics, in O. Ashenfelter and D. Card eds. Handbook of Labor Economics, vol. 3. New York: Elsevier Science.

[Angrist et al., 1996] Angrist, J., Imbens, G., Rubin, D. B. (1996) Identification of causal effects using instrumental variables. Journal of the American Statistical Association 91, 444-472.

[Athey and Imbens, 2002] Athey, S., and Imbens, G. (2002), “Identification and Inference in Nonlinear Difference-In-Differences Models,” unpublished manuscript, Department of Economics, Stanford University.


[Battistin et al., 2001] Battistin, E., Gavosto, A., Rettore, E. (2001) Why do subsidised firms survive longer? An evaluation of a program promoting youth entrepreneurship in Italy, in Lechner M., F. Pfeiffer (eds.), Econometric evaluation of active labour market policies, Physica/Springer-Verlag, Heidelberg.

[Becker and Ichino, 2002] Becker, S. O. and Ichino, A. (2002). Estimation of average treatment effects based on propensity scores. The Stata Journal, 4, 358-377.

[Blundel and MaCurdy, 2000] Blundell, Richard, and Thomas MaCurdy, (2000): Labor Supply, Handbook of Labor Economics, O. Ashenfelter and D. Card, eds., North Holland: Elsevier, 2000, 1559-1695.

[Bryson et al., 1999] Bryson, A., Dorsett, R., Purdon, S. (2002). The use of propensity score matching in the evaluation of active labour market policies. Policy Studies Institute, U.K. Department for Work and Pensions Working Paper No. 4. http://www.dwp.gov.uk/asd/asd5/wp-index.html.

[Cox and Oakes, 1984] Cox D.R., Oakes D. (1984) Analysis of survival data. Chapman and Hall, London.

[Dehejia and Wahba, 1999] Dehejia, R. H., Wahba, S. (1999) Causal effects in nonexperimental studies: Re-evaluating the evaluation of training programs. Journal of the American Statistical Association 94, 1053-62.


[DiPrete and Gangl, 2004] DiPrete, T., Gangl, M. (2004). Assessing bias in the estimation of causal effects: Rosenbaum bounds on matching estimators and instrumental variables estimation with imperfect instruments. Sociological Methodology.

[Drake, 1993] Drake, C. (1993) Effects of misspecification of the propensity score on estimators of treatment effects. Biometrics 49, 1231-1236.

[Frolich, 2002] Frolich, M. (2002) What is the value of knowing the propensity score for the estimation of the average treatment effects. Department of Economics, University of St. Gallen.

[Greevy et al. 2004] Greevy, R., Lu, B., Silber, J. H., and Rosenbaum, P. (2004) Optimal multivariate matching before randomization. Biostatistics 5, 263-275.

[Hahn, 1998] Hahn, J. (1998) On the role of the propensity scores in efficient semiparametric estimation of average treatment effects. Econometrica 66, 315-331.

[Hausman and Newey, 1995] Hausman, J., and Newey, W., (1995) Nonparametric Estimation of Exact Consumer Surplus and Deadweight Loss, Econometrica, 63, 1445-1476.

[Heckman, 1979b] Heckman, J. J. (1979) Sample selection bias as a specification error. Econometrica 47(1), 153-161.


[Heckman and Hotz, 1989] Heckman, J. J., Hotz, V. J. (1989) Choosing among alternative nonexperimental methods for estimating the impact of social programs: the case of manpower training. Journal of the American Statistical Association 84, 408, 862-874.

[Heckman et al., 1998b] Heckman, J. J., Ichimura, H., Todd, P. (1998b) Matching as an econometric evaluation estimator. Review of Economic Studies 65, 261-294.

[Heckman and Robb, 1985] Heckman, J. and R. Robb, (1985), Alternative Methods for Evaluating the Impact of Interventions, in J. Heckman and B. Singer, eds., Longitudinal Analysis of Labor Market Data, New York: Cambridge University Press.

[Heckman and Todd, 1999] Heckman, J. J., Todd, P. (1999) Adopting propensity score matching and selection models of choice-based samples. University of Chicago.

[Hirano et al., 2003] Hirano, K., Imbens, G. W., Ridder, G. (2003) Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71(4).

[Holland 1986] Holland, P. (1986) Statistics and causal inference. Journal of the American Statistical Association, 81.


[Honoré and Powell, 1994] Honoré, B. and Powell, J. (1994) Pairwise Difference Estimators for Censored and Truncated Regression Models, Journal of Econometrics, 64: 241-278.

[Imbens, 1999] Imbens, G. W., (1999) The Role of the Propensity Score in Estimating Dose-Response Functions, NBER Working Paper No. T0237. Available at SSRN: http://ssrn.com/abstract=226648.

[Imbens and Hirano, 2004] Imbens, G. and Hirano, K., (2004) The Propensity score with continuous treatment, chapter for Missing data and Bayesian Method in Practice: Contributions by Donald Rubin Statistical Family.

[Imbens et al., 2001] Imbens, G. (2001) Implementing Matching Estimators for Average Treatment Effects in Stata, The Stata Journal (2001).

[Joffe and Rosenbaum, 1999] Joffe, M. M. and Rosenbaum, P. R. (1999). Propensity scores. American Journal of Epidemiology 150, 327-333.

[Van Dyk and Imai, 2003] Imai, K. and van Dyk, D. A. (2003) Causal inference with general treatment regimes: Generalizing the Propensity Score, Revised for the Journal of the American Statistical Association.

[Lechner, 2001] Lechner, M., (2001), Identification and Estimation of Causal Effects of Multiple Treatments under the Conditional Independence Assumption, in Lechner and Pfeiffer (eds.), Econometric Evaluations of Active Labor Market Policies in Europe, Heidelberg, Physica.


[Lee, 2005] Lee W-S, (2005) Propensity Score Matching and Variations on the Balancing Test, Working Paper - 3rd Conference on Policy Evaluation, Mannheim.

[Lu et al., 2001] Lu et al. (2001). Matching with doses in an observational study of a media campaign against drug abuse. Journal of the American Statistical Association 96, 1245-1253.

[Mealli e Pagni, 2001 ] Mealli, F., Pagni, R. (2001) Analisi e valutazione delle politiche per le nuove imprese. Il caso della L.R. Toscana n.27/93.IRPET.

[Meyer, 1995] Meyer, B, (1995), Natural and Quasi-experiments in Economics, Journal of Business and Economic Statistics, 13 (2), 151-161.

[Meini, 2001] Meini, M.C. (2001) Politiche per l'occupazione a scala locale. Valutazione del ruolo degli interventi per lo start-up d'impresa. Provincia di Massa, Osservatorio Provinciale Mercato del Lavoro, IRPET.

[Ming and Rosenbaum, 2000] Ming, K., Rosenbaum, P. R. (2000). Substantial gains in bias reduction from matching with a variable number of controls. Biometrics 56

[Moffit, 1991] Moffit R. (1991), Program evaluation with nonexperimental data, Evaluation Review, 15: 291-314.


[Newey, 1988] Newey W. K. (1988) Two Step Series Estimation of Sample Selection Models, Working Paper, MIT Department of Economics.

[Newey and Vella, 2003] Newey and Vella (2003), Non-parametric Estimation of Sample Selection Models, Review of Economic Studies, Vol. 70, pp. 33-58.

[Neyman, 1923] Neyman, J. (1923) On the application of probability theory to agricultural experiments. Essay on principles. Section 9.

[Pellegrini, 2001] Pellegrini G. (2001). La struttura produttiva delle piccole e medie imprese italiane: il modello dei distretti. Banca Impresa Società, 20 (2001), n. 2.

[Powell J. L., 1987] Powell J. L. (1987) Semiparametric Estimation of Bivariate Latent Variable Models, Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

[Powell, 1987] Powell, J. L., (1987) Semiparametric Estimation of Employment Duration Models, Econometric Reviews, 6: 65-78.

[Purdon, 2002] Purdon, S. (2002) The use of propensity score matching in the evaluation of active labour market policies.


[Rettore and Gavosto, 2001] Rettore E. and Gavosto A. (2001) Why do subsidised firms survive longer? An evaluation of a program promoting youth entrepreneurship in Italy, Econometric Evaluation of Active Labour Market Policies, Physica/Springer-Verlag, Heidelberg.

[Rosenbaum, 2002] Rosenbaum, P. R. (2002) Observational Studies, 2nd Edition. Springer Verlag, New York, NY.

[Rosenbaum and Rubin, 1983] Rosenbaum, P.R., Rubin, D. B. (1983) The central role of the propensity score in observational studies for causal effects. Biometrika, 70.

[Rosenbaum and Rubin, 1983b] Rosenbaum, P.R., Rubin, D. B. (1983b) Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome Journal of the Royal Statistical Society, 45, 212.

[Rosenbaum and Rubin, 1984] Rosenbaum, P. R., Rubin, D. B. (1984) Reducing bias in observational studies using sub-classification on the propensity score. Journal of the American Statistical Association 79, 516-524.

[Rubin, 1973a] Rubin, D. B. (1973a) Matching to remove bias in observational studies. Biometrics 29, 159-184.

[Rubin, 1973b] Rubin, D. B. (1973b) The use of matched sampling and regression adjustment to remove bias in observational studies. Biometrics 29, 159-184.


[Rubin, 1974] Rubin, D. B. (1974) Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66, 688-701.

[Rubin, 1980a] Rubin, D. B. (1980a) Bias reduction using Mahalanobis's metric matching. Biometrics, 36, 295-298.

[Rubin and Thomas, 1992b] Rubin, D. B., Thomas, N. (1992b) Characterizing the effect of matching using linear propensity score methods with normal distributions. Biometrika 79, 797-809.

[Rubin and Thomas, 1996] Rubin, D. B., Thomas, N. (1996) Matching using estimated propensity scores, relating theory to practice. Biometrics 52, 249-264.

[Zhao, 2004] Zhao, Z. (2004) Using matching to estimate treatment effects: data requirements, matching metrics, and Monte Carlo evidence. Review of Economics and Statistics 86(1), 91-107.


Working Papers The full text of the working papers is downloadable at http://polis.unipmn.it/ *Economics Series

**Political Theory Series

ε

Al.Ex Series

2007 n.88*

Michela Bia: The Propensity Score method in public policy evaluation: a survey

2007 n.87*

Luca Mo Costabella and Alberto Martini: Valutare gli effetti indesiderati dell’istituto della mobilità sul comportamento delle imprese e dei lavoratori.

2007 n.86ε

Stefania Ottone: Are people samaritans or avengers?

2007 n.85*

Roberto Zanola: The dynamics of art prices: the selection corrected repeat-sales index

2006 n.84*

Antonio Nicita and Giovanni B. Ramello: Property, liability and market power: the antitrust side of copyright

2006 n.83*

Gianna Lotito: Dynamic inconsistency and different models of dynamic choice – a review

2006 n.82**

Gabriella Silvestrini: Le républicanisme genevois au XVIIIe siècle

2006 n.81*

Giorgio Brosio and Roberto Zanola: Can violence be rational? An empirical analysis of Colombia

2006 n.80*

Franco Cugno and Elisabetta Ottoz: Static inefficiency of compulsory licensing: Quantity vs. price competition

2006 n.79*

Carla Marchese: Rewarding the consumer for curbing the evasion of commodity taxes?

2006 n.78**

Joerg Luther: Percezioni europee della storia costituzionale cinese

2006 n.77ε

Guido Ortona, Stefania Ottone, Ferruccio Ponzano and Francesco Scacciati: Labour supply in presence of taxation financing public services. An experimental approach.

2006 n.76*

Giovanni B. Ramello and Francesco Silva: Appropriating signs and meaning: the elusive economics of trademark

2006 n.75*

Nadia Fiorino and Roberto Ricciuti: Legislature size and government spending in Italian regions: forecasting the effects of a reform

2006 n.74**

Joerg Luther and Corrado Malandrino: Letture provinciali della costituzione europea

2006 n.73*

Giovanni B. Ramello: What's in a sign? Trademark law and economic theory

2006 n.72*

Nadia Fiorino and Roberto Ricciuti: Determinants of direct democracy across Europe

2006 n.71*

Angela Fraschini and Franco Oscultati: La teoria economica dell'associazionismo tra enti locali

2006 n.70*

Mandana Hajj and Ugo Panizza: Religion and gender gap, are Muslims different?

2006 n.69*

Ana Maria Loboguerrero and Ugo Panizza: Inflation and labor market flexibility: the squeaky wheel gets the grease

2006 n.68*

Alejandro Micco, Ugo Panizza and Monica Yañez: Bank ownership and performance: does politics matter?

2006 n.67*

Alejandro Micco and Ugo Panizza: Bank ownership and lending behavior

2006 n.66*

Angela Fraschini: Fiscal federalism in big developing countries: China and India

2006 n.65*

Corrado Malandrino: La discussione tra Einaudi e Michels sull'economia pura e sul metodo della storia delle dottrine economiche

2006 n.64ε

Stefania Ottone: Fairness: a survey

2006 n.63*

Andrea Sisto: Propensity Score matching: un'applicazione per la creazione di un database integrato ISTAT-Banca d'Italia

2005 n.62*

P. Pellegrino: La politica sanitaria in Italia: dalla riforma legislativa alla riforma costituzionale

2005 n.61*

Viola Compagnoni: Analisi dei criteri per la definizione di standard sanitari nazionali

2005 n.60ε

Guido Ortona, Stefania Ottone and Ferruccio Ponzano: A simulative assessment of the Italian electoral system

2005 n.59ε

Guido Ortona and Francesco Scacciati: Offerta di lavoro in presenza di tassazione: l'approccio sperimentale

2005 n.58*

Stefania Ottone and Ferruccio Ponzano, An extension of the model of Inequity Aversion by Fehr and Schmidt

2005 n.57ε

Stefania Ottone, Transfers and altruistic punishment in Solomon's Game experiments

2005 n. 56ε

Carla Marchese and Marcello Montefiori, Mean voting rule and strategical behavior: an experiment

2005 n.55**

Francesco Ingravalle, La sussidiarietà nei trattati e nelle istituzioni politiche dell'UE.

2005 n. 54*

Rosella Levaggi and Marcello Montefiori, It takes three to tango: soft budget constraint and cream skimming in the hospital care market

2005 n.53*

Ferruccio Ponzano, Competition among different levels of government: the reelection problem.

2005 n.52*

Andrea Sisto and Roberto Zanola, Rationally addicted to cinema and TV? An empirical investigation of Italian consumers . Luigi Bernardi and Angela Fraschini, Tax system and tax reforms in India

2005 n.51* 2005 n.50*

Ferruccio Ponzano, Optimal provision of public goods under imperfect intergovernmental competition.

2005 n.49*

Franco Amisano e Alberto Cassone, Proprieta’ intellettuale e mercati: il ruolo della tecnologia e conseguenze microeconomiche

2005 n.48*

Tapan Mitra e Fabio Privileggi, Cantor Type Attractors in Stochastic Growth Models

2005 n.47ε

Guido Ortona, Voting on the Electoral System: an Experiment

2004 n.46ε

Stefania Ottone, Transfers and altruistic Punishments in Third Party Punishment Game Experiments.

2004 n.45*

Daniele Bondonio, Do business incentives increase employment in declining areas? Mean impacts versus impacts by degrees of economic distress.

2004 n.44**

Joerg Luther, La valorizzazione del Museo provinciale della battaglia di Marengo: un parere di diritto pubblico

2004 n.43*

Ferruccio Ponzano, The allocation of the income tax among different levels of government: a theoretical solution

2004 n.42*

Albert Breton e Angela Fraschini, Intergovernmental equalization grants: some fundamental principles

2004 n.41*

Andrea Sisto, Roberto Zanola, Rational Addiction to Cinema? A Dynamic Panel Analisis of European Countries

2004 n.40**

Francesco Ingravalle, Stato, groβe Politik ed Europa nel pensiero politico di F. W. Nietzsche

2003 n.39ε

Marie Edith Bissey, Claudia Canegallo, Guido Ortona and Francesco Scacciati, Competition vs. cooperation. An experimental inquiry

2003 n.38ε

Marie-Edith Bissey, Mauro Carini, Guido Ortona, ALEX3: a simulation program to compare electoral systems

2003 n.37*

Cinzia Di Novi, Regolazione dei prezzi o razionamento: l’efficacia dei due sistemi di allocazione nella fornitura di risorse scarse a coloro che ne hanno maggiore necessita’

2003 n. 36*

Marilena Localtelli, Roberto Zanola, The Market for Picasso Prints: An Hybrid Model Approach

2003 n. 35*

Marcello Montefiori, Hotelling competition on quality in the health care market.

2003 n. 34*

Michela Gobbi, A Viable Alternative: the Scandinavian Model of Democracy”

2002 n. 33*

Mario Ferrero, Radicalization as a reaction to failure: an economic model of islamic extremism

2002 n. 32ε

Guido Ortona, Choosing the electoral system – why not simply the best one?

2002 n. 31**

Silvano Belligni, Francesco Ingravalle, Guido Ortona, Pasquale Pasquino, Michel Senellart, Trasformazioni della politica. Contributi al seminario di Teoria politica

2002 n. 30*

Franco Amisano, La corruzione amministrativa in una burocrazia di tipo concorrenziale: modelli di analisi economica.

2002 n. 29*

Marcello Montefiori, Libertà di scelta e contratti prospettici: l’asimmetria informativa nel mercato delle cure sanitarie ospedaliere

2002 n. 28*

Daniele Bondonio, Evaluating the Employment Impact of Business Incentive

“Social

Programs in EU Disadvantaged Areas. A case from Northern Italy 2002 n. 27**

Corrado Malandrino, Oltre il compromesso del Lussemburgo verso l’Europa federale. Walter Hallstein e la crisi della “sedia vuota”(1965-66)

2002 n. 26**

Guido Franzinetti, Le Elezioni Galiziane al Reichsrat di Vienna, 1907-1911

2002 n. 25ε

Marie-Edith Bissey and Guido Ortona, A simulative frame to study the integration of defectors in a cooperative setting

2001 n. 24*

Ferruccio Ponzano, Efficiency wages and endogenous supervision technology

2001 n. 23*

Alberto Cassone and Carla Marchese, Should the death tax die? And should it leave an inheritance?

2001 n. 22*

Carla Marchese and Fabio Privileggi, Who participates in tax amnesties? Self-selection of risk-averse taxpayers

2001 n. 21*

Claudia Canegallo, Una valutazione delle carriere dei giovani lavoratori atipici: la fedeltà aziendale premia?

2001 n. 20*

Stefania Ottone, L'altruismo: atteggiamento irrazionale, strategia vincente o amore per il prossimo?

2001 n. 19*

Stefania Ravazzi, La lettura contemporanea del cosiddetto dibattito fra Hobbes e Hume

2001 n. 18*

Alberto Cassone e Carla Marchese, Einaudi e i servizi pubblici, ovvero come contrastare i monopolisti predoni e la burocrazia corrotta

2001 n. 17*

Daniele Bondonio, Evaluating Decentralized Policies: How to Compare the Performance of Economic Development Programs across Different Regions or States.

2000 n. 16*

Guido Ortona, On the Xenophobia of non-discriminated Ethnic Minorities

2000 n. 15*

Marilena Locatelli-Biey and Roberto Zanola, The Market for Sculptures: An Adjacent Year Regression Index

2000 n. 14*

Daniele Bondonio, Metodi per la valutazione degli aiuti alle imprse con specifico target territoriale

2000

n. 13* Roberto Zanola, Public goods versus publicly provided private goods in a two-class economy

2000 n. 12**

Gabriella Silvestrini, Il concetto di «governo della legge» nella tradizione repubblicana.

2000 n. 11**

Silvano Belligni, Magistrati e politici nella crisi italiana. Democrazia dei guardiani e neopopulismo

2000 n. 10*

Rosella Levaggi and Roberto Zanola, The Flypaper Effect: Evidence from the Italian National Health System

1999 n. 9*

Mario Ferrero, A model of the political enterprise

1999 n. 8*

Claudia Canegallo, Funzionamento del mercato del lavoro in presenza di informazione asimmetrica

1999 n. 7**

Silvano Belligni, Corruzione, malcostume amministrativo e strategie etiche. Il ruolo dei codici.

1999 n. 6*

Carla Marchese and Fabio Privileggi, Taxpayers Attitudes Towaer Risk and Amnesty Partecipation: Economic Analysis and Evidence for the Italian Case.

1999 n. 5*

Luigi Montrucchio and Fabio Privileggi, On Fragility of Bubbles in Equilibrium Asset Pricing Models of Lucas-Type

1999 n. 4**

Guido Ortona, A weighted-voting electoral system that performs quite well.

1999 n. 3*

Mario Poma, Benefici economici e ambientali dei diritti di inquinamento: il caso della riduzione dell’acido cromico dai reflui industriali.

1999 n. 2*

Guido Ortona, Una politica di emergenza contro la disoccupazione semplice, efficace equasi efficiente.

1998 n. 1*

Fabio Privileggi, Carla Marchese and Alberto Cassone, Risk Attitudes and the Shift of Liability from the Principal to the Agent

Department of Public Policy and Public Choice “Polis” The Department develops and encourages research in fields such as: • theory of individual and collective choice; • economic approaches to political systems; • theory of public policy; • public policy analysis (with reference to environment, health care, work, family, culture, etc.); • experiments in economics and the social sciences; • quantitative methods applied to economics and the social sciences; • game theory; • studies on social attitudes and preferences; • political philosophy and political theory; • history of political thought. The Department has regular members and off-site collaborators from other private or public organizations.

Instructions to Authors Please ensure that the final version of your manuscript conforms to the requirements listed below:

The manuscript should be typewritten single-faced and double-spaced with wide margins. Include an abstract of no more than 100 words. Classify your article according to the Journal of Economic Literature classification system.

Keep footnotes to a minimum and number them consecutively throughout the manuscript with superscript Arabic numerals. Acknowledgements and information on grants received can be given in a first footnote (indicated by an asterisk, not included in the consecutive numbering).

Ensure that references to publications appearing in the text are given as follows: COASE (1992a; 1992b, ch. 4) has also criticized this bias.... and “...the market has an even more shadowy role than the firm” (COASE 1988, 7).

List the complete references alphabetically as follows:

Periodicals:
KLEIN, B. (1980), “Transaction Cost Determinants of ‘Unfair’ Contractual Arrangements,” American Economic Review, 70(2), 356-362.
KLEIN, B., R. G. CRAWFORD and A. A. ALCHIAN (1978), “Vertical Integration, Appropriable Rents, and the Competitive Contracting Process,” Journal of Law and Economics, 21(2), 297-326.

Monographs:
NELSON, R. R. and S. G. WINTER (1982), An Evolutionary Theory of Economic Change, 2nd ed., Harvard University Press: Cambridge, MA.

Contributions to collective works:
STIGLITZ, J. E. (1989), “Imperfect Information in the Product Market,” pp. 769-847, in R. SCHMALENSEE and R. D. WILLIG (eds.), Handbook of Industrial Organization, Vol. I, North Holland: Amsterdam-London-New York-Tokyo.

Working papers:
WILLIAMSON, O. E. (1993), “Redistribution and Efficiency: The Remediableness Standard,” Working paper, Center for the Study of Law and Society, University of California, Berkeley.
