A Flexible Correlated Random Effects Approach to Identification and Estimation of Partial Effects with the Logit Fixed Effects Model

Valentin Verdier∗

September 1, 2016

Abstract

The Logit fixed effects model for binary outcomes with panel data relies on a linear model for a latent variable which includes an additive unobserved heterogeneity term and additive transitory shocks that are assumed to be serially independent and to follow the logistic distribution. This model has proved very popular in empirical work, mainly because a conditional maximum-likelihood estimator, the Logit fixed effects estimator, has been shown to be consistent for the coefficients on observed covariates in the model for the latent variable, even in the absence of restrictions on the distribution of unobserved heterogeneity conditional on the covariates. While these coefficients determine the sign of the effects of the covariates on the outcome variable, they are not sufficient to calculate the magnitude of these effects. Here I show that the distribution of partial effects conditional on covariates is identified as long as the distribution of unobserved heterogeneity conditional on covariates is restricted to belong to a fairly general class of distributions and restrictions on the support of the observed covariates hold. I also show that results coming from the proof of identification can be used to motivate a relatively simple estimator for average partial effects. The results are shown to extend to the ordered Logit fixed effects model. I conclude with a simple empirical example that studies the effect of the number of children and husband's income on women's labor force participation.

Keywords: Panel Data, Binary Response, Partial Effects, Correlated Random Effects, Non-Parametric Identification

∗ Department of Economics, University of North Carolina, Chapel Hill, NC 27599, United States. Tel.: +1 919-966-3962. E-mail address: [email protected].

1 Introduction

In this paper I consider a model for the dependence of a binary outcome y on a vector of observed covariates x at times t = 1, ..., T given by:

$$y_t = \mathbb{1}[x_t\beta_0 + c + u_t \ge 0] \tag{1.1}$$

where c contains unobserved characteristics that do not change over time. The Logit fixed-effects model imposes restrictions on the distribution of $\{u_t\}_{t=1,\dots,T}$, namely independence from $x = \{x_1, \dots, x_T\}$ and c, serial independence, and a logistic distribution, but not on the distribution of unobserved heterogeneity conditional on the covariates, $D_{c|x}$. Under the additional assumptions of random sampling in the cross-section and of observing several time periods ($T \ge 2$), $\beta_0$ can be estimated consistently in a large-n, fixed-T asymptotic framework, where n is the size of the cross-sectional sample, by a conditional maximum-likelihood estimator called the Logit fixed-effects estimator. This is discussed in Chamberlain (1984), for instance. $\beta_0$ determines the sign of the effects of the variables in x on y, but the magnitude of these effects cannot be obtained from $\beta_0$ alone. In many cases, the magnitude of the effects matters as much as their sign. For instance, consider the special case of a policy evaluation where, for simplicity, $x_t \in \{0, 1\}$ is the only covariate. Obtaining information on features of the distribution of treatment effects conditional on unobserved heterogeneity,

$$TE = \Lambda(\beta_0 + c) - \Lambda(c) \tag{1.2}$$

where Λ is the logistic cumulative distribution function, or on features of the marginal distributions of the outcome variable in the presence and absence of the policy, will often be a critical part of the evaluation of a policy.¹ Determining the magnitude of features of the distribution of TE therefore requires information on the distribution of c. More generally, one might be interested in features of the distribution of treatment effects conditional on the covariates x, in which case the distribution of unobserved heterogeneity conditional on the covariates, $D_{c|x}$, would be needed.

¹ In addition, coefficients in linear index models cannot be compared across model specifications, but features of the distribution of treatment effects, such as averages or quantiles of treatment effects, can. In many empirical applications researchers wish to compare results obtained from different models (e.g. linear probability model, correlated random effects Probit, Logit fixed effects) as a robustness check, which provides an additional justification for being interested in the distribution of treatment effects rather than in the coefficients of the linear index.

Here I study conditions under which $D_{c|x}$ is identified, taking as given the assumptions of the Logit fixed-effects model, in a large-n, fixed-T asymptotic framework. I show that if the dependence of $D_{c|x}$ on x is captured by two unknown scalar functions of x that play the role of location and scale parameters, then the distribution of c conditional on x is identified without imposing further restrictions on $D_{c|x}$, as long as the support of x satisfies some additional restrictions.

The estimation of features of the distribution of partial effects with panel data in a large-n, fixed-T asymptotic framework has been a long-standing question in econometric research, and it is useful to contrast previous work with the results presented in this paper. A parametric approach that has been widely used in empirical research is the parametric correlated random effects (CRE) model proposed in Chamberlain (1984), which consists of specifying a parametric model for $D_{c|x}$. Here restrictions are imposed on the form of $D_{c|x}$, but they are not so strong that the model for $D_{c|x}$ becomes parametric. Previous work has also studied the identification of features of the distribution of partial effects with unknown non-separable response functions. Altonji and Matzkin (2005), for instance, consider the identification of conditional expected values of treatment or partial effects under the assumption that $D_{c|x}$ depends only on some known sufficient statistics z, such that $D_{c|x} = D_{c|z}$. Bester and Hansen (2009) have also studied the identification of conditional expected values of treatment or partial effects under the assumption $D_{c|x} = D_{c|\mu_1(x^1), \dots, \mu_k(x^k)}$, where $k = \dim(x_t)$, $\{\mu_j\}_{j=1,\dots,k}$ are unknown scalar functions, $x^j = \{x_{jt}\}_{t=1,\dots,T}$, and $x_t = [x_{jt}]_{j=1,\dots,k}$. Here I consider identification of the conditional distribution of treatment or partial effects with the specific structure of the Logit fixed effects model instead of unknown non-separable response functions. The restrictions on $D_{c|x}$ considered here and presented in Sections 3 and 4 also imply that sufficient statistics capture the

dependence of the distribution of c on x, but they do not impose that these sufficient statistics are known or that each covariate enters $D_{c|x}$ through its own index function $\mu_j(\cdot)$.

Another approach to estimating distributional features of partial effects with non-separable models of panel data consists of imposing an assumption of marginal stationarity, which implies that the response function linking observable covariates and unobserved heterogeneity does not change over time. With this assumption, the expected value or quantiles of treatment effects can be estimated for subpopulations of "movers", who switched into or out of treatment over the period of observation. This approach was used for example in Chernozhukov et al. (2013) for the case where all covariates are discrete, and in Hoderlein and White (2012) and Chernozhukov et al. (2015) for the case where covariates are continuous. While these papers study models with unknown non-separable response functions that are more general than the Logit fixed-effects model studied here, being able to allow for changes in the response function over time due to unobserved aggregate shocks is very desirable in empirical applications. For instance, unobserved aggregate shocks could otherwise confound the evaluation of a policy that was rolled out over the period of observation. This would be accommodated in the Logit fixed effects model by including indicator variables for all time periods in the list of covariates. In Chernozhukov et al. (2015), additively or multiplicatively separable time shocks are considered, while here the time shocks are additively or multiplicatively separable in the model for the linear index. In addition, under the stronger assumptions used here, identification of the distribution of partial effects is shown over the entire support of the observed covariates instead of conditional on having the same value of the covariates over time as in Chernozhukov et al. (2015).²

² This is the analogue for continuous covariates of "being a mover" for discrete covariates.

Finally, an approach which, to my knowledge, has not been applied in the context of models of binary choice with strictly exogenous covariates, but which could be contrasted with the approach used here, would be to impose the restriction that the support of c is finite, as mentioned in Arellano and Bonhomme (2011). If c has K points of support, then without further restrictions $D_{c|x}$ would depend on K − 1 unknown functions of x, as well as K unknown points of support. Browning and Carro (2014) have studied this question in the context of a dynamic model with binary outcome.³ It is likely that with the model for binary outcomes with strictly exogenous covariates considered here, as with the dynamic model of Browning and Carro (2014), the number of points of support that c can take, K, would have to be restricted for identification, but could be increased as the number of time periods, T, increases.

Since this paper imposes the restriction that $D_{c|x}$ depends on only two unknown functions of x that play the role of location and scale parameters, it is also important to relate it to previous work on flexible correlated random effects models of panel data. Several papers have imposed the restriction of additive separability on $D_{c|x}$ considered in Section 3⁴ in order to estimate the finite-dimensional coefficients of linear index models of panel data for which no consistency results existed in the absence of restrictions on the dependence between unobserved heterogeneity and the covariates. Examples of such work include Chen and Khan (2001) for the case of censored data, Gayle and Viauroux (2007) for dynamic sample selection models, and Gayle (2013) for a general class of linear index models. Here the identification and consistent estimation of $\beta_0$ are already guaranteed by the assumptions of the Logit fixed effects model. The objective of this paper is to study the identification of the distribution of partial effects, taking as given the identification of $\beta_0$.

In Section 2 I formally define the Logit fixed-effects model and the objects of interest. In Section 3 I consider the case where $D_{c|x}$ is restricted to correspond to an additively separable model for c conditional on x. In Section 4 I consider the more general case where $D_{c|x}$ is restricted to depend on x only through two unknown functions of x that play the role of location and scale parameters.
In Section 5 I show that the identification results apply in a straightforward fashion to the ordered Logit fixed effects model. In Section 6 I define a simple estimator for average partial effects that makes use of some of the results found in the previous sections. In Section 7 I present a brief illustration of the estimation methods defined in the previous section by comparing different estimates of the average partial effects of the number of children and husband's income on married women's labor force participation.

³ Browning and Carro (2014) consider the introduction of discrete covariates in their dynamic model of binary outcomes, but since the emphasis is on the dynamics of the model, the results one would obtain by suppressing the dynamics from their model could not directly be compared with the results presented here.

⁴ I am not aware of work using the more flexible restriction of a location and scale distribution as in Section 4.

2 The Logit Fixed-Effects Model and Partial Effects

The Logit fixed effects model is described by the following assumption.

Assumption 1 (Logit fixed-effects model). $\{y_{it}, x_{it}\}_{i=1,\dots,n,\ t=1,\dots,T}$ is an observed random sample across i, $T \ge 2$, and:

$$y_{it} = \mathbb{1}[x_{it}\beta_0 + c_i + u_{it} \ge 0] \quad \forall\, t = 1, \dots, T \tag{2.1}$$

$$\{u_{it}\}_{t=1,\dots,T} \mid x_i, c_i \ \overset{\text{i.i.d.}}{\sim}\ \mathrm{LOG}(0, 1) \tag{2.2}$$

where $x_i = [x_{i1}, \dots, x_{iT}]$. Under Assumption 1 and standard regularity conditions that guarantee the convergence of maximum likelihood estimators, $\beta_0$ can be estimated consistently under large-n, fixed-T asymptotics. This is discussed in Chamberlain (1984) or Wooldridge (2010), for instance. Under Assumption 1, we have:

$$E(y_{it} \mid x_{it}, c_i) = \Lambda(x_{it}\beta_0 + c_i) \tag{2.3}$$

Hence, as in Wooldridge (2005) for instance, the partial effects of $x_t$ on $y_t$ for observation i, keeping c constant, are given by:

$$PE_{it} = \beta_0\, \lambda(x_{it}\beta_0 + c_i) \tag{2.4}$$

when all variables in $x_t$ are continuous, where $\lambda = \Lambda'$ is the standard logistic probability density function. When $x_t$ includes some discrete variables, the difference in $\Lambda(x_t\beta_0 + c_i)$ across different values of $x_t$ can be considered instead. Since λ is strictly positive, $\beta_0$ determines the signs of the partial effects, but it is not sufficient for determining the magnitude of the partial effects of $x_t$ on $y_t$. In empirical applications, researchers are often interested in summary measures of the conditional or unconditional distribution of the partial effects. For instance, the object of interest might be defined to be the conditional average partial effects:

$$APE_t(x) = E(PE_{it} \mid x_i = x) \tag{2.5}$$

or conditional quantiles of partial effects:

$$QPE_t(\tau, x) = Q_\tau(PE_{it} \mid x_i = x) \tag{2.6}$$

where $Q_\tau(Z \mid X)$ is the τth quantile of the distribution of the random variable Z conditional on X. These objects of interest depend on the distribution of $c_i$ conditional on $x_i$, $D_{c_i|x_i}$. In the next two sections I show that if $D_{c_i|x_i}$ is restricted to belong to a particular class of conditional distribution functions, it is non-parametrically identified inside that class in the context of the Logit fixed-effects model given by Assumption 1.
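As an illustration of (2.4) to (2.6), the sketch below approximates a conditional average partial effect and a conditional quantile of partial effects by Monte Carlo for a single continuous covariate, given some distribution of $c_i$ conditional on $x_i$. That conditional distribution (normal with mean 0.5 times the average of $x_i$ and unit variance), the parameter values, and all function names are hypothetical choices made only for illustration; in the model $D_{c_i|x_i}$ is unknown, which is the identification problem studied below.

```python
import numpy as np


def logistic_cdf(z):
    """Lambda(z) = 1 / (1 + exp(-z)), computed in a numerically stable way."""
    return np.exp(-np.logaddexp(0.0, -z))


def logistic_pdf(z):
    """lambda(z) = Lambda(z) * (1 - Lambda(z)), the standard logistic density."""
    p = logistic_cdf(z)
    return p * (1.0 - p)


def partial_effects(beta0, x_t, c):
    """PE_it = beta0 * lambda(x_it * beta0 + c_i) as in (2.4), scalar covariate."""
    return beta0 * logistic_pdf(x_t * beta0 + c)


def conditional_ape_qpe(beta0, x, t, draw_c, tau=0.5, n_draws=200_000, seed=0):
    """Monte Carlo versions of APE_t(x) in (2.5) and QPE_t(tau, x) in (2.6).

    draw_c(x, size, rng) samples c from an assumed distribution of c | x_i = x.
    """
    rng = np.random.default_rng(seed)
    c = draw_c(x, n_draws, rng)
    pe = partial_effects(beta0, x[t], c)
    return pe.mean(), np.quantile(pe, tau)


def draw_c_normal(x, size, rng):
    """Hypothetical correlated random effect: c | x ~ N(0.5 * mean(x), 1)."""
    return rng.normal(loc=0.5 * np.mean(x), scale=1.0, size=size)


x = np.array([0.2, 1.0])  # one covariate path with T = 2 (illustrative values)
ape, qpe = conditional_ape_qpe(beta0=1.0, x=x, t=0, draw_c=draw_c_normal)
print(f"APE_1(x) ~ {ape:.4f}, median PE ~ {qpe:.4f}")
```

Replacing `draw_c_normal` with a distribution centred far from zero leaves $\beta_0$ unchanged but shrinks the partial effects, which is the sense in which $\beta_0$ alone does not pin down their magnitude.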

3 Identification with an Additively Separable Correlated Random Effects Model

The first assumption I consider restricts $D_{c_i|x_i}$ to be affected by changes in $x_i$ only through a location parameter.

Assumption 2 (Additively separable correlated random effects model).

$$D_{c_i|x_i=x}(c \mid x) = G(c - \mu(x)) \tag{3.1}$$

where µ is an unknown function and G is an unknown continuous cumulative distribution function whose derivative with respect to the Lebesgue measure, g, exists. Assumption 2 can be considered a non-parametric version of the so-called Chamberlain's device introduced in Chamberlain (1984), who considered the assumption $c_i \mid x_i \sim N(x_i\delta_0, \sigma_c^2)$ in the context of a linear index model where transitory shocks follow the normal distribution instead of the logistic distribution. Assumption 2 is significantly weaker than this parametric correlated random effects model since it does not specify a functional form for the function µ(x) or for the function g. On the other hand, the strong restriction of additive separability is still embedded in Assumption 2. The next section relaxes this restriction by considering the more general case of a location and scale distribution.

Let $\mathcal{X}$ be the support of $x_i$. The next assumption imposes restrictions on this support.

Assumption 3 (Full and banded support of $x_{it}\beta_0 + \mu(x_i)$).
1. The distribution of $x_{it}\beta_0 + \mu(x_i)$ has support equal to the real line for some $t \in \{1, \dots, T\}$.
2. For any non-empty set $S \subsetneq \mathbb{R}$, there exists $x \in \mathcal{X}$ such that $x_t\beta_0 + \mu(x) \in S$ and $x_{t'}\beta_0 + \mu(x) \notin S$ for some $t, t' \in \{1, \dots, T\}$.

The first part of Assumption 3 requires that the sum of the linear index, $x_{it}\beta_0$, and the location of the conditional distribution of $c_i$, $\mu(x_i)$, has full support over the real line. Importantly, this does not require all of the covariates in $x_i$ to be continuous and unbounded. It only requires that some of the covariates in $x_i$ are continuous and can induce unbounded variation in $x_{it}\beta_0 + \mu(x_i)$. The second part of Assumption 3 is an assumption on the joint distribution of covariates over time. It implies that, no matter which proper subset of the real line one considers, there are points in the support of $x_i$ that correspond to "movers" in terms of $x_{it}\beta_0 + \mu(x_i)$, i.e. values of $x_i$ such that $x_{it}\beta_0 + \mu(x_i)$ is inside that subset in some time period and outside of it in a different time period. Note that assuming that the support of $(x_{it}\beta_0 + \mu(x_i), x_{it'}\beta_0 + \mu(x_i))$ is equal to $\mathbb{R}^2$ would imply Assumption 3, but Assumption 3 is weaker than such an assumption of full support of the joint distribution. The next result shows that, under Assumptions 1 to 3, the distribution of $c_i$ conditional on $x_i$ is identified from knowledge of the expected values of $y_{it}$ conditional on $x_i$ and of $\beta_0$.⁵

⁵ Since $\beta_0$ can be estimated consistently under Assumption 1, as discussed in the previous section, we can take knowledge of $\beta_0$ as given when discussing identification of $D_{c_i|x_i}$.

Result 1: Under Assumptions 1-3, there is a unique function $F : \mathbb{R} \times \mathcal{X} \to [0, 1]$, given by $F(c, x) = D_{c_i|x_i=x}(c \mid x)$, that satisfies $\{E(y_{it} \mid x_i = x) = \int \Lambda(x_t\beta_0 + c)\, dF(c, x)\}_{t=1,\dots,T}$ for all $x \in \mathcal{X}$.

Proof. Pick any point $x^0 \in \mathcal{X}$ and define the functions $g_0(e) = g(e - \mu(x^0))$ and $\mu_0(x) = \mu(x) - \mu(x^0)$. Also define the function $\psi(a) = \int \Lambda(a + e) g_0(e)\, de$. For any function f with codomain in $\mathbb{R}^k$, denote by $f_{[j]}$ the jth element of f for j = 1, ..., k.

We first show that, under Assumptions 1-3, there is a unique function $F_2 : \mathbb{R} \times \mathcal{X} \to \mathbb{R} \times [0, 1]$, given by $F_2(a, x) = \{\mu_0(x), \psi(a)\}$, that satisfies $\{E(y_{it} \mid x_i = x) = F_{2[2]}(x_t\beta_0 + F_{2[1]}(x))\}_{t=1,\dots,T}$ for all $x \in \mathcal{X}$. Define $\mathcal{Z} \subseteq \mathbb{R}$ to be the set of values on which $F_{2[2]}(\cdot) = \psi(\cdot)$ is implied by these equations, and let $\mathcal{M} \subseteq \mathcal{X}$ be the set of values on which $F_{2[1]}(\cdot) = \mu_0(\cdot)$ is implied by them. Note that the function ψ is invertible since it is strictly increasing.⁶ Therefore, since $x_t\beta_0$ is known, we have:

$$\mathcal{Z} \supseteq \{a \in \mathbb{R} : \exists\, x \in \mathcal{M} \text{ with } x_t\beta_0 + \mu_0(x) = a \text{ for some } t \in \{1, \dots, T\}\} \equiv k(\mathcal{M})$$

By the same argument we also have:

$$\mathcal{M} \supseteq \{x \in \mathcal{X} : x_t\beta_0 + \mu_0(x) \in \mathcal{Z} \text{ for some } t \in \{1, \dots, T\}\} \equiv h(\mathcal{Z})$$

In addition, from the definition of $\mu_0$, $\mu_0(x^0) = 0$. Hence $\{x^0_t\beta_0\}_{t=1,\dots,T} \subseteq \mathcal{Z}$, i.e. $\mathcal{Z}$ is non-empty. The maps k and h are increasing with respect to set inclusion, so that $k(h(\mathcal{Z})) \subseteq k(\mathcal{M}) \subseteq \mathcal{Z}$. Therefore $F_2$ is determined by these equations if $S = \mathbb{R}$ is the only non-empty solution to the inclusion $k(h(S)) \subseteq S$ (since we also have $\mathcal{X} = h(\mathbb{R})$). Consider any non-empty set $S \ne \mathbb{R}$. Since $\mu_0$ differs from µ only by a constant, Assumption 3 also holds with $\mu_0$ in place of µ, so by its second part there exists $x \in \mathcal{X}$ such that $x_t\beta_0 + \mu_0(x) \in S$ and $x_{t'}\beta_0 + \mu_0(x) \notin S$ for some $t, t' \in \{1, \dots, T\}$. Since $x_t\beta_0 + \mu_0(x) \in S$, $x \in h(S)$, but then $x_{t'}\beta_0 + \mu_0(x) \in k(h(S))$. Hence $k(h(S)) \not\subseteq S$. $S = \mathbb{R}$ is a solution since $\mathcal{X} = h(\mathbb{R})$ and $k(\mathcal{X}) = \mathbb{R}$ under the first part of Assumption 3. Hence $F_2$, given by $F_2(a, x) = \{\mu_0(x), \psi(a)\}$, is the unique solution to these equations.

⁶ Note that if $g_0$ were known, then ψ(·) would be known, so that $\mu_0(x)$ would be immediately identified, as in Newey (1994) for the case where $u_t$ follows the normal distribution instead of the logistic distribution and $g_0$ is assumed to be the normal probability density function with an unknown variance parameter.

Secondly, we show that for any density function $g_1$ on the real line, there exists no other density function $g_2$ on the real line such that

$$\int \Lambda(a + e) g_1(e)\, de = \int \Lambda(a + e) g_2(e)\, de \quad \forall\, a \in \mathbb{R} \tag{3.2}$$

Note that $\Lambda(a + e) g_1(e) \le g_1(e)$ and that $g_1$ integrates to one. Moreover, $\lambda(a + e) g_1(e) \le g_1(e)/4$, where λ is the probability density function of the standard logistic distribution, so that, by dominated convergence, $\frac{d}{da}\int \Lambda(a + e) g_1(e)\, de = \int \lambda(a + e) g_1(e)\, de$, and the same holds for $g_2$. Therefore, after differentiating with respect to a and changing the variable of integration, (3.2) implies:

$$\int \lambda(a - e) g_1^\star(e)\, de = \int \lambda(a - e) g_2^\star(e)\, de \quad \forall\, a \in \mathbb{R} \tag{3.3}$$

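where $g_1^\star(e) = g_1(-e)$ and $g_2^\star(e) = g_2(-e)$. By the convolution theorem, denoting by $\mathcal{F}[f](a)$ the Fourier transform of f evaluated at a, we then have:

$$\mathcal{F}[\lambda](a)\,\mathcal{F}[g_1^\star](a) = \mathcal{F}[\lambda](a)\,\mathcal{F}[g_2^\star](a) \quad \forall\, a \in \mathbb{R} \tag{3.4}$$

Note that the Fourier transforms are guaranteed to exist here since λ, $g_1^\star$ and $g_2^\star$ are probability density functions, and hence $L^1$-integrable. With the unitary convention $\mathcal{F}[f](a) = (2\pi)^{-1/2}\int f(e)\, e^{-iae}\, de$, the Fourier transform of λ is

$$\mathcal{F}[\lambda](a) = \frac{\sqrt{2\pi}\, a}{\exp(\pi a) - \exp(-\pi a)},$$

with value $1/\sqrt{2\pi}$ at a = 0 by continuity. Since it does not vanish on the real line, (3.4) implies:

$$\mathcal{F}[g_1^\star](a) = \mathcal{F}[g_2^\star](a) \quad \forall\, a \in \mathbb{R} \tag{3.5}$$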
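The non-vanishing Fourier transform of λ is the key ingredient in this step. The following sketch checks the closed form numerically, assuming the unitary convention $\mathcal{F}[f](a) = (2\pi)^{-1/2}\int f(e)e^{-iae}de$, which is consistent with the closed form $\sqrt{2\pi}\,a/(\exp(\pi a) - \exp(-\pi a))$. Since λ is symmetric, its transform is real and equals $(2\pi)^{-1/2}\int \lambda(e)\cos(ae)\,de$. The integration range, step size and function names are arbitrary illustrative choices, and a numerical check complements rather than replaces the analytical statement.

```python
import numpy as np


def logistic_pdf(z):
    """Standard logistic density lambda(z), computed stably."""
    p = np.exp(-np.logaddexp(0.0, -z))
    return p * (1.0 - p)


def ft_logistic_closed_form(a):
    """sqrt(2*pi) * a / (exp(pi*a) - exp(-pi*a)), equal to 1/sqrt(2*pi) at a = 0."""
    a = np.asarray(a, dtype=float)
    safe = np.where(a == 0.0, 1.0, a)  # avoid 0/0 at the origin
    ratio = np.where(a == 0.0, 1.0, np.pi * safe / np.sinh(np.pi * safe))
    return ratio / np.sqrt(2.0 * np.pi)  # algebraically identical to the formula


def ft_logistic_numeric(a, half_width=60.0, step=1e-3):
    """(2*pi)^(-1/2) * integral of lambda(e) * cos(a*e) de, by the trapezoidal rule."""
    e = np.arange(-half_width, half_width + step / 2, step)
    vals = logistic_pdf(e)[None, :] * np.cos(np.outer(np.atleast_1d(a), e))
    integral = step * (vals.sum(axis=1) - 0.5 * (vals[:, 0] + vals[:, -1]))
    return integral / np.sqrt(2.0 * np.pi)


grid = np.linspace(-3.0, 3.0, 13)
max_gap = np.max(np.abs(ft_logistic_numeric(grid) - ft_logistic_closed_form(grid)))
print(f"max |numeric - closed form| = {max_gap:.2e}")
print("transform strictly positive on grid:", bool(np.all(ft_logistic_closed_form(grid) > 0)))
```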

Since characteristic functions are injective (equivalently, the Fourier transform is injective on the space of $L^1$-integrable functions), (3.5) implies that $g_1^\star(a) = g_2^\star(a)$ almost surely, which is equivalent to $g_1(a) = g_2(a)$ almost surely.⁷

Therefore $g_0$ is uniquely determined by knowledge of the function ψ defined by $\psi(a) = \int \Lambda(a + e) g_0(e)\, de$, which was shown in the first part of this proof to be itself uniquely determined by knowledge of $\{E(y_{it} \mid x_i = x) = \int \Lambda(x_t\beta_0 + \mu_0(x) + e) g_0(e)\, de\}_{t=1,\dots,T}$ for all $x \in \mathcal{X}$. Therefore, under Assumptions 1-3, $\mu_0$ and $g_0$ are identified by these conditional expectations, which leads to the desired result since $g(c - \mu(x)) = g_0(c - \mu_0(x))$ and $D_{c_i|x_i=x}(c \mid x) = \int_{e \le c} g(e - \mu(x))\, de$ under Assumption 2.

⁷ Note here that completeness is verified by using the fact that the first derivative of Λ is $L^1$-integrable, that its Fourier transform does not vanish on the real line, and that the unknown function to be identified is a probability density function itself, so that it is also $L^1$-integrable. Newey and Powell (2003) and D'Haultfœuille (2011) study the difficulty of guaranteeing completeness in more general problems. Here the structure of the Logit fixed effects model implies completeness without any additional restrictions on the unknown density function.

4 Identification with a Location and Scale Correlated Random Effects Model

In this section I consider an assumption on the class of distribution functions to which $D_{c_i|x_i}$ belongs that is more flexible than Assumption 2 in the previous section.

Assumption 4 (Location and scale correlated random effects model).

$$D_{c_i|x_i=x}(c) = G\left(\frac{c - \mu(x)}{\sigma(x)}\right) \tag{4.1}$$

where µ and σ are unknown functions and G is an unknown continuous cumulative distribution function whose derivative with respect to the Lebesgue measure, g, exists.

Let $\mathcal{Q}$ be the support of $D_{\sigma(x_i)}$, and let $\mathcal{X}(\sigma_0)$ be the support of $D_{x_i|\sigma(x_i)=\sigma_0}$. The next assumption restricts the support of $x_i$ and the function σ(·).

Assumption 5 (Full and banded conditional support of $x_{it}\beta_0 + \mu(x_i)$). Either σ(x) = 0 a.e. or there is $\sigma_0 \in \mathcal{Q}$, $\sigma_0 > 0$, such that:
1. $\{a : a = x_t\beta_0 + \mu(x),\ x \in \mathcal{X}(\sigma_0)\} = \mathbb{R}$.
2. For any $S_1, S_2 \subset \mathbb{R}$ such that $S_1 \cup S_2 \ne \mathbb{R}$ and there exists $x^1 \in \mathcal{X}(\sigma_0)$ with $(x^1_{t_1}\beta_0 + \mu(x^1),\ x^1_{t_2}\beta_0 + \mu(x^1)) \in S_1 \times S_2$, there exists $x^2 \in \mathcal{X}(\sigma_0)$ with $(x^2_{t_3}\beta_0 + \mu(x^2),\ x^2_{t_4}\beta_0 + \mu(x^2)) \in S_1 \times S_2$ and $x^2_{t_5}\beta_0 + \mu(x^2) \notin S_1 \cup S_2$, where $t_j \in \{1, \dots, T\}$ for j = 1, ..., 5, $t_1 \ne t_2$ and $t_3 \ne t_4$.

Assumption 5 is a stronger restriction on the support of $x_i$ than Assumption 3. The first part of the assumption does not require the distribution of $x_{it}\beta_0 + \mu(x_i)$ to have full support conditional on every value of $\sigma(x_i)$, but it implies that there is a value of $\sigma(x_i)$ for which the conditional distribution of $x_{it}\beta_0 + \mu(x_i)$ has full support. The second part of Assumption 5 extends the second part of Assumption 3 in two directions: it considers the distribution of $x_i$ conditional on a specific value of $\sigma(x_i)$, and it requires that "movers" be defined over three time periods. That is, for any two subsets of $\mathbb{R}$ such that there is a point in the conditional support of $x_i$ with $x_t\beta_0 + \mu(x)$ in each subset at two different time periods, there is a point in the conditional support of $x_i$ for which this is true as well but for which there is also a third time period where $x_t\beta_0 + \mu(x)$ falls outside of both subsets.

For any scalar $d \in \mathbb{R}$ and density function f, define the function $\psi_{d,f} : \mathbb{R} \times \mathbb{R}_+ \to [0, 1]^3$ by:

$$\psi_{d,f}\begin{pmatrix} a \\ \sigma \end{pmatrix} = \begin{pmatrix} \int \Lambda(a + \sigma e) f(e)\, de \\ \int \Lambda(a + d + \sigma e) f(e)\, de \\ \int \Lambda(a + \sigma e)\Lambda(a + d + \sigma e) f(e)\, de \end{pmatrix} \tag{4.2}$$

$\psi_{d,f}$ is defined so that, under Assumptions 1 and 4,

$$\begin{pmatrix} E(y_{it} \mid x_i = x) \\ E(y_{is} \mid x_i = x) \\ E(y_{it} y_{is} \mid x_i = x) \end{pmatrix} = \psi_{(x_s - x_t)\beta_0,\, g}\begin{pmatrix} x_t\beta_0 + \mu(x) \\ \sigma(x) \end{pmatrix} \tag{4.3}$$

for t ≠ s. The next lemma shows that the function $\psi_{d,f}$ is always injective, which will be used both for showing identification of $D_{c_i|x_i}$ in this section and for defining an estimator of average partial effects in Section 6.
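To make (4.2) and the injectivity claim of Lemma 1 concrete, the sketch below evaluates $\psi_{d,f}$ by Gauss-Hermite quadrature and recovers $(a, \sigma)$ from its value with a damped Gauss-Newton iteration on $(a, \log\sigma)$. Taking f to be the standard normal density is an assumption made only so that the integrals can be computed; the lemma itself holds for any density. The helper `invert_psi_df` is hypothetical, and a successful numerical inversion at one point illustrates, but does not prove, injectivity. With σ = 1 the first component is the function ψ of Section 3 (with $g_0 = f$), and the last check confirms that it is strictly increasing in a.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Probabilists' Gauss-Hermite rule: sum(W * h(E)) approximates E[h(e)] for e ~ N(0, 1).
E, W = hermegauss(60)
W = W / np.sqrt(2.0 * np.pi)


def logistic_cdf(z):
    """Lambda(z) = 1 / (1 + exp(-z)), computed stably."""
    return np.exp(-np.logaddexp(0.0, -z))


def psi_df(a, sigma, d):
    """The three components of psi_{d,f}(a, sigma) in (4.2), with f = N(0, 1) density."""
    lam_t = logistic_cdf(a + sigma * E)
    lam_s = logistic_cdf(a + d + sigma * E)
    return np.array([W @ lam_t, W @ lam_s, W @ (lam_t * lam_s)])


def invert_psi_df(target, d, start=(0.0, 0.0), max_iter=100, h=1e-6):
    """Hypothetical helper: solve psi_{d,f}(a, sigma) = target by damped Gauss-Newton.

    Works on theta = (a, log sigma) so that sigma stays positive.
    """
    def resid(theta):
        return psi_df(theta[0], np.exp(theta[1]), d) - target

    theta = np.array(start, dtype=float)
    for _ in range(max_iter):
        r = resid(theta)
        # Central finite-difference Jacobian of the 3 residuals w.r.t. the 2 parameters.
        jac = np.column_stack([
            (resid(theta + h * unit) - resid(theta - h * unit)) / (2.0 * h)
            for unit in np.eye(2)
        ])
        step, *_ = np.linalg.lstsq(jac, -r, rcond=None)
        damp = 1.0
        while damp > 1e-4 and np.linalg.norm(resid(theta + damp * step)) > np.linalg.norm(r):
            damp /= 2.0  # backtrack until the residual does not grow
        theta = theta + damp * step
        if np.max(np.abs(damp * step)) < 1e-13:
            break
    return theta[0], np.exp(theta[1])


# Round trip: evaluate psi at known (a, sigma), then recover them from the 3-vector.
a_true, sigma_true, d = 0.4, 1.3, 0.7
a_hat, sigma_hat = invert_psi_df(psi_df(a_true, sigma_true, d), d)
print(f"a: {a_hat:.8f}  sigma: {sigma_hat:.8f}")

# First component with sigma = 1: the strictly increasing function psi of Section 3.
grid = np.linspace(-3.0, 3.0, 25)
psi_vals = np.array([psi_df(a, 1.0, 0.0)[0] for a in grid])
print("psi strictly increasing on grid:", bool(np.all(np.diff(psi_vals) > 0)))
```

The log-scale parameterization is a design convenience that keeps σ in $\mathbb{R}_+$ without constrained optimization; any root-finder that respects σ > 0 would serve equally well.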

Lemma 1: For any $d \in \mathbb{R}$ and any density function f with respect to the Lebesgue measure, $\psi_{d,f}$ is injective.

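Proof. Consider values (a, σ) and (a', σ') such that:

$$\psi_{d,f}\begin{pmatrix} a \\ \sigma \end{pmatrix} = \psi_{d,f}\begin{pmatrix} a' \\ \sigma' \end{pmatrix} \tag{4.4}$$

If σ' = σ, then a' = a follows immediately from the strict monotonicity in a of $\int \Lambda(a + \sigma e) f(e)\, de$. If σ' > σ, we can derive a contradiction (the case σ' < σ follows by exchanging the roles of (a, σ) and (a', σ')). Indeed, (4.4) implies: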

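$$\int \Lambda(a + \sigma e)\left(\Lambda(a' + d + \sigma' e) - \Lambda(a + d + \sigma e)\right) f(e)\, de + \int \Lambda(a' + d + \sigma' e)\left(\Lambda(a' + \sigma' e) - \Lambda(a + \sigma e)\right) f(e)\, de = 0$$

$$\int \left(\Lambda(a + d + \sigma e) - \Lambda(a' + d + \sigma' e)\right) f(e)\, de = 0$$

$$\int \left(\Lambda(a + \sigma e) - \Lambda(a' + \sigma' e)\right) f(e)\, de = 0$$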

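which then implies (adding to the first equation the second multiplied by $\Lambda\left(a + \sigma\frac{a - a'}{\sigma' - \sigma}\right)$ and the third multiplied by $\Lambda\left(a + d + \sigma\frac{a - a'}{\sigma' - \sigma}\right)$):

$$\int \left(\Lambda(a + \sigma e) - \Lambda\left(a + \sigma\frac{a - a'}{\sigma' - \sigma}\right)\right)\left(\Lambda(a' + d + \sigma' e) - \Lambda(a + d + \sigma e)\right) f(e)\, de + \int \left(\Lambda(a' + d + \sigma' e) - \Lambda\left(a + d + \sigma\frac{a - a'}{\sigma' - \sigma}\right)\right)\left(\Lambda(a' + \sigma' e) - \Lambda(a + \sigma e)\right) f(e)\, de = 0$$

We have:

$$\left(\Lambda(a + \sigma e) - \Lambda\left(a + \sigma\frac{a - a'}{\sigma' - \sigma}\right)\right)\left(\Lambda(a' + d + \sigma' e) - \Lambda(a + d + \sigma e)\right) \ge 0 \tag{4.5}$$

$$\left(\Lambda(a' + d + \sigma' e) - \Lambda\left(a + d + \sigma\frac{a - a'}{\sigma' - \sigma}\right)\right)\left(\Lambda(a' + \sigma' e) - \Lambda(a + \sigma e)\right) \ge 0 \tag{4.6}$$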

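To verify (4.5) and (4.6) and conclude, let $e^* = \frac{a - a'}{\sigma' - \sigma}$, the value of e at which $a + \sigma e = a' + \sigma' e$. Since $\sigma' > \sigma \ge 0$, the factors $\Lambda(a' + d + \sigma' e) - \Lambda(a + d + \sigma e)$ and $\Lambda(a' + \sigma' e) - \Lambda(a + \sigma e)$ both have the sign of $e - e^*$, as does $\Lambda(a' + d + \sigma' e) - \Lambda(a + d + \sigma e^*)$ (note that $a + d + \sigma e^* = a' + d + \sigma' e^*$), while $\Lambda(a + \sigma e) - \Lambda(a + \sigma e^*)$ has the sign of $e - e^*$ when σ > 0 and is identically zero when σ = 0. Hence, for any values of a, σ, a', σ', d, the function of e defined by the left-hand side of (4.5) is non-negative, and the function of e defined by the left-hand side of (4.6) is strictly positive for all $e \ne e^*$. As long as f puts positive mass on $\mathbb{R} \setminus \{e^*\}$, which is guaranteed by f being a density function with respect to the Lebesgue measure, we therefore have:

$$\int \left(\Lambda(a + \sigma e) - \Lambda(a + \sigma e^*)\right)\left(\Lambda(a' + d + \sigma' e) - \Lambda(a + d + \sigma e)\right) f(e)\, de \ge 0$$

$$\int \left(\Lambda(a' + d + \sigma' e) - \Lambda(a + d + \sigma e^*)\right)\left(\Lambda(a' + \sigma' e) - \Lambda(a + \sigma e)\right) f(e)\, de > 0$$

and hence the sum of these two integrals is strictly positive, which contradicts the equality implied by (4.4).

Result 2: Under Assumptions 1, 4, and 5, there is a unique function $F : \mathbb{R} \times \mathcal{X} \to [0, 1]$, given by $F(c, x) = D_{c_i|x_i=x}(c \mid x)$, that satisfies $\{E(y_{it} \mid x_i = x) = \int \Lambda(x_t\beta_0 + c)\, dF(c, x),\ E(y_{it} y_{is} \mid x_i = x) = \int \Lambda(x_t\beta_0 + c)\Lambda(x_s\beta_0 + c)\, dF(c, x)\}_{t, s = 1, \dots, T,\ t \ne s}$ for all $x \in \mathcal{X}$.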

Proof. Under Assumptions 1 and 4, for any $t, t' \in \{1, \dots, T\}$ with $t \ne t'$, and $x \in \mathcal{X}$:

$$\begin{pmatrix} E(y_{it} \mid x_i = x) \\ E(y_{it'} \mid x_i = x) \\ E(y_{it} y_{it'} \mid x_i = x) \end{pmatrix} = \psi_{(x_{t'} - x_t)\beta_0,\, g}\begin{pmatrix} x_t\beta_0 + \mu(x) \\ \sigma(x) \end{pmatrix} \tag{4.7}$$

For any set $S \subset \mathcal{X}$, define the function h by:

$$h(S) = \left\{x \in \mathcal{X} : \exists\, x' \in S,\ t, t' \in \{1, \dots, T\},\ t \ne t',\ \text{with} \begin{pmatrix} E(y_{it} \mid x_i = x') \\ E(y_{it'} \mid x_i = x') \\ E(y_{it} y_{it'} \mid x_i = x') \end{pmatrix} = \begin{pmatrix} E(y_{it} \mid x_i = x) \\ E(y_{it'} \mid x_i = x) \\ E(y_{it} y_{it'} \mid x_i = x) \end{pmatrix}\right\} \tag{4.8}$$

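Define the sequence of sets $S_k(x) = h(S_{k-1}(x))$, $S_1(x) = \{x\}$. By the injectivity of $\psi_{d,f}$, note that $\sigma(x') = \sigma(x)$ for all $x' \in S_k(x)$. Since $S_k(x)$ forms an increasing sequence of sets, its limit exists; define $S(x) = \lim_{k\to\infty} S_k(x)$. Consider a point $x^0 \in \mathcal{X}$ such that $\{E(y_{it} \mid x_i = x),\ x \in S(x^0),\ t = 1, \dots, T\} = (0, 1)$ and $E(y_{it} y_{is} \mid x_i = x) \ne E(y_{it} \mid x_i = x) E(y_{is} \mid x_i = x)$ for all $x \in S(x^0)$ and $t \ne s$, so that $\{x_t\beta_0 + \mu(x),\ x \in S(x^0),\ t = 1, \dots, T\} = \mathbb{R}$ and $\sigma(x^0) > 0$. Assumption 5 implies that such a point $x^0$ exists. Define $\sigma_0(x) \equiv \sigma(x)/\sigma(x^0)$ and $\mu_0(x) \equiv \mu(x) - \sigma_0(x)\mu(x^0)$ for $x \in \mathcal{X}$, so that $\mu_0(x) = \mu(x) - \mu(x^0)$ for $x \in S(x^0)$, where $\sigma(x) = \sigma(x^0)$, and define $g_0(e) = \frac{1}{\sigma(x^0)} g\left(\frac{e - \mu(x^0)}{\sigma(x^0)}\right)$ for $e \in \mathbb{R}$. Using Result 1 from the previous section, there is a unique function $F_1 : \mathbb{R} \times \mathcal{X} \to \mathbb{R} \times \mathbb{R}_+$, given by $F_1(e, x) = \{\mu_0(x), g_0(e)\}$, such that $\{E(y_{it} \mid x_i = x) = \int \Lambda(x_t\beta_0 + F_{1[1]}(x) + e) F_{1[2]}(e)\, de\}_{t=1,\dots,T}$ for all $x \in S(x^0)$, where $f_{[j]}$ denotes the jth element of a function f with codomain in $\mathbb{R}^k$. We can rewrite:

$$\psi_{(x_{t'} - x_t)\beta_0,\, g}\begin{pmatrix} x_t\beta_0 + \mu(x) \\ \sigma(x) \end{pmatrix} = \psi_{(x_{t'} - x_t)\beta_0,\, g_0}\begin{pmatrix} x_t\beta_0 + \mu_0(x) \\ \sigma_0(x) \end{pmatrix} \tag{4.9}$$

Since $\psi_{(x_{t'} - x_t)\beta_0,\, g_0}$ is injective and $g_0$ was shown to be determined by $\{E(y_{it} \mid x_i = x) = \int \Lambda(x_t\beta_0 + F_{1[1]}(x) + e) F_{1[2]}(e)\, de\}_{t=1,\dots,T}$ for all $x \in S(x^0)$, $\mu_0(x)$ and $\sigma_0(x)$ are also determined for all $x \in \mathcal{X}$ by $\{E(y_{it} \mid x_i = x) = \int \Lambda(x_t\beta_0 + c)\, dF(c, x),\ E(y_{it} y_{is} \mid x_i = x) = \int \Lambda(x_t\beta_0 + c)\Lambda(x_s\beta_0 + c)\, dF(c, x)\}_{t \ne s}$. Therefore, under Assumptions 1, 4, and 5, the result is shown since

$$\frac{1}{\sigma(x)}\, g\left(\frac{e - \mu(x)}{\sigma(x)}\right) = \frac{1}{\sigma_0(x)}\, g_0\left(\frac{e - \mu_0(x)}{\sigma_0(x)}\right).$$

5 Extension to the Ordered Logit Fixed Effects Model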

The Logit fixed effects model can be extended to the case where, instead of being binary, y takes a finite number of values, say {1, ..., K}, that have a natural order. A model of the effect of x on y at a given time period t = 1, ..., T could then be given by:

$$\mathbb{1}[y_t = k] = \mathbb{1}[\tau_{k-1} \le x_t\beta_0 + c + u_t < \tau_k] \tag{5.1}$$

where $\tau_0 = -\infty$, $\tau_1 = 0$, $\tau_K = +\infty$, and $\{\tau_j\}_{j=2,\dots,K-1}$ are additional parameters of the model satisfying $\tau_j > \tau_{j-1}$. As for the binary model, the ordered Logit fixed effects model assumes random sampling in the cross-section, $T \ge 2$, and that $u_t$ is serially independent, follows a logistic distribution, and is independent of $\{x_1, \dots, x_T\}$ and c, but it does not restrict the dependence of c on x. Winkelmann and Winkelmann (1998), Das and van Soest (1999), and Baetschmann et al. (2015) showed that $\beta_0$ can be estimated consistently under large-n, fixed-T asymptotics, since for any category j the model for the binary variable $\mathbb{1}[y_t \le j]$ conditional on observed covariates and unobserved heterogeneity is a Logit fixed effects model (with the sign of the coefficients reversed). Muris (2015) showed that $\{\tau_j\}_{j=2,\dots,K-1}$ can be estimated consistently as well, since the binary variable $\mathbb{1}[y_t \le j_t]$, where $j_t \in \{1, \dots, K\}$ changes over time, follows a Logit fixed effects model in which $\{\tau_j\}_{j=2,\dots,K-1}$ can be recovered from changes in the coefficients on indicator variables for each time period. For this model, partial effects are given by the changes in the conditional probabilities of y falling in each of the K categories due to a change in the covariates:

$$PE_{k,it} = \beta_0\left(\lambda(x_{it}\beta_0 + c_i - \tau_{k-1}) - \lambda(x_{it}\beta_0 + c_i - \tau_k)\right), \quad k = 1, \dots, K \tag{5.2}$$

Hence, as in the previous sections, the distribution of partial effects is known if the distribution of unobserved heterogeneity c is known in addition to $\beta_0$ and $\{\tau_k\}_{k=2,\dots,K-1}$. Since the binary variable $\mathbb{1}[y_{it} \le j]$ follows a Logit fixed effects model, $D_{c|x}$ is identified for the ordered Logit fixed effects model under the same assumptions as for the binary Logit fixed effects model in the two previous sections.
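For a single continuous covariate, the category probabilities implied by (5.1) and the partial effects in (5.2) can be computed directly for given values of $\beta_0$, c and the thresholds. The sketch below does this with hypothetical parameter values and checks (5.2) against a finite-difference derivative of the category probabilities; in applications c would be integrated over its conditional distribution, as in Section 2. With K = 2, category 2 corresponds to y = 1 in the binary model (1.1).

```python
import numpy as np


def logistic_cdf(z):
    """Lambda(z), computed stably; also valid at z = +/- inf."""
    return np.exp(-np.logaddexp(0.0, -z))


def logistic_pdf(z):
    """lambda(z) = Lambda(z) * (1 - Lambda(z))."""
    p = logistic_cdf(z)
    return p * (1.0 - p)


def cutpoints(tau_inner):
    """(tau_0, ..., tau_K) with tau_0 = -inf, tau_1 = 0, tau_K = +inf, as in (5.1)."""
    return np.concatenate(([-np.inf, 0.0], np.asarray(tau_inner, dtype=float), [np.inf]))


def category_probs(index, tau_inner):
    """P(y_t = k | x, c) = Lambda(tau_k - index) - Lambda(tau_{k-1} - index), k = 1..K."""
    return np.diff(logistic_cdf(cutpoints(tau_inner) - index))


def ordered_partial_effects(beta0, index, tau_inner):
    """PE_k = beta0 * (lambda(index - tau_{k-1}) - lambda(index - tau_k)), as in (5.2)."""
    lam = logistic_pdf(index - cutpoints(tau_inner))
    return beta0 * (lam[:-1] - lam[1:])


# Hypothetical values: scalar covariate, K = 4 categories (tau_2 = 1.0, tau_3 = 2.5).
beta0, x_t, c, tau_inner = 0.8, 0.5, -0.2, [1.0, 2.5]
index = x_t * beta0 + c
probs = category_probs(index, tau_inner)
pe = ordered_partial_effects(beta0, index, tau_inner)
print("P(y = k):", np.round(probs, 4))
print("PE_k:", np.round(pe, 4))
```

Because the category probabilities always sum to one, the K partial effects sum to zero: an increase in the index shifts probability mass between categories without changing the total.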

6 Estimation of Average Partial Effects

I do not consider fully non-parametric estimation of the distribution of partial effects here.⁸ Instead, in this section I show that some of the results used in the proofs of Results 1 and 2 above can be used to define relatively simple estimators for average partial effects. The object of interest in this section is the average partial effect in time period t:⁹

$$APE_t = \beta_0\, E(\lambda(x_{it}\beta_0 + c_i)) \tag{6.1}$$

Recall the function ψ from the proof of Result 1 in Section 3, written here in terms of g:

$$\psi(a) = \int \Lambda(a + e) g(e)\, de \tag{6.2}$$

Under Assumptions 1 and 2, we have:

$$E(y_{it} \mid x_i) = \psi(x_{it}\beta_0 + \mu(x_i)) \tag{6.3}$$

and:

$$APE_t = \beta_0\, E(\psi'(x_{it}\beta_0 + \mu(x_i))) \tag{6.4}$$

As noted in Section 3, ψ is strictly increasing, so that its inverse exists. For some s 6= t, define a1,i = (xit − xis )β0 and a2,i = E(yis |xi = x). Then under Assumptions 1 and 2: E(yit |a1,i , a2,i ) = ψ(a1,i + ψ −1 (a2,i ))

(6.5)

8 In addition to considering non-parametric estimation of the distribution of partial effects, one could also use the non-parametric identification results of Sections 3 and 4 in a heuristic argument to justify a flexible parametric estimator, as was invoked for example in Lewbel and Pendakur (2016) in a different context. 9 Average partial effects conditional on xi = x, for some value x ∈ X , can be estimated in a similar way as what is outlined in this section. I consider unconditional average partial effects here as it is the quantity that will be estimated in the empirical example of Section 7.


so that:

$$APE_t = \beta_0 \, E\left(\frac{\partial E(y_{it} \mid a_{1,i}, a_{2,i})}{\partial a_{1,i}}\right) \qquad (6.6)$$
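Equations (6.5) and (6.6) can be checked for a single observation. The sketch below again assumes, for illustration only, that $g$ is standard normal, and the covariate values and $\mu(x_i)$ are hypothetical numbers. It does three things:

1. It inverts the strictly increasing $\psi$ by bisection.
2. It confirms that $\psi(a_{1,i} + \psi^{-1}(a_{2,i}))$ reproduces $E(y_{it} \mid x_i) = \psi(x_{it}\beta_0 + \mu(x_i))$.
3. It confirms by central finite differences that the derivative with respect to $a_{1,i}$ equals $\psi'(x_{it}\beta_0 + \mu(x_i))$, the term averaged in (6.4).

```python
import numpy as np

def Lam(z):
    """Logistic CDF Lambda(z), written with tanh to avoid overflow."""
    return 0.5 * (1.0 + np.tanh(0.5 * np.asarray(z, dtype=float)))

NODES, WEIGHTS = np.polynomial.hermite_e.hermegauss(60)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)      # E[f(Z)] for Z ~ N(0, 1)

def psi(a):
    """psi(a) with g = N(0, 1) (illustrative assumption)."""
    a = np.asarray(a, dtype=float)[..., None]
    return (Lam(a + NODES) * WEIGHTS).sum(axis=-1)

def psi_prime(a):
    """psi'(a) = int lambda(a + e) g(e) de."""
    a = np.asarray(a, dtype=float)[..., None]
    L = Lam(a + NODES)
    return (L * (1.0 - L) * WEIGHTS).sum(axis=-1)

def psi_inv(p, lo=-30.0, hi=30.0, iters=200):
    """Invert the strictly increasing psi by (vectorized) bisection."""
    p = np.asarray(p, dtype=float)
    lo = np.full(p.shape, lo)
    hi = np.full(p.shape, hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        below = psi(mid) < p
        lo = np.where(below, mid, lo)
        hi = np.where(below, hi, mid)
    return 0.5 * (lo + hi)

# Hypothetical values for one unit i
beta0, x_t, x_s, mu_x = 1.0, 0.4, -0.3, 0.2
a1 = (x_t - x_s) * beta0                  # a_{1,i}
a2 = psi(x_s * beta0 + mu_x)              # a_{2,i} = E(y_is | x_i), by (6.3)

def cond_mean(a1):
    """Right-hand side of (6.5) as a function of a_{1,i}."""
    return psi(a1 + psi_inv(a2))

h = 1e-5
deriv = (cond_mean(a1 + h) - cond_mean(a1 - h)) / (2 * h)
print(float(cond_mean(a1)), float(psi(x_t * beta0 + mu_x)))
print(float(deriv), float(psi_prime(x_t * beta0 + mu_x)))
```

In an actual application $a_{2,i}$ would come from a non-parametric regression, and the conditional expectation in (6.6) would be estimated rather than computed from a known $\psi$.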

Hence, from this expression, under Assumptions 1 and 2, if $E(y_{is} \mid x_i)$ and $\beta_0$ were known, $APE_t$ could be estimated by the product of $\beta_0$ and the sample average of a non-parametric estimator of the partial derivative of a conditional expectation. A feasible estimator is obtained by replacing $\beta_0$ with its Logit fixed effects estimator and $E(y_{is} \mid x_i)$ with a non-parametric estimator. The asymptotic properties of the resulting estimator can be derived as in Hahn and Ridder (2013). The details are outside the scope of this paper and are not presented here.

Similar results hold under Assumptions 1 and 4 instead of Assumptions 1 and 2. Recall the definition in Section 3 of the function:

$$\psi_{d,f}\begin{pmatrix} a \\ \sigma \end{pmatrix} = \begin{pmatrix} \int \Lambda(a + \sigma e)\, f(e)\, de \\ \int \Lambda(a + d + \sigma e)\, f(e)\, de \\ \int \Lambda(a + \sigma e)\,\Lambda(a + d + \sigma e)\, f(e)\, de \end{pmatrix} \qquad (6.7)$$

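Under Assumptions 1 and 4, for $t \neq s$ and $x \in \mathcal{X}$:

$$\begin{pmatrix} E(y_{it} \mid x_i = x) \\ E(y_{is} \mid x_i = x) \\ E(y_{it}\, y_{is} \mid x_i = x) \end{pmatrix} = \psi_{(x_s - x_t)\beta_0,\, g}\begin{pmatrix} x_t\beta_0 + \mu(x) \\ \sigma(x) \end{pmatrix} \qquad (6.8)$$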
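The vector function (6.7) is easy to evaluate numerically, and a simulation makes the structure of (6.8) concrete. The sketch below assumes $f = g$ is the standard normal density and uses hypothetical values of $\beta_0$, $x_t$, $x_s$, $\mu(x)$ and $\sigma(x)$. It simulates $c_i = \mu(x) + \sigma(x) e_i$ together with two Logit outcomes that are independent given $c_i$. It then compares the simulated moments with $\psi_{d,f}$ evaluated at $a = x_t\beta_0 + \mu(x)$, $\sigma = \sigma(x)$ and $d = (x_s - x_t)\beta_0$. This choice of $d$ turns the second element into $E(y_{is} \mid x_i = x)$.

```python
import numpy as np

def Lam(z):
    """Logistic CDF Lambda(z), written with tanh to avoid overflow."""
    return 0.5 * (1.0 + np.tanh(0.5 * np.asarray(z, dtype=float)))

NODES, WEIGHTS = np.polynomial.hermite_e.hermegauss(60)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)      # E[f(Z)] for Z ~ N(0, 1)

def psi_df(a, sigma, d):
    """The three elements of psi_{d,f}(a, sigma) in (6.7), with f = N(0, 1) (assumption)."""
    p1 = Lam(a + sigma * NODES)
    p2 = Lam(a + d + sigma * NODES)
    return np.array([(p1 * WEIGHTS).sum(),
                     (p2 * WEIGHTS).sum(),
                     (p1 * p2 * WEIGHTS).sum()])

# Hypothetical covariate history and conditional location/scale of c given x_i = x
beta0, x_t, x_s = 1.0, 0.4, -0.3
mu_x, sigma_x = 0.2, 1.5

rng = np.random.default_rng(1)
M = 400_000
c = mu_x + sigma_x * rng.standard_normal(M)    # draws of c_i given x_i = x
y_t = rng.random(M) < Lam(x_t * beta0 + c)     # Logit outcome in period t
y_s = rng.random(M) < Lam(x_s * beta0 + c)     # independent of y_t given c_i
simulated = np.array([y_t.mean(), y_s.mean(), (y_t & y_s).mean()])

formula = psi_df(x_t * beta0 + mu_x, sigma_x, (x_s - x_t) * beta0)
print(np.round(simulated, 3), np.round(formula, 3))
```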

Lemma 1 shows that $\psi_{d,f}$ is also injective. Denote by $\psi^{-1}_{d,f[1]}$ the first element of its inverse and by $\psi^{-1}_{d,f[2]}$ the second element of its inverse.

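For $s_1, s_2 \neq t$ with $s_1 \neq s_2$, define:

- $a_{1,i} = (x_{it} - x_{is_1})\beta_0$ and $a_{2,i} = (x_{is_2} - x_{is_1})\beta_0$;
- $a_{3,i} = E(y_{is_1} \mid x_i)$, $a_{4,i} = E(y_{is_2} \mid x_i)$ and $a_{5,i} = E(y_{is_1} y_{is_2} \mid x_i)$;
- $a_i = \{a_{1,i}, a_{2,i}, a_{3,i}, a_{4,i}, a_{5,i}\}$.

Under Assumptions 1 and 4 we have:

$$E(y_{it} \mid a_i) = \int \Lambda\left(a_{1,i} + \psi^{-1}_{a_{2,i},g[1]}(a_{3,i}, a_{4,i}, a_{5,i}) + \psi^{-1}_{a_{2,i},g[2]}(a_{3,i}, a_{4,i}, a_{5,i})\, e\right) g(e)\, de \qquad (6.9)$$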

so that:

$$APE_t = \beta_0 \, E\left(\frac{\partial E(y_{it} \mid a_i)}{\partial a_{1,i}}\right) \qquad (6.10)$$
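The inverse used in (6.9) can be computed numerically. Here is a minimal sketch, again assuming $f = g$ is the standard normal density and using hypothetical values for one covariate history. The steps are:

1. Map $(x_{s_1}\beta_0 + \mu(x), \sigma(x))$ into the moments $(a_{3,i}, a_{4,i}, a_{5,i})$, with $d = a_{2,i}$.
2. Recover both arguments with a damped Gauss–Newton iteration. This is a generic root-finding choice, not one prescribed by the paper.
3. Check that (6.9) reproduces $E(y_{it} \mid x_i)$, and that its derivative with respect to $a_{1,i}$ equals $\int \lambda(x_t\beta_0 + \mu(x) + \sigma(x)e)\, g(e)\, de$. This is the quantity whose average, multiplied by $\beta_0$, gives (6.10).

```python
import numpy as np

def Lam(z):
    """Logistic CDF Lambda(z), written with tanh to avoid overflow."""
    return 0.5 * (1.0 + np.tanh(0.5 * np.asarray(z, dtype=float)))

NODES, WEIGHTS = np.polynomial.hermite_e.hermegauss(60)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)      # E[f(Z)] for Z ~ N(0, 1)

def psi_df(a, sigma, d):
    """psi_{d,f}(a, sigma) from (6.7) with f = N(0, 1) (assumption)."""
    p1 = Lam(a + sigma * NODES)
    p2 = Lam(a + d + sigma * NODES)
    return np.array([(p1 * WEIGHTS).sum(), (p2 * WEIGHTS).sum(), (p1 * p2 * WEIGHTS).sum()])

def invert_psi_df(m, d, iters=60, h=1e-6):
    """Solve psi_{d,f}(a, sigma) = m for (a, sigma) by damped Gauss-Newton.

    Works with theta = (a, log sigma) so that sigma stays positive.
    """
    def resid(th):
        return psi_df(th[0], np.exp(th[1]), d) - m

    theta = np.zeros(2)                               # start at a = 0, sigma = 1
    for _ in range(iters):
        r = resid(theta)
        J = np.column_stack([(resid(theta + h * e) - resid(theta - h * e)) / (2 * h)
                             for e in np.eye(2)])     # numerical 3 x 2 Jacobian
        step = np.linalg.lstsq(J, -r, rcond=None)[0]
        t = 1.0
        while t > 1e-4 and np.sum(resid(theta + t * step) ** 2) > np.sum(r ** 2):
            t *= 0.5                                  # backtrack if the fit gets worse
        theta = theta + t * step
    return theta[0], np.exp(theta[1])

# Hypothetical values for one unit: periods t, s1, s2 and location/scale of c given x
beta0, x_t, x_s1, x_s2 = 1.0, 0.4, -0.3, 0.5
mu_x, sigma_x = 0.2, 1.5
a1 = (x_t - x_s1) * beta0
a2 = (x_s2 - x_s1) * beta0
a345 = psi_df(x_s1 * beta0 + mu_x, sigma_x, a2)      # (a3, a4, a5) implied by the model

def cond_mean(a1):
    """Right-hand side of (6.9) as a function of a_{1,i}."""
    loc, scale = invert_psi_df(a345, a2)
    return (Lam(a1 + loc + scale * NODES) * WEIGHTS).sum()

loc_hat, scale_hat = invert_psi_df(a345, a2)
target = Lam(x_t * beta0 + mu_x + sigma_x * NODES)
h = 1e-5
deriv = (cond_mean(a1 + h) - cond_mean(a1 - h)) / (2 * h)
print(loc_hat, scale_hat)
print(cond_mean(a1), (target * WEIGHTS).sum())
print(deriv, (target * (1 - target) * WEIGHTS).sum())
```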

Therefore, as before, an estimator can be obtained in three steps:

1. Logit fixed effects estimation of $\beta_0$.
2. Non-parametric estimation of $E(y_{is_1} \mid x_i)$, $E(y_{is_2} \mid x_i)$ and $E(y_{is_1} y_{is_2} \mid x_i)$.
3. Averaging of a non-parametric estimator of $\partial E(y_{it} \mid a_i)/\partial a_{1,i}$, obtained by replacing $\beta_0$, $E(y_{is_1} \mid x_i)$, $E(y_{is_2} \mid x_i)$ and $E(y_{is_1} y_{is_2} \mid x_i)$ with their estimated counterparts.

The estimator proposed under the more flexible Assumption 4 is clearly much less parsimonious than the estimator proposed under Assumption 2, and the sample size should be taken into account when implementing either of them. As in Section 5, both estimators extend easily to the ordered Logit fixed effects model.
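The three-step procedure for the simpler estimator in (6.6), the one implemented in Section 7, can be sketched in a simulation. Everything here is an illustrative assumption, not the author's STATA implementation:

- $T = 2$, one standard normal regressor and $\beta_0 = 1$;
- $\mu(x_i) = (x_{i1} + x_{i2})/2$ and $g$ standard normal;
- a quadratic polynomial for $E(y_{is} \mid x_i)$;
- a cubic polynomial in $(a_{1,i}, a_{2,i})$ for the second non-parametric step.

With $T = 2$, the Logit fixed effects estimator reduces to a Logit regression of $y_{i2}$ on $x_{i2} - x_{i1}$, without intercept, among units whose outcome changes. The result is compared with the infeasible average of $\beta_0 \lambda(x_{it}\beta_0 + c_i)$, which uses the simulated $c_i$.

```python
import numpy as np

def Lam(z):
    """Logistic CDF Lambda(z), written with tanh to avoid overflow."""
    return 0.5 * (1.0 + np.tanh(0.5 * np.asarray(z, dtype=float)))

def cond_logit_T2(y, x, iters=25):
    """Logit fixed effects (conditional ML) estimate of a scalar beta when T = 2.

    Only units with y_i1 + y_i2 = 1 contribute, and for them
    P(y_i2 = 1 | y_i1 + y_i2 = 1, x_i, c_i) = Lambda((x_i2 - x_i1) * beta).
    """
    switch = y.sum(axis=1) == 1
    dx, y2 = x[switch, 1] - x[switch, 0], y[switch, 1]
    b = 0.0
    for _ in range(iters):                      # Newton-Raphson on a concave log-likelihood
        p = Lam(dx * b)
        b += np.sum((y2 - p) * dx) / np.sum(p * (1.0 - p) * dx ** 2)
    return b

def poly2d(u, v, deg):
    """Monomials u^i v^j with i + j <= deg, including the constant."""
    return np.column_stack([u ** i * v ** j for i in range(deg + 1) for j in range(deg + 1 - i)])

def dpoly2d_du(u, v, deg):
    """Derivative of each column of poly2d with respect to u."""
    return np.column_stack([i * u ** max(i - 1, 0) * v ** j
                            for i in range(deg + 1) for j in range(deg + 1 - i)])

def ols_fit(X, y):
    """Least-squares coefficients from a regression of y on X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Hypothetical data-generating process satisfying Assumptions 1 and 2 (illustration only)
rng = np.random.default_rng(42)
N, beta0 = 40_000, 1.0
x = rng.standard_normal((N, 2))
mu = 0.5 * x.sum(axis=1)                        # hypothetical mu(x_i)
c = mu + rng.standard_normal(N)                 # c_i = mu(x_i) + e_i, e_i ~ g = N(0, 1)
y = (rng.random((N, 2)) < Lam(x * beta0 + c[:, None])).astype(float)

b_hat = cond_logit_T2(y, x)                     # step 1: Logit fixed effects estimate of beta0
Xs = poly2d(x[:, 0], x[:, 1], 2)                # regressors for E(y_is | x_i)
ape_by_period = []
for t, s in [(0, 1), (1, 0)]:                   # s is the other period
    a2 = Xs @ ols_fit(Xs, y[:, s])              # step 2: polynomial estimate of E(y_is | x_i)
    a1 = (x[:, t] - x[:, s]) * b_hat
    coef = ols_fit(poly2d(a1, a2, 3), y[:, t])  # step 3: polynomial fit of E(y_it | a1, a2)
    ape_by_period.append(b_hat * (dpoly2d_du(a1, a2, 3) @ coef).mean())  # analogue of (6.6)

ape_hat = np.mean(ape_by_period)
L = Lam(x * beta0 + c[:, None])
ape_infeasible = beta0 * np.mean(L * (1.0 - L))  # uses the simulated c_i, unavailable in practice
print(round(b_hat, 3), round(ape_hat, 4), round(ape_infeasible, 4))
```

Averaging over both periods mirrors the empirical example, where partial effects are averaged over time and $s$ is taken to be an adjacent period. Standard errors would require the bootstrap or the generated-regressor corrections discussed above; they are not computed here.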

7 Empirical Example: Labor Force Participation

The empirical example discussed here is taken from Section 15.8.3 of Wooldridge (2010). The object of interest is the effect of the number of children and of husband's income on women's labor force participation. The dataset, constructed by Chay and Hyslop (2001), consists of 5,663 women observed over five four-month periods.

The response variable is an indicator for labor force participation. The observed covariates are a set of time period indicators, the number of children under the age of eighteen and the logarithm of husband's income. Because the sample is only moderately large, I implement only the estimator corresponding to formula (6.6), which is valid under Assumptions 1 and 2.

I average partial effects over time in order to compare the results of the estimator proposed in this paper with those presented in Wooldridge (2010). For $t = 2, \ldots, T$, I set $s = t - 1$ to obtain the control variables in the second non-parametric step, and for $t = 1$ I set $s = t + 1$.¹⁰ Standard errors for this estimator are obtained by bootstrap at the cross-sectional observation level.

Polynomial regression was used for both non-parametric steps of the estimator, and Table 1 reports results for different polynomial degrees in each step. For simplicity, both covariates of interest were treated as continuous, although the number of children could also have been treated as discrete. The STATA code used to calculate these results, as well as the dataset, are available on the author's website.

For comparison, Table 2 reports estimation results for the linear fixed effects estimator and the parametric correlated random effects Probit estimator, taken from Table 15.3 of Wooldridge (2010). These results are fairly close to those shown in Table 1. The reported average partial effects imply two things:

- Having one more child decreases the probability of working by roughly .03 to .04, depending on the estimation method.
- A ten percent increase in husband's income decreases the probability of working by around .0009, or about .09 percentage points. The average partial effect of lhinc, around −.009, applies to a one-unit change in log income, and a ten percent increase raises log income by about .1.

¹⁰ Other choices of $s$ for each $t$ could be used to estimate the same parameters. These additional choices could be exploited for efficiency gains or for testing over-identifying restrictions, which is not considered here for simplicity.
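The scaling behind this interpretation can be made explicit. Because lhinc is the logarithm of husband's income, its average partial effect applies to a one-unit change in log income, and a ten percent increase raises log income by $\ln(1.1) \approx .095$. The arithmetic below uses the point estimates in the APE (2,2) column of Table 1 purely as an example. It is a linear approximation that ignores curvature over the change.

```python
import math

# Average partial effects from Table 1, column APE (2,2)
ape_kids = -0.030    # per additional child
ape_lhinc = -0.009   # per one-unit change in log husband's income

effect_one_child = ape_kids * 1.0
delta_log_income = math.log(1.10)          # a 10% increase raises log income by about 0.095
effect_ten_pct = ape_lhinc * delta_log_income

print(f"one more child:            {effect_one_child:+.4f}")
print(f"10% higher husband income: {effect_ten_pct:+.5f} "
      f"({100 * effect_ten_pct:+.3f} percentage points)")
```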

8 Conclusion

The results above show that the distribution of partial effects is identified in the Logit fixed effects model under relatively flexible restrictions on the distribution of unobserved heterogeneity conditional on the observed covariates and on the support of the covariates. A relatively simple estimator for average partial effects is also proposed. Mapping the estimated conditional distribution of the observed data to the conditional distribution of partial effects in a fully non-parametric way, as in Newey and Powell (2003) and subsequent papers, is left for future work.

The results presented in this note only require the number of time periods to exceed two or three, since they are intended for typical applications of Logit fixed effects with a small number of time periods. More flexible restrictions on the conditional distribution of unobserved heterogeneity than those studied here could possibly be used when more time periods are available. The identification results and the estimation methods proposed here can also likely be extended to other linear index models for panel data for which consistent estimators of the index coefficients exist. However, the specific structure of the Logit fixed effects model was used here to show that the completeness condition holds in the proof of identification. Extending these results to dynamic models in which the covariates are not strictly exogenous would also be of interest.

References

Altonji, J. G. and R. L. Matzkin (2005, July). Cross Section and Panel Data Estimators for Nonseparable Models with Endogenous Regressors. Econometrica 73 (4), 1053–1102.

Arellano, M. and S. Bonhomme (2011). Nonlinear Panel Data Analysis. Annual Review of Economics 3 (1), 395–424.

Baetschmann, G., K. E. Staub, and R. Winkelmann (2015, June). Consistent estimation of the fixed effects ordered logit model. Journal of the Royal Statistical Society: Series A (Statistics in Society) 178 (3), 685–703.

Bester, C. A. and C. Hansen (2009, April). Identification of Marginal Effects in a Nonparametric Correlated Random Effects Model. Journal of Business & Economic Statistics 27 (2), 235–250.

Browning, M. and J. M. Carro (2014, February). Dynamic binary outcome models with maximal heterogeneity. Journal of Econometrics 178 (2), 805–823.

Chamberlain, G. (1984). Panel Data. In Z. Griliches and M. D. Intriligator (Eds.), Handbook of Econometrics, pp. 1247–1313. Elsevier, Amsterdam.

Chay, K. Y. and D. Hyslop (2001, May). Identification and Estimation of Dynamic Binary Response Models: Empirical Evidence Using Alternative Approaches. SSRN Scholarly Paper ID 1532468, Social Science Research Network, Rochester, NY.

Chen, S. and S. Khan (2001, June). Semiparametric Estimation of a Partially Linear Censored Regression Model. Econometric Theory 17 (3), 567–590.

Chernozhukov, V., I. Fernández-Val, J. Hahn, and W. Newey (2013, March). Average and Quantile Effects in Nonseparable Panel Models. Econometrica 81 (2), 535–580.

Chernozhukov, V., I. Fernández-Val, S. Hoderlein, H. Holzmann, and W. Newey (2015, October). Nonparametric identification in panels using quantiles. Journal of Econometrics 188 (2), 378–392.

Das, M. and A. van Soest (1999). A panel data model for subjective information on household income growth. Journal of Economic Behavior & Organization 40 (4), 409–426.

D'Haultfoeuille, X. (2011). On The Completeness Condition In Nonparametric Instrumental Problems. Econometric Theory 27 (3), 460–471.

Gayle, G.-L. and C. Viauroux (2007, November). Root-N consistent semiparametric estimators of a dynamic panel-sample-selection model. Journal of Econometrics 141 (1), 179–212.

Gayle, W.-R. (2013, August). Identification and √N-consistent estimation of a nonlinear panel data model with correlated unobserved effects. Journal of Econometrics 175 (2), 71–83.

Hahn, J. and G. Ridder (2013, January). Asymptotic Variance of Semiparametric Estimators With Generated Regressors. Econometrica 81 (1), 315–340.

Hoderlein, S. and H. White (2012, June). Nonparametric identification in nonseparable panel data models with generalized fixed effects. Journal of Econometrics 168 (2), 300–314.

Lewbel, A. and K. Pendakur (2016). Unobserved Preference Heterogeneity in Demand Using Generalized Random Coefficients. Journal of Political Economy, forthcoming.

Muris, C. (2015). Estimation in the Fixed Effects Ordered Logit Model. Working Paper.

Newey, W. K. (1994). The Asymptotic Variance of Semiparametric Estimators. Econometrica 62 (6), 1349–1382.

Newey, W. K. and J. L. Powell (2003, September). Instrumental Variable Estimation of Nonparametric Models. Econometrica 71 (5), 1565–1578.

Winkelmann, L. and R. Winkelmann (1998, February). Why Are the Unemployed So Unhappy? Evidence from Panel Data. Economica 65 (257), 1–15.

Wooldridge, J. M. (2005). Unobserved Heterogeneity and Estimation of Average Partial Effects. In Identification and Inference for Econometric Models. Cambridge University Press.

Wooldridge, J. M. (2010, October). Econometric Analysis of Cross Section and Panel Data (second ed.). The MIT Press.


Table 1: Estimated effect of number of children and husband's income on women's labor force participation.

| | Coefficient | APE (2,2) | APE (2,3) | APE (3,3) | APE (3,4) | APE (4,4) | APE (4,5) |
|---|---|---|---|---|---|---|---|
| kids | -.644 (.125) | -.030 (.008) | -.034 (.010) | -.029 (.010) | -.028 (.009) | -.030 (.010) | -.026 (.010) |
| lhinc | -.184 (.083) | -.009 (.004) | -.010 (.004) | -.008 (.004) | -.008 (.004) | -.009 (.004) | -.007 (.003) |
| number of women | 1,055 | 5,663 | 5,663 | 5,663 | 5,663 | 5,663 | 5,663 |

Time period indicators were included in the list of covariates. The first column displays results from conditional maximum likelihood estimation of the Logit fixed effects model. The subsequent columns display estimated average partial effects. The numbers in parentheses in the column headers are the degrees of the polynomials used for the regressions in the first and second non-parametric steps of the estimator. Standard errors are reported in parentheses next to each estimate; standard errors for APE are obtained by bootstrap at the level of cross-sectional observations.


Table 2: Estimated effect of number of children and husband’s income on women’s labor force participation from Wooldridge (2010).

| | Linear FE | Parametric CRE Probit: Coefficient | Parametric CRE Probit: APE |
|---|---|---|---|
| kids | -.039 (.009) | -.317 (.027) | -.039 (.008) |
| lhinc | -.009 (.005) | -.029 (.014) | -.009 (.005) |
| number of women | 5,663 | 5,663 | 5,663 |

Results taken from Table 15.3 of Wooldridge (2010) and presented here for comparison with the results in Table 1.
