Journal of Econometrics 149 (2009) 52–64


Bootstrap validity for the score test when instruments may be weak

Marcelo J. Moreira (Columbia University, United States; FGV/EPGE, Brazil), Jack R. Porter (University of Wisconsin, United States), Gustavo A. Suarez (Federal Reserve Board, United States)

Article history: Available online 5 November 2008
JEL classification: C12, C31
Keywords: Bootstrap; t-statistic; Score statistic; Identification; Non-regular case; Edgeworth expansion; Instrumental variable regression

Abstract: It is well known that size adjustments based on bootstrapping the t-statistic perform poorly when instruments are weakly correlated with the endogenous explanatory variable. In this paper, we provide a theoretical proof that guarantees the validity of the bootstrap for the score statistic. This theory does not follow from standard results, since the score statistic is not a smooth function of sample means and some parameters are not consistently estimable when the instruments are uncorrelated with the explanatory variable. © 2008 Elsevier B.V. All rights reserved.

1. Introduction

Inference in the linear simultaneous equations model with weak instruments has recently received considerable attention in the econometrics literature. It is now well understood that standard first-order asymptotic theory breaks down when the instruments are weakly correlated with the endogenous regressor; cf. Bound et al. (1995), Dufour (1997), Nelson and Startz (1990), Staiger and Stock (1997), and Wang and Zivot (1998). It is then natural to apply the bootstrap to decrease size distortions of the Wald statistic (also known as the t-statistic), since the bootstrap is valid under some regularity conditions. However, these conditions, which rely on the statistics being smooth functions of sample moments and on the parameters being consistently estimable, break down for the Wald statistic in the weak-instrument case. In fact, the bootstrap does not seem to perform well in decreasing the size distortions of the Wald statistic; cf. Horowitz (2001).

In this paper, we show that it is valid to bootstrap the score statistic even in the weak-instrument case. Although the score is well-behaved with weak instruments, showing the validity of the bootstrap in the unidentified case has several potential pitfalls. First, the bootstrap replaces parameters with inconsistent estimators. Hence, the empirical distribution function of the residuals may differ substantially from their true cumulative distribution function, which runs counter to the usual argument for bootstrap success. Second, the score statistic is not a smooth function of sample means. In many known non-regular cases,¹ the usual bootstrap method fails, even in the first order. Familiar cases from the statistics and econometrics literature include estimation on the boundary of the parameter space (Shao, 1994; Andrews, 2000) and estimation of a non-differentiable function of the population mean.

Commonly used fixes for bootstrap failure due to non-regularity are the m out of n bootstrap and subsampling. However, these methods have two limitations. First, in practice they give quite different results for different choices of the bootstrap sample (or subsample) size m. Second, they do not provide asymptotic refinements in the regular case. For instance, in the non-differentiable example above, the function may be differentiable at some values of the population mean and non-differentiable at others. Then, at the differentiable values, the statistic is typically regular and the usual bootstrap is not only valid but also provides second-order improvements. Hence, there is a trade-off between robustness (m out of n bootstrap or subsampling) and refinements (the usual bootstrap). Lastly, subsampling does not provide a general method of controlling size uniformly in cases where the bootstrap fails (Andrews and Guggenberger, forthcoming).

In this paper, we find that weak instruments are not, in general, the cause of bootstrap failure. Although parameters are not consistently estimable when instruments are weak and the score statistic is not differentiable, we show that the re-centered residual bootstrap for the score is valid regardless of instrument strength. In light of the recent negative results on the bootstrap, it is notable that the bootstrap can still work in some non-regular cases. Still, we additionally find that the higher-order improvements provided by the bootstrapped score statistic when instruments are strong do not extend to the case of weak instruments.

The remainder of this paper is organized as follows. In Section 2, we present the model and establish some notation. In Section 3, we summarize some folk theorems showing the size improvements based on the bootstrap for the Wald and score tests under standard asymptotics. In Section 4, we present the main results: we establish the validity of the bootstrap for the score statistic, and show that the bootstrap will not in general provide second-order improvements in the unidentified case. In Section 5, we present Monte Carlo simulations that suggest that the bootstrap methods may lead to improvements, although in general they do not lead to higher-order adjustments in the weak-instrument case. Section 6 concludes. In Appendix A, we provide all proofs pertaining to the score statistic. In Appendix B, we provide some additional useful results and extensions.

¹ A statistic is said to be regular if, when written as a function of sample moments, the first derivative of this function evaluated at the population mean exists and is different from zero.

∗ Corresponding author. Tel.: +1 212 854 3680; fax: +1 212 854 8059. E-mail address: [email protected] (M.J. Moreira).
0304-4076/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2008.10.008

2. The model

The structural equation of interest is

y_1 = y_2 \beta + u,   (1)

where $y_1$ and $y_2$ are $n \times 1$ vectors of observations on two endogenous variables, $u$ is an $n \times 1$ unobserved disturbance vector, and $\beta$ is an unknown scalar parameter. This equation is assumed to be part of a larger linear simultaneous equations model, which implies that $y_2$ is correlated with $u$. The complete system contains exogenous variables that can be used as instruments for conducting inference on $\beta$. Specifically, it is assumed that the reduced form for $Y = [y_1, y_2]$ can be written as

y_1 = Z\pi\beta + v_1,   (2)
y_2 = Z\pi + v_2,

where $Z$ is an $n \times k$ matrix of exogenous variables having full column rank $k$ with probability one (w.p.1) and $\pi$ is a $k \times 1$ vector. The $n$ rows of $Z$ are i.i.d., and $F$ is the distribution of each row of $Z$ and $V = [v_1, v_2]$. Unless otherwise stated, we consider the case where $Z$ is independent of $V$. The $n$ rows of the $n \times 2$ matrix of reduced-form errors $V$ are i.i.d. with mean zero and $2 \times 2$ nonsingular covariance matrix $\Omega = [\omega_{i,j}]$.

In what follows, $X_i$ is the $i$-th observation of some random vector $X$. For instance, $Z_i$ denotes the column vector containing the $i$-th row of the matrix $Z$. The sample mean of the first $n$ observations of $X$ is $\bar{X}_n$. The subscript $n$ is typically omitted in what follows, unless it clarifies exposition.

Tests of the null hypothesis $H_0: \beta = \beta_0$ play an important role in our results. Let $N_A = A(A'A)^{-1}A'$ and $M_A = I - N_A$ for any conformable matrix $A$, and let $b_0 = (1, -\beta_0)'$ and $a_0 = (\beta_0, 1)'$. The commonly used (two-sided) Wald test rejects $H_0$ for large values (of the square) of the Wald statistic:

W = \frac{(\hat{\beta}_{2SLS} - \beta_0)\sqrt{y_2' N_Z y_2}}{\hat{\sigma}_u},

where $\hat{\beta}_{2SLS} = (y_2' N_Z y_2)^{-1} y_2' N_Z y_1$ and $\hat{\sigma}_u^2 = [1, -\hat{\beta}_{2SLS}]\,\Omega\,[1, -\hat{\beta}_{2SLS}]'$.

It is now well understood that the Wald statistic has important size distortions when the instruments may be weak. In particular, under the weak-instrument asymptotics of Staiger and Stock (1997), the limiting distribution of the Wald statistic is not standard normal. An alternative statistic is the score (LM) statistic used by Kleibergen (2002) and Moreira (2002):

LM = S'T/\sqrt{T'T},   (3)

where $S = (Z'Z)^{-1/2} Z' Y b_0 \cdot (b_0' \Omega b_0)^{-1/2}$ and $T = (Z'Z)^{-1/2} Z' Y \Omega^{-1} a_0 \cdot (a_0' \Omega^{-1} a_0)^{-1/2}$. The (two-sided) score test rejects the null if the $LM^2$ statistic is larger than the $1-\alpha$ quantile of the chi-square-one distribution. This test is similar if the errors are normal with known variance $\Omega$, since the LM statistic is pivotal. With an unknown error distribution, the score test is no longer similar. However, unlike the Wald test, the score test is asymptotically similar under both weak-instrument and standard asymptotics.

In practice, the covariance matrix $\Omega$ is typically unknown, so we replace it with the consistent estimator $\tilde{\Omega} = Y' M_Z Y / n$:

\tilde{S} = (Z'Z)^{-1/2} Z' Y b_0 \cdot (b_0' \tilde{\Omega} b_0)^{-1/2},
\tilde{T} = (Z'Z)^{-1/2} Z' Y \tilde{\Omega}^{-1} a_0 \cdot (a_0' \tilde{\Omega}^{-1} a_0)^{-1/2},
\widetilde{LM} = \tilde{S}'\tilde{T}/\sqrt{\tilde{T}'\tilde{T}}.

For the Wald statistic, replace $\hat{\sigma}_u^2$ by $\tilde{\sigma}_u^2 = [1, -\hat{\beta}_{2SLS}]\,\tilde{\Omega}\,[1, -\hat{\beta}_{2SLS}]'$ to obtain $\tilde{W}$. Below we present results for $\tilde{W}$ and $\widetilde{LM}$, although analogous results for the known-covariance case are also available.

3. Preliminary results

In this section, we summarize some folk theorems for the strong-instrument case. Some of the results are already known, and those that are new follow from standard results. The results in this section provide a foundation for the weak-instrument results to be presented in Section 4.

For any symmetric matrix $A$, let $\operatorname{vech}(A)$ denote the column vector containing the column-by-column vectorization of the nonredundant elements of $A$. The test statistics given in the previous section can be written as functions of

R_n = \operatorname{vech}\left[ (Y_n', Z_n')'(Y_n', Z_n') \right] = \left( f_1(Y_n', Z_n'), \ldots, f_\ell(Y_n', Z_n') \right)',

where $f_i$, $i = 1, \ldots, \ell$, $\ell = (k+2)(k+3)/2$, are elements of the matrix $(Y_n', Z_n')'(Y_n', Z_n')$. Both the $\tilde{W}$ and $\widetilde{LM}$ statistics can be written in the form

\sqrt{n}\left[ H(\bar{R}_n) - H(\mu) \right],   (4)

where $\mu = E(\bar{R}_n)$. Let $\|\cdot\|$ be the Euclidean norm and $\|\cdot\|_\infty$ the supremum norm. Hereinafter, we use the following high-level assumptions:

Assumption 1. $\pi$ is fixed and different from zero.

Assumption 2. $E\|R_n\|^s < \infty$ for some $s \geq 3$.

Assumption 3. $\limsup_{\|t\|\to\infty} |E \exp(it'R_n)| < 1$.

Assumption 1 is key to the standard strong-instrument asymptotics. Under Assumption 1, the gradient of $H$ evaluated at $\mu$ differs from zero. Assumption 2 holds if $E\|(Y_n', Z_n')\|^{2s} < \infty$. This moment assumption seems too strong at first glance, but note that the test statistics involve quadratic functions of $(Y_n', Z_n')$. Assumption 3 is the commonly used Cramér condition. The following result by Bhattacharya (1977) provides a sufficient condition for Assumption 3.
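As an illustration, the statistics of this section can be computed directly from data with a few lines of linear algebra. The sketch below is ours, not the authors' code (function and variable names are our own); it computes $\tilde{W}$ and $\widetilde{LM}$ using numpy. Any matrix square root of $(Z'Z)^{-1}$ may be used, since $\widetilde{LM}$ is invariant to that choice.

```python
import numpy as np

def wald_and_score(y1, y2, Z, beta0):
    """Compute the Wald statistic W-tilde and score statistic LM-tilde
    for H0: beta = beta0, in the notation of Section 2 (sketch; our names)."""
    n, k = Z.shape
    Y = np.column_stack([y1, y2])
    PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)        # N_Z, projection onto col(Z)
    Omega = Y.T @ (Y - PZ @ Y) / n                # Omega-tilde = Y' M_Z Y / n

    # Wald statistic based on 2SLS
    beta_2sls = (y2 @ PZ @ y1) / (y2 @ PZ @ y2)
    b_hat = np.array([1.0, -beta_2sls])
    sigma2 = b_hat @ Omega @ b_hat                # sigma-tilde_u^2
    W = (beta_2sls - beta0) * np.sqrt(y2 @ PZ @ y2) / np.sqrt(sigma2)

    # Score statistic: S-tilde and T-tilde as in Eq. (3), with Omega-tilde.
    # K below is one square root of (Z'Z)^{-1}; LM is invariant to the choice.
    b0 = np.array([1.0, -beta0])
    a0 = np.array([beta0, 1.0])
    K = np.linalg.inv(np.linalg.cholesky(Z.T @ Z))
    S = K @ (Z.T @ Y @ b0) / np.sqrt(b0 @ Omega @ b0)
    Oinv_a0 = np.linalg.solve(Omega, a0)
    T = K @ (Z.T @ Y @ Oinv_a0) / np.sqrt(a0 @ Oinv_a0)
    LM = (S @ T) / np.sqrt(T @ T)
    return W, LM
```

Under the null with strong instruments, both statistics are approximately standard normal, and the two-sided score test compares $\widetilde{LM}^2$ with the chi-square-one critical value.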


Lemma 1 (Bhattacharya, 1977). Let $(Y_n', Z_n')'$ be a random vector with values in $\mathbb{R}^{k+2}$ whose distribution has a nonzero absolutely continuous component $G$ (relative to the Lebesgue measure on $\mathbb{R}^{k+2}$). Assume that there exists an open ball $B$ in $\mathbb{R}^{k+2}$ in which the density of $G$ is positive almost everywhere. If, in $B$, the functions $1, f_1, \ldots, f_\ell$ are linearly independent, then Assumption 3 holds.

In the identified case in which $\pi$ is fixed and different from zero, not only are the 2SLS and LIML estimators consistent for $\beta$, but both the Wald and score statistics also admit second-order Edgeworth expansions under mild conditions. As a simple application of Theorem 2 of Bhattacharya and Ghosh (1978), we obtain the following result:

Theorem 2. Under Assumptions 1–3, the null distributions of $\tilde{W}$ and $\widetilde{LM}$ can be uniformly approximated (in $x$) by Edgeworth expansions:

(a) \left\| P(\widetilde{LM} \leq x) - \left[ \Phi(x) + \sum_{i=1}^{s-2} n^{-i/2}\, p_i^{LM}(x; F, \beta_0, \pi)\,\phi(x) \right] \right\|_\infty = o(n^{-(s-2)/2}),

(b) \left\| P(\tilde{W} \leq x) - \left[ \Phi(x) + \sum_{i=1}^{s-2} n^{-i/2}\, p_i^{W}(x; F, \beta_0, \pi)\,\phi(x) \right] \right\|_\infty = o(n^{-(s-2)/2}),

where $p_i^W$ and $p_i^{LM}$, $i = 1, 2$, are polynomials in $x$ with coefficients depending on moments of $R_n$, $\beta_0$, and $\pi$.

We now turn to the bootstrap. For each bootstrap sample, a test statistic is computed, which in turn generates a simulated empirical distribution for the Wald or score statistic. This distribution can then be used to provide new critical values for the test. The bootstrap sample is generated based on an estimate of $\beta$, and likewise the null hypothesized value of $\beta$ is replaced by that estimate in forming the bootstrap test statistics. Given consistent estimates $\hat{\beta}$ and $\hat{\pi}$, the residuals from the reduced-form equations are obtained as

\hat{v}_1 = y_1 - Z\hat{\pi}\hat{\beta},
\hat{v}_2 = y_2 - Z\hat{\pi}.

These residuals are re-centered to yield $(\tilde{v}_1, \tilde{v}_2)$. Then $Z^*$ and $(v_1^*, v_2^*)$ are drawn independently from the empirical distribution functions of $Z$ and $(\tilde{v}_1, \tilde{v}_2)$. Next, we set

y_1^* = Z^*\hat{\pi}\hat{\beta} + v_1^*,
y_2^* = Z^*\hat{\pi} + v_2^*.

We want to stress here that the simulation method above is exactly equivalent to simulating directly from the structural model

y_1^* = y_2^*\hat{\beta} + u^*,

where $Z^*$ and $(u^*, v_2^*)$ are drawn independently from the empirical distribution functions of $Z$ and $(\tilde{u}, \tilde{v}_2)$, with $\tilde{u} = \tilde{v}_1 - \tilde{v}_2\hat{\beta}$. The probability under the empirical distribution function (conditional on the sample) will be denoted $P^*$ in what follows. Finally, the fact that $Z^*$ is randomly drawn reflects our interest in the correlated case. We do not consider the fixed-$Z$ case here, although this could be done by establishing conditions similar to those of Navidi (1989) and Qumsiyeh (1990, 1994) for the simple regression model. Of course, this entails different Edgeworth expansions and bootstrap methods.

The following result shows that the bootstrap approximates the empirical Edgeworth expansion up to the $o(n^{-(s-2)/2})$ order.

Theorem 3. Under Assumptions 1–3,

(a) \left\| P^*(\widetilde{LM}^* \leq x) - \left[ \Phi(x) + \sum_{i=1}^{s-2} n^{-i/2}\, p_i^{LM}(x; F_n, \hat{\beta}, \hat{\pi})\,\phi(x) \right] \right\|_\infty = o(n^{-(s-2)/2}), \quad a.s. as $n \to \infty$,

(b) \left\| P^*(\tilde{W}^* \leq x) - \left[ \Phi(x) + \sum_{i=1}^{s-2} n^{-i/2}\, p_i^{W}(x; F_n, \hat{\beta}, \hat{\pi})\,\phi(x) \right] \right\|_\infty = o(n^{-(s-2)/2}), \quad a.s. as $n \to \infty$.

The error based on the bootstrap simulation is of order $n^{-1/2}$, due to the fact that the conditional moments of $R_n^*$ converge almost surely to those of $R_n$, and that $\hat{\beta}$ and $\hat{\pi}$ converge almost surely to $\beta$ and $\pi$. Consequently, Theorem 3 shows that the bootstrap offers a better approximation than the standard normal approximation.

4. Main results

In the previous section, we considered the strong-instrument case in which the structural parameter $\beta$ is identified. Our results there are threefold: the null distributions of the Wald and score statistics can be approximated by an Edgeworth expansion up to the $n^{-(s-2)/2}$ order, for some integer $s$; the bootstrap estimate and the $(s-1)$-term empirical Edgeworth expansion for both statistics are asymptotically equivalent up to the $n^{-(s-2)/2}$ order; and the error of estimation of the bootstrap is of order $n^{-1/2}$ for one-sided versions and of order $n^{-1}$ for two-sided versions of the Wald and score tests. However, the three results in Section 3 depend crucially on Assumption 1. In this section, we address the issues above as they arise in a weak-instrument setting. Formally, we consider two alternative weak-instrument assumptions that replace Assumption 1.

Assumption 1A (Unidentified Case). $\pi = 0$.

Assumption 1B (Locally Unidentified Case). $\pi = c/\sqrt{n}$ for some non-stochastic $k$-vector $c$.

Under Assumption 1A, $\beta_0$ is replaced by an inconsistent estimator $\hat{\beta}$ in the bootstrap estimates, and the score and Wald statistics are non-smooth functions of sample means. However, the standard proofs of bootstrap validity for statistics of the form (4) crucially depend on the assumption that the derivatives of the functions evaluated at $\mu = E(\bar{R}_n)$ are defined and different from zero (the regular case). From the known examples of bootstrap failure in non-regular cases, it is not clear whether the bootstrap actually provides valid approximations even in the first order. In fact, similar versions of Theorems 2 and 3 have been considered to fix size distortions of the Wald test in the weak-instrument case.

To see the role of the lack of differentiability in the weak-instrument results of this section, recall from Section 3 that, under the null, the score statistic can be written as a function of sample means:

\widetilde{LM} = \sqrt{n}\left[ H(\bar{R}_n) - H(\mu) \right].   (5)

Smoothness of $H$ allows an expansion and yields the strong-instrument results of Section 3. In the unidentified case, this function is non-differentiable.² Note that $E[V_i] = 0$, so the expression in (5) can be simplified to

\widetilde{LM} = \sqrt{n}\, H(\bar{R}_n) = \tilde{S}'\tilde{T}/\sqrt{\tilde{T}'\tilde{T}}.   (6)

² For some $j \in \{1, \ldots, k\}$, let $z_j$ denote the $j$th column of $Z$ (and let $\pi_j$ be the corresponding $j$th element of $\pi$). To simplify the expressions, focus here on the known-covariance case $LM$ and consider the derivative of $H$ (for $LM$) with respect to the argument of $H(\bar{R}_n)$ corresponding to $y_1'z_j/n$. For $\pi \neq 0$, this derivative takes the form $\pi_j\{(b_0'\Omega b_0)(\pi'\Omega_{ZZ}\pi)\}^{-1/2}$. It is easy to see that this derivative is not well-defined at $\pi = 0$. Or, consider the limit of this derivative as $\pi$ approaches zero. Let $c \neq 0$ and consider a scalar sequence $\lambda_n \downarrow 0$. Then the derivative evaluated at $\pi = c\lambda_n$ is $c_j\{(b_0'\Omega b_0)(c'\Omega_{ZZ}c)\}^{-1/2}$ and the derivative at $\pi = -c\lambda_n$ is $-c_j\{(b_0'\Omega b_0)(c'\Omega_{ZZ}c)\}^{-1/2}$. Since these expressions are also equal to directional derivatives at $\pi = 0$ yet are not equal to each other, $H$ is non-differentiable at $\pi = 0$.
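The re-centered residual bootstrap described in Section 3 is straightforward to implement. The sketch below is a minimal illustration with our own function names, not the authors' code; it assumes 2SLS for $\hat{\beta}$ and first-stage OLS for $\hat{\pi}$, re-centers the reduced-form residuals, and resamples $Z^*$ and $(v_1^*, v_2^*)$ independently, returning a bootstrap critical value for the two-sided score test based on $\widetilde{LM}^2$.

```python
import numpy as np

def lm_stat(y1, y2, Z, beta0):
    """Score statistic LM-tilde for H0: beta = beta0 (compact sketch; our names)."""
    n = Z.shape[0]
    Y = np.column_stack([y1, y2])
    PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    Omega = Y.T @ (Y - PZ @ Y) / n               # Omega-tilde
    b0 = np.array([1.0, -beta0]); a0 = np.array([beta0, 1.0])
    K = np.linalg.inv(np.linalg.cholesky(Z.T @ Z))  # a square root of (Z'Z)^{-1}
    S = K @ (Z.T @ Y @ b0) / np.sqrt(b0 @ Omega @ b0)
    Oa = np.linalg.solve(Omega, a0)
    T = K @ (Z.T @ Y @ Oa) / np.sqrt(a0 @ Oa)
    return (S @ T) / np.sqrt(T @ T)

def bootstrap_score_cv(y1, y2, Z, B=999, alpha=0.05, rng=None):
    """Re-centered residual bootstrap critical value for LM-tilde^2,
    following the resampling scheme of Section 3 (sketch; our names)."""
    rng = np.random.default_rng() if rng is None else rng
    n = Z.shape[0]
    PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    beta_hat = (y2 @ PZ @ y1) / (y2 @ PZ @ y2)   # 2SLS estimate of beta
    pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ y2)  # first-stage OLS estimate of pi
    v1 = y1 - Z @ pi_hat * beta_hat
    v2 = y2 - Z @ pi_hat
    v1 = v1 - v1.mean(); v2 = v2 - v2.mean()     # re-centering the residuals
    stats = np.empty(B)
    for b in range(B):
        iz = rng.integers(0, n, n)               # draw Z* rows
        iv = rng.integers(0, n, n)               # draw (v1*, v2*) rows independently
        Zs = Z[iz]
        y2s = Zs @ pi_hat + v2[iv]
        y1s = Zs @ pi_hat * beta_hat + v1[iv]
        stats[b] = lm_stat(y1s, y2s, Zs, beta_hat) ** 2
    return np.quantile(stats, 1 - alpha)
```

The test rejects $H_0: \beta = \beta_0$ when $\widetilde{LM}^2$ computed at $\beta_0$ exceeds the returned quantile; note that, as in the text, the bootstrap statistic is evaluated at $\hat{\beta}$ rather than $\beta_0$.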


Expressions (5) and (6) suggest two possible bootstrap methods. The first would use $\bar{R}_n$ and $\bar{R}_n^*$, the sample means and bootstrap sample means based on residuals (without re-centering). The bootstrap statistic would be

\widetilde{LM}^* = \sqrt{n}\left[ H(\bar{R}_n^*) - H(\bar{R}_n) \right].   (7)

Because $H(\bar{R}_n)$ is not necessarily zero here, we would need to expand (7). Since $H(\cdot)$ is not smooth, this bootstrap would be problematic. The second, more commonly used bootstrap method is based on $\tilde{R}_n$ and $\tilde{R}_n^*$, respectively the sample means and bootstrap sample means using re-centered residuals. Then the bootstrapped score statistic, as defined in Section 4.1, can be written as

\widetilde{LM}^* = \sqrt{n}\left[ H(\tilde{R}_n^*) - H(\tilde{R}_n) \right]   (8)
               = \sqrt{n}\, H(\tilde{R}_n^*) = \tilde{S}^{*\prime}\tilde{T}^*/\sqrt{\tilde{T}^{*\prime}\tilde{T}^*},   (9)

where $H(\tilde{R}_n) = 0$, due to the re-centered residuals. Eq. (9) provides intuition for the first-order validity of the bootstrap based on re-centered residuals, shown formally below in Section 4.1. Re-centering the residuals allows us to rely on just the continuous mapping theorem. On the other hand, the lack of differentiability in (8) means that the standard expansion arguments of Bhattacharya and Ghosh (1978) break down in the unidentified case, which foreshadows our higher-order conclusions in Section 4.2.

4.1. Bootstrap

4.1.1. Strong consistency

The usual intuition for the bootstrap requires the empirical distribution, from which the bootstrap sample is drawn, to be close to the distribution of the data under the null. For the model given in Eqs. (1) and (2), the empirical distribution used in bootstrap sampling depends on the residuals from these equations. These reduced-form residuals depend on the parameter estimates through $\hat{\pi}$ and $\hat{\pi}\hat{\beta}$. Despite the inconsistency of $\hat{\beta}$ when instruments are weak, the estimated residuals $(\hat{v}_1, \hat{v}_2)$ are close to $(v_1, v_2)$ in the reduced-form model if

\hat{\pi} \overset{a.s.}{\longrightarrow} \pi \quad and \quad \hat{\pi}\hat{\beta} \overset{a.s.}{\longrightarrow} \pi\beta.   (10)

The argument for bootstrap validity in Section 4.1.2 will rely on Eq. (10) holding. In this subsection we show that the strong consistency in (10) can be considered the norm even with weak instruments.

Let $\theta = (\beta, \pi')' \in \mathbb{R}^{k+1}$ and $\Pi = \Pi(\theta) = (\pi'\beta, \pi')' \in \mathbb{R}^{2k}$. Define $\hat{\theta} = (\hat{\beta}, \hat{\pi}')'$ to be the M-estimator that minimizes a sample criterion $Q_n(\Pi(\theta))$. Though $\theta$ is not identified at $\pi = 0$, the following result shows that the restricted estimator $\Pi(\hat{\theta})$ is still strongly consistent.

Proposition 4. Let $B$ be some set in $\mathbb{R}^{2k}$, let $Q(\cdot)$ be a deterministic function, let $\delta$ be the distance induced by the Frobenius norm in $\mathbb{R}^{2k}$, write $\Pi_0$ for the true value of $\Pi$, and suppose the following hold:

(i) $\forall \epsilon > 0$, $\inf_{\Pi \in B;\, \delta(\Pi, \Pi_0) > \epsilon} Q(\Pi) > Q(\Pi_0)$;
(ii) $\sup_{\Pi \in B} |Q_n(\Pi) - Q(\Pi)| \overset{a.s.}{\longrightarrow} 0$;
(iii) $\liminf_{n \to \infty}\, [\inf_{\Pi \in B^c} Q_n(\Pi) - Q_n(\Pi_0)] \geq 0$ a.s.

Then $\hat{\pi} \overset{a.s.}{\longrightarrow} \pi$ and $\hat{\pi}\hat{\beta} \overset{a.s.}{\longrightarrow} \pi\beta$.³

The assumptions of Proposition 4 are typical high-level conditions, as in Andrews (1987), Newey and McFadden (1994), and Pötscher and Prucha (1997). These assumptions are standard for showing that the unrestricted M-estimator of $\Pi$ is strongly consistent. The conclusion of the proposition goes a step further: we find that the conditions for strong consistency of the unrestricted estimator of $\Pi$ are also sufficient for strong consistency of the M-estimator that imposes the reduced-form restrictions, $\Pi(\theta)$. This consistency is obtained despite the fact that $\theta$ itself is unidentified under weak instruments. In Appendix B, we verify that the regularity conditions of Proposition 4 hold in maximum likelihood estimation. Given these findings, we assume that $\hat{\pi}$ and $\hat{\pi}\hat{\beta}$ are strongly consistent in the bootstrap validity results of the next subsection.

³ We can relax the notion of convergence in assumptions (i) and (ii) to convergence in probability. Following Pötscher and Prucha (1997), we can then obtain convergence in probability using standard subsequence arguments. However, this result is rather trivial for 2SLS and LIML estimation. If $\pi \neq 0$, then it is well known that $\hat{\pi} \overset{p}{\to} \pi$ and $\hat{\beta} \overset{p}{\to} \beta$. If $\pi = 0$, then $\hat{\pi} \overset{p}{\to} 0$ and $\hat{\beta} = O_p(1)$. As a result, $\hat{\pi} \overset{p}{\to} \pi$ and $\hat{\pi}\hat{\beta} \overset{p}{\to} \pi\beta$ for any value of $\pi$.

4.1.2. First-order validity

The derivation of bootstrap validity for the score statistic under weak instruments is divided into two steps. The score statistic is a function of the statistics $\tilde{S}$ and $\tilde{T}$, so we first obtain the limiting distributions of $\tilde{S}^*$ and (re-centered) $\tilde{T}^*$ under weak instruments. Then, the continuous mapping theorem leads to the limiting distribution of $\widetilde{LM}^*$. These asymptotic results are obtained despite the fact that the estimator $\hat{\beta}$ is not a consistent estimator under Assumption 1A or 1B and replaces the null hypothesized value of $\beta = \beta_0$ in $\tilde{S}^*$ and $\tilde{T}^*$. Therefore, we have

\tilde{S}^* = (Z^{*\prime}Z^*)^{-1/2} Z^{*\prime} Y^* \hat{b} \cdot (\hat{b}'\tilde{\Omega}^*\hat{b})^{-1/2},
\tilde{T}^* = (Z^{*\prime}Z^*)^{-1/2} Z^{*\prime} Y^* \tilde{\Omega}^{*-1}\hat{a} \cdot (\hat{a}'\tilde{\Omega}^{*-1}\hat{a})^{-1/2},

where $\hat{a} = (\hat{\beta}, 1)'$ and $\hat{b} = (1, -\hat{\beta})'$. To derive the asymptotic distributions of $\tilde{S}^*$ and $\tilde{T}^*$, we re-center $\tilde{T}^*$ by subtracting the term

t_n^* = \sqrt{n}\left( \frac{Z'Z}{n} \right)^{1/2} \hat{\pi}\,\sqrt{\hat{a}'\tilde{\Omega}^{*-1}\hat{a}}.

We can then consider the joint limiting distribution of $(\tilde{S}^*, \tilde{T}^* - t_n^*)$, where

\tilde{T}^* - t_n^* = \left[ \sqrt{n}\left( \left(\frac{Z^{*\prime}Z^*}{n}\right)^{1/2} - \left(\frac{Z'Z}{n}\right)^{1/2} \right) \hat{\pi}\,\sqrt{\hat{a}'\tilde{\Omega}^{*-1}\hat{a}} \right] + \left(\frac{Z^{*\prime}Z^*}{n}\right)^{-1/2} \frac{Z^{*\prime}V^*}{\sqrt{n}}\, \frac{\tilde{\Omega}^{*-1}\hat{a}}{\sqrt{\hat{a}'\tilde{\Omega}^{*-1}\hat{a}}}.

To describe this limiting distribution, we require some additional notation. By Liapunov's central limit theorem and the delta method,

\sqrt{n}\left[ (Z^{*\prime}Z^*/n)^{1/2} - E(Z'Z/n)^{1/2} \right]\pi \overset{d}{\longrightarrow} N(0, \Sigma),

where $\Sigma$ depends directly on $\pi$. In particular, define $\Sigma = 0$ when $\pi = 0$. For $\pi = 0$, $\sqrt{n}\,\hat{\pi}$ is bounded in probability and $(Z^{*\prime}Z^*/n)^{1/2} - (Z'Z/n)^{1/2}$ has a zero conditional


probability limit almost surely. Hence, the first term of $\tilde{T}^* - t_n^*$ is asymptotically negligible, and the second term has a joint normal limiting distribution with $\tilde{S}^*$.

Lemmas in Appendix A show asymptotic normality of the following expression under various assumptions, including the strong- and weak-instrument cases:

\sqrt{n} \begin{pmatrix} \dfrac{Z^{*\prime}V^*\hat{b}}{n\,\sqrt{\hat{b}'\tilde{\Omega}^*\hat{b}}} \\[6pt] \dfrac{Z^{*\prime}V^*\tilde{\Omega}^{*-1}\hat{a}}{n\,\sqrt{\hat{a}'\tilde{\Omega}^{*-1}\hat{a}}} \\[6pt] \dfrac{W^{*\prime}\imath}{n} - \dfrac{W'\imath}{n} \end{pmatrix},

where $w_i = \operatorname{vech}(Z_iZ_i') \in \mathbb{R}^{k(k+1)/2}$, $W = (w_1, \ldots, w_n)'$, $w_i^* = \operatorname{vech}(Z_i^*Z_i^{*\prime})$, $W^* = (w_1^*, \ldots, w_n^*)'$, and $\imath$ denotes an $n \times 1$ vector of ones. Since $(\tilde{S}^*, \tilde{T}^* - t_n^*)$ is a function of the expression above, the next result on the asymptotic distribution of these bootstrapped statistics follows.

Lemma 5. Suppose that, for some $\delta > 0$, $E\|Z_i\|^{4+\delta} < \infty$ and $E\|V_i\|^{4+\delta} < \infty$. Let $\hat{\pi}$ and $\hat{\beta}$ be estimators satisfying either:

(i) Assumption 1, $\hat{\beta} \overset{a.s.}{\longrightarrow} \beta$, $\hat{\pi} - \pi \overset{a.s.}{\longrightarrow} 0$; or
(ii) Assumption 1B (or 1A), $\hat{\pi}\hat{\beta} - \pi_n\beta \overset{a.s.}{\longrightarrow} 0$, $\hat{\pi} - \pi_n = O_p(n^{-1/2})$, $\hat{\beta} = O_p(1)$.

Then, the following result holds:

\begin{pmatrix} \tilde{S}^* \\ \tilde{T}^* - t_n^* \end{pmatrix} \Big|\, X_n \overset{d}{\longrightarrow} N\left( 0, \begin{pmatrix} I_k & 0 \\ 0 & I_k + \dfrac{\Sigma}{a'\Omega^{-1}a} \end{pmatrix} \right) \quad a.s.,

where $X_n = \{(Y_1', Z_1'), \ldots, (Y_n', Z_n')\}$ and $a = (\beta, 1)'$.

We are now ready to show the first-order validity of the score test

\widetilde{LM}^* = \frac{\tilde{S}^{*\prime}\tilde{T}^*}{\sqrt{\tilde{T}^{*\prime}\tilde{T}^*}}.

This result holds regardless of the strength of the instruments.⁴

Theorem 6. Suppose that, for some $\delta > 0$, $E\|Z_i\|^{4+\delta} < \infty$ and $E\|V_i\|^{2+\delta} < \infty$. Let $\hat{\pi}$ and $\hat{\beta}$ be estimators satisfying either:

(i) Assumption 1, $\hat{\beta} \overset{a.s.}{\longrightarrow} \beta$, $\hat{\pi} - \pi_n \overset{a.s.}{\longrightarrow} 0$; or
(ii) Assumption 1B (or 1A), $\hat{\pi}\hat{\beta} - \pi_n\beta \overset{a.s.}{\longrightarrow} 0$, $\hat{\pi} - \pi_n = O_p(n^{-1/2})$, $\hat{\beta} = O_p(1)$.

Then, the following result holds:

\widetilde{LM}^* \,|\, X_n \overset{d}{\longrightarrow} N(0, 1) \quad a.s.,

where $X_n = \{(Y_1', Z_1'), \ldots, (Y_n', Z_n')\}$.

Comments. 1. Validity of the bootstrap in the (locally) unidentified case is the main result in this paper. For completeness, we also show first-order validity under the same moment conditions for the strong-instrument case. Of course, second-order improvements with strong instruments are available under stronger assumptions; see Theorem 3(a).

2. Two alternative bootstrap methods could also be pursued. The first alternative amounts to not replacing $\beta_0$ with $\hat{\beta}$. This avoids the problem of replacing the structural parameter with the inconsistent estimator $\hat{\beta}$, yet it possibly entails power losses (recall that the e.d.f. of the residuals will not be close to their c.d.f. when the true $\beta$ differs from the hypothesized value $\beta_0$). The second alternative amounts to running OLS regressions in the reduced-form model, ignoring the nonlinear constraints on the reduced-form coefficients. However, the interpretation of bootstrapping from the structural-form residuals is no longer valid in the over-identified model.

3. Proposition 4 shows that the assumption of almost-sure convergence of $\hat{\pi}$ and $\hat{\pi}\hat{\beta}$ is typical even in the unidentified case. However, we note that the proof of Theorem 6 also works for the case where $\hat{\pi}$ and $\hat{\pi}\hat{\beta}$ converge in probability. Then the weak convergence in the conclusion of the theorem occurs with probability approaching one rather than almost surely. Both almost-sure and in-probability conclusions correspond to modes of convergence that have been proposed for the bootstrap; cf. Efron (1979) and Bickel and Freedman (1981).

⁴ In addition, we can use Lemma 5 to (i) show that the bootstrap is valid for the Anderson–Rubin statistic, see Appendix B; (ii) give a formal proof that the bootstrap does not provide a first-order approximation to the Wald and LR statistics with weak instruments; and (iii) note that the fixed-$T$ bootstrap provides an alternative method to compute critical values for the CLR test of Moreira (2003), see Moreira et al. (2004).

4.2. Edgeworth expansions

Given the robustness of validity for the bootstrap score test in Theorem 6, it is natural to wonder whether the higher-order improvements under strong instruments in Theorem 3(a) carry over to the weak-instrument case. Below we will show that, in fact, the bootstrap typically does not deliver higher-order improvements (in the usual sense) for the score statistic. The reason is twofold. First, the higher-order terms typically depend on $\hat{\beta}$ separately from the term $\hat{\pi}\hat{\beta}$. Second, the higher-order terms are not necessarily continuous functions of the parameters in the unidentified case.

We have noted the lack of differentiability in (8) in the unidentified case. Because standard expansion arguments rely on smoothness, higher-order improvement results for empirical Edgeworth expansions (or the bootstrap) may not hold here. For example, consider the problem of finding second-order Edgeworth expansions for the $\widetilde{LM}$ statistic when $\Omega$ is unknown but the errors are normal.⁵ We can compute the higher-order terms using standard methods. Alternatively, we can adapt the results in Cavanagh (1983) and Rothenberg (1988) to compute the second-order Edgeworth distribution for $\widetilde{LM}$ based on a stochastic expansion:

\widetilde{LM} = LM + n^{-1/2}P_n + n^{-1}Q_n + O_p(n^{-3/2}),

where $P_n$ and $Q_n$ are stochastically bounded with conditional moments

p_n(x) = E(P_n \mid LM = x), \quad q_n(x) = E(Q_n \mid LM = x), \quad v_n(x) = V(P_n \mid LM = x).

Proposition 7. If the errors are jointly normally distributed and $\widetilde{LM}$ admits a second-order Edgeworth expansion, $P(\widetilde{LM} \leq x)$ can be approximated by

\Phi\left( x - n^{-1/2}p_n(x) + 0.5\, n^{-1}\left[ 2p_n(x)p_n'(x) - 2q_n(x) + v_n'(x) - x\,v_n(x) \right] \right)

up to an $o(n^{-1})$ term.⁶

⁵ Although the stated results are for tests designed for the known-covariance-matrix case, analogous results hold when we replace $\Omega$ with its consistent estimator $\tilde{\Omega}$. In particular, the $\widetilde{LM}$ and $\tilde{W}$ statistics also admit Edgeworth expansions, but with different polynomials in the higher-order terms (see Appendix A).

⁶ This proposition is proved in Rothenberg (1988), Appendix A.


Comments. 1. The terms pn (x), qn (x), and vn (x) can be approximated by functions such that the terms in the higherorder expansion are expressed exactly as powers of n−1/2 ; see Rothenberg (1988). 2. Recall that under normality the LM statistic is N (0, 1) under H0 , f statistic is not. Therefore, Proposition 7 provides a but the LM f statistic using conditional second-order correction for the LM moments of the LM statistic. In FGLS examples, Edgeworth expansions are known to correct for skewness and kurtosis due to an estimated error covariance matrix; cf. Horowitz (2001) and Rothenberg (1988). We find that this behavior carries over to the IV setting as well. The higher-order terms for the score statistic typically depend on π and Ω . In practice, we do not know π and Ω , and need to replace them with consistent estimators in the higher-order terms. As long as the higher-order polynomials are continuous functions of the parameters, empirical Edgeworth expansions (or the bootstrap) lead to higher-order improvements. However, in the next result, we find that the non-differentiability of the score statistic in the unidentified case leads to a discontinuity in the second-order term of the Edgeworth expansion at π = 0. Recall that Theorem 2 guarantees that the score statistic admits a secondorder Edgeworth expansion at any π 6= 0. Here, we actually compute the second-order terms. Corollary 8. Under Assumptions 1, 2 (with s = 3), and 3, the null f can be uniformly approximated (in x) by distribution of LM

Pr(LM̃ ≤ x) = Φ(x) + n^{-1/2} [α2 + (α1 − α2) x²] φ(x) + o(n^{-1/2}),

where

α1 = (1/2) E[(Zi'π)(Vi'b0)³] / [ (b0'Ω b0)^{3/2} (π'µZZ π)^{1/2} ],

α2 = (1/6) E[(Zi'π)³(Vi'b0)³] / [ (b0'Ω b0)^{3/2} (π'µZZ π)^{3/2} ],

and µZZ = E(Zi Zi'). This higher-order term in general cannot be extended to be continuous at π = 0 (take π = c·n^{-1/2} → 0 for different vectors c). Thus, the empirical Edgeworth expansion and bootstrap⁷ approaches typically do not provide an n^{-1/2} correction and can perform poorly in the unidentified case. This is not a weak-instrument problem, but rather a non-differentiability problem of the score test. For example, in Appendix B we show that the Anderson–Rubin statistic admits an Edgeworth expansion even when π = 0.

5. Monte Carlo simulations

In this section, we use simulation to examine the size performance of the bootstrap for the Wald and score test statistics. The basic simulation model is described by Eqs. (1) and (2). The n rows of [u, v2] are i.i.d. with mean zero, unit variance, and correlation ρ. The correlation coefficient ρ represents the degree of endogeneity of y2, and the distribution of these disturbances varies across designs, as described below. We take the first column of the matrix of instruments, Z, to be a vector of ones; the remaining k − 1 columns are distributed N(0, I_{k−1}), independently of [u, v2].⁸ To examine the performance of the tests under various degrees of identification, we consider three different values of the population first-stage F-statistic, π'(nIk)π/k.⁹,¹⁰ In particular, we consider the completely unidentified case (π'(nIk)π/k = 0), a weak-instruments case (π'(nIk)π/k = 1), and a strong-instruments case (π'(nIk)π/k = 10).

The Monte Carlo simulations in this section compute actual rejection rates for the two-sided versions of the score and Wald tests of the hypothesis that β = 0.¹¹ For each specification, 1000 pseudo-data sets are generated under the null hypothesis. For each pseudo-data set, we compute rejections using two different critical values: (a) the 5% critical value of the asymptotic distribution of the given test (i.e., chi-square one); (b) the bootstrapped 5% critical value computed with 1000 replications for each pseudo-data set.

The simulations are designed to consider various disturbance distributions. The first set of simulations restricts attention to disturbances following conventional distributions with small sample sizes. For design I, we take ui and v2i to be jointly normally distributed. For design II, we consider Wishart-distributed disturbances. In particular, we take ui = (ξ1i² − 1)/√2 and v2i = (ξ2i² − 1)/√2, where ξ1i and ξ2i are standard normal random variables with correlation ρ.

The next set of simulations considers nonnormal disturbances with varying numbers of instruments and large sample sizes. In these simulations we follow Kotz et al. (2000) to generate random draws of ui and v2i with zero mean, unit variance, skewness κ3, kurtosis κ4, and correlation ρ. In particular, we generate random draws of εi = [ε1i, ε2i]', where the εji's are standard normal variables and the correlation between ε1i and ε2i is ρ. Then, we set

ui = a + b ε1i + c ε1i² + d ε1i³

and

v2i = a + b ε2i + c ε2i² + d ε2i³.

Fleishman (1978) describes the simultaneous equations system that a, b, c, d, and ρ must satisfy to yield the desired skewness (κ3), kurtosis (κ4), and correlation (ρ) for (ui, v2i). This family of distributions is helpful for checking the sensitivity of the rejection rates to variation in the degree of nonnormality, as specified by the kurtosis and skewness of the distributions.

Tables 1 and 2 report null rejection probabilities for the score and Wald tests with sample sizes of 20 and 80 observations. Bootstrapping the score test instead of using the first-order asymptotic approximation takes actual rejection rates closer to the nominal size, sometimes even in the unidentified case.¹² For the smaller sample size, the bootstrapped score is closest to the nominal 5% level when the instruments are strongest. The performance of the Wald test exhibits even more sensitivity to instrument strength. When instruments are not strong, the size distortions for both asymptotic and bootstrap critical values can be dramatic. On the other hand, when instruments are strong, rejection rates for the Wald test are much closer to the nominal size, and bootstrapping the Wald test offers improvements over

8 There is a slight difference between Moreira's (2002, 2003) design and ours. In analogy with our results, our design takes Z as being random, whereas Moreira's (2002, 2003) design takes Z as being fixed.
9 The first-stage F-statistic corresponds to the concentration parameter λ'λ/k in the notation of Staiger and Stock (1997).
10 Note that E(Z'Z) = nIk.

7 In addition, the bootstrap has the problem of replacing β with the inconsistent estimator β̂.

11 Further simulations for other values of β0 have revealed similar results.
12 We have also performed simulations using the empirical Edgeworth expansion for the one-sided score test. Results not reported here indicate that this approximation method is outperformed by the bootstrap.
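The nonnormal disturbance generation of designs III onward (the Fleishman power method) can be sketched as follows. This is a hedged illustration, not the authors' code: the coefficients (b, c, d) below are illustrative, chosen only to satisfy the unit-variance equation b² + 6bd + 2c² + 15d² = 1 (with a = −c centering the draw at zero); the skewness and excess-kurtosis expressions are the standard Fleishman (1978) formulas under that constraint.

```python
import numpy as np

# Fleishman power method: Y = a + b*X + c*X**2 + d*X**3, X ~ N(0,1), a = -c
# so that E[Y] = 0.  Under the unit-variance constraint
#     b**2 + 6*b*d + 2*c**2 + 15*d**2 = 1,
# the skewness of Y is 2*c*(b**2 + 24*b*d + 105*d**2 + 2).

b, c = 0.9, 0.1                      # illustrative choices
# solve 15*d**2 + 6*b*d + (b**2 + 2*c**2 - 1) = 0 for the root d >= 0
d = (-6 * b + np.sqrt(36 * b**2 - 60 * (b**2 + 2 * c**2 - 1))) / 30

var = b**2 + 6*b*d + 2*c**2 + 15*d**2     # = 1 by construction
skew = 2 * c * (b**2 + 24*b*d + 105*d**2 + 2)

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = -c + b*x + c*x**2 + d*x**3            # nonnormal disturbance draws

sample_var = y.var()
sample_skew = ((y - y.mean())**3).mean() / sample_var**1.5
```

Matching given (κ3, κ4) targets, as in Tables 3 and 4, would require solving the full Fleishman system for (b, c, d); the fragment above only illustrates the mechanics of the transformation.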


M.J. Moreira et al. / Journal of Econometrics 149 (2009) 52–64

Table 1. Null rejection probabilities, nominal 5%; n = 20, k = 4 (1000 replications).

                        Normal                       Wishart
                        LM           Wald            LM           Wald
ρ      π'(nIk)π/k   BS    3.84   BS     3.84     BS    3.84   BS     3.84
0      0            4.8   8.0    0.0    0.5      7.9   11.9   0.9    2.0
0      1            4.1   7.4    1.3    2.4      6.2   9.5    1.7    4.0
0      10           4.5   6.5    3.4    4.9      5.8   9.5    5.7    8.7
0.5    0            5.8   9.1    12.0   15.4     7.4   11.3   14.5   2.1
0.5    1            4.2   6.4    13.0   14.1     6.9   10.4   9.3    14.6
0.5    10           4.6   6.6    5.7    7.4      6.5   9.7    6.3    8.8
0.75   0            6.1   7.6    42.7   48.7     7.5   12.8   39.0   50.7
0.75   1            4.3   6.5    27.9   32.6     6.9   9.7    22.7   29.2
0.75   10           4.9   6.3    7.6    10.6     7.0   10.5   8.2    12.3
0.99   0            5.9   7.6    95.2   99.1     9.0   13.3   93.7   98.3
0.99   1            4.5   6.5    35.4   57.2     7.0   10.3   31.7   51.3
0.99   10           5.1   6.5    9.1    14.2     7.0   10.6   9.0    15.2

• BS: Bootstrapped critical value results.
• 3.84: First-order asymptotic critical value results.

Table 2. Null rejection probabilities, nominal 5%; n = 80, k = 4 (1000 replications).

                        Normal                       Wishart
                        LM           Wald            LM           Wald
ρ      π'(nIk)π/k   BS    3.84   BS     3.84     BS    3.84   BS     3.84
0      0            5.8   6.3    0.0    0.0      5.7   6.6    0.2    0.3
0      1            5.5   6.1    0.1    1.3      5.6   6.0    0.3    1.4
0      10           5.2   5.8    4.3    4.6      5.1   5.6    4.7    5.0
0.5    0            6.4   7.1    12.8   15.9     5.3   6.0    10.8   14.0
0.5    1            5.6   5.9    16.0   13.8     5.3   6.0    11.2   12.9
0.5    10           5.5   6.0    6.9    6.9      5.5   6.2    5.6    6.7
0.75   0            6.0   6.8    46.3   47.9     5.8   6.4    44.2   49.2
0.75   1            4.8   5.4    29.5   31.4     5.8   6.1    26.1   28.5
0.75   10           6.4   6.4    7.7    9.1      4.8   6.0    5.9    9.0
0.99   0            5.5   5.9    95.2   98.9     6.2   6.7    95.4   98.8
0.99   1            4.9   5.2    29.3   54.3     7.2   7.7    28.6   56.9
0.99   10           5.4   5.3    7.7    12.2     7.2   8.0    7.6    12.9

• BS: Bootstrapped critical value results.
• 3.84: First-order asymptotic critical value results.

Table 3. LM test: null rejection probabilities, nominal 5% (1000 replications); k = 4.

                   n = 100, k = 4                          n = 250, k = 4                          n = 1000, k = 4
                   κ3=0,κ4=6  κ3=6,κ4=0  κ3=6,κ4=6    κ3=0,κ4=6  κ3=6,κ4=0  κ3=6,κ4=6    κ3=0,κ4=6  κ3=6,κ4=0  κ3=6,κ4=6
ρ     π'(nIk)π/k  BS   3.84   BS   3.84  BS   3.84    BS   3.84  BS   3.84  BS   3.84    BS   3.84  BS   3.84  BS   3.84
0     0           6.8  7.3    6.2  6.9   4.7  5.3     5.3  5.1   4.9  5.0   4.9  5.2     4.4  4.5   5.2  5.2   4.3  4.2
0     1           5.1  5.4    5.4  6.0   5.8  5.8     4.6  4.5   3.8  4.2   4.6  5.2     6.3  6.5   5.6  5.5   4.0  4.0
0     10          4.8  4.8    4.3  5.1   5.0  6.1     4.2  4.3   3.8  3.5   3.8  3.8     6.1  6.1   6.0  5.7   5.1  5.1
0.5   0           5.7  6.0    6.2  6.4   5.5  6.5     5.1  4.6   5.5  5.7   6.4  6.2     3.8  3.6   5.2  5.3   4.5  4.5
0.5   1           5.6  6.0    4.9  5.6   5.6  6.1     5.4  5.4   4.4  4.5   4.5  4.9     5.2  4.9   5.8  5.6   4.4  4.2
0.5   10          5.3  5.5    5.0  5.4   5.3  5.5     5.0  5.3   4.0  4.1   4.3  4.4     5.1  5.3   5.9  5.9   5.2  5.5
0.75  0           6.4  6.9    5.5  6.2   5.6  6.2     5.2  5.0   5.5  5.7   5.2  5.6     4.8  4.5   4.3  4.2   4.6  4.2
0.75  1           4.5  4.5    4.7  5.5   5.1  5.9     5.3  5.1   4.4  4.5   4.9  5.0     4.8  5.2   5.0  4.8   4.6  4.1
0.75  10          4.7  5.5    4.7  5.4   4.9  5.7     5.2  5.2   4.0  4.1   4.2  4.1     4.5  5.3   4.7  4.9   4.7  4.8
0.99  0           5.0  5.2    5.5  6.2   4.5  5.4     6.1  6.0   5.5  5.8   5.6  5.9     4.3  4.5   4.3  4.2   4.6  4.3
0.99  1           4.3  4.5    4.8  5.5   4.1  4.2     5.9  5.7   4.4  4.5   5.1  5.2     4.9  5.1   5.0  4.8   4.8  4.5
0.99  10          4.5  4.4    4.6  5.4   4.2  4.8     4.7  5.1   5.9  6.0   5.8  5.8     4.9  5.2   4.8  4.9   5.0  5.0

• BS: Bootstrapped critical value results.
• 3.84: First-order asymptotic critical value results.

first-order asymptotics. The poor behavior of the bootstrap for the Wald test with weak instruments is explained, as previously, by its dependence on π . For the remaining designs, we focus on the behavior of the score test. Table 3 gives the results for the score test with sample sizes n = 100, 250, 1000 and varying skewness and kurtosis values. The number of instruments is fixed at four for this table. For the smallest sample sizes, the bootstrapped critical values generally provide some improvements over the asymptotic values, with the exception of the high correlation (ρ = .99), strong-instrument

designs. In the high skewness, high kurtosis designs, the bootstrap performs especially well. It is also notable that some of the largest bootstrap gains occur in the unidentified case. With larger sample sizes, the rejection rates close in on the nominal 5% values, and the bootstrap and asymptotic rejection rates mirror each other. Table 4 considers designs with a large number of instruments (k = 50). Otherwise, Table 4 designs are identical to Table 3 designs. With a large number of instruments, the bootstrap gains over the asymptotic critical value are dramatic at the smallest sample size. Even with the middle-range sample size (n = 250),


Table 4. LM test: null rejection probabilities, nominal 5% (1000 replications); k = 50.

                   n = 100, k = 50                            n = 250, k = 50                         n = 1000, k = 50
                   κ3=0,κ4=6   κ3=6,κ4=0   κ3=6,κ4=6     κ3=0,κ4=6  κ3=6,κ4=0  κ3=6,κ4=6    κ3=0,κ4=6  κ3=6,κ4=0  κ3=6,κ4=6
ρ     π'(nIk)π/k  BS    3.84   BS    3.84  BS    3.84    BS   3.84  BS   3.84  BS   3.84    BS   3.84  BS   3.84  BS   3.84
0     0           9.8   16.4   10.1  16.2  9.1   15.2    6.7  7.2   7.0  8.3   6.9  8.4     5.6  6.6   5.2  5.5   5.8  5.9
0     1           8.9   13.3   6.3   7.9   7.6   11.2    6.4  7.2   5.1  5.3   5.0  5.9     5.8  6.2   5.1  5.3   5.0  4.6
0     10          6.1   7.7    5.5   5.9   6.9   8.5     5.8  5.6   5.9  6.0   4.6  5.0     6.2  6.2   4.9  5.1   5.1  5.3
0.5   0           10.5  16.9   9.7   16.0  9.3   17.6    6.7  8.1   7.1  8.7   6.1  8.3     5.5  6.1   5.0  5.2   5.7  5.4
0.5   1           7.3   10.3   6.6   7.8   6.6   12.0    6.6  7.6   5.2  5.4   5.1  5.9     5.8  6.4   5.0  5.5   4.8  4.8
0.5   10          4.7   5.5    5.3   6.1   5.9   6.9     6.0  6.5   5.4  5.7   5.6  6.1     6.1  6.3   4.9  4.7   5.0  5.2
0.75  0           8.3   15.9   8.5   16.9  8.7   16.9    6.9  7.7   6.2  8.5   6.3  8.6     6.1  6.1   5.7  5.8   5.7  5.8
0.75  1           4.6   7.9    5.9   7.8   8.0   10.9    6.3  7.1   5.2  6.0   4.5  5.4     5.7  5.6   5.3  5.0   5.9  5.9
0.75  10          3.7   4.6    5.5   5.9   5.3   6.8     5.6  5.7   5.9  6.0   5.5  5.7     5.5  5.6   4.9  4.6   5.3  5.1
0.99  0           10.3  16.9   8.5   16.9  8.9   16.6    6.9  6.9   6.2  8.6   6.3  8.4     5.5  6.1   5.7  5.9   5.7  5.9
0.99  1           3.1   4.3    5.9   7.8   8.2   11.2    4.5  5.2   5.3  6.0   4.6  5.5     5.8  6.4   5.4  5.0   5.8  5.8
0.99  10          3.6   4.4    5.5   5.9   5.5   6.8     4.7  5.1   5.9  6.0   5.8  5.8     6.1  6.3   4.9  4.6   5.2  5.5

• BS: Bootstrapped critical value results.
• 3.84: First-order asymptotic critical value results.

the bootstrap consistently outperforms the asymptotic critical values for the various designs. For the largest sample size (n = 1000), the rejection rates for both methods are converging to the nominal value of 5%. Still, even with n = 1000, the bootstrap provides some improvement for 25 of the 36 designs and has equivalent performance to asymptotics for another six of the designs. For the strong-instrument cases, the simulation results in these tables are consistent with the higher-order improvements expected of the bootstrap from the theoretical findings in Sections 3 and 4. It is also notable that some of the largest gains for the bootstrap occur in the unidentified designs. In Theorem 6, the first-order validity of the bootstrap for the score test is established, but from Section 4.2 we know that these bootstrap gains cannot be explained by traditional Edgeworth expansion arguments. These simulations, then, suggest an interesting path for future research.¹³

6. Conclusion

It is well-known that in the strong-instrument case the Wald statistic (and the score statistic) are smooth functions of sample means and the bootstrap provides higher-order improvements. In the unidentified case, the statistics are, in general, non-regular, and the standard proofs for the validity of the bootstrap break down. Despite the known fragility of the bootstrap in non-regular cases (Shao, 1994; Andrews, 2000), this paper provides a positive result: bootstrapping the score statistic is, in fact, valid. As a negative result, the bootstrap for the score does not, in general, provide standard improvements in the weak-instrument case. This is due to the structural parameter not being consistently estimable and the higher-order polynomials of the Edgeworth expansions not necessarily being continuous in the unidentified case. 
Nevertheless, this discontinuity due to non-differentiability of the score statistic can be quite interesting, given that little is known about expansions when the statistic is not smooth. In the words of Wallace (1958): ''The assumption H′(µ) ≠ 0 and its equivalent for functions of several moments rule out many interesting functions for which no general theory of asymptotic expansions is known.'' Our approach can, in principle, be extended to obtain results for the unidentified case in the GEL and GMM contexts; cf.

13 There has been some recent work that supports bootstrap improvements beyond standard Edgeworth expansion arguments, e.g., Goncalves and Meddahi (forthcoming) on realized volatility.

Guggenberger and Smith (2005), Otsu (2006), Stock and Wright (2000), and Brown and Newey (forthcoming). Inoue (2006) and Kleibergen (2006) present some simulations and results which indicate that the bootstrap can lead to size improvements in the unidentified case in the GMM context as well. Our theoretical results should be adaptable to those cases by analyzing GMM and GEL versions of the two sufficient statistics for the simple simultaneous equations model analyzed here.

Acknowledgement

Discussions with Tom Rothenberg were very important for this paper, and we are indebted to him for his insights and generosity. We also thank the participants at Boston College, Boston University, Cornell, Harvard-MIT, Maryland, Montreal, North Carolina State, Rice, Texas A&M, Texas Austin, and USC-UCLA seminars, and at the NSF Weak Instrument Conference at MIT and the Semiparametrics in Rio conference organized by FGV/EPGE. Moreira and Porter gratefully acknowledge the research support of the National Science Foundation via grant numbers SES-0418268, SES-0819761, and SES-0438123. Suarez acknowledges support from the Los Andes-Harvard Fund and Banco de la Republica. This paper represents the views of the authors and does not necessarily represent those of the Federal Reserve System or members of its staff.

Appendix A. Proofs

Proof of Theorem 2. First, we prove part (a). Under H0,

LM̃ = √n [ b0'(Y'Z/n)(Z'Z/n)^{-1}(Z'Y/n) Ω̃^{-1} a0 / √(b0'Ω̃b0) ] / [ a0'Ω̃^{-1}(Y'Z/n)(Z'Z/n)^{-1}(Z'Y/n) Ω̃^{-1} a0 ]^{1/2}

with

Ω̃ = Y'Y/n − (Y'Z/n)(Z'Z/n)^{-1}(Z'Y/n).

The expression for LM̃ can be rewritten as

LM̃ = √n [ H(Rn) − H(µ) ],

where H is a real-valued Borel measurable function on R^ℓ such that H(µ) = 0. All the derivatives of H of order s and less are continuous in a neighborhood of µ. Using Assumptions 2 and 3, the result follows from Theorem 2 of Bhattacharya and Ghosh (1978). Corollary 8 gives the expression for the second term.


The proof for part (b) is analogous to the proof for part (a). The Wald statistic equals

W̃ = √n [ (y2'Z/n)(Z'Z/n)^{-1}(Z'y2/n) ]^{-1/2} (y2'Z/n)(Z'Z/n)^{-1} Z'(y1 − y2β0)/n / √( [1, −β̂2SLS] Ω̃ [1, −β̂2SLS]' ),

where

β̂2SLS = [ (y2'Z/n)(Z'Z/n)^{-1}(Z'y2/n) ]^{-1} (y2'Z/n)(Z'Z/n)^{-1}(Z'y1/n).

Like the score statistic, the Wald statistic can be written as

W̃ = √n [ H(Rn) − H(µ) ]

under H0, where H is a real-valued Borel measurable function such that H(µ) = 0. All the derivatives of H of order s and less are continuous in a neighborhood of µ. The result then follows by Theorem 2 of Bhattacharya and Ghosh (1978). □
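For concreteness, the score statistic written above as a smooth function of the sample moments (Y'Y/n, Z'Z/n, Z'Y/n) can be computed as in the following sketch. This is our own illustration (the function name `lm_stat` is hypothetical), with a0 = (β0, 1)' and b0 = (1, −β0)' as in the text.

```python
import numpy as np

def lm_stat(y1, y2, Z, beta0=0.0):
    """Score (LM) statistic for H0: beta = beta0 in y1 = y2*beta + u,
    with instruments Z; reduced form Y = [y1, y2]."""
    n = len(y1)
    Y = np.column_stack([y1, y2])
    a0 = np.array([beta0, 1.0])
    b0 = np.array([1.0, -beta0])
    PZY = Z @ np.linalg.solve(Z.T @ Z, Z.T @ Y)   # projection of Y on Z
    Omega = (Y - PZY).T @ (Y - PZY) / n           # reduced-form cov. estimate
    G = Y.T @ PZY / n                             # (Y'Z/n)(Z'Z/n)^{-1}(Z'Y/n)
    Oa = np.linalg.solve(Omega, a0)               # Omega_tilde^{-1} a0
    num = b0 @ G @ Oa / np.sqrt(b0 @ Omega @ b0)
    den = np.sqrt(Oa @ G @ Oa)
    return np.sqrt(n) * num / den

# small illustration under the null (beta = 0), strong instruments
rng = np.random.default_rng(1)
n, k = 200, 4
Z = rng.standard_normal((n, k))
u = rng.standard_normal(n)
v2 = 0.5 * u + np.sqrt(0.75) * rng.standard_normal(n)
y2 = Z @ np.full(k, 0.5) + v2
y1 = u
stat = lm_stat(y1, y2, Z, beta0=0.0)    # approximately N(0, 1) under H0
```

Note that the statistic is invariant to rescaling of the instruments, since Z enters only through the projection matrix.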

Proof of Proposition 4. The function ρ(θ1, θ2) = δ(Π(θ1), Π(θ2)) is nonnegative, symmetric, and satisfies the triangle inequality. Therefore, ρ(θ1, θ2) is a pseudometric. The pseudometric ρ(θ1, θ2) induces an equivalence relation, the metric identification, that converts the pseudometric space into a metric space. Define θ1 ∼ θ2 if ρ(θ1, θ2) = 0, and let Θ* = Θ/∼ and ρ*([θ1], [θ2]) = ρ(θ1, θ2). Then (Θ*, ρ*) is a metric space. Define Qn([θ]) = Qn(Π(θ)) and Q*([θ]) = Q*(Π(θ)). Define

B* = {[θ] : θ ∈ Θ, Π(θ) ∈ B} = {[θ] : θ ∈ Θ, θ ∈ Π^{-1}(B)}.

From Assumption (i), we obtain that for all ε > 0,

inf_{[θ] ∈ B*; ρ*([θ],[θ0]) > ε} Q*([θ]) > Q*([θ0]).   (11)

From Assumptions (ii) and (iii), we obtain

sup_{[θ] ∈ B*} | Qn([θ]) − Q*([θ]) | → 0 a.s.  and
lim inf_{n→∞} [ inf_{[θ] ∈ (B*)^c} Qn([θ]) − Qn([θ0]) ] ≥ 0 a.s.   (12)

From (11) and (12), it follows that ρ*([θ̂n], [θ0]) → 0 a.s. By definition of ρ*, we obtain that ρ(θ̂n, θ0) → 0 a.s. Because ρ is not a metric, this does not imply that θ̂n → θ0. However, by construction of the pseudometric ρ, we have

δ(Π(θ̂n), Π(θ0)) ≡ ρ(θ̂n, θ0) → 0 a.s.

Because δ is a metric, we obtain the desired result: Π(θ̂n) → Π(θ0) a.s. □

Proof of Theorem 3. Let F be the distribution of

Rn = vech( (Yn', Zn')'(Yn', Zn') )

and let Fn be the distribution of

R̃n* = vech( (Ỹn*', Zn*')'(Ỹn*', Zn*') )

conditional on Xn = {(Y1', Z1'), ..., (Yn', Zn')}. Here, Zn* has probability 1/n of taking each of the values (rows) of Zn, and Ỹn* has probability 1/n of taking each of the values (rows) of

Ỹn = Zn π̂ â' + Ṽn = Zn π̂ (β̂, 1) + Ṽn,

and we define

R̃n = vech( (Ỹn', Zn')'(Ỹn', Zn') ).

The re-sampling mechanism for Ỹn and Zn and the re-centering procedure for V̂ of subtracting sample means reflect the fact that Z and V are independent. If Z and V were only uncorrelated, different drawing mechanisms and re-centering procedures would be entailed, but the essence of the proofs for the bootstrap presented here would remain the same.

Let F̂n be the Fourier transform of Fn. Following Lemma 2 of Babu and Singh (1984), there exist for each d > 0 positive numbers ε and δ such that

lim sup_{n→∞} sup_{d ≤ ||t|| ≤ exp(nδ)} |F̂n(t)| ≤ 1 − ε a.s.

Since the rows of R̃n* are i.i.d. (conditionally given Xn) with common distribution Fn, one can proceed as in Bhattacharya (1987) to show that

sup_{A ∈ A} | P*( √n (R̃n* − R̃n) ∈ A ) − ∫_A [ 1 + Σ_{i=1}^{s−2} n^{−i/2} Pi(−D : Fn) ] φV(x) dx |

is o(n^{−1}) a.s. as n → ∞ for every class A of Borel subsets of R^ℓ satisfying, for some ϑ > 0, sup_{A ∈ A} ΦV((∂A)^ε) = O(ε^ϑ) as ε ↓ 0. Reduction of the expansion of n^{1/2}(R̃n* − R̃n) to LM̃* follows as in Bhattacharya and Ghosh (1978) once we realize that

LM̃* = √n [ H(R̃n*) − H(R̃n) ]

with H(R̃n) = 0 (due to the re-centered residuals). □

Lemma A1. Suppose π̂β̂ − πnβ → 0 a.s. If, for some δ > 0, E||Zi||^{2+δ} < ∞ and E||vi||^{2+δ} < ∞, then for j = 1, ..., k and m = 1, 2, E*[ |Z*_{j,i} v*_{m,i}|^{2+δ} ] is bounded a.s.

Proof. By independence, E*[ |Z*_{j,i} v*_{m,i}|^{2+δ} ] = E*[ |Z*_{j,i}|^{2+δ} ] E*[ |v*_{m,i}|^{2+δ} ]. For j = 1, ..., k,

E*[ |Z*_{j,i}|^{2+δ} ] = n^{−1} Σ_{i=1}^n |Z_{j,i}|^{2+δ} → E[ |Z_{j,i}|^{2+δ} ] a.s.

Let v̄m = n^{−1} Σ_{i=1}^n v_{m,i}, m = 1, 2, and Z̄ = n^{−1} Σ_{i=1}^n Zi. Using the Minkowski and Cauchy–Schwarz inequalities, we obtain

E*[ |v*_{1,i}|^{2+δ} ] = n^{−1} Σ_{i=1}^n |ṽ_{1,i}|^{2+δ}
 = n^{−1} Σ_{i=1}^n | v_{1,i} − v̄1 − (Zi − Z̄)'(π̂β̂ − πnβ) |^{2+δ}
 ≤ C1 { n^{−1} Σ_{i=1}^n |v_{1,i} − v̄1|^{2+δ} + n^{−1} Σ_{i=1}^n |(Zi − Z̄)'(π̂β̂ − πnβ)|^{2+δ} }
 ≤ C2 { n^{−1} Σ_{i=1}^n |v_{1,i} − v̄1|^{2+δ} + ||π̂β̂ − πnβ||^{2+δ} n^{−1} Σ_{i=1}^n ||Zi − Z̄||^{2+δ} }

for large enough constants C1 and C2. An analogous result holds for E*[ |v*_{2,i}|^{2+δ} ]:

E*[ |v*_{2,i}|^{2+δ} ] ≤ C2 { n^{−1} Σ_{i=1}^n |v_{2,i} − v̄2|^{2+δ} + ||π̂ − πn||^{2+δ} n^{−1} Σ_{i=1}^n ||Zi − Z̄||^{2+δ} }.

Using the Minkowski inequality again, we get

n^{−1} Σ_{i=1}^n ||Zi − Z̄||^{2+δ} ≤ C1 { n^{−1} Σ_{i=1}^n ||Zi||^{2+δ} + ||Z̄||^{2+δ} } → C1 { E||Zi||^{2+δ} + ||E[Zi]||^{2+δ} } a.s.,

using Z̄ → E[Zi] a.s. and ||E[Zi]|| ≤ E||Zi|| ≤ (E||Zi||^{2+δ})^{1/(2+δ)} by Jensen's inequality. Similarly, using the Minkowski inequality again, we obtain

n^{−1} Σ_{i=1}^n |v_{m,i} − v̄m|^{2+δ} ≤ C1 { n^{−1} Σ_{i=1}^n |v_{m,i}|^{2+δ} + |v̄m|^{2+δ} } → C1 E[ |v_{m,i}|^{2+δ} ] a.s.,

as v̄m → 0 a.s. Since π̂β̂ − πnβ → 0 a.s., the term n^{−1} Σ_{i=1}^n |ṽ_{m,i}|^{2+δ} is bounded a.s. □
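The resampling scheme analyzed in the proof of Theorem 3 — drawing rows of Z and rows of the re-centered reduced-form residuals independently and rebuilding Ỹ* = Z*π̂(β̂, 1) + V* — can be sketched in code. This is a minimal hedged illustration, not the authors' implementation; 2SLS is used as an illustrative choice for the estimates π̂ and β̂, and the null is imposed at β̂ in the bootstrap world.

```python
import numpy as np

def lm_stat(Y, Z, beta0):
    """Score statistic for H0: beta = beta0; Y = [y1, y2] reduced form."""
    n = Y.shape[0]
    a0, b0 = np.array([beta0, 1.0]), np.array([1.0, -beta0])
    PZY = Z @ np.linalg.solve(Z.T @ Z, Z.T @ Y)
    Om = (Y - PZY).T @ (Y - PZY) / n
    G = Y.T @ PZY / n
    Oa = np.linalg.solve(Om, a0)
    return (np.sqrt(n) * (b0 @ G @ Oa)
            / np.sqrt(b0 @ Om @ b0) / np.sqrt(Oa @ G @ Oa))

def bootstrap_crit(y1, y2, Z, B=499, level=0.95, seed=0):
    """Bootstrap critical value for the two-sided (squared) score test."""
    rng = np.random.default_rng(seed)
    n = len(y1)
    Y = np.column_stack([y1, y2])
    pihat = np.linalg.solve(Z.T @ Z, Z.T @ y2)          # first-stage OLS
    y2hat = Z @ pihat
    betahat = (y2hat @ y1) / (y2hat @ y2)               # 2SLS (illustrative)
    fitted = np.column_stack([y2hat * betahat, y2hat])  # Z pihat (betahat, 1)
    V = Y - fitted
    V -= V.mean(axis=0)                                 # re-center residuals
    stats = np.empty(B)
    for b in range(B):
        Zs = Z[rng.integers(n, size=n)]                 # rows of Z ...
        Vs = V[rng.integers(n, size=n)]                 # ... and of V, drawn independently
        Ys = np.column_stack([Zs @ pihat * betahat, Zs @ pihat]) + Vs
        stats[b] = lm_stat(Ys, Zs, betahat) ** 2        # null imposed at betahat
    return np.quantile(stats, level)

# usage: with strong instruments and normal errors the bootstrapped
# critical value should be near the chi-square(1) value 3.84
rng = np.random.default_rng(1)
n, k = 200, 4
Z = rng.standard_normal((n, k))
u = rng.standard_normal(n)
v2 = 0.5 * u + np.sqrt(0.75) * rng.standard_normal(n)
y2 = Z @ np.full(k, 0.5) + v2
y1 = u
crit = bootstrap_crit(y1, y2, Z, B=199)
```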



Recall the following notation: wi = vech(Zi Zi') ∈ R^{k(k+1)/2} and W = (w1, ..., wn)'. Similarly, let wi* = vech(Zi* Zi*') and W* = (w1*, ..., wn*)'. Also let Ωww = Var(wi) and let ı be an n × 1 vector of ones.

Lemma A2. If, for some δ > 0, E||Zi||^{4+δ} < ∞ and E||Vi||^{2+δ} < ∞, then

[ (Z*'V*/√n) b̂ / √(b̂'Ω̃b̂) ;  (Z*'V*/√n) Ω̃^{−1}â / √(â'Ω̃^{−1}â) ;  √n (W*'ı/n − W'ı/n) ]  | Xn  →d  N( 0, diag( I2 ⊗ E(Zi Zi'), Ωww ) )  a.s.,   (13)

where Vi* = (v*_{1,i}, v*_{2,i})' is the i-th bootstrap draw of the (re-centered) reduced-form residuals and w̄ = n^{−1} Σ_{i=1}^n wi.

Proof. Let (c', d')' be a nonzero vector with c = (c1', c2')' ∈ R^{2k} and d ∈ R^{k(k+1)/2}. Define

Xn,i = [ c'(Ĵ' ⊗ Ik)(Vi* ⊗ Zi*) + d'(wi* − w̄) ] / √n,

where

Ĵ = [ b̂/√(b̂'Ω̃b̂),  Ω̃^{−1}â/√(â'Ω̃^{−1}â) ].

We use the Cramér–Wold device and verify the conditions of the Liapunov Central Limit Theorem.

(a) E*[Xn,i] = 0 follows from independence and E*[Vi*] = 0.

(b) By independence (and using â'Ω̃^{−1}â = â'Ω̃^{−1}Ω̃Ω̃^{−1}â),

E*[Xn,i²] = n^{−1} [ c'( I2 ⊗ (Z'Z/n) ) c + d'Ω̃ww d ],

which is finite a.s., where Ω̃ww = n^{−1} Σ_{i=1}^n (wi − w̄)(wi − w̄)'.

(c) Finally, we need to show that lim_{n→∞} Σ_{i=1}^n E*[ |Xn,i|^{2+δ} ] = 0 a.s. Let (p)j denote the j-th entry of a vector p and cm,j the j-th entry of the vector cm, m = 1, 2; that is, cm,j = (cm)j. We have

Σ_{i=1}^n E*[ |Xn,i|^{2+δ} ]
 ≤ C3 n^{−δ/2} n^{−1} Σ_{i=1}^n E*[ |c'(Ĵ'Vi* ⊗ Zi*)|^{2+δ} + |d'(wi* − w̄)|^{2+δ} ]
 ≤ C4 n^{−δ/2} E*[ |c1'Zi*(Ĵ'Vi*)1|^{2+δ} + |c2'Zi*(Ĵ'Vi*)2|^{2+δ} + |d'(wi* − w̄)|^{2+δ} ]
 ≤ C5 n^{−δ/2} { Σ_{j=1}^k [ | c1,j/√(b̂'Ω̃b̂) + c2,j (Ω̃^{−1}â)1/√(â'Ω̃^{−1}â) |^{2+δ} E*[ |Z*_{j,i} v*_{1,i}|^{2+δ} ]
   + | c1,j(−β̂)/√(b̂'Ω̃b̂) + c2,j (Ω̃^{−1}â)2/√(â'Ω̃^{−1}â) |^{2+δ} E*[ |Z*_{j,i} v*_{2,i}|^{2+δ} ] ]
   + Σ_{l=1}^{(k+1)k/2} |dl|^{2+δ} E*[ | w*_{l,i} − n^{−1} Σ_{j=1}^n w_{l,j} |^{2+δ} ] }

for large enough constants C3, C4, and C5.

The vectors â and b̂ both have one as an element, and Ω̃ and Ω̃^{−1} converge almost surely to positive definite limits. So, regardless of the value of πn or β̂, the terms

1/√(b̂'Ω̃b̂),  (Ω̃^{−1}â)1/√(â'Ω̃^{−1}â),  −β̂/√(b̂'Ω̃b̂),  and  (Ω̃^{−1}â)2/√(â'Ω̃^{−1}â)

are almost always well-defined. These terms are also bounded by κ1(Ω̃), where

κ1(Ω̃) = max{ √( σ̃11/(σ̃11σ̃22 − σ̃12²) ), √( σ̃22/(σ̃11σ̃22 − σ̃12²) ) },   (14)

σ̃ij being the (i, j)-th entry of Ω̃. This bound follows from the following claim (which holds regardless of the value of π). Let

K = [ k11  k12 ; k12  k22 ]  and  τ = (τ1, τ2)',

where K is a symmetric positive definite matrix. Then the following holds:

sup_τ √( τ'e1e1'τ / τ'Kτ ) = √( k22/(k11k22 − k12²) ).

Given the bound in (14), the conclusion of Lemma A1, and the fact that E||Zi||^{4+δ} < ∞ is sufficient to bound E*[ |w*_{l,i} − n^{−1}Σ_{j} w_{l,j}|^{2+δ} ] almost surely, the final condition of the Liapunov Central Limit Theorem now follows because Ω̂ww converges a.s. to its positive definite limit. □

Lemma A3. If, for some δ > 0, E||Zi||^{4+δ} < ∞ and E||Vi||^{2+δ} < ∞, then

[ (Z*'V*/√n) b̂ / √(b̂'Ω̃*b̂) ;  (Z*'V*/√n) Ω̃*^{−1}â / √(â'Ω̃*^{−1}â) ;  √n (W*'ı/n − W'ı/n) ]  | Xn  →d  N( 0, diag( I2 ⊗ E(Zi Zi'), Ωww ) )  a.s.

Proof. Noting that V* = [v1*, v2*], we can rewrite the first two terms of the expression above:

[ (Z*'V*/√n) b̂ / √(b̂'Ω̃*b̂) ;  (Z*'V*/√n) Ω̃*^{−1}â / √(â'Ω̃*^{−1}â) ] = (Ĵ*' ⊗ Ik) [ Z*'v1*/√n ; Z*'v2*/√n ],

where

Ĵ* = [ b̂/√(b̂'Ω̃*b̂),  Ω̃*^{−1}â/√(â'Ω̃*^{−1}â) ].

Since

[ Z*'v1*/√n ; Z*'v2*/√n ] | Xn →d N( 0, Ω ⊗ E(Zi Zi') ) a.s.

by the Liapunov CLT, by an argument similar to the proof of Lemma A2, it will suffice to show that Ĵ* − Ĵ | Xn → 0 a.s. Notice that E*[Z*'Z*/n] = Z'Z/n. So, by the Markov Law of Large Numbers, Z*'Z*/n − Z'Z/n | Xn → 0 a.s. Moreover, Z'Z/n → E(Zi Zi') a.s., which is positive definite, and so (Z*'Z*/n)^{−1} | Xn → E(Zi Zi')^{−1} a.s. Similarly, Z*'V*/n | Xn → 0 a.s. since E*[Z*'V*/n] = 0 a.s. Also, E*[V*'V*/n] = Ṽ'Ṽ/n, and V*'V*/n − Ṽ'Ṽ/n | Xn → 0 a.s. By standard arguments, Ṽ'Ṽ/n → Ω a.s. and Ω̃ → Ω a.s. Hence Ω̃* − Ω̃ | Xn → 0 a.s.

Consider the terms (b̂'Ω̃*b̂)^{−1/2} and (b̂'Ω̃b̂)^{−1/2} from Ĵ* and Ĵ. Let Ω̄ denote a generic value of the covariance matrix. As argued in the proof of Lemma A2, |(b̂'Ω̄b̂)^{−1/2}| and |β̂(b̂'Ω̄b̂)^{−1/2}| can be bounded by κ1(Ω̄) for all b̂. For any c > 1, there exists a neighborhood of Ω such that for Ω̄ in the neighborhood, κ1(Ω̄) < c κ1(Ω). Note that ∂(b̂'Ω̄b̂)^{−1/2}/∂σ̄11 = −(b̂'Ω̄b̂)^{−3/2}/2. So, for large enough n,

| ∂(b̂'Ω̄b̂)^{−1/2}/∂σ̄11 | ≤ (3/2) κ1(Ω̄)³ < 2 κ1(Ω)³ a.s.

for Ω̄ = λΩ̃ + (1 − λ)Ω̃* with λ ∈ [0, 1]. The same bound applies when taking the partial derivative with respect to the other terms of Ω̄. It follows by the mean value theorem that, for large enough n,

| 1/√(b̂'Ω̃*b̂) − 1/√(b̂'Ω̃b̂) | ≤ 8 κ1(Ω)³ ||Ω̃* − Ω̃|| a.s.

The same bound applies to | −β̂(b̂'Ω̃*b̂)^{−1/2} − (−β̂)(b̂'Ω̃b̂)^{−1/2} |. A similar argument can be used to bound the terms of Ω̃*^{−1}â/√(â'Ω̃*^{−1}â) − Ω̃^{−1}â/√(â'Ω̃^{−1}â). Let κ2(Ω̄) = max{σ̄11, σ̄22, κ1(Ω̄)}. Then, for j = 1, 2,

| (Ω̃*^{−1}â)j/√(â'Ω̃*^{−1}â) − (Ω̃^{−1}â)j/√(â'Ω̃^{−1}â) | ≤ 8 ( κ2(Ω)³ + κ2(Ω) ) ||Ω̃*^{−1} − Ω̃^{−1}|| a.s.

It follows that Ĵ* − Ĵ | Xn → 0 a.s., and by Slutsky's Theorem the result holds. □

Proof of Lemma 5. The result is a direct application of the Delta Method and the limiting distribution given in Lemma A3 (noting the zero covariances between the three components of the normal limiting distribution). We have

S̃* = (Z*'Z*/n)^{−1/2} (Z*'V*/√n) b̂ / √(b̂'Ω̃*b̂)

and

T̃* − tn* = (Z*'Z*/n)^{−1/2} (Z*'V*/√n) Ω̃*^{−1}â / √(â'Ω̃*^{−1}â) + [ (Z*'Z*/n)^{1/2} − (Z'Z/n)^{1/2} ] √n π̂ √(â'Ω̃*^{−1}â).

We shall prove (i) first. By Lemma A3, S̃* | Xn →d N(0, Ik) a.s. Using the fact that Z*'Z*/n | Xn → E(Zi Zi') a.s., π̂ → π a.s., and â'Ω̂*^{−1}â − a'Ω^{−1}a | Xn → 0 a.s.,

( S̃*', (T̃* − tn*)' )' | Xn →d N( 0, I2k + Σ a'Ω^{−1}a ) a.s.

For case (ii), in which Assumption 1B (or 1A) holds, the term

[ (Z*'Z*/n)^{1/2} − (Z'Z/n)^{1/2} ] √n π̂ √(â'Ω̃*^{−1}â)

is conditionally asymptotically negligible, since √n π̂, β̂, and â'Ω̂*^{−1}â are bounded in probability. It then follows that (S̃*', (T̃* − tn*)')' | Xn →d N(0, I2k) a.s. □

Proof of Theorem 6. Consider part (i) first. We write

LM̃* = ( S̃*'T̃*/√n ) / √( T̃*'T̃*/n ).

Under (i), we have T̃*/√n | Xn → E(Zi Zi')^{1/2} π (a'Ω^{−1}a)^{1/2} a.s. By Lemma A3, S̃* | Xn →d N(0, Ik) a.s. As a result, LM̃* | Xn →d N(0, 1) a.s.

For part (ii), we write

LM̃* = [ S̃*'(tn* + T̃* − tn*)/√n ] / √( (tn* + T̃* − tn*)'(tn* + T̃* − tn*)/n ).

By Lemma 5(ii), LM̃* | Xn →d N(0, 1) a.s. □

Proof of Corollary 8. Write the score statistic LM̃ as a function H of R̄ = (Y'Y/n, Z'Z/n, Z'Y/n). Let L(yy, zz, zy) = yy − zy'(zz)^{−1}zy and M(yy, zz, zy) = L(yy, zz, zy)^{−1} zy'(zz)^{−1}zy, and define the function H(yy, zz, zy) to be the following expression:

a0'M(yy, zz, zy)b0 / { [ a0'M(yy, zz, zy)L(yy, zz, zy)^{−1}a0 ]^{1/2} [ b0'L(yy, zz, zy)b0 ]^{1/2} }.

Then LM̃ = √n H(R̄). Let r = (yy, zz, zy), µZZ = E(Z'Z/n), µYY = E(Y'Y/n) = a0π'µZZπa0' + Ω, µZY = E(Z'Y/n) = µZZπa0', and µ = (µYY, µZZ, µZY). Elements of the matrices in r are denoted yyij, zzlm, zyrs, where i, j, s = 1, 2 and l, m, r = 1, ..., k. Let

h_{yyij} = ∂H(r)/∂yyij |_{r=µ},   h_{yyij zzlm} = ∂²H(r)/∂yyij ∂zzlm |_{r=µ}

for i, j = 1, 2 and l, m = 1, ..., k. Similarly, define for appropriate index ranges h_{zzlm}, h_{zyrs}, h_{yyij yylm}, h_{yyij zyrs}, h_{zzlm zzrs}, h_{zzlm zyrs}, and h_{zylm zyrs}.


Also, let Yt, Zt, and Vt denote the t-th observation (row) from the matrices Y, Z, and V. The i-th element of each row of observations is denoted Yi,t and Zi,t. Define

σ_{yyij zzlm} = E[ (Yi,t Yj,t − µYYij)(Zl,t Zm,t − µZZlm) ],
σ_{yyij zzlm zyrs} = E[ (Yi,t Yj,t − µYYij)(Zl,t Zm,t − µZZlm)(Zr,t Ys,t − µZYrs) ],

and similarly for the other σ notation. From Hall (1992, Theorem 2.2), Pr( √n H(R̄)/σ ≤ c ) = Φ(c) + n^{−1/2} p1(c) φ(c) + o(n^{−1/2}), where

p1(c) = −A1σ^{−1} − (1/6) A2σ^{−3} (c² − 1),

σ is the asymptotic variance of √n H, and expressions for A1 and A2 are derived below. First, it should be noted that since LM̃ is already standardized, it is straightforward to show (using the expression in Hall (1992)) that σ = 1. Then, p1(·) yields the desired expansion.

The following expression for A1 is directly from Hall (1992):

A1 = (1/2) [ Σ_{i,j,l,m} h_{yyij yylm} σ_{yyij yylm} + Σ_{i,j,l,m} h_{zzij zzlm} σ_{zzij zzlm} + Σ_{i,j,l,m} h_{zyij zylm} σ_{zyij zylm}
 + 2 Σ_{i,j,l,m} h_{yyij zzlm} σ_{yyij zzlm} + 2 Σ_{i,j,l,m} h_{yyij zylm} σ_{yyij zylm} + 2 Σ_{i,j,l,m} h_{zzij zylm} σ_{zzij zylm} ],

with sums over the appropriate index ranges. Note that the last term in the numerator of H(r) is zy·b0 and µZY b0 = µZZπa0'b0 = 0. It follows that h_{yyij} = 0, h_{zzij} = 0, h_{yyij yylm} = 0, h_{zzij zzlm} = 0, and h_{yyij zzlm} = 0. The other terms in the formula for A1 can be obtained directly, yielding

A1 = −(1/2) E[(Zh'π)(Vh'b0)³] / [ (b0'Ωb0)^{3/2} (π'µZZπ)^{1/2} ].

The formula for A2 is given in Hall (1992) and involves considerably more terms than A1. Given the length of that formula, we take advantage of the simplifications h_{yyij} = 0, h_{zzij} = 0, h_{yyij yylm} = 0, h_{zzij zzlm} = 0, and h_{yyij zzlm} = 0 to state A2 more compactly: A2 = A21 + A22 + A23 + A24, where

A21 = Σ_{i,j,l,m,r,s} h_{zyij} h_{zylm} h_{zyrs} σ_{zyij zylm zyrs} = E[(Zh'π)³(Vh'b0)³] / [ (b0'Ωb0)^{3/2} (π'µZZπ)^{3/2} ],

A22 = 6 Σ_{i,j,l,m,p,q,r,s} h_{zyij} h_{zylm} h_{yypq zyrs} σ_{yypq zyij} σ_{zylm zyrs} = −3 E[(Zh'π)(Vh'b0)³] / [ (b0'Ωb0)^{3/2} (π'µZZπ)^{1/2} ],

A23 = 6 Σ_{i,j,l,m,p,q,r,s} h_{zyij} h_{zylm} h_{zzpq zyrs} σ_{zzpq zyij} σ_{zylm zyrs} = 0,

A24 = 3 Σ_{i,j,l,m,p,q,r,s} h_{zyij} h_{zylm} h_{zypq zyrs} σ_{zypq zyij} σ_{zylm zyrs} = 0.

Hence,

A2 = E[(Zh'π)³(Vh'b0)³] / [ (b0'Ωb0)^{3/2} (π'µZZπ)^{3/2} ] − 3 E[(Zh'π)(Vh'b0)³] / [ (b0'Ωb0)^{3/2} (π'µZZπ)^{1/2} ]. □

Appendix B. Additional results

In this section, we first provide a verification of the regularity conditions from Proposition 4 for maximum likelihood estimators. The second subsection contains an Edgeworth expansion result for the Anderson–Rubin statistic regardless of instrument strength.

B.1. Strong consistency of maximum likelihood

We give an example here in which Assumptions (i)–(iii) of Proposition 4 hold for any closed (bounded) sphere B containing Π0. Consider the objective function for maximum likelihood estimation with normal errors:

Qn(Π) = | (Y − ZΠ)'(Y − ZΠ)/n |.

This objective function converges a.s. to the continuous function

Q(Π) = | Ω + (Π − Π0)'ΩZZ(Π − Π0) |,

where ΩZZ = E(Zi Zi'). Because ΩZZ is positive definite, Q is minimized at Π = Π0. Hence, Assumption (i) holds for any compact set B (additional algebra shows that Π0 is in fact uniquely identified over R^{k×2}). The second-order derivative of Qn(π1, π2) ≡ Qn(Π) is

∇²Qn(π1, π2) = [ ∂²Qn(π1, π2)/∂π1∂π1'   ∂²Qn(π1, π2)/∂π1∂π2' ; ∂²Qn(π1, π2)/∂π2∂π1'   ∂²Qn(π1, π2)/∂π2∂π2' ],

where the partial derivatives are given by

∂²Qn(π1, π2)/∂π1∂π1' = 2Z'Z (y2 − Zπ2)'(y2 − Zπ2) − 2Z'(y2 − Zπ2)(y2 − Zπ2)'Z,
∂²Qn(π1, π2)/∂π1∂π2' = 4Z'(y1 − Zπ1)(y2 − Zπ2)'Z − 4Z'(y2 − Zπ2)(y1 − Zπ1)'Z,
∂²Qn(π1, π2)/∂π2∂π2' = 2Z'Z (y1 − Zπ1)'(y1 − Zπ1) − 2Z'(y1 − Zπ1)(y1 − Zπ1)'Z.

The function Qn(π1, π2) is convex (a.s.) because for any nonzero vector c = (c1', c2')' ∈ R^{2k}, c'∇²Qn(π1, π2)c > 0 (a.s.):

c'∇²Qn(π1, π2)c = c1' [∂²Qn(π1, π2)/∂π1∂π1'] c1 + c2' [∂²Qn(π1, π2)/∂π2∂π2'] c2
 + c1' [∂²Qn(π1, π2)/∂π1∂π2'] c2 + c2' [∂²Qn(π1, π2)/∂π2∂π1'] c1 > 0,


using the Cauchy–Schwartz inequality. By Theorem 10.8 of Rockafellar (1970), pointwise convergence of convex functions implies uniform convergence of any compact set B. Therefore, Assumption (ii) holds. To show Assumption (iii), we follow the proof of Theorem 2.7 b of Newey and McFadden (1994). Consider the maximum and Π on the compact set B. This estimator is consistent for Π . Then the b , Π ) <  so that Qn (Π b ) ≤ Qn ( Π ) for any Π ∈ B event that δ(Π has probability one. In this event, for any Π Ď outside B, there is b + (1 − λ) Π Ď that lies in B. By a linear convex combination λΠ b convexity, we obtain Qn (Π ) ≤ Qn (Π Ď ). The result now follows a.s.

b ) → Q (Π ). from Qn (Π
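As a quick numerical illustration of the consistency claim in B.1 (this simulation is not part of the original argument, and the function name `ml_pi_error`, the coefficient value, and the sample sizes are hypothetical choices): in the reduced-form regression $X = Z\Pi + v$ with normal errors, the maximum likelihood estimator of $\Pi$ coincides with least squares, and its estimation error shrinks as $n$ grows even when $\Pi$ is close to zero.

```python
import numpy as np

rng = np.random.default_rng(0)

def ml_pi_error(n, k=2, pi0=0.1):
    """Max-abs error of the reduced-form ML (= OLS under normality) estimate of Pi."""
    Pi = np.full((k, 1), pi0)           # hypothetical (weak) first-stage coefficients
    Z = rng.standard_normal((n, k))     # instruments
    v = rng.standard_normal((n, 1))     # reduced-form error
    X = Z @ Pi + v
    Pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)   # ML estimator with normal errors
    return float(np.max(np.abs(Pi_hat - Pi)))

errors = {n: ml_pi_error(n) for n in (100, 1_000, 100_000)}
print(errors)   # estimation error is small for large n
```

Note that consistency of $\widehat{\Pi}$ holds whether or not $\Pi$ itself is weak; weak instruments complicate inference about $\beta$, not estimation of the reduced form.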

B.2. Edgeworth expansion for the Anderson–Rubin statistic

Lemma 9. Let $\mathcal{B}$ be any class of Borel sets satisfying
\[
\sup_{B \in \mathcal{B}} \int_{(\partial B)^{\epsilon}} \phi_A(v)\, dv = O(\epsilon) \quad \text{as } \epsilon \downarrow 0, \qquad (15)
\]
where $\phi_A$ is the pdf of a mean-zero normal distribution with variance $A$, $\partial B$ is the boundary of $B$, and $(\partial B)^{\epsilon}$ is the $\epsilon$-neighborhood of $\partial B$. If Assumptions 1 or 1A, 2, and 3 hold, then
\[
\sup_{B \in \mathcal{B}} \left| P(S_n \in B) - \int_B \psi_{s,n}(v)\, dv \right| = o\!\left(n^{-(s-2)/2}\right)
\]
under $H_0: \beta = \beta_0$, where $\psi_{s,n}$ is a formal Edgeworth expansion of order $s - 2$ for $S_n$.

Proof. Under $H_0$, the statistic $S$ can be written as
\[
S = \sqrt{n}\, (Z'Z/n)^{-1/2} (Z'V/n)\, b_0 \cdot (b_0' \Omega b_0)^{-1/2} = \sqrt{n}\, \big[H(R_n) - H(\mu)\big]
\]
for a measurable mapping $H$ from $\mathbb{R}^{\ell}$ onto $\mathbb{R}^{2k}$ with derivatives of order $s$ and lower being continuous in a neighborhood of $\mu$. The result follows from Bhattacharya and Ghosh (1978, p. 437). An analogous result holds for $\widetilde{S}$, albeit the Edgeworth expansion would have different polynomials for the higher-order terms. $\square$

Theorem 10. Let $G_k(x)$ and $g_k(x)$ be respectively the cdf and pdf of a chi-square-$k$ distribution. Under Assumptions 1 or 1A, 2, and 3, the null distribution of AR can be uniformly approximated (in $x$) by
\[
P(AR \le x) = G_k(x) + \sum_{i=1}^{r} n^{-i}\, p_i^{AR}(x; F, \beta_0, \pi)\, g_k(x) + o\!\left(n^{-r}\right).
\]

Proof. We want to approximate $P(AR \le x) = P(S_n' S_n \le x)$ uniformly in $x$. This expression can be written as $P(S_n \in C_x)$, for the convex sets
\[
C_x = \{ s \in \mathbb{R}^k : s's \le x \}.
\]
Using Corollary 3.2 of Bhattacharya and Rao (1976), we can show that
\[
\sup_{x \in \mathbb{R}} \Phi\!\left((\partial C_x)^{\epsilon}\right) \le d(k)\, \epsilon,
\]
where $d(k)$ is a function of only $k$ and $\epsilon > 0$. Hence, the condition of Lemma 9 holds. Finally, the result follows from integration and the fact that the odd terms of $\psi_{s,n}$ are even. $\square$
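Theorem 10 implies that the chi-square-$k$ approximation to the null distribution of AR is accurate to order $n^{-1}$ regardless of instrument strength. A minimal Monte Carlo sketch of this (the function name `ar_statistic`, sample size, and seed are hypothetical choices, not from the paper): under $H_0$ the residual $y - X\beta_0$ equals the structural error $u$, so the statistic depends only on $(Z, u)$ and not on $\Pi$ at all, which is why weak instruments cause no distortion.

```python
import numpy as np

rng = np.random.default_rng(1)

def ar_statistic(n=200, k=3):
    """AR statistic under H0: since y - X*beta0 = u, the statistic is a
    function of (Z, u) only -- instrument strength Pi is irrelevant."""
    Z = rng.standard_normal((n, k))   # instruments
    u = rng.standard_normal((n, 1))   # structural error under the null
    Pz_u = Z @ np.linalg.solve(Z.T @ Z, Z.T @ u)   # projection of u onto Z
    return float(u.T @ Pz_u) / (float(u.T @ u) / n)

draws = np.array([ar_statistic() for _ in range(5000)])
print(draws.mean())   # close to k = 3, the mean of a chi-square-3
```

The simulated draws should track the chi-square-$k$ benchmark closely; by contrast, the Wald statistic's null distribution in the same design would depend heavily on $\Pi$.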

References

Andrews, D.W.K., 1987. Consistency in nonlinear econometric models: A generic uniform law of large numbers. Econometrica 55, 1465–1471.
Andrews, D.W.K., 2000. Inconsistency of the bootstrap when a parameter is on the boundary of the parameter space. Econometrica 68, 399–405.
Andrews, D.W.K., Guggenberger, P., 2008. Asymptotic size and a problem with subsampling and the m out of n bootstrap. Econometric Theory (forthcoming).
Babu, J., Singh, K., 1984. On one term Edgeworth correction by Efron's bootstrap. Sankhya, Series A 46, 219–232.
Bhattacharya, R.N., 1977. Refinements of the multidimensional central limit theorem and applications. Annals of Probability 5, 1–27.
Bhattacharya, R.N., 1987. Some aspects of Edgeworth expansions in statistics and probability. In: Puri, M.L., Vilaplana, J.P., Wertz, W. (Eds.), New Perspectives in Theoretical and Applied Statistics. John Wiley and Sons, New York, pp. 157–170.
Bhattacharya, R.N., Ghosh, J., 1978. On the validity of the formal Edgeworth expansion. Annals of Statistics 6, 434–451.
Bhattacharya, R.N., Rao, R., 1976. Normal Approximation and Asymptotic Expansions. Wiley Series in Probability and Mathematical Statistics. John Wiley and Sons, New York.
Bickel, P., Freedman, D., 1981. Some asymptotic theory for the bootstrap. Annals of Statistics 9, 1196–1217.
Bound, J., Jaeger, D., Baker, R., 1995. Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variables is weak. Journal of the American Statistical Association 90, 443–450.
Brown, B., Newey, W., 2004. GMM, efficient bootstrapping, and improved inference. Journal of Business and Economic Statistics (forthcoming).
Cavanagh, C., 1983. Hypothesis Testing in Models with Discrete Dependent Variables. Ph.D. Thesis, UC Berkeley.
Dufour, J.-M., 1997. Some impossibility theorems in econometrics with applications to structural and dynamic models. Econometrica 65, 1365–1388.
Efron, B., 1979. Bootstrap methods: Another look at the jackknife. The Annals of Statistics 7, 1–26.
Fleishman, A., 1978. A method for simulating nonnormal distributions. Psychometrika 43, 521–532.
Goncalves, S., Meddahi, N., 2008. Bootstrapping realized volatility. Econometrica (forthcoming).
Guggenberger, P., Smith, R., 2005. Generalized empirical likelihood estimators and tests under partial, weak and strong identification. Econometric Theory 21, 667–709.
Hall, P., 1992. The Bootstrap and Edgeworth Expansion. Springer-Verlag, New York.
Horowitz, J., 2001. The bootstrap. In: Heckman, J.J., Leamer, E. (Eds.), Handbook of Econometrics. North-Holland, New York, pp. 3159–3228.
Inoue, A., 2006. A bootstrap approach to moment selection. Econometrics Journal 9, 48–75.
Kleibergen, F., 2002. Pivotal statistics for testing structural parameters in instrumental variables regression. Econometrica 70, 1781–1803.
Kleibergen, F., 2006. Expansions of GMM statistics and the bootstrap. Brown University, Unpublished Manuscript.
Kotz, S., Balakrishnan, N., Johnson, N., 2000. Continuous Multivariate Distributions, 2nd ed. John Wiley and Sons, New York.
Moreira, M.J., 2002. Tests with Correct Size in the Simultaneous Equations Model. Ph.D. Thesis, UC Berkeley.
Moreira, M.J., 2003. A conditional likelihood ratio test for structural models. Econometrica 71, 1027–1048.
Moreira, M.J., Porter, J., Suarez, G., 2004. Bootstrap and higher order expansion when instruments may be weak. NBER Working Paper t0302.
Navidi, W., 1989. Edgeworth expansions for bootstrapping regression models. The Annals of Statistics 17, 1472–1478.
Nelson, C., Startz, R., 1990. Some further results on the exact small sample properties of the instrumental variable estimator. Econometrica 58, 967–976.
Newey, W., McFadden, D., 1994. Large sample estimation and hypothesis testing. In: Engle, R.F., McFadden, D.L. (Eds.), Handbook of Econometrics. Elsevier Science, Amsterdam, pp. 2111–2245.
Otsu, T., 2006. Generalized empirical likelihood inference for nonlinear and time series models under weak identification. Econometric Theory 22, 513–527.
Pötscher, B., Prucha, I., 1997. Dynamic Nonlinear Econometric Models. Springer-Verlag.
Qumsiyeh, M., 1990. Edgeworth expansion in regression models. Journal of Multivariate Analysis 35, 86–101.
Qumsiyeh, M., 1994. Bootstrapping and empirical Edgeworth expansions in multiple linear regression models. Communications in Statistics - Theory and Methods 23, 3227–3239.
Rockafellar, R.T., 1970. Convex Analysis. Princeton University Press, Princeton.
Rothenberg, T.J., 1988. Approximate power functions for some robust tests of regression coefficients. Econometrica 56, 997–1019.
Shao, J., 1994. Bootstrap sample size in nonregular cases. Proceedings of the American Mathematical Society 122, 1251–1262.
Staiger, D., Stock, J.H., 1997. Instrumental variables regression with weak instruments. Econometrica 65, 557–586.
Stock, J.H., Wright, J., 2000. GMM with weak identification. Econometrica 68, 1055–1096.
Wallace, D., 1958. Asymptotic approximations to distributions. The Annals of Mathematical Statistics 29, 635–654.
Wang, J., Zivot, E., 1998. Inference on a structural parameter in instrumental variables regression with weak instruments. Econometrica 66, 1389–1404.
