Statistics & Decisions 28, 1001–1011 (2011) / DOI 10.1524/stnd.2011.1078 © R. Oldenbourg Verlag, München 2011

A note on moment convergence of bootstrap M-estimators

Kengo Kato

Received: October 25, 2009; Accepted: October 27, 2010

Summary: This paper studies the consistency of bootstrap moment estimators for a general M-estimator. We establish a theorem on the uniform integrability of the bootstrap M-estimator, thereby giving sufficient conditions for the consistency of the bootstrap moment estimators. As an application of our theorem, we provide sufficient conditions for the consistency of the bootstrap variance estimator for the quantile regression estimator, which has been regarded as an important unsolved problem in the literature. We also discuss a justification of a bootstrap information criterion.

1 Introduction

The bootstrap introduced by Efron (1979) is a convenient general method for making statistical inference. It is well known that under suitable regularity conditions, the bootstrap is consistent in estimating the distribution of a general M-estimator (see Arcones and Giné, 1992; Wellner and Zhan, 1996). The distributional consistency of the bootstrap, however, does not imply the consistency of the bootstrap moment estimators, where “the bootstrap moment estimators” mean the corresponding conditional moments of the bootstrap estimator given the sample. In this paper, we study the consistency of the bootstrap moment estimators for a general M-estimator. Our framework allows for non-smooth objective functions such as the absolute value function or, more generally, the “check” function used in quantile regression (Koenker and Bassett, 1978). We establish a theorem on the uniform integrability of the bootstrap M-estimator, thereby giving sufficient conditions for the consistency of the bootstrap moment estimators.

There is a vast literature on the consistency of bootstrap moment estimators. Shao and Tu (1995) reviewed some earlier results on this topic. More recently, Gonçalves and White (2005) proved the consistency of the bootstrap variance estimator for the least squares estimator in the time series context (more precisely, we should say “the bootstrap covariance matrix estimator” rather than “the bootstrap variance estimator”; however, we use the latter term for convenience). For general M-estimation that allows for non-smooth objective functions, (primitive) conditions for the consistency of the bootstrap moment estimators do not appear to be available.

AMS 2010 subject classification: 62F40, 62E20
Key words and phrases: Bootstrap, M-estimator, moment convergence, quantile regression

Of particular interest is the consistency of the bootstrap variance estimator for the quantile regression estimator. Under suitable regularity conditions, the quantile regression estimator is $\sqrt{n}$-asymptotically normal (see Koenker, 2005, Ch. 4). The distinctive point of the quantile regression estimator is that the asymptotic covariance matrix depends on the unknown conditional density. Therefore, bootstrap variance estimation is particularly convenient in the quantile regression case, as it avoids nonparametric density estimation. Hahn (1995) proved the distributional consistency of the bootstrap quantile regression estimator but did not study the consistency of the bootstrap variance estimator. A motivation to study the consistency of the bootstrap variance estimator in the quantile regression case also comes from the observation of Buchinsky (1995), who compared several inference methods for quantile regression models in a Monte Carlo study. Buchinsky (1995) reported that inference based on the bootstrap variance estimator performs quite well in his numerical examples. It is thus of interest to provide some theoretical justification of bootstrap variance estimation in the quantile regression case. Gonçalves and White (2005, p. 972) remarked that “establishing theoretical results that justify the application of the bootstrap to variance estimation for the quantile regression estimator is an important area of future research.” This paper gives an answer to a more general problem than the one Gonçalves and White (2005) posed, since it considers a general moment rather than the second moment, and general M-estimation that includes quantile regression as a special case. We give in Section 3 (relatively) primitive sufficient conditions for the consistency of the bootstrap variance estimator for the quantile regression estimator.

Another important application is a justification of the extended information criterion (EIC) proposed by Ishiguro et al. (1997), in which the bias term of the information criterion is estimated by the bootstrap. We also give a brief discussion on conditions for the consistency of the bootstrap bias estimator in Section 3.

A closely related topic is the moment convergence of an M-estimator. Nishiyama (2010) established sufficient conditions for the moment convergence of a general M-estimator by using a connection to the convergence rate theorem of van der Vaart and Wellner (1996, Theorem 3.2.5). The approach taken by this paper is based on the technique used in the same convergence rate theorem. Thus, the present result may be viewed as a bootstrap version of Nishiyama’s (2010) result, although Nishiyama allows for a rate different from $\sqrt{n}$ while this paper focuses on the case where the estimator is $\sqrt{n}$-consistent. However, it should be noted that we are dealing with the convergence of conditional moments of the bootstrap M-estimator, which we believe is sufficiently different from Nishiyama’s topic to make this paper non-trivial and to require a separate treatment. Yoshida (2010) also tackled the moment convergence problem of an M-estimator by a different approach.

The rest of the paper consists of two sections. In Section 2, we present the main result including the proofs. In Section 3, we consider applications of our theorem to the quantile regression case and EIC.

2 Main result

Let $X, X_1, \dots, X_n$ be an independent sample from a distribution $P$ on a measurable space $(\mathcal{X}, \mathcal{A})$. For each $\theta \in \Theta$, where $\Theta$ is assumed to be a (Borel) measurable subset of $\mathbb{R}^d$, let $m_\theta : \mathcal{X} \to \mathbb{R}$ be a known function. We assume the joint measurability of the map $(x, \theta) \mapsto m_\theta(x)$. We consider the M-estimator $\hat\theta_n := \arg\min_{\theta \in \Theta} \mathbb{M}_n(\theta)$, where $\mathbb{M}_n(\theta) := n^{-1} \sum_{i=1}^n m_\theta(X_i)$. We assume the existence of a measurable solution $\hat\theta_n$, which is guaranteed, for instance, when $\Theta$ is compact and the map $\theta \mapsto m_\theta(x)$ is continuous (see Jenrich, 1969, Lemma 2).

Suppose for a while that $M(\theta) := E[m_\theta(X)]$ is minimized at $\theta_0 \in \Theta$ and twice continuously differentiable at $\theta_0$ with nonsingular second derivative matrix $A$, and that there exists a vector-valued measurable function $\dot m_{\theta_0} : \mathcal{X} \to \mathbb{R}^d$ such that $E[\|\dot m_{\theta_0}(X)\|^2] < \infty$ and $E[\{m_\theta(X) - m_{\theta_0}(X) - (\theta - \theta_0)' \dot m_{\theta_0}(X)\}^2] = o(\|\theta - \theta_0\|^2)$ as $\theta \to \theta_0$. Then, under some additional regularity conditions including the consistency of $\hat\theta_n$, we have

$$\sqrt{n}(\hat\theta_n - \theta_0) = -A^{-1}\Big\{ n^{-1/2} \sum_{i=1}^n \dot m_{\theta_0}(X_i) \Big\} + o_p(1) \xrightarrow{d} N(0, A^{-1} B A^{-1}),$$

where $B := E[\dot m_{\theta_0}(X) \dot m_{\theta_0}(X)']$ (see, for instance, van der Vaart and Wellner, 1996, Sec. 3.2). Statistical inference on $\theta_0$ based on the asymptotic distribution often requires a consistent estimator of the asymptotic covariance matrix $C := A^{-1} B A^{-1}$. However, in some cases, such as the quantile regression case, the estimation of the asymptotic covariance matrix turns out to be a non-trivial issue. In such cases, the bootstrap gives a convenient alternative to the estimation of the asymptotic covariance matrix.

Let $X_1^*, \dots, X_n^*$ denote a bootstrap sample, i.e., an independent sample from the empirical distribution of $X_1, \dots, X_n$. We consider the bootstrap M-estimator $\hat\theta_n^* := \arg\min_{\theta \in \Theta} \mathbb{M}_n^*(\theta)$, where $\mathbb{M}_n^*(\theta) := n^{-1} \sum_{i=1}^n m_\theta(X_i^*)$. A bootstrap estimator of the asymptotic covariance matrix is given by

$$\hat C^* := E[n(\hat\theta_n^* - \hat\theta_n)(\hat\theta_n^* - \hat\theta_n)' \mid X_1, \dots, X_n].$$

It is known that under suitable regularity conditions, the conditional distribution of $\sqrt{n}(\hat\theta_n^* - \hat\theta_n)$ given the sample converges weakly to $N(0, C)$ in probability (the definition of conditional weak convergence in probability is given below). The distributional consistency, however, does not imply the consistency of $\hat C^*$. We study conditions under which conditional moments of $\sqrt{n}(\hat\theta_n^* - \hat\theta_n)$ given the sample converge in probability to those of the limiting distribution, given the distributional consistency of the bootstrap estimator.

We note that $\mathbb{M}_n^*(\theta)$ can be written as $\mathbb{M}_n^*(\theta) = n^{-1} \sum_{i=1}^n W_{ni} m_\theta(X_i)$, where $W_{ni}$ is the number of times that $X_i$ is redrawn from the original sample. The vector $(W_{n1}, \dots, W_{nn})'$ has a multinomial distribution with parameters $n$ and (probabilities) $n^{-1}, \dots, n^{-1}$. The randomness of bootstrap quantities (such as $\hat\theta_n^*$) comes from the randomness of both $X_1, \dots, X_n$ and $W_{n1}, \dots, W_{nn}$. As in van der Vaart and Wellner (1996, Sec. 3.6.1), we view $X_1, X_2, \dots$ as the coordinate projections on the first countably infinite coordinates of the product space $(\mathcal{X}^\infty, \mathcal{A}^\infty, P^\infty) \times (\mathcal{W}, \mathcal{C}, Q)$ and let the triangular sequence $\{W_{ni} : i = 1, \dots, n;\ n = 1, 2, \dots\}$ depend on the last factor only. We assume that $\hat\theta_n^*$ is chosen such that it is a measurable map from the product space to $\mathbb{R}^d$, which is often ensured by suitable primitive regularity conditions. Let $E_W[\cdot]$ denote the expectation with respect to $W_{ni}$ ($i = 1, \dots, n$; $n = 1, 2, \dots$) conditional on $X_1, X_2, \dots$

In this paper, we presume the distributional consistency of $\hat\theta_n^*$ since there are several available results on that topic (see Arcones and Giné, 1992; Hahn, 1995; Wellner and Zhan, 1996). For completeness, we clarify the concept of conditional weak convergence in probability. Put $D_n := \{X_1, \dots, X_n\}$. Recall the bounded Lipschitz metric on the space of distributions (see van der Vaart and Wellner, 1996, p. 73). Let $T_n^*$ be some scalar statistic of $X_1, \dots, X_n$ and $W_{n1}, \dots, W_{nn}$. We say that the conditional distribution of $T_n^*$ given $D_n$ converges weakly to some fixed distribution ($\nu$, say) in probability if the bounded Lipschitz metric between the two distributions converges in probability to zero, i.e.,

$$\sup_{g \in BL_1} \Big| E_W[g(T_n^*)] - \int g \, d\nu \Big| \xrightarrow{p} 0,$$

where $BL_1$ is the set of all functions on $\mathbb{R}$ with Lipschitz norm bounded by one. The next lemma gives a sufficient condition for the consistency of conditional moments of $T_n^*$ given $D_n$ when the conditional distribution of $T_n^*$ given $D_n$ converges weakly to $\nu$ in probability. The lemma might look obvious from the standard uniform integrability argument. However, we give a proof for clarity.

Lemma 2.1 Let $T_n^*$ be a scalar statistic of $X_1, \dots, X_n$ and $W_{n1}, \dots, W_{nn}$ such that the conditional distribution of $T_n^*$ given $D_n$ converges weakly to some fixed distribution ($\nu$, say) in probability. If $E_W[|T_n^*|^q] = O_p(1)$ for some $q > 1$, then (a) $\nu$ has a $q$-th absolute moment; (b) for any integer $1 \le r < q$, we have $E_W[T_n^{*r}] \xrightarrow{p} \int t^r \, d\nu(t)$.

Proof: Let $T$ denote a random variable with distribution $\nu$.

Part (a): Take a subsequence $\{n'\} \subset \{n\}$ such that, conditionally on $X_1, X_2, \dots$, $T_{n'}^* \xrightarrow{d} \nu$ for almost every sequence $X_1, X_2, \dots$ By Fatou’s lemma together with Skorohod’s theorem, we have $E[|T|^q] \le \liminf_{n'} E_W[|T_{n'}^*|^q]$, a.s. The fact that $E_W[|T_n^*|^q] = O_p(1)$ implies that the liminf is finite with positive probability. Since $E[|T|^q]$ is nonrandom, we obtain the first assertion.

Part (b): The proof is a modification of Lemma 4.5.2 in Chung (2001). Fix $\varepsilon > 0$ and $\eta > 0$. Take a sufficiently large $K$ such that $P(E_W[|T_n^*|^q] > K) \le \eta$ for all $n \ge 1$ and $E[|T|^q] \le K$. For a positive $L$ such that $K / L^{q-r} \le \varepsilon$, define $g_L(t) := L^r$ if $t > L$; $:= t^r$ if $|t| \le L$; and $:= (-L)^r$ if $t < -L$. Since $g_L$ is bounded and Lipschitz continuous, we have $E_W[g_L(T_n^*)] \xrightarrow{p} E[g_L(T)]$. On the other hand,

$$|E_W[T_n^{*r}] - E_W[g_L(T_n^*)]| \le E_W[|T_n^*|^r I(|T_n^*| > L)] \le \frac{E_W[|T_n^*|^q]}{L^{q-r}},$$

which is less than or equal to $\varepsilon$ with probability greater than $1 - \eta$. We also have $|E[g_L(T)] - E[T^r]| \le K / L^{q-r} \le \varepsilon$. Therefore,

$$P(|E_W[T_n^{*r}] - E[T^r]| > 3\varepsilon) \le P(|E_W[T_n^{*r}] - E_W[g_L(T_n^*)]| > \varepsilon) + P(|E_W[g_L(T_n^*)] - E[g_L(T)]| > \varepsilon) \le P(|E_W[g_L(T_n^*)] - E[g_L(T)]| > \varepsilon) + \eta.$$

Taking the limit superior of both sides, we obtain $\limsup_{n \to \infty} P(|E_W[T_n^{*r}] - E[T^r]| > 3\varepsilon) \le \eta$. Since $\eta$ is arbitrary, the proof is completed. □
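The multinomial-weight representation of the bootstrap criterion described earlier in this section can be checked numerically. The following is a minimal Python sketch, assuming the illustrative choice $m_\theta(x) = |x - \theta|$; the function names `criterion` and `bootstrap_weights` are hypothetical and not part of the paper. It draws the counts $(W_{n1}, \dots, W_{nn})$ by resampling indices and compares the weighted criterion with the criterion evaluated on the implied bootstrap sample.

```python
import random
from collections import Counter


def criterion(theta, xs, weights=None):
    """Weighted criterion n^{-1} * sum_i w_i * m_theta(x_i), with m_theta(x) = |x - theta|.

    The absolute-value loss is an assumption made only for this illustration.
    """
    n = len(xs)
    if weights is None:
        weights = [1] * n
    return sum(w * abs(x - theta) for w, x in zip(weights, xs)) / n


def bootstrap_weights(n, rng):
    """Counts (W_n1, ..., W_nn): how often each index is redrawn in n draws with replacement."""
    counts = Counter(rng.randrange(n) for _ in range(n))
    return [counts.get(i, 0) for i in range(n)]


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(20)]
w = bootstrap_weights(len(xs), rng)

# Bootstrap sample implied by the weights: X_i repeated W_ni times.
resample = [x for x, k in zip(xs, w) for _ in range(k)]

theta = 0.3
weighted = criterion(theta, xs, w)       # n^{-1} sum_i W_ni m_theta(X_i)
resampled = criterion(theta, resample)   # n^{-1} sum_i m_theta(X_i^*)
```

The two values `weighted` and `resampled` agree up to floating-point rounding, which is the identity used in the proofs to treat the bootstrap criterion as a randomly weighted version of the original one.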

We now present the main result of the paper. In the statement of the theorem, we use the notation $J(1, \mathcal{F})$ to represent a uniform metric entropy integral (see van der Vaart and Wellner, 1996, p. 239).

Theorem 2.2 Suppose that:
(i) There exist a $\theta_0 \in \Theta$ and a positive constant $c$ such that $M(\theta) - M(\theta_0) \ge c \|\theta - \theta_0\|^2$ for all $\theta \in \Theta$.
(ii) The class of functions $\mathcal{M}_\delta := \{m_\theta - m_{\theta_0} : \|\theta - \theta_0\| \le \delta,\ \theta \in \Theta\}$ has envelope $M_\delta$ such that for some $p \ge 2$ and $\varepsilon > 0$, $E[M_\delta^{p+\varepsilon}] \le \text{const.} \times \delta^{p+\varepsilon}$ for all $\delta > 0$, and the class $\mathcal{M}_\delta$ with envelope $M_\delta$ satisfies the uniform metric entropy condition: $J(1, \mathcal{M}_\delta) \le \text{const.}$ for all $\delta > 0$, where the constants are independent of $\delta$.
Then, we have $\sup_{n \ge 1} E[\|\sqrt{n}(\hat\theta_n^* - \hat\theta_n)\|^{p+\varepsilon'}] < \infty$ for any $\varepsilon' \in (0, \varepsilon)$.

Remark 2.3 A primitive sufficient condition for condition (ii) is:
(ii)’ There exists a measurable function $\dot m : \mathcal{X} \to \mathbb{R}$ such that $|m_{\theta_1}(x) - m_{\theta_2}(x)| \le \dot m(x) \|\theta_1 - \theta_2\|$ and $E[\dot m(X)^{p+\varepsilon}] < \infty$ for some $p \ge 2$ and $\varepsilon > 0$.
To see this, use Theorem 2.7.11 of van der Vaart and Wellner (1996) and the relation between covering numbers and bracketing numbers.

Before going to the proof, we explain an implication of the theorem. Suppose that $\sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow{d} N(0, C)$ and the conditional distribution of $\sqrt{n}(\hat\theta_n^* - \hat\theta_n)$ given $D_n$ converges weakly to $N(0, C)$ in probability, where $C$ is given in the previous discussion. Suppose also that $p$ is a positive integer. Theorem 2.2 establishes sufficient conditions under which $E_W[g(\sqrt{n}(\hat\theta_n^* - \hat\theta_n))] \xrightarrow{p} E[g(Z)]$ with $Z \sim N(0, C)$ for a polynomial function $g$ of degree less than or equal to $p$ (Theorem 2.2 indeed ensures the $L_1$-convergence). In particular, if the conditions of Theorem 2.2 hold with $p = 2$, the bootstrap variance estimator $\hat C^*$ will be consistent.
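To make the conditional second moment $\hat C^*$ concrete, the following minimal Python sketch approximates $E_W[n(\hat\theta_n^* - \hat\theta_n)^2]$ by Monte Carlo. It assumes, purely for illustration, that the M-estimator is the sample mean (squared-error loss), because in that case the conditional second moment has the closed form $n^{-1}\sum_{i=1}^n (X_i - \bar X_n)^2$; the sketch does not verify the conditions of Theorem 2.2 for this loss. The function name `bootstrap_second_moment` and the parameter `n_boot` are hypothetical.

```python
import random
import statistics


def bootstrap_second_moment(xs, n_boot, rng):
    """Monte Carlo approximation of E_W[n * (theta*_n - theta_n)^2] for the sample mean."""
    n = len(xs)
    theta_hat = statistics.fmean(xs)
    total = 0.0
    for _ in range(n_boot):
        star = [xs[rng.randrange(n)] for _ in range(n)]  # bootstrap sample
        theta_star = statistics.fmean(star)              # bootstrap estimator
        total += n * (theta_star - theta_hat) ** 2
    return total / n_boot


rng = random.Random(1)
xs = [rng.gauss(0.0, 2.0) for _ in range(50)]
c_star = bootstrap_second_moment(xs, n_boot=4000, rng=rng)

# Closed form for the sample mean: the conditional variance of sqrt(n) * (mean* - mean)
# equals the population-style variance of the observed sample.
exact = statistics.pvariance(xs)
```

For the sample mean the two numbers `c_star` and `exact` differ only by simulation error. For estimators without a closed form, such as the quantile regression estimator of Section 3, only the Monte Carlo route is available, and the theorem concerns whether its target $\hat C^*$ is consistent for $C$.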



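Proof of Theorem 2.2: We first show that $\sup_{n \ge 1} E[\|\sqrt{n}(\hat\theta_n^* - \theta_0)\|^{p+\varepsilon'}] < \infty$. The proof builds on the proof of Theorem 3.2.5 in van der Vaart and Wellner (1996). Define $S_{j,n} := \{\theta \in \Theta : 2^{j-1} < \sqrt{n}\|\theta - \theta_0\| \le 2^j\}$ for $j = 1, 2, \dots$ If $\sqrt{n}\|\hat\theta_n^* - \theta_0\| > 2^L$ for some integer $L$, then $\inf_{\theta \in S_{j,n}} \{\mathbb{M}_n^*(\theta) - \mathbb{M}_n^*(\theta_0)\} \le 0$ for some $j \ge L$. Therefore,

$$P\big(\sqrt{n}\|\hat\theta_n^* - \theta_0\| > 2^L\big) \le \sum_{j \ge L} P\Big(\inf_{\theta \in S_{j,n}} \{\mathbb{M}_n^*(\theta) - \mathbb{M}_n^*(\theta_0)\} \le 0\Big).$$

Decompose $\mathbb{M}_n^*(\theta) - \mathbb{M}_n^*(\theta_0)$ as

$$\mathbb{M}_n^*(\theta) - \mathbb{M}_n^*(\theta_0) = [\mathbb{M}_n^*(\theta) - \mathbb{M}_n^*(\theta_0) - \{\mathbb{M}_n(\theta) - \mathbb{M}_n(\theta_0)\}] + [\mathbb{M}_n(\theta) - \mathbb{M}_n(\theta_0) - \{M(\theta) - M(\theta_0)\}] + \{M(\theta) - M(\theta_0)\} =: I_{1n}(\theta) + I_{2n}(\theta) + I_3(\theta).$$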

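By condition (i), for $\theta \in S_{j,n}$, $I_3(\theta) \ge c 2^{2j-2}/n$. This implies that

$$\begin{aligned}
P\Big(\inf_{\theta \in S_{j,n}} \{\mathbb{M}_n^*(\theta) - \mathbb{M}_n^*(\theta_0)\} \le 0\Big)
&\le P\Big(\inf_{\theta \in S_{j,n}} \{I_{1n}(\theta) + I_{2n}(\theta)\} \le -\frac{c 2^{2j-2}}{n}\Big) \\
&\le P\Big(\sup_{\theta \in S_{j,n}} |I_{1n}(\theta) + I_{2n}(\theta)| \ge \frac{c 2^{2j-2}}{n}\Big) \\
&\le P\Big(\sup_{\theta \in S_{j,n}} |I_{1n}(\theta)| \ge \frac{c 2^{2j-2}}{2n}\Big) + P\Big(\sup_{\theta \in S_{j,n}} |I_{2n}(\theta)| \ge \frac{c 2^{2j-2}}{2n}\Big).
\end{aligned} \tag{2.1}$$

Recall the definition of $\mathcal{M}_\delta$ and $M_\delta$. By the joint measurability of the map $(x, \theta) \mapsto m_\theta(x)$, $\mathcal{M}_\delta$ is image admissible Suslin (see Dudley, 1999, Sec. 5.3). Put $\delta_{j,n} := 2^j / \sqrt{n}$. By Theorem 2.14.1 of van der Vaart and Wellner (1996), we have

$$E\Big[\sup_{\theta \in S_{j,n}} |I_{2n}(\theta)|^{p+\varepsilon}\Big] \le \text{const.} \times n^{-(p+\varepsilon)/2} J(1, \mathcal{M}_{\delta_{j,n}})^{p+\varepsilon} E[M_{\delta_{j,n}}(X)^{p+\varepsilon}] \le \text{const.} \times n^{-(p+\varepsilon)} 2^{(p+\varepsilon)j},$$

where the constants are independent of $(j, n)$. Thus, by Markov’s inequality, the second term on the right hand side of (2.1) is bounded by $\text{const.} \times 2^{-(p+\varepsilon)j}$. To bound the first term, recall that $E_W[\mathbb{M}_n^*(\theta)] = \mathbb{M}_n(\theta)$. By Theorem 2.14.1 of van der Vaart and Wellner (1996), we have

$$E_W\Big[\sup_{\theta \in S_{j,n}} |I_{1n}(\theta)|^{p+\varepsilon}\Big] \le \text{const.} \times n^{-(p+\varepsilon)/2} J(1, \mathcal{M}_{\delta_{j,n}})^{p+\varepsilon} \Big\{ n^{-1} \sum_{i=1}^n M_{\delta_{j,n}}(X_i)^{p+\varepsilon} \Big\},$$

where the constant is independent of $(j, n)$. The fact that $\mathcal{M}_{\delta_{j,n}}$ is image admissible Suslin allows us to apply Fubini’s theorem to get

$$E\Big[\sup_{\theta \in S_{j,n}} |I_{1n}(\theta)|^{p+\varepsilon}\Big] \le \text{const.} \times n^{-(p+\varepsilon)} 2^{(p+\varepsilon)j}.$$

We have shown that there exists a constant $D$ such that for any positive integer $L$,

$$P\big(\sqrt{n}\|\hat\theta_n^* - \theta_0\| > 2^L\big) \le D \sum_{j \ge L} 2^{-(p+\varepsilon)j} \le 2D\, 2^{-(p+\varepsilon)L}.$$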

Take $L = [\log_2 t]$ for $t \ge 2$, where $[a]$ denotes the integer part of a number $a$. Then, we can see that there exists another constant $D'$ independent of $t$ such that $P(\sqrt{n}\|\hat\theta_n^* - \theta_0\| > t) \le D' t^{-(p+\varepsilon)}$. Because of the fact that for a non-negative random variable $Z$, $E[Z^q] = q \int_0^\infty t^{q-1} P(Z > t)\,dt$ for $q \ge 1$, we obtain: $\sup_{n \ge 1} E[\|\sqrt{n}(\hat\theta_n^* - \theta_0)\|^{p+\varepsilon'}] < \infty$.

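An analogous argument shows that $\sup_{n \ge 1} E[\|\sqrt{n}(\hat\theta_n - \theta_0)\|^{p+\varepsilon'}] < \infty$. Combining this with the previous result, we obtain: $\sup_{n \ge 1} E[\|\sqrt{n}(\hat\theta_n^* - \hat\theta_n)\|^{p+\varepsilon'}] < \infty$. □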
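The last two steps of the proof can be spelled out as follows. This is only a sketch of the standard calculation; the shorthand $Z_n^*$, $Z_n$ and $q$ is introduced here for convenience and does not appear in the proof above. It also shows where the restriction $\varepsilon' < \varepsilon$ is used: the tail integral converges only if $q < p + \varepsilon$.

```latex
% Shorthand (introduced here): Z_n^* := \|\sqrt{n}(\hat\theta_n^* - \theta_0)\|,
% Z_n := \|\sqrt{n}(\hat\theta_n - \theta_0)\|, and q := p + \varepsilon'
% with 0 < \varepsilon' < \varepsilon. The tail bound is used for t \ge 2 only.
\begin{align*}
E[(Z_n^*)^q]
  &= q \int_0^\infty t^{q-1} P(Z_n^* > t)\,dt \\
  &\le q \int_0^2 t^{q-1}\,dt + q D' \int_2^\infty t^{q-1-(p+\varepsilon)}\,dt \\
  &= 2^q + \frac{q D'\, 2^{-(\varepsilon-\varepsilon')}}{\varepsilon-\varepsilon'} ,
\end{align*}
% which does not depend on n. For the final step, use the triangle inequality and
% (a+b)^q \le 2^{q-1}(a^q + b^q) for a, b \ge 0 and q \ge 1:
\begin{equation*}
E\big[\|\sqrt{n}(\hat\theta_n^* - \hat\theta_n)\|^q\big]
  \le 2^{q-1}\big\{E[(Z_n^*)^q] + E[Z_n^q]\big\}.
\end{equation*}
```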

We give a brief discussion on the conditions of Theorem 2.2. Conditions (i) and (ii) (or (i) and (ii)’) are adapted from conditions for the $\sqrt{n}$-consistency of the M-estimator discussed in van der Vaart and Wellner (1996, p. 291). The differences are: (a) we put a global restriction on the behavior of $M(\theta)$ rather than a local one; (b) we put a higher moment restriction on $M_\delta$ (or $\dot m$). Part (b) is natural for the present purpose. Part (a) is essential for the present proof since we have to control the behavior of $P(\sqrt{n}\|\hat\theta_n^* - \theta_0\| > t)$ for large $t$ and hence have to control the behavior of $\mathbb{M}_n^*(\theta) - \mathbb{M}_n^*(\theta_0)$ over all “shells” $S_{j,n}$ for large $j$. Not surprisingly, the conditions of Theorem 2.2 are analogous to those of Nishiyama’s (2010) Theorem 1, which establishes the moment convergence (of any order) of an original M-estimator, as the proof strategies of both theorems have the same root, Theorem 3.2.5 of van der Vaart and Wellner (1996). It is worthwhile to remark that the proof uses the uniform integrability of the original M-estimator (i.e., $\sup_{n \ge 1} E[\|\sqrt{n}(\hat\theta_n - \theta_0)\|^{p+\varepsilon'}] < \infty$). Thus, under the same set of conditions and the $\sqrt{n}$-asymptotic normality of $\hat\theta_n$, the moment convergence of $\hat\theta_n$ also follows. In view of the previous discussion, there seems to be essentially no additional cost in ensuring the uniform integrability of the bootstrap M-estimator, in comparison with that of the original one.

3 Applications

3.1 Quantile regression

In this section, we consider an application of Theorem 2.2 to the quantile regression case. In particular, we are interested in the consistency of the bootstrap variance estimator for the quantile regression estimator. Let $Y$ be a scalar dependent variable and let $Z$ be a $d$-dimensional vector of explanatory variables. We consider the quantile regression model:

$$Q_\tau(Y \mid Z) = Z'\beta_0, \tag{3.1}$$

where $\tau \in (0, 1)$ is a quantile of interest, which is assumed to be fixed, $Q_\tau(Y \mid Z)$ is the conditional $\tau$-quantile of $Y$ given $Z$, and $\beta_0 \in \mathbb{R}^d$ is an unknown parameter vector. Suppose that we have $n$ independent observations $(Y_1, Z_1), \dots, (Y_n, Z_n)$ of $(Y, Z)$. Koenker and Bassett (1978) proposed an estimator (“the quantile regression estimator”)

$$\hat\beta_n := \arg\min_{\beta \in B} \Big[ \sum_{i=1}^n \rho_\tau(Y_i - Z_i'\beta) \Big],$$

where $\rho_\tau(u) := \{\tau - I(u \le 0)\}u$. We restrict the parameter space to be a compact and convex subset $B$ of $\mathbb{R}^d$ for a technical reason stated later. Let $f_{Y|Z}(y \mid z)$ denote the conditional density of $Y$ given $Z = z$. Under suitable regularity conditions, it is shown that $\sqrt{n}(\hat\beta_n - \beta_0) \xrightarrow{d} N(0, A^{-1} B A^{-1})$, where $A := E[f_{Y|Z}(Z'\beta_0 \mid Z) Z Z']$ and $B := \tau(1 - \tau) E[Z Z']$. We consider estimating the asymptotic covariance matrix $C := A^{-1} B A^{-1}$ by the bootstrap. Let $(Y_1^*, Z_1^*), \dots, (Y_n^*, Z_n^*)$ denote a bootstrap sample from $(Y_1, Z_1), \dots, (Y_n, Z_n)$. Consider the bootstrap quantile regression estimator

$$\hat\beta_n^* := \arg\min_{\beta \in B} \Big[ \sum_{i=1}^n \rho_\tau(Y_i^* - Z_i^{*\prime}\beta) \Big].$$

Put $D_n := \{(Y_1, Z_1), \dots, (Y_n, Z_n)\}$. A bootstrap estimator of $C$ is given by

$$\hat C^* := E[n(\hat\beta_n^* - \hat\beta_n)(\hat\beta_n^* - \hat\beta_n)' \mid D_n],$$

which can be calculated by a simulation method. As usual in the quantile regression literature, we use $m_\beta(y, z) := \rho_\tau(y - z'\beta) - \rho_\tau(y - z'\beta_0)$ as an objective function. We investigate sufficient conditions for the consistency of $\hat C^*$. Possible sufficient conditions are:

(Q0) The conditional distribution of $\sqrt{n}(\hat\beta_n^* - \hat\beta_n)$ given $D_n$ converges weakly to $N(0, C)$ in probability.

(Q1) The parameter space $B$ is a compact and convex subset of $\mathbb{R}^d$.

(Q2) $E[\|Z\|^{2+\varepsilon}] < \infty$ for some $\varepsilon > 0$.

(Q3) The conditional density $f_{Y|Z}(y \mid z)$ is continuous in $y$ and there exists a constant $C_f < \infty$ such that $f_{Y|Z}(y \mid z) \le C_f$. The matrix $A_\beta := E[f_{Y|Z}(Z'\beta \mid Z) Z Z']$ is positive definite for all $\beta \in B$.

Condition (Q0) is a high-level condition. Primitive sufficient conditions for (Q0) are found in Hahn (1995). Conditions (Q1)–(Q3) guarantee that there exists a positive constant $c$ such that $M(\beta) := E[m_\beta(Y, Z)] \ge c \|\beta - \beta_0\|^2$ for all $\beta \in B$. On the other hand, it is not difficult to see that $|m_{\beta_1}(y, z) - m_{\beta_2}(y, z)| \le \|z\| \cdot \|\beta_1 - \beta_2\|$ for all $\beta_1, \beta_2 \in \mathbb{R}^d$. Thus, given condition (Q2), condition (ii)’ in Remark 2.3 is satisfied with $\dot m(y, z) = \|z\|$ and with $p = 2$. In summary, we have shown that:

Corollary 3.1 Under conditions (Q0)–(Q3), $\hat C^* \xrightarrow{p} C$.
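The remark that $\hat C^*$ can be calculated by a simulation method can be illustrated in the simplest case. The following is a minimal Python sketch, assuming an intercept-only model ($Z = 1$), so that the quantile regression estimator reduces to a sample quantile, and a compact parameter space $B = [-b, b]$ as in (Q1). The function names, the bound `bound`, and the number of replications `n_boot` are hypothetical choices made for this illustration, and the sketch is not a general quantile regression solver.

```python
import math
import random


def check_loss(u, tau):
    """Check function rho_tau(u) = {tau - I(u <= 0)} * u."""
    return (tau - (1.0 if u <= 0 else 0.0)) * u


def objective(beta, ys, tau):
    """Sum of check losses for the intercept-only model (Z = 1)."""
    return sum(check_loss(y - beta, tau) for y in ys)


def quantile_estimate(ys, tau, bound):
    """A minimizer of the objective over the compact set B = [-bound, bound]."""
    k = math.ceil(len(ys) * tau)        # this order statistic minimizes the unconstrained objective
    unconstrained = sorted(ys)[k - 1]
    # The objective is convex in beta, so clipping to B gives a constrained minimizer.
    return max(-bound, min(bound, unconstrained))


def bootstrap_variance(ys, tau, bound, n_boot, rng):
    """Monte Carlo approximation of C* = E[n * (beta*_n - beta_n)^2 | D_n]."""
    n = len(ys)
    beta_hat = quantile_estimate(ys, tau, bound)
    total = 0.0
    for _ in range(n_boot):
        star = [ys[rng.randrange(n)] for _ in range(n)]
        beta_star = quantile_estimate(star, tau, bound)
        total += n * (beta_star - beta_hat) ** 2
    return total / n_boot


rng = random.Random(2)
ys = [rng.gauss(0.0, 1.0) for _ in range(200)]
tau, bound = 0.5, 10.0
beta_hat = quantile_estimate(ys, tau, bound)
c_star = bootstrap_variance(ys, tau, bound, n_boot=1000, rng=rng)
```

Clipping to $[-b, b]$ keeps every bootstrap replicate inside $B$, so $n(\hat\beta_n^* - \hat\beta_n)^2$ is bounded by $n(2b)^2$ in each replication. With regressors, the inner minimization would instead require a linear programming or similar solver, which is outside the scope of this sketch.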

The boundedness of the parameter space is usually not assumed in the quantile regression literature, although it is standard in general asymptotic theory. The boundedness of the parameter space is indeed essential for the present purpose. To see this, we recall the result of Ghosh et al. (1984). Ghosh et al. (1984) showed that the bootstrap variance estimator of the sample quantile may not be consistent despite the distributional consistency of the bootstrap sample quantile. The inconsistency of the bootstrap variance estimator is caused by the fact that the bootstrap sample quantile may sometimes take an extremely large value when there is no moment restriction. Ghosh et al. (1984) also showed that the bootstrap variance estimator will be consistent when a mild moment restriction is satisfied. In the present case, since there is no moment restriction on $Y$, if $B$ is unbounded, $\hat C^*$ can be inconsistent (recall that the quantile regression estimator reduces to the sample quantile when $Z = 1$; in that case, $A_\beta$ reduces to the density of $Y$, whose infimum over the entire real line must be zero). The role of the boundedness of the parameter space is to prevent $\hat\beta_n^*$ from taking an extremely large value, thereby ensuring the consistency of $\hat C^*$. An alternative possible way of ensuring the consistency of $\hat C^*$ is to put a suitable moment restriction on $Y$ instead of restricting the parameter space, as Ghosh et al. (1984) did in the sample quantile case. We leave this extension as a future research topic.

It is worthwhile to remark that the bootstrap variance estimator is robust to misspecification. Suppose that the model (3.1) is misspecified but there exists a unique solution $\beta_0$ to the unconditional moment restriction:

$$E[\{\tau - I(Y \le Z'\beta_0)\} Z] = 0.$$

Then, under suitable regularity conditions, it is shown that $\sqrt{n}(\hat\beta_n - \beta_0) \xrightarrow{d} N(0, A^{-1} B A^{-1})$, where $A$ is the same as before but $B := E[\{\tau - I(Y \le Z'\beta_0)\}^2 Z Z']$ (Angrist et al., 2006). It is not difficult to see that the conclusion of Corollary 3.1 remains valid in the present situation, with $B$ given by the present specification.
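As a consistency check between the two definitions of $B$, the following short calculation (a sketch, under the assumptions stated in its comments) shows that the misspecification-robust $B$ reduces to $\tau(1 - \tau) E[ZZ']$ when the model is correctly specified.

```latex
% Assumption for this check only: model (3.1) holds and the conditional distribution
% of Y given Z is continuous, so that P(Y \le Z'\beta_0 \mid Z) = \tau.
% Then I(Y \le Z'\beta_0) is conditionally Bernoulli(\tau), and
\begin{align*}
E\big[\{\tau - I(Y \le Z'\beta_0)\}^2 \mid Z\big]
  &= (\tau-1)^2\,\tau + \tau^2\,(1-\tau) = \tau(1-\tau), \\
E\big[\{\tau - I(Y \le Z'\beta_0)\}^2 ZZ'\big]
  &= \tau(1-\tau)\,E[ZZ'].
\end{align*}
```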

3.2 EIC

In this section, we give an informal discussion on the justification of EIC proposed by Ishiguro et al. (1997). To keep the paper succinct, we do not attempt to give a full list of regularity conditions for EIC; we focus on how Theorem 2.2 can be used to justify EIC. Let $X, X_1, \dots, X_n$ be an independent sample from a distribution $P$. Consider a parametric model $\{f(x \mid \theta) : \theta \in \Theta \subset \mathbb{R}^d\}$, where for each $\theta \in \Theta$, $f(x \mid \theta)$ is a probability density with respect to some common base measure. We assume that the map $\theta \mapsto f(x \mid \theta)$ is sufficiently smooth. We allow the model to be misspecified, i.e., it need not contain the true distribution, but assume that there exists a unique solution $\theta_0$ to the equation:

$$E[\dot\ell(X, \theta_0)] = 0,$$

where $\ell(x, \theta) := \log f(x \mid \theta)$ and $\dot\ell(x, \theta) := \partial \ell(x, \theta) / \partial\theta$. Let $\hat\theta_n$ denote the maximum likelihood estimator (MLE) based on the sample $X_1, \dots, X_n$. Then, under suitable regularity conditions, it is shown that $\sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow{d} N(0, A^{-1} B A^{-1})$, where $A := E[-\ddot\ell(X, \theta_0)]$, $\ddot\ell(x, \theta) := \partial^2 \ell(x, \theta) / \partial\theta \partial\theta'$ and $B := E[\dot\ell(X, \theta_0) \dot\ell(X, \theta_0)']$ (White, 1982).

Akaike (1974) proposed to use minus the expected log likelihood, $-E[\ell(X, \hat\theta_n)]$, to measure the adequacy of the estimated model. It is well known that $-n^{-1} \sum_{i=1}^n \ell(X_i, \hat\theta_n)$ has a bias of order $n^{-1}$. Put $b_n := E[n^{-1} \sum_{i=1}^n \{\ell(X_i, \hat\theta_n) - \ell(X, \hat\theta_n)\}]$. Takeuchi (1976) heuristically showed that $b_n = \mathrm{tr}(B A^{-1})/n + o(n^{-1}) =: b/n + o(n^{-1})$, which can be formally justified by using Theorem 1 of Nishiyama (2010), and proposed an information criterion (“TIC”): $-n^{-1} \sum_{i=1}^n \ell(X_i, \hat\theta_n) + b/n$, which reduces to “AIC” (Akaike, 1974) when the model is correctly specified.

Ishiguro et al. (1997) proposed a bootstrap estimator of the bias term $b$. Let $X_1^*, \dots, X_n^*$ denote a bootstrap sample from $D_n := \{X_1, \dots, X_n\}$ and let $\hat\theta_n^*$ denote the bootstrap MLE. Ishiguro et al. (1997) proposed the estimator $\hat b^* := E[\sum_{i=1}^n \{\ell(X_i^*, \hat\theta_n^*) - \ell(X_i, \hat\theta_n^*)\} \mid D_n]$. We argue the consistency of $\hat b^*$. Decompose $\hat b^*$ as

$$\hat b^* = E\Big[\sum_{i=1}^n \{\ell(X_i^*, \hat\theta_n^*) - \ell(X_i^*, \hat\theta_n)\} \,\Big|\, D_n\Big] + E\Big[\sum_{i=1}^n \{\ell(X_i, \hat\theta_n) - \ell(X_i, \hat\theta_n^*)\} \,\Big|\, D_n\Big] =: E[I \mid D_n] + E[II \mid D_n].$$

The Taylor expansion gives that $I = 2^{-1} n (\hat\theta_n^* - \hat\theta_n)' \hat A_n^*(\tilde\theta_n^*) (\hat\theta_n^* - \hat\theta_n)$ and $II = 2^{-1} n (\hat\theta_n^* - \hat\theta_n)' \hat A_n(\tilde\theta_n^*) (\hat\theta_n^* - \hat\theta_n)$, where $\hat A_n(\theta) := -n^{-1} \sum_{i=1}^n \ddot\ell(X_i, \theta)$, $\hat A_n^*(\theta) := -n^{-1} \sum_{i=1}^n \ddot\ell(X_i^*, \theta)$ and $\tilde\theta_n^*$ is on the line segment between $\hat\theta_n$ and $\hat\theta_n^*$ ($I$ and $II$ may have different $\tilde\theta_n^*$). Under suitable regularity conditions, the conditional distributions of $I$ and $II$ given $D_n$ converge weakly to the distribution of $2^{-1} Z' A Z$ in probability, where $Z \sim N(0, A^{-1} B A^{-1})$, and $E[Z' A Z] = \mathrm{tr}(B A^{-1}) = b$. Suppose that, for instance, there exists a function $H(x)$ such that $\|\ddot\ell(x, \theta)\| \le H(x)$ for all $\theta \in \Theta$ for some suitable norm $\|\cdot\|$. Then, $|I| \le 2^{-1} \{n^{-1} \sum_{i=1}^n H(X_i^*)\} \cdot \|\sqrt{n}(\hat\theta_n^* - \hat\theta_n)\|^2$ and $|II| \le 2^{-1} \{n^{-1} \sum_{i=1}^n H(X_i)\} \cdot \|\sqrt{n}(\hat\theta_n^* - \hat\theta_n)\|^2$. In view of Lemma 2.1, sufficient conditions for the moment convergence are $E[H(X)^{2(1+\varepsilon)}] < \infty$ and $E[\|\sqrt{n}(\hat\theta_n^* - \hat\theta_n)\|^{4(1+\varepsilon)} \mid D_n] = O_p(1)$ for some $\varepsilon > 0$. Theorem 2.2 gives primitive sufficient conditions for the latter condition (Theorem 2.2 indeed gives sufficient conditions for the stronger assertion that $E[|\hat b^* - b|] \to 0$).
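The bootstrap bias estimator $\hat b^*$ can be approximated by simulation. The following is a minimal Python sketch, assuming as working model the normal location family $N(\theta, 1)$; this model, the function names `loglik` and `eic_bias`, and the parameter `n_boot` are illustrative assumptions, not taken from Ishiguro et al. (1997). For this working model $A = 1$ and $B = \mathrm{Var}(X)$, so $b = \mathrm{Var}(X)$, and a direct calculation gives $\hat b^* = n^{-1} \sum_{i=1}^n (X_i - \bar X_n)^2$ exactly, which serves as a check on the Monte Carlo value.

```python
import math
import random
import statistics


def loglik(x, theta):
    """Log density of the N(theta, 1) working model (an assumption of this sketch)."""
    return -0.5 * (x - theta) ** 2 - 0.5 * math.log(2.0 * math.pi)


def eic_bias(xs, n_boot, rng):
    """Monte Carlo approximation of b* = E[sum_i {l(X_i^*, theta*) - l(X_i, theta*)} | D_n]."""
    n = len(xs)
    total = 0.0
    for _ in range(n_boot):
        star = [xs[rng.randrange(n)] for _ in range(n)]
        theta_star = statistics.fmean(star)  # bootstrap MLE of the location parameter
        total += (sum(loglik(x, theta_star) for x in star)
                  - sum(loglik(x, theta_star) for x in xs))
    return total / n_boot


rng = random.Random(3)
# Data with variance different from 1, so the working model is misspecified.
xs = [rng.gauss(1.0, 1.5) for _ in range(20)]
b_star = eic_bias(xs, n_boot=20000, rng=rng)

# Closed form of b* for this working model: the population-style sample variance.
closed_form = statistics.pvariance(xs)
```

Here `b_star` matches `closed_form` up to simulation error, and `closed_form` is in turn a consistent estimator of $b = \mathrm{tr}(BA^{-1}) = \mathrm{Var}(X)$ for this working model. For models without such a closed form, the discussion above indicates which moment conditions make the simulation target $\hat b^*$ consistent.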

Acknowledgments. The author thanks Professor Tatsuya Kubokawa for his valuable comments. This research was supported by the Grant-in-Aid for Scientific research provided by the JSPS.

References

Akaike, H. (1974). Information theory and an extension of the maximum likelihood principle. In: 2nd International Symposium on Information Theory, ed. by B.N. Petrov and F. Csaki, pp. 267–281, Akademiai Kiado.
Angrist, J., Chernozhukov, V. and Fernández-Val, I. (2006). Quantile regression under misspecification, with an application to the US wage structure. Econometrica 74 539–563.
Arcones, M. and Giné, E. (1992). On the bootstrap of M-estimators and other statistical functionals. In: Exploring the Limits of Bootstrap, ed. by R. LePage and L. Billard, pp. 14–47, Wiley.
Buchinsky, M. (1995). Estimating the asymptotic covariance matrix for quantile regression models: A Monte Carlo study. J. Econometrics 68 303–338.
Chung, K. L. (2001). A Course in Probability Theory, 3rd edition. Academic Press.
Dudley, R. M. (1999). Uniform Central Limit Theorem. Cambridge Univ. Press.

Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Ann. Statist. 7 1–26.
Ghosh, M., Parr, W. C., Singh, K. and Babu, G. J. (1984). A note on bootstrapping the sample median. Ann. Statist. 12 1130–1135.
Gonçalves, S. and White, H. (2005). Bootstrap standard error estimates for linear regression. J. Amer. Stat. Assoc. 100 970–979.
Hahn, J. (1995). Bootstrapping quantile regression estimators. Econometric Theory 11 105–121.
Ishiguro, M., Sakamoto, Y. and Kitagawa, G. (1997). Bootstrapping log likelihood and EIC, an extension of AIC. Ann. Inst. Stat. Math. 49 411–434.
Jenrich, R. I. (1969). Asymptotic properties of non-linear least squares estimators. Ann. Math. Stat. 40 633–643.
Koenker, R. (2005). Quantile Regression. Oxford Univ. Press.
Koenker, R. and Bassett, G. (1978). Regression quantiles. Econometrica 46 33–50.
Nishiyama, Y. (2010). Moment convergence of M-estimators. Statist. Neerlandica 64 505–507.
Shao, J. and Tu, D. (1995). The Jackknife and Bootstrap. Springer-Verlag.
Takeuchi, K. (1976). Distribution of information statistics and criteria for adequacy of models. Mathematical Sciences 153 12–18 (in Japanese).
Wellner, J. A. and Zhan, Y. (1996). Bootstrapping Z-estimators. Unpublished manuscript.
White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica 50 1–25.
van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer-Verlag.
Yoshida, N. (2010). Polynomial type large deviation inequalities and quasi-likelihood analysis for stochastic differential equations. Ann. Inst. Stat. Math., to appear, Online first: May 20, 2010, DOI: 10.1007/s10463-009-0263-z.

Kengo Kato Department of Mathematics Graduate School of Science Hiroshima University 1-3-1 Kagamiyama Higashi-Hiroshima Hiroshima 739-8526 Japan [email protected]


A Note on Separation of Convex Sets
A line L separates a set A from a collection S of plane sets if A is contained in one of ... For any non-negative real number r, we denote by B, the disk with radius r.

A Note on Uniqueness of Bayesian Nash Equilibrium ...
Aug 30, 2011 - errors are my own. email: [email protected], website: ... second and main step of the proof, is to show that the best response function is a weak contraction. ..... In et.al, A. B., editor, Applied stochastic control in econometrics.

A note on juncture homomorphisms.pdf - Steve Borgatti
A full network homomorphism f: N -+ N' is a regular network homomorphism if for each R E [w fi( a) f2( R) fi( b) * 3 c, d E P such that fi(u) = fi( c), fi( b) = fi( d), cRb and uRd for all a, b E P. In a network N the bundle of relations B,, from a t

A NOTE ON STOCHASTIC ORDERING OF THE ... - Springer Link
Only the partial credit model (Masters, 1982) and special cases of this model (e.g., the rat- ing scale model, Andrich, 1978) imply SOL (Hemker et al., 1997, ...

A Note on Uniqueness of Bayesian Nash Equilibrium ...
Aug 30, 2011 - errors are my own. email: [email protected], website: ... second and main step of the proof, is to show that the best response ..... Each country has an initial arms stock level of yn ∈ [0,ymax], which is privately known.

A Moment of Violent Madness.pdf
Page 1 of 1. YFT's Skeleton in the Cupboard series is. now available for your Kindle at Amazon. (www.amzn.com/B0052U6RKK). Get your copy of Skeletons in ...

A NOTE ON GROUP ALGEBRAS OF LOCALLY ...
When X is the class of injective modules, X-automorphism invariant mod- ..... Department of Mathematics and Computer Science, St. Louis University, St. Louis,.

Note on the Voice of the Customer
on the ambient room light and reflections, the colors that the software designer chooses, the ... 10. Classes have a balance between theory and real-world application. ... A wide selection of companies and industries that recruit at the business.

moment restrictions on latent
In policy and program evaluation (Manski (1990)) and more general contexts ..... Let P = N (0,1), U = Y = R, Vθ = {ν : Eν(U)=0}, and Γθ (y) = {1} for all y ∈ Y, and ...

On the Convergence of Perturbed Non-Stationary ...
Communications Research in Signal Processing Group, School of Electrical and Computer Engineering. Cornell University, Ithaca ..... transmissions to achieve consensus, with the trade-off being that they ..... Science, M.I.T., Boston, MA, 1984.

On the uniform convergence of random series in ... - Project Euclid
obtain explicit representations of the jump process, and of related path func- ...... dr. Then, with probability 1 as u → ∞,. Y u. (t) → Y(t). (3.4) uniformly in t ∈ [0,1], ...

On the Convergence of Stochastic Gradient MCMC ... - Duke University
†Dept. of Electrical and Computer Engineering, Duke University, Durham, NC, USA ..... to select the best prefactors, which resulted in h=0.033×L−α for the ...

On the Convergence of Perturbed Non-Stationary ...
Cornell University, Ithaca, NY, 14853. ‡ Signal Processing and Communications Group, Electrical and Computer ...... Science, M.I.T., Boston, MA, 1984.