Statistics and Probability Letters 83 (2013) 1969–1972


The growth rate of significant regressors for high dimensional data

Qi Zheng, Colin Gallagher, K.B. Kulasekera ∗

Department of Mathematical Sciences, Clemson University, Clemson, SC 29634-0975, United States

Article info

Article history: Received 25 April 2013; accepted 25 April 2013; available online 6 May 2013

Abstract

We give a new consistency proof for high-dimensional quantile regression estimators. A consequence of this proof is that the number of significant regressors can grow at a rate $s \log^2(s) = o(n)$. To the best of our knowledge, this is the fastest rate achieved for high-dimensional quantile regression. © 2013 Elsevier B.V. All rights reserved.

Keywords: High dimension; Quantile regression; Increasing rate

∗ Corresponding author: K.B. Kulasekera.

0167-7152/$ – see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.spl.2013.04.029

1. Introduction

Over the last decade, problems arising in genetics, signal processing and many other fields have created a great demand for the analysis of high-dimensional sparse regression models:

\[ y_i = z_i^T \beta^* + \epsilon_i, \qquad i = 1, \ldots, n, \tag{1} \]

where the $y_i$ are random variables, the $z_i$ are $p \times 1$ independent random covariate vectors, and the $\epsilon_i$ are identically distributed random errors with mean 0, independent of the $z_i$. In this context, we are interested in estimating the vector of regression coefficients $\beta^* = (\beta_1^*, \ldots, \beta_p^*)^T$ when $\beta^*$ is sparse in the sense that only $s \ll p$ of its components are non-zero, and both $s$ and $p$ may depend on the sample size.

Various methods have been developed to simultaneously identify the unknown model and estimate the corresponding coefficients. These include SCAD (Fan and Peng, 2004), the bridge estimator (Huang et al., 2008a), the adaptive lasso (Huang et al., 2008b), the L1 penalized quantile regression estimator (Belloni and Chernozhukov, 2011), and the Dantzig selector (Candes and Tao, 2007). These authors show that the respective estimators achieve or get very close to the $\sqrt{n/s}$ rate, which is the oracle rate obtained by knowing the underlying model in advance (Fan and Li, 2001).

For M-estimators with absolutely continuous score functions, Portnoy (1984) showed that $s$ can grow as fast as $s \log(s) = o(n)$. Welsh (1989) considered M-estimators with discontinuous score functions and showed that in this case $s$ can grow as fast as $o(n^{1/3})$. He and Shao (2000) improved the growth rate to $s \log^3(s) = o(n)$. This raises the question of whether a faster growth rate is attainable for quantile regression, so that quantile regression can be carried out in more complex applications.

In this note we give a new proof of consistency of the quantile regression estimator by exploiting common moment assumptions on the covariates. The main contribution is likely the new proof method, but the growth rate of predictor variables is also improved slightly. We show that the allowed growth rate of significant variables can be as much as $s \log^2(s) = o(n)$. This is a significant improvement upon Welsh (1989), and a very slight improvement upon the result of He and Shao (2000). This rate gives us more insight into high-dimensional sparse models, and allows them to be used in more applications. To the best of our knowledge, this rate has not been shown elsewhere in the high-dimensional regression literature.

2. Main results

Since we consider the oracle rate, we assume that $p = s$ throughout the rest of this article. The quantile regression oracle estimator is constructed by minimizing

\[ Q_\tau(\theta) = \sum_{i=1}^{n} \rho_\tau(y_i - x_i^T \theta), \]

where $\rho_\tau(t) = \tau 1(t > 0)t - (1 - \tau)1(t \le 0)t$ is the check function for some quantile index $\tau$, and $\theta = (q_\tau, \beta^T)^T \in R^{s+1}$. We consider the following commonly assumed conditions:

C1 (Sampling and smoothness). For any value $z$ in the support of $z_i$, the conditional density $f_{y|z}(y|z)$ is continuously differentiable at each $y \in R$, and $f_{y|z}(y|z)$ and $\frac{\partial}{\partial y} f_{y|z}(y|z)$ are bounded in absolute value by constants $f$ and $\bar f'$ uniformly in $y \in R$ and $z$ in the support of $z_i$.

C2 (Eigenvalues of covariate matrix). $c_1 < \lambda_{\min}(E[x_i x_i^T]) < \lambda_{\max}(E[x_i x_i^T]) < c_2$, where $x_i = (1, z_i^T)^T$. Moreover,

\[ q := \frac{3 f^{3/2}}{8 \bar f'} \inf_{\delta \in R^{s+1},\, \delta \ne 0} \frac{E[|x_i^T \delta|^2]^{3/2}}{E[|x_i^T \delta|^3]} > 0. \]

C3 (Moments of covariates). Covariates satisfy the Cramér condition

\[ E[|z_{ij}|^k] \le \frac{C_m}{2} M^{k-2} k! \]

for some constants $C_m$, $M$, all $k \ge 2$ and all $j = 1, \ldots, s$.
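As a small illustration of the check function and the objective $Q_\tau$ defined above, the following Python sketch evaluates both directly from their definitions. The function names and the intercept-only example are illustrative assumptions, not part of the paper; the search over observed values relies on the fact that $Q_\tau$ is piecewise linear and convex, so in the intercept-only case a minimizer can be found among the data points.

```python
def check_loss(t, tau):
    """Check function rho_tau(t) = tau*1(t>0)*t - (1-tau)*1(t<=0)*t."""
    return tau * t if t > 0 else -(1.0 - tau) * t


def objective(theta, xs, ys, tau):
    """Q_tau(theta) = sum_i rho_tau(y_i - x_i^T theta)."""
    total = 0.0
    for x, y in zip(xs, ys):
        fitted = sum(xj * tj for xj, tj in zip(x, theta))
        total += check_loss(y - fitted, tau)
    return total


def best_intercept(ys, tau):
    """Intercept-only case (hypothetical helper): minimize Q_tau over the observed values."""
    xs = [(1.0,)] * len(ys)  # x_i = (1) when there are no covariates
    return min(sorted(ys), key=lambda q: objective((q,), xs, ys, tau))


if __name__ == "__main__":
    data = [5.0, 1.0, 3.0, 2.0, 4.0]
    print(best_intercept(data, 0.5))  # a sample median of the data
```

With covariates, the same objective would typically be minimized by linear programming; that step is not shown here.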

Condition C3 allows us to apply Bernstein's inequality to control the tail probabilities of the quantile regression. In addition, C3 implies $\sum_{i=1}^{n} E\|x_i\|^2 = O(ns)$, which is essential for establishing the oracle consistency rate.

Theorem 2.1. Suppose that conditions C1, C2, and C3 are satisfied. If $s \log^2(s) = o(n)$, then the oracle quantile regression estimator $\hat\beta$ has the consistency rate $\sqrt{n/s}$, that is,

\[ \|\hat\beta - \beta^*\| = O_p\big(\sqrt{s/n}\big). \]

Theorem 2.1 implies that the oracle consistency rate $\sqrt{n/s}$ can be achieved while the number of covariates grows as fast as $s \log^2(s)/n \to 0$. Because the score function for quantile regression is discontinuous, the relevant comparisons are the rate $s^3 (\log n)^{2+\gamma} = o(n)$ required in Welsh (1989) and the rate $s \log^3(s) = o(n)$ required in He and Shao (2000), both of which this result relaxes. It is slightly slower than the growth rate $s \log(s) = o(n)$ required for absolutely continuous score functions.
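To get a rough feel for how the growth conditions compared above differ, the sketch below evaluates the ratios that each condition requires to vanish. The threshold of 0.1, the value of `gamma`, and the helper names are arbitrary assumptions chosen only for illustration; the conditions themselves are asymptotic and do not prescribe any finite-sample cutoff.

```python
import math


def rate_ratio(s, n, log_power):
    """Return s * log(s)**log_power / n, the quantity required to tend to 0."""
    return s * math.log(s) ** log_power / n


def welsh_ratio(s, n, gamma=0.1):
    """Return s**3 * log(n)**(2 + gamma) / n; gamma is an assumed value."""
    return s ** 3 * math.log(n) ** (2.0 + gamma) / n


def largest_s(n, log_power, threshold=0.1):
    """Largest integer s >= 2 with s*log(s)**log_power / n <= threshold (0 if none)."""
    s = 2
    if rate_ratio(s, n, log_power) > threshold:
        return 0
    while rate_ratio(s + 1, n, log_power) <= threshold:
        s += 1
    return s


if __name__ == "__main__":
    n = 10 ** 6
    for power in (1, 2, 3):
        print(power, largest_s(n, power))
```

For a fixed sample size, a lower power of the logarithm admits a larger $s$ under the same cutoff, which is the sense in which $s \log^2(s) = o(n)$ sits between the $s \log(s)$ and $s \log^3(s)$ conditions.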



Remark 2.1. Though we only show the largest growth rate for quantile regression to attain $\sqrt{n/s}$ consistency, with minor modifications the technique may be applied to the proofs of Welsh (1989) by using more complicated stochastic equicontinuity arguments (see e.g. Bickel, 1975).

3. Conclusion

In this paper, we consider the allowable growth rate of the number of significant regressors in high-dimensional sparse models. It is shown that under commonly assumed conditions the number of true regressors may grow at the rate $s \log^2(s)/n \to 0$, which relaxes dimension restrictions for high-dimensional sparse models and hence makes them more widely applicable. Our results can be extended to generalized M-estimators.

Appendix

Proof of Theorem 2.1. We want to show that for any $\epsilon > 0$, there exists a sufficiently large constant $C$ such that

\[ P\Big( \inf_{\|\delta\| = C} Q_\tau\big(\theta^* + \sqrt{s/n}\,\delta\big) > Q_\tau(\theta^*) \Big) > 1 - \epsilon, \tag{2} \]


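where $\delta \in R^{s+1}$ and $\|\delta\| = C$. Since the objective function $Q_\tau(\theta)$ is convex, inequality (2) implies that with probability at least $1 - \epsilon$ the oracle quantile estimator lies in the shrinking ball $\{\theta^* + \sqrt{s/n}\,\delta : \delta \in R^{s+1}, \|\delta\| \le C\}$. This provides the consistency result immediately. To verify (2), write $\theta_\tau^*$ for $\theta^*$ and consider

\[ Q_\tau\big(\theta_\tau^* + \sqrt{s/n}\,\delta\big) - Q_\tau(\theta_\tau^*) = \sum_{i=1}^{n} \Big[ \rho_\tau\Big(y_i - x_i^T\big(\theta_\tau^* + \sqrt{s/n}\,\delta\big)\Big) - \rho_\tau\big(y_i - x_i^T \theta_\tau^*\big) \Big]. \tag{3} \]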

According to Knight (1998), for any $x \ne 0$, we have

\[ |x - y| - |x| = -y[1(x > 0) - 1(x < 0)] + 2 \int_0^y [1(x < t) - 1(x < 0)]\,dt. \]

Then, since $\rho_\tau(u) = \tfrac12 |u| + (\tau - \tfrac12)u$,

\[ \rho_\tau(x - y) - \rho_\tau(x) = y[1(x < 0) - \tau] + \int_0^y [1(x < t) - 1(x < 0)]\,dt. \]
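The check-function form of Knight's identity, $\rho_\tau(x - y) - \rho_\tau(x) = y[1(x < 0) - \tau] + \int_0^y [1(x < t) - 1(x < 0)]\,dt$ for $x \ne 0$, can be checked numerically. The sketch below is only a sanity check under assumed test values; it approximates the integral with a midpoint rule, so the two sides agree up to a discretization error of order $|y|$ divided by the number of steps.

```python
def check_loss(t, tau):
    """Check function rho_tau(t) = tau*1(t>0)*t - (1-tau)*1(t<=0)*t."""
    return tau * t if t > 0 else -(1.0 - tau) * t


def indicator_integral(x, y, steps=20000):
    """Midpoint-rule approximation of the signed integral from 0 to y of
    [1(x < t) - 1(x < 0)] dt."""
    h = y / steps  # negative when y < 0, which gives the signed integral
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += (1.0 if x < t else 0.0) - (1.0 if x < 0 else 0.0)
    return total * h


def identity_gap(x, y, tau):
    """Absolute difference between the two sides of the identity."""
    lhs = check_loss(x - y, tau) - check_loss(x, tau)
    rhs = y * ((1.0 if x < 0 else 0.0) - tau) + indicator_integral(x, y)
    return abs(lhs - rhs)


if __name__ == "__main__":
    for x in (1.0, -1.0, 0.3, -2.5):
        for y in (2.0, -2.0, 0.1, -0.7):
            print(x, y, identity_gap(x, y, 0.25))
```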

Hence, (3) can be written as

\[ \sqrt{s/n} \sum_{i=1}^{n} x_i^T \delta\,[1(y_i < x_i^T \theta_\tau^*) - \tau] + \sum_{i=1}^{n} \int_0^{\sqrt{s/n}\,x_i^T \delta} [1(y_i < x_i^T \theta_\tau^* + t) - 1(y_i < x_i^T \theta_\tau^*)]\,dt := \sqrt{s/n}\,T_1 + T_2. \]

Using independence, the Cauchy–Schwarz inequality and condition C3, $E[T_1^2] \le n\tau(1 - \tau)E\|x_i\|^2 \|\delta\|^2 \le ns\tau(1 - \tau)C_m C^2$. Then, applying Chebyshev's inequality, we see that for any constant $k$

\[ P\Big( \sqrt{s/n}\,|T_1| > k s C^2 \Big) \le \frac{\tau(1 - \tau) C_m}{k^2 C^2}. \tag{4} \]
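The arithmetic behind (4) can be spelled out as follows. This is a sketch that uses only the second-moment bound $E[T_1^2] \le ns\tau(1 - \tau)C_m C^2$ stated above, together with Chebyshev's inequality applied to $T_1^2$:

```latex
\[
P\Big( \sqrt{s/n}\,|T_1| > k s C^2 \Big)
  = P\Big( T_1^2 > k^2 s C^4 n \Big)
  \le \frac{E[T_1^2]}{k^2 s C^4 n}
  \le \frac{n s \tau(1 - \tau) C_m C^2}{k^2 s C^4 n}
  = \frac{\tau(1 - \tau) C_m}{k^2 C^2}.
\]
```

The right-hand side does not depend on $n$ or $s$, so it can be made arbitrarily small by taking $k$ or $C$ large.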

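Next, we deal with $T_2$. The goal is to show that, with high probability, $T_2 \ge \frac{1}{4} f(q_\tau^*) c_1 C_m C^2$. Since $V(X) \le E[X^2]$, we obtain that

\[ V[T_2] \le nE\bigg[ \bigg( \int_0^{\sqrt{s/n}\,x_i^T \delta} [1(y_i - x_i^T \theta_\tau^* < t) - 1(y_i - x_i^T \theta_\tau^* < 0)]\,dt \bigg)^2 \bigg]. \]

Then, given an $\eta > 0$, we have

\[ nE\bigg[ \bigg( \int_0^{\sqrt{s/n}\,x_i^T \delta} [1(y_i - x_i^T \theta_\tau^* < t) - 1(y_i - x_i^T \theta_\tau^* < 0)]\,dt \bigg)^2 1\Big( \sqrt{s/n}\,|x_i^T \delta| > \eta \Big) \bigg] \]
\[ \le 4sE\Big[ (x_i^T \delta)^2\, 1\Big( \sqrt{s/n}\,|x_i^T \delta| > \eta \Big) \Big] \le 4sE[|x_i^T \delta|^3]^{2/3}\, P\Big( \sqrt{s/n}\,|x_i^T \delta| > \eta \Big)^{1/3}, \]

where the last line follows from Hölder's inequality. Under condition C2,

\[ E[|x_i^T \delta|^3] \le \frac{3 f^{3/2} E[|x_i^T \delta|^2]^{3/2}}{8 \bar f' q}. \tag{5} \]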

Applying Bernstein's inequality (Lemma 2.2.11 of Van Der Vaart and Wellner (1996)),

\[ P\Big( |x_i^T \delta| > \eta \sqrt{n}/\sqrt{s} \Big) \le 2 \exp\bigg( \frac{-\eta^2 n}{2s\big( C^2 C_m + MC\eta \sqrt{n}/\sqrt{s} \big)} \bigg). \tag{6} \]
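For intuition about the size of the right-hand side of (6), the following sketch evaluates it for given values of $\eta$, $n$ and $s$. The default constants `C`, `Cm` and `M` are placeholders assumed purely for illustration; the paper only requires that such constants exist.

```python
import math


def bernstein_tail_bound(eta, n, s, C=1.0, Cm=1.0, M=1.0):
    """Right-hand side of (6):
    2 * exp(-eta^2 n / (2 s (C^2 Cm + M C eta sqrt(n)/sqrt(s))))."""
    denom = 2.0 * s * (C ** 2 * Cm + M * C * eta * math.sqrt(n) / math.sqrt(s))
    return 2.0 * math.exp(-(eta ** 2) * n / denom)


if __name__ == "__main__":
    for n in (10 ** 4, 10 ** 5, 10 ** 6):
        print(n, bernstein_tail_bound(eta=0.5, n=n, s=50))
```

With $s$ and $\eta$ held fixed, the bound decays as $n$ grows, which reflects the requirement that $\eta \sqrt{n}/\sqrt{s}$ diverge.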

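Combining bounds (5) and (6) yields

\[ 4sE[|x_i^T \delta|^3]^{2/3}\, P\Big( \sqrt{s/n}\,|x_i^T \delta| > \eta \Big)^{1/3} \le 3^{2/3} 2^{1/3} \frac{f}{(\bar f' q)^{2/3}}\, C_m C^2 s^2 \exp\bigg( \frac{-\eta \sqrt{n}}{6MC\sqrt{s}} \bigg), \]

which converges to 0 if $\eta$ satisfies (A1) $\eta \sqrt{n}/\sqrt{s} \to \infty$ and (A2) $\log(s) = o\big( \eta \sqrt{n}/(12MC\sqrt{s}) \big)$. On the other hand,

\[ nE\bigg[ \bigg( \int_0^{\sqrt{s/n}\,x_i^T \delta} [1(y_i - x_i^T \theta_\tau^* < t) - 1(y_i - x_i^T \theta_\tau^* < 0)]\,dt \bigg)^2 1\Big( \sqrt{s/n}\,|x_i^T \delta| \le \eta \Big) \bigg] \]
\[ \le 2n\eta E\bigg[ \int_0^{\sqrt{s/n}\,|x_i^T \delta|} [1(y_i - x_i^T \theta_\tau^* < t) - 1(y_i - x_i^T \theta_\tau^* < 0)]\,dt\; 1\Big( \sqrt{s/n}\,|x_i^T \delta| \le \eta \Big) \bigg] \]
\[ = 2n\eta E\bigg[ \int_0^{\sqrt{s/n}\,|x_i^T \delta|} [F(q_\tau^* + t) - F(q_\tau^*)]\,dt\; 1\Big( \sqrt{s/n}\,|x_i^T \delta| \le \eta \Big) \bigg]. \]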

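If $\eta$ is close to 0, we can find a constant $C_\tau$ such that $F(q_\tau^* + t) - F(q_\tau^*) \le C_\tau f(q_\tau^*) t$ for all $|t| < \eta$. Thus, we obtain

\[ 2n\eta E\bigg[ \int_0^{\sqrt{s/n}\,|x_i^T \delta|} [F(q_\tau^* + t) - F(q_\tau^*)]\,dt\; 1\Big( \sqrt{s/n}\,|x_i^T \delta| \le \eta \Big) \bigg] \]
\[ \le 2C_\tau f(q_\tau^*) \eta n E\bigg[ \int_0^{\sqrt{s/n}\,|x_i^T \delta|} t\,dt\; 1\Big( \sqrt{s/n}\,|x_i^T \delta| \le \eta \Big) \bigg] \le C_\tau f(q_\tau^*) c_2 C^2 \eta s. \]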

Thus, if $\eta$ satisfies (A1), (A2) and (A3) $\eta \to 0$, then $V[T_2] \le C_\tau f(q_\tau^*) c_2 C^2 \eta s$. By Chebyshev's inequality, we have

\[ P\big( |T_2 - E[T_2]| > \sqrt{s} \big) \le C_\tau f(q_\tau^*) c_2 C^2 \eta. \tag{7} \]

Using the Cauchy–Schwarz inequality, assumption C2, and an argument similar to the one used to bound $V[T_2]$, we can show that for $n$ sufficiently large

\[ E[T_2] = nE\bigg[ \int_0^{\sqrt{s/n}\,x_i^T \delta} [1(y_i - x_i^T \theta_\tau^* < t) - 1(y_i - x_i^T \theta_\tau^* < 0)]\,dt \bigg] \ge \frac{1}{2} f(q_\tau^*) c_1 C^2 s. \tag{8} \]

Combining (7) with (8), we see that for sufficiently large $C$ and sufficiently small $\eta$, $T_2 > \frac{1}{4} f(q_\tau^*) c_1 C_m C^2$ with probability at least $1 - \epsilon/2$. This, coupled with (4), implies that (3) is positive with probability at least $1 - \epsilon$, so (2) is satisfied. Combining (A1)–(A3), we see that we only require $s \log^2(s) = o(n)$. This completes the proof.
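One way to see why (A1)–(A3) reduce to $s \log^2(s) = o(n)$ is sketched below. It assumes $s \ge 2$ and ignores the constant $12MC$ in (A2), which does not affect the order of magnitude; the particular choice of $\eta$ is an illustration, not necessarily the one the authors had in mind.

```latex
\[
a_n := \sqrt{s/n}\,\log(s), \qquad
\text{(A2)} \iff \frac{a_n}{\eta} \to 0, \qquad
\text{(A3)} \iff \eta \to 0 .
\]
If (A2) and (A3) both hold, then
\[
a_n = \frac{a_n}{\eta}\cdot \eta \to 0
\quad\Longleftrightarrow\quad
\frac{s \log^2(s)}{n} \to 0 .
\]
Conversely, if $a_n \to 0$, take $\eta = \sqrt{a_n}$. Then
\[
\eta \to 0, \qquad
\frac{a_n}{\eta} = \sqrt{a_n} \to 0, \qquad
\eta\,\sqrt{n/s} = \frac{\log(s)}{\sqrt{a_n}} \to \infty ,
\]
so (A1), (A2) and (A3) hold simultaneously.
```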



References

Belloni, A., Chernozhukov, V., 2011. l1 penalized quantile regression in high-dimensional sparse models. The Annals of Statistics 39, 82–130.
Bickel, P.J., 1975. One-step Huber estimates in the linear model. Journal of the American Statistical Association 70, 428–433.
Candes, E., Tao, T., 2007. The Dantzig selector: statistical estimation when p is much larger than n. The Annals of Statistics 35, 2313–2351.
Fan, J., Li, R., 2001. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96, 1348–1360.
Fan, J., Peng, H., 2004. Nonconcave penalized likelihood with a diverging number of parameters. The Annals of Statistics 32, 928–961.
He, X., Shao, Q., 2000. On parameters of increasing dimensions. Journal of Multivariate Analysis 73, 120–135.
Huang, J., Horowitz, J.L., Ma, S., 2008a. Asymptotic properties of bridge estimators in sparse high-dimensional regression models. The Annals of Statistics 36, 587–613.
Huang, J., Ma, S., Zhang, C., 2008b. Adaptive lasso for sparse high-dimensional regression models. Statistica Sinica 18, 1603–1618.
Knight, K., 1998. Limiting distributions for L1 regression estimators under general conditions. The Annals of Statistics 26, 755–770.
Portnoy, S., 1984. Asymptotic behavior of M-estimators of p regression parameter when p^2/n is large. I. Consistency. The Annals of Statistics 13, 1402–1417.
Van Der Vaart, A.W., Wellner, J.A., 1996. Weak Convergence and Empirical Processes. Springer, New York, MR1385671.
Welsh, A.H., 1989. On M-processes and M-estimation. The Annals of Statistics 17, 337–361.
