2013.5.7.

Minimax lower bounds via Neyman-Pearson lemma

Kengo Kato

Suppose that there is a scalar dependent variable $Y$ and a scalar covariate $X$, which we assume has support in $[0,1]$. Consider the nonparametric regression model
\[
Y = f(X) + \epsilon, \quad \epsilon \perp\!\!\!\perp X, \quad \epsilon \sim N(0,\sigma^2), \quad \sigma^2 > 0.
\]
We fix the distribution of $X$ and $\sigma^2 > 0$. Let $\{\phi_j\}_{j=1}^{\infty}$ be an orthonormal system in $L^2([0,1])$. We assume that
\[
L := \sup_{j \ge 1} E[\phi_j^2(X)] < \infty.
\]

For given $\alpha > 0$ and $C_1 > 0$, suppose that $f$ belongs to the class
\[
\mathcal{F}(\alpha, C_1) = \{ f \in L^2([0,1]) : |\langle f, \phi_j \rangle| \le C_1 j^{-\alpha}, \ \forall j \ge 1 \},
\]
where $\langle \cdot, \cdot \rangle$ denotes the inner product in $L^2([0,1])$. Denote by $\|\cdot\|$ the $L^2([0,1])$-norm. Let $(Y_1, X_1), \dots, (Y_n, X_n)$ be i.i.d. observations of $(Y, X)$. The purpose of this note is to prove (in a self-contained manner) the following (well-known) theorems by means of a simple application of the Neyman-Pearson lemma.^1

Theorem 1. Under the above setup, we have
\[
\inf_{\hat f} \sup_{f \in \mathcal{F}(\alpha, C_1)} E_f[\|\hat f - f\|^2] \gtrsim n^{-(2\alpha-1)/(2\alpha)},
\]

where the infimum is taken over all estimators $\hat f$ of $f$.

Remark 1. The idea of the proof of Theorem 1 is borrowed from [1, 2], where minimax lower bounds are derived for estimating structural functions in nonparametric instrumental variables models and slope functions in functional linear models [see also the proof of Theorem 7 in [4]]. However, [1, 2] do not present detailed proofs of the minimax lower bounds (though the proofs are correct). Hence I hope that this note will be of some help in understanding their proofs.

Alternatively, we have the following theorem.

Theorem 2. There exists a small constant $c > 0$ such that
\[
\liminf_{n \to \infty} \inf_{\hat f} \sup_{f \in \mathcal{F}(\alpha, C_1)} P_f\big( \|\hat f - f\|^2 > c n^{-(2\alpha-1)/(2\alpha)} \big) > 0.
\]

^1 For various techniques to derive minimax lower bounds in nonparametric statistical problems, see [3].

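Proof of Theorem 1. Let $M_n$ be the integer part of $n^{1/(2\alpha)}$. For $\theta^{M_n} = (\theta_{M_n+1}, \dots, \theta_{2M_n})^T \in \mathbb{R}^{M_n}$, define
\[
f_{\theta^{M_n}}(\cdot) = C_1 \sum_{j=M_n+1}^{2M_n} j^{-\alpha} \theta_j \phi_j(\cdot).
\]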

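Clearly, $f_{\theta^{M_n}} \in \mathcal{F}(\alpha, C_1)$ whenever $\theta^{M_n} \in [0,1]^{M_n}$.

Lemma 1. We have
\[
\inf_{\hat f} \sup_{f \in \mathcal{F}(\alpha, C_1)} E_f[\|\hat f - f\|^2]
\ge \inf_{\hat\theta^{M_n}} \sup_{\theta^{M_n} \in \{0,1\}^{M_n}} E_{\theta^{M_n}}[\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2],
\]
where the infimum on the right side is taken over all estimators $\hat\theta^{M_n} \in [0,1]^{M_n}$ of $\theta^{M_n}$.

Proof of Lemma 1. For arbitrary $\hat f$, we have
\[
\sup_{f \in \mathcal{F}(\alpha, C_1)} E_f[\|\hat f - f\|^2]
\ge \sup_{\theta^{M_n} \in \{0,1\}^{M_n}} E_{\theta^{M_n}}[\|\hat f - f_{\theta^{M_n}}\|^2].
\]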

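Moreover, by Bessel's inequality,
\[
\|\hat f - f\|^2 \ge \sum_{j=1}^{\infty} \big( \langle \hat f, \phi_j \rangle - \langle f, \phi_j \rangle \big)^2,
\]
so that when $f = f_{\theta^{M_n}}$ for some $\theta^{M_n} \in \{0,1\}^{M_n}$, it is enough to consider estimators of the form
\[
\hat f(\cdot) = \sum_{j=M_n+1}^{2M_n} \hat\alpha_j \phi_j(\cdot),
\]
where the $\hat\alpha_j$ are data-dependent. By defining $\hat\theta_j = C_1^{-1} j^{\alpha} \hat\alpha_j$, $\hat f$ is of the form
\[
\hat f(\cdot) = f_{\hat\theta^{M_n}}(\cdot) = C_1 \sum_{j=M_n+1}^{2M_n} j^{-\alpha} \hat\theta_j \phi_j(\cdot).
\]
We need to show that we can restrict $\hat\theta_j$ in such a way that $0 \le \hat\theta_j \le 1$. For given $\hat\theta_j$, define
\[
\tilde\theta_j =
\begin{cases}
1, & \text{if } \hat\theta_j > 1, \\
\hat\theta_j, & \text{if } 0 \le \hat\theta_j \le 1, \\
0, & \text{if } \hat\theta_j < 0.
\end{cases}
\]
Then whenever $\theta^{M_n} \in \{0,1\}^{M_n}$,
\[
\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2 \ge \|f_{\tilde\theta^{M_n}} - f_{\theta^{M_n}}\|^2.
\]
This completes the proof of the lemma.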
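The clipping step at the end of the proof of Lemma 1 can be checked numerically. The following is a minimal sketch, not part of the original argument: it uses the identity $\|f_{\hat\theta} - f_\theta\|^2 = C_1^2 \sum_j j^{-2\alpha}(\hat\theta_j - \theta_j)^2$ (orthonormality of the $\phi_j$), drops the constant $C_1^2$, and compares the weighted distance before and after clipping for every vertex of a small cube. The function names, the block size `m`, and the sampling range are illustrative assumptions.

```python
import itertools
import random


def weighted_sq_dist(theta_hat, theta, weights):
    """sum_j w_j (theta_hat_j - theta_j)^2, i.e. ||f_theta_hat - f_theta||^2 / C1^2."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, theta_hat, theta))


def clip01(theta_hat):
    """Coordinatewise projection onto [0, 1] (the map theta_hat -> theta_tilde)."""
    return [min(1.0, max(0.0, t)) for t in theta_hat]


def clipping_never_hurts(m=4, alpha=1.0, trials=200, seed=0):
    """Check the inequality of Lemma 1 for random theta_hat and all theta in {0,1}^m."""
    rng = random.Random(seed)
    # Weights j^{-2 alpha} for j = m+1, ..., 2m (hypothetical small block size m).
    weights = [j ** (-2 * alpha) for j in range(m + 1, 2 * m + 1)]
    for _ in range(trials):
        theta_hat = [rng.uniform(-2.0, 3.0) for _ in range(m)]
        theta_tilde = clip01(theta_hat)
        for theta in itertools.product([0, 1], repeat=m):
            before = weighted_sq_dist(theta_hat, theta, weights)
            after = weighted_sq_dist(theta_tilde, theta, weights)
            if after > before + 1e-12:
                return False
    return True


print(clipping_never_hurts())
```

The check works coordinate by coordinate: for $\theta_j \in \{0,1\}$, moving $\hat\theta_j$ to the nearest point of $[0,1]$ can only bring it closer to $\theta_j$.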




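For notational convenience, write
\[
\theta_{-j}^{M_n} = (\theta_{M_n+1}, \dots, \theta_{j-1}, \theta_{j+1}, \dots, \theta_{2M_n})^T, \quad M_n + 1 \le j \le 2M_n.
\]
Observe that
\[
\begin{aligned}
\sup_{\theta^{M_n} \in \{0,1\}^{M_n}} E_{\theta^{M_n}}[\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2]
&\ge \frac{1}{2^{M_n}} \sum_{\theta^{M_n} \in \{0,1\}^{M_n}} E_{\theta^{M_n}}[\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2] \\
&= \frac{C_1^2}{2^{M_n}} \sum_{j=M_n+1}^{2M_n} j^{-2\alpha} \sum_{\theta_{-j}^{M_n} \in \{0,1\}^{M_n-1}}
\Big\{ E_{\theta_{-j}^{M_n},\theta_j=0}[(\hat\theta_j - \theta_j)^2] + E_{\theta_{-j}^{M_n},\theta_j=1}[(\hat\theta_j - \theta_j)^2] \Big\}. \qquad (1)
\end{aligned}
\]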

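We want to lower bound
\[
E_{\theta_{-j}^{M_n},\theta_j=0}[(\hat\theta_j - \theta_j)^2] + E_{\theta_{-j}^{M_n},\theta_j=1}[(\hat\theta_j - \theta_j)^2].
\]
To this end, we make use of a variant of the Neyman-Pearson lemma combined with Le Cam's inequality [3, Lemma 2.3].

Lemma 2. Let $(S, \mathcal{S}, \mu)$ be a measure space, and let $p, q$ be probability density functions with respect to $\mu$. Then

(i) (A variant of the Neyman-Pearson lemma):
\[
\inf \Big\{ \int \varphi p \, d\mu + \int \psi q \, d\mu : \varphi \ge 0, \ \psi \ge 0, \ \varphi + \psi \ge 1 \Big\} \ge \int (p \wedge q) \, d\mu.
\]

(ii) (Le Cam's inequality):
\[
\int (p \wedge q) \, d\mu \ge \frac{1}{2} \Big( \int \sqrt{pq} \, d\mu \Big)^2.
\]

Proof of Lemma 2. Part (i): Let $\varphi \ge 0$, $\psi \ge 0$, $\varphi + \psi \ge 1$. Then
\[
\int \varphi p \, d\mu + \int \psi q \, d\mu \ge \int (\varphi \wedge 1) p \, d\mu + \int (1 - \varphi \wedge 1) q \, d\mu.
\]
We lower bound the right side with respect to $\varphi$. Clearly, we may assume that $\varphi \le 1$. The desired conclusion follows from the inequality
\[
\int (p - q)\big(\varphi - 1(p < q)\big) \, d\mu \ge 0.
\]

Part (ii): Since $\int (p \vee q) \, d\mu + \int (p \wedge q) \, d\mu = 2$, the Cauchy-Schwarz inequality gives
\[
\begin{aligned}
\Big( \int \sqrt{pq} \, d\mu \Big)^2
&= \Big( \int \sqrt{(p \wedge q)(p \vee q)} \, d\mu \Big)^2
\le \int (p \wedge q) \, d\mu \int (p \vee q) \, d\mu \\
&= \int (p \wedge q) \, d\mu \Big\{ 2 - \int (p \wedge q) \, d\mu \Big\}
\le 2 \int (p \wedge q) \, d\mu,
\end{aligned}
\]
so that the desired inequality is obtained.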
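Both parts of Lemma 2 can be sanity-checked numerically. The sketch below is an illustration under assumptions that are not in the note: it takes $p$ and $q$ to be the densities of $N(a, \sigma^2)$ and $N(b, \sigma^2)$ on the real line with Lebesgue measure as $\mu$, approximates the integrals by a midpoint rule, and, for part (i), only tries test functions of the threshold form $\varphi = 1(y > t)$, $\psi = 1 - \varphi$. All function names are hypothetical.

```python
import math


def normal_pdf(y, mean, sigma):
    """Density of N(mean, sigma^2) at y."""
    return math.exp(-((y - mean) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def integrate(g, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (k + 0.5) * h) for k in range(steps))


def overlap_and_affinity(a, b, sigma=1.0):
    """Return (int min(p, q) dy, int sqrt(p q) dy) for p = N(a, s^2), q = N(b, s^2)."""
    lo, hi = min(a, b) - 10 * sigma, max(a, b) + 10 * sigma
    p = lambda y: normal_pdf(y, a, sigma)
    q = lambda y: normal_pdf(y, b, sigma)
    overlap = integrate(lambda y: min(p(y), q(y)), lo, hi)
    affinity = integrate(lambda y: math.sqrt(p(y) * q(y)), lo, hi)
    return overlap, affinity


def threshold_risk(a, b, t, sigma=1.0):
    """int phi p dy + int psi q dy for phi = 1(y > t) and psi = 1 - phi."""
    lo, hi = min(a, b) - 10 * sigma, max(a, b) + 10 * sigma
    p = lambda y: normal_pdf(y, a, sigma)
    q = lambda y: normal_pdf(y, b, sigma)
    return integrate(lambda y: p(y) if y > t else q(y), lo, hi)


overlap, affinity = overlap_and_affinity(0.0, 1.0)
# Part (ii): Le Cam's inequality.
print(overlap >= 0.5 * affinity ** 2)
# Part (i): no threshold test does better than the overlap (up to quadrature error).
print(all(threshold_risk(0.0, 1.0, t) >= overlap - 1e-6 for t in (-1.0, 0.0, 0.5, 2.0)))
```

For $a < b$ the threshold $t = (a+b)/2$ corresponds to $\varphi = 1(p < q)$, the choice that attains the bound in part (i).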



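For a while, fix $M_n + 1 \le j \le 2M_n$ and $\theta_{-j}^{M_n} \in \{0,1\}^{M_n-1}$. Let $p_{\theta_j}(y \mid x)$ denote the conditional density function of $Y$ given $X = x$ when $f = f_{\theta_{-j}^{M_n},\theta_j}$:
\[
p_{\theta_j}(y \mid x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big\{ -\frac{1}{2\sigma^2} \big( y - f_{\theta_{-j}^{M_n},\theta_j}(x) \big)^2 \Big\}.
\]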

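Then
\[
\begin{aligned}
& E_{\theta_{-j}^{M_n},\theta_j=0}[\hat\theta_j^2] + E_{\theta_{-j}^{M_n},\theta_j=1}[(\hat\theta_j - 1)^2] \\
&\quad = E\Big[ \int \hat\theta_j^2\big((y_1, X_1), \dots, (y_n, X_n)\big) \prod_{i=1}^{n} p_{\theta_j=0}(y_i \mid X_i) \, dy_1 \cdots dy_n \\
&\qquad\quad + \int \big\{ 1 - \hat\theta_j\big((y_1, X_1), \dots, (y_n, X_n)\big) \big\}^2 \prod_{i=1}^{n} p_{\theta_j=1}(y_i \mid X_i) \, dy_1 \cdots dy_n \Big]. \qquad (2)
\end{aligned}
\]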

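Note that $\hat\theta_j^2 + (1 - \hat\theta_j)^2 \ge 1/2$, i.e., $2\hat\theta_j^2 + 2(1 - \hat\theta_j)^2 \ge 1$, and
\[
\begin{aligned}
p_{\theta_j=0}(y \mid x) \, p_{\theta_j=1}(y \mid x)
&= \frac{1}{2\pi\sigma^2} \exp\Big\{ -\frac{1}{\sigma^2} \Big( y - \frac{f_{\theta_{-j}^{M_n},\theta_j=0}(x) + f_{\theta_{-j}^{M_n},\theta_j=1}(x)}{2} \Big)^2 \Big\} \\
&\quad \times \exp\Big\{ -\frac{1}{4\sigma^2} \big( f_{\theta_{-j}^{M_n},\theta_j=1}(x) - f_{\theta_{-j}^{M_n},\theta_j=0}(x) \big)^2 \Big\},
\end{aligned}
\]
so that, by Lemma 2,
\[
\begin{aligned}
& \int \hat\theta_j^2\big((y_1, X_1), \dots, (y_n, X_n)\big) \prod_{i=1}^{n} p_{\theta_j=0}(y_i \mid X_i) \, dy_1 \cdots dy_n \\
&\quad + \int \big\{ 1 - \hat\theta_j\big((y_1, X_1), \dots, (y_n, X_n)\big) \big\}^2 \prod_{i=1}^{n} p_{\theta_j=1}(y_i \mid X_i) \, dy_1 \cdots dy_n \\
&\quad \ge \frac{1}{4} \exp\Big\{ -\frac{C_1^2 j^{-2\alpha}}{8\sigma^2} \sum_{i=1}^{n} \phi_j^2(X_i) \Big\}.
\end{aligned}
\]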

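By convexity of the map $x \mapsto e^{-x}$ and Jensen's inequality, we have
\[
(2) \ge \frac{1}{4} \exp\Big\{ -\frac{C_1^2 j^{-2\alpha} n}{8\sigma^2} E[\phi_j^2(X)] \Big\}
\ge \frac{1}{4} \exp\Big\{ -\frac{C_1^2 j^{-2\alpha} n L}{8\sigma^2} \Big\}.
\]
For $j \ge M_n + 1$, $j^{-2\alpha} n \le (M_n + 1)^{-2\alpha} n \le 1$, so that whenever $j \ge M_n + 1$,
\[
E_{\theta_{-j}^{M_n},\theta_j=0}[\hat\theta_j^2] + E_{\theta_{-j}^{M_n},\theta_j=1}[(\hat\theta_j - 1)^2]
\ge \frac{1}{4} \exp\Big\{ -\frac{C_1^2 L}{8\sigma^2} \Big\}.
\]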

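Since $M_n + 1 \le j \le 2M_n$ and $\theta_{-j}^{M_n} \in \{0,1\}^{M_n-1}$ are arbitrary, combining this inequality with (1), we have
\[
\inf_{\hat\theta^{M_n}} \sup_{\theta^{M_n} \in \{0,1\}^{M_n}} E_{\theta^{M_n}}[\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2]
\ge \frac{C_1^2}{8} \exp\Big\{ -\frac{C_1^2 L}{8\sigma^2} \Big\} \sum_{j=M_n+1}^{2M_n} j^{-2\alpha}. \qquad (3)
\]
Since $\sum_{j=M_n+1}^{2M_n} j^{-2\alpha} \sim M_n^{-2\alpha+1} \sim n^{-(2\alpha-1)/(2\alpha)}$, we obtain the desired conclusion. □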
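The last step of the proof rests on two elementary facts about $M_n$: that $j^{-2\alpha} n \le 1$ for $j \ge M_n + 1$, and that the block sum $\sum_{j=M_n+1}^{2M_n} j^{-2\alpha}$ is of the order $n^{-(2\alpha-1)/(2\alpha)}$. The sketch below illustrates both for a few sample sizes. It is only a numerical illustration: the function names and the chosen values of $n$ and $\alpha$ are assumptions, and floating-point rounding in $n^{1/(2\alpha)}$ is ignored.

```python
import math


def block_size(n, alpha):
    """M_n: integer part of n^{1/(2 alpha)}."""
    return int(math.floor(n ** (1.0 / (2.0 * alpha))))


def block_sum(n, alpha):
    """sum_{j = M_n + 1}^{2 M_n} j^{-2 alpha}."""
    m = block_size(n, alpha)
    return sum(j ** (-2.0 * alpha) for j in range(m + 1, 2 * m + 1))


def rate(n, alpha):
    """The minimax rate n^{-(2 alpha - 1) / (2 alpha)}."""
    return n ** (-(2.0 * alpha - 1.0) / (2.0 * alpha))


def ratios(alpha, sample_sizes=(10**2, 10**3, 10**4, 10**5, 10**6)):
    """block_sum / rate for each n; these should stay bounded away from 0 and infinity."""
    return [block_sum(n, alpha) / rate(n, alpha) for n in sample_sizes]


for alpha in (1.0, 2.0):
    print(alpha, [round(r, 3) for r in ratios(alpha)])
```

For $\alpha = 1$ the ratio settles near $1/2$, in line with the integral comparison $\sum_{j=M+1}^{2M} j^{-2} \approx 1/M - 1/(2M)$.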

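Proof of Theorem 2. It is not difficult to see that
\[
\inf_{\hat f} \sup_{f \in \mathcal{F}(\alpha, C_1)} P_f\big( \|\hat f - f\|^2 > c n^{-(2\alpha-1)/(2\alpha)} \big)
\ge \inf_{\hat\theta^{M_n}} \sup_{\theta^{M_n} \in \{0,1\}^{M_n}} P_{\theta^{M_n}}\big( \|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2 > c n^{-(2\alpha-1)/(2\alpha)} \big),
\]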

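where $\inf_{\hat\theta^{M_n}}$ is taken over all estimators $\hat\theta^{M_n} \in [0,1]^{M_n}$ of $\theta^{M_n}$. Denote by $\delta_{1n}$ the right side of (3). Fix an arbitrary estimator $\hat\theta^{M_n} \in [0,1]^{M_n}$ of $\theta^{M_n}$. Since $\{0,1\}^{M_n}$ is a finite set, the supremum over $\{0,1\}^{M_n}$ is attained, so there exists a sequence $\theta^{M_n}$ such that $E_{\theta^{M_n}}[\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2] \ge \delta_{1n}$. Moreover,
\[
\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2 \le C_1^2 \sum_{j=M_n+1}^{2M_n} j^{-2\alpha} (\hat\theta_j - \theta_j)^2 \le C_1^2 \sum_{j=M_n+1}^{2M_n} j^{-2\alpha} =: \delta_{2n},
\]
so that $E[\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^4] \le \delta_{2n}^2$.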

We recall the Paley-Zygmund inequality.

Lemma 3 (Paley-Zygmund inequality). Let $Z$ be a nonnegative real-valued random variable with finite second moment. Then for every $\lambda \in (0, 1)$,
\[
P(Z \ge \lambda E[Z]) \ge (1 - \lambda)^2 \frac{(E[Z])^2}{E[Z^2]}.
\]
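As a quick illustration of Lemma 3, the sketch below evaluates both sides of the Paley-Zygmund inequality exactly for a nonnegative random variable with finitely many values. The three-point distribution is an arbitrary example chosen here, not something taken from the note, and `paley_zygmund_sides` is a hypothetical helper name.

```python
from fractions import Fraction


def paley_zygmund_sides(values, probs, lam):
    """Return (P(Z >= lam * E[Z]), (1 - lam)^2 * E[Z]^2 / E[Z^2]) for a finite distribution."""
    mean = sum(p * v for v, p in zip(values, probs))
    second_moment = sum(p * v * v for v, p in zip(values, probs))
    lhs = sum(p for v, p in zip(values, probs) if v >= lam * mean)
    rhs = (1 - lam) ** 2 * mean ** 2 / second_moment
    return lhs, rhs


# Hypothetical example: Z takes the values 0, 1, 4 with probabilities 1/2, 1/4, 1/4.
values = [Fraction(0), Fraction(1), Fraction(4)]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
lhs, rhs = paley_zygmund_sides(values, probs, Fraction(1, 2))
print(lhs, rhs, lhs >= rhs)
```

Exact rational arithmetic via `fractions.Fraction` avoids any rounding question when comparing the two sides.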

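Apply the Paley-Zygmund inequality with $\lambda = 1/2$ and $Z = \|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2$. Then
\[
\begin{aligned}
P_{\theta^{M_n}}\big( \|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2 \ge \delta_{1n}/2 \big)
&\ge P_{\theta^{M_n}}\Big( \|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2 \ge \frac{1}{2} E_{\theta^{M_n}}[\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2] \Big) \\
&\ge \frac{1}{4} \frac{(E[\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2])^2}{E[\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^4]}
\ge \frac{\delta_{1n}^2}{4\delta_{2n}^2}
\ge \frac{1}{256} \exp\Big\{ -\frac{C_1^2 L}{4\sigma^2} \Big\}.
\end{aligned}
\]
Therefore, we conclude that
\[
\liminf_{n \to \infty} \inf_{\hat f} \sup_{f \in \mathcal{F}(\alpha, C_1)} P_f\big( \|\hat f - f\|^2 > \delta_{1n}/2 \big) \ge \frac{1}{256} \exp\Big\{ -\frac{C_1^2 L}{4\sigma^2} \Big\}.
\]
Since $\delta_{1n} \sim n^{-(2\alpha-1)/(2\alpha)}$, we obtain the desired conclusion.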



References

[1] Hall, P. and Horowitz, J.L. (2005). Nonparametric methods for inference in the presence of instrumental variables. Ann. Statist. 33 2904-2929.

[2] Hall, P. and Horowitz, J.L. (2007). Methodology and convergence rates for functional linear regression. Ann. Statist. 35 70-91.

[3] Tsybakov, A.B. (2009). Introduction to Nonparametric Estimation. Springer.

[4] Yuan, M. and Cai, T. (2010). A reproducing kernel Hilbert space approach to functional linear regression. Ann. Statist. 38 3412-3444.
