Lecture notes on strong approximation

M. Lifshits∗

St.-Petersburg State University, Faculty of Mathematics and Mechanics, 198904, Stary Peterhof, Bibliotechnaya pl. 2, Russia, and Statistique et Probabilités, F.R.E. CNRS 2222, Bât. M2, Université des Sciences et Technologies de Lille, F-59655 Villeneuve d'Ascq Cedex, France
Abstract. We give an account of strong approximation theorems for sums of independent random variables S_n = Σ_{j=1}^n X_j. This type of theorem provides a joint construction of S_n and of approximating Gaussian variables with the same covariance, while the approximation rate depends on the moment properties of the X_j. In view of the rich field of applications, our goal was to provide the interested reader with a suitable "approximation toolbox" rather than to fight in the depths of proofs and constructions. Starting from the classical results of Komlós, Major, and Tusnády, we discuss less known results of Einmahl, Sakhanenko and Zaitsev concerning vector-valued and non-identically distributed variables.
Résumé. We present strong approximation theorems for sums of independent random variables S_n = Σ_{j=1}^n X_j. These theorems provide a joint construction, on the same probability space, of S_n and of approximating Gaussian variables with the same covariance, the rate of approximation depending on the moment properties of the X_j. Given the breadth of the field of applications, our goal has been to supply the interested reader with a working toolbox rather than to battle with the ultimate details of proofs and constructions. Starting from the classical results of Komlós, Major and Tusnády, we discuss the less known results of Einmahl, Sakhanenko and Zaitsev concerning vector-valued and non-identically distributed random variables.

∗ PUB. IRMA, LILLE 2000, Vol. 53, N 13. Research supported by Russian Foundation for Basic Research, grant N. 00-15-96019, and Universities of Russia, grant N. 99-26-11.
1. Introduction

The central limit theorem states that the distribution of sums of independent or weakly dependent random variables is close to the normal distribution. The functional limit theorem goes further and states that the distribution of the process of sequential sums, considered in an appropriate functional space, is close to the distribution of the Wiener process. In both cases only the similarity of the distributions is stated, not that of the stochastic objects themselves. The same effect of normalization of the sums may be expressed differently, by a construction of two mutually close objects on the same probability space. Let, for example, X = {X_1, ..., X_j, ...} be a sequence of independent variables and Y = {Y_1, ..., Y_j, ...} the corresponding sequence of Gaussian variables (which means that the expectation and the variance of Y_j are the same as those of X_j). We try to construct on a common probability space sequences X̃ = {X̃_1, ..., X̃_j, ...} and Ỹ = {Ỹ_1, ..., Ỹ_j, ...}, equidistributed with X and Y respectively, so that the random variable

∆_n(X̃, Ỹ) = max_{1≤k≤n} | Σ_{j=1}^k X̃_j − Σ_{j=1}^k Ỹ_j |    (1.1)
would be as small as possible. We can understand this goal in two ways:

a) in terms of inequalities, trying to show that ∆_n(X̃, Ỹ) takes big values only with small probability;

b) in terms of almost sure convergence, showing that as n → ∞ the sequence ∆_n(X̃, Ỹ) increases not faster than a known deterministic function defined by the moment characteristics of the sequence X.

The statements of type (a) are fundamental, while the statements of type (b) can be easily derived from them and are widely used in different domains of probability and statistics. Since the statements of type (b) describe the mutual approximation of two sequences with probability one, by analogy with the strong law of large numbers the whole research domain got the name of strong approximation theory.
The proofs of the statements (a) and (b) are usually very difficult, but a remarkably wide class of useful corollaries follows from them quite straightforwardly. Therefore, the instruments of strong approximation should be at hand for everybody concerned with theoretical research related to sums of independent variables. The first precise method of strong approximation (the so-called method of embedding into the Wiener process) was proposed by A.V. Skorokhod. For some time his estimates were considered optimal. It was eventually discovered that one can obtain much better results for sums of independent variables (while for martingale sequences the Skorokhod estimates really are optimal). The optimal estimate for the approximation of sums of independent identically distributed random variables was discovered by the Hungarian mathematicians J. Komlós, P. Major and G. Tusnády (1975), who suggested a method of dyadic approximation (more often called the KMT construction, after the authors' names, or the Hungarian construction). We present their main result below. Further progress is primarily related to the works of A.I. Sakhanenko (sums of independent non-identically distributed variables), U. Einmahl and A.Yu. Zaitsev (sums of independent vectors). The contents of these lecture notes partially represent, in simplified form, what A.Yu. Zaitsev explained to me over many years. The introduction of his unfinished monograph (1995) was also used here. Without this friendly assistance I could hardly have prepared these lectures.
2. The theorem of Komlós, Major and Tusnády

The next theorem presents an optimal estimate in the strong approximation of sums of independent identically distributed random variables.
Theorem 2.1. (KMT inequality) Let X = {X_1, ..., X_j, ...} be a sequence of independent identically distributed random variables having a finite exponential moment, i.e. for some z > 0 it holds E exp{z|X_j|} < ∞. Then there exist positive constants C_1, C_2 depending on the common distribution F of the variables X_j such that for every n one can construct a sequence X̃ = {X̃_1, ..., X̃_n} equidistributed with {X_1, ..., X_n} and a sequence of independent Gaussian variables {Ỹ_1, ..., Ỹ_n} having the same expectations and variances, on a common probability space, so that the difference ∆_n(X̃, Ỹ) defined in (1.1) satisfies

E exp{ C_1 ∆_n(X̃, Ỹ) } ≤ 1 + C_2 n^{1/2}.    (2.1)

Corollary 2.2. Under the assumptions of Theorem 2.1, for every x > 0 we have

P{ ∆_n(X̃, Ỹ) ≥ x } ≤ exp{−C_1 x} (1 + C_2 n^{1/2}).    (2.2)

This inequality immediately follows from (2.1) by an application of the exponential Chebyshev inequality. The KMT inequality is more often cited in this form. However, the statement (2.1), proposed by Sakhanenko, looks more natural, since it does not contain the unnecessary parameter x.
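The dyadic KMT construction is built from quantile (inverse-CDF) couplings: a single uniform variable drives both a partial sum and its Gaussian counterpart. The sketch below couples a Binomial(n, 1/2) sum with a normal variable of the same mean and variance; it illustrates only the basic coupling idea, not the full KMT scheme, and the helper binomial_quantile as well as all numeric choices (n, sample size, seed) are ours.

```python
import math
import random
from statistics import NormalDist

def binomial_quantile(n, p, u):
    """Smallest k with P(Bin(n, p) <= k) >= u, via the pmf recurrence."""
    pmf = (1.0 - p) ** n
    cdf = pmf
    k = 0
    while cdf < u and k < n:
        pmf *= (n - k) / (k + 1) * p / (1.0 - p)
        k += 1
        cdf += pmf
    return k

n = 200
gauss = NormalDist(mu=n / 2, sigma=math.sqrt(n / 4))  # same mean and variance
rng = random.Random(0)

# One uniform variable drives both quantile transforms: this is the coupling.
diffs = []
for _ in range(2000):
    u = rng.randrange(1, 2 ** 53) / 2 ** 53   # uniform, strictly inside (0, 1)
    s = binomial_quantile(n, 0.5, u)
    y = gauss.inv_cdf(u)
    diffs.append(abs(s - y))

print(max(diffs))  # of order 1 here, far below the CLT scale sqrt(n) ~ 14
```

The coupled pair stays within a few units of each other, which is the one-step phenomenon that the dyadic construction propagates to the whole partial-sum path.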
We will discuss the dependence of the parameters C_1, C_2 on the distribution F later (see Section 3), while investigating strong approximation of sums of non-identically distributed variables. Note that the construction provided by Theorem 2.1 concerns a finite number of variables. However, one can easily derive from this result information about strong approximation of an infinite sequence.
Corollary 2.3. Under the assumptions of Theorem 2.1 one can construct on a common probability space an infinite sequence X̃ = {X̃_1, ..., X̃_j, ...} equidistributed with X and a sequence of independent Gaussian variables {Ỹ_1, ..., Ỹ_j, ...} with the same expectations and variances such that with probability one

Σ_{j=1}^n X̃_j − Σ_{j=1}^n Ỹ_j = O(log n).    (2.3)
Proof of the Corollary. For m = 1, 2, ... let n_m = 2^{2^m} and N_m = Σ_{k≤m} n_k, and also N_0 = 0. According to Corollary 2.2, for every m we construct on an appropriate probability space Ω_m the sequences X̃^{(m)} = {X̃_1^{(m)}, ..., X̃_{n_m}^{(m)}} and {Ỹ_1^{(m)}, ..., Ỹ_{n_m}^{(m)}} satisfying the estimate (2.2), i.e.

P{ ∆_{n_m}(X̃^{(m)}, Ỹ^{(m)}) ≥ x } ≤ exp{−C_1 x} (1 + C_2 n_m^{1/2}).

In particular, the probability series

Σ_m P{ ∆_{n_m}(X̃^{(m)}, Ỹ^{(m)}) ≥ A log n_m } ≤ Σ_m exp{−C_1 A log n_m} (1 + C_2 n_m^{1/2})    (2.4)

converges if C_1 A > 1/2. Now we transfer all constructions onto the common probability space Ω = Π_m Ω_m, defining, for ω = (ω_1, ω_2, ...), the variables X̃_j, Ỹ_j with N_{m−1} < j ≤ N_m as follows:

X̃_j(ω) = X̃^{(m)}_{j−N_{m−1}},  Ỹ_j(ω) = Ỹ^{(m)}_{j−N_{m−1}}.

Note that for every m it holds

∆_{N_m}(X̃, Ỹ) ≤ Σ_{k=1}^m ∆_{n_k}(X̃^{(k)}, Ỹ^{(k)}).

Taking into account the estimate (2.4) and the Borel–Cantelli lemma, we have almost surely

∆_{N_m}(X̃, Ỹ) = O( Σ_{k=1}^m A log n_k ) = O(2^m) = O(log N_m).
Since the sequence ∆_n(X̃, Ỹ) is non-decreasing and for every n ∈ (N_{m−1}, N_m] it holds 2 log n ≥ log N_m, we can pass from the subsequence {∆_{N_m}} to the entire sequence {∆_n} and obtain the required estimate (2.3).

It is not difficult to show that the estimate (2.3) is optimal. In fact, let us consider a sequence of independent random variables with symmetric exponential distribution, i.e. P{|X_j| ≥ r} = exp{−r}. Then, for a > 0, the series

Σ_j P{ |X_j| ≥ a log j } = Σ_j j^{−a}

converges iff a > 1. By the Borel–Cantelli lemma we have

lim sup_{j→∞} |X_j| / log j = 1

almost surely. On the other hand, it is well known that for Gaussian variables Ỹ_j = O(√(log j)) = o(log j). Therefore, for any construction of X̃, Ỹ we have

lim sup_{n→∞} ∆_n(X̃, Ỹ) / log n ≥ 1/2

with probability one, and the optimality of (2.3) follows. In fact, the optimality of (2.3) holds in a much stronger and even surprising sense: if one can replace O(log n) in (2.3) by o(log n), then the variables X_j are themselves normal, as shown by Bártfai (1966). The strength of the KMT estimate (2.3) becomes even more obvious if we recall that the preceding technique of Skorokhod yielded a similar asymptotics only of the order O(n^{1/4}).
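The tail calibration behind the Borel–Cantelli computation above is easy to probe numerically: for the symmetric exponential distribution, the largest of the first n variables sits near log n. A Monte Carlo sketch (an illustration, not a proof; the seed and sample size are arbitrary choices):

```python
import math
import random

rng = random.Random(1)
n = 200_000

# Symmetric exponential distribution: P(|X_j| >= r) = exp(-r), so |X_j| ~ Exp(1).
running_max = 0.0
for _ in range(n):
    running_max = max(running_max, rng.expovariate(1.0))

ratio = running_max / math.log(n)
print(ratio)  # typically close to 1, matching lim sup |X_j| / log j = 1
```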
It might seem at first glance that the finiteness of the exponential moment assumed in Theorem 2.1 severely restricts the applications of this result. In fact, truncation manipulations enable one to deduce from Theorem 2.1 similar approximation estimates for variables obeying much less restrictive moment assumptions.
Theorem 2.4. Let X = {X_1, ..., X_j, ...} be a sequence of independent identically distributed variables with finite absolute moment of order p > 2, i.e. E|X_j|^p < ∞. Then we can construct a sequence X̃ = {X̃_1, ..., X̃_j, ...} equidistributed with X and a sequence of independent Gaussian variables {Ỹ_1, ..., Ỹ_j, ...} with the same expectations and variances on a common probability space so that, with probability one,

Σ_{j=1}^n X̃_j − Σ_{j=1}^n Ỹ_j = o(n^{1/p}).    (2.5)
We deduce this theorem from more general results in the next section. Of special interest is Major's (1979) result, which considers the case p = 2 (not covered by Theorem 2.4), the most interesting one from the point of view of the CLT's conditions.
Theorem 2.5. Let X = {X_1, ..., X_j, ...} be a sequence of independent identically distributed random variables with zero means and unit variances. Then we can construct a sequence X̃ = {X̃_1, ..., X̃_j, ...} equidistributed with X and a sequence {Ỹ_1, ..., Ỹ_j, ...} of independent Gaussian variables with zero means and variances

E Ỹ_j² = Var(X_j 1_{|X_j|≤j}) → 1,

on a common probability space so that

Σ_{j=1}^n X̃_j − Σ_{j=1}^n Ỹ_j = o(n^{1/2}).    (2.6)

Dividing (2.6) by n^{1/2} and considering the distributions of the normalized sums, we obtain Lévy's CLT as a trivial corollary. It is interesting to notice that without changing the variances one can only obtain

Σ_{j=1}^n X̃_j − Σ_{j=1}^n Ỹ_j = o((n log log n)^{1/2})

(Strassen (1964)), and this estimate is optimal.
Remark. There is another KMT result similar to Theorem 2.1. It is mostly important in mathematical statistics and concerns the strong approximation of the empirical distribution function.
3. Strong approximation of non-identically distributed variables

In this section we consider analogues of the KMT results on the strong approximation of sums of independent random variables having different distributions. It is intuitively clear that one should assume a certain uniform boundedness of these distributions or, at least, their uniform closeness to the class of Gaussian variables. One can understand this uniform boundedness very differently: in the language of usual moments (Bernstein condition), in the language of exponential moments (Sakhanenko), or in the language of characteristic functions (Zaitsev). Our exposition concerns all the involved parameters and their relations.
Cramér condition. A random variable X satisfies Cramér's condition if it has a finite exponential moment, i.e. for some h > 0 it holds

E exp{h|X|} < ∞.

Respectively, we introduce Cramér's parameter

h(X) = sup{ h : E exp{h|X|} < ∞ }

as a characteristic of concentration of the distribution of the r.v. X. In what follows we will see that the most interesting strong approximation results hold under Cramér's condition, but Cramér's parameter is an inappropriate tool for quantitative estimates.
Sakhanenko parameter. Let X be a random variable with zero mean satisfying Cramér's condition. Following Sakhanenko (1984), define the Sakhanenko parameter

λ(X) = sup{ λ : λ E|X|³ exp{λ|X|} ≤ E X² }.

The expression E|X|³ exp{λ|X|} is finite for λ < h(X), so that 0 < λ(X) < ∞ (with the exception of the degenerate case X = 0, when λ(X) = ∞). Obviously, λ(X) depends only on the distribution of X. Note also that λ(·) is a homogeneous functional of degree −1, i.e. λ(cX) = |c|^{−1} λ(X).

If the variable X is bounded, it is easy to estimate λ(X). Indeed, let |X| ≤ a. For all x ∈ (0, a) write the inequality

λ x³ e^{λx} = λ x e^{λx} x² ≤ λ a e^{λa} x².

Letting λ = 1/(2a), we obtain

λ E|X|³ e^{λ|X|} ≤ (√e / 2) E X² < E X².

Therefore λ(X) ≥ 1/(2a) or, equivalently, λ(X)^{−1} ≤ 2a.

We can also connect λ(X) with the variance of X. Indeed, write

(E X²)^{3/2} ≤ E|X|³ ≤ E|X|³ e^{λ|X|} ≤ λ^{−1} E X².

Hence

Var X = E X² ≤ λ(X)^{−2}.    (3.1)
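For a concrete bounded variable the estimate λ(X) ≥ 1/(2a) can be checked directly. A sketch for X uniform on [−1, 1] (so a = 1), evaluating both sides of the defining inequality λ E|X|³ e^{λ|X|} ≤ E X² by a midpoint rule (the quadrature resolution is an arbitrary choice of ours):

```python
import math

# X uniform on [-1, 1]: bounded by a = 1, so the text gives lambda(X) >= 1/(2a) = 0.5.
a = 1.0
lam = 1.0 / (2.0 * a)

N = 100_000
h = 2.0 / N
lhs = 0.0   # lambda * E|X|^3 * exp(lambda |X|)
ex2 = 0.0   # E X^2
for i in range(N):
    x = -1.0 + (i + 0.5) * h   # midpoint of the i-th cell of [-1, 1]
    w = 0.5 * h                # uniform density 1/2 times cell width
    lhs += lam * abs(x) ** 3 * math.exp(lam * abs(x)) * w
    ex2 += x * x * w

print(lhs, ex2)   # the defining inequality holds with room to spare
```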
Now we connect the Sakhanenko parameter with another, more classical object, the Bernstein parameter b(X), which is also defined for random variables with zero mean but is expressed in terms of classical polynomial moments of X:

b(X) = inf{ τ : |E X^m| ≤ (m!/2) τ^{m−2} Var(X), m = 3, 4, ... }.

Note that if |X| ≤ a, then b(X) ≤ a. The Bernstein parameter directly controls the even moments of |X|. Let us show that we can also control the odd absolute moments. Denote τ = b(X). For every odd number m ≥ 3 we have, by the Hölder inequality,

E|X|^m = E( |X|^{(m−1)/2} |X|^{(m+1)/2} ) ≤ [E X^{m−1}]^{1/2} [E X^{m+1}]^{1/2}
≤ [ ((m−1)!/2) τ^{m−3} Var(X) · ((m+1)!/2) τ^{m−1} Var(X) ]^{1/2}
= (m!/2) τ^{m−2} Var(X) [(m+1)/m]^{1/2}
≤ (m!/√3) τ^{m−2} Var(X).

Therefore, for every real u,

E|X|³ e^{|uX|} = Σ_{m=0}^∞ E|X|^{m+3} |u|^m / m!
≤ Σ_{m=0}^∞ ((m+3)! / (√3 m!)) Var(X) τ^{m+1} |u|^m
= (Var(X) τ / √3) Σ_{m=0}^∞ (m+3)(m+2)(m+1) (τ|u|)^m
= (Var(X) x / (√3 |u|)) [ 6(1−x)^{−1} + 18x(1−x)^{−2} + 18x²(1−x)^{−3} + 6x³(1−x)^{−4} ],

where x = τ|u|. Letting u = (7τ)^{−1}, x = 1/7, we obtain

E|X|³ e^{|uX|} ≤ 0.92 |u|^{−1} Var(X),

i.e. λ(X) ≥ (7τ)^{−1} or, equivalently, λ(X)^{−1} ≤ 7 b(X).

The inverse inequality is almost obvious. Let λ = λ(X). Then for every m ≥ 3 one has

E|X|³ (λ|X|)^{m−3} / (m−3)! ≤ E|X|³ e^{λ|X|} ≤ λ^{−1} Var(X),

which implies

E|X|^m ≤ (m−3)! λ^{2−m} Var(X) ≤ (m!/2) λ^{2−m} Var(X),

and hence b(X) ≤ λ(X)^{−1}.
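The two-sided relation b(X) ≤ λ(X)^{−1} ≤ 7 b(X) can be illustrated on a Rademacher variable X (P{X = ±1} = 1/2), for which both parameters are computable: λ(X) solves λ e^λ = 1, and b(X) comes from scanning the even-moment constraints. A sketch (the bisection depth and moment cutoff are arbitrary choices of ours):

```python
import math

# Rademacher variable X: P{X = +1} = P{X = -1} = 1/2, so EX = 0 and Var X = 1.

# Sakhanenko parameter: E|X|^3 e^{l|X|} = e^l and EX^2 = 1, hence lambda(X)
# is the root of l * e^l = 1, found by bisection below.
lo, hi = 0.0, 1.0
for _ in range(80):
    mid = 0.5 * (lo + hi)
    if mid * math.exp(mid) < 1.0:
        lo = mid
    else:
        hi = mid
lam = 0.5 * (lo + hi)   # about 0.567

# Bernstein parameter: |EX^m| = 1 for even m (0 for odd m), so the condition
# 1 <= (m!/2) tau^(m-2) forces tau >= (2/m!)^(1/(m-2)); the binding case is m = 4.
b = max((2.0 / math.factorial(m)) ** (1.0 / (m - 2)) for m in range(4, 40, 2))

print(lam, b)
print(b <= 1.0 / lam <= 7.0 * b)   # the sandwich b(X) <= lambda(X)^(-1) <= 7 b(X)
```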
Let us now briefly discuss the relations between the Bernstein (or the equivalent, as we have seen, Sakhanenko) parameter and the Cramér parameter. We have an easy inequality

b(X) ≥ h(X)^{−1}.

In fact, if h b(X) < 1, then

E e^{h|X|} ≤ Σ_{m≥0} h^m E|X|^m / m! ≤ 1 + h E|X| + h² E X²/2 + Σ_{m≥3} h^m b(X)^{m−2} Var X < ∞.

In the opposite direction one can show only that

b(X) ≤ 2 inf_{h>0} E e^{h|X|} / (h³ Var X),

which means that the Bernstein and Sakhanenko parameters are finite iff the Cramér parameter is. But the following example shows that there is no upper estimate of b(X) via h(X). Let R ≥ 0 and define the distribution of a random variable X_R by its density p(x) = (1/2) e^{R−|x|} 1_{[R,∞)}(|x|). Then h(X_R) = 1, but

b(X_R) ≥ ( E X_R⁴ / (12 E X_R²) )^{1/2} ∼ R/√12 → ∞   (R → ∞).
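The growth of b(X_R) in this example can be tabulated. The lower bound comes from the case m = 4 of the Bernstein condition, and the even moments of X_R have the closed form E X_R^k = Σ_i C(k,i) R^{k−i} i! (our computation, obtained from ∫₀^∞ (t+R)^k e^{−t} dt by the binomial expansion):

```python
import math

def even_moment(R, k):
    """E X_R^k for even k: integral of (t + R)^k e^{-t} dt over t >= 0."""
    return sum(math.comb(k, i) * R ** (k - i) * math.factorial(i)
               for i in range(k + 1))

lowers = []
for R in (0.0, 5.0, 50.0):
    ex2 = even_moment(R, 2)   # = R^2 + 2R + 2
    ex4 = even_moment(R, 4)
    lowers.append(math.sqrt(ex4 / (12.0 * ex2)))
    print(R, lowers[-1])      # grows roughly like R / sqrt(12), while h(X_R) = 1
```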
The following theorem presents the optimal bound for strong approximation of sums of independent variables in terms of the Sakhanenko parameter.

Theorem 3.1. (Sakhanenko's exponential inequality) Let X = {X_j} be a sequence of independent random variables with finite h(X_j). Then for every n one can construct a sequence X̃ = {X̃_1, ..., X̃_n} equidistributed with {X_1, ..., X_n} and a sequence of independent Gaussian variables {Ỹ_1, ..., Ỹ_n} with the same expectations and variances on a common probability space so that for the difference ∆_n(X̃, Ỹ) defined in (1.1) one has

E exp{ C_3 λ ∆_n(X̃, Ỹ) } ≤ 1 + λ B_n,    (3.2)

where C_3 is a numeric constant, λ = inf_{j≤n} λ(X_j − E X_j), and B_n² = Σ_{j=1}^n Var X_j.

Corollary 3.2. The KMT inequality (2.1) obviously follows from (3.2). Moreover, one can take therein C_1 = C_3 λ(X_1) and C_2 = λ(X_1) (Var X_1)^{1/2}.
The explicit dependence of all parameters in (3.2) on the distribution enables its application to truncated random variables when the latter do not have finite exponential moments. As a result, one can obtain some versions of (3.2), though obviously more modest ones. For example, for variables with finite moments of order p > 2 the following analogue of Theorem 3.1 holds.
Theorem 3.3. (Sakhanenko's polynomial inequality) Let X = {X_j} be a sequence of independent random variables with finite moments of order p > 2. Then for every n one can construct a sequence X̃ = {X̃_1, ..., X̃_n} equidistributed with {X_1, ..., X_n} and a sequence of independent Gaussian variables {Ỹ_1, ..., Ỹ_n} with the same expectations and variances on a common probability space so that for the difference ∆_n(X̃, Ỹ) defined in (1.1) one has

E ∆_n(X̃, Ỹ)^p ≤ C(p) Σ_{j=1}^n E|X_j − E X_j|^p,    (3.3)

where the constant C(p) depends only on p.

Proposition 3.4. There exists a constant C′(p) such that the construction from the preceding theorem is possible so that (3.3) holds for all n simultaneously, with C(p) replaced by C′(p).
Proof. Assume first that there is a partition of the set of all positive integers into blocks {N_m} such that

Σ_{j∈N_m} E|X_j − E X_j|^p ≤ T_m.

Then, by Sakhanenko's Theorem 3.3, we can construct for each block approximating Gaussian sequences {Ỹ_j, j ∈ N_m} so that the corresponding distances ∆_m satisfy

E ∆_m^p ≤ C(p) T_m.

Without loss of generality we may assume that all constructions are realized on a common probability space. Then we have the inequalities

E ∆_m ≤ C(p)^{1/p} T_m^{1/p},  E ∆_m² ≤ C(p)^{2/p} T_m^{2/p}.

Furthermore, letting ∆̃_m = ∆_m − E ∆_m, we have the similar relations

E|∆̃_m|^p ≤ E ∆_m^p + (E ∆_m)^p ≤ 2 E ∆_m^p ≤ 2 C(p) T_m,
E ∆̃_m² ≤ E ∆_m² ≤ C(p)^{2/p} T_m^{2/p}.

Let S = Σ_m ∆_m and S̃ = S − E S. By the Rosenthal inequality (see Petrov, Theorem 3.5.19) we have

E|S̃|^p ≤ c_R(p) [ Σ_m E|∆̃_m|^p + ( Σ_m E ∆̃_m² )^{p/2} ] ≤ c_R(p) C(p) [ 2 Σ_m T_m + ( Σ_m T_m^{2/p} )^{p/2} ],

with some constant c_R(p) depending on p only. We also have

E S = Σ_m E ∆_m ≤ C(p)^{1/p} Σ_m T_m^{1/p}.

Therefore,

E S^p = E(S̃ + E S)^p ≤ 2^p [ E|S̃|^p + (E S)^p ]
≤ 2^p C(p) [ 2 c_R(p) Σ_m T_m + c_R(p) ( Σ_m T_m^{2/p} )^{p/2} + ( Σ_m T_m^{1/p} )^p ].    (3.4)
Now we prove Proposition 3.4. Without loss of generality we may let E X_j = 0. Consider two cases.

a) Let Σ_j E|X_j|^p = ∞. In this case we construct the blocks {N_m, −∞ < m < ∞} by the formula

N_m = { n : 2^{m−1} < Σ_{j≤n} E|X_j|^p ≤ 2^m }.

Accordingly, one may let T_m = 2^m. For every M and all n ∈ N_M it holds

∆_n(X̃, Ỹ) ≤ Σ_{m≤M} ∆_m.

Substituting the values of T_m in (3.4), we have

E ∆_n(X̃, Ỹ)^p ≤ E ( Σ_{m≤M} ∆_m )^p ≤ 2^{p+1+M} C(p) [ 2 c_R(p) + c_R(p) (1 − 2^{−2/p})^{−p/2} + (1 − 2^{−1/p})^{−p} ],

while

Σ_{j≤n} E|X_j|^p > 2^{M−1},

and the relation (3.3) is established.
b) Let T = Σ_j E|X_j|^p < ∞. The simple construction from the previous case does not work, since one of the classes would contain an infinite number of indices n. Now we construct two types of blocks. As in the previous case, for m < 0 let

N_m = { n : 2^{m−1} T < Σ_{j≤n} E|X_j|^p ≤ 2^m T }

and T_m = 2^m T. For m > 0 let

N_m = { n : 2^{−m} T < Σ_{j≥n} E|X_j|^p ≤ 2^{1−m} T }

and T_m = 2^{−m} T. By this construction the only index which is not covered is

n_0 = inf{ n : Σ_{j≥n} E|X_j|^p > T/2 }.

It becomes the contents of the block N_0, and we let T_0 = T.

For indices n ∈ ∪_{m<0} N_m the estimates of case a) still work. For n ∈ ∪_{m≥0} N_m we have, by (3.4),

E ∆_n(X̃, Ỹ)^p ≤ E ( Σ_{m=−∞}^∞ ∆_m )^p ≤ C′(p) T,

while

Σ_{j≤n} E|X_j|^p > T/2,

and (3.3) is again established.
Next, we show how to use the estimate (3.3) to obtain rather arbitrary approximation rates. Here is the result for variables with known estimates of polynomial moments.
Theorem 3.5. (Q. Shao (1995)) Let X = {X_j} be a sequence of independent random variables with zero means, let H_j ↗ ∞ be a positive sequence, and assume that for some p > 2

Σ_{j=1}^∞ E|X_j|^p / H_j^p < ∞.    (3.5)

Then one can construct a sequence X̃ = {X̃_j} equidistributed with X and a Gaussian sequence Ỹ = {Ỹ_j} of independent variables satisfying E Ỹ_j = 0 and Var Ỹ_j = Var X_j on a common probability space so that for the difference ∆_n(X̃, Ỹ) defined in (1.1) it holds, almost surely,

∆_n(X̃, Ỹ) = o(H_n).
The next theorem shows how one can obtain KMT-type estimates in the case when the tails of the variables are uniformly bounded. In the case of identically distributed variables this theorem immediately implies Theorem 2.5 and, after some extra work, Theorem 2.4.
Theorem 3.6. (Q. Shao, U. Einmahl) Let Z be a positive random variable, let X = {X_j} be a sequence of independent variables, and let G : R₊ → R₊ be a function such that the following conditions hold:

a) for some α > 1 the function x ↦ G(x)/x^α is non-decreasing;
b) for some q > 0 the function x ↦ G(x)/x^q is non-increasing;
c) for some c > 0 and all r ≥ 0 it holds sup_j P{|X_j| ≥ r} ≤ c P{Z ≥ r};
d) E G(Z) < ∞.

Then one can construct a sequence X̃ = {X̃_j} equidistributed with X and a sequence of independent Gaussian variables Ỹ = {Ỹ_j} with expectations E X_j and variances

Var Ỹ_j = Var( X_j 1_{G(|X_j|)≤j} ),

on a common probability space so that almost surely

Σ_{j=1}^n X̃_j − Σ_{j=1}^n Ỹ_j = o(G^{−1}(n)).    (3.6)
Proof of Theorem 3.5. First, we take something slightly smaller than H_j: consider a sequence H′_j = o(H_j) such that still

Σ_{j=1}^∞ E|X_j|^p / (H′_j)^p < ∞.

Let X′_j = X_j / H′_j and construct on a common probability space X̃′ and a corresponding Gaussian sequence Ỹ′ so that for every n inequality (3.3) holds, i.e.

E ∆_n(X̃′, Ỹ′)^p ≤ C(p) Σ_{j=1}^n E|X′_j|^p = C(p) Σ_{j=1}^n E|X_j|^p / (H′_j)^p.    (3.7)

Since the right-hand side of (3.7) is uniformly bounded over n, we have

E ∆_∞(X̃′, Ỹ′)^p ≤ C(p) Σ_{j=1}^∞ E|X_j|^p / (H′_j)^p < ∞,

where ∆_∞(X̃′, Ỹ′) = sup_n ∆_n(X̃′, Ỹ′). Therefore the sequence

S′_n = Σ_{j=1}^n (X̃′_j − Ỹ′_j),

which satisfies |S′_n| ≤ ∆_n(X̃′, Ỹ′), is a.s. bounded.

On the same probability space we bring the variables back to the right scale by letting X̃_j = H′_j X̃′_j, Ỹ_j = H′_j Ỹ′_j. We obtain the following estimates for the differences:

Σ_{j=1}^k X̃_j − Σ_{j=1}^k Ỹ_j = Σ_{j=1}^k H′_j (X̃′_j − Ỹ′_j) = Σ_{j=1}^k H′_j (S′_j − S′_{j−1}) = H′_k S′_k − Σ_{j=1}^{k−1} S′_j (H′_{j+1} − H′_j).

Hence

| Σ_{j=1}^k X̃_j − Σ_{j=1}^k Ỹ_j | ≤ 2 H′_k sup_{1≤j<∞} |S′_j| = O(H′_k) = o(H_k).

Since the variables X̃_j and Ỹ_j have the prescribed distributions and the Ỹ_j are independent, the proof of Theorem 3.5 is complete.
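The summation-by-parts (Abel) identity that drives this proof is easy to sanity-check numerically on arbitrary data (a sketch with random inputs; the sizes and seed are arbitrary choices of ours):

```python
import random

# Check of the summation-by-parts identity used above:
#   sum_{j=1}^k H_j (S_j - S_{j-1}) = H_k S_k - sum_{j=1}^{k-1} S_j (H_{j+1} - H_j),
# valid whenever S_0 = 0.
rng = random.Random(2)
k = 50
H = sorted(rng.uniform(1.0, 10.0) for _ in range(k + 1))   # increasing, like H'_j
S = [0.0] + [rng.gauss(0.0, 1.0) for _ in range(k)]        # S_0 = 0

left = sum(H[j] * (S[j] - S[j - 1]) for j in range(1, k + 1))
right = H[k] * S[k] - sum(S[j] * (H[j + 1] - H[j]) for j in range(1, k))

print(abs(left - right))                           # zero up to rounding
print(abs(left) <= 2 * H[k] * max(map(abs, S)))    # the 2 H'_k sup|S'_j| bound
```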
Proof of Theorem 3.6. Let E X_j = 0. Define H_j = G^{−1}(j) and split X_j into three parts:

X_j = ( X_j 1_{|X_j|≤H_j} − E[X_j 1_{|X_j|≤H_j}] ) + X_j 1_{|X_j|>H_j} − E[X_j 1_{|X_j|>H_j}].    (3.8)
In fact, only the first term merits an approximation (using Theorem 3.5). We check that assumption (3.5) holds for the variables X_j 1_{|X_j|≤H_j} − E[X_j 1_{|X_j|≤H_j}]. Let us verify that for every p > q it holds

Σ_{j=1}^∞ E|X_j|^p 1_{|X_j|≤H_j} / H_j^p < ∞.    (3.9)
Using integration by parts and assumption (c), we obtain for every j and every u > 0

E|X_j|^p 1_{|X_j|≤u} = ∫₀ᵘ p v^{p−1} P{|X_j| ≥ v} dv − u^p P{|X_j| > u}
≤ c ∫₀ᵘ p v^{p−1} P{Z ≥ v} dv = c E Z^p 1_{Z≤u} + c u^p P{Z > u}.

Applying this estimate with u = H_j, we next need to estimate the two series

Σ_{j=1}^∞ E Z^p 1_{Z≤H_j} / H_j^p   and   Σ_{j=1}^∞ P{Z > H_j}.
For the second series, assumption (d) yields

Σ_{j=1}^∞ P{Z ≥ H_j} = E Σ_{j=1}^{[G(Z)]} 1 ≤ E G(Z) < ∞.    (3.10)
For the first series,

Σ_{j=1}^∞ E Z^p 1_{Z≤H_j} / H_j^p ≤ E ( Z^p Σ_{j=N(Z)}^∞ H_j^{−p} ),

where

N = N(Z) = [G(Z)] + 1 if G(Z) is not an integer, and N = G(Z) if G(Z) is an integer.

Furthermore, it follows from assumption (b) of our theorem that the function y ↦ G^{−1}(y) y^{−1/q} is non-decreasing. Hence, for every j ≥ N,

H_j / j^{1/q} = G^{−1}(j) / j^{1/q} ≥ G^{−1}(N) / N^{1/q},

i.e. H_j ≥ j^{1/q} G^{−1}(N) / N^{1/q}, and

Σ_{j=N}^∞ H_j^{−p} ≤ G^{−1}(N)^{−p} N^{p/q} Σ_{j=N}^∞ j^{−p/q} ≤ c(p, q) G^{−1}(N)^{−p} N ≤ c(p, q) Z^{−p} N.

(The last passage makes use of the inequality G^{−1}(N(Z)) ≥ Z.) Hence,

Σ_{j=1}^∞ E Z^p 1_{Z≤H_j} / H_j^p ≤ c(p, q) E N(Z) ≤ c(p, q) (E G(Z) + 1) < ∞.

Therefore, condition (3.9) is verified.
Since for every random variable V, by the Jensen inequality,

E|V − E V|^p ≤ 2^p E max{ |V|^p ; |E V|^p } ≤ 2^p ( E|V|^p + |E V|^p ) ≤ 2^{p+1} E|V|^p,

we observe that the first terms in (3.8) obey the assumption of Theorem 3.5; hence they admit a Gaussian approximation with the required error o(H_n) = o(G^{−1}(n)).

Now we prove that the second and the third terms in (3.8) are negligible. Indeed, for the second term we infer from (3.10) that

Σ_j P{|X_j| > H_j} ≤ c Σ_j P{Z ≥ H_j} < ∞.
Therefore, by the Borel–Cantelli lemma, the second term eventually vanishes.

For the estimate of the third term we just use the well-known Kronecker lemma. It states that for every sequence x_j and every positive sequence c_n ↗ ∞, the convergence Σ_j x_j / c_j < ∞ yields (1/c_n) Σ_{j≤n} x_j → 0. Therefore, for checking

(1/H_n) Σ_{j≤n} E[ X_j 1_{|X_j|>H_j} ] → 0,

which would kill the third term, it suffices to show that the sum

Σ_{j=1}^∞ E|X_j| 1_{|X_j|>H_j} / H_j

is finite. Since for every u > 0 and every j

E|X_j| 1_{|X_j|>u} = u P{|X_j| > u} + ∫ᵤ^∞ P{|X_j| > v} dv ≤ c u P{Z > u} + c ∫ᵤ^∞ P{Z > v} dv = c E Z 1_{Z>u},

our problem reduces to the study of the expectation

E ( Z Σ_{j≤m(Z)} H_j^{−1} ).
It follows from assumption (a) that the function x ↦ G^{−1}(x) x^{−1/α} is non-increasing. Hence, for m = m(Z) = [G(Z)] + 1 and every j ≤ m we have

H_j / j^{1/α} = G^{−1}(j) / j^{1/α} ≥ G^{−1}(m) / m^{1/α} = H_m / m^{1/α},

i.e. H_j ≥ H_m j^{1/α} / m^{1/α}, and, since α > 1,

Σ_{j≤m} H_j^{−1} ≤ m^{1/α} H_m^{−1} Σ_{j≤m} j^{−1/α} ≤ m^{1/α} H_m^{−1} c(α) m^{1−1/α} = c(α) m H_m^{−1}.

Using the inequality Z ≤ H_{m(Z)}, we obtain

E ( Z Σ_{j≤m(Z)} H_j^{−1} ) ≤ c(α) E ( Z m(Z) H_{m(Z)}^{−1} ) ≤ c(α) E m(Z) ≤ c(α) (E G(Z) + 1) < ∞.

This is sufficient for the application of the Kronecker lemma and for killing the third term in (3.8).
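Kronecker's lemma, used to dispose of the third term, can be seen in action on a simple deterministic example with x_j = (−1)^j and c_j = j (an illustration of the lemma itself, unrelated to any particular distribution):

```python
# Kronecker's lemma: if sum_j x_j / c_j converges and c_n increases to infinity,
# then (1/c_n) * sum_{j<=n} x_j -> 0.  Toy example: x_j = (-1)^j, c_j = j.
n = 100_000
partial = 0.0    # sum of x_j
weighted = 0.0   # sum of x_j / c_j
for j in range(1, n + 1):
    x = -1.0 if j % 2 else 1.0
    partial += x
    weighted += x / j

print(weighted)       # converges (to -log 2 = -0.6931...)
print(partial / n)    # the Kronecker average: essentially 0
```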
Proof of Theorem 2.4. Without loss of generality we may assume that our identically distributed variables X_j have zero expectations and unit variances. Applying Theorem 3.6 with G(x) = x^p, we construct Gaussian variables Ỹ_j providing the required error order o(n^{1/p}), although their variances are not unit, as required. To repair this, we just let

Y_j = (Var Ỹ_j)^{−1/2} Ỹ_j.

It is sufficient to check that the differences Z_j = Y_j − Ỹ_j satisfy the estimate

n^{−1/p} Σ_{j=1}^n Z_j → 0.

By Kolmogorov's law of large numbers, it suffices to check that the variance series converges, namely

Σ_{j=1}^∞ j^{−2/p} Var(Z_j) < ∞.    (3.11)

Let v_j = Var Ỹ_j and write down the identity

Z_j = (v_j^{−1/2} − 1) Ỹ_j = ( (1 − v_j) / ( v_j^{1/2} (v_j^{1/2} + 1) ) ) Ỹ_j.

By our construction from the proof of Theorem 3.6, with H_j = j^{1/p}, we have

1 − v_j = E X_j² − E X_j² 1_{|X_j|≤H_j} + (E X_j 1_{|X_j|≤H_j})²
= E X_j² 1_{|X_j|>H_j} + (E X_j 1_{|X_j|>H_j})² ≤ 2 E X_j² 1_{|X_j|>H_j} = 2σ_j² ↘ 0.

Moreover,

Σ_{j=1}^∞ σ_j² j^{−2/p} = E ( X_1² Σ_{j=1}^∞ 1_{|X_1|>j^{1/p}} j^{−2/p} ) ≤ c(p) E ( X_1² [|X_1|^p]^{1−2/p} ) ≤ c(p) E|X_1|^p < ∞.

Since {σ_j²} is a monotone sequence, we infer

σ_n² ≤ c(p) E|X_1|^p ( Σ_{j=1}^n j^{−2/p} )^{−1} ≤ c n^{(2−p)/p}.

Finally, using p > 2, we have

Σ_{j=1}^∞ j^{−2/p} Var(Z_j) ≤ Σ_{j=1}^∞ j^{−2/p} (1 + o(1)) ( c j^{(2−p)/p} )² ≤ c Σ_{j=1}^∞ j^{(2−2p)/p} < ∞,

and (3.11) is established.
Zaitsev parameter. Let X be a random variable. Consider its complex exponential moments

Λ(u) = E exp{uX},  u ∈ C,

and define the Zaitsev parameter¹ by

τ_z(X) = inf{ τ : |(log Λ)′′′(u)| ≤ τ Var(X), ∀u : |u| ≤ τ^{−1} }.

We mean here that the function Λ(·) is defined and differentiable in the interior of the circle {u : |u| ≤ τ^{−1}}. Obviously, when τ_z(X) ≤ τ, the real exponential moments E exp{λX}, λ ∈ (−τ^{−1}, τ^{−1}), are also finite. The Zaitsev parameter τ_z(X) shows how close the distribution of the r.v. X is to the class of Gaussian distributions. Obviously, the condition τ_z(X) = 0 is equivalent to X being Gaussian: indeed, (log Λ)′′′(·) = 0 means that log Λ is a polynomial of second degree. Unlike the Sakhanenko parameter, a small τ_z(X) does not yet mean that X is small (consider e.g. a Gaussian variable with big variance). The functional τ_z(·) is homogeneous of degree 1, i.e. τ_z(cX) = |c| τ_z(X).

There is also a remarkable agreement with summation of independent variables (in other words, with convolution of distributions): for all independent variables X, Y it holds

τ_z(X + Y) ≤ max{ τ_z(X), τ_z(Y) }.
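As a toy computation, one can evaluate the defining bound on the real axis for a centered exponential variable X = E − 1, E ~ Exp(1): here log Λ(u) = −u − log(1 − u), Var X = 1, and (log Λ)′′′(u) = 2/(1 − u)³. Restricting the definition to real u (our simplification: the true parameter ranges over complex u, so this only sketches the flavour of τ_z), the smallest admissible τ solves 2/(1 − 1/τ)³ = τ:

```python
# Toy computation of the Zaitsev parameter RESTRICTED TO REAL u (an
# illustrative simplification: the actual definition ranges over complex u).
# X = E - 1 with E ~ Exp(1): log Lambda(u) = -u - log(1 - u), Var X = 1,
# and (log Lambda)'''(u) = 2 / (1 - u)^3, maximal over [-1/tau, 1/tau] at u = 1/tau.

def third_derivative_at_edge(tau):
    u = 1.0 / tau              # needs tau > 1 so that u < 1
    return 2.0 / (1.0 - u) ** 3

# Smallest tau with  max_{|u| <= 1/tau} |(log Lambda)'''(u)| <= tau * Var X.
lo, hi = 1.05, 100.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if third_derivative_at_edge(mid) > mid:
        lo = mid               # bound violated: tau must be larger
    else:
        hi = mid
tau_real = 0.5 * (lo + hi)

print(tau_real)   # for a Gaussian X, by contrast, (log Lambda)''' = 0 and tau_z = 0
```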
The next theorem presents an optimal bound for strong approximation in terms of the Zaitsev parameter.

Theorem 3.7. (One-dimensional Zaitsev inequality) Let X = {X_j} be a sequence of independent random variables with zero expectations. Then for every n one can construct a sequence X̃ = {X̃_1, ..., X̃_n} equidistributed with {X_1, ..., X_n} and a sequence {Ỹ_1, ..., Ỹ_n} of Gaussian variables with the same (zero) expectations and variances, on a common probability space, so that for the difference ∆_n(X̃, Ỹ) defined in (1.1) it holds

E exp{ C_1 τ^{−1} ∆_n(X̃, Ỹ) } ≤ exp{ C_2 log₊(n/τ²) },    (3.12)

with absolute numeric constants C_1, C_2, where τ = τ_n = sup_{j≤n} τ_z(X_j) and log₊ b = max{1, log b}.

We immediately obtain the exponential inequality: for every x > 0 it holds

P{ ∆_n(X̃, Ỹ) ≥ C_2 τ log₊(n/τ²)/C_1 + x } ≤ exp{ −C_1 x/τ }.    (3.13)

The KMT approximation of type (2.3) also follows, if for the infinite sequence it holds

sup_{1≤j<∞} τ_z(X_j) < ∞.

¹ A.Yu. Zaitsev (1986).
Links between the Zaitsev and Sakhanenko parameters. First, we bound the Zaitsev parameter via the Sakhanenko parameter. Let X be a zero-mean random variable with λ = λ(X) < ∞. Let D = Var(X) and Λ(u) = E e^{uX}. We have already shown that D ≤ λ(X)^{−2}. In order to evaluate the parameter τ_z(X), we have to treat the value of

(log Λ)′′′(u) = Λ′′′(u)/Λ(u) − 3 Λ′′(u) Λ′(u)/Λ(u)² + 2 (Λ′(u))³/Λ(u)³    (3.14)

for not very big u ∈ C. Let |u| ≤ cλ. We now give some bounds for Λ and its derivatives Λ′, Λ′′, Λ′′′ appearing in (3.14).

0) It follows from the expansion Λ(u) = E e^{uX} = E(1 + uX + u²X²/2 ± (|uX|³/6 + ...)) that

|Λ(u)| ≥ 1 − |u|² D/2 − |u|³ E(|X|³ e^{|uX|})/6 ≥ 1 − |u|² D/2 − |u|³ λ^{−1} D/6 ≥ 1 − (c²/2 + c³/6) =: c_0.

1) It follows from the expansion Λ′(u) = E X e^{uX} = E(X + uX² ± |X|(|uX|²/2 + ...)) that

|Λ′(u)| ≤ D|u| + |u|² E|X|³ e^{|uX|}/2 ≤ D|u| + |u|² λ^{−1} D/2 ≤ D|u| (1 + c/2).

2) It follows from the expansion Λ′′(u) = E X² e^{uX} = E(X² ± X²(|uX| + ...)) that

|Λ′′(u)| ≤ D + |u| E|X|³ e^{|uX|} ≤ D + |u| λ^{−1} D ≤ D(1 + c).

3) We infer from the representation Λ′′′(u) = E X³ e^{uX} that

|Λ′′′(u)| ≤ E|X|³ e^{|uX|} ≤ λ^{−1} D.

Bringing all the estimates together, we infer

|(log Λ)′′′(u)| ≤ λ^{−1}D/c_0 + 3 D(1+c) D|u|(1+c/2)/c_0² + 2 (D|u|(1+c/2))³/c_0³
≤ λ^{−1} D [ 1/c_0 + 3c(1+c)(1+c/2)/c_0² + 2c³(1+c/2)³/c_0³ ].

Letting here c = 1/3 yields

|(log Λ)′′′(u)| ≤ 3 λ^{−1} D = c^{−1} λ^{−1} D   for |u| ≤ cλ,

hence τ_z(X) ≤ c^{−1} λ^{−1} = 3 λ(X)^{−1}. Therefore, for every zero-mean random variable X it holds

τ_z(X) ≤ 3 λ(X)^{−1}.

The inverse estimate of the Sakhanenko (or, equivalently, of the Bernstein) parameter via the Zaitsev parameter needs an additional assumption on the variance. Assume
τ_z(X) ≤ 1, E X = 0 and D = Var(X) ≤ 1. Then for u : |u| ≤ 1 we have the inequality |(log Λ)′′′(u)| ≤ D, as well as the initial conditions (log Λ)′′(0) = D and (log Λ)′(0) = (log Λ)(0) = 0. Hence integration yields

|(log Λ)′(u)| ≤ D(|u| + |u|²/2)

and

|(log Λ)(u)| ≤ D(|u|²/2 + |u|³/6).

For all real u with |u| ≤ 1, integrating these bounds, we obtain the moment inequality

|E X²(e^{uX} − 1)| = |Λ′′(u) − Λ′′(0)| ≤ ∫₀^{|u|} D(v + v²/2) exp{ D(|u|²/2 + |u|³/6) } dv
≤ D(|u|²/2 + |u|³/6) exp{2/3} ≤ (2/3) exp{2/3} D = c̄ D.

Letting u = ±1 and averaging, we arrive at

c̄ D ≥ E [ X² ( (e^X + e^{−X})/2 − 1 ) ] = E Σ_{m=1}^∞ X^{2m+2}/(2m)!.

For every even moment we obtain

E X^{2m+2} ≤ c̄ (2m)! D,  m ≥ 1.

For odd absolute moments, the Hölder inequality yields

E X^{2m+1} ≤ [E X^{2m}]^{1/2} [E X^{2m+2}]^{1/2} ≤ c̄ D (2m−1)! (2m/(2m−1))^{1/2} ≤ √2 c̄ D (2m−1)!.

Since 2^{3/2} c̄ < 4 and for all m ≥ 3 it holds √2 c̄ ≤ (2^{3/2} c̄)^{m−2}/2, we finally obtain

b(X) ≤ 2^{3/2} c̄ < 4.

By the homogeneity of the Zaitsev and Bernstein parameters, for all zero-mean random variables it holds

b(X) ≤ 4 max{ τ_z(X), (Var(X))^{1/2} }.
4. Strong approximation of sums of multivariate variables

In this section we consider the same problem of approximation of sums, assuming that the variables X_j take values in R^d. Accordingly, we use such notions as the Euclidean norm ||·|| and the scalar product (·,·) in R^d and in C^d. Recall that the expectation EX ∈ R^d and the covariance operator D = Cov X : R^d → R^d of an R^d-valued variable X are defined by the relations

(EX, v) = E(X, v),   (Dv, w) = Cov((X, v), (X, w)) = E(X − EX, v)(X − EX, w).
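The defining relation can be checked directly on data: the empirical covariance operator, contracted twice with a direction v, must reproduce the empirical variance of the scalar projection (X, v). A stdlib-only sketch; the sample points and the direction are arbitrary choices for the illustration:

```python
from statistics import variance

# Verify (Dv, v) = Var((X, v)) for the empirical covariance operator D
# of a small R^2-valued sample.
sample = [(1.0, 2.0), (0.0, -1.0), (2.5, 0.5), (-1.0, 3.0), (0.5, 0.0)]
d = 2
mean = [sum(x[i] for x in sample) / len(sample) for i in range(d)]

def cov(i, j):
    # empirical covariance matrix entry D_ij (sample normalization n - 1)
    return sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in sample) / (len(sample) - 1)

D = [[cov(i, j) for j in range(d)] for i in range(d)]
v = (2.0, -1.0)
lhs = sum(v[i] * D[i][j] * v[j] for i in range(d) for j in range(d))   # (Dv, v)
rhs = variance([sum(v[i] * x[i] for i in range(d)) for x in sample])   # Var((X, v))
print(lhs, rhs)   # the two coincide, by bilinearity of the covariance
```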
We denote by ∂_v f the partial derivative of f in the direction v ∈ R^d; similarly, ∂_v^2 f denotes the partial derivative of second order.

The results which one can obtain in the multivariate case are similar to those of the one-dimensional case. However, one has to take into consideration a new disturbing factor: the eventual degeneration of the covariance operator. This is not so important in the case of identically distributed variables, where a linear change of variables reduces the situation to the investigation of sums of vectors with unit covariances, but it may really bring trouble if we sum up variables whose covariances degenerate in different directions. We now define multivariate versions of the Cramér condition and of the Bernstein, Sakhanenko and Zaitsev parameters.
We say that a random variable X ∈ R^d satisfies the Cramér condition if for some h > 0 it holds

E exp{h ||X||} < ∞.

As in the one-dimensional case, the most interesting results hold when the Cramér condition is satisfied.
Sakhanenko parameter. Let X ∈ R^d be a random variable with zero mean satisfying the Cramér condition. Define the Sakhanenko parameter by

λ(X) = sup{ λ : λ E (X, v)^2 |(X, w)| exp{λ|(X, w)|} ≤ E(X, v)^2, ∀v, w ∈ R^d : ||v|| = ||w|| = 1 }.

The Bernstein parameter
is defined for variables with zero mean and is expressed in terms of the moments of the r.v. X:

b(X) = inf{ τ : |E (X, v)^2 (X, w)^{m−2}| ≤ (m!/2) τ^{m−2} E(X, v)^2, ∀v, w ∈ R^d, ||w|| = 1, ∀m = 3, 4, ... }.

Zaitsev parameter. Let X ∈ R^d be a random variable. Consider its complex exponential moments

Λ(u) = E exp{(u, X)},   u ∈ C^d,

and define the Zaitsev parameter by

τ_z(X) = inf{ τ : |∂_w ∂_v^2 (log Λ)(u)| ≤ τ (Cov X v, v), ∀u ∈ C^d, v, w ∈ R^d : |u| ≤ τ^{−1}, ||w|| = ||v|| = 1 }.
The relations between the three parameters are the same as in the one-dimensional case. The Bernstein and Sakhanenko parameters are equivalent, i.e.

[7 λ(X)]^{−1} ≤ b(X) ≤ λ(X)^{−1}.

One can estimate the Zaitsev parameter by the Bernstein (equivalently, Sakhanenko) parameter, i.e. it holds for some absolute constant c that

τ_z(X) ≤ c b(X).

For the inverse estimate, one needs the maximal eigenvalue B^2 of the operator Cov(X) (in the one-dimensional case we used the variance). For variables with zero mean and some absolute constant c, it holds

b(X) ≤ c max{τ_z(X), B}.
Theorem 4.1. (Multivariate Zaitsev inequality) Let α, D, β_1, β_2 > 0 and let X = {X_j} be a sequence of independent variables with zero means and covariance operators D_j satisfying the assumption of uniform non-degeneracy,

β_1 ||v||^2 ≤ D^2 (D_j v, v) ≤ β_2 ||v||^2,   ∀v ∈ R^d.   (4.1)

Then for every n one can construct a sequence X̃ = {X̃_1, ..., X̃_n} equidistributed with {X_1, ..., X_n} and a sequence of independent Gaussian variables {Ỹ_1, ..., Ỹ_n} with zero means and covariances D_j on a common probability space so that for the difference

Δ_n(X̃, Ỹ) = max_{1≤k≤n} || Σ_{j=1}^k X̃_j − Σ_{j=1}^k Ỹ_j ||

it holds

E exp{ C_1 D Δ_n(X̃, Ỹ) / (τ d^{9/2} log_+ d) } ≤ exp{ C_2 d^{3+α} log_+(n/τ^2) },   (4.2)

where C_1, C_2 are some constants depending on α, β_1, β_2, and

τ = max{1, τ_z(X_1), ..., τ_z(X_n)}.
Corollary 4.2. Under the assumptions of the theorem the following exponential inequality holds: for every x > 0 we have

P{ C_1 Δ_n(X̃, Ỹ) ≥ C_2 τ d^{15/2+α} log_+ d log_+(n/τ^2) + x } ≤ exp{ − x / (τ d^{9/2} log_+ d) }.

Corollary 4.3. If the covariance operators are uniformly bounded in the sense of (4.1), and

sup_{1≤j<∞} τ_z(X_j) < ∞,

then the KMT-type approximation holds for the infinite sequence:
|| Σ_{j=1}^n X̃_j − Σ_{j=1}^n Ỹ_j || = O(log n).

In particular, this is true for sums of identically distributed R^d-valued random variables satisfying the Cramér condition.
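The KMT-type rate behind Corollary 4.3 is produced by a dyadic scheme whose elementary step is a quantile coupling: a partial sum and a Gaussian variable are both read off from the same uniform level via their inverse distribution functions. The sketch below shows only this single-time building block, not the full construction, for S_n = sum of n Rademacher variables, i.e. S_n = 2B − n with B binomial; the value of n and the grid of levels are our arbitrary choices:

```python
import math
from bisect import bisect_left
from statistics import NormalDist

# Quantile coupling of S_n = 2B - n, B ~ Binomial(n, 1/2), with N(0, n):
# both variables are obtained from the same uniform u via inverse CDFs.
n = 1000
total = 2 ** n
cum, cdf = 0, []
for k in range(n + 1):                  # exact Binomial(n, 1/2) CDF
    cum += math.comb(n, k)
    cdf.append(cum / total)

nd = NormalDist()
diffs = []
for j in range(20000):
    u = (j + 0.5) / 20000               # deterministic grid of uniform levels
    b = bisect_left(cdf, u)             # binomial quantile at level u
    s = 2 * b - n                       # quantile of S_n
    y = math.sqrt(n) * nd.inv_cdf(u)    # quantile of N(0, n)
    diffs.append(abs(s - y))
print(max(diffs))   # stays bounded, far below the sqrt(n) scale of S_n itself
```

The coupled pair stays within a bounded distance (of Tusnády-lemma type, roughly O(1 + y^2/n) here), which is the raw material the dyadic refinement turns into the O(log n) trajectory-wise rate.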
Remark 4.4. In view of the relations between the Bernstein, Sakhanenko, and Zaitsev parameters mentioned above, one can replace τ_z(X_j) by b(X_j) or by λ(X_j)^{−1} in the statement of the theorem.
Remark 4.5. One may meet some problems in applications of the theorem while checking the uniform bound (4.1). In this direction, the following generalization of Theorem 4.1 is of interest.² Split the segment of integers [1..n] in consecutive blocks N_1, ..., N_l and put, instead of (4.1), the restriction on the covariances of the blockwise sums,

β_1 ||v||^2 ≤ D^2 Σ_{j∈N_k} (D_j v, v) ≤ β_2 ||v||^2,   ∀k ≤ l, ∀v ∈ R^d.

Then (4.2) holds with n replaced by l in the right-hand side.

² This is exactly Zaitsev's original result.
Remark 4.6. In some special cases one can obtain a better dependence of the parameters on the dimension. For example, if all variables X_j have unit covariance operators, then³ one can write, instead of (4.2),

E exp{ C_1 D Δ_n(X̃, Ỹ) / (τ d^3 log_+ d) } ≤ exp{ C_2 d^{9/4+α} log_+ n },

where C_1, C_2 are some constants depending on α. On the other hand, if all third-order moments of the summands vanish, one can drop the log_+ d factor in (4.2).
REFERENCES

Bártfai, P. (1966) Die Bestimmung der zu einem wiederkehrenden Prozess gehörenden Verteilungsfunktion aus den mit Fehlern behafteten Daten einer einzigen Realisation, Studia Sci. Math. Hungar., 1, 161-168.

Csörgő, M., Révész, P. (1981) Strong approximations in probability and statistics, New York, Academic Press.

Einmahl, U. (1987) Strong invariance principles for partial sums of independent random vectors, Ann. Probab., 15, 1419-1440.

Einmahl, U. (1989) Extensions of results of Komlós, Major, and Tusnády to the multivariate case, J. Multivar. Anal., 28, 20-68.

Götze, F., Zaitsev, A.Yu. (1997) Hungarian construction for almost Gaussian vectors, Preprint N 97-071, Universität Bielefeld, 29 p.

Komlós, J., Major, P., Tusnády, G. (1975) An approximation of partial sums of independent RV's and the sample DF. I, Z. Wahrscheinlichkeitstheor. verw. Geb., 32, 111-131.

Komlós, J., Major, P., Tusnády, G. (1976) An approximation of partial sums of independent RV's and the sample DF. II, Z. Wahrscheinlichkeitstheor. verw. Geb., 34, 34-58.

Major, P. (1979) An improvement of Strassen's invariance principle, Ann. Probab., 7, 55-61.

Sakhanenko, A.I. (1984) Rate of convergence in the invariance principles for variables with exponential moments that are not identically distributed, In: Trudy Inst. Mat. SO AN SSSR, 3, Nauka, Novosibirsk, 4-49 (Russian).

Shao, Q. (1995) Strong approximation theorems for independent random variables and their applications, J. Multivar. Anal., 52, 107-130.

Strassen, V. (1964) An invariance principle for the law of the iterated logarithm, Z. Wahrscheinlichkeitstheor. verw. Geb., 3, 211-226.

Zaitsev, A.Yu. (1986) Estimates of the Lévy-Prokhorov distance in the multivariate central limit theorem for random variables with finite exponential moments, Theory of Probability and its Applications, 31, 203-220.

Zaitsev, A.Yu. (1998a) Multidimensional version of the results of Komlós, Major, and Tusnády for vectors with finite exponential moments, ESAIM: Probability and Statistics, 2, 41-108.
³ Zaitsev (1998a).
Zaitsev, A.Yu. (1998b) Multidimensional version of a result of Sakhanenko in the invariance principle for vectors with finite exponential moments, Preprint N 98-045, Universität Bielefeld, 82 p.