Lecture notes on strong approximation

M. Lifshits∗

St.-Petersburg State University, Faculty of Mathematics and Mechanics, 198904, Stary Peterhof, Bibliotechnaya pl. 2, Russia, and Statistique et Probabilités, F.R.E. CNRS 2222, Bât. M2, Université des Sciences et Technologies de Lille, F-59655 Villeneuve d'Ascq Cedex, France
Abstract. We give an account of strong approximation theorems for sums of independent random variables S_n = Σ_{j=1}^n X_j. This type of theorem provides a joint construction of S_n and of approximating Gaussian variables with the same covariance, while the approximation rate depends on the moment properties of the X_j. In view of the rich field of applications, our goal was to provide the interested reader with a suitable "approximation toolbox" rather than to fight in the depths of proofs and constructions. Starting from the classical results of Komlós, Major, and Tusnády, we discuss less known results of Einmahl, Sakhanenko and Zaitsev concerning vector-valued and non-identically distributed variables.
Résumé. We present strong approximation theorems for sums of independent random variables S_n = Σ_{j=1}^n X_j. These theorems provide a joint construction, on the same probability space, of S_n and of approximating Gaussian variables with the same covariance, the rate of approximation depending on the moment properties of the X_j. Given the breadth of the field of applications, our goal has been to supply the interested reader with a working toolbox rather than to battle with the ultimate details of proofs and constructions. Starting from the classical results of Komlós, Major and Tusnády, we discuss the less known results of Einmahl, Sakhanenko and Zaitsev concerning vector-valued and non-identically distributed random variables.

∗ PUB. IRMA, LILLE 2000, Vol. 53, N 13. Research supported by Russian Foundation for Basic Research, grant N. 00-15-96019, and Universities of Russia, grant N. 99-26-11.
1. Introduction

The central limit theorem states that the distribution of sums of independent or weakly dependent random variables is close to the normal distribution. The functional limit theorem goes further and states that the distribution of the process of sequential sums, considered in an appropriate functional space, is close to the distribution of the Wiener process. In both cases only the similarity of the distributions is stated, not that of the stochastic objects themselves. The same effect of normalization of the sums may be expressed differently, by a construction of two mutually close objects on the same probability space. Let, for example, X = {X_1, ..., X_j, ...} be a sequence of independent variables and Y = {Y_1, ..., Y_j, ...} the corresponding sequence of Gaussian variables (which means that the expectation and the variance of Y_j are the same as those of X_j). We try to construct on a common probability space sequences X̃ = {X̃_1, ..., X̃_j, ...} and Ỹ = {Ỹ_1, ..., Ỹ_j, ...}, equidistributed with X and Y respectively, so that the random variable

∆_n(X̃, Ỹ) = max_{1≤k≤n} | Σ_{j=1}^k X̃_j − Σ_{j=1}^k Ỹ_j |    (1.1)
would be as small as possible. We can understand this goal in two ways:

a) in terms of inequalities, trying to show that ∆_n(X̃, Ỹ) takes big values only with small probability;

b) in terms of almost sure convergence, showing that as n → ∞ the sequence ∆_n(X̃, Ỹ) increases not faster than a known deterministic function defined by the moment characteristics of the sequence X.

The statements of type (a) are fundamental, while the statements of type (b) can be easily derived from them and are widely used in different domains of probability and statistics. Since the statements of type (b) describe the mutual approximation of two sequences with probability one, by analogy with the strong law of large numbers the whole research domain got the name of strong approximation theory.
The proofs of the statements (a) and (b) are usually very difficult, but a remarkably wide class of useful corollaries follows from them quite straightforwardly. Therefore, the instruments of strong approximation should be at hand for everybody concerned with theoretical research related to sums of independent variables. The first precise method of strong approximation (the so-called method of embedding into the Wiener process) was proposed by A.V. Skorokhod. For some time his estimates were considered optimal. It was eventually discovered that one can obtain much better results for sums of independent variables (while for martingale sequences the Skorokhod estimates really are optimal). The optimal estimate for the approximation of sums of independent identically distributed random variables was discovered by the Hungarian mathematicians J. Komlós, P. Major and G. Tusnády (1975), who suggested a method of dyadic approximation (more often called the KMT construction, after the authors' names, or the Hungarian construction). We present their main result below. Further progress is primarily related to the works of A.I. Sakhanenko (sums of independent non-identically distributed variables), U. Einmahl and A.Yu. Zaitsev (sums of independent vectors). The contents of these lecture notes partially represent, in simplified form, what A.Yu. Zaitsev explained to me over many years. The introduction of his unfinished monograph (1995) was also used here. Without this friendly assistance I could hardly have prepared these lectures.
2. The theorem of Komlós, Major and Tusnády

The next theorem presents an optimal estimate in the strong approximation of sums of independent identically distributed random variables.
Theorem 2.1. (KMT inequality) Let X = {X_1, ..., X_j, ...} be a sequence of independent identically distributed random variables having a finite exponential moment, i.e. for some z > 0 it holds E exp{z|X_j|} < ∞. Then there exist positive constants C_1, C_2 depending on the common distribution F of the variables X_j such that for every n one can construct a sequence X̃ = {X̃_1, ..., X̃_n} equidistributed with {X_1, ..., X_n} and a sequence of independent Gaussian variables {Ỹ_1, ..., Ỹ_n} having the same expectations and variances, on a common probability space, so that the difference ∆_n(X̃, Ỹ) defined in (1.1) satisfies

E exp{ C_1 ∆_n(X̃, Ỹ) } ≤ 1 + C_2 n^{1/2}.    (2.1)

Corollary 2.2. Under the assumptions of Theorem 2.1, for every x > 0 we have

P{ ∆_n(X̃, Ỹ) ≥ x } ≤ exp{−C_1 x} (1 + C_2 n^{1/2}).    (2.2)

This inequality immediately follows from (2.1) by an application of the exponential Chebyshev inequality. The KMT inequality is more often cited in this form. However, the statement (2.1), proposed by Sakhanenko, looks more natural, since it does not contain the unnecessary parameter x.
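The dyadic KMT construction is built from quantile (inverse-CDF) couplings: a single uniform variable drives both a partial sum and its Gaussian counterpart. The sketch below couples a Binomial(n, 1/2) sum with a normal variable of the same mean and variance; it illustrates only the basic coupling idea, not the full KMT scheme, and the helper binomial_quantile as well as all numeric choices (n, sample size, seed) are ours.

```python
import math
import random
from statistics import NormalDist

def binomial_quantile(n, p, u):
    """Smallest k with P(Bin(n, p) <= k) >= u, via the pmf recurrence."""
    pmf = (1.0 - p) ** n
    cdf = pmf
    k = 0
    while cdf < u and k < n:
        pmf *= (n - k) / (k + 1) * p / (1.0 - p)
        k += 1
        cdf += pmf
    return k

n = 200
gauss = NormalDist(mu=n / 2, sigma=math.sqrt(n / 4))  # same mean and variance
rng = random.Random(0)

# One uniform variable drives both quantile transforms: this is the coupling.
diffs = []
for _ in range(2000):
    u = rng.randrange(1, 2 ** 53) / 2 ** 53   # uniform, strictly inside (0, 1)
    s = binomial_quantile(n, 0.5, u)
    y = gauss.inv_cdf(u)
    diffs.append(abs(s - y))

print(max(diffs))  # of order 1 here, far below the CLT scale sqrt(n) ~ 14
```

The coupled pair stays within a few units of each other, which is the one-step phenomenon that the dyadic construction propagates to the whole partial-sum path.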
We will discuss the dependence of the parameters C_1, C_2 on the distribution F later (see Section 3), while investigating strong approximation of sums of non-identically distributed variables. Note that the construction provided by Theorem 2.1 concerns a finite number of variables. However, one can easily derive from this result information about strong approximation of an infinite sequence.
Corollary 2.3. Under the assumptions of Theorem 2.1 one can construct on a common probability space an infinite sequence X̃ = {X̃_1, ..., X̃_j, ...} equidistributed with X and a sequence of independent Gaussian variables {Ỹ_1, ..., Ỹ_j, ...} with the same expectations and variances such that with probability one

Σ_{j=1}^n X̃_j − Σ_{j=1}^n Ỹ_j = O(log n).    (2.3)
Proof of the Corollary. For m = 1, 2, ... let n_m = 2^{2^m} and N_m = Σ_{k≤m} n_k, and also N_0 = 0. According to Corollary 2.2, for every m we construct on an appropriate probability space Ω_m the sequences X̃^{(m)} = {X̃_1^{(m)}, ..., X̃_{n_m}^{(m)}} and {Ỹ_1^{(m)}, ..., Ỹ_{n_m}^{(m)}} satisfying the estimate (2.2), i.e.

P{ ∆_{n_m}(X̃^{(m)}, Ỹ^{(m)}) ≥ x } ≤ exp{−C_1 x} (1 + C_2 n_m^{1/2}).

In particular, the probability series

Σ_m P{ ∆_{n_m}(X̃^{(m)}, Ỹ^{(m)}) ≥ A log n_m } ≤ Σ_m exp{−C_1 A log n_m} (1 + C_2 n_m^{1/2})    (2.4)

converges if C_1 A > 1/2. Now we transfer all constructions onto the common probability space Ω = Π_m Ω_m, defining, for ω = (ω_1, ω_2, ...), the variables X̃_j, Ỹ_j with N_{m−1} < j ≤ N_m as follows:

X̃_j(ω) = X̃^{(m)}_{j−N_{m−1}},  Ỹ_j(ω) = Ỹ^{(m)}_{j−N_{m−1}}.

Note that for every m it holds

∆_{N_m}(X̃, Ỹ) ≤ Σ_{k=1}^m ∆_{n_k}(X̃^{(k)}, Ỹ^{(k)}).

Taking into account the estimate (2.4) and the Borel–Cantelli lemma, we have almost surely

∆_{N_m}(X̃, Ỹ) = O( Σ_{k=1}^m A log n_k ) = O(2^m) = O(log N_m).
Since the sequence ∆_n(X̃, Ỹ) is non-decreasing and for every n ∈ (N_{m−1}, N_m] it holds 2 log n ≥ log N_m, we can pass from the subsequence {∆_{N_m}} to the entire sequence {∆_n} and obtain the required estimate (2.3).

It is not difficult to show that the estimate (2.3) is optimal. In fact, let us consider a sequence of independent random variables with symmetric exponential distribution, i.e. P{|X_j| ≥ r} = exp{−r}. Then, for a > 0, the series

Σ_j P{ |X_j| ≥ a log j } = Σ_j j^{−a}

converges iff a > 1. By the Borel–Cantelli lemma we have

lim sup_{j→∞} |X_j| / log j = 1

almost surely. On the other hand, it is well known that for Gaussian variables Ỹ_j = O(√(log j)) = o(log j). Therefore, for any construction of X̃, Ỹ we have

lim sup_{n→∞} ∆_n(X̃, Ỹ) / log n ≥ 1/2

with probability one, and the optimality of (2.3) follows. In fact, the optimality of (2.3) holds in a much stronger and even surprising sense: if one can replace O(log n) in (2.3) by o(log n), then the variables X_j are themselves normal, as shown by Bártfai (1966). The strength of the KMT estimate (2.3) becomes even more obvious if we recall that the preceding technique of Skorokhod yielded a similar asymptotics only of the order O(n^{1/4}).
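The tail calibration behind the Borel–Cantelli computation above is easy to probe numerically: for the symmetric exponential distribution, the largest of the first n variables sits near log n. A Monte Carlo sketch (an illustration, not a proof; the seed and sample size are arbitrary choices):

```python
import math
import random

rng = random.Random(1)
n = 200_000

# Symmetric exponential distribution: P(|X_j| >= r) = exp(-r), so |X_j| ~ Exp(1).
running_max = 0.0
for _ in range(n):
    running_max = max(running_max, rng.expovariate(1.0))

ratio = running_max / math.log(n)
print(ratio)  # typically close to 1, matching lim sup |X_j| / log j = 1
```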
It might seem at first glance that the finiteness of the exponential moment assumed in Theorem 2.1 severely restricts the applications of this result. In fact, truncation manipulations enable one to deduce from Theorem 2.1 similar approximation estimates for variables obeying much less restrictive moment assumptions.
Theorem 2.4. Let X = {X_1, ..., X_j, ...} be a sequence of independent identically distributed variables with finite absolute moment of order p > 2, i.e. E|X_j|^p < ∞. Then we can construct a sequence X̃ = {X̃_1, ..., X̃_j, ...} equidistributed with X and a sequence of independent Gaussian variables {Ỹ_1, ..., Ỹ_j, ...} with the same expectations and variances on a common probability space so that, with probability one,

Σ_{j=1}^n X̃_j − Σ_{j=1}^n Ỹ_j = o(n^{1/p}).    (2.5)
We deduce this theorem from more general results in the next section. Of special interest is Major's (1979) result, which considers the case p = 2 (not covered by Theorem 2.4), the most interesting one from the point of view of the CLT's conditions.
Theorem 2.5. Let X = {X_1, ..., X_j, ...} be a sequence of independent identically distributed random variables with zero means and unit variances. Then we can construct a sequence X̃ = {X̃_1, ..., X̃_j, ...} equidistributed with X and a sequence {Ỹ_1, ..., Ỹ_j, ...} of independent Gaussian variables with zero means and variances

E Ỹ_j² = Var(X_j 1_{|X_j|≤j}) → 1,

on a common probability space so that

Σ_{j=1}^n X̃_j − Σ_{j=1}^n Ỹ_j = o(n^{1/2}).    (2.6)

Dividing (2.6) by n^{1/2} and considering the distributions of the normalized sums, we obtain Lévy's CLT as a trivial corollary. It is interesting to notice that without changing the variances one can only obtain

Σ_{j=1}^n X̃_j − Σ_{j=1}^n Ỹ_j = o((n log log n)^{1/2})

(Strassen (1964)), and this estimate is optimal.
Remark. There is another KMT result similar to Theorem 2.1. It is mostly important in mathematical statistics and concerns the strong approximation of the empirical distribution function.
3. Strong approximation of non-identically distributed variables

In this section we consider analogues of the KMT results on the strong approximation of sums of independent random variables having different distributions. It is intuitively clear that one should assume a certain uniform boundedness of these distributions or, at least, their uniform closeness to the class of Gaussian variables. One can understand this uniform boundedness very differently: in the language of usual moments (Bernstein condition), in the language of exponential moments (Sakhanenko), or in the language of characteristic functions (Zaitsev). Our exposition concerns all the involved parameters and their relations.
Cramér condition. A random variable X satisfies Cramér's condition if it has a finite exponential moment, i.e. for some h > 0 it holds

E exp{h|X|} < ∞.

Respectively, we introduce Cramér's parameter

h(X) = sup{ h : E exp{h|X|} < ∞ }

as a characteristic of concentration of the distribution of the r.v. X. In what follows we will see that the most interesting strong approximation results hold under Cramér's condition, but Cramér's parameter is an inappropriate tool for quantitative estimates.
Sakhanenko parameter. Let X be a random variable with zero mean satisfying Cramér's condition. Following Sakhanenko (1984), define the Sakhanenko parameter

λ(X) = sup{ λ : λ E|X|³ exp{λ|X|} ≤ E X² }.

The expression E|X|³ exp{λ|X|} is finite for λ < h(X), so that 0 < λ(X) < ∞ (with the exception of the degenerate case X = 0, when λ(X) = ∞). Obviously, λ(X) depends only on the distribution of X. Note also that λ(·) is a homogeneous functional of degree −1, i.e. λ(cX) = |c|^{−1} λ(X).

If the variable X is bounded, it is easy to estimate λ(X). Indeed, let |X| ≤ a. For all x ∈ (0, a) write the inequality

λ x³ e^{λx} = λ x e^{λx} x² ≤ λ a e^{λa} x².

Letting λ = 1/(2a), we obtain

λ E|X|³ e^{λ|X|} ≤ (√e / 2) E X² < E X².

Therefore λ(X) ≥ 1/(2a) or, equivalently, λ(X)^{−1} ≤ 2a.

We can also connect λ(X) with the variance of X. Indeed, write

(E X²)^{3/2} ≤ E|X|³ ≤ E|X|³ e^{λ|X|} ≤ λ^{−1} E X².

Hence

Var X = E X² ≤ λ(X)^{−2}.    (3.1)
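For a concrete bounded variable the estimate λ(X) ≥ 1/(2a) can be checked directly. A sketch for X uniform on [−1, 1] (so a = 1), evaluating both sides of the defining inequality λ E|X|³ e^{λ|X|} ≤ E X² by a midpoint rule (the quadrature resolution is an arbitrary choice of ours):

```python
import math

# X uniform on [-1, 1]: bounded by a = 1, so the text gives lambda(X) >= 1/(2a) = 0.5.
a = 1.0
lam = 1.0 / (2.0 * a)

N = 100_000
h = 2.0 / N
lhs = 0.0   # lambda * E|X|^3 * exp(lambda |X|)
ex2 = 0.0   # E X^2
for i in range(N):
    x = -1.0 + (i + 0.5) * h   # midpoint of the i-th cell of [-1, 1]
    w = 0.5 * h                # uniform density 1/2 times cell width
    lhs += lam * abs(x) ** 3 * math.exp(lam * abs(x)) * w
    ex2 += x * x * w

print(lhs, ex2)   # the defining inequality holds with room to spare
```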
Now we connect the Sakhanenko parameter with another, more classical object, the Bernstein parameter b(X), which is also defined for random variables with zero mean but is expressed in terms of classical polynomial moments of X:

b(X) = inf{ τ : |E X^m| ≤ (m!/2) τ^{m−2} Var(X), m = 3, 4, ... }.

Note that if |X| ≤ a, then b(X) ≤ a. The Bernstein parameter directly controls the even moments of |X|. Let us show that we can also control the odd absolute moments. Denote τ = b(X). For every odd number m ≥ 3 we have, by the Hölder inequality,

E|X|^m = E( |X|^{(m−1)/2} |X|^{(m+1)/2} ) ≤ [E X^{m−1}]^{1/2} [E X^{m+1}]^{1/2}
≤ [ ((m−1)!/2) τ^{m−3} Var(X) · ((m+1)!/2) τ^{m−1} Var(X) ]^{1/2}
= (m!/2) τ^{m−2} Var(X) [(m+1)/m]^{1/2}
≤ (m!/√3) τ^{m−2} Var(X).

Therefore, for every real u,

E|X|³ e^{|uX|} = Σ_{m=0}^∞ E|X|^{m+3} |u|^m / m!
≤ Σ_{m=0}^∞ ((m+3)! / (√3 m!)) Var(X) τ^{m+1} |u|^m
= (Var(X) τ / √3) Σ_{m=0}^∞ (m+3)(m+2)(m+1) (τ|u|)^m
= (Var(X) x / (√3 |u|)) [ 6(1−x)^{−1} + 18x(1−x)^{−2} + 18x²(1−x)^{−3} + 6x³(1−x)^{−4} ],

where x = τ|u|. Letting u = (7τ)^{−1}, x = 1/7, we obtain

E|X|³ e^{|uX|} ≤ 0.92 |u|^{−1} Var(X),

i.e. λ(X) ≥ (7τ)^{−1} or, equivalently, λ(X)^{−1} ≤ 7 b(X).

The inverse inequality is almost obvious. Let λ = λ(X). Then for every m ≥ 3 one has

E|X|³ (λ|X|)^{m−3} / (m−3)! ≤ E|X|³ e^{λ|X|} ≤ λ^{−1} Var(X),

which implies

E|X|^m ≤ (m−3)! λ^{2−m} Var(X) ≤ (m!/2) λ^{2−m} Var(X),

and hence b(X) ≤ λ(X)^{−1}.
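The two-sided relation b(X) ≤ λ(X)^{−1} ≤ 7 b(X) can be illustrated on a Rademacher variable X (P{X = ±1} = 1/2), for which both parameters are computable: λ(X) solves λ e^λ = 1, and b(X) comes from scanning the even-moment constraints. A sketch (the bisection depth and moment cutoff are arbitrary choices of ours):

```python
import math

# Rademacher variable X: P{X = +1} = P{X = -1} = 1/2, so EX = 0 and Var X = 1.

# Sakhanenko parameter: E|X|^3 e^{l|X|} = e^l and EX^2 = 1, hence lambda(X)
# is the root of l * e^l = 1, found by bisection below.
lo, hi = 0.0, 1.0
for _ in range(80):
    mid = 0.5 * (lo + hi)
    if mid * math.exp(mid) < 1.0:
        lo = mid
    else:
        hi = mid
lam = 0.5 * (lo + hi)   # about 0.567

# Bernstein parameter: |EX^m| = 1 for even m (0 for odd m), so the condition
# 1 <= (m!/2) tau^(m-2) forces tau >= (2/m!)^(1/(m-2)); the binding case is m = 4.
b = max((2.0 / math.factorial(m)) ** (1.0 / (m - 2)) for m in range(4, 40, 2))

print(lam, b)
print(b <= 1.0 / lam <= 7.0 * b)   # the sandwich b(X) <= lambda(X)^(-1) <= 7 b(X)
```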
Let us now briefly discuss the relations between the Bernstein (or the equivalent, as we have seen, Sakhanenko) parameter and the Cramér parameter. We have an easy inequality

b(X) ≥ h(X)^{−1}.

In fact, if h b(X) < 1, then

E e^{h|X|} ≤ Σ_{m≥0} h^m E|X|^m / m! ≤ 1 + h E|X| + h² E X²/2 + Σ_{m≥3} h^m b(X)^{m−2} Var X < ∞.

In the opposite direction one can show only that

b(X) ≤ 2 inf_{h>0} E e^{h|X|} / (h³ Var X),

which means that the Bernstein and Sakhanenko parameters are finite iff the Cramér parameter is. But the following example shows that there is no upper estimate of b(X) via h(X). Let R ≥ 0 and define the distribution of a random variable X_R by its density p(x) = (1/2) e^{R−|x|} 1_{[R,∞)}(|x|). Then h(X_R) = 1, but

b(X_R) ≥ ( E X_R⁴ / (12 E X_R²) )^{1/2} ∼ R/√12 → ∞   (R → ∞).
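The growth of b(X_R) in this example can be tabulated. The lower bound comes from the case m = 4 of the Bernstein condition, and the even moments of X_R have the closed form E X_R^k = Σ_i C(k,i) R^{k−i} i! (our computation, obtained from ∫₀^∞ (t+R)^k e^{−t} dt by the binomial expansion):

```python
import math

def even_moment(R, k):
    """E X_R^k for even k: integral of (t + R)^k e^{-t} dt over t >= 0."""
    return sum(math.comb(k, i) * R ** (k - i) * math.factorial(i)
               for i in range(k + 1))

lowers = []
for R in (0.0, 5.0, 50.0):
    ex2 = even_moment(R, 2)   # = R^2 + 2R + 2
    ex4 = even_moment(R, 4)
    lowers.append(math.sqrt(ex4 / (12.0 * ex2)))
    print(R, lowers[-1])      # grows roughly like R / sqrt(12), while h(X_R) = 1
```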
The following theorem presents the optimal bound for strong approximation of sums of independent variables in terms of the Sakhanenko parameter.

Theorem 3.1. (Sakhanenko's exponential inequality) Let X = {X_j} be a sequence of independent random variables with finite h(X_j). Then for every n one can construct a sequence X̃ = {X̃_1, ..., X̃_n} equidistributed with {X_1, ..., X_n} and a sequence of independent Gaussian variables {Ỹ_1, ..., Ỹ_n} with the same expectations and variances on a common probability space so that for the difference ∆_n(X̃, Ỹ) defined in (1.1) one has

E exp{ C_3 λ ∆_n(X̃, Ỹ) } ≤ 1 + λ B_n,    (3.2)

where C_3 is a numeric constant, λ = inf_{j≤n} λ(X_j − E X_j), and B_n² = Σ_{j=1}^n Var X_j.

Corollary 3.2. The KMT inequality (2.1) obviously follows from (3.2). Moreover, one can take therein C_1 = C_3 λ(X_1) and C_2 = λ(X_1) (Var X_1)^{1/2}.
The explicit dependence of all parameters in (3.2) on the distribution enables its application to truncated random variables when the latter do not have finite exponential moments. As a result, one can obtain some versions of (3.2), though obviously more modest ones. For example, for variables with finite moments of order p > 2 the following analogue of Theorem 3.1 holds.
Theorem 3.3. (Sakhanenko's polynomial inequality) Let X = {X_j} be a sequence of independent random variables with finite moments of order p > 2. Then for every n one can construct a sequence X̃ = {X̃_1, ..., X̃_n} equidistributed with {X_1, ..., X_n} and a sequence of independent Gaussian variables {Ỹ_1, ..., Ỹ_n} with the same expectations and variances on a common probability space so that for the difference ∆_n(X̃, Ỹ) defined in (1.1) one has

E ∆_n(X̃, Ỹ)^p ≤ C(p) Σ_{j=1}^n E|X_j − E X_j|^p,    (3.3)

where the constant C(p) depends only on p.

Proposition 3.4. There exists a constant C′(p) such that the construction from the preceding theorem is possible so that (3.3) holds for all n simultaneously, with C(p) replaced by C′(p).
Proof. Assume first that there is a partition of the set of all positive integers into blocks {N_m} such that

Σ_{j∈N_m} E|X_j − E X_j|^p ≤ T_m.

Then, by Sakhanenko's Theorem 3.3, we can construct for each block approximating Gaussian sequences {Ỹ_j, j ∈ N_m} so that the corresponding distances ∆_m satisfy

E ∆_m^p ≤ C(p) T_m.

Without loss of generality we may assume that all constructions are realized on a common probability space. Then we have the inequalities

E ∆_m ≤ C(p)^{1/p} T_m^{1/p},  E ∆_m² ≤ C(p)^{2/p} T_m^{2/p}.

Furthermore, letting ∆̃_m = ∆_m − E ∆_m, we have the similar relations

E|∆̃_m|^p ≤ E ∆_m^p + (E ∆_m)^p ≤ 2 E ∆_m^p ≤ 2 C(p) T_m,
E ∆̃_m² ≤ E ∆_m² ≤ C(p)^{2/p} T_m^{2/p}.

Let S = Σ_m ∆_m and S̃ = S − E S. By the Rosenthal inequality (see Petrov, Theorem 3.5.19) we have

E|S̃|^p ≤ c_R(p) [ Σ_m E|∆̃_m|^p + ( Σ_m E ∆̃_m² )^{p/2} ] ≤ c_R(p) C(p) [ 2 Σ_m T_m + ( Σ_m T_m^{2/p} )^{p/2} ],

with some constant c_R(p) depending on p only. We also have

E S = Σ_m E ∆_m ≤ C(p)^{1/p} Σ_m T_m^{1/p}.

Therefore,

E S^p = E(S̃ + E S)^p ≤ 2^p [ E|S̃|^p + (E S)^p ]
≤ 2^p C(p) [ 2 c_R(p) Σ_m T_m + c_R(p) ( Σ_m T_m^{2/p} )^{p/2} + ( Σ_m T_m^{1/p} )^p ].    (3.4)
Now we prove Proposition 3.4. Without loss of generality we may let E X_j = 0. Consider two cases.

a) Let Σ_j E|X_j|^p = ∞. In this case we construct the blocks {N_m, −∞ < m < ∞} by the formula

N_m = { n : 2^{m−1} < Σ_{j≤n} E|X_j|^p ≤ 2^m }.

Accordingly, one may let T_m = 2^m. For every M and all n ∈ N_M it holds

∆_n(X̃, Ỹ) ≤ Σ_{m≤M} ∆_m.

Substituting the values of T_m in (3.4), we have

E ∆_n(X̃, Ỹ)^p ≤ E ( Σ_{m≤M} ∆_m )^p ≤ 2^{p+1+M} C(p) [ 2 c_R(p) + c_R(p) (1 − 2^{−2/p})^{−p/2} + (1 − 2^{−1/p})^{−p} ],

while

Σ_{j≤n} E|X_j|^p > 2^{M−1},

and the relation (3.3) is established.
b) Let T = Σ_j E|X_j|^p < ∞. The simple construction from the previous case does not work, since one of the classes would contain an infinite number of indices n. Now we construct two types of blocks. As in the previous case, for m < 0 let

N_m = { n : 2^{m−1} T < Σ_{j≤n} E|X_j|^p ≤ 2^m T }

and T_m = 2^m T. For m > 0 let

N_m = { n : 2^{−m} T < Σ_{j≥n} E|X_j|^p ≤ 2^{1−m} T }

and T_m = 2^{−m} T. By this construction the only index which is not covered is

n_0 = inf{ n : Σ_{j≥n} E|X_j|^p > T/2 }.

It becomes the contents of the block N_0, and we let T_0 = T.

For indices n ∈ ∪_{m<0} N_m the estimates of case a) still work. For n ∈ ∪_{m≥0} N_m we have, by (3.4),

E ∆_n(X̃, Ỹ)^p ≤ E ( Σ_{m=−∞}^∞ ∆_m )^p ≤ C′(p) T,

while

Σ_{j≤n} E|X_j|^p > T/2,

and (3.3) is again established.
Next, we show how to use the estimate (3.3) to obtain rather arbitrary approximation rates. Here is the result for variables with known estimates of polynomial moments.
Theorem 3.5. (Q. Shao (1995)) Let X = {X_j} be a sequence of independent random variables with zero means, let H_j ↗ ∞ be a positive sequence, and assume that for some p > 2

Σ_{j=1}^∞ E|X_j|^p / H_j^p < ∞.    (3.5)

Then one can construct a sequence X̃ = {X̃_j} equidistributed with X and a Gaussian sequence Ỹ = {Ỹ_j} of independent variables satisfying E Ỹ_j = 0 and Var Ỹ_j = Var X_j on a common probability space so that for the difference ∆_n(X̃, Ỹ) defined in (1.1) it holds, almost surely,

∆_n(X̃, Ỹ) = o(H_n).
The next theorem shows how one can obtain KMT-type estimates in the case when the tails of the variables are uniformly bounded. In the case of identically distributed variables this theorem immediately implies Theorem 2.5 and, after some extra work, Theorem 2.4.
Theorem 3.6. (Q. Shao, U. Einmahl) Let Z be a positive random variable, let X = {X_j} be a sequence of independent variables, and let G : R₊ → R₊ be a function such that the following conditions hold:

a) for some α > 1 the function x ↦ G(x)/x^α is non-decreasing;
b) for some q > 0 the function x ↦ G(x)/x^q is non-increasing;
c) for some c > 0 and all r ≥ 0 it holds sup_j P{|X_j| ≥ r} ≤ c P{Z ≥ r};
d) E G(Z) < ∞.

Then one can construct a sequence X̃ = {X̃_j} equidistributed with X and a sequence of independent Gaussian variables Ỹ = {Ỹ_j} with expectations E X_j and variances

Var Ỹ_j = Var( X_j 1_{G(|X_j|)≤j} ),

on a common probability space so that almost surely

Σ_{j=1}^n X̃_j − Σ_{j=1}^n Ỹ_j = o(G^{−1}(n)).    (3.6)
Proof of Theorem 3.5. First, we take something slightly smaller than H_j: consider a sequence H′_j = o(H_j) such that still

Σ_{j=1}^∞ E|X_j|^p / (H′_j)^p < ∞.

Let X′_j = X_j / H′_j and construct on a common probability space X̃′ and a corresponding Gaussian sequence Ỹ′ so that for every n inequality (3.3) holds, i.e.

E ∆_n(X̃′, Ỹ′)^p ≤ C(p) Σ_{j=1}^n E|X′_j|^p = C(p) Σ_{j=1}^n E|X_j|^p / (H′_j)^p.    (3.7)

Since the right-hand side of (3.7) is uniformly bounded over n, we have

E ∆_∞(X̃′, Ỹ′)^p ≤ C(p) Σ_{j=1}^∞ E|X_j|^p / (H′_j)^p < ∞,

where ∆_∞(X̃′, Ỹ′) = sup_n ∆_n(X̃′, Ỹ′). Therefore the sequence

S′_n = Σ_{j=1}^n (X̃′_j − Ỹ′_j),

which satisfies |S′_n| ≤ ∆_n(X̃′, Ỹ′), is a.s. bounded.

On the same probability space we bring the variables back to the right scale by letting X̃_j = H′_j X̃′_j, Ỹ_j = H′_j Ỹ′_j. We obtain the following estimates for the differences:

Σ_{j=1}^k X̃_j − Σ_{j=1}^k Ỹ_j = Σ_{j=1}^k H′_j (X̃′_j − Ỹ′_j) = Σ_{j=1}^k H′_j (S′_j − S′_{j−1}) = H′_k S′_k − Σ_{j=1}^{k−1} S′_j (H′_{j+1} − H′_j).

Hence

| Σ_{j=1}^k X̃_j − Σ_{j=1}^k Ỹ_j | ≤ 2 H′_k sup_{1≤j<∞} |S′_j| = O(H′_k) = o(H_k).

Since the variables X̃_j and Ỹ_j have the prescribed distributions and the Ỹ_j are independent, the proof of Theorem 3.5 is complete.
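The summation-by-parts (Abel) identity that drives this proof is easy to sanity-check numerically on arbitrary data (a sketch with random inputs; the sizes and seed are arbitrary choices of ours):

```python
import random

# Check of the summation-by-parts identity used above:
#   sum_{j=1}^k H_j (S_j - S_{j-1}) = H_k S_k - sum_{j=1}^{k-1} S_j (H_{j+1} - H_j),
# valid whenever S_0 = 0.
rng = random.Random(2)
k = 50
H = sorted(rng.uniform(1.0, 10.0) for _ in range(k + 1))   # increasing, like H'_j
S = [0.0] + [rng.gauss(0.0, 1.0) for _ in range(k)]        # S_0 = 0

left = sum(H[j] * (S[j] - S[j - 1]) for j in range(1, k + 1))
right = H[k] * S[k] - sum(S[j] * (H[j + 1] - H[j]) for j in range(1, k))

print(abs(left - right))                           # zero up to rounding
print(abs(left) <= 2 * H[k] * max(map(abs, S)))    # the 2 H'_k sup|S'_j| bound
```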
Proof of Theorem 3.6. Let E X_j = 0. Define H_j = G^{−1}(j) and split X_j into three parts:

X_j = ( X_j 1_{|X_j|≤H_j} − E[X_j 1_{|X_j|≤H_j}] ) + X_j 1_{|X_j|>H_j} − E[X_j 1_{|X_j|>H_j}].    (3.8)
In fact, only the first term merits an approximation (using Theorem 3.5). We check that assumption (3.5) holds for the variables X_j 1_{|X_j|≤H_j} − E[X_j 1_{|X_j|≤H_j}]. Let us verify that for every p > q it holds

Σ_{j=1}^∞ E|X_j|^p 1_{|X_j|≤H_j} / H_j^p < ∞.    (3.9)
Using integration by parts and assumption (c), we obtain for every j and every u > 0

E|X_j|^p 1_{|X_j|≤u} = ∫₀ᵘ p v^{p−1} P{|X_j| ≥ v} dv − u^p P{|X_j| > u}
≤ c ∫₀ᵘ p v^{p−1} P{Z ≥ v} dv = c E Z^p 1_{Z≤u} + c u^p P{Z > u}.

Applying this estimate with u = H_j, we next need to estimate the two series

Σ_{j=1}^∞ E Z^p 1_{Z≤H_j} / H_j^p   and   Σ_{j=1}^∞ P{Z > H_j}.
For the second series, assumption (d) yields

Σ_{j=1}^∞ P{Z ≥ H_j} = E Σ_{j=1}^{[G(Z)]} 1 ≤ E G(Z) < ∞.    (3.10)
For the first series,

Σ_{j=1}^∞ E Z^p 1_{Z≤H_j} / H_j^p ≤ E ( Z^p Σ_{j=N(Z)}^∞ H_j^{−p} ),

where

N = N(Z) = [G(Z)] + 1 if G(Z) is not an integer, and N = G(Z) if G(Z) is an integer.

Furthermore, it follows from assumption (b) of our theorem that the function y ↦ G^{−1}(y) y^{−1/q} is non-decreasing. Hence, for every j ≥ N,

H_j / j^{1/q} = G^{−1}(j) / j^{1/q} ≥ G^{−1}(N) / N^{1/q},

i.e. H_j ≥ j^{1/q} G^{−1}(N) / N^{1/q}, and

Σ_{j=N}^∞ H_j^{−p} ≤ G^{−1}(N)^{−p} N^{p/q} Σ_{j=N}^∞ j^{−p/q} ≤ c(p, q) G^{−1}(N)^{−p} N ≤ c(p, q) Z^{−p} N.

(The last passage makes use of the inequality G^{−1}(N(Z)) ≥ Z.) Hence,

Σ_{j=1}^∞ E Z^p 1_{Z≤H_j} / H_j^p ≤ c(p, q) E N(Z) ≤ c(p, q) (E G(Z) + 1) < ∞.

Therefore, condition (3.9) is verified.
Since for every random variable V, by the Jensen inequality,

E|V − E V|^p ≤ 2^p E max{ |V|^p ; |E V|^p } ≤ 2^p ( E|V|^p + |E V|^p ) ≤ 2^{p+1} E|V|^p,

we observe that the first terms in (3.8) obey the assumption of Theorem 3.5; hence they admit a Gaussian approximation with the required error o(H_n) = o(G^{−1}(n)).

Now we prove that the second and the third terms in (3.8) are negligible. Indeed, for the second term we infer from (3.10) that

Σ_j P{|X_j| > H_j} ≤ c Σ_j P{Z ≥ H_j} < ∞.
Therefore, by the Borel–Cantelli lemma, the second term eventually vanishes.

For the estimate of the third term we just use the well-known Kronecker lemma. It states that for every sequence x_j and every positive sequence c_n ↗ ∞, the convergence Σ_j x_j / c_j < ∞ yields (1/c_n) Σ_{j≤n} x_j → 0. Therefore, for checking

(1/H_n) Σ_{j≤n} E[ X_j 1_{|X_j|>H_j} ] → 0,

which would kill the third term, it suffices to show that the sum

Σ_{j=1}^∞ E|X_j| 1_{|X_j|>H_j} / H_j

is finite. Since for every u > 0 and every j

E|X_j| 1_{|X_j|>u} = u P{|X_j| > u} + ∫ᵤ^∞ P{|X_j| > v} dv ≤ c u P{Z > u} + c ∫ᵤ^∞ P{Z > v} dv = c E Z 1_{Z>u},

our problem reduces to the study of the expectation

E ( Z Σ_{j≤m(Z)} H_j^{−1} ).
It follows from assumption (a) that the function x ↦ G^{−1}(x) x^{−1/α} is non-increasing. Hence, for m = m(Z) = [G(Z)] + 1 and every j ≤ m we have

H_j / j^{1/α} = G^{−1}(j) / j^{1/α} ≥ G^{−1}(m) / m^{1/α} = H_m / m^{1/α},

i.e. H_j ≥ H_m j^{1/α} / m^{1/α}, and, since α > 1,

Σ_{j≤m} H_j^{−1} ≤ m^{1/α} H_m^{−1} Σ_{j≤m} j^{−1/α} ≤ m^{1/α} H_m^{−1} c(α) m^{1−1/α} = c(α) m H_m^{−1}.

Using the inequality Z ≤ H_{m(Z)}, we obtain

E ( Z Σ_{j≤m(Z)} H_j^{−1} ) ≤ c(α) E ( Z m(Z) H_{m(Z)}^{−1} ) ≤ c(α) E m(Z) ≤ c(α) (E G(Z) + 1) < ∞.

This is sufficient for the application of the Kronecker lemma and for killing the third term in (3.8).
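Kronecker's lemma, used to dispose of the third term, can be seen in action on a simple deterministic example with x_j = (−1)^j and c_j = j (an illustration of the lemma itself, unrelated to any particular distribution):

```python
# Kronecker's lemma: if sum_j x_j / c_j converges and c_n increases to infinity,
# then (1/c_n) * sum_{j<=n} x_j -> 0.  Toy example: x_j = (-1)^j, c_j = j.
n = 100_000
partial = 0.0    # sum of x_j
weighted = 0.0   # sum of x_j / c_j
for j in range(1, n + 1):
    x = -1.0 if j % 2 else 1.0
    partial += x
    weighted += x / j

print(weighted)       # converges (to -log 2 = -0.6931...)
print(partial / n)    # the Kronecker average: essentially 0
```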
Proof of Theorem 2.4. Without loss of generality we may assume that our identically distributed variables X_j have zero expectations and unit variances. Applying Theorem 3.6 with G(x) = x^p, we construct Gaussian variables Ỹ_j providing the required error order o(n^{1/p}), although their variances are not unit, as required. To repair this, we just let

Y_j = (Var Ỹ_j)^{−1/2} Ỹ_j.

It is sufficient to check that the differences Z_j = Y_j − Ỹ_j satisfy the estimate

n^{−1/p} Σ_{j=1}^n Z_j → 0.

By Kolmogorov's law of large numbers, it suffices to check that the variance series converges, namely

Σ_{j=1}^∞ j^{−2/p} Var(Z_j) < ∞.    (3.11)

Let v_j = Var Ỹ_j and write down the identity

Z_j = (v_j^{−1/2} − 1) Ỹ_j = ( (1 − v_j) / ( v_j^{1/2} (v_j^{1/2} + 1) ) ) Ỹ_j.

By our construction from the proof of Theorem 3.6, with H_j = j^{1/p}, we have

1 − v_j = E X_j² − E X_j² 1_{|X_j|≤H_j} + (E X_j 1_{|X_j|≤H_j})²
= E X_j² 1_{|X_j|>H_j} + (E X_j 1_{|X_j|>H_j})² ≤ 2 E X_j² 1_{|X_j|>H_j} = 2σ_j² ↘ 0.

Moreover,

Σ_{j=1}^∞ σ_j² j^{−2/p} = E ( X_1² Σ_{j=1}^∞ 1_{|X_1|>j^{1/p}} j^{−2/p} ) ≤ c(p) E ( X_1² [|X_1|^p]^{1−2/p} ) ≤ c(p) E|X_1|^p < ∞.

Since {σ_j²} is a monotone sequence, we infer

σ_n² ≤ c(p) E|X_1|^p ( Σ_{j=1}^n j^{−2/p} )^{−1} ≤ c n^{(2−p)/p}.

Finally, using p > 2, we have

Σ_{j=1}^∞ j^{−2/p} Var(Z_j) ≤ Σ_{j=1}^∞ j^{−2/p} (1 + o(1)) ( c j^{(2−p)/p} )² ≤ c Σ_{j=1}^∞ j^{(2−2p)/p} < ∞,

and (3.11) is established.
Zaitsev parameter. Let X be a random variable. Consider its complex exponential moments

Λ(u) = E exp{uX},  u ∈ C,

and define the Zaitsev parameter¹ by

τ_z(X) = inf{ τ : |(log Λ)′′′(u)| ≤ τ Var(X), ∀u : |u| ≤ τ^{−1} }.

We mean here that the function Λ(·) is defined and differentiable in the interior of the circle {u : |u| ≤ τ^{−1}}. Obviously, when τ_z(X) ≤ τ, the real exponential moments E exp{λX}, λ ∈ (−τ^{−1}, τ^{−1}), are also finite. The Zaitsev parameter τ_z(X) shows how close the distribution of the r.v. X is to the class of Gaussian distributions. Obviously, the condition τ_z(X) = 0 is equivalent to X being Gaussian: indeed, (log Λ)′′′(·) = 0 means that log Λ is a polynomial of second degree. Unlike the Sakhanenko parameter, a small τ_z(X) does not yet mean that X is small (consider e.g. a Gaussian variable with big variance). The functional τ_z(·) is homogeneous of degree 1, i.e. τ_z(cX) = |c| τ_z(X).

There is also a remarkable agreement with summation of independent variables (in other words, with convolution of distributions): for all independent variables X, Y it holds

τ_z(X + Y) ≤ max{ τ_z(X), τ_z(Y) }.
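As a toy computation, one can evaluate the defining bound on the real axis for a centered exponential variable X = E − 1, E ~ Exp(1): here log Λ(u) = −u − log(1 − u), Var X = 1, and (log Λ)′′′(u) = 2/(1 − u)³. Restricting the definition to real u (our simplification: the true parameter ranges over complex u, so this only sketches the flavour of τ_z), the smallest admissible τ solves 2/(1 − 1/τ)³ = τ:

```python
# Toy computation of the Zaitsev parameter RESTRICTED TO REAL u (an
# illustrative simplification: the actual definition ranges over complex u).
# X = E - 1 with E ~ Exp(1): log Lambda(u) = -u - log(1 - u), Var X = 1,
# and (log Lambda)'''(u) = 2 / (1 - u)^3, maximal over [-1/tau, 1/tau] at u = 1/tau.

def third_derivative_at_edge(tau):
    u = 1.0 / tau              # needs tau > 1 so that u < 1
    return 2.0 / (1.0 - u) ** 3

# Smallest tau with  max_{|u| <= 1/tau} |(log Lambda)'''(u)| <= tau * Var X.
lo, hi = 1.05, 100.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if third_derivative_at_edge(mid) > mid:
        lo = mid               # bound violated: tau must be larger
    else:
        hi = mid
tau_real = 0.5 * (lo + hi)

print(tau_real)   # for a Gaussian X, by contrast, (log Lambda)''' = 0 and tau_z = 0
```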
The next theorem presents an optimal bound for strong approximation in terms of the Zaitsev parameter.

Theorem 3.7. (One-dimensional Zaitsev inequality) Let X = {X_j} be a sequence of independent random variables with zero expectations. Then for every n one can construct a sequence X̃ = {X̃_1, ..., X̃_n} equidistributed with {X_1, ..., X_n} and a sequence {Ỹ_1, ..., Ỹ_n} of Gaussian variables with the same (zero) expectations and variances, on a common probability space, so that for the difference ∆_n(X̃, Ỹ) defined in (1.1) it holds

E exp{ C_1 τ^{−1} ∆_n(X̃, Ỹ) } ≤ exp{ C_2 log₊(n/τ²) },    (3.12)

with absolute numeric constants C_1, C_2, where τ = τ_n = sup_{j≤n} τ_z(X_j) and log₊ b = max{1, log b}.

We immediately obtain the exponential inequality: for every x > 0 it holds

P{ ∆_n(X̃, Ỹ) ≥ C_2 τ log₊(n/τ²)/C_1 + x } ≤ exp{ −C_1 x/τ }.    (3.13)

The KMT approximation of type (2.3) also follows, if for the infinite sequence it holds

sup_{1≤j<∞} τ_z(X_j) < ∞.

¹ A.Yu. Zaitsev (1986).
Links between the Zaitsev and Sakhanenko parameters. First, we bound the Zaitsev parameter via the Sakhanenko parameter. Let X be a zero-mean random variable with λ = λ(X) < ∞. Let D = Var(X) and Λ(u) = E e^{uX}. We have already shown that D ≤ λ(X)^{−2}. In order to evaluate the parameter τ_z(X), we have to treat the value of

(log Λ)′′′(u) = Λ′′′(u)/Λ(u) − 3 Λ′′(u) Λ′(u)/Λ(u)² + 2 (Λ′(u))³/Λ(u)³    (3.14)

for not very big u ∈ C. Let |u| ≤ cλ. We now give some bounds for Λ and its derivatives Λ′, Λ′′, Λ′′′ appearing in (3.14).

0) It follows from the expansion Λ(u) = E e^{uX} = E(1 + uX + u²X²/2 ± (|uX|³/6 + ...)) that

|Λ(u)| ≥ 1 − |u|² D/2 − |u|³ E(|X|³ e^{|uX|})/6 ≥ 1 − |u|² D/2 − |u|³ λ^{−1} D/6 ≥ 1 − (c²/2 + c³/6) =: c_0.

1) It follows from the expansion Λ′(u) = E X e^{uX} = E(X + uX² ± |X|(|uX|²/2 + ...)) that

|Λ′(u)| ≤ D|u| + |u|² E|X|³ e^{|uX|}/2 ≤ D|u| + |u|² λ^{−1} D/2 ≤ D|u| (1 + c/2).

2) It follows from the expansion Λ′′(u) = E X² e^{uX} = E(X² ± X²(|uX| + ...)) that

|Λ′′(u)| ≤ D + |u| E|X|³ e^{|uX|} ≤ D + |u| λ^{−1} D ≤ D(1 + c).

3) We infer from the representation Λ′′′(u) = E X³ e^{uX} that

|Λ′′′(u)| ≤ E|X|³ e^{|uX|} ≤ λ^{−1} D.

Bringing all the estimates together, we infer

|(log Λ)′′′(u)| ≤ λ^{−1}D/c_0 + 3 D(1+c) D|u|(1+c/2)/c_0² + 2 (D|u|(1+c/2))³/c_0³
≤ λ^{−1} D [ 1/c_0 + 3c(1+c)(1+c/2)/c_0² + 2c³(1+c/2)³/c_0³ ].

Letting here c = 1/3 yields

|(log Λ)′′′(u)| ≤ 3 λ^{−1} D = c^{−1} λ^{−1} D   for |u| ≤ cλ,

hence τ_z(X) ≤ c^{−1} λ^{−1} = 3 λ(X)^{−1}. Therefore, for every zero-mean random variable X it holds

τ_z(X) ≤ 3 λ(X)^{−1}.

The inverse estimate of the Sakhanenko (or, equivalently, of the Bernstein) parameter via the Zaitsev parameter needs an additional assumption on the variance. Assume
τ_z(X) ≤ 1, E X = 0 and D = Var(X) ≤ 1. Then for u : |u| ≤ 1 we have the inequality |(log Λ)′′′(u)| ≤ D, as well as the initial conditions (log Λ)′′(0) = D and (log Λ)′(0) = (log Λ)(0) = 0. Hence integration yields

|(log Λ)′(u)| ≤ D(|u| + |u|²/2)

and

|(log Λ)(u)| ≤ D(|u|²/2 + |u|³/6).

For all real u with |u| ≤ 1, integrating these bounds, we obtain the moment inequality

|E X²(e^{uX} − 1)| = |Λ′′(u) − Λ′′(0)| ≤ ∫₀^{|u|} D(v + v²/2) exp{ D(|u|²/2 + |u|³/6) } dv
≤ D(|u|²/2 + |u|³/6) exp{2/3} ≤ (2/3) exp{2/3} D = c̄ D.

Letting u = ±1 and averaging, we arrive at

c̄ D ≥ E [ X² ( (e^X + e^{−X})/2 − 1 ) ] = E Σ_{m=1}^∞ X^{2m+2}/(2m)!.

For every even moment we obtain

E X^{2m+2} ≤ c̄ (2m)! D,  m ≥ 1.

For odd absolute moments, the Hölder inequality yields

E X^{2m+1} ≤ [E X^{2m}]^{1/2} [E X^{2m+2}]^{1/2} ≤ c̄ D (2m−1)! (2m/(2m−1))^{1/2} ≤ √2 c̄ D (2m−1)!.

Since 2^{3/2} c̄ < 4 and for all m ≥ 3 it holds √2 c̄ ≤ (2^{3/2} c̄)^{m−2}/2, we finally obtain

b(X) ≤ 2^{3/2} c̄ < 4.

By the homogeneity of the Zaitsev and Bernstein parameters, for all zero-mean random variables it holds

b(X) ≤ 4 max{ τ_z(X), (Var(X))^{1/2} }.
4. Strong approximation of sums of multivariate variables

In this section we consider the same problem of approximation of sums, assuming that the variables X_j take values in R^d. Accordingly, we use such notions as the Euclidean norm ||·|| and the scalar product (·,·) in R^d and in C^d. Recall that the expectation EX ∈ R^d and the covariance operator D = Cov X : R^d → R^d of an R^d-valued variable X are defined by the relations

(EX, v) = E(X, v),   (Dv, w) = Cov((X, v), (X, w)) = E(X − EX, v)(X − EX, w).
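The defining relation can be checked directly on data: the empirical covariance operator, contracted twice with a direction v, must reproduce the empirical variance of the scalar projection (X, v). A stdlib-only sketch; the sample points and the direction are arbitrary choices for the illustration:

```python
from statistics import variance

# Verify (Dv, v) = Var((X, v)) for the empirical covariance operator D
# of a small R^2-valued sample.
sample = [(1.0, 2.0), (0.0, -1.0), (2.5, 0.5), (-1.0, 3.0), (0.5, 0.0)]
d = 2
mean = [sum(x[i] for x in sample) / len(sample) for i in range(d)]

def cov(i, j):
    # empirical covariance matrix entry D_ij (sample normalization n - 1)
    return sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in sample) / (len(sample) - 1)

D = [[cov(i, j) for j in range(d)] for i in range(d)]
v = (2.0, -1.0)
lhs = sum(v[i] * D[i][j] * v[j] for i in range(d) for j in range(d))   # (Dv, v)
rhs = variance([sum(v[i] * x[i] for i in range(d)) for x in sample])   # Var((X, v))
print(lhs, rhs)   # the two coincide, by bilinearity of the covariance
```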
We denote by ∂_v f the partial derivative of f in the direction v ∈ R^d; similarly, ∂_v^2 f denotes the partial derivative of second order.

The results which one can obtain in the multivariate case are similar to those of the one-dimensional case. However, one has to take into consideration a new disturbing factor: the eventual degeneration of the covariance operator. This is not so important in the case of identically distributed variables, where a linear change of variables reduces the situation to the investigation of sums of vectors with unit covariances, but it may really bring trouble if we sum up variables whose covariances degenerate in different directions. We now define multivariate versions of the Cramér condition and of the Bernstein, Sakhanenko and Zaitsev parameters.
We say that a random variable X ∈ R^d satisfies the Cramér condition if for some h > 0 it holds

E exp{h ||X||} < ∞.

As in the one-dimensional case, the most interesting results hold when the Cramér condition is satisfied.
Sakhanenko parameter. Let X ∈ R^d be a random variable with zero mean satisfying the Cramér condition. Define the Sakhanenko parameter by

λ(X) = sup{ λ : λ E (X, v)^2 |(X, w)| exp{λ|(X, w)|} ≤ E(X, v)^2, ∀v, w ∈ R^d : ||v|| = ||w|| = 1 }.

The Bernstein parameter
is defined for variables with zero mean and is expressed in terms of the moments of the r.v. X:

b(X) = inf{ τ : |E (X, v)^2 (X, w)^{m−2}| ≤ (m!/2) τ^{m−2} E(X, v)^2, ∀v, w ∈ R^d, ||w|| = 1, ∀m = 3, 4, ... }.

Zaitsev parameter. Let X ∈ R^d be a random variable. Consider its complex exponential moments

Λ(u) = E exp{(u, X)},   u ∈ C^d,

and define the Zaitsev parameter by

τ_z(X) = inf{ τ : |∂_w ∂_v^2 (log Λ)(u)| ≤ τ (Cov X v, v), ∀u ∈ C^d, v, w ∈ R^d : |u| ≤ τ^{−1}, ||w|| = ||v|| = 1 }.
The relations between the three parameters are the same as in the one-dimensional case. The Bernstein and Sakhanenko parameters are equivalent, i.e.

[7 λ(X)]^{−1} ≤ b(X) ≤ λ(X)^{−1}.

One can estimate the Zaitsev parameter by the Bernstein (equivalently, Sakhanenko) parameter, i.e. it holds for some absolute constant c that

τ_z(X) ≤ c b(X).

For the inverse estimate, one needs the maximal eigenvalue B^2 of the operator Cov(X) (in the one-dimensional case we used the variance). For variables with zero mean and some absolute constant c, it holds

b(X) ≤ c max{τ_z(X), B}.
Theorem 4.1. (Multivariate Zaitsev inequality) Let α, D, β_1, β_2 > 0 and let X = {X_j} be a sequence of independent variables with zero means and covariance operators D_j satisfying the assumption of uniform non-degeneracy,

β_1 ||v||^2 ≤ D^2 (D_j v, v) ≤ β_2 ||v||^2,   ∀v ∈ R^d.   (4.1)

Then for every n one can construct a sequence X̃ = {X̃_1, ..., X̃_n} equidistributed with {X_1, ..., X_n} and a sequence of independent Gaussian variables {Ỹ_1, ..., Ỹ_n} with zero means and covariances D_j on a common probability space so that for the difference

Δ_n(X̃, Ỹ) = max_{1≤k≤n} || Σ_{j=1}^k X̃_j − Σ_{j=1}^k Ỹ_j ||

it holds

E exp{ C_1 D Δ_n(X̃, Ỹ) / (τ d^{9/2} log_+ d) } ≤ exp{ C_2 d^{3+α} log_+(n/τ^2) },   (4.2)

where C_1, C_2 are some constants depending on α, β_1, β_2, and

τ = max{1, τ_z(X_1), ..., τ_z(X_n)}.
Corollary 4.2. Under the assumptions of the theorem the following exponential inequality holds: for every x > 0 we have

P{ C_1 Δ_n(X̃, Ỹ) ≥ C_2 τ d^{15/2+α} log_+ d log_+(n/τ^2) + x } ≤ exp{ − x / (τ d^{9/2} log_+ d) }.

Corollary 4.3. If the covariance operators are uniformly bounded in the sense of (4.1), and

sup_{1≤j<∞} τ_z(X_j) < ∞,

then the KMT-type approximation holds for the infinite sequence:
|| Σ_{j=1}^n X̃_j − Σ_{j=1}^n Ỹ_j || = O(log n).

In particular, this is true for sums of identically distributed R^d-valued random variables satisfying the Cramér condition.
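The KMT-type rate behind Corollary 4.3 is produced by a dyadic scheme whose elementary step is a quantile coupling: a partial sum and a Gaussian variable are both read off from the same uniform level via their inverse distribution functions. The sketch below shows only this single-time building block, not the full construction, for S_n = sum of n Rademacher variables, i.e. S_n = 2B − n with B binomial; the value of n and the grid of levels are our arbitrary choices:

```python
import math
from bisect import bisect_left
from statistics import NormalDist

# Quantile coupling of S_n = 2B - n, B ~ Binomial(n, 1/2), with N(0, n):
# both variables are obtained from the same uniform u via inverse CDFs.
n = 1000
total = 2 ** n
cum, cdf = 0, []
for k in range(n + 1):                  # exact Binomial(n, 1/2) CDF
    cum += math.comb(n, k)
    cdf.append(cum / total)

nd = NormalDist()
diffs = []
for j in range(20000):
    u = (j + 0.5) / 20000               # deterministic grid of uniform levels
    b = bisect_left(cdf, u)             # binomial quantile at level u
    s = 2 * b - n                       # quantile of S_n
    y = math.sqrt(n) * nd.inv_cdf(u)    # quantile of N(0, n)
    diffs.append(abs(s - y))
print(max(diffs))   # stays bounded, far below the sqrt(n) scale of S_n itself
```

The coupled pair stays within a bounded distance (of Tusnády-lemma type, roughly O(1 + y^2/n) here), which is the raw material the dyadic refinement turns into the O(log n) trajectory-wise rate.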
Remark 4.4. In view of the relations between the Bernstein, Sakhanenko, and Zaitsev parameters mentioned above, one can replace τ_z(X_j) by b(X_j) or by λ(X_j)^{−1} in the statement of the theorem.
Remark 4.5. One may meet some problems in applications of the theorem while checking the uniform bound (4.1). In this direction, the following generalization of Theorem 4.1 is of interest.² Split the segment of integers [1..n] in consecutive blocks N_1, ..., N_l and put, instead of (4.1), the restriction on the covariances of the blockwise sums,

β_1 ||v||^2 ≤ D^2 Σ_{j∈N_k} (D_j v, v) ≤ β_2 ||v||^2,   ∀k ≤ l, ∀v ∈ R^d.

Then (4.2) holds with n replaced by l in the right-hand side.

² This is exactly Zaitsev's original result.
Remark 4.6. In some special cases one can obtain a better dependence of the parameters on the dimension. For example, if all variables X_j have unit covariance operators, then³ one can write, instead of (4.2),

E exp{ C_1 D Δ_n(X̃, Ỹ) / (τ d^3 log_+ d) } ≤ exp{ C_2 d^{9/4+α} log_+ n },

where C_1, C_2 are some constants depending on α. On the other hand, if all third-order moments of the summands vanish, one can drop the log_+ d factor in (4.2).
REFERENCES

Bártfai, P. (1966) Die Bestimmung der zu einem wiederkehrenden Prozess gehörenden Verteilungsfunktion aus den mit Fehlern behafteten Daten einer einzigen Realisation, Studia Sci. Math. Hungar., 1, 161-168.

Csörgő, M., Révész, P. (1981) Strong approximations in probability and statistics, New York, Academic Press.

Einmahl, U. (1987) Strong invariance principles for partial sums of independent random vectors, Ann. Probab., 15, 1419-1440.

Einmahl, U. (1989) Extensions of results of Komlós, Major, and Tusnády to the multivariate case, J. Multivar. Anal., 28, 20-68.

Götze, F., Zaitsev, A.Yu. (1997) Hungarian construction for almost Gaussian vectors, Preprint N 97-071, Universität Bielefeld, 29 p.

Komlós, J., Major, P., Tusnády, G. (1975) An approximation of partial sums of independent RV's and the sample DF. I, Z. Wahrscheinlichkeitstheor. verw. Geb., 32, 111-131.

Komlós, J., Major, P., Tusnády, G. (1976) An approximation of partial sums of independent RV's and the sample DF. II, Z. Wahrscheinlichkeitstheor. verw. Geb., 34, 34-58.

Major, P. (1979) An improvement of Strassen's invariance principle, Ann. Probab., 7, 55-61.

Sakhanenko, A.I. (1984) Rate of convergence in the invariance principles for variables with exponential moments that are not identically distributed, In: Trudy Inst. Mat. SO AN SSSR, 3, Nauka, Novosibirsk, 4-49 (Russian).

Shao, Q. (1995) Strong approximation theorems for independent random variables and their applications, J. Multivar. Anal., 52, 107-130.

Strassen, V. (1964) An invariance principle for the law of the iterated logarithm, Z. Wahrscheinlichkeitstheor. verw. Geb., 3, 211-226.

Zaitsev, A.Yu. (1986) Estimates of the Lévy-Prokhorov distance in the multivariate central limit theorem for random variables with finite exponential moments, Theory of Probability and its Applications, 31, 203-220.

Zaitsev, A.Yu. (1998a) Multidimensional version of the results of Komlós, Major, and Tusnády for vectors with finite exponential moments, ESAIM: Probability and Statistics, 2, 41-108.
³ Zaitsev (1998a).
Zaitsev, A.Yu. (1998b) Multidimensional version of a result of Sakhanenko in the invariance principle for vectors with finite exponential moments, Preprint N 98-045, Universität Bielefeld, 82 p.