
1 Kernel density estimation, local time and chaos expansion

Ciprian A. TUDOR
Laboratoire Paul Painlevé, Université de Lille 1, F-59655 Villeneuve d'Ascq, France, email: [email protected].

Summary. In this paper we develop an asymptotic theory for some regression models involving standard Brownian motion and standard Brownian sheet.

2010 AMS Classification Numbers: 60F05, 60H05, 91G70. Key words: limit theorems, Brownian motion, Brownian sheet, regression model, weak convergence.

1.1 Introduction

The motivation of our work comes from econometric theory. Consider a regression model of the form

$$y_i = f(x_i) + u_i, \qquad i \ge 0, \tag{1.1}$$

where $(u_i)_{i \ge 0}$ is the "error" and $(x_i)_{i \ge 0}$ is the regressor. The purpose is to estimate the function $f$ based on the observation of the random variables $y_i$, $i \ge 0$. The conventional kernel estimate of $f(x)$ is

$$\hat f(x) = \frac{\sum_{i=0}^{n} K_h(x_i - x)\, y_i}{\sum_{i=0}^{n} K_h(x_i - x)},$$

where $K$ is a nonnegative real kernel function satisfying $\int_{\mathbb{R}} K^2(y)\,dy = 1$ and $\int_{\mathbb{R}} y K(y)\,dy = 0$, and $K_h(s) = \frac{1}{h} K\left(\frac{s}{h}\right)$. The bandwidth parameter $h \equiv h_n$ satisfies $h_n \to 0$ as $n \to \infty$. We will choose in our work $h_n = n^{-\alpha}$ with $0 < \alpha < \frac{1}{2}$. The asymptotic behavior of the estimator $\hat f$ is usually related to the behavior of the sequence

$$V_n = \sum_{i=1}^{n} K_h(x_i - x)\, u_i.$$
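As an illustration of the kernel estimate defined above, here is a minimal Python sketch. The function names `gaussian_kernel` and `kernel_estimate` and the toy data are hypothetical, and the Gaussian kernel is used only because it is the kernel adopted later in the paper.

```python
import numpy as np

def gaussian_kernel(x):
    # Standard Gaussian density, the kernel K used later in the paper.
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def kernel_estimate(x0, xs, ys, h):
    # Ratio of kernel-weighted sums: sum K_h(x_i - x0) y_i / sum K_h(x_i - x0),
    # with K_h(s) = K(s / h) / h.
    weights = gaussian_kernel((xs - x0) / h) / h
    return np.sum(weights * ys) / np.sum(weights)

# Toy data (assumption): regressors symmetric around 0, responses on a line.
xs = np.array([-1.0, 0.0, 1.0])
ys = np.array([1.0, 2.0, 3.0])
estimate = kernel_estimate(0.0, xs, ys, h=0.5)
print(estimate)
```

Because the toy regressors are symmetric around the evaluation point and the responses are linear, the weighted average returns the middle response.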

The limit in distribution as $n \to \infty$ of the sequence $V_n$ has been widely studied in the literature in various situations. We refer, among others, to [5] and [6] for the case where $x_t$ is a recurrent Markov chain, to [13] for the case where $x_t$ is a partial sum of a general linear process, and to [14] for a more general situation. See also [10] or [11]. An important assumption in most of the above references is that $u_i$ is a martingale difference sequence. In our work we will consider the following situation: first, the error $u_i$ is chosen to be $u_i = W_{i+1} - W_i$ for every $i \ge 0$, where $(W_t)_{t \ge 0}$ denotes a standard Wiener process, and $x_i = W_i$ for $i \ge 0$. Note that in this case, although for every $i$ the random variables $u_i$ and $x_i$ are independent, there is no global independence between the regressor $(x_i)_{i \ge 0}$ and the error $(u_i)_{i \ge 0}$. However, this case has already been treated in previous works (see e.g. [13], [14]). See also [2] for models related to fractional Brownian motion. In this case, the sequence $V_n$ reduces to (we also restrict to the case $x = 0$ because the estimation part is not addressed in this paper)

$$S_n = \sum_{i=0}^{n-1} K(n^{\alpha} W_i)\,(W_{i+1} - W_i). \tag{1.2}$$

The second case we consider concerns a two-parameter model:

$$y_{i,j} = f(x_{i,j}) + e_{i,j}, \qquad i, j \ge 0, \tag{1.3}$$

where $e_{i,j} = W^{(2)}_{i+1,j+1} - W^{(2)}_{i+1,j} - W^{(2)}_{i,j+1} + W^{(2)}_{i,j}$ are the rectangular increments of a Wiener sheet $W^{(2)}$ (see Section 1.3 for the definition of the Wiener sheet). This case seems to be new in the literature. In this situation, because of the complexity of the stochastic calculus for two-parameter processes, we restrict ourselves to the case where the regressor $x_{i,j}$ is independent of the error $e_{i,j}$. That is, we assume that $x_{i,j} = W^{(1)}_{i,j}$, where $W^{(1)}$ is a Wiener sheet independent of $W^{(2)}$. The model (1.3) leads to the study of the sequence

$$T_n = \sum_{i,j=0}^{n-1} K\left(n^{\alpha} W^{(1)}_{i,j}\right)\left(W^{(2)}_{i+1,j+1} - W^{(2)}_{i+1,j} - W^{(2)}_{i,j+1} + W^{(2)}_{i,j}\right).$$

We will assume that the kernel $K$ is the standard Gaussian kernel

$$K(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}. \tag{1.4}$$

The limits in distribution of $S_n$ and $T_n$, after renormalization, will be $c_1 \beta_{L^{W}(1,0)}$ and $c_2 \beta_{L^{W^{(1)}}(\mathbf{1},0)}$ respectively, where $L^{W}$ and $L^{W^{(1)}}$ denote the local times of $W$ and $W^{(1)}$ respectively, $\beta$ is a Brownian motion independent of $W$ and $W^{(1)}$, and $c_1, c_2$ are explicit positive constants.

1.2 The one parameter case

Let $(W_t)_{t \ge 0}$ be a standard Brownian motion on a standard probability space $(\Omega, \mathcal{F}, P)$ and let us consider the sequence $S_n$ given by (1.2) with $0 < \alpha < \frac{1}{2}$ and the kernel function $K$ given by (1.4). Denote by $\mathcal{F}_t$ the filtration generated by $W$. Our first step is to estimate the $L^2$ mean of $S_n$.

Lemma 1. As $n \to \infty$ it holds that

$$n^{\alpha - \frac{1}{2}}\, E S_n^2 \to C = \frac{\sqrt{2}}{2\pi}.$$

Proof: Recall that, if $Z$ is a standard normal random variable and if $1 + 2c > 0$, then

$$E\left(e^{-cZ^2}\right) = \frac{1}{\sqrt{1 + 2c}}. \tag{1.5}$$

Since the increments of the Brownian motion are independent and $W_{i+1} - W_i$ is independent of $\mathcal{F}_i$ for every $i$, it holds that (here $Z$ denotes a standard normal random variable)

$$E S_n^2 = E \sum_{i=0}^{n-1} K^2(n^{\alpha} W_i)\,(W_{i+1} - W_i)^2 = E \sum_{i=0}^{n-1} K^2(n^{\alpha} W_i) = \frac{1}{2\pi} \sum_{i=0}^{n-1} E e^{-n^{2\alpha} i Z^2} = \frac{1}{2\pi} \sum_{i=0}^{n-1} \left(1 + 2 n^{2\alpha} i\right)^{-\frac{1}{2}},$$

and this behaves as $\frac{\sqrt{2}}{2\pi}\, n^{-\alpha + \frac{1}{2}}$ when $n$ tends to infinity.
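The asymptotics in Lemma 1 can be checked numerically from the exact sum obtained in the proof. The sketch below is only a sanity check; the helper name `second_moment_S` and the choice $\alpha = 0.25$ are assumptions made for illustration.

```python
import numpy as np

def second_moment_S(n, alpha):
    # Exact value E S_n^2 = (1 / 2pi) * sum_{i=0}^{n-1} (1 + 2 n^{2 alpha} i)^{-1/2}.
    i = np.arange(n, dtype=float)
    return np.sum((1.0 + 2.0 * n ** (2.0 * alpha) * i) ** -0.5) / (2.0 * np.pi)

alpha = 0.25                          # any value in (0, 1/2); chosen for illustration
C = np.sqrt(2.0) / (2.0 * np.pi)      # limiting constant of Lemma 1

ratios = {}
for n in (10**2, 10**4, 10**6):
    # Ratio of the renormalized second moment to the limit C.
    ratios[n] = n ** (alpha - 0.5) * second_moment_S(n, alpha) / C
    print(n, ratios[n])
```

The ratio approaches 1 slowly, at a rate of order $n^{\alpha - 1/2}$ coming from the term $i = 0$ of the sum.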

In the following our aim is to prove that the sequence $n^{-\frac{1}{4} + \frac{\alpha}{2}} S_n$ converges in distribution to a non-trivial limit. Note that the sequence $S_n$ can be written as

$$S_n = \sum_{i=0}^{n-1} K(n^{\alpha} W_i)(W_{i+1} - W_i) = \sum_{i=0}^{n-1} \int_i^{i+1} K(n^{\alpha} W_i)\,dW_s = \sum_{i=0}^{n-1} \int_i^{i+1} K(n^{\alpha} W_{[s]})\,dW_s = \int_0^n K(n^{\alpha} W_{[s]})\,dW_s,$$

where $[s]$ denotes the integer part of the real number $s$. Define, for every $t \ge 0$,

$$S_t^n = \int_0^t K(n^{\alpha} W_{[s]})\,dW_s. \tag{1.6}$$

Then for every $n \ge 1$ the process $(S_t^n)_{t \ge 0}$ is an $\mathcal{F}_t$-martingale (recall that $\mathcal{F}_t$ denotes the sigma-algebra generated by the Wiener process $W$). The bracket of the martingale $(S_t^n)_{t \ge 0}$ is given, for every $t \ge 0$, by

$$\langle S^n \rangle_t = \int_0^t K^2(n^{\alpha} W_{[s]})\,ds.$$

This bracket plays a key role in understanding the behavior of $S_n$. Let us first understand the limit of the sequence $\langle S^n \rangle_n$. Its asymptotic behavior is related to the local time of the Brownian motion. We recall its definition. For any $t \ge 0$ and $x \in \mathbb{R}$ we define $L^{W}(t, x)$ as the density of the occupation measure (see [1], [3])

$$\mu_t(A) = \int_0^t 1_A(W_s)\,ds, \qquad A \in \mathcal{B}(\mathbb{R}).$$

The local time $L^{W}(t, x)$ satisfies the occupation time formula

$$\int_0^t f(W_s)\,ds = \int_{\mathbb{R}} L^{W}(t, x) f(x)\,dx \tag{1.7}$$

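for any measurable function $f$. The local time is Hölder continuous with respect to $t$ and with respect to $x$. Moreover, it admits a bicontinuous version with respect to $(t, x)$. We will denote by $p_{\varepsilon}$ the Gaussian kernel with variance $\varepsilon > 0$ given by

$$p_{\varepsilon}(x) = \frac{1}{\sqrt{2\pi\varepsilon}}\, e^{-\frac{x^2}{2\varepsilon}}.$$

Note that

$$n^{\alpha + \frac{1}{2}} K^2\left(n^{\alpha + \frac{1}{2}} W_t\right) = \frac{1}{2\sqrt{\pi}}\, p_{\frac{1}{2} n^{-2\alpha - 1}}(W_t)$$

and, by the scaling property of the Brownian motion,

$$n^{-\frac{1}{2} + \alpha} \langle S^n \rangle_n = n^{-\frac{1}{2} + \alpha} \sum_{i=0}^{n-1} K^2(n^{\alpha} W_i) =^{(d)} n^{-\frac{1}{2} + \alpha} \sum_{i=0}^{n-1} K^2\left(n^{\alpha + \frac{1}{2}} W_{\frac{i}{n}}\right) = \frac{1}{2\sqrt{\pi}} \cdot \frac{1}{n} \sum_{i=0}^{n-1} p_{\frac{1}{2} n^{-2\alpha - 1}}\left(W_{\frac{i}{n}}\right),$$

where $=^{(d)}$ means equality in distribution. A key point of our paper is the following result, which gives the convergence of the "bracket".

Lemma 2. The sequence $\frac{1}{n} \sum_{i=0}^{n-1} p_{\frac{1}{2} n^{-2\alpha - 1}}\left(W_{\frac{i}{n}}\right)$ converges in $L^2(\Omega)$, as $n \to \infty$, to $L^{W}(1, 0)$.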
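The rewriting of the bracket in terms of $p_{\varepsilon}$ rests on the elementary identity $c\,K^2(cx) = \frac{1}{2\sqrt{\pi}}\, p_{1/(2c^2)}(x)$ with $c = n^{\alpha + 1/2}$, together with $\int_{\mathbb{R}} K^2(y)\,dy = \frac{1}{2\sqrt{\pi}}$. The following sketch checks both numerically; the values of `n` and `alpha` and the grids are arbitrary choices made for illustration.

```python
import numpy as np

def K(x):
    # Standard Gaussian kernel (1.4).
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def p(eps, x):
    # Gaussian density with variance eps.
    return np.exp(-x**2 / (2.0 * eps)) / np.sqrt(2.0 * np.pi * eps)

n, alpha = 50, 0.3                    # illustrative values, 0 < alpha < 1/2
c = n ** (alpha + 0.5)
x = np.linspace(-0.5, 0.5, 201)

lhs = c * K(c * x) ** 2
rhs = p(0.5 * n ** (-2.0 * alpha - 1.0), x) / (2.0 * np.sqrt(np.pi))
max_gap = np.max(np.abs(lhs - rhs))   # should be at rounding-error level

# Riemann sum for the integral of K^2 over the real line.
grid = np.linspace(-10.0, 10.0, 20001)
dx = grid[1] - grid[0]
integral_K2 = np.sum(K(grid) ** 2) * dx
print(max_gap, integral_K2, 1.0 / (2.0 * np.sqrt(np.pi)))
```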

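Proof: Let us recall that $\int_0^1 p_{\varepsilon}(W_s)\,ds$ converges as $\varepsilon \to 0$ to $L^{W}(1, 0)$ in $L^2(\Omega)$ and almost surely (see e.g. [9]). Using this fact, it suffices to show that the quantity

$$I_n := E\left(\int_0^1 \left(p_{\alpha_n}(W_s) - p_{\alpha_n}\left(W_{\frac{[ns]}{n}}\right)\right) ds\right)^2 \tag{1.8}$$

converges to zero as $n \to \infty$, where we denoted $\alpha_n = \frac{1}{2} n^{-2\alpha - 1}$. We have

$$I_n = E \int_0^1 \int_0^1 ds\,dt \left(p_{\alpha_n}(W_s) - p_{\alpha_n}\left(W_{\frac{[ns]}{n}}\right)\right)\left(p_{\alpha_n}(W_t) - p_{\alpha_n}\left(W_{\frac{[nt]}{n}}\right)\right) = 2 E \int_0^1 dt \int_0^t ds \left(p_{\alpha_n}(W_s) - p_{\alpha_n}\left(W_{\frac{[ns]}{n}}\right)\right)\left(p_{\alpha_n}(W_t) - p_{\alpha_n}\left(W_{\frac{[nt]}{n}}\right)\right).$$

Notice that for every $s, t \in [0, 1]$, $s \le t$,

$$E p_{\varepsilon}(W_s) p_{\varepsilon}(W_t) = E\left(E\left[p_{\varepsilon}(W_s) p_{\varepsilon}(W_t) \mid \mathcal{F}_s\right]\right) = E\left(p_{\varepsilon}(W_s)\, E\left[p_{\varepsilon}(W_t) \mid \mathcal{F}_s\right]\right) = E\left(p_{\varepsilon}(W_s)\, E\left[p_{\varepsilon}(W_t - W_s + W_s) \mid \mathcal{F}_s\right]\right).$$

By the independence of $W_t - W_s$ and $\mathcal{F}_s$ we get

$$E\left[p_{\varepsilon}(W_t - W_s + W_s) \mid \mathcal{F}_s\right] = \left(E p_{\varepsilon}(W_t - W_s + x)\right)_{x = W_s} = p_{\varepsilon + t - s}(W_s).$$

We obtain

$$E p_{\varepsilon}(W_s) p_{\varepsilon}(W_t) = E p_{\varepsilon}(W_s) p_{\varepsilon + t - s}(W_s) = \frac{1}{2\pi} \left(\frac{1}{\varepsilon s + \varepsilon(t - s + \varepsilon) + s(t - s + \varepsilon)}\right)^{\frac{1}{2}}. \tag{1.9}$$

For $0 < s < t$ this quantity converges to $\frac{1}{2\pi\sqrt{s(t - s)}}$ as $\varepsilon \to 0$. If we replace $s$ or $t$ by $\frac{[ns]}{n}$ or $\frac{[nt]}{n}$ respectively, we get the same limit.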
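The Gaussian expectation behind (1.9) can be checked numerically. Since $W_s$ is $N(0, s)$, the quantity $E\,p_{\varepsilon}(W_s)\,p_{\varepsilon + t - s}(W_s)$ is the integral of a product of three Gaussian densities. The sketch below compares a Riemann sum with the closed form $\frac{1}{2\pi}\left(\varepsilon s + \varepsilon(t - s + \varepsilon) + s(t - s + \varepsilon)\right)^{-1/2}$; the function names and the test values of $\varepsilon$, $s$, $t$ are assumptions made for illustration.

```python
import numpy as np

def p(eps, x):
    # Gaussian density with variance eps.
    return np.exp(-x**2 / (2.0 * eps)) / np.sqrt(2.0 * np.pi * eps)

def expectation_by_quadrature(eps, s, t):
    # E[p_eps(W_s) p_{eps+t-s}(W_s)] with W_s ~ N(0, s), via a fine Riemann sum.
    x = np.linspace(-8.0, 8.0, 160001)
    dx = x[1] - x[0]
    integrand = p(eps, x) * p(eps + t - s, x) * p(s, x)
    return np.sum(integrand) * dx

def closed_form(eps, s, t):
    q = eps * s + eps * (t - s + eps) + s * (t - s + eps)
    return 1.0 / (2.0 * np.pi * np.sqrt(q))

eps, s, t = 0.01, 0.3, 0.7            # illustrative values with 0 < s < t <= 1
numeric = expectation_by_quadrature(eps, s, t)
exact = closed_form(eps, s, t)
small_eps_value = closed_form(1e-12, s, t)
limit_value = 1.0 / (2.0 * np.pi * np.sqrt(s * (t - s)))
print(numeric, exact, small_eps_value, limit_value)
```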

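As a consequence of Lemma 2 we obtain

Proposition 1. The sequence $n^{-\frac{1}{2} + \alpha} \langle S^n \rangle_n$ converges in distribution, as $n \to \infty$, to

$$\left(\int_{\mathbb{R}} K^2(y)\,dy\right) L^{W}(1, 0) = \frac{1}{2\sqrt{\pi}}\, L^{W}(1, 0),$$

where $L^{W}$ denotes the local time of the Brownian motion $W$.

Proof: The conclusion follows because

$$n^{-\frac{1}{2} + \alpha} \langle S^n \rangle_n = n^{-\frac{1}{2} + \alpha} \sum_{i=0}^{n-1} K^2(n^{\alpha} W_i) =^{(d)} n^{-\frac{1}{2} + \alpha} \sum_{i=0}^{n-1} K^2\left(n^{\alpha + \frac{1}{2}} W_{\frac{i}{n}}\right) = \frac{1}{2\sqrt{\pi}} \cdot \frac{1}{n} \sum_{i=0}^{n-1} p_{\frac{1}{2} n^{-2\alpha - 1}}\left(W_{\frac{i}{n}}\right)$$

and this converges to $\frac{1}{2\sqrt{\pi}}\, L^{W}(1, 0)$ in $L^2(\Omega)$ by Lemma 2.

Remark 1. Intuitively, the result in Proposition 1 follows because

$$n^{-\frac{1}{2} + \alpha} \sum_{i=0}^{n-1} K^2(n^{\alpha} W_i) =^{(d)} n^{-\frac{1}{2} + \alpha} \sum_{i=0}^{n-1} K^2\left(n^{\alpha + \frac{1}{2}} W_{\frac{i}{n}}\right) \sim n^{\frac{1}{2} + \alpha} \int_0^1 K^2\left(n^{\alpha + \frac{1}{2}} W_s\right) ds = n^{\frac{1}{2} + \alpha} \int_{\mathbb{R}} K^2\left(n^{\alpha + \frac{1}{2}} x\right) L^{W}(1, x)\,dx = \int_{\mathbb{R}} K^2(y)\, L^{W}\left(1, \frac{y}{n^{\alpha + \frac{1}{2}}}\right) dy,$$

where we used the occupation time formula (1.7). The bicontinuity of the local time implies that this last expression converges to the limit in Proposition 1.

We state the main result of this part.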

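Theorem 1. Let $S_n$ be given by (1.2). Then as $n \to \infty$, the sequence $n^{\frac{\alpha}{2} - \frac{1}{4}} S_n$ converges in distribution to

$$\left(\left(\int_{\mathbb{R}} K^2(y)\,dy\right) L^{W}(1, 0)\right)^{\frac{1}{2}} Z,$$

where $Z$ is a standard normal random variable independent of $L^{W}(1, 0)$.

Proof: A similar argument has already been used in [4]. By the scaling property of the Brownian motion,

$$n^{\frac{\alpha}{2} - \frac{1}{4}} S_n =^{(d)} n^{\frac{1}{4} + \frac{\alpha}{2}} \int_0^1 K\left(n^{\alpha + \frac{1}{2}} W_{\frac{[ns]}{n}}\right) dW_s := T_n.$$

Let

$$T_t^n = n^{\frac{1}{4} + \frac{\alpha}{2}} \int_0^t K\left(n^{\alpha + \frac{1}{2}} W_{\frac{[ns]}{n}}\right) dW_s.$$

Then $T_t^n$ is a martingale with respect to the filtration of $W$. We can show that $\langle T^n, W \rangle$ converges to zero in probability as $n \to \infty$. Indeed,

$$\langle T^n, W \rangle_t = n^{\frac{1}{4} + \frac{\alpha}{2}} \int_0^t K\left(n^{\alpha + \frac{1}{2}} W_{\frac{[ns]}{n}}\right) ds$$

and this clearly goes to zero using formula (1.5). It is not difficult to see that the convergence is uniform on compact sets. On the other hand, $\langle T^n \rangle_1$ converges to $\left(\int_{\mathbb{R}} K^2(y)\,dy\right) L^{W}(1, 0)$ in $L^2(\Omega)$ by Lemma 2. The result follows immediately from the asymptotic Knight theorem (see [12], Theorem 2.3 page 524, see also [4]).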
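A small Monte Carlo experiment can illustrate the normalization in Theorem 1. The sketch below simulates $S_n$ from (1.2) on independent Brownian paths and compares the empirical second moment of $n^{\frac{\alpha}{2} - \frac{1}{4}} S_n$ with the exact value obtained in the proof of Lemma 1. The function name `simulate_S`, the seed, and the sample sizes are assumptions made for illustration, not part of the paper.

```python
import numpy as np

def K(x):
    # Standard Gaussian kernel (1.4).
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def simulate_S(n, alpha, paths, rng):
    # Each row holds the increments W_{i+1} - W_i, i = 0, ..., n-1, of one path.
    increments = rng.standard_normal((paths, n))
    # W_0, ..., W_{n-1} (with W_0 = 0): cumulative sum shifted by one step.
    W = np.cumsum(increments, axis=1) - increments
    return np.sum(K(n ** alpha * W) * increments, axis=1)

rng = np.random.default_rng(0)        # fixed seed for reproducibility
n, alpha, paths = 400, 0.25, 4000     # illustrative sizes
scaled = n ** (alpha / 2.0 - 0.25) * simulate_S(n, alpha, paths, rng)

empirical = np.mean(scaled ** 2)
i = np.arange(n)
exact = n ** (alpha - 0.5) * np.sum((1.0 + 2.0 * n ** (2.0 * alpha) * i) ** -0.5) / (2.0 * np.pi)
print(empirical, exact, np.sqrt(2.0) / (2.0 * np.pi))
```

The last printed value is the constant $C$ of Lemma 1. It also equals the variance $\frac{1}{2\sqrt{\pi}}\, E L^{W}(1, 0)$ of the limit in Theorem 1, because $L^{W}(1, 0)$ has the same law as $|W_1|$, whose mean is $\sqrt{2/\pi}$.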


1.3 The multiparameter settings

This part concerns the two-parameter model (1.3) defined in the introduction. Let $W^{(1)}$ and $W^{(2)}$ denote two independent Wiener sheets on a probability space $(\Omega, \mathcal{F}, P)$. Recall that a Brownian sheet $(W_{u,v})_{u,v \ge 0}$ is defined as a centered two-parameter Gaussian process with covariance function

$$E\left(W_{s,t} W_{u,v}\right) = (s \wedge u)(t \wedge v)$$

for every $s, t, u, v \ge 0$. The model (1.3) leads to the study of the sequence

$$T_n = \sum_{i,j=0}^{n-1} K\left(n^{\alpha} W^{(1)}_{i,j}\right)\left(W^{(2)}_{i+1,j+1} - W^{(2)}_{i+1,j} - W^{(2)}_{i,j+1} + W^{(2)}_{i,j}\right). \tag{1.10}$$
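To make the objects in (1.10) concrete, the sketch below builds a Brownian sheet on the integer grid by summing independent standard normal variables attached to the unit squares, recovers its rectangular increments, and evaluates $T_n$ for one realization. The helper names `brownian_sheet` and `rectangular_increments`, the seed, and the grid size are assumptions made for illustration.

```python
import numpy as np

def K(x):
    # Standard Gaussian kernel (1.4).
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def brownian_sheet(n, rng):
    # noise[i, j] is the N(0, 1) mass of the unit square [i, i+1] x [j, j+1].
    noise = rng.standard_normal((n, n))
    sheet = np.zeros((n + 1, n + 1))  # sheet[i, j] = W_{i,j}; zero on the axes
    sheet[1:, 1:] = np.cumsum(np.cumsum(noise, axis=0), axis=1)
    return sheet, noise

def rectangular_increments(sheet):
    # W_{i+1,j+1} - W_{i+1,j} - W_{i,j+1} + W_{i,j} for i, j = 0, ..., n-1.
    return sheet[1:, 1:] - sheet[1:, :-1] - sheet[:-1, 1:] + sheet[:-1, :-1]

rng = np.random.default_rng(1)        # fixed seed for reproducibility
n, alpha = 50, 0.25                   # illustrative values
W1, _ = brownian_sheet(n, rng)        # regressor sheet W^(1)
W2, noise2 = brownian_sheet(n, rng)   # independent error sheet W^(2)

e = rectangular_increments(W2)
T_n = np.sum(K(n ** alpha * W1[:-1, :-1]) * e)
print(T_n)
```

With this construction $W_{i,j}$ is a sum of $ij$ independent standard normal variables, so its variance is $ij$, in agreement with the covariance function recalled above.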

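As in the previous section, we will first give the renormalization of the $L^2$ norm of $T_n$ as $n \to \infty$.

Proposition 2. We have

$$E\left(n^{\frac{\alpha - 1}{2}} T_n\right)^2 \to_{n \to \infty} \frac{\sqrt{2}}{\pi}.$$

Proof: By the independence of $W^{(1)}$ and $W^{(2)}$ and by the independence of the increments of the Brownian sheet $W^{(2)}$ we have, using (1.5),

$$E T_n^2 = \sum_{i,j=0}^{n-1} E\left(K^2\left(n^{\alpha} W^{(1)}_{i,j}\right)\right) = \frac{1}{2\pi} \sum_{i,j=0}^{n-1} E\left(e^{-n^{2\alpha}\left(W^{(1)}_{i,j}\right)^2}\right) = \frac{1}{2\pi} \sum_{i,j=0}^{n-1} \frac{1}{\sqrt{1 + 2 n^{2\alpha} ij}}$$

and the conclusion follows because $\sum_{i=1}^{n-1} \frac{1}{\sqrt{i}}$ behaves, when $n \to \infty$, as $2\sqrt{n}$.

We will first study the "bracket" $\langle T \rangle_n = \sum_{i,j=0}^{n-1} K^2\left(n^{\alpha} W^{(1)}_{i,j}\right)$, which is in some sense the analogue of the bracket of $S_n$ defined in the one-dimensional model. For simplicity, we will still use the notation $\langle T \rangle_n$ even if it is no longer a true martingale bracket (the stochastic calculus for two-parameter martingales is more complex, see e.g. [8]). By the scaling property of the Brownian sheet, the sequence $n^{\alpha - 1} \langle T \rangle_n$ has the same distribution as

$$n^{\alpha - 1} \sum_{i,j=0}^{n-1} K^2\left(n^{\alpha + 1} W^{(1)}_{\frac{i}{n}, \frac{j}{n}}\right).$$

Note that for every $u, v \ge 0$ we can write

$$\sqrt{\pi}\, n^{\alpha + 1} K^2\left(n^{\alpha + 1} W^{(1)}_{u,v}\right) = \frac{1}{2}\, p_{\frac{1}{2 n^{2(1 + \alpha)}}}\left(W^{(1)}_{u,v}\right).$$

As a consequence, $n^{\alpha - 1} \langle T \rangle_n$ has the same law as

$$\frac{1}{2\sqrt{\pi}} \cdot \frac{1}{n^2} \sum_{i,j=0}^{n-1} p_{\frac{1}{2 n^{2(1 + \alpha)}}}\left(W^{(1)}_{\frac{i}{n}, \frac{j}{n}}\right).$$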

The limit of the sequence $n^{\alpha - 1} \langle T \rangle_n$ involves the local time of the Brownian sheet $W^{(1)}$. This local time can be defined as in the one-dimensional case. More precisely, for any $s, t \ge 0$ and $x \in \mathbb{R}$ the local time $L^{W^{(1)}}((s, t), x)$ is defined as the density of the occupation measure (see [1], [3])

$$\mu_{s,t}(A) = \int_0^t \int_0^s 1_A\left(W^{(1)}_{u,v}\right) du\,dv, \qquad A \in \mathcal{B}(\mathbb{R}),$$

and it satisfies the occupation time formula: for any measurable function $f$,

$$\int_0^t \int_0^s f\left(W^{(1)}_{u,v}\right) du\,dv = \int_{\mathbb{R}} L^{W^{(1)}}((s, t), x) f(x)\,dx. \tag{1.11}$$

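The following lemma is the two-dimensional counterpart of Lemma 2.

Lemma 3. The sequence $\frac{1}{n^2} \sum_{i,j=0}^{n-1} p_{\frac{1}{2 n^{2\alpha + 2}}}\left(W^{(1)}_{\frac{i}{n}, \frac{j}{n}}\right)$ converges in $L^2(\Omega)$, as $n \to \infty$, to $L^{W^{(1)}}(\mathbf{1}, 0)$, where $L^{W^{(1)}}(\mathbf{1}, 0)$ denotes the local time of the Brownian sheet $W^{(1)}$ and $\mathbf{1} = (1, 1)$.

Proof: This proof follows the lines of the proof of Lemma 2. Since $\int_0^1 \int_0^1 p_{\varepsilon}\left(W^{(1)}_{u,v}\right) du\,dv$ converges to $L^{W^{(1)}}(\mathbf{1}, 0)$ as $\varepsilon \to 0$ (in $L^2(\Omega)$ and almost surely), it suffices to check that

$$J_n := E\left(\int_0^1 \int_0^1 \left(p_{\alpha_n}\left(W^{(1)}_{u,v}\right) - p_{\alpha_n}\left(W^{(1)}_{\frac{[un]}{n}, \frac{[vn]}{n}}\right)\right) du\,dv\right)^2$$

converges to zero as $n \to \infty$ with $\alpha_n = \frac{1}{2} n^{-2\alpha - 2}$. This follows from the formula, for every $a \ge u$ and $b \ge v$,

$$E\left(p_{\varepsilon}\left(W^{(1)}_{a,b}\right) p_{\varepsilon}\left(W^{(1)}_{u,v}\right)\right) = E\left(p_{\varepsilon}\left(W^{(1)}_{u,v}\right) p_{\varepsilon + ab - uv}\left(W^{(1)}_{u,v}\right)\right)$$

and relation (1.9).

Let us now state our main result of this section.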

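Theorem 2. As $n \to \infty$, the sequence $n^{\frac{\alpha}{2} - \frac{1}{2}} T_n$ converges in distribution to

$$\left(c_0\, L^{W^{(1)}}(\mathbf{1}, 0)\right)^{\frac{1}{2}} Z,$$

where $L^{W^{(1)}}(\mathbf{1}, 0)$ is the local time of the Brownian sheet $W^{(1)}$, $c_0 = \frac{1}{2\sqrt{\pi}}$ and $Z$ is a standard normal random variable independent of $W^{(1)}$.

Proof: We will compute the characteristic function of $T_n$. Let $\lambda \in \mathbb{R}$. Since the conditional law of $T_n$ given $W^{(1)}$ is Gaussian with variance $\sum_{i,j=0}^{n-1} K^2\left(n^{\alpha} W^{(1)}_{i,j}\right)$, we can write

$$E\left(e^{i\lambda n^{\frac{\alpha}{2} - \frac{1}{2}} T_n}\right) = E\left(E\left(e^{i\lambda n^{\frac{\alpha}{2} - \frac{1}{2}} T_n} \,\middle|\, W^{(1)}\right)\right) = E\left(e^{-\frac{\lambda^2}{2} n^{\alpha - 1} \sum_{i,j=0}^{n-1} K^2\left(n^{\alpha} W^{(1)}_{i,j}\right)}\right) = E\left(e^{-\frac{\lambda^2}{2} n^{\alpha - 1} \langle T \rangle_n}\right).$$

By the scaling property of the Brownian sheet,

$$n^{\alpha - 1} \langle T \rangle_n =^{(d)} n^{\alpha - 1} \sum_{i,j=0}^{n-1} K^2\left(n^{\alpha + 1} W^{(1)}_{\frac{i}{n}, \frac{j}{n}}\right) = \frac{1}{2\sqrt{\pi}} \cdot \frac{1}{n^2} \sum_{i,j=0}^{n-1} p_{\frac{1}{2 n^{2(1 + \alpha)}}}\left(W^{(1)}_{\frac{i}{n}, \frac{j}{n}}\right).$$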

The result follows from Lemma 3.

Remark 2. A remark similar to Remark 1 holds in the two-parameter setting. Indeed, the basic idea of the result is that

$$n^{\alpha - 1} \langle T \rangle_n =^{(d)} n^{\alpha - 1} \sum_{i,j=0}^{n-1} K^2\left(n^{\alpha + 1} W^{(1)}_{\frac{i}{n}, \frac{j}{n}}\right) \sim n^{\alpha + 1} \int_0^1 \int_0^1 K^2\left(n^{\alpha + 1} W^{(1)}_{u,v}\right) du\,dv = n^{\alpha + 1} \int_{\mathbb{R}} K^2\left(n^{\alpha + 1} x\right) L^{W^{(1)}}(\mathbf{1}, x)\,dx = \int_{\mathbb{R}} K^2(y)\, L^{W^{(1)}}\left(\mathbf{1}, \frac{y}{n^{\alpha + 1}}\right) dy \to_{n \to \infty} \int_{\mathbb{R}} K^2(y)\,dy\; L^{W^{(1)}}(\mathbf{1}, 0)$$

by using (1.11) and the bicontinuity of the local time.

As a final remark, let us mention that the above result and the model (1.3) can be relatively easily extended to the case of $N$-parameter Brownian motion, with $N \ge 2$.

References

1. S. Berman (1973): Local nondeterminism and local times of Gaussian processes. Indiana Univ. Math. J. 23, 69-94.
2. S. Bourguin and C.A. Tudor (2010): Asymptotic theory for fractional regression models via Malliavin calculus. Journal of Theoretical Probability, to appear.
3. D. Geman and J. Horowitz (1980): Occupation densities. Ann. Probab. 8, 1-67.
4. Yaozhong Hu and D. Nualart (2009): Stochastic integral representation of the L2 modulus of local time and a central limit theorem. Electronic Communications in Probability, 14, 529-539.
5. H.A. Karlsen and D. Tjostheim (2001): Nonparametric estimation in null recurrent time series. The Annals of Statistics, 29, 372-416.
6. H.A. Karlsen, T. Myklebust and D. Tjostheim (2007): Non parametric estimation in a nonlinear cointegrated model. The Annals of Statistics, 35, 252-299.
7. D. Nualart (2006): Malliavin calculus and related topics, 2nd ed. Springer.
8. D. Nualart (1984): On the Quadratic Variation of Two-Parameter Continuous Martingales. Ann. Probab. 12(2), 445-457.
9. D. Nualart and J. Vives: Chaos expansion and local time. Publicacions Matemàtiques, 36, 827-836.
10. J.Y. Park and P.C.B. Phillips (2001): Nonlinear regression with integrated time series. Econometrica, 74, 117-161.
11. P.C.B. Phillips (1988): Regression theory for near-integrated time series. Econometrica, 56, 1021-1044.
12. D. Revuz and M. Yor (1999): Continuous martingales and Brownian motion. Springer.
13. Q. Wang and P. Phillips (2009): Asymptotic Theory for the local time density estimation and nonparametric cointegrated regression. Econometric Theory, 25, 710-738.
14. Q. Wang and P. Phillips (2009): Structural Nonparametric cointegrating regression. Econometrica, 77(6), 1901-1948.
