Journal of Econometrics 140 (2007) 529–573 www.elsevier.com/locate/jeconom

Efficient estimation of general dynamic models with a continuum of moment conditions

Marine Carrasco (Université de Montréal, Montréal, QC, Canada H3C 3J7), Mikhail Chernov (Columbia Graduate School of Business, New York, NY 10027, USA), Jean-Pierre Florens (University of Toulouse, 31000 Toulouse, France), Eric Ghysels (University of North Carolina, Chapel Hill, NC 27599, USA)

Available online 1 September 2006

Abstract

There are two difficulties with the implementation of characteristic function-based estimators. First, the optimal instrument yielding ML efficiency depends on the unknown probability density function. Second, the need to use a large set of moment conditions leads to singularity of the covariance matrix. We resolve the two problems in the framework of GMM with a continuum of moment conditions. A new optimal instrument relies on double indexing and, as a result, has a simple exponential form. The singularity problem is addressed via a penalization term. We introduce HAC-type estimators for non-Markov models. A simulated method of moments is proposed for nonanalytical cases.
© 2006 Elsevier B.V. All rights reserved.

JEL classification: C13; C22; G12

Keywords: Characteristic function; Efficient estimation; Affine models

1. Introduction

This paper proposes a general estimation approach which combines the attractive features of method of moments estimation with the efficiency of the maximum likelihood

E-mail addresses: [email protected] (M. Carrasco), [email protected] (M. Chernov), [email protected] (J.-P. Florens), [email protected] (E. Ghysels).
0304-4076/$ - see front matter © 2006 Elsevier B.V. All rights reserved.
doi:10.1016/j.jeconom.2006.07.013


estimator (MLE) in one framework. The method exploits the moment conditions computed via the characteristic function (CF) of a stochastic process instead of the likelihood function, as in the recent work by Chacko and Viceira (2005), Jiang and Knight (2002), and Singleton (2001). The most obvious advantage of such an approach is that in many cases of practical interest the CF is available in analytic form, while the likelihood is not. Moreover, the CF contains the same information as the likelihood function. Therefore, a clever choice of moment conditions should provide the same efficiency as ML. The main contribution of this paper is the resolution of two major difficulties with estimation via the CF. The first one is related to the intuition that the more moments one generates by varying the CF argument, the more information one uses, and, therefore, the more efficient the estimator becomes. However, as one refines the finite grid of CF argument values, the associated covariance matrix approaches singularity. The second difficulty is that, in addition to a large set of CF-based moment conditions, one requires an optimal instrument to achieve ML efficiency. Prior work (e.g. Feuerverger and McDunnough, 1981, or Singleton, 2001) derived the optimal instrument, which is a function of the unknown probability density. Such an estimator is clearly hard to implement. We use the extension of GMM to a continuum of moment conditions (referred to as C-GMM) of Carrasco and Florens (2000). Instead of selecting a finite number of grid points, the whole continuum of moment conditions is used, guaranteeing the full efficiency of the estimator. To implement the optimal C-GMM estimator, it is necessary to invert the covariance operator, which is the analog of the covariance matrix in finite dimension. Because of the infinity of moments, the covariance operator is nearly singular and its inverse is highly unstable. To stabilize the inverse, we introduce a penalization parameter, $\alpha_T$.
This term may be thought of as the counterpart of the grid width in the discretization. We document the rate of convergence of $\alpha_T$ and give a heuristic method for selecting it via the bootstrap. In order to find an implementable optimal instrument, our paper provides various extensions of the initial work by Carrasco and Florens (2000). While the original work deals with iid data, we derive the asymptotic properties of the C-GMM estimator applied to weakly dependent data and correlated moment functions. The moment functions may be complex valued and may be functions of an index parameter taking its values in $\mathbb{R}^d$ for an arbitrary $d \geq 1$, in order to accommodate the specific features of the CF. To solve for the optimal instrument, we distinguish two cases depending on whether the observable variables are Markov or not. In the Markov case, the moment conditions are based on the conditional CF. We propose to span the unknown optimal instrument by an infinite basis consisting of simple exponential functions. Since the estimation framework already relies on a continuum of moment conditions, adding a continuum of spanning functions does not pose any problems. As a result, we achieve ML efficiency when we use the values of the conditional CF indexed by its argument as moment functions. We propose a simulated method of moments type estimator for the cases when the CF is unknown. If one is able to draw from the true conditional distribution, then the conditional CF can be estimated via simulations and ML efficiency obtains. This approach can be thought of as a simple alternative to the indirect inference proposed by Gouriéroux et al. (1993), the efficient method of moments (EMM) suggested by Gallant and Tauchen


(1996), and the non-parametric simulated maximum likelihood method (Fermanian and Salanié, 2004).¹ If the observations are not Markov, it is in general not possible to construct the conditional CF.² Therefore, we estimate our parameter using the joint CF of a particular number of data lags. The resulting moment function is not a martingale difference sequence. A remarkable feature of the joint CF is that the usual GMM first-stage estimator is not required. While we were not able to obtain optimal moment functions yielding ML efficiency in this case, we derived an upper bound on the variance of the resulting estimator. In the worst-case scenario, if one uses the joint CF for estimation, the variance of the C-GMM estimator corresponds to that of the ML estimator based on the joint density of the same data lags. As the joint CF is usually unknown, a simulated method of moments becomes handy for inference in the non-Markovian case. The simulation scheme differs from that used in the Markov case. Instead of simulating conditionally on the observable data, we simulate the full time series as is done in Duffie and Singleton (1993). The paper is organized as follows. Section 2 reviews issues related to estimation via the CF and discusses the estimator in the simplest case of moments forming martingale difference sequences. Section 3 extends the C-GMM proposed by Carrasco and Florens (2000) to the case where the moment functions are correlated. It shows how to estimate the long-run covariance and how to implement the C-GMM estimator in practice. Section 4 specializes to the cases where the moment conditions are based either on the conditional characteristic function (CCF) or the joint characteristic function (JCF). In Section 4.1, we derive the ML-efficient estimator based on the conditional CF. Then, in Section 4.2, we discuss the properties of the estimator based on the joint CF, which is relevant for processes with latent variables.
Section 5 establishes the properties of the simulation-based estimators when the CF is not available in analytic form. Finally, a Monte Carlo comparison of the C-GMM estimator with other popular estimators is reported in Section 6. The last section concludes. All regularity conditions are collected in Appendix A. All the proofs are provided in Appendix B.

2. Estimation methods based on the CF

In this section we discuss the major unresolved issues pertaining to estimation via the CF and explain how we propose to tackle them via GMM based on a continuum of moment conditions (C-GMM).

2.1. Preliminaries

We assume that the stationary Markov process $X_t$ is a $p \times 1$ vector of random variables which represents the data-generating process indexed by a finite-dimensional parameter $\theta \in \Theta \subset \mathbb{R}^q$. Suppose the sequence $X_t$, $t = 1, \ldots, T$ is observed. Singleton (2001) proposes

¹ Similarly, Altissimo and Mele (2005) have recently proposed a method to estimate diffusions efficiently. It consists in minimizing the distance between two kernel estimates of the conditional density, one based on the actual data and the other based on simulated data.
² Bates (2003) was able to construct a conditional MLE for certain types of affine models based on the CF.


an estimator based on the CCF. The CCF of $X_{t+1}$ given $X_t$ is defined as
$$\psi_\theta(s|X_t) \equiv E^\theta\left( e^{is'X_{t+1}} \mid X_t \right) \qquad (1)$$
and is assumed to be known. If it is not known, it can easily be recovered by simulations. Eq. (1) implies that the following unconditional moment conditions are satisfied:
$$E^\theta\left[ \left( e^{is'X_{t+1}} - \psi_\theta(s|X_t) \right) A(X_t) \right] = 0 \qquad \text{for all } s \in \mathbb{R}^p,$$

where $A(X_t)$ is an arbitrary instrument. Let $Y_t = (X_t, X_{t+1})'$. There are two issues of interest here: the choice of $s$ and the choice of the instrument $A(X_t)$. Besides being a function of $X_t$, $A$ may be a function of an index $r$ either equal to or different from $s$. The following two types of unconditional moment functions are of particular interest:

SI — the Single Index moment functions: $h(s, Y_t; \theta) = A(s, X_t)\left( e^{is'X_{t+1}} - \psi_\theta(s|X_t) \right)$, where $s \in \mathbb{R}^p$ and $A(-s, X_t) = \overline{A(s, X_t)}$;

DI — the Double Index moment functions: $h(\tau, Y_t; \theta) = A(r, X_t)\left( e^{is'X_{t+1}} - \psi_\theta(s|X_t) \right)$, where $\tau = (r, s)' \in \mathbb{R}^{2p}$ and $A(-r, X_t) = \overline{A(r, X_t)}$.
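To make these objects concrete, here is a small sketch (ours, not the paper's) for a Gaussian AR(1), where the CCF is available in closed form; the process and parameter values are illustrative assumptions:

```python
import numpy as np

def ccf_ar1(s, x, rho, sigma):
    """Conditional CF of X_{t+1} | X_t = x for a Gaussian AR(1):
    psi_theta(s | x) = exp(i*s*rho*x - sigma^2 * s^2 / 2)."""
    return np.exp(1j * s * rho * x - 0.5 * (sigma * s) ** 2)

def di_moment(r, s, x_t, x_next, rho, sigma):
    """Double-index moment function h(tau, Y_t; theta) with tau = (r, s):
    (e^{i s X_{t+1}} - psi_theta(s | X_t)) * e^{i r X_t}."""
    return (np.exp(1j * s * x_next) - ccf_ar1(s, x_t, rho, sigma)) * np.exp(1j * r * x_t)

rng = np.random.default_rng(0)
rho, sigma, T = 0.5, 1.0, 200_000
x = np.empty(T + 1)
x[0] = 0.0
for t in range(T):
    x[t + 1] = rho * x[t] + sigma * rng.standard_normal()

# At the true parameter, the sample moment should be close to zero
# for any choice of the indices (r, s).
h_bar = di_moment(0.7, 1.3, x[:-1], x[1:], rho, sigma).mean()
print(abs(h_bar))  # small, of order T^{-1/2}
```

Evaluating `h_bar` away from the true $(\rho, \sigma)$ instead produces a sample moment bounded away from zero, which is what identification requires.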

Note that in either case, the sequence of moment functions $\{h(\cdot, Y_t; \theta)\}$ is a martingale difference sequence with respect to the filtration $I_t = \{X_t, X_{t-1}, \ldots, X_1\}$, hence it is uncorrelated. We now discuss which choice of instruments $A$ is optimal, i.e. yields an efficient GMM-CCF estimator, where "efficient" means as efficient as the MLE.

2.2. Single index moment functions

Feuerverger and McDunnough (1981) and Singleton (2001) show that the optimal SI instrument is
$$A(s, X_t) = \frac{1}{(2\pi)^p} \int e^{-is'x}\, \frac{\partial \ln f_\theta}{\partial \theta}(x|X_t)\, dx. \qquad (2)$$
The obvious drawback of this instrument is that it requires knowledge of the unknown conditional likelihood function, $f_\theta$, of $X_{t+1}$ given $X_t$.³ Singleton (2001) addresses the problem of the unknown score by discretizing over $\tau$. To simplify the exposition, assume momentarily that $p = 1$. The method consists in dividing an interval $[-M\delta, M\delta] \subset \mathbb{R}$ into $(2M+1)$ equally spaced intervals of width $\delta$. Let $\tau_j = -M\delta + \delta(j-1)$, $j = 1, 2, \ldots, 2M+1$ be the grid points. Let $r(Y_t, \theta)$ be the $(2M+1)$-vector with $j$th element $\left( e^{i\tau_j X_{t+1}} - \psi_\theta(\tau_j|X_t) \right)$. Applying the results of Hansen (1985), the optimal instrument for this finite set of moment conditions is
$$A(X_t) = E[\nabla_\theta r \mid X_t]'\, E[r r' \mid X_t]^{-1} \qquad (3)$$
which can be explicitly computed as a function of $\psi_\theta$.⁴ Singleton shows that the estimator solving $(1/T)\sum_{t=1}^{T} A(X_t) r(Y_t, \theta) = 0$ has an asymptotic variance $V_\delta$ that converges to

³ There are certain parallels between the issues raised here and the estimation of univariate subordinated diffusions via an infinitesimal generator in Conley et al. (1997). They show that, assuming continuous sampling, constructing moment conditions by applying the generator to the likelihood score of the marginal distribution is optimal and, in particular, is more efficient than building moments via the score directly. Being unable to implement in practice the corresponding optimal instrument (or test function) for the discrete sampling case, they still use the score for the empirical application.
⁴ $A$ depends on the unknown $\theta$, but an estimator of $A$ can be obtained by replacing $\theta$ with a consistent first-step estimator.


the Cramér–Rao bound as $M$ approaches infinity and $\delta$ goes to zero. However, no rates of convergence for $M$ and $\delta$ are provided. In practice, if the grid is too fine ($\delta$ small), the covariance matrix $E[r r'|X_t]$ becomes singular and Singleton's estimator is not feasible. The second caveat is that the optimal instruments depend on the selected grid, i.e. as one refines the grid, new instruments have to be selected. Therefore, it is not clear how this affects the estimator in practice. In this paper, we address the two issues raised, (i) optimal selection of the instrument and (ii) potential covariance matrix singularity, without relying on the unknown probability density function.

2.3. Double index moment functions

When $r = s$ is not imposed, there is a choice of instrument that does not depend on the unknown pdf, while attaining ML efficiency. The optimal DI instrument is
$$A(r, X_t) = e^{ir'X_t}. \qquad (4)$$
It gives rise to a double index moment function
$$h_t(\tau; \theta) = \left( e^{is'X_{t+1}} - \psi_\theta(s|X_t) \right) e^{ir'X_t}, \qquad (5)$$
where $\tau = (r, s)' \in \mathbb{R}^{2p}$. Such a choice of instrument is quite intuitive. Although we cannot construct the optimal instrument in (2), we can span it via a set of basis functions $\{\exp(ir'X_t)\}$. The resulting GMM estimator will be as efficient as the MLE provided that the full continuum of moment conditions indexed by $\tau$ is used. For this purpose, we use the method proposed by Carrasco and Florens (2000). This approach has two advantages: (i) the instrument $\exp(ir'X_t)$ has a simple form; (ii) in contrast to Singleton's instrument (3), it does not depend on the discretization grid involved in the numerical implementation of integration over $\tau$.⁵ In the sequel we extend the C-GMM methodology of Carrasco and Florens (2000) so that it is applicable to the moment function (5).

3. C-GMM with dependent data

This section extends the results of Carrasco and Florens (2000) from the iid case to the case where the data are weakly dependent. We also allow the moment functions to be complex valued and to be functions of an index parameter taking its values in $\mathbb{R}^d$ for an arbitrary $d \geq 1$, in order to accommodate the specific features of the CF. The results of this section are not limited to estimation using the CF but apply to a wide range of moment conditions. The first subsection proves the asymptotic normality and consistency of the C-GMM estimator, and introduces the covariance operator and its regularized version, which is known to yield the C-GMM estimator with the smallest variance. The next subsection derives the convergence rate of the estimator of the covariance operator. The third subsection proposes a simple way to compute the C-GMM objective function in terms of matrices and vectors. The last subsection discusses the choice of moment conditions to achieve ML efficiency.

⁵ However, as we will see below, a smoothing parameter is introduced to be able to handle the full continuum of moments.


3.1. General asymptotic theory

The data are assumed to be weakly dependent (see Assumption A.1 for a formal definition). The C-GMM estimator is based on an arbitrary set of moment conditions:
$$E^{\theta_0} h_t(\tau; \theta_0) = 0, \qquad (6)$$
where $h_t(\tau; \theta) \equiv h(\tau, Y_t; \theta)$ with $Y_t = (X_t, X_{t+1}, \ldots, X_{t+L})'$ for some finite integer $L$, and index $\tau \in \mathbb{R}^d$.⁶ As a function of $\tau$, $h_t(\cdot; \theta_0)$ is supposed to belong to the set $L^2(\pi)$ described in Definition A.2. Moreover, all parameters are identified by the moment conditions (6); see Assumption A.3. Let $\hat{h}_T(\tau; \theta_0) = \frac{1}{T}\sum_{t=1}^{T} h_t(\tau; \theta_0)$. In the sequel, we write the functions $h_t(\cdot; \theta_0)$, $\hat{h}_T(\cdot; \theta_0)$ as $h_t(\theta_0)$ and $\hat{h}_T(\theta_0)$, or simply $h_t$ and $\hat{h}_T$. $\{h_t(\theta_0)\}$ is supposed to satisfy the set of Assumptions A.4; in particular, $h_t$ should be a measurable function of $Y_t$. Since $L$ is finite, $h_t$ inherits the mixing properties of $X_t$. Finally, $h_t$ is assumed to be scalar because the CF itself is scalar and hence we do not need results for a vector $h_t$. If $h_t$ is a vector, we can get back to a scalar function by defining $\tilde{h}_t(i, \tau)$ as the $i$th component of $h_t(\tau)$; then $\tilde{h}_t$ is a scalar function indexed by $(i, \tau) \in \{1, 2, \ldots, M\} \times \mathbb{R}^d$. These assumptions allow us to establish the asymptotic normality of the moment functions.

Lemma 3.1. Under regularity conditions A.1–A.3 and A.4(i)(ii), we have
$$\sqrt{T}\, \hat{h}_T(\theta_0) \Rightarrow \mathcal{N}(0, K) \quad \text{as } T \to \infty$$
in $L^2(\pi)$, where $\mathcal{N}(0, K)$ is the Gaussian random element of $L^2(\pi)$ with zero mean and covariance operator $K : L^2(\pi) \to L^2(\pi)$ satisfying
$$(Kf)(\tau) = \sum_{j=-\infty}^{+\infty} \int E^{\theta_0}\left[ h_1(\tau; \theta_0) \overline{h_j(\lambda; \theta_0)} \right] f(\lambda) \pi(\lambda)\, d\lambda \qquad (7)$$
for all $f$ in $L^2(\pi)$.⁷ Moreover, the operator $K$ is a Hilbert–Schmidt operator.⁸

We can now establish the standard properties of GMM estimators: consistency, asymptotic normality and optimality.

Proposition 3.1. Assume the regularity conditions A.1–A.4 hold. Moreover, let $B$ be a one-to-one bounded linear operator defined on $L^2(\pi)$ or a subspace of $L^2(\pi)$. Let $B_T$ be a sequence of random bounded linear operators converging to $B$. The C-GMM estimator
$$\hat{\theta}_T = \arg\min_{\theta \in \Theta} \| B_T \hat{h}_T(\theta) \|$$
has the following properties:

(1) $\hat{\theta}_T$ is consistent and asymptotically normal such that
$$\sqrt{T}(\hat{\theta}_T - \theta_0) \xrightarrow{L} \mathcal{N}(0, V)$$

⁶ In the previous section we discussed the case corresponding to $L = 1$.
⁷ Definition A.1 describes a Hilbert-space valued random element.
⁸ For a definition and the properties of Hilbert–Schmidt operators, see Dunford and Schwartz (1988). As $K$ is a Hilbert–Schmidt operator, it can be approached by a sequence of bounded operators denoted $K_T$. This property will become important when we discuss how to estimate $K$.


with
$$V = \langle B E^{\theta_0}(\nabla_\theta h), B E^{\theta_0}(\nabla_\theta h)' \rangle^{-1} \langle B E^{\theta_0}(\nabla_\theta h), (B K B^{*}) B E^{\theta_0}(\nabla_\theta h)' \rangle \langle B E^{\theta_0}(\nabla_\theta h), B E^{\theta_0}(\nabla_\theta h)' \rangle^{-1}.$$

(2) Among all admissible weighting operators $B$, there is one yielding an estimator with minimal variance. It is equal to $K^{-1/2}$, where $K$ is the covariance operator defined in (7).

As discussed in Carrasco and Florens (2000), the operator $K^{-1/2}$ does not exist on the whole space $L^2(\pi)$ but only on a subset, denoted $H(K)$, which corresponds to the so-called reproducing kernel Hilbert space (RKHS) associated with $K$ (see Parzen, 1970; Carrasco and Florens, 2004, for details). The inner product defined on $H(K)$ is denoted $\langle f, g \rangle_K \equiv \langle K^{-1/2} f, K^{-1/2} g \rangle$, where $f, g \in H(K)$. Since the inverse of $K$ is not bounded, a regularized version of the inverse, involving a penalizing term $\alpha_T$, is considered. Namely, the operator $K$ is replaced by a nearby operator that has a bounded inverse. For $\alpha_T > 0$, the equation
$$(K^2 + \alpha_T I)\, g = K f \qquad (8)$$
has a unique stable solution for each $f \in L^2(\pi)$. The Tikhonov regularized inverse of $K$ is given by $(K^{\alpha_T})^{-1} = (K^2 + \alpha_T I)^{-1} K$. In order to implement the C-GMM estimator with the optimal weighting operator, we have to estimate $K$, which can be done via a sequence of bounded operators $K_T$ approaching $K$ as the sample size grows, because $K$ is a Hilbert–Schmidt operator (see Lemma 3.1). We postpone the explicit construction of $K_T$ until the next subsection and first establish the asymptotic properties of the optimal C-GMM estimator for a given $K_T$.

Proposition 3.2. Assume the regularity conditions A.1–A.5 hold. Let $K_T$ denote a consistent estimator of $K$ that satisfies $\|K_T - K\| = O_p(T^{-a})$ for some $a \geq 0$, and let $(K_T^{\alpha_T})^{-1} = (K_T^2 + \alpha_T I)^{-1} K_T$ denote the regularized estimator of $K^{-1}$. The optimal GMM estimator of $\theta$ is obtained by
$$\hat{\theta}_T = \arg\min_{\theta \in \Theta} \| (K_T^{\alpha_T})^{-1/2} \hat{h}_T(\theta) \| \qquad (9)$$
and satisfies $\hat{\theta}_T \xrightarrow{P} \theta_0$ and
$$\sqrt{T}(\hat{\theta}_T - \theta_0) \xrightarrow{L} \mathcal{N}\left( 0, \left( \langle E^{\theta_0}(\nabla_\theta h), E^{\theta_0}(\nabla_\theta h)' \rangle_K \right)^{-1} \right) \qquad (10)$$
as $T$ and $T^a \alpha_T$ go to infinity and $\alpha_T$ goes to zero.⁹

A simple estimator of the asymptotic variance of $\sqrt{T}(\hat{\theta}_T - \theta_0)$ will be discussed in Section 3.3. Proposition 3.2 gives a rate of convergence of $\alpha_T$ but does not indicate how to choose $\alpha_T$ in practice. Recall that the estimator will be consistent for any $\alpha_T > 0$, but its variance will be the smallest for $\alpha_T$ decreasing to zero at the right rate. Simulations in

⁹ Let $\theta = (\theta_1, \ldots, \theta_q)'$. By a slight abuse of notation, $\langle E^{\theta_0}(\nabla_\theta h), E^{\theta_0}(\nabla_\theta h)' \rangle_K$ in (10) denotes the $q \times q$ matrix with $(i, j)$ element $\langle E^{\theta_0}(\nabla_{\theta_i} h), E^{\theta_0}(\nabla_{\theta_j} h) \rangle_K$.
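The Tikhonov regularization in (8) has a direct finite-dimensional analogue. The following sketch (ours) contrasts a near-unregularized inverse with the regularized inverse $(K^2 + \alpha I)^{-1}K$ on an ill-conditioned kernel matrix; the kernel, noise level, and $\alpha$ are illustrative assumptions:

```python
import numpy as np

def tikhonov_apply_inverse(K, f, alpha):
    """Matrix analogue of Eq. (8): return (K^2 + alpha*I)^{-1} K f,
    the Tikhonov-regularized solution of K g = f (well-posed for alpha > 0)."""
    n = K.shape[0]
    return np.linalg.solve(K @ K + alpha * np.eye(n), K @ f)

rng = np.random.default_rng(1)
n = 60
grid = np.linspace(-3.0, 3.0, n)
# Smooth Gaussian kernel matrix: severely ill-conditioned, much like a
# discretized covariance operator.
K = np.exp(-0.5 * (grid[:, None] - grid[None, :]) ** 2)

g_true = np.sin(grid)
f = K @ g_true + 1e-6 * rng.standard_normal(n)     # "data" with slight noise

g_naive = np.linalg.pinv(K, rcond=1e-15) @ f        # near-unregularized inverse
g_reg = tikhonov_apply_inverse(K, f, alpha=1e-4)    # regularized inverse

print(np.linalg.norm(g_naive - g_true))  # noise-dominated
print(np.linalg.norm(g_reg - g_true))    # small and stable
```

Even a noise level of $10^{-6}$ wrecks the unregularized solution, while the penalized inverse trades a small bias for stability, which is exactly the role of $\alpha_T$ in the operator setting.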


Carrasco and Florens (2002) and in this paper (see Section 6) show that the estimator is not very sensitive to the choice of $\alpha_T$. Of course, a data-driven selection method for $\alpha_T$ would be preferable. Ideally, $\alpha_T$ should be selected so that it minimizes the mean-square error (MSE) of $\hat{\theta}_T$. Let $\hat{\theta}_T^{\alpha}$ be the C-GMM estimator for a given $\alpha$. Then we look for $\alpha_T$ such that
$$\alpha_T = \arg\min_{\alpha}\, E\left[ \| \hat{\theta}_T^{\alpha} - \theta_0 \|^2 \right].$$
There are two ways to estimate the unknown MSE. The first method consists in computing the MSE analytically using a second-order expansion as in Donald and Newey (2001). This is the approach taken in Carrasco and Florens (2002) in an iid context, and it may be very tedious in time series. The second approach consists in approximating the MSE by block bootstrap along the lines of Hall and Horowitz (1996). This second approach avoids the analytical derivation and is easier to implement.

3.2. Convergence rate of the estimator of the covariance operator

Note that the covariance operator defined in (7) is an integral operator that can be written as
$$Kf(\tau_1) = \int k(\tau_1, \tau_2) f(\tau_2) \pi(\tau_2)\, d\tau_2 \qquad (11)$$
with
$$k(\tau_1, \tau_2) = \sum_{j=-\infty}^{+\infty} E^{\theta_0}\left( h_t(\tau_1; \theta_0) \overline{h_{t-j}(\tau_2; \theta_0)} \right). \qquad (12)$$

The function $k$ is called the kernel of the integral operator $K$. We are interested in estimating the operator $K$. There are two cases of interest. In the first case, $\{h_t\}$ are martingale difference sequences of the form (5). Then the kernel of $K$ is particularly simple and can be estimated via
$$\hat{k}_T(\tau_1, \tau_2) = \frac{1}{T} \sum_{t=1}^{T} h_t(\tau_1; \hat{\theta}_T^1)\, \overline{h_t(\tau_2; \hat{\theta}_T^1)} \qquad (13)$$
given the first-step estimator $\hat{\theta}_T^1$. The resulting estimator will satisfy $\|K_T - K\| = O_p(T^{-1/2})$. In the second case, moment conditions are based on the CF of $Y_t$. Typically, we have
$$h_t(\tau; \theta) = e^{i\tau' Y_t} - \psi_\theta(\tau).$$
Moment functions of this type belong to a general class where
$$h_t(\tau; \theta) = w(\tau, Y_t) - E^\theta[w(\tau, Y_t)], \qquad (14)$$
where $w$ is an arbitrary function. To estimate $K$, we use a kernel estimator of the type studied by Andrews (1991), but we do not need a first-step estimator $\hat{\theta}_T^1$ because $E^\theta[w(\tau, Y_t)]$ can be estimated by the sample mean $\hat{w}_T(\tau) = (1/T)\sum_{t=1}^{T} w(\tau, Y_t)$ (Parzen, 1957).
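A sketch (ours) of a HAC-type estimate of the covariance kernel for CF-based moment functions of form (14), using Bartlett weights; conjugation and scaling conventions here follow common HAC practice and are not necessarily identical to the paper's:

```python
import numpy as np

def khat_bartlett(w_vals1, w_vals2, S_T):
    """HAC-type estimate of k(tau1, tau2) = sum_j Gamma_j from demeaned
    moment values w_valsK[t] = w(tauK, Y_t), using Bartlett weights
    omega(x) = max(1 - |x|, 0).  An illustrative sketch of a truncated,
    kernel-weighted sum of autocovariances."""
    T = len(w_vals1)
    d1 = w_vals1 - w_vals1.mean()
    d2 = w_vals2 - w_vals2.mean()
    total = (d1 * np.conj(d2)).mean()                   # j = 0 term
    for j in range(1, int(S_T) + 1):
        wgt = 1.0 - j / S_T
        if wgt <= 0:
            break
        gplus = (d1[j:] * np.conj(d2[:-j])).sum() / T   # ~ Gamma_T(j)
        gminus = (d1[:-j] * np.conj(d2[j:])).sum() / T  # ~ Gamma_T(-j)
        total += wgt * (gplus + gminus)
    return total

# AR(1) data and CF-based moment values w(tau, Y_t) = exp(i * tau * Y_t)
rng = np.random.default_rng(2)
T, rho = 50_000, 0.8
y = np.empty(T)
y[0] = 0.0
for t in range(T - 1):
    y[t + 1] = rho * y[t] + rng.standard_normal()

w1 = np.exp(1j * 0.5 * y)
k_hac = khat_bartlett(w1, w1, S_T=20)
k_nolag = (np.abs(w1 - w1.mean()) ** 2).mean()   # ignores serial correlation
print(k_hac.real, k_nolag)
# Under positive serial dependence the HAC value exceeds the no-lag value,
# illustrating why correlated moment functions need the long-run covariance.
```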


We define
$$\hat{k}_T(\tau_1, \tau_2) = \frac{T}{T-q} \sum_{j=-T+1}^{T-1} \omega\!\left( \frac{j}{S_T} \right) \hat{\Gamma}_T(j) \qquad (15)$$
with
$$\hat{\Gamma}_T(j) = \begin{cases} \dfrac{1}{T} \displaystyle\sum_{t=j+1}^{T} \left( w(\tau_1, Y_t) - \hat{w}_T(\tau_1) \right) \overline{\left( w(\tau_2, Y_{t-j}) - \hat{w}_T(\tau_2) \right)}, & j \geq 0, \\[8pt] \dfrac{1}{T} \displaystyle\sum_{t=-j+1}^{T} \left( w(\tau_1, Y_{t+j}) - \hat{w}_T(\tau_1) \right) \overline{\left( w(\tau_2, Y_t) - \hat{w}_T(\tau_2) \right)}, & j < 0, \end{cases} \qquad (16)$$

where $\omega$ is a kernel and $S_T$ is a bandwidth that will be allowed to diverge at a certain rate. The kernel $\omega$ is required to satisfy the regularity conditions A.6(i), which are based on the assumptions of Andrews (1991). Denote by $f(\lambda)$ the spectral density of $Y_t$ at frequency $\lambda$ and by $f^{(n)}$ its $n$th derivative at $\lambda = 0$. Denote $\omega_n = (1/n!)\,(d^n \omega(x)/dx^n)|_{x=0}$.

Proposition 3.3. (i) Let $\{h_t\}$ be a martingale difference sequence and $K_T$ be the integral operator with kernel (13) that depends on a first-step estimator $\hat{\theta}^1$ such that $\|\hat{\theta}^1 - \theta_0\|_E = O_p(T^{-1/2})$, where $\|\cdot\|_E$ denotes the Euclidean norm. Suppose that Assumptions A.1–A.5 and A.6(ii) hold. Then $\|K_T - K\| = O_p(T^{-1/2})$.

(ii) Let $h_t$ be given by (14) with $|w(\tau, Y_t)| < C$ for some constant $C$ independent of $\tau$. Assume that the regularity conditions A.1 to A.6(i) hold and that $S_T^{2n+1}/T \to \gamma \in (0, +\infty)$ for some $n \in (0, +\infty)$ for which $\omega_n, \|f^{(n)}\| < \infty$. Then the covariance operator with kernel (15) satisfies $\|K_T - K\| = O_p(T^{-n/(2n+1)})$.

For the Bartlett kernel, $n = 1$, and for the Parzen, Tukey–Hanning and QS kernels, $n = 2$. To obtain the result of Proposition 3.3, we have selected the value of $S_T$ that delivers the fastest rate for $K_T$. For this $S_T$, we then select $\alpha_T$ such that $T^a \alpha_T$ goes to infinity according to Proposition 3.2. Instead, we could have chosen $S_T$ and $\alpha_T$ simultaneously. However, from Proposition 3.2, it appears that the faster the rate for $K_T$, the faster the rate for $\alpha_T$, so our approach seems to guarantee the fastest rate for $\alpha_T$. Note that if $\{h_t\}$ are uncorrelated, $a = \frac{1}{2}$. When $\{h_t\}$ are correlated, the convergence rate of $K_T$ is slower, and accordingly the rate of convergence of $\alpha_T$ to zero is slower.

3.3. Simplified computation of the C-GMM estimator

Carrasco and Florens (2000) propose to write the objective function in terms of the eigenvalues and eigenfunctions of the operator $K_T^{\alpha_T}$. The computation of eigenvalues and eigenfunctions can be burdensome, particularly in large samples.
We propose here a simple expression of the objective function in terms of vectors and matrices. Note that $\hat{k}_T$ is a degenerate kernel that can be rewritten as
$$\hat{k}_T(\tau_1, \tau_2) = \frac{1}{T-q} \sum_{t=1}^{T} h_t(\tau_1; \hat{\theta}_T^1)\, U h_t(\tau_2; \hat{\theta}_T^1),$$


where
$$U h_t(\tau; \hat{\theta}_T^1) = \omega(0)\, \overline{h_t(\tau; \hat{\theta}_T^1)} + \sum_{j=1}^{T} \omega\!\left( \frac{j}{S_T} \right) \left( \overline{h_{t-j}(\tau; \hat{\theta}_T^1)} + \overline{h_{t+j}(\tau; \hat{\theta}_T^1)} \right),$$
using the convention that $h_t(\tau; \hat{\theta}_T^1) = 0$ if $t \leq 0$ or $t > T$.

Proposition 3.4. Solving (9) is equivalent to solving
$$\min_{\theta}\; \overline{w(\theta)}' \left[ \alpha_T I_T + C^2 \right]^{-1} v(\theta), \qquad (17)$$

where $C$ is a $T \times T$ matrix with $(t, l)$ element $c_{tl}/(T-q)$, $t, l = 1, \ldots, T$, $I_T$ is the $T \times T$ identity matrix, and $v = [v_1, \ldots, v_T]'$ and $w = [w_1, \ldots, w_T]'$ with
$$v_t(\theta) = \int U h_t(\tau; \hat{\theta}_T^1)\, \hat{h}_T(\tau; \theta)\, \pi(\tau)\, d\tau,$$
$$w_t(\theta) = \langle h_t(\tau; \hat{\theta}_T^1), \hat{h}_T(\tau; \theta) \rangle,$$
$$c_{tl} = \int U h_t(\tau; \hat{\theta}_T^1)\, h_l(\tau; \hat{\theta}_T^1)\, \pi(\tau)\, d\tau.$$
Note that in the case where the $\{h_t\}$ are uncorrelated, these formulas simplify: $U h_t = \bar{h}_t$, $v_t = w_t$, and $c_{tl} = \langle h_l(\hat{\theta}_T^1), h_t(\hat{\theta}_T^1) \rangle$. Similarly, an estimator of the asymptotic variance of $\sqrt{T}(\hat{\theta}_T - \theta_0)$ given in (10) can be computed in a simple way.
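The matrix form of the objective can be sketched as follows (uncorrelated case, with the integral over $\tau$ replaced by a quadrature grid; the toy model, grid, and normalizations are our illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def cgmm_objective(H, H1, pi_w, alpha):
    """Quadrature sketch of the Proposition 3.4-type objective
    w'(theta)[alpha*I + C^2]^{-1} v(theta) in the uncorrelated case.
    H[t, m] = h_t(tau_m; theta), H1[t, m] = h_t(tau_m; first-step theta),
    pi_w[m] = quadrature weight of pi at tau_m.  For simplicity we scale
    by 1/T rather than 1/(T - q)."""
    T = H.shape[0]
    hbar = H.mean(axis=0)                       # \hat h_T(tau; theta) on the grid
    A = np.conj(H1) * pi_w                      # maps tau-grid -> t-indexed vectors
    C = A @ H1.T / T                            # Hermitian T x T matrix of c_{tl}/T
    v = A @ hbar                                # v_t = w_t in the uncorrelated case
    sol = np.linalg.solve(alpha * np.eye(T) + C @ C, v)
    return (np.conj(v) @ sol).real              # nonnegative quadratic form

# Toy case: iid N(mu, 1) data with CF moments h_t(tau) = e^{i tau X_t} - e^{i tau mu - tau^2/2}
rng = np.random.default_rng(3)
x = rng.standard_normal(500) + 1.0              # true mu = 1
taus = np.linspace(-3.0, 3.0, 25)
pi_w = np.exp(-0.5 * taus ** 2)
pi_w /= pi_w.sum()                              # discretized Gaussian pi

def moments(mu):
    return np.exp(1j * np.outer(x, taus)) - np.exp(1j * taus * mu - 0.5 * taus ** 2)

H1 = moments(x.mean())                          # first-step estimate of mu
obj = [cgmm_objective(moments(m), H1, pi_w, alpha=1e-3) for m in (0.0, x.mean(), 2.0)]
print(obj)  # objective is smallest near the sample mean here
```

The point of the matrix form is that no eigendecomposition of the operator is needed: one linear solve of a $T \times T$ system evaluates the objective.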

Proposition 3.5. Suppose that the assumptions of Proposition 3.3 hold and that $T$ and $T^{n/(2n+1)} \alpha_T^{3/4}$ go to infinity while $\alpha_T$ goes to zero. Then a consistent estimator of the $q \times q$ matrix $\langle E^{\theta_0}(\nabla_\theta h), E^{\theta_0}(\nabla_\theta h) \rangle_K$ is given by
$$\langle \nabla_\theta \hat{h}_T(\hat{\theta}_T), (K_T^{\alpha_T})^{-1} \nabla_\theta \hat{h}_T(\hat{\theta}_T) \rangle = \frac{1}{T-q}\, \overline{w(\hat{\theta}_T)}' \left[ \alpha_T I_T + C^2 \right]^{-1} v(\hat{\theta}_T),$$
where $C$ is the $T \times T$ matrix defined in Proposition 3.4, $I_T$ is the $T \times T$ identity matrix, and $v = [v_1, \ldots, v_T]'$ and $w = [w_1, \ldots, w_T]'$ are $T \times q$ matrices with $(t, j)$ element
$$(v_t(\theta))_j = \int U h_t(\tau; \hat{\theta}_T^1)\, \nabla_{\theta_j} \hat{h}_T(\tau; \theta)\, \pi(\tau)\, d\tau,$$
$$(w_t(\theta))_j = \langle h_t(\tau; \hat{\theta}_T^1), \nabla_{\theta_j} \hat{h}_T(\tau; \theta) \rangle.$$

3.4. Efficiency

In Proposition 3.2, we saw that the asymptotic variance of $\hat{\theta}_T$ is $\left( \langle E^{\theta_0}(\nabla_\theta h), E^{\theta_0}(\nabla_\theta h) \rangle_K \right)^{-1}$. Using results on RKHS (see Carrasco and Florens, 2004, and references therein), it is possible to compute this term and hence to establish conditions under which this variance coincides with the Cramér–Rao efficiency bound. We consider arbitrary functions $h(\tau, Y_t; \theta_0)$ that satisfy the identification Assumption A.3 and where, as usual, $Y_t$ is the $(L+1)$-vector of random variables $Y_t = (X_t, X_{t+1}, \ldots, X_{t+L})'$. Let $L^2(Y_t)$ be the set of random variables of the form $g(Y_t)$ with $E^{\theta_0}[|g(Y_t)|^2] < \infty$. It is assumed that $h(\tau, Y_t; \theta_0)$


belongs to $L^2(Y_t)$. Let $\mathcal{S}$ be the set of all random variables that may be written as $\sum_{j=1}^{n} c_j h(\tau_j, Y_t; \theta_0)$ for an arbitrary integer $n$, real constants $c_1, c_2, \ldots, c_n$ and points $\tau_1, \ldots, \tau_n$ of $I$. Denote by $\bar{\mathcal{S}}$ its closure; $\bar{\mathcal{S}}$ contains all the elements of $\mathcal{S}$ and their limits in $L^2(Y_t)$-norm.

Proposition 3.6. Assume that the results of Proposition 3.1 hold. Then $\hat{\theta}_T$ is asymptotically as efficient as the MLE if and only if
$$\nabla_\theta \ln f_\theta(x_{t+L} \mid x_{t+L-1}, \ldots, x_t; \theta)\big|_{\theta=\theta_0} \in \bar{\mathcal{S}}.$$

A proof of this proposition is given in Carrasco and Florens (2004). It states that the GMM estimator is efficient if and only if the score belongs to the span of the moment conditions. This result is close to that of Gallant and Long (1997), who show that if the auxiliary model is rich enough to encompass the DGP, then the EMM estimator is asymptotically efficient. It is important to remark that $\pi$ does not affect the efficiency as long as $\pi > 0$ on $\mathbb{R}^d$. In small samples, however, the choice of $\pi$ might play a role.

4. C-GMM based on the CF

This section studies the properties of moment conditions (6) based on the conditional or joint CF. The first subsection will focus on Markov processes, while the second subsection will discuss mainly the non-Markovian case.

4.1. Using the CCF

Suppose an econometrician observes realizations of a Markov process $X \in \mathbb{R}^p$. The CCF of $X_{t+1}$, $\psi(s|X_t; \theta)$, defined in (1), is assumed to be known. We denote $\psi(s|X_t; \theta_0)$ by $\psi(s|X_t)$. Let $Y_t = (X_t, X_{t+1})'$. The next proposition establishes that the GMM estimator based on a well-chosen double index (DI) moment function achieves ML efficiency.

Proposition 4.1. Consider
$$h(\tau, Y_t; \theta) = e^{ir'X_t} \left( e^{is'X_{t+1}} - \psi_\theta(s|X_t) \right), \qquad (18)$$
with $\tau = (r, s)' \in \mathbb{R}^{2p}$, and denote by $K$ the covariance operator of $\{h(\cdot, Y_t; \theta)\}$. Suppose that Assumptions A.2, A.3, A.7, and A.8 hold. Then the optimal GMM estimator based on (18) satisfies $\hat{\theta}_T \xrightarrow{P} \theta_0$ and
$$\sqrt{T}(\hat{\theta}_T - \theta_0) \xrightarrow{L} \mathcal{N}(0, I_{\theta_0}^{-1})$$
as $T$ and $T^{1/2} \alpha_T$ go to infinity and $\alpha_T$ goes to zero. $I_{\theta_0}$ denotes the information matrix.

The efficiency resulting from moment functions (18) can be proved from Proposition 3.6. Indeed, $\bar{\mathcal{S}}$, the closure of the span of $\{h_t\}$, includes all functions in $L^2(Y_t)$, hence it also includes the score function. Alternatively, one can prove this result directly by computing the asymptotic variance of the GMM estimator and comparing it with the information matrix; see Eq. (B.21) in Appendix B. The intuition for the efficiency result is as follows. For the GMM estimator to be as efficient as the MLE, the moment conditions need to be sufficiently rich to recover the score. The DI moment functions with instruments defined in (4) span all functions in $L^2(Y_t)$, and the unknown score in particular.


Notice that since the moment functions are uncorrelated and the optimal instrument is known to have an exponential form, the computation of the terms $C$ and $v$ in the objective function (17) is simplified, and all elements involving the index $r$ can be computed analytically. Therefore, using the DI instrument does not introduce computational complications. We outline these computations here. Let $y_t = (x_{t+1}, x_t)$ and let $\hat{\pi}$ be the Fourier transform of $\pi$, defined as
$$\hat{\pi}(x_t, x_{t+1}) = \int e^{i(r x_t + s x_{t+1})}\, \pi(\tau)\, d\tau. \qquad (19)$$
Taking a product measure on $r$ and $s$, we have
$$\pi(\tau) = \pi(r, s) = \pi_r(r)\, \pi_s(s). \qquad (20)$$
If $\pi$ is the pdf of a bivariate normal variable $y$ with zero mean and diagonal variance $\Sigma$, then $\hat{\pi}(y) = \exp(-y'\Sigma y/2)$. Consider the moments of the type (18). An element of $v$ is computed as follows:
$$v_t(\theta) = \frac{1}{T} \sum_j \int \overline{h(y_t, \tau; \hat{\theta}_T^1)}\, h(y_j, \tau; \theta)\, \pi(\tau)\, d\tau$$
$$= \frac{1}{T} \sum_j \int \left( e^{-is x_{t+1}} - \overline{\psi_{\hat{\theta}_T^1}(s|x_t)} \right) e^{-ir x_t} \left( e^{is x_{j+1}} - \psi_\theta(s|x_j) \right) e^{ir x_j}\, \pi(\tau)\, d\tau$$
$$= \frac{1}{T} \sum_j \int e^{is(x_{j+1} - x_{t+1})} e^{ir(x_j - x_t)}\, \pi(\tau)\, d\tau - \frac{1}{T} \sum_j \int e^{i(s x_{j+1} + r(x_j - x_t))}\, \overline{\psi_{\hat{\theta}_T^1}(s|x_t)}\, \pi(\tau)\, d\tau$$
$$\quad - \frac{1}{T} \sum_j \int e^{i(-s x_{t+1} + r(x_j - x_t))}\, \psi_\theta(s|x_j)\, \pi(\tau)\, d\tau + \frac{1}{T} \sum_j \int \overline{\psi_{\hat{\theta}_T^1}(s|x_t)}\, \psi_\theta(s|x_j)\, e^{ir(x_j - x_t)}\, \pi(\tau)\, d\tau.$$
The first term is equal to $(1/T)\sum_j \hat{\pi}(x_j - x_t, x_{j+1} - x_{t+1})$. Given (20), the other terms involve
$$I_r \equiv \int e^{ir(x_j - x_t)}\, \pi_r(r)\, dr = \hat{\pi}(x_j - x_t, 0). \qquad (21)$$
Therefore, the second and third terms have the form
$$I_1 = I_r \int e^{isv}\, \psi_\theta(s|w)\, \pi_s(s)\, ds \qquad (22)$$
with opposite signs, and the last term is equal to
$$I_2 = I_r \int \overline{\psi_{\hat{\theta}_T^1}(s|x_t)}\, \psi_\theta(s|x_j)\, \pi_s(s)\, ds. \qquad (23)$$
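The one-dimensional integrals against the Gaussian weight $\pi_s$ can be handled by Gauss–Hermite quadrature. A sketch (ours), checked against the closed form available when $\psi$ is itself Gaussian; the parameter values are illustrative assumptions:

```python
import numpy as np

def gauss_weighted_cf_integral(psi, v, sigma_pi, n=40):
    """Evaluate I = int e^{i s v} psi(s) pi_s(s) ds with pi_s = N(0, sigma_pi^2),
    by Gauss-Hermite quadrature after the change of variables s = sqrt(2)*sigma_pi*u."""
    u, w = np.polynomial.hermite.hermgauss(n)
    s = np.sqrt(2.0) * sigma_pi * u
    return (w * np.exp(1j * s * v) * psi(s)).sum() / np.sqrt(np.pi)

# Check against the closed form for a Gaussian CCF psi(s) = exp(i*s*rho*w0 - s^2*sig^2/2)
rho, sig, w0, v, sigma_pi = 0.5, 1.0, 0.8, 0.3, 1.0
psi = lambda s: np.exp(1j * s * rho * w0 - 0.5 * (sig * s) ** 2)

num = gauss_weighted_cf_integral(psi, v, sigma_pi)
a = v + rho * w0
beta = sig ** 2 + 1.0 / sigma_pi ** 2
exact = np.exp(-a ** 2 / (2 * beta)) / np.sqrt(1.0 + sig ** 2 * sigma_pi ** 2)
print(abs(num - exact))  # quadrature error is tiny for this smooth, damped integrand
```

Because the Gaussian weight damps the oscillation of $e^{isv}$, a few dozen nodes already give near machine-precision accuracy; this is the computational payoff of the density $\pi$ emphasized in the text.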

The remaining integrals, which have to be evaluated numerically, can be characterized as multidimensional integrals over infinite integration regions with a Gaussian weight function $\pi$. The evaluation of such integrals is an important problem in the computation of quantum-mechanical matrix elements with Gaussian wave functions in physics. Hence a


[Fig. 1. Plot of real parts of integrands for computing the MLE and C-GMM estimators. We illustrate the degree of numerical effort involved in computing the integrals necessary for MLE estimation based on the Fourier inverse technique described in Singleton (2001) and for C-GMM estimation. The integrand is computed for the CIR model studied in Section 6, $dr_t = (\gamma - \kappa r_t)\,dt + \sigma\sqrt{r_t}\,dW_t$ with $(\gamma, \kappa, \sigma) = (0.02491, 0.00285, 0.0275)$, and evaluated at the point $(r_{t+1}, r_t) = (\gamma/\kappa, 0.5\gamma/\kappa) = (8.74, 4.37)$. CGMM1 (CGMM2) denotes the integrand $I_1$ in (22) ($I_2$ in (23)).]
Hence a plethora of fast and accurate numerical methods have been developed; see, e.g., Genz and Keister (1996).

Note that the integral $I_1$ in (22), evaluated at $(v, w) = (x_{t+1}, x_t)$, looks very similar to the Fourier inverse of the CF used in Singleton (2001, Eq. (14)) to construct the conditional density for MLE estimation. The presence of the density $\pi$ turns out to be critical in simplifying the numerical integration task. Fig. 1 compares the integrand used in Singleton with $I_1$ and $I_2$. It is clear that $\pi$ damps all the oscillating behavior of the integrand needed for MLE. The elements of the matrix $C$ can be computed similarly by replacing $\theta$ by $\hat\theta_T^1$.

4.2. Using the JCF

Many important models in finance involve latent factors, the most prominent example being the stochastic volatility (SV) model. In this case, the full system can be described by a Markov vector $(X_t, X_t^*)'$ consisting of observable and latent components. As a result, $X_t$ is most likely not Markov.10

10 Florens et al. (1993) give necessary and sufficient conditions for the marginal of a jointly Markov process to be itself Markov.


For non-Markovian processes, the CCF is usually unknown and difficult to estimate.11 On the other hand, the JCF, if not known, can be computed by simulations.12,13 Denote the JCF as
$$\psi_\theta^L(\tau) = E^\theta\big(e^{i\tau' Y_t}\big), \quad (24)$$
where $\tau = (\tau_0, \tau_1, \ldots, \tau_L)'$ and $Y_t = (X_t, X_{t+1}, \ldots, X_{t+L})'$. Feuerverger (1990) has considered this problem. His estimator is the solution to
$$\int \big(\psi_\theta^L(\tau) - \psi_T^L(\tau)\big)\,\varpi(\tau)\,d\tau = 0, \quad (25)$$
where $\psi_T^L(\tau)$ denotes the empirical JCF. For a special weighting function $\varpi$, which is very similar to (2), Feuerverger shows that the estimator is as efficient as the estimator which solves
$$\frac{1}{T}\sum_{t=1}^{T} \nabla_\theta \ln f_\theta(X_{t+L}\mid X_{t+L-1}, \ldots, X_t; \theta) = 0, \quad (26)$$
where $f_\theta(X_{t+L}\mid X_{t+L-1}, \ldots, X_t)$ is the true distribution of $X_{t+L}$ conditional on $X_t, \ldots, X_{t+L-1}$. This result holds even if the process $X_t$ is not Markovian of order $L$ (or less). If $X_t$ is Markovian of order $L$, then the variance of the resulting estimator is $I_\theta^{-1}(L)$ with
$$I_\theta(L) = E^\theta\big(\nabla_\theta \ln f_\theta(X_{t+L}\mid X_{t+L-1}, \ldots, X_t; \theta)^2\big), \quad (27)$$
which is the Cramér–Rao efficiency bound. If $X_t$ is not Markovian of order $L$, then the variance of the estimator has the usual sandwich form, because $\nabla_\theta \ln f_\theta(X_{t+L}\mid X_{t+L-1}, \ldots, X_t; \theta_0)$ is not a martingale difference sequence with respect to $\{X_{t+L}, \ldots, X_1\}$. This variance differs from $I_\theta^{-1}(L)$ and is greater than the Cramér–Rao efficiency bound. Note that (26) should not be confused with quasi-maximum likelihood estimation, because $f_\theta(X_{t+L}\mid X_{t+L-1}, \ldots, X_t; \theta)$ is the exact distribution conditional on a restricted information set.

Feuerverger (1990) notes that the estimator based on the JCF can be made arbitrarily efficient provided that "L (fixed) is sufficiently large", although no proof is provided. This argument is clearly valid when the process is Markovian of order $L$. However, in the non-Markovian case, the only feasible way to achieve efficiency would be to let $L$ go to infinity with the sample size at a certain (slow) rate; to the best of our knowledge, the question of the optimal rate has not been addressed in the literature. The implementation of such an approach might be problematic since, for $L$ too large, the lack of data to estimate the CF consistently might result in a $\hat\theta_T$ with undesirable properties. The approach of Feuerverger based on the JCF of essentially the full vector $(X_1, X_2, \ldots, X_T)$ is not realistic, because only one observation of this vector is available. Instead, we can avoid using the unknown weighting function $\varpi$ in (25) by considering a moment

11 Bates (2003) provides an elegant way to compute the conditional likelihood, exploiting the fact that the analytical form of the affine CCF allows for filtering in the frequency domain. However, it appears that his method is limited to cases where $\dim(X_t) = \dim(X_t^*) = 1$ due to computational burdens.
12 Jiang and Knight (2002) discuss examples of diffusion models for which the JCF is available in analytical form. Yu (2001) derives the JCF of a generalization of the Merton model with a self-exciting jump component.
13 Simulations are discussed in Section 5.


condition based on the JCF of $Y_t$:
$$h(\tau; Y_t, \theta) = e^{i\tau' Y_t} - \psi_\theta^L(\tau) \quad (28)$$
for some small $L = 0, 1, 2, \ldots$.14 Assume that the JCF is sufficient to identify the parameters. Now the moments $h(\tau; Y_t, \theta_0)$ are not a martingale difference sequence (even if $X_t$ is Markovian), and the kernel of $K$ is given by
$$k(\tau_1, \tau_2) = \sum_{j=-\infty}^{+\infty} E^{\theta_0}\big[h(\tau_1; Y_t, \theta_0)\,\overline{h(\tau_2; Y_{t-j}, \theta_0)}\big].$$
When $X_t$ is Markov of order $L$, the optimal GMM estimator is efficient, as stated below.

Proposition 4.2. Assume that $X_t$ is Markov of order $L$, that the assumptions of Proposition 3.3 hold, and that $T$ and $T^{\nu/(2\nu+1)}\alpha_T$ go to infinity and $\alpha_T$ goes to zero. Then the optimal GMM estimator using the moments (28) is as efficient as the MLE.

As the closure of the span of $\{h_t\}$ contains the score $\nabla_\theta \ln f_\theta(X_{t+L}\mid X_{t+L-1}, \ldots, X_t; \theta_0)$, the efficiency follows from Proposition 3.6. Note that if $X_t$ is Markov, it makes more sense to use moment conditions based on the CCF, because the resulting estimator, while being efficient, is easier to implement (as the $\{h_t\}$ are m.d.s.). If $X_t$ is not Markov, the GMM-JCF estimator will not be efficient. However, it might still have good properties if the temporal dependence dies out quickly.

As the computation of the optimal $K_T$ may be burdensome (it involves two smoothing parameters, $S_T$ and $\alpha_T$), one may decide to use a suboptimal weighting operator obtained by inverting the covariance operator without the autocorrelations. One interesting question is then: what is the resulting loss of efficiency? We can answer this question only partially, because we are not able to compute the variance of the optimal GMM-JCF estimator when $X_t$ is not Markov. However, we have a full characterization of the variance of the suboptimal GMM-JCF estimator. Assume that one ignores the autocorrelations and uses as weighting operator the inverse of the operator $\tilde K$ associated with the kernel
$$\tilde k(\tau_1, \tau_2) = E^{\theta_0}\big[h(\tau_1; Y_t, \theta_0)\,\overline{h(\tau_2; Y_t, \theta_0)}\big]. \quad (29)$$

Proposition 4.3. Assume that the assumptions of Proposition 4.2 hold. The asymptotic variance of the suboptimal GMM-JCF estimator $\hat\theta_T$ using (28) and (29) is the same as that of the estimator $\tilde\theta_T$ which is the solution of
$$\frac{1}{T}\sum_t \nabla_\theta \ln f_\theta(Y_t; \theta) = 0, \quad (30)$$
where $f_\theta(Y_t; \theta)$ is the exact joint distribution of $Y_t$.

Since using the efficient weighting matrix should result in a gain of efficiency, the asymptotic variance of $\tilde\theta_T$ (given in Appendix B) can be considered as an upper bound for the variance of the estimator obtained by using the optimal weighting operator, that is, $K^{-1}$. To illustrate the results of Proposition 4.3, consider first the case where $\{X_t\}$ is iid

14 Jiang and Knight (2002), in the particular case of an affine stochastic volatility model, arbitrarily base the instrument on the normal density and experiment with values of L from 1 to 5.


and $L = 1$. Then solving (30) is basically (for $T$ large) equivalent to solving
$$\frac{2}{T}\sum_t \nabla_\theta \ln f_\theta(X_t; \theta) = 0,$$
so that the resulting estimator $\hat\theta_T$ is efficient. Now turn to the case where $\{X_t\}$ is Markov of order 1 and again $L = 1$; then (30) is equivalent to
$$\frac{1}{T}\sum_t \nabla_\theta \ln f_\theta(X_t\mid X_{t-1}; \theta) + \frac{1}{T}\sum_t \nabla_\theta \ln f_\theta(X_{t-1}; \theta) = 0,$$
which will not deliver an efficient estimator in general.

5. Case where the CF is unknown

As pointed out by Singleton (2001), the CF is not always available in closed form, especially if the model involves unobserved latent variables, as in the stochastic volatility model. To deal with this case, he suggests using the simulated method of moments (SMM) along the lines of Duffie and Singleton (1993); see also Gouriéroux and Monfort (1996) for a review of SMM. In this section, we consider two ways to estimate the CF via simulations, depending on whether the observable variable is Markov or not.

Assume first that the observable random variable $X_t$ is Markov and that it is possible to draw from the conditional distribution of $X_{t+1}$ given $X_t$. This simulation scheme is called conditional simulation. The CCF is then estimated by an average over the simulated data. Assume now that the observable variable $X_t$ is not Markov, because of, e.g., the presence of unobserved state variables in the model. In this case, it is usually impossible to draw from the conditional distribution. However, it may be possible to simulate a full sequence of random variables that have the same joint distribution as $(X_1, \ldots, X_T)$. This simulation scheme is called path simulation. The JCF is then estimated using the simulated data.

The main difference in the properties of the two estimators is that in the first case the estimator is as efficient as the MLE when the number of simulated paths, $J$, goes to infinity, while in the second case, as $X_t$ is not Markov, the estimator will never reach the efficiency bound even if $J$ goes to infinity. A subsection is devoted to each case.

5.1. Conditional simulation

In this subsection, we assume that $X_t$ is a Markov process satisfying
$$X_{t+1} = H(X_t, \epsilon_t; \theta), \quad (31)$$
where $\epsilon_t$ is an iid sequence independent of $X_t$ with known distribution. For instance, $X_t$ may be the solution of a dynamic asset pricing model such as that presented by Duffie and Singleton (1993). If $X_t$ is a discretely sampled diffusion process, then $H$ in (31) can be obtained from an Euler discretization.15

15 However, there is a pitfall with this approach. When the number of discretization intervals per unit of time, $N$, is fixed, none of the $J$ simulated paths, $\tilde X_t^j$, is distributed as $X_t$, and the estimator $\hat\theta_T$ is biased. Broze et al. (1998) document the discretization bias of the indirect inference estimator and show that it vanishes when $N \to \infty$ and $J$


Moments based on the unknown CCF are used to estimate $\theta$. For a given $\theta$ and conditionally on $X_t$, we generate a sequence $\{\tilde X_{t+1|t}^{\theta,j},\ j = 1, 2, \ldots, J\}$ from
$$\tilde X_{t+1|t}^{\theta,j} = H(X_t, \tilde\epsilon_{j,t+1}; \theta),$$
where $\{\tilde\epsilon_{j,t}\}_{j,t}$ are identically and independently distributed as $\{\epsilon_t\}$. Note that the $\{\tilde X_{t+1|t}^{\theta,j}\}_j$ are iid conditionally on $X_t$ and distributed as $X_{t+1}\mid X_t$ when $\theta = \theta_0$. The moment conditions become
$$\tilde h_T^J(\tau; \theta) = \frac{1}{T}\sum_{t=1}^{T} e^{irX_t}\Big(e^{isX_{t+1}} - \frac{1}{J}\sum_{j=1}^{J} e^{is\tilde X_{t+1|t}^{\theta,j}}\Big),$$
where $\tau = (r, s)$. To facilitate the discussion, we introduce the following notation:
$$Y_t = (X_t, X_{t+1})',$$
$$h_t(\tau; \theta) = e^{irX_t}\big[e^{isX_{t+1}} - \psi_\theta(s\mid X_t)\big],$$
$$\tilde h(Y_t, \tilde X_{t+1|t}^{\theta,j}; \tau) = e^{irX_t}\big[e^{isX_{t+1}} - e^{is\tilde X_{t+1|t}^{\theta,j}}\big],$$
$$\tilde h_t^J(\tau; \theta) = \frac{1}{J}\sum_{j=1}^{J} \tilde h(Y_t, \tilde X_{t+1|t}^{\theta,j}; \tau) = e^{irX_t}\Big(e^{isX_{t+1}} - \frac{1}{J}\sum_{j=1}^{J} e^{is\tilde X_{t+1|t}^{\theta,j}}\Big).$$
The resulting $\{\tilde h_t^J(\tau; \theta_0)\}$ are a martingale difference sequence with respect to $\{X_t, X_{t-1}, \ldots, X_1\}$ and are therefore uncorrelated. Moreover,
$$E^\theta\big[\tilde h(Y_t, \tilde X_{t+1|t}^{\theta,j}; \tau)\mid Y_t\big] = h_t(\tau; \theta).$$
Let $K$ be the covariance operator associated with the kernel
$$k(\tau_1, \tau_2) = E^{\theta_0}\big[h_t(\tau_1; \theta_0)\,\overline{h_t(\tau_2; \theta_0)}\big] \quad (32)$$
and let $U$ be the operator with kernel
$$u(\tau_1, \tau_2) = E^{\theta_0}\big[(\tilde h_t^J - h_t)(\tau_1; \theta_0)\,\overline{(\tilde h_t^J - h_t)(\tau_2; \theta_0)}\big].$$
Denote $\tilde K$ the covariance operator of $\{\tilde h_t^J\}$, and let $\tilde K_T^{\alpha_T}$ be the regularized estimator of $\tilde K$. The GMM estimator associated with the moments $\tilde h^J$ is defined as
$$\tilde\theta_T = \arg\min_\theta \|\tilde h_T^J(\cdot; \theta)\|^2_{\tilde K_T^{\alpha_T}}.$$
Now we can state the efficiency result:

Proposition 5.1. Suppose that Assumptions A.2, A.3, A.7, A.8(i), A.9, and A.10 hold for $\tilde h_t^J$ and a fixed $J$. We have $\tilde\theta_T \xrightarrow{P} \theta_0$ and
$$\sqrt{T}(\tilde\theta_T - \theta_0) \xrightarrow{L} N\big(0, (\langle E^{\theta_0}(\nabla_\theta h), E^{\theta_0}(\nabla_\theta h)'\rangle_{\tilde K})^{-1}\big)$$

(footnote 15 continued) is fixed. In a recent paper, Detemple et al. (2002) study estimators of the conditional expectation of diffusions. They show that if $J$ is allowed to diverge too fast relative to $N$, then the bias of their estimator blows up. The same is likely to be true here. However, as there is no limitation on how finely we can discretize (besides computer precision), we assume that $N$ is chosen sufficiently large for the discretization bias to vanish.


as $T$ and $T^{1/2}\alpha_T$ go to infinity and $\alpha_T$ goes to zero. Moreover, $\tilde K = K + (1/J)U$ and we have the inequality
$$\langle E^{\theta_0}(\nabla_\theta h), E^{\theta_0}(\nabla_\theta h)'\rangle_{K + (1/J)U} \le \langle E^{\theta_0}(\nabla_\theta h), E^{\theta_0}(\nabla_\theta h)'\rangle_{K}. \quad (33)$$
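The construction of $\tilde h_t^J$, where the unknown CCF is replaced by an average over $J$ conditional draws, can be sketched in a few lines. This is our illustrative code, not the paper's implementation; the transition `H_ar1` is a toy Gaussian AR(1) stand-in for the model's $H$ in (31).

```python
import numpy as np

def h_tilde(tau, x_t, x_next, H, theta, J, rng):
    """Simulated DI moment: the unknown CCF psi_theta(s | X_t) is replaced
    by an average over J conditional draws X~_{t+1|t} = H(X_t, eps, theta).
    `H` is a user-supplied transition function (an assumption here)."""
    r, s = tau
    eps = rng.standard_normal(J)
    draws = H(x_t, eps, theta)               # J iid draws of X_{t+1} | X_t
    ccf_hat = np.exp(1j * s * draws).mean()  # simulated CCF at s
    return np.exp(1j * r * x_t) * (np.exp(1j * s * x_next) - ccf_hat)

# Toy transition: Gaussian AR(1), X_{t+1} = theta * X_t + eps.
def H_ar1(x, eps, theta):
    return theta * x + eps
```

Averaging `h_tilde` over $t$ gives $\tilde h_T^J(\tau;\theta)$; at the true parameter this average converges to zero, with the extra simulation noise contributing the $(1/J)U$ term in $\tilde K$.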

For $J$ large, the SMM estimator will be as efficient as the GMM-CCF estimator, which itself has been shown to reach the Cramér–Rao efficiency bound, because we have $\langle E^{\theta_0}(\nabla_\theta h), E^{\theta_0}(\nabla_\theta h)'\rangle_K = I_{\theta_0}$.

5.2. Path simulation

Assume that one can generate a sequence of r.v. $(\tilde X_1^\theta, \ldots, \tilde X_{J(T)}^\theta)$ such that the joint distribution of $(\tilde X_1^\theta, \ldots, \tilde X_{J(T)}^\theta)$ given $\theta$ and conditional on a starting value $\tilde X_0 = x_0$ is the same as that of $(X_1, \ldots, X_{J(T)})$ given $\theta$ and a starting value $X_0 = x_0$. This simulation scheme, advocated by Duffie and Singleton (1993), is typically used when $X_t$ is the marginal of a Markov process $Z_t$. For instance, $Z_t = (X_t, X_t^*)$ where $X_t^*$ is a latent variable, e.g. the volatility in a stochastic volatility model, and only $X_t$ is observable. In such cases, it is usually unknown how to draw from the conditional distribution of $X_t$. Moreover, even though the full system $Z_t$ is Markov, $X_t$ itself is usually not Markov. Therefore, there is no hope of reaching the Cramér–Rao efficiency bound using the JCF when $L$ is fixed, as discussed in Section 4.2.

We briefly explain how one can implement a path simulation. Assume, for instance, that $Z_t$ is the solution of the recursion (31). For a given $\theta$, we generate a sequence $\{Z_j^\theta,\ j = 1, 2, \ldots, J(T)\}$ from
$$Z_{j+1}^\theta = H(Z_j^\theta, \tilde\epsilon_{j+1}; \theta), \qquad Z_0^\theta = z_0, \quad (34)$$
where $\{\tilde\epsilon_j\}$ are identically and independently distributed as $\{\epsilon_t\}$, $z_0$ is some arbitrary starting value, and the number of simulations $J(T)$ goes to infinity with $T$. A simulator, $\tilde X_j^\theta$, of $X$ is the first component of $Z_j^\theta$.

Contrary to the simulation scheme in the previous subsection, the sequence $\{\tilde X_j^\theta\}$ is completely independent of the observations $\{X_t\}$. Note that, as the starting value $x_0$ is not drawn from the stationary distribution of $X_t$, the sequence $\{\tilde X_j^\theta\}$ is in general not stationary. We assume that $X_t$, and consequently $\{\tilde X_j^\theta\}$, are $\beta$-mixing with exponential decay, which guarantees that $\tilde X_j^\theta$ becomes stationary exponentially fast. Hence the initial starting value will not affect the distribution of our estimator.

The JCF of $Y_t = (X_t, X_{t+1}, \ldots, X_{t+L})'$, as defined in (24), is assumed to be unknown and will be estimated via simulations. Let $\tilde Y_j = (\tilde X_j^\theta, \ldots, \tilde X_{j+L}^\theta)'$. The estimation procedure is based on
$$\tilde h_T(\tau; \theta) = \frac{1}{T}\sum_{t=1}^{T} e^{i\tau' Y_t} - \frac{1}{J(T)}\sum_{j=1}^{J(T)} e^{i\tau' \tilde Y_j} \equiv \frac{1}{T}\sum_{t=1}^{T} \tilde h_t(\tau; \theta).$$
If $\psi_\theta^L$ were known, the following moment conditions would be used:
$$h_T(\tau; \theta) = \frac{1}{T}\sum_{t=1}^{T} \big(e^{i\tau' Y_t} - \psi_\theta^L(\tau)\big) \equiv \frac{1}{T}\sum_{t=1}^{T} h_t(\tau; \theta).$$
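The path-simulation moment $\tilde h_T(\tau;\theta)$, the difference between two empirical JCFs, can be sketched as follows. The code and the names `blocks` and `h_path` are ours; generating the simulated path from the recursion (34) is left to the user.

```python
import numpy as np

def blocks(x, L):
    """Overlapping blocks Y_t = (x_t, ..., x_{t+L}) stacked as rows."""
    return np.column_stack([x[j:len(x) - L + j] for j in range(L + 1)])

def h_path(tau, x_obs, x_sim, L):
    """Path-simulation moment: empirical JCF of the observed series minus
    the empirical JCF of one long simulated path (sketch; the simulated
    path comes from the recursion Z_{j+1} = H(Z_j, eps, theta))."""
    tau = np.asarray(tau)
    jcf_obs = np.exp(1j * blocks(x_obs, L) @ tau).mean()
    jcf_sim = np.exp(1j * blocks(x_sim, L) @ tau).mean()
    return jcf_obs - jcf_sim
```

At $\theta = \theta_0$, both averages estimate the same JCF, so $\tilde h_T(\tau;\theta_0)$ converges to zero; the relative lengths of the two series drive the $(1+\zeta)$ variance inflation in Proposition 5.2.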


Note that the $\{h_t(\tau; \theta)\}$ are not a martingale difference sequence and are autocorrelated. Therefore $K$, the covariance operator associated with $\{h_T(\tau; \theta)\}$, has a more complicated expression than in the previous subsection:
$$k(\tau_1, \tau_2) = \sum_{i=-\infty}^{+\infty} E^{\theta_0}\big[(e^{i\tau_1' Y_t} - \psi_\theta^L(\tau_1))\,\overline{(e^{i\tau_2' Y_{t-i}} - \psi_\theta^L(\tau_2))}\big].$$
We estimate $K$ using the kernel estimator $K_T$ described in (15) and (16), where $\psi_\theta^L(\tau_1)$ is estimated using the observations $Y_t$. Let $K_T^{\alpha_T}$ be the regularized version of $K_T$. The GMM estimator associated with the moments $\tilde h$ is defined as
$$\tilde\theta_T = \arg\min_\theta \|\tilde h_T(\cdot; \theta)\|^2_{K_T^{\alpha_T}}.$$
Note that $\tilde X_{j+1}^\theta$ depends on $\theta$ through the past history of $\{\tilde X_j^\theta\}$. Sufficient conditions for the uniform weak law of large numbers of $\tilde h_T(\cdot; \theta)$ are discussed in Duffie and Singleton (1993). Let $T/J(T)$ converge to $\zeta$ as $T$ goes to infinity. Then, under the additional mixing property of $X_t$ (Assumption A.11), we have the following result:

Proposition 5.2. Suppose that Assumptions A.2-A.6(i) (with $\tilde h_t$ replacing $h_t$, and where $E^{\theta_0}$ denotes the expectation with respect to the stationary distribution of $Y_t$) and A.11 hold. Let $K_T$ be the kernel estimator of $K$ with kernel $\omega$ and bandwidth $S_T$ satisfying the conditions of Proposition 3.3(ii). Then $\tilde\theta_T \xrightarrow{P} \theta_0$ and
$$\sqrt{T}(\tilde\theta_T - \theta_0) \xrightarrow{L} N\big(0, (1+\zeta)(\langle E^{\theta_0}(\nabla_\theta h), E^{\theta_0}(\nabla_\theta h)'\rangle_K)^{-1}\big)$$
as $T$ and $T^{\nu/(2\nu+1)}\alpha_T$ go to infinity and $\alpha_T$ goes to zero.

It should be noted that the variance of $\tilde\theta_T$ can be made as close as desired to that of $\hat\theta_T$ in Proposition 3.2 by letting $T/J(T)$ go to 0. Because of the autocorrelations, the estimation of the optimal weighting operator $K$ is burdensome. To simplify this computation, we could use the covariance operator that ignores the autocorrelations, but the resulting estimator would be less efficient. Its variance is given by Proposition 4.3 for the non-simulated case. The variance of the C-SMM estimator is again equal to $(1+\zeta)$ times the variance obtained in the non-simulated context.

6. Monte-Carlo study

In this section we evaluate the performance of the CF-based estimators via Monte-Carlo analysis. For this purpose we consider an example of the CIR, or square-root, interest rate model from financial economics. The conditional CF is available in closed form for this model. We compare the performance of two CF-based estimators (one using the SI instrument, the other using the DI instrument) with that of MLE, QMLE, and EMM.16 The CIR square-root process
$$dr_t = (\gamma - \kappa r_t)\,dt + \sigma\sqrt{r_t}\,dW_t \quad (35)$$
16

We are grateful to Ken Singleton for providing his code for the approximately efﬁcient GMM-CCF estimator based on the SI instrument.


has the following CCF (see, e.g., Singleton, 2001):
$$\psi(\tau\mid r_t) = \left(1 - \frac{i\tau}{c}\right)^{-2\gamma/\sigma^2} \exp\left(\frac{i\tau e^{-\kappa}\, r_t}{1 - i\tau/c}\right), \qquad c = \frac{2\kappa}{\sigma^2(1 - e^{-\kappa})}. \quad (36)$$

We assume that $\kappa$, $\gamma$, and $\sigma$ are all strictly positive and that $\sigma^2 \le 2\gamma$. Under these conditions, the square-root process is known to have a unique fundamental solution; its marginal density is Gamma, and its transition density is a type I Bessel function distribution, i.e., a noncentral $\chi^2$ with a fractional order (see, e.g., Cox et al., 1985). The following lemma, proved in Appendix B, guarantees that the assumptions needed for the consistency and asymptotic normality (see Proposition 4.1) of the C-GMM estimator hold.

Lemma 6.1. (1) The process $\{r_t\}$ is $\beta$-mixing with geometric decay and therefore is $\alpha$-mixing with geometric decay. (2) Assumptions A.7 and A.8 are satisfied.

The simulation design is identical to Zhou (2001). This is done on purpose, as it allows us to compare our results with the MLE, QMLE, and EMM results reported in Zhou (2001). We consider a sample size $T = 500$, with a weekly sampling frequency in mind. The parameter values are obtained from Gallant and Tauchen (1998). For the CF-based estimator based on the continuum of moment conditions, which we term C-GMM-DI, we have experimented with different values of the penalization term, $\alpha_T = 0.05, 0.1, 0.2$, and different volatility values of the Gaussian integrating density $\pi(\tau)$: we tried the value 1 (standard normal) and the inverse of the standard deviation of the data. The standard normal density produces slightly better results. For Singleton's approximately efficient estimator, which we term GMM-SI, we have experimented with different values of $(M, \delta)$, which control the number of grid points and the distance between the grid points, respectively. We considered the pairs $(3, 0.5)$, $(6, 1)$, $(6, 0.75)$, and $(9, 1)$. We found the combination $(6, 1)$ to be the most successful, as a coarser grid did not contain enough information about the distribution, while a finer grid led to too many moment conditions, generating an ill-behaved weighting matrix.
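As a sketch (our own code, not the authors'), the CCF (36) can be implemented directly. A quick sanity check recovers the known unit-step CIR conditional mean $(\gamma/\kappa)(1 - e^{-\kappa}) + e^{-\kappa} r_t$ from the derivative of the CF at zero.

```python
import numpy as np

def cir_ccf(tau, r_t, gamma, kappa, sigma):
    """Conditional CF of the CIR model over a unit time step, eq. (36):
    psi(tau | r_t) = (1 - i tau/c)^(-2 gamma/sigma^2)
                     * exp(i tau e^{-kappa} r_t / (1 - i tau/c)),
    with c = 2 kappa / (sigma^2 (1 - e^{-kappa}))."""
    c = 2.0 * kappa / (sigma**2 * (1.0 - np.exp(-kappa)))
    z = 1.0 - 1j * tau / c
    return z ** (-2.0 * gamma / sigma**2) * np.exp(1j * tau * np.exp(-kappa) * r_t / z)
```

Since $E[r_{t+1}\mid r_t] = -i\,\psi'(0\mid r_t)$, the imaginary part of $\psi(h\mid r_t)/h$ for small $h$ approximates the conditional mean, which provides a cheap consistency check of the implementation.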
For brevity, Table 1 reports the C-GMM-DI results only for $\alpha_T = 0.02$ and the standard normal integrating density $\pi(\tau)$, and the GMM-SI results only for $(M, \delta) = (6, 1)$. We computed the results for 1000 Monte-Carlo paths.17 Results for other configurations are available upon request. We report the mean bias, median bias, and root mean squared error (RMSE) of the following estimators: MLE, QMLE, EMM, GMM-SI, and C-GMM-DI. The first three estimators appeared in Zhou (2001), and we report his results only for the purpose of comparison. In terms of bias, the performance of C-GMM-DI and GMM-SI for $\gamma$ and $\kappa$ is comparable to MLE and vastly better than QMLE and EMM. However, the performance of both CF-based estimators is worse for $\sigma$. In terms of RMSE, MLE is roughly three times as efficient as C-GMM-DI for $\gamma$ and $\kappa$. Both CF-based estimators dominate the other two methods by far, but again underperform for $\sigma$.
17

The GMM-SI method did not converge in seven cases, i.e., all the results are based on 993 paths.


Table 1
Monte Carlo comparison of estimation methods based on the CIR model of interest rates

True value              Mean bias    Median bias    RMSE

C-GMM-DI
$\gamma = 0.02491$      0.0090       0.0040         0.0374
$\kappa = 0.00285$      0.0010       0.0004         0.0043
$\sigma = 0.02750$      0.0064       0.0072         0.0130

GMM-SI
$\gamma = 0.02491$      0.0172       0.0235         0.0453
$\kappa = 0.00285$      0.0013       0.0025         0.0079
$\sigma = 0.02750$      0.0347       0.0257         0.0276

MLE
$\gamma = 0.02491$      0.0123       0.0119         0.0125
$\kappa = 0.00285$      0.0014       0.0014         0.0014
$\sigma = 0.02750$      0.0000       0.0000         0.0009

QMLE
$\gamma = 0.02491$      0.0994       0.0803         0.1343
$\kappa = 0.00285$      0.0113       0.0091         0.0153
$\sigma = 0.02750$      0.0000       0.0000         0.0009

EMM
$\gamma = 0.02491$      0.0451       0.0002         0.1252
$\kappa = 0.00285$      0.0054       0.0000         0.0149
$\sigma = 0.02750$      0.0015       0.0000         0.0076

We report three measures of estimation-method performance (mean bias, median bias, and root mean squared error (RMSE)) for five different estimation methods: C-GMM with the optimal DI instrument (C-GMM-DI), the CF-based estimator with Singleton's approximation to the optimal SI instrument (GMM-SI), MLE, QMLE, and EMM (the results for the latter three methods are taken from Zhou, 2001). The simulations are performed based on the CIR model $dr_t = (\gamma - \kappa r_t)\,dt + \sigma\sqrt{r_t}\,dW_t$ with parameter values from Gallant and Tauchen (1998). All results are based on 1000 replications of samples with 500 observations. We use $\alpha_T = 0.02$ and the standard normal integrating density $\pi(\tau)$ for C-GMM-DI, and $M = 6$ and $\delta = 1$ for GMM-SI.

If we compare the two CF-based estimators to each other, C-GMM-DI fares better. The key improvement of this estimator over GMM-SI is that the distribution of the estimated parameters is far less skewed and leptokurtic. For example, in the case of the parameter $\kappa$, the skewness and excess kurtosis are 2 and 5, respectively, while for the GMM-SI estimator the numbers are 5 and 38. We illustrate the differences in the distributions in Fig. 2 by plotting histograms of the square root of the sum of squared errors across all three parameters,
$$\sqrt{(\gamma - \hat\gamma_i)^2 + (\kappa - \hat\kappa_i)^2 + (\sigma - \hat\sigma_i)^2},$$
computed along each simulated path $i$ for both methods. The rationale for combining errors across parameters is that one method can be claimed more efficient than the other only if its overall error is smaller. We observe an effect similar to the one noted for the parameter $\kappa$: the GMM-SI estimator tends to produce more extreme errors than the C-GMM-DI estimator.
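The three performance measures reported in Table 1 and the combined per-path error used for Fig. 2 are straightforward to compute from a set of Monte-Carlo estimates; here is a small helper sketch (our own, with hypothetical function names):

```python
import numpy as np

def mc_summary(estimates, true_value):
    """Mean bias, median bias, and RMSE of a vector of Monte-Carlo
    estimates of a scalar parameter (the three measures in Table 1)."""
    err = np.asarray(estimates) - true_value
    return err.mean(), np.median(err), np.sqrt((err**2).mean())

def overall_error(g_hat, k_hat, s_hat, g, k, s):
    """Per-path combined error sqrt((g - g_i)^2 + (k - k_i)^2 + (s - s_i)^2),
    as used for the histograms in Fig. 2."""
    return np.sqrt((g - g_hat)**2 + (k - k_hat)**2 + (s - s_hat)**2)
```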


[Fig. 2. Histogram of the estimation errors of the C-GMM-DI (panel a) and GMM-SI (panel b) estimators in the case of the CIR model. We compute, for each simulation path $i$, the square root of the sum of squared errors across all three parameters, $\sqrt{(\gamma - \hat\gamma_i)^2 + (\kappa - \hat\kappa_i)^2 + (\sigma - \hat\sigma_i)^2}$. The plot shows the histogram of these errors. We cut off the first bin at 90 so that the other bins can be seen clearly. The count corresponding to each error-size category is reported in the corresponding bin.]

7. Conclusion

This paper showed how to construct maximum likelihood efficient estimators in settings where maximum likelihood estimation itself is not feasible. The solution is to use GMM and to select moment functions based on the CF, together with optimal instruments that form a basis spanning the unknown likelihood score. Efficiency is achieved by using the whole continuum of moment conditions resulting from this approach. We provide practical results that allow one to construct such an estimator, as well as auxiliary results pertaining to the cases where the data are not Markov (estimation based on the JCF) and where the CF is not available in analytical form (simulated method of moments estimation). Our Monte-Carlo study shows that the method indeed performs on par with MLE and fares better than other methods.

The methodology is applicable to the estimation of a wide range of non-linear time series models. It has particular relevance for empirical work in finance. Asset pricing models are frequently formulated in terms of stochastic differential equations, which have no closed-form solution for the conditional density based on discrete-time observations. Motivated by these avenues of application, future work will refine our results on the estimation of non-Markovian processes and latent states, as well as develop tests in the framework of the CF-based continuum of moment conditions.


Acknowledgments

We would like to thank Yacine Aït-Sahalia, Lars Hansen, Mike Johannes, Nour Meddahi, Antonio Mele, Jun Pan, Benoit Perron, Eric Renault, Tano Santos, Ken Singleton, and the conference and seminar participants at the Canadian Econometric Study Group in Quebec City, the CIREQ-CIRANO-MITACS Conference on Univariate and Multivariate Models for Asset Pricing, the Econometric Society North American winter meeting in New Orleans, the Western Finance Association in Tucson, Chicago, Princeton, ITAM, Michigan State, Montréal, and USC for comments. We are grateful to the editor, Ronald Gallant, and to anonymous referees for useful comments, and to Ken Singleton for the code implementing the approximately efficient estimator. Ruslan Bikbov and Jeremy Petranka provided outstanding research assistance. Carrasco gratefully acknowledges financial support from the National Science Foundation, Grant SES-0211418.

Appendix A. Regularity conditions

Assumption A.1. The stochastic process $X_t$ is a $p \times 1$ vector of random variables. $X_t$ is stationary and $\alpha$-mixing with coefficients $\alpha_j$ that satisfy $\sum_{j=1}^{\infty} j^2 \alpha_j < \infty$. The distribution of $(X_1, X_2, X_3, \ldots)$ is indexed by a finite-dimensional parameter $\theta \in \Theta \subset \mathbb{R}^q$, and $\Theta$ is compact.

The condition on the mixing numbers is satisfied if $X_t$ is $\alpha$-mixing of size 3.18 Sufficient conditions for $\rho$- and $\beta$-mixing (and, therefore, $\alpha$-mixing) of univariate diffusions can be found in Chen et al. (1999). For subordinated diffusions, they can be found in Carrasco et al. (1999), with many examples. The condition in Assumption A.1 is relatively weak and is expected to be satisfied for a large class of processes. The following assumption introduces the Hilbert space of reference.

Assumption A.2. $\pi$ is the pdf of a distribution that is absolutely continuous with respect to Lebesgue measure on $\mathbb{R}^d$ and admits all its moments. $\pi(\tau) > 0$ for all $\tau \in \mathbb{R}^d$.
$L^2(\pi)$ is the Hilbert space of complex-valued functions that are square integrable with respect to $\pi$:
$$L^2(\pi) = \left\{ g : \mathbb{R}^d \to \mathbb{C} \;\Big|\; \int |g(\tau)|^2\,\pi(\tau)\,d\tau < \infty \right\}.$$
Denote $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$ the inner product and the norm defined on $L^2(\pi)$. The inner product is $\langle f, g\rangle = \int f(\tau)\overline{g(\tau)}\,\pi(\tau)\,d\tau$, where $\overline{g(\tau)}$ denotes the complex conjugate of $g(\tau)$. If $f = (f_1, \ldots, f_m)'$ and $g = (g_1, \ldots, g_m)'$ are vectors of functions of $L^2(\pi)$, we denote $\langle f, g'\rangle$ the $m \times m$ matrix with $(i,j)$ element $\int f_i(\tau)\overline{g_j(\tau)}\,\pi(\tau)\,d\tau$.
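This inner product can be approximated numerically whenever $\pi$ is easy to sample from, since $\langle f, g\rangle = E_\pi[f(\tau)\overline{g(\tau)}]$. Below is an illustrative sketch (our own code), with $\pi$ taken to be standard normal on $\mathbb{R}$, an assumption made for the example that matches a common choice in the paper.

```python
import numpy as np

def inner_product(f, g, n_draws=200_000, rng=None):
    """Monte-Carlo approximation of <f, g> = int f(t) conj(g(t)) pi(t) dt
    on L^2(pi), with pi assumed standard normal on R for this sketch."""
    rng = np.random.default_rng(0) if rng is None else rng
    t = rng.standard_normal(n_draws)  # draws from pi
    return np.mean(f(t) * np.conj(g(t)))
```

For example, with $f(\tau) = e^{i\tau}$ and $g \equiv 1$, the inner product is the CF of $\pi$ at 1, i.e., $e^{-1/2}$, which the Monte-Carlo average reproduces to sampling accuracy.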

We also have to define a Hilbert-space analog of a random variable.

Definition A.1. An $L^2(\pi)$-valued random element $g$ has a Gaussian distribution on $L^2(\pi)$ with covariance operator $K$ if, for all $f \in L^2(\pi)$, the real-valued random variable $\langle g, f\rangle$ has a Gaussian distribution on $\mathbb{R}$ with variance $\langle Kf, f\rangle$.19

18 Note that a size 2 (instead of 3) is sufficient to establish the asymptotic normality of the estimator (Proposition 3.2). However, we need a stronger condition (weaker dependence structure) to show the consistency of the covariance operator estimate, $K_T$ (Proposition 3.3).
19 Background material on Hilbert space-valued random elements can be found in, for instance, Chen and White (1998).


We assume that the moment conditions (6) identify all the parameters of interest:

Assumption A.3. The equation
$$E^{\theta_0}(h_t(\tau; \theta)) = 0 \quad \text{for all } \tau \in \mathbb{R}^d,\ \pi\text{-almost everywhere},$$
has a unique solution $\theta_0$, which is an interior point of $\Theta$. $E^{\theta_0}$ denotes the expectation with respect to the distribution of $Y_t$ for $\theta = \theta_0$.

$\{h_t(\theta_0)\}$ is supposed to satisfy the following set of assumptions.

Assumption A.4. (i) $h$ is a measurable function from $\mathbb{R}^d \times \mathbb{R}^{\dim(Y)} \times \Theta$ into $\mathbb{C}$. (ii) $h_t(\tau; \theta)$ is continuously differentiable with respect to $\theta$, and $h_t(\tau; \theta) \in L^\infty(\pi \otimes P_\theta)$, where $L^\infty(\pi \otimes P_\theta)$ is the set of measurable bounded functions of $(\tau, Y_t)$. (iii) $\sup_{\theta\in\Theta} \|\hat h_T(\theta) - E^{\theta_0} h_t(\theta)\| = O_p(1/\sqrt{T})$ and $\sup_{\theta\in\mathcal{N}} \|\nabla_\theta \hat h_T(\theta) - E^{\theta_0}\nabla_\theta h_t(\theta)\| = O_p(1/\sqrt{T})$, where $\nabla_\theta$ denotes the derivative with respect to $\theta$ and $\mathcal{N}$ is some neighborhood of $\theta_0$.

Note that we do not try to provide minimal assumptions, and A.4(ii) could certainly be relaxed. However, as our moment conditions are based on the conditional CF and on the joint CF, they will necessarily be bounded. Note that when $h_t$ is based on the JCF, then $\hat h_T(\theta) - E^{\theta_0} h_t(\theta)$ does not depend on $\theta$ and $\nabla_\theta \hat h_T(\theta) - E^{\theta_0}\nabla_\theta h_t(\theta)$ is identically zero, so that A.4(iii) is easy to check. On the other hand, when $h_t$ is based on the CCF, the verification is less straightforward and will be undertaken in Proposition A.1.

The following assumption about the moment function $h_t$ is required for establishing the properties of the optimal C-GMM estimator. We require the null space of $K$ to be reduced to zero for the following reason. If $N(K)$ is different from $\{0\}$, then 0 is an eigenvalue of $K$, and the solution (in $f$) to the equation $Kf = g$ is not unique; hence $K^{-1}$ and $K^{-1/2}$ are not uniquely defined. It would be possible to define $K^{-1}$ as the generalized inverse of $K$, that is, $K^{-1}$ would have spectrum $\{1/\lambda_j\}$, where the $\lambda_j$ are the non-zero eigenvalues of $K$. However, in that case, the null space of $K^{-1/2}$ (defined as $(K^{-1})^{1/2}$) coincides with the null space of $K$ and hence is not empty; as a result, $\theta$ is not identified. Indeed, for $\theta$ to be identified, we need
$$\|K^{-1/2} E^{\theta_0} h_t(\theta)\| = 0 \ \Rightarrow\ E^{\theta_0} h_t(\theta) = 0 \ \Rightarrow\ \theta = \theta_0,$$
which is true if $N(K^{-1/2}) = \{0\}$ and A.3 holds.

Assumption A.5. Let $K$ be the asymptotic covariance operator of $\sqrt{T}\hat h_T(\theta_0)$. (i) The null space of $K$, $N(K) = \{f \in L^2(\pi) : Kf = 0\}$, equals $\{0\}$. (ii) $E^{\theta_0} h_t(\theta) \in H(K)$ for all $\theta \in \Theta$. (iii) $E^{\theta_0}\nabla_\theta h_t(\theta) \in H(K)$ for all $\theta \in \mathcal{N}$.

The following conditions are used to establish the properties of the covariance estimator.

Assumption A.6. (i) The kernel $\omega$ satisfies $\omega : \mathbb{R} \to [-1, 1]$, $\omega(0) = 1$, $\omega(x) = \omega(-x)$ for all $x \in \mathbb{R}$, $\int \omega^2(x)\,dx < \infty$, and $\int |\omega(x)|\,dx < \infty$; $\omega$ is continuous at 0 and at all but a finite number of points. (ii) $E^{\theta_0}\sup_{\theta\in\mathcal{N}} \|\nabla_\theta h_t(\cdot; \theta)\| < \infty$.

The following assumption is needed in Section 4.1 to use the CCF in the Markovian case.

Assumption A.7. The stochastic process $X_t$ is a $p \times 1$ vector of random variables. $X_t$ is stationary, Markov, and $\alpha$-mixing with $\sum_{j=1}^{\infty} j^2 \alpha_j < \infty$. The conditional pdf of $X_{t+1}$ given


$X_t$, $f_\theta(x_{t+1}|x_t)$, is indexed by a parameter $\theta \in \Theta \subset \mathbb{R}^q$, and $\Theta$ is compact. $f_\theta(x_{t+1}|x_t)$ is continuously differentiable with respect to $\theta$. For brevity, the conditional densities at $\theta_0$ and at $\theta$ are denoted, respectively, $f_{\theta_0}(x_{t+1}|x_t)$ and $f_\theta(x_{t+1}|x_t)$.

Now, we elaborate on the conditions needed to implement the efficient C-GMM estimator. Some of the assumptions, e.g. Assumption A.5, might seem difficult to verify. We can check these conditions using the properties of the RKHS. Below, we give a set of primitive assumptions under which the general Assumptions A.1, A.4, A.5, and A.6(ii) are satisfied.

Assumption A.8. (i) For all $\theta \in \Theta$, the following inequality holds:
$$E^{\theta_0}\left[\left(\frac{f_\theta(x_{t+1}|x_t)}{f_{\theta_0}(x_{t+1}|x_t)} - 1\right)^2\right] < \infty,$$
and there exists a neighborhood $\mathcal{N}$ of $\theta_0$ such that
$$E^{\theta_0}\left[\frac{\nabla_\theta f_\theta(x_{t+1}|x_t)\,\nabla_\theta f_\theta(x_{t+1}|x_t)'}{f_{\theta_0}(x_{t+1}|x_t)^2}\right] < \infty$$
for all $\theta \in \mathcal{N}$. $\psi_\theta$ is differentiable and $\sup_{\theta\in\Theta}\int |\nabla_\theta \psi_\theta(s|x_t)|\,ds < \infty$.
(ii) $\psi_\theta(s|X_t)$ is twice continuously differentiable in $\theta$. $E^{\theta_0}\|\nabla_\theta \psi_\theta(\cdot|X_t)\|^{2+\delta} < \infty$ for some $\delta > 0$, and $\sum_{j=1}^{\infty}\alpha_j^{\delta/(2+\delta)} < \infty$.
(iii) $E^{\theta_0}[(\sup_{\theta\in\Theta}\|\nabla_\theta \psi_\theta(\cdot|X_t)\|)^2] < \infty$ and $E^{\theta_0}[(\sup_{\theta\in\mathcal{N}}\|\nabla_{\theta\theta}\psi_\theta(\cdot|X_t)\|)^2] < \infty$, where $\nabla_{\theta\theta}\psi_\theta$ denotes the $q \times q$ matrix of second derivatives of $\psi_\theta$.

Note that the first inequality in A.8(i) (which corresponds to A.5(ii)) may impose some restrictions on $\Theta$, as illustrated in Section F3 of the unpublished appendix of Altissimo and Mele (2005). In Lemma 2, we verify that Assumptions A.7 and A.8 are satisfied for the CIR model.

Proposition A.1. Assumption A.7 implies Assumption A.1. If Assumption A.7 is satisfied and $h_t$ is defined by
$$h(\tau; Y_t; \theta) = e^{irX_t}\left(e^{isX_{t+1}} - \psi_\theta(s|X_t)\right), \qquad (A.1)$$
with $\tau = (r,s)' \in \mathbb{R}^{2p}$, then Assumption A.8 implies Assumptions A.4, A.5, and A.6(ii).
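The double-indexed moment function in (A.1) can be made concrete in a case where the conditional CF is available in closed form. The sketch below uses a Gaussian AR(1) — an illustration of ours, not a model from the paper — whose CCF is $\psi_\theta(s|x) = \exp(is\rho x - s^2\sigma^2/2)$; the parameter values and the $(r,s)$ point are arbitrary choices.

```python
import numpy as np

# Illustration (not from the paper): the moment function h(tau; Y_t, theta)
# of (A.1) for a Gaussian AR(1), X_{t+1} = rho*X_t + sigma*eps_{t+1}, whose
# conditional CF is psi(s|x) = exp(i*s*rho*x - 0.5*s^2*sigma^2).
def ccf_ar1(s, x, rho, sigma):
    """Conditional characteristic function of X_{t+1} given X_t = x."""
    return np.exp(1j * s * rho * x - 0.5 * s**2 * sigma**2)

def h_bar(X, r, s, rho, sigma):
    """Sample mean (1/T) sum_t e^{i r X_t} (e^{i s X_{t+1}} - psi(s|X_t))."""
    xt, xt1 = X[:-1], X[1:]
    ht = np.exp(1j * r * xt) * (np.exp(1j * s * xt1) - ccf_ar1(s, xt, rho, sigma))
    return ht.mean()

rng = np.random.default_rng(0)
rho, sigma, T = 0.5, 1.0, 200_000
X = np.zeros(T)
for t in range(T - 1):
    X[t + 1] = rho * X[t] + sigma * rng.standard_normal()

# At the true parameter the moment condition holds for every (r, s):
print(abs(h_bar(X, 0.7, -1.3, rho, sigma)))
```

Evaluating the same sample mean with a wrong $\rho$ inside $\psi$ yields a clearly nonzero value, which is the identifying content of the continuum of moment conditions.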

The proof is provided in Appendix B. In the section on the simulated method of moments, our starting point is the following model.

Assumption A.9. $X_t$ satisfies $X_{t+1} = H(X_t, \epsilon_t, \theta)$ for some measurable transition function $H : \mathbb{R}^p \times \mathbb{R}^N \times \Theta \to \mathbb{R}^p$ and some $N > 0$. $\{\epsilon_t\}$ is an iid sequence of $\mathbb{R}^N$-valued random variables, independent of $X_t$, with a known distribution that does not depend on $\theta$.

We will need an assumption, corresponding to Assumption A.8(ii) and (iii), for the particular moments $\tilde h^J_t$ used in the conditional simulation case of the simulated method of moments (Section 5.1).

Assumption A.10. (i) $H$ is twice continuously differentiable in $\theta$.


(ii) $E^{\theta_0}E^\epsilon\|\nabla_\theta H(X_t,\epsilon_t,\theta)\|_E^{2+\delta} < \infty$ for some $\delta > 0$, and $\sum_{j=1}^{\infty}\alpha_j^{\delta/(2+\delta)} < \infty$. $E^\epsilon$ denotes the expectation with respect to the distribution of $\epsilon_t$.
(iii) $E^{\theta_0}E^\epsilon[(\sup_{\theta\in\Theta}\|\nabla_\theta H(X_t,\epsilon_t,\theta)\|_E)^2] < \infty$ and $E^{\theta_0}E^\epsilon[(\sup_{\theta\in\mathcal{N}}\|\nabla_{\theta\theta}H(X_t,\epsilon_t,\theta)\|_E)^2] < \infty$.

The following assumption is required for the proof of the asymptotic properties of the simulated estimator in the case of path simulation (Section 5.2).

Assumption A.11. $X_t$ is $\beta$-mixing with exponential decay and $E^{\theta_0}\sup_{\theta\in\Theta}\|\nabla_\theta e^{i\tau'\tilde Y_j}\| < \infty$. $E^{\theta_0}$ denotes the expectation with respect to the stationary distribution of $Y_t$.

Appendix B. Proofs

The following lemma is used in the proofs of Propositions A.1 and 4.3. It gives an expression for the inner product/norm in a RKHS, $H(K)$, that appears in Parzen (1970) and is further discussed in Carrasco and Florens (2004). Let $\mathcal{C}_i = \{G : g_i(\tau) = E^{\theta_0}(h(\tau)G)\ \forall \tau \in \mathbb{R}^d\}$.

Lemma B.1. Let $K$ be a covariance operator from $L^2(\pi)$ into $L^2(\pi)$ with kernel $k(\tau_1,\tau_2) = E^{\theta_0}(h_t(\tau_1)\overline{h_t(\tau_2)})$. Let $g$ be an $L$-vector of elements of $H(K)$. Then $\Sigma = \langle g, g'\rangle_K$ is the "smallest" $L \times L$ matrix with $(i,j)$ element
$$\langle g_i, g_j\rangle_K = E^{\theta_0}(G_i G_j)$$
such that $G_i \in \mathcal{C}_i$, $i = 1,2,\ldots,L$. That is, for any $\tilde G_i \in \mathcal{C}_i$, the matrix $\tilde\Sigma$ with elements $E^{\theta_0}(\tilde G_i \tilde G_j)$ satisfies the property that $\tilde\Sigma - \Sigma$ is non-negative definite.

Proof of Lemma 3.1. To prove this result, we need a functional central limit theorem for a weakly dependent process. We use the results of Politis and Romano (1994). By Assumptions A.1 and A.4(i), $\{h_t\}$ is stationary $\alpha$-mixing with $\sum_{j=1}^{\infty}j^2\alpha_j < \infty$. Moreover, by Assumption A.4(ii), $\{h_t\}$ is bounded with probability one. The result follows directly from Theorem 2.2 of Politis and Romano (1994). Note that Politis and Romano require that the $\alpha$-mixing coefficient of $\{h_t\}$ satisfy $\sum_{i=1}^{j} i^2\alpha(i) \le K j^m$ for all $1 \le j \le T$ and some $m < 3/2$, which is satisfied.

Note that $K$ is an integral operator with kernel $k$ defined in Eq. (12). An operator $K : L^2(\pi) \to L^2(\pi)$ with kernel $k$ is a Hilbert–Schmidt operator if
$$\iint |k(\tau_1,\tau_2)|^2 \pi(\tau_1)\pi(\tau_2)\,d\tau_1 d\tau_2 < \infty.$$
As $\pi$ is a pdf, it is enough to show that $k(\tau_1,\tau_2) < \infty$. As $k(\tau_1,\tau_2)$ is the long-run covariance of $\{h_t\}$, it is well known (see e.g. Politis and Romano, 1994, Remark 2.2) that a sufficient condition for $k$ to be finite is that $\{h_t\}$ is bounded with probability one and the $\alpha$-coefficients of $\{h_t\}$ are summable, i.e. $\sum_j \alpha(j) < \infty$.
These two conditions are satisfied under our assumptions. Hence $K$ is a Hilbert–Schmidt operator. $\square$

Proof of Proposition 3.1. The proof of Proposition 3.1(1) is similar to that of Theorem 2 in Carrasco and Florens (2000) and is not repeated here. The optimality argument follows from the proof of Theorem 8 in Carrasco and Florens (2000). $\square$


As a preliminary result for the proof of Proposition 3.2, we need the following lemma. It generalizes Theorem 7 of Carrasco and Florens (2000) to the case where $K_T$ typically has a slower rate of convergence than $T^{-1/2}$.

Lemma B.2. Assume $K_T$ is such that $\|K_T - K\| = O_p(T^{-a})$, $(K_T^{\alpha_T})^{-1} = (K_T^2 + \alpha_T I)^{-1}K_T$, and $\alpha_T$ goes to zero. We have
$$\|(K_T^{\alpha_T})^{-1/2} - (K^{\alpha_T})^{-1/2}\| = O_p\!\left(\frac{1}{T^a \alpha_T^{3/4}}\right).$$
Let $\mathcal{N}$ be a subset of $\Theta$ (or $\Theta$ itself). Let $f(\theta)$ and $f_T(\theta)$ be such that $\sup_{\theta\in\mathcal{N}}\|f_T(\theta) - f(\theta)\| = O_p(1/\sqrt{T})$. Then, for $f(\theta) \in H(K)$ for all $\theta\in\mathcal{N}$ and $\sup_{\theta\in\mathcal{N}}\|f(\theta)\| < \infty$, we have
$$\sup_{\theta\in\mathcal{N}}\|(K_T^{\alpha_T})^{-1/2}f_T(\theta) - K^{-1/2}f(\theta)\| = O_p\!\left(\frac{1}{T^a \alpha_T^{3/4}}\right).$$

Proof of Lemma B.2. Note that
$$\|(K_T^{\alpha_T})^{-1/2} - (K^{\alpha_T})^{-1/2}\| = \|(\alpha_T + K_T^2)^{-1/2}K_T^{1/2} - (\alpha_T + K^2)^{-1/2}K^{1/2}\|$$
$$\le \|(\alpha_T + K_T^2)^{-1/2}K_T^{1/2} - (\alpha_T + K_T^2)^{-1/2}K^{1/2}\| \qquad (B.1)$$
$$\quad + \|(\alpha_T + K_T^2)^{-1/2}K^{1/2} - (\alpha_T + K^2)^{-1/2}K^{1/2}\|. \qquad (B.2)$$
For the first term,
$$(B.1) \le \underbrace{\|(\alpha_T + K_T^2)^{-1/2}\|}_{\le\,\alpha_T^{-1/2}}\;\underbrace{\|K_T^{1/2} - K^{1/2}\|}_{=\,O_p(T^{-a})}.$$
Using $A^{-1/2} - B^{-1/2} = A^{-1/2}[B^{1/2} - A^{1/2}]B^{-1/2}$, we get
$$(B.2) = \|(\alpha_T + K_T^2)^{-1/2}[(\alpha_T + K^2)^{1/2} - (\alpha_T + K_T^2)^{1/2}](\alpha_T + K^2)^{-1/2}K^{1/2}\|$$
$$\le \underbrace{\|(\alpha_T + K_T^2)^{-1/2}\|}_{\le\,\alpha_T^{-1/2}}\;\underbrace{\|(\alpha_T + K^2)^{1/2} - (\alpha_T + K_T^2)^{1/2}\|}_{=\,O_p(T^{-a})}\;\underbrace{\|(\alpha_T + K^2)^{-1/4}\|}_{\le\,\alpha_T^{-1/4}}\;\underbrace{\|(\alpha_T + K^2)^{-1/4}K^{1/2}\|}_{\le\,1}.$$
Hence $(B.2) = O_p(T^{-a}\alpha_T^{-3/4})$. The first equality of Lemma B.2 follows from the fact that (B.1) is negligible with respect to (B.2). The second equality can be established from the following decomposition:
$$\sup_{\theta\in\mathcal{N}}\|(K_T^{\alpha_T})^{-1/2}f_T(\theta) - K^{-1/2}f(\theta)\|$$
$$\le \sup_{\theta\in\mathcal{N}}\|(K^{\alpha_T})^{-1/2}f(\theta) - K^{-1/2}f(\theta)\| \qquad (B.3)$$
$$\quad + \sup_{\theta\in\mathcal{N}}\|(K_T^{\alpha_T})^{-1/2}f(\theta) - (K^{\alpha_T})^{-1/2}f(\theta)\| \qquad (B.4)$$
$$\quad + \sup_{\theta\in\mathcal{N}}\|(K_T^{\alpha_T})^{-1/2}f_T(\theta) - (K_T^{\alpha_T})^{-1/2}f(\theta)\|. \qquad (B.5)$$
From the proof of Theorem 7 of Carrasco and Florens (2000), it follows that $\|(K^{\alpha_T})^{-1/2}f(\theta) - K^{-1/2}f(\theta)\|$ goes to zero as $\alpha_T$ goes to zero. Moreover, from the first part of Lemma B.2, we have
$$(B.4) \le \|(K_T^{\alpha_T})^{-1/2} - (K^{\alpha_T})^{-1/2}\|\sup_{\theta\in\mathcal{N}}\|f(\theta)\| = O_p(T^{-a}\alpha_T^{-3/4}),$$
$$(B.5) \le \|(K_T^{\alpha_T})^{-1/2}\|\sup_{\theta\in\mathcal{N}}\|f_T(\theta) - f(\theta)\| = O_p(\alpha_T^{-1/4}T^{-1/2}),$$
using the fact that $\|(K_T^{\alpha_T})^{-1/2}\| \le \|(\alpha_T + K_T^2)^{-1/4}\|\,\|(\alpha_T + K_T^2)^{-1/4}K_T^{1/2}\| \le \alpha_T^{-1/4}$. The result follows. $\square$
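On a finite grid, the regularized inverse used throughout, $(K_T^{\alpha_T})^{-1} = (K_T^2 + \alpha_T I)^{-1}K_T$, can be checked numerically. The sketch below (the test operator is an arbitrary PSD matrix, an assumption of this illustration) also verifies the bound $\|(K^{\alpha})^{-1/2}\| \le \alpha^{-1/4}$ invoked at the end of the proof.

```python
import numpy as np

# Sketch of the Tikhonov-regularized inverse (K^alpha)^{-1} = (K^2 + alpha I)^{-1} K
# on a finite grid.  The test operator K below is an arbitrary PSD matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 50))
K = A @ A.T / 50            # symmetric PSD stand-in for the covariance operator
alpha = 1e-3

lam, V = np.linalg.eigh(K)  # spectral decomposition K = V diag(lam) V'

def spec_fun(f):
    """Apply a spectral function f(lambda) to K through its eigensystem."""
    return (V * f(np.clip(lam, 0.0, None))) @ V.T

Kinv_reg = spec_fun(lambda l: l / (l**2 + alpha))             # (K^alpha)^{-1}
Kinv_half = spec_fun(lambda l: np.sqrt(l / (l**2 + alpha)))   # (K^alpha)^{-1/2}

# operator-norm bound ||(K^alpha)^{-1/2}|| <= alpha^{-1/4}
norm = np.linalg.norm(Kinv_half, 2)
print(norm, alpha**-0.25)
```

The spectral form agrees with the matrix formula $(K^2 + \alpha I)^{-1}K$ because both are functions of the same operator.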

Proof of Proposition 3.2. First we prove consistency, second we prove asymptotic normality.

Consistency: The consistency follows from Theorem 3.4 of White (1994) under the following three conditions:
(a) $Q_T(\theta) = \|(K_T^{\alpha_T})^{-1/2}\hat h_T(\theta)\|^2$ is a continuous function of $\theta$ for all finite $T$.
(b) $Q_T(\theta) \stackrel{P}{\to} Q(\theta) = \|K^{-1/2}E^{\theta_0}h_t(\theta)\|^2$ uniformly on $\Theta$.
(c) $Q(\theta)$ has a unique minimizer $\theta_0$ on $\Theta$.

We check (a), (b), and (c) successively. (a) $\hat h_T(\theta)$ is continuous in $\theta$ by Assumption A.4(ii). For finite $T$, $(K_T^{\alpha_T})^{-1/2}$ is a bounded operator (because $\alpha_T > 0$), and therefore $\|(K_T^{\alpha_T})^{-1/2}\hat h_T(\theta)\|^2$ is a continuous function of $\theta$. (b) The uniform convergence as $T$ and $T^a\alpha_T^{3/4}$ go to infinity follows from A.4 and Lemma B.2. (c) Assumption A.5 implies that $K$ is a positive definite operator. By the property of the norm, we have $\|E^{\theta_0}h(\theta)\|_K^2 = 0 \Rightarrow E^{\theta_0}h(\theta) = 0$, which implies $\theta = \theta_0$ by Assumption A.3.

Asymptotic normality: Using a Taylor expansion of the first-order condition
$$\langle \nabla_\theta \hat h_T(\hat\theta_T),\ \hat h_T(\hat\theta_T)\rangle_{K_T^{\alpha_T}} = 0$$
around $\theta_0$, we obtain
$$\sqrt{T}(\hat\theta_T - \theta_0) = -\left[\langle \nabla_\theta \hat h_T(\hat\theta_T),\ \nabla_\theta \hat h_T(\bar\theta)\rangle_{K_T^{\alpha_T}}\right]^{-1}\langle \nabla_\theta \hat h_T(\hat\theta_T),\ \sqrt{T}\hat h_T(\theta_0)\rangle_{K_T^{\alpha_T}},$$
where $\bar\theta$ is a mean value. We need to establish:
N1. $\langle \nabla_\theta \hat h_T(\hat\theta_T), \nabla_\theta \hat h_T(\bar\theta)\rangle_{K_T^{\alpha_T}} \stackrel{P}{\to} \langle E^{\theta_0}\nabla_\theta h_t(\theta_0), E^{\theta_0}\nabla_\theta h_t(\theta_0)\rangle_K$.
N2. $\langle \nabla_\theta \hat h_T(\hat\theta_T), \sqrt{T}\hat h_T(\theta_0)\rangle_{K_T^{\alpha_T}} \stackrel{L}{\to} N\!\left(0,\ \langle E^{\theta_0}\nabla_\theta h_t(\theta_0), E^{\theta_0}\nabla_\theta h_t(\theta_0)\rangle_K\right)$.


N1. Note that $E^{\theta_0}\nabla_\theta h_t(\theta) \in H(K)$ by Assumption A.5. N1 follows directly from
$$\|(K_T^{\alpha_T})^{-1/2}\nabla_\theta \hat h_T(\hat\theta_T) - K^{-1/2}E^{\theta_0}\nabla_\theta h_t(\theta_0)\| \stackrel{P}{\to} 0, \qquad (B.6)$$
$$\|(K_T^{\alpha_T})^{-1/2}\nabla_\theta \hat h_T(\bar\theta) - K^{-1/2}E^{\theta_0}\nabla_\theta h_t(\theta_0)\| \stackrel{P}{\to} 0. \qquad (B.7)$$
We prove (B.6); the same proof applies to (B.7). We have
$$\|(K_T^{\alpha_T})^{-1/2}\nabla_\theta \hat h_T(\hat\theta_T) - K^{-1/2}E^{\theta_0}\nabla_\theta h_t(\theta_0)\|$$
$$\le \|(K_T^{\alpha_T})^{-1/2}\nabla_\theta \hat h_T(\hat\theta_T) - K^{-1/2}E^{\theta_0}\nabla_\theta h_t(\hat\theta_T)\| \qquad (B.8)$$
$$\quad + \|K^{-1/2}E^{\theta_0}\nabla_\theta h_t(\hat\theta_T) - K^{-1/2}E^{\theta_0}\nabla_\theta h_t(\theta_0)\|. \qquad (B.9)$$
Let $\mathcal{N}$ be some neighborhood of $\theta_0$. By Lemma B.2 and Assumption A.5(iii), we have for $T$ sufficiently large
$$(B.8) \le \sup_{\theta\in\mathcal{N}}\|(K_T^{\alpha_T})^{-1/2}\nabla_\theta \hat h_T(\theta) - K^{-1/2}E^{\theta_0}\nabla_\theta h_t(\theta)\| = O_p(T^{-a}\alpha_T^{-3/4}).$$
And by the continuity in $\theta$ of $E^{\theta_0}\nabla_\theta h_t(\theta)$ (Assumption A.5(ii)) and the consistency of $\hat\theta_T$, we have $(B.9) \stackrel{P}{\to} 0$.

N2. We have
$$\langle (K_T^{\alpha_T})^{-1/2}\nabla_\theta \hat h_T(\hat\theta_T),\ (K_T^{\alpha_T})^{-1/2}\sqrt{T}\hat h_T(\theta_0)\rangle$$
$$= \langle (K_T^{\alpha_T})^{-1/2}\nabla_\theta \hat h_T(\hat\theta_T) - K^{-1/2}E^{\theta_0}\nabla_\theta h_t(\theta_0),\ (K_T^{\alpha_T})^{-1/2}\sqrt{T}\hat h_T(\theta_0)\rangle \qquad (B.10)$$
$$\quad + \langle K^{-1/2}E^{\theta_0}\nabla_\theta h_t(\theta_0),\ (K_T^{\alpha_T})^{-1/2}\sqrt{T}\hat h_T(\theta_0)\rangle, \qquad (B.11)$$
and $(B.10) \le \|(K_T^{\alpha_T})^{-1/2}\nabla_\theta \hat h_T(\hat\theta_T) - K^{-1/2}E^{\theta_0}\nabla_\theta h_t(\theta_0)\|\,\|(K_T^{\alpha_T})^{-1/2}\|\,\|\sqrt{T}\hat h_T(\theta_0)\|$. Applying Lemma B.2 again, we obtain
$$\|(K_T^{\alpha_T})^{-1/2}\nabla_\theta \hat h_T(\hat\theta_T) - K^{-1/2}E^{\theta_0}\nabla_\theta h_t(\theta_0)\| = O_p(1/(T^a\alpha_T^{3/4})), \qquad \|(K_T^{\alpha_T})^{-1/2}\| = O_p(1/\alpha_T^{1/4}).$$
The term (B.10) is $O_p(1/(T^a\alpha_T)) = o_p(1)$ as $T^a\alpha_T$ goes to infinity by assumption. The term (B.11) can be decomposed as
$$\langle K^{-1/2}E^{\theta_0}\nabla_\theta h_t(\theta_0),\ (K_T^{\alpha_T})^{-1/2}\sqrt{T}\hat h_T(\theta_0)\rangle$$
$$= \langle K^{-1/2}E^{\theta_0}\nabla_\theta h_t(\theta_0),\ ((K_T^{\alpha_T})^{-1/2} - (K^{\alpha_T})^{-1/2})\sqrt{T}\hat h_T(\theta_0)\rangle \qquad (B.12)$$
$$\quad + \langle K^{-1/2}E^{\theta_0}\nabla_\theta h_t(\theta_0),\ (K^{\alpha_T})^{-1/2}\sqrt{T}\hat h_T(\theta_0)\rangle. \qquad (B.13)$$
We have
$$(B.12) \le \|K^{-1/2}E^{\theta_0}\nabla_\theta h_t(\theta_0)\|\,\|(K_T^{\alpha_T})^{-1/2} - (K^{\alpha_T})^{-1/2}\|\,\|\sqrt{T}\hat h_T(\theta_0)\| = O_p\!\left(\frac{1}{T^a\alpha_T^{3/4}}\right)$$
by Lemma B.2. It remains to show that (B.13) is asymptotically normal. Denote by $(\lambda_j, \phi_j : j = 1,2,\ldots)$ the eigenvalues and eigenfunctions of $K$. Then
$$(B.13) = \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\sum_{j=1}^{\infty}\frac{1}{\sqrt{\lambda_j^2 + \alpha_T}}\langle E^{\theta_0}\nabla_\theta h_t, \phi_j\rangle\langle h_t, \phi_j\rangle \equiv \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\tilde Z_{Tt},$$


where the argument $\theta_0$ is dropped to simplify the notation. $\tilde Z_{Tt}$ is a $q$-vector. We apply the Cramér–Wold device (Theorem A.3.8 of White, 1994) to prove the asymptotic normality of $(1/\sqrt{T})\sum_{t=1}^T \tilde Z_{Tt}$. Let $b$ be a $q$-vector of constants such that $b'b = 1$, and denote $Z_{Tt} = b'\tilde Z_{Tt}$. By Theorem A.3.7 of White (1994), we have
$$\frac{1}{\sigma_T}\sum_{t=1}^{T}Z_{Tt} \stackrel{L}{\to} N(0,1)$$
if the following assumptions are satisfied:
(a) $E^{\theta_0}(|Z_{Tt}|^m) \le \Delta < \infty$ for some $m > 2$;
(b) $Z_{Tt}$ is near epoch dependent on $\{V_t\}$ of size $-1$, where $\{V_t\}$ is mixing of size $-2m/(m-2)$;
(c) $\sigma_T^2 \equiv \mathrm{var}(\sum_{t=1}^T Z_{Tt})$ satisfies $\sigma_T^{-2} = O(T^{-1})$.
We verify Conditions (a)–(c) successively.
(a) is satisfied for all $m$ because $Z_{Tt}$ is bounded with probability one. Indeed, we have
$$Z_{Tt} = \sum_{j=1}^{\infty}\frac{1}{\sqrt{\lambda_j^2 + \alpha_T}}\langle b'E^{\theta_0}\nabla_\theta h, \phi_j\rangle\langle h_t, \phi_j\rangle \le \left(\sum_{j=1}^{\infty}\frac{|\langle b'E^{\theta_0}\nabla_\theta h, \phi_j\rangle|^2}{\lambda_j^2 + \alpha_T}\right)^{1/2}\left(\sum_{j=1}^{\infty}|\langle h_t, \phi_j\rangle|^2\right)^{1/2}$$
by the Cauchy–Schwarz inequality. As $\lambda_j^2 + \alpha_T \ge \lambda_j^2$ and
$$\sum_{j=1}^{\infty}\frac{1}{\lambda_j^2}|\langle b'E^{\theta_0}\nabla_\theta h, \phi_j\rangle|^2 = b'\langle E^{\theta_0}\nabla_\theta h, E^{\theta_0}\nabla_\theta h'\rangle_K b, \qquad \sum_{j=1}^{\infty}|\langle h_t, \phi_j\rangle|^2 = \|h_t\|^2,$$
we have $Z_{Tt} \le (b'\langle E^{\theta_0}\nabla_\theta h, E^{\theta_0}\nabla_\theta h'\rangle_K b)^{1/2}\|h_t\| \le \Delta < \infty$ with probability one.
(b) It is easy to verify that $Z_{Tt}$ is near epoch dependent on $\{h_t\}$ of arbitrary size.
(c) We have
$$\mathrm{var}\!\left(\frac{1}{\sqrt{T}}\sum_{t=1}^{T}Z_{Tt}\right) = \sum_{j=1}^{\infty}\frac{1}{\lambda_j^2+\alpha_T}|\langle b'E^{\theta_0}\nabla_\theta h, \phi_j\rangle|^2\,\mathrm{var}(\langle \sqrt{T}\hat h_T, \phi_j\rangle)$$
$$\quad + \sum_{i\ne j}\frac{\langle b'E^{\theta_0}\nabla_\theta h, \phi_i\rangle\langle b'E^{\theta_0}\nabla_\theta h, \phi_j\rangle}{\sqrt{\lambda_i^2+\alpha_T}\sqrt{\lambda_j^2+\alpha_T}}\,\mathrm{cov}\!\left(\langle \sqrt{T}\hat h_T, \phi_i\rangle, \langle \sqrt{T}\hat h_T, \phi_j\rangle\right).$$
Using as before $\lambda_j^2 + \alpha_T \ge \lambda_j^2$, both sums can be bounded by a term that does not depend on $T$; therefore, in passing to the limit as $T \to \infty$, we may interchange the limit and the summation. By Lemma 3.1, we have $\sqrt{T}\hat h_T \stackrel{L}{\to} N(0,K)$ and hence
$$\lim_{T\to\infty}\mathrm{cov}\!\left(\langle \sqrt{T}\hat h_T, \phi_i\rangle, \langle \sqrt{T}\hat h_T, \phi_j\rangle\right) = \langle K\phi_i, \phi_j\rangle = \begin{cases}\lambda_j & \text{if } i = j,\\ 0 & \text{otherwise.}\end{cases}$$
Therefore
$$\mathrm{var}\!\left(\frac{1}{\sqrt{T}}\sum_{t=1}^{T}Z_{Tt}\right) \to \sum_{j}\frac{1}{\lambda_j}|\langle b'E^{\theta_0}\nabla_\theta h, \phi_j\rangle|^2 = b'\langle E^{\theta_0}\nabla_\theta h, E^{\theta_0}\nabla_\theta h'\rangle_K b$$
as $T \to \infty$, proving that $\sigma_T^{-2} = O(T^{-1})$. Hence, we have
$$(B.13) \stackrel{L}{\to} N\!\left(0,\ \langle E^{\theta_0}\nabla_\theta h, E^{\theta_0}\nabla_\theta h'\rangle_K\right).$$
This completes the proof. $\square$

Proof of Proposition 3.3. Let $\|A\|_{HS}$ denote the Hilbert–Schmidt norm of the operator $A$ (see Dautray and Lions, 1988, for a definition and the properties of the Hilbert–Schmidt norm). If $\|A\|$ denotes the usual operator norm, $\|A\| \le \|A\|_{HS}$. We have
$$\|K_T - K\|_{HS}^2 = \iint |\hat k_T(\tau_1,\tau_2) - k(\tau_1,\tau_2)|^2\pi(\tau_1)\pi(\tau_2)\,d\tau_1 d\tau_2.$$
(i) Here $\hat k_T(\tau_1,\tau_2)$ depends on a first-step estimator $\hat\theta^1$. Remark that $\|K_T - K\|_{HS} = \|\hat k_T(\tau_1,\tau_2) - k(\tau_1,\tau_2)\|_{L^2\otimes L^2}$. To simplify, we denote $g_t(\theta) = h_t(\tau_1;\theta)\overline{h_t(\tau_2;\theta)}$. Applying the mean value theorem, we get
$$\hat k_T(\tau_1,\tau_2) = \frac{1}{T}\sum_{t=1}^{T}g_t(\hat\theta^1) = \frac{1}{T}\sum_{t=1}^{T}g_t(\theta_0) + \frac{1}{T}\sum_{t=1}^{T}\nabla_\theta g_t(\tilde\theta)(\hat\theta^1 - \theta_0),$$
$$\|\hat k_T(\tau_1,\tau_2) - k(\tau_1,\tau_2)\|_{L^2\otimes L^2} \le \left\|\frac{1}{T}\sum_{t=1}^{T}\left[g_t(\theta_0) - E^{\theta_0}(g_t(\theta_0))\right]\right\|_{L^2\otimes L^2} + \left\|\frac{1}{T}\sum_{t=1}^{T}\nabla_\theta g_t(\tilde\theta)\right\|_{L^2\otimes L^2}\|\hat\theta^1 - \theta_0\|_E,$$
where $\tilde\theta$ is between $\theta_0$ and $\hat\theta^1$. As in Lemma 3.1, we apply Politis and Romano (1994) to establish that the process $(1/\sqrt{T})\sum_{t=1}^{T}[g_t(\theta_0) - E^{\theta_0}(g_t(\theta_0))]$ converges weakly in $L^2(\pi)\otimes L^2(\pi)$, and hence
$$\left\|\frac{1}{T}\sum_{t=1}^{T}\left[g_t(\theta_0) - E^{\theta_0}(g_t(\theta_0))\right]\right\|_{L^2\otimes L^2} = O_p(T^{-1/2}).$$
Moreover, we have
$$\left\|\frac{1}{T}\sum_{t=1}^{T}\nabla_\theta g_t(\tilde\theta)\right\|_{L^2\otimes L^2} \le \frac{1}{T}\sum_{t=1}^{T}\|\nabla_\theta g_t(\tilde\theta)\|_{L^2\otimes L^2} \stackrel{P}{\to} E^{\theta_0}\|\nabla_\theta g_t(\theta_0)\|_{L^2\otimes L^2} < \infty$$
by Theorem A.2.2 of White (1994) and Assumption A.6(ii). The result follows.
(ii) Now $\hat k_T(\tau_1,\tau_2)$ does not depend on a first-step estimator. We use the following result: if $X_T \ge 0$ is such that $EX_T = O(1)$, then $X_T = O_p(1)$. This result is proved in Darolles et al. (2002, footnote 12). We can exchange the order of integration and expectation by Fubini's theorem to obtain
$$E^{\theta_0}\|K_T - K\|_{HS}^2 = \iint E^{\theta_0}|\hat k_T(\tau_1,\tau_2) - k(\tau_1,\tau_2)|^2\pi(\tau_1)\pi(\tau_2)\,d\tau_1 d\tau_2.$$
We have
$$E^{\theta_0}|\hat k_T(\tau_1,\tau_2) - k(\tau_1,\tau_2)|^2 \le E^{\theta_0}|\hat k_T(\tau_1,\tau_2) - E^{\theta_0}\hat k_T(\tau_1,\tau_2)|^2 + |E^{\theta_0}\hat k_T(\tau_1,\tau_2) - k(\tau_1,\tau_2)|^2.$$

Parzen (1957) and Andrews (1991) consider kernel estimators of the covariance of real-valued random variables. Here we have complex-valued $h_t$, but their results remain valid. From Parzen (1957, Theorem 6) and Andrews (1991), we have
$$\lim_{T\to\infty}S_T^n\left(E^{\theta_0}\hat k_T(\tau_1,\tau_2) - k(\tau_1,\tau_2)\right) = 2\pi\,\omega^{(n)}f^{(n)},$$
$$\lim_{T\to\infty}\frac{T}{S_T}E^{\theta_0}|\hat k_T(\tau_1,\tau_2) - E^{\theta_0}\hat k_T(\tau_1,\tau_2)|^2 = 8\pi^2 f^2\int \omega^2(x)\,dx.$$
To complete the proof, we need to be able to exchange the limit and the integrals in $\lim_{T\to\infty}\iint E|\hat k_T(\tau_1,\tau_2) - k(\tau_1,\tau_2)|^2\pi(\tau_1)\pi(\tau_2)\,d\tau_1 d\tau_2$. This is possible if $E|\hat k_T(\tau_1,\tau_2) - k(\tau_1,\tau_2)|^2$ is uniformly bounded in $\tau_1$, $\tau_2$, and $T$, by Theorem 5.4 of Billingsley (1995). The boundedness in $\tau_1$, $\tau_2$ results from the fact that $\omega$ is bounded. The boundedness in $T$ follows from Andrews (1991, p. 853) using $X_T = |(T/S_T)(\hat k_T(\tau_1,\tau_2) - k(\tau_1,\tau_2))^2|$. He shows that $\sup_{T\ge 1}EX_T^2 < \infty$ under assumptions that are satisfied here. Hence, if $S_T^{2n+1}/T \to \gamma$, we have
$$\lim_{T\to\infty}T^{2n/(2n+1)}E^{\theta_0}\|K_T - K\|_{HS}^2 = 4\pi^2\left(\frac{(\omega^{(n)}f^{(n)})^2}{\gamma^{2n/(2n+1)}} + 2f^2\gamma^{1/(2n+1)}\int \omega^2(x)\,dx\right),$$
and therefore $E^{\theta_0}\|K_T - K\|_{HS}^2 = O(T^{-2n/(2n+1)})$. This yields the result. $\square$

Proof of Proposition 3.4. The C-GMM estimator is the solution of
$$\hat\theta_T = \arg\min_\theta \|(K_T^{\alpha_T})^{-1/2}\hat h_T(\theta)\|^2 \iff \hat\theta_T = \arg\min_\theta \langle (K_T^{\alpha_T})^{-1}\hat h_T(\cdot;\theta),\ \hat h_T(\cdot;\theta)\rangle. \qquad (B.14)$$


Let $g = (K_T^{\alpha_T})^{-1}\hat h_T(\theta)$, so that $g$ satisfies $(\alpha_T I + K_T^2)g = K_T\hat h_T(\theta)$, i.e.
$$\alpha_T g(\tau) + \frac{1}{(T-q)^2}\sum_{t,l=1}^{T}h_t(\tau;\hat\theta^1)c_{tl}b_l = \frac{1}{T-q}\sum_{t=1}^{T}h_t(\tau;\hat\theta^1)v_t(\theta) \qquad (B.15)$$
with
$$b_l = \int \overline{h_l(\tau;\hat\theta^1)}\,g(\tau)\pi(\tau)\,d\tau.$$
First, we compute $b_l$, $l = 1,\ldots,T$. We premultiply (B.15) by $\overline{h_k(\tau;\hat\theta^1)}\pi(\tau)$ and integrate with respect to $\tau$ to obtain
$$\alpha_T b_k + \frac{1}{(T-q)^2}\sum_{t,l=1}^{T}c_{kt}c_{tl}b_l = \frac{1}{T-q}\sum_{t=1}^{T}c_{kt}v_t(\theta). \qquad (B.16)$$
Using matrix notation, (B.16) can be rewritten as $[\alpha_T I_T + C^2]b = Cv(\theta)$, where $b = [b_1,\ldots,b_T]'$. Solving for $b$, we get
$$b = [\alpha_T I_T + C^2]^{-1}Cv(\theta). \qquad (B.17)$$

Now we want to compute $\langle g, \hat h_T(\theta)\rangle$, which appears in (B.14). To do so, we multiply all terms of (B.15) by $\overline{\hat h_T(\tau;\theta)}\pi(\tau)$ and integrate with respect to $\tau$:
$$\alpha_T\langle g, \hat h_T(\theta)\rangle + \frac{1}{(T-q)^2}\sum_{t,l=1}^{T}w_t(\theta)c_{tl}b_l = \frac{1}{T-q}\sum_{t=1}^{T}w_t(\theta)v_t(\theta).$$
So that
$$\langle g, \hat h_T(\theta)\rangle = \frac{1}{\alpha_T(T-q)}\left[w'(\theta)v(\theta) - w'(\theta)Cb\right],$$
and using (B.17), we obtain
$$\langle g, \hat h_T(\theta)\rangle = \frac{1}{\alpha_T(T-q)}w'(\theta)\left[I_T - C[\alpha_T I_T + C^2]^{-1}C\right]v(\theta).$$
Now we need to show that $I_T - C[\alpha_T I_T + C^2]^{-1}C = \alpha_T[\alpha_T I_T + C^2]^{-1}$. Note that $C$ is Hermitian, that is, $C = C^*$, where $C^*$ denotes $\bar C'$. From Horn and Johnson (1985), all Hermitian matrices are normal, and hence $C$ can be written as $C = UDU^*$,


where $D$ is a diagonal matrix and $U$ satisfies $U^* = U^{-1}$. We have
$$I_T - \alpha_T[\alpha_T I_T + C^2]^{-1} = I_T - \alpha_T U[\alpha_T I_T + D^2]^{-1}U^*$$
$$= U\left[U^*U - \alpha_T[\alpha_T I_T + D^2]^{-1}\right]U^*$$
$$= U\left[I_T - \alpha_T[\alpha_T I_T + D^2]^{-1}\right]U^*$$
$$= U\left[[\alpha_T I_T + D^2]^{-1}\left(\alpha_T I_T + D^2 - \alpha_T I_T\right)\right]U^*$$
$$= UD^2[\alpha_T I_T + D^2]^{-1}U^*$$
$$= C[\alpha_T I_T + C^2]^{-1}C.$$
This yields the result. $\square$

Proof of Proposition 3.5. The proof is very similar to that of Proposition 3.4 and is not repeated here. The consistency follows from Lemma B.2. $\square$

For the proof of Proposition A.1, we need the following result.

Lemma B.3. Let $\nu_T(\theta)$ be a process in $L^2(\pi)$. If $\nu_T(\theta)$ is stochastically equicontinuous$^{20}$ in the following sense: for all $\epsilon > 0$, $\eta > 0$, there exists $\delta > 0$ such that
$$\lim_{T\to\infty}\Pr\!\left[\sup_{\theta_1,\theta_2:\ \|\theta_1-\theta_2\|<\delta}\|\nu_T(\theta_1) - \nu_T(\theta_2)\| > \eta\right] < \epsilon,$$
and $\|\nu_T(\theta)\| = O_p(1)$ for all $\theta\in\Theta$, then
$$\sup_{\theta\in\Theta}\|\nu_T(\theta)\| = O_p(1).$$

Proof of Lemma B.3. Let $\epsilon > 0$. There exists $\delta > 0$ such that, as $\Theta$ is compact, there is a finite open covering $\Theta = \bigcup_{j=1}^{J}\Theta_j$, where the $\Theta_j$ are open balls of radius $\delta$ with centers $\theta_j$. There are $\eta$ and $T_0$ such that for all $T > T_0$ we have
$$\Pr\!\left[\sup_{\theta\in\Theta}\|\nu_T(\theta)\| > \eta\right] = \Pr\!\left[\sup_{j,\,\theta\in\Theta_j}\|\nu_T(\theta) - \nu_T(\theta_j) + \nu_T(\theta_j)\| > \eta\right]$$
$$\le \Pr\!\left[\sup_{j,\,\theta\in\Theta_j}\|\nu_T(\theta) - \nu_T(\theta_j)\| > \eta\right] + \Pr\!\left[\sup_j \|\nu_T(\theta_j)\| > \eta\right] \le \epsilon/2 + \epsilon/2,$$
using the stochastic equicontinuity of $\nu_T(\theta)$ and $\|\nu_T(\theta)\| = O_p(1)$. This completes the proof of the lemma. $\square$

Proof of Proposition A.1. Assumption A.7 $\Rightarrow$ Assumption A.1 is obvious. We check successively the conditions of Assumption A.4.

$^{20}$This is not the standard definition of stochastic equicontinuity because here $\nu_T(\theta)$ is a function of $\tau$ and $\|\cdot\|$ denotes the norm in $L^2(\pi)$.


A.4(i) and (ii): $|h_t| = |e^{i(sx_{t+1}+rx_t)} - \psi_\theta(s|x_t)e^{irx_t}| \le |e^{i(sx_{t+1}+rx_t)}| + |\psi_\theta(s|x_t)e^{irx_t}| \le 2$, as $|\psi_\theta(s|x_t)| \le 1$ for all $s$. $h_t$ is continuously differentiable by A.8(ii).

A.4(iii): We want to establish that
$$\sup_{\theta\in\Theta}\left\|\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\{h_t(\theta) - E^{\theta_0}h_t(\theta)\}\right\| = O_p(1).$$
Let us denote $\nu_T(\theta) = (1/\sqrt{T})\sum_{t=1}^{T}\{h_t(\theta) - E^{\theta_0}h_t(\theta)\}$. In the same way as we proved Lemma 3.1, we can prove that $\nu_T(\theta)$ converges weakly to a Gaussian process with mean zero in $L^2(\pi)$. Hence $\|\nu_T(\theta)\| = O_p(1)$ for all $\theta\in\Theta$. By Lemma B.3, it remains to prove the stochastic equicontinuity. We have
$$\nu_T(\theta_1) - \nu_T(\theta_2) = -\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\left\{e^{irX_t}\left(\psi_{\theta_1}(s|X_t) - \psi_{\theta_2}(s|X_t)\right) - E^{\theta_0}\!\left[e^{irX_t}\left(\psi_{\theta_1}(s|X_t) - \psi_{\theta_2}(s|X_t)\right)\right]\right\}.$$
From van der Vaart (1998, Chapter 19), the equicontinuity follows from $|f(\theta_1) - f(\theta_2)| \le B\|\theta_1 - \theta_2\|$, where $f(\theta) = e^{irX_t}\psi_\theta(s|X_t)$, under the extra moment condition on $E^{\theta_0}(B^2)$. Using a mean-value theorem on $\psi_\theta$, we get $B = \sup_{\theta\in\Theta}\|\nabla_\theta\psi_\theta(\cdot|X_t)\|$. Now we turn to the term involving $\nabla_\theta\hat h_T(\theta)$. By Politis and Romano (1994, Theorem 2.3(i)), $\sqrt{T}(\nabla_\theta\hat h_T(\theta) - E^{\theta_0}\nabla_\theta h_t(\theta))$ converges weakly to a Gaussian process with mean zero under Assumption A.8(ii). Hence $\|\nabla_\theta\hat h_T(\theta) - E^{\theta_0}\nabla_\theta h_t(\theta)\| = O_p(T^{-1/2})$ on $\Theta$. It remains to establish uniform convergence, which is satisfied using Lemma B.3 and condition A.8(iii).

A.5(i): Let $\tau_1 = (r_1, s_1)$ and $\tau_2 = (r_2, s_2)$. We have
$$k(\tau_1,\tau_2) = E^{\theta_0}\!\left[e^{i(r_1-r_2)X_t}\left\{\psi_\theta(s_1 - s_2|X_t) - \psi_\theta(s_1|X_t)\overline{\psi_\theta(s_2|X_t)}\right\}\right].$$
By changing the order of integrations, we have for all $\tau_1 = (r_1,s_1)$:
$$(K\varphi)(\tau_1) = 0 \iff \int e^{ir_1x}f_\theta(x)\left[\int e^{-ir_2x}\left\{\psi_\theta(s_1-s_2|x) - \psi_\theta(s_1|x)\overline{\psi_\theta(s_2|x)}\right\}\varphi(r_2,s_2)\pi(\tau_2)\,d\tau_2\right]dx = 0.$$


Applying the Fourier inversion formula, we obtain for all $x$, $s_1$:
$$\int e^{-ir_2x}\left\{\psi_\theta(s_1-s_2|x) - \psi_\theta(s_1|x)\overline{\psi_\theta(s_2|x)}\right\}\varphi(r_2,s_2)\pi(\tau_2)\,d\tau_2 = 0$$
$$\iff \int e^{-ir_2x}\left[\int e^{i(s_1-s_2)u}f_\theta(u|x)\,du - \int e^{is_1u}f_\theta(u|x)\,du\;\overline{\psi_\theta(s_2|x)}\right]\varphi(r_2,s_2)\pi(\tau_2)\,d\tau_2 = 0$$
$$\iff \int e^{is_1u}f_\theta(u|x)\left[\int e^{-ir_2x}\left(e^{-is_2u} - \overline{\psi_\theta(s_2|x)}\right)\varphi(r_2,s_2)\pi(\tau_2)\,d\tau_2\right]du = 0.$$
Again applying the Fourier inversion formula, we get for all $x$, $u$:
$$\int e^{-ir_2x}\left(e^{-is_2u} - \overline{\psi_\theta(s_2|x)}\right)\varphi(r_2,s_2)\pi(\tau_2)\,d\tau_2 = 0.$$
As the second term on the left-hand side does not depend on $u$, the solution necessarily satisfies
$$\int e^{-ir_2x}e^{-is_2u}\varphi(r_2,s_2)\pi(\tau_2)\,d\tau_2 = 0 \quad \text{for all } x, u \iff \varphi(r_2,s_2) = 0 \quad \text{for all } r_2 \text{ and } s_2.$$
Hence $N(K) = \{0\}$.

A.5(ii): We check that $E^{\theta_0}h_t(\theta) \in H(K)$ for all $\theta\in\Theta$. Note that
$$E^{\theta_0}h_t(\theta) = E^{\theta_0}\!\left[e^{irX_t}\left(\psi_{\theta_0}(s|X_t) - \psi_\theta(s|X_t)\right)\right] \equiv g(r,s). \qquad (B.18)$$
We apply Lemma B.1 to compute $\|g\|_K^2$. We need to find $G$ such that
$$g(r,s) = E^{\theta_0}\!\left[\left(e^{i(sX_{t+1}+rX_t)} - \psi_\theta(s|X_t)e^{irX_t}\right)G(X_t,X_{t+1})\right] = E^{\theta_0}\!\left[e^{i(sX_{t+1}+rX_t)}\left\{G(X_t,X_{t+1}) - E^{\theta_0}[G(X_t,X_{t+1})|X_t]\right\}\right].$$
Let us denote $\tilde G = G - E^{\theta_0}[G|X_t]$. We want to solve in $\tilde G$ the equation
$$g(\tau) = \iint e^{i(sx_{t+1}+rx_t)}\tilde G(x_t,x_{t+1})f_{\theta_0}(x_{t+1}|x_t)f_{\theta_0}(x_t)\,dx_{t+1}dx_t.$$
Applying twice the Fourier inversion formula, we obtain a unique solution
$$\tilde G(x_t,x_{t+1}) = \frac{1}{(2\pi)^2}\iint \frac{g(r,s)\,e^{-i(sx_{t+1}+rx_t)}}{f_{\theta_0}(x_{t+1}|x_t)f_{\theta_0}(x_t)}\,ds\,dr. \qquad (B.19)$$
We now replace $g(r,s)$ by its expression (B.18) in (B.19) to calculate $\tilde G$. Applying the Fourier inversion formula, we have
$$\frac{1}{2\pi}\int e^{-irx_t}\left[\int e^{iru}\psi_\theta(s|u)f_{\theta_0}(u)\,du\right]dr = \psi_\theta(s|x_t)f_{\theta_0}(x_t), \qquad \frac{1}{2\pi}\int \psi_\theta(s|x_t)e^{-isx_{t+1}}\,ds = f_\theta(x_{t+1}|x_t).$$
Hence we have
$$\tilde G(x_t,x_{t+1}) = \frac{f_{\theta_0}(x_{t+1}|x_t) - f_\theta(x_{t+1}|x_t)}{f_{\theta_0}(x_{t+1}|x_t)}$$


and $\|g\|_K^2 = E^{\theta_0}\tilde G^2 < \infty$ if and only if
$$\iint \left[\frac{f_{\theta_0}(x_{t+1}|x_t) - f_\theta(x_{t+1}|x_t)}{f_{\theta_0}(x_{t+1}|x_t)}\right]^2 f_{\theta_0}(x_t,x_{t+1})\,dx_t dx_{t+1} < \infty$$
for all $\theta\in\Theta$. We recognize Pearson's chi-square distance.

A.5(iii): Now we check that $E^{\theta_0}\nabla_\theta h_t(\theta) \in H(K)$ for all $\theta\in\mathcal{N}$. We replace $g(r,s)$ by
$$g(r,s) \equiv E^{\theta_0}\nabla_\theta h_t(\theta) = -E^{\theta_0}\!\left[e^{irX_t}\nabla_\theta\psi_\theta(s|X_t)\right]$$
in Eq. (B.19) to calculate $\tilde G$. We again apply the Fourier inversion formula to obtain
$$\frac{1}{2\pi}\int e^{-irx_t}\left[\int e^{iru}\nabla_\theta\psi_\theta(s|u)f_{\theta_0}(u)\,du\right]dr = \nabla_\theta\psi_\theta(s|x_t)f_{\theta_0}(x_t),$$
$$\frac{1}{2\pi}\int \nabla_\theta\psi_\theta(s|x_t)e^{-isx_{t+1}}\,ds = \nabla_\theta\left[\frac{1}{2\pi}\int \psi_\theta(s|x_t)e^{-isx_{t+1}}\,ds\right] = \nabla_\theta f_\theta(x_{t+1}|x_t). \qquad (B.20)$$
We are allowed to interchange the order of integration and derivation in (B.20) because $\sup_{\theta\in\Theta}\int|\nabla_\theta\psi_\theta(s|x_t)|\,ds < \infty$ and by Lemma 3.6 of Newey and McFadden (1994). Hence we have
$$\tilde G = -\frac{\nabla_\theta f_\theta(x_{t+1}|x_t)}{f_{\theta_0}(x_{t+1}|x_t)}$$
and
$$\|E^{\theta_0}\nabla_\theta h_t(\theta)\|_K^2 = E^{\theta_0}(\tilde G\tilde G') = E^{\theta_0}\!\left[\frac{\nabla_\theta f_\theta(x_{t+1}|x_t)}{f_{\theta_0}(x_{t+1}|x_t)}\left(\frac{\nabla_\theta f_\theta(x_{t+1}|x_t)}{f_{\theta_0}(x_{t+1}|x_t)}\right)'\right], \qquad (B.21)$$
which is finite by Assumption A.8(i). When $\theta = \theta_0$, the term in (B.21) coincides with the information matrix $I_{\theta_0}$, which proves the ML efficiency without using Proposition 3.6.

Assumption A.6(ii) follows from A.8(iii) because
$$\|\nabla_\theta h_t(\theta)\|^2 = \int |e^{irX_t}\nabla_\theta\psi_\theta(s|X_t)|^2\,ds \le \int |\nabla_\theta\psi_\theta(s|X_t)|^2\,ds = \|\nabla_\theta\psi_\theta(\cdot|X_t)\|^2. \quad\square$$

Proof of Proposition 4.1. The asymptotic distribution of $\hat\theta_T$ follows from Propositions 3.2 and A.1. The asymptotic efficiency follows from Eq. (B.21). $\square$

Proof of Proposition 4.3. To simplify the notation, we omit $\theta_0$; all the terms in this proof are evaluated at $\theta_0$. Recall that the variance of $\tilde\theta_T$ is given by $J^{-1}SJ^{-1}$ with
$$J = -E^{\theta_0}(\nabla_{\theta\theta}\ln f_\theta(Y_0)) = E^{\theta_0}\!\left[\nabla_\theta \ln f_\theta(Y_0)\left(\nabla_\theta \ln f_\theta(Y_0)\right)'\right],$$
$$S = \sum_{j=-\infty}^{\infty}E^{\theta_0}\!\left[\nabla_\theta \ln f_\theta(Y_0)\left(\nabla_\theta \ln f_\theta(Y_j)\right)'\right].$$
The asymptotic variance of $\hat\theta_T$ is given by Theorem 2 in Carrasco and Florens (2000), replacing $B$ by $\tilde K^{-1/2}$:
$$V = (\|g\|_{\tilde K}^2)^{-1}(\tilde K^{-1}g,\ K\tilde K^{-1}g)(\|g\|_{\tilde K}^2)^{-1},$$
where $g = E^{\theta_0}(\nabla_\theta h)$. Theorem 2 assumes that $B$ is a bounded operator; here $\tilde K^{-1/2}$ is not bounded, but a proof similar to that of Theorem 8 of Carrasco and Florens (2000) would show that the result is also valid for $\tilde K^{-1/2}$.

(a) Calculation of $\|g\|_{\tilde K}^2$: We apply results from Lemma B.1. First we check that $G_0 = \nabla_\theta \ln f_\theta(Y_t)$ belongs to $\mathcal{C}(g)$, that is,
$$\nabla_\theta\psi_\theta(\tau) = \int \nabla_\theta \ln f_\theta(y_t)\left(e^{i\tau'y_t} - \psi_\theta(\tau)\right)f_\theta(y_t)\,dy_t = \int \nabla_\theta f_\theta(y_t)\left(e^{i\tau'y_t} - \psi_\theta(\tau)\right)dy_t = \int \nabla_\theta f_\theta(y_t)e^{i\tau'y_t}\,dy_t,$$
where the last equality uses $\int \nabla_\theta f_\theta(y_t)\,dy_t = 0$. Now consider a general solution $G = G_0 + G_1$. The condition $G \in \mathcal{C}(g)$ implies
$$\int G_1(y_t)\left(e^{i\tau'y_t} - \psi_\theta(\tau)\right)f_\theta(y_t)\,dy_t = 0 \ \forall\tau \iff \int \left(G_1(y_t) - EG_1\right)e^{i\tau'y_t}f_\theta(y_t)\,dy_t = 0 \ \forall\tau$$
$$\Rightarrow G_1 - EG_1 = 0 \Rightarrow E^{\theta_0}(G_0G_1) = 0.$$
This shows that the element of $\mathcal{C}(g)$ with minimal norm is $G_0$. Hence we have
$$\|g\|_{\tilde K}^2 = E^{\theta_0}(G_0G_0') = E^{\theta_0}\!\left[(\nabla_\theta \ln f_\theta(Y_t))(\nabla_\theta \ln f_\theta(Y_t))'\right].$$

(b) Calculation of $\tilde K^{-1}g$: We verify that $g = \tilde K\omega$ with
$$\omega(\tau) = \int e^{-i\tau'v}\nabla_\theta \ln f_\theta(v)\,dv,$$
where $v$ is an $L$-vector and $f_\theta$ denotes the joint likelihood of $Y_t$. Because $Y_t$ is assumed to be stationary, $f_\theta$ does not depend on $t$. We have
$$(\tilde K\omega)(\tau_1) = \int \left(\psi_\theta(\tau_1+\tau_2) - \psi_\theta(\tau_1)\psi_\theta(\tau_2)\right)\left[\int e^{-i\tau_2'v}\nabla_\theta \ln f_\theta(v)\,dv\right]d\tau_2$$
$$= \int \psi_\theta(\tau_1+\tau_2)\int e^{-i\tau_2'v}\nabla_\theta \ln f_\theta(v)\,dv\,d\tau_2 - \psi_\theta(\tau_1)\int \nabla_\theta f_\theta(v)\,dv$$
$$= \int e^{i\tau_1'y}\left[\iint e^{i\tau_2'y}e^{-i\tau_2'v}\nabla_\theta \ln f_\theta(v)\,dv\,d\tau_2\right]f_\theta(y)\,dy = \int e^{i\tau_1'y}\nabla_\theta \ln f_\theta(y)f_\theta(y)\,dy = g(\tau_1).$$


The fourth equality follows from a property of the Fourier transform; see Theorem 4.11.12 in Debnath and Mikusinski (1999).

(c) Calculation of $(\tilde K^{-1}g, K\tilde K^{-1}g)$: Note that $(\tilde K^{-1}g, K\tilde K^{-1}g) = (\omega, K\omega)$. The kernel of $K$ is given by
$$k(\tau_1,\tau_2) = \sum_{j=-\infty}^{\infty}\left[E^{\theta_0}\!\left(e^{i(\tau_1'Y_0+\tau_2'Y_j)}\right) - \psi_\theta(\tau_1)\psi_\theta(\tau_2)\right] \equiv \sum_{j=-\infty}^{\infty}k_j(\tau_1,\tau_2).$$
Let us denote by $K_j$ the operator with kernel $k_j(\tau_1,\tau_2)$. Then
$$(K_j\omega)(\tau_1) = \int E^{\theta_0}\!\left(e^{i(\tau_1'Y_0+\tau_2'Y_j)}\right)\left[\int e^{-i\tau_2'v}\nabla_\theta \ln f_\theta(v)\,dv\right]d\tau_2,$$
because the second term equals zero. Hence
$$(K_j\omega)(\tau_1) = \iint e^{i\tau_1'y_0}\left[\iint e^{i\tau_2'y_j}e^{-i\tau_2'v}\nabla_\theta \ln f_\theta(v)\,dv\,d\tau_2\right]f_\theta(y_0,y_j)\,dy_0dy_j = \iint e^{i\tau_1'y_0}\nabla_\theta \ln f_\theta(y_j)f_\theta(y_0,y_j)\,dy_0dy_j.$$
We have
$$(\omega, K_j\omega) = \iint\left[\iint e^{-i\tau_1'y_0}e^{i\tau_1'v}\nabla_\theta \ln f_\theta(v)\,dv\,d\tau_1\right]\nabla_\theta \ln f_\theta(y_j)'f_\theta(y_0,y_j)\,dy_0dy_j$$
$$= \iint \nabla_\theta \ln f_\theta(y_0)\,\nabla_\theta \ln f_\theta(y_j)'f_\theta(y_0,y_j)\,dy_0dy_j = E^{\theta_0}\!\left[\nabla_\theta \ln f_\theta(Y_0)\left(\nabla_\theta \ln f_\theta(Y_j)\right)'\right].$$
It follows that
$$(\omega, K\omega) = \sum_{j=-\infty}^{\infty}(\omega, K_j\omega) = \sum_{j=-\infty}^{\infty}E^{\theta_0}\!\left[\nabla_\theta \ln f_\theta(Y_0)\left(\nabla_\theta \ln f_\theta(Y_j)\right)'\right],$$
which finishes the proof. $\square$

Proof of Proposition 5.1. We wish to apply Proposition 3.2 to $\{\tilde h^J_t\}$. To do this, we need to check that the conditions of that proposition are satisfied for $\{\tilde h^J_t\}$. The mixing properties of $\{\tilde h^J_t\}$ are the same as those of $\{X_t\}$; moreover, $\{\tilde h^J_t\}$ is a martingale difference sequence. Hence, by Assumption A.7 and Politis and Romano (1994), we have
$$\sqrt{T}\,\tilde h^J_T \Rightarrow N(0, \tilde K)$$
as $T\to\infty$ in $L^2(\pi)$, where $\tilde K$ is the operator with kernel $\tilde k$ satisfying
$$\tilde k(\tau_1,\tau_2) = \mathrm{cov}(\tilde h^J_t(\tau_1), \tilde h^J_t(\tau_2)) = E\!\left[\mathrm{cov}(\tilde h^J_t(\tau_1), \tilde h^J_t(\tau_2)|Y_t)\right] + \mathrm{cov}\!\left[E(\tilde h^J_t(\tau_1)|Y_t),\ E(\tilde h^J_t(\tau_2)|Y_t)\right]$$
$$= E^Y E^\epsilon\!\left[(\tilde h^J_t(\tau_1) - h_t(\tau_1))(\tilde h^J_t(\tau_2) - h_t(\tau_2))\,\big|\,Y_t\right] + \mathrm{cov}(h_t(\tau_1), h_t(\tau_2)) = \frac{1}{J}u(\tau_1,\tau_2) + k(\tau_1,\tau_2).$$
Note that we use $E$ and $\mathrm{cov}$ for the expectation and covariance with respect to both $\epsilon_t$ and $Y_t$. Therefore $\tilde K = K + U/J$. Note that $U$ is a positive definite operator. Assumption A.5


is satisfied under Assumption A.8(i) because $Eh_t(\theta) = E\tilde h^J_t(\theta)$ and $E\nabla_\theta h_t(\theta) = E\nabla_\theta\tilde h^J_t(\theta)$. The second equality follows from
$$E(\nabla_\theta\tilde h_t) = E^Y E^\epsilon\!\left[\nabla_\theta\tilde h_t\,\big|\,Y_t\right] = E^Y\!\left[\nabla_\theta E^\epsilon(\tilde h_t|Y_t)\right]. \qquad (B.22)$$
The order of integration and differentiation in (B.22) can be exchanged because the distribution of $\tilde\epsilon_{j,t}$ does not depend on $\theta$ and $E\sup_{\theta\in\Theta}|\nabla_\theta H| < \infty$, which is true under A.10(iii). Therefore $E(\nabla_\theta\tilde h^J_t) = E(\nabla_\theta h)$. Finally, using a proof very similar to that of Proposition A.1, we see that Assumptions A.4(iii), A.6(ii), and A.8(ii)–(iii) are satisfied under Assumption A.10. It is enough to note that
$$|\nabla_\theta\tilde h^J_t| = \left|\frac{1}{J}\sum_{j=1}^{J}is\,\nabla_\theta H(X_t,\epsilon_{j,t+1},\theta)\,e^{isX^j_{t+1|t}}\right| \le \frac{|s|}{J}\sum_{j=1}^{J}|\nabla_\theta H(X_t,\epsilon_{j,t+1},\theta)|$$
and, similarly, $|\nabla_{\theta\theta}\tilde h^J_t| \le (1/J)\sum_{j=1}^{J}|\nabla_{\theta\theta}H(X_t,\epsilon_{j,t+1},\theta)|$ up to analogous factors in $|s|$. Hence, from Proposition 3.2, we have
$$\sqrt{T}(\tilde\theta^J_T - \theta_0) \stackrel{L}{\to} N\!\left(0,\ \left(\langle E^{\theta_0}(\nabla_\theta\tilde h^J),\ E^{\theta_0}(\nabla_\theta\tilde h^J)\rangle_{\tilde K}\right)^{-1}\right).$$
We can rewrite the variance by using $E(\nabla_\theta\tilde h^J_t) = E(\nabla_\theta h)$. Now, we show the inequality $\|g\|_{\tilde K}^2 \le \|g\|_K^2$ for any function $g$ in the range of $K$. For the sake of simplicity, we assume $g$ scalar; the proof for $g$ vector is very similar. Denote
$$f = \left(K + \frac{1}{J}U\right)^{-1}g, \qquad l = K^{-1}g.$$
We have $\|g\|_{\tilde K}^2 = \langle f, g\rangle$ and $\|g\|_K^2 = \langle l, g\rangle$. We want to show $\langle l - f, g\rangle \ge 0$:
$$\langle l - f, g\rangle \ge 0 \iff \langle K(l-f), g\rangle_K \ge 0 \iff \left\langle \frac{1}{J}Uf,\ Kf + \frac{1}{J}Uf\right\rangle_K \ge 0 \iff \frac{1}{J}\langle Uf, f\rangle + \frac{1}{J^2}\|Uf\|_K^2 \ge 0.$$
This last inequality is true because $U$ is positive definite. $\square$
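The final inequality has a transparent finite-dimensional analogue, where $\|g\|_K^2 = g'K^{-1}g$: adding the positive definite term $U/J$ to $K$ can only decrease the quadratic form. The random matrices and the value of $J$ in this sketch are illustration choices.

```python
import numpy as np

# Finite-dimensional illustration of ||g||^2_{K + U/J} <= ||g||^2_K, i.e.
# g'(K + U/J)^{-1} g <= g' K^{-1} g for positive definite K and U.
rng = np.random.default_rng(4)
n, J = 20, 10

def rand_pd(n):
    B = rng.standard_normal((n, n))
    return B @ B.T + n * np.eye(n)       # safely positive definite

K, U = rand_pd(n), rand_pd(n)
g = rng.standard_normal(n)
lhs = g @ np.linalg.solve(K + U / J, g)  # ||g||^2 under K-tilde = K + U/J
rhs = g @ np.linalg.solve(K, g)          # ||g||^2 under K
print(lhs <= rhs)                        # True
```

This mirrors the efficiency loss from simulation: the simulated-moment variance operator $\tilde K = K + U/J$ dominates $K$, so the inverse quadratic form (the precision) can only shrink, and it recovers $\|g\|_K^2$ as $J \to \infty$.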

Proof of Proposition 5.2. The consistency holds under Assumptions A.2–A.4. By the geometric ergodicity and the boundedness of $e^{i\tau'Y_t}$ and $e^{i\tau'\tilde Y_j}$, the functional CLT of Chen and White (1998, Theorem 3.9) gives
$$\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\left(e^{i\tau'Y_t} - \psi^L_\theta(\tau)\right) \Rightarrow N(0,K), \qquad \frac{1}{\sqrt{J(T)}}\sum_{j=1}^{J(T)}\left(e^{i\tau'\tilde Y_j} - \psi^L_\theta(\tau)\right) \Rightarrow N(0,K)$$
as $T\to\infty$ in $L^2(\pi)$. The asymptotic normality follows from
$$\sqrt{T}\,\tilde h_T(\tau;\theta_0) = \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\left(e^{i\tau'Y_t} - \psi^L_\theta(\tau)\right) - \sqrt{\frac{T}{J(T)}}\;\frac{1}{\sqrt{J(T)}}\sum_{j=1}^{J(T)}\left(e^{i\tau'\tilde Y_j} - \psi^L_\theta(\tau)\right) \stackrel{L}{\to} N(0, (1+\zeta)K),$$
because $Y_t$ and $\tilde Y_j$ are independent. Let $\tilde K = (1+\zeta)K$. Minimizing $\|\tilde h_T\|_{K_T^{\alpha_T}}$ is equivalent to minimizing $\|\tilde h_T\|_{\tilde K_T^{\alpha_T}}$, where $\tilde K_T^{\alpha_T}$ denotes a regularized estimator of $\tilde K$. By Proposition 3.2, $\tilde\theta_T$ is asymptotically normal and the inverse of its variance is equal to
$$\langle E^{\theta_0}(\nabla_\theta\tilde h_t),\ E^{\theta_0}(\nabla_\theta\tilde h_t)\rangle_{\tilde K} = \frac{1}{1+\zeta}\langle E^{\theta_0}(\nabla_\theta\tilde h_t),\ E^{\theta_0}(\nabla_\theta\tilde h_t)\rangle_K.$$
Now we compute $E^{\theta_0}(\nabla_\theta\tilde h_t)$. By Assumption A.11, we have
$$E^{\theta_0}(\nabla_\theta\tilde h_t) = -E^{\theta_0}(\nabla_\theta e^{i\tau'\tilde Y_j}) = -\nabla_\theta E^{\theta_0}(e^{i\tau'\tilde Y_j}) = -\nabla_\theta\psi^L_\theta(\tau) = E^{\theta_0}(\nabla_\theta h). \quad\square$$

Proof of Lemma 6.1. (1) follows from Chen et al. (1999, Theorem 7.1). This theorem states that a scalar diffusion with drift coefficient $\mu$, diffusion coefficient $\sigma$, and nonattracting boundaries is $\beta$-mixing with geometric decay if $\mu/\sigma - \tfrac12(\partial\sigma/\partial x)$ is negative at the right boundary and positive at the left boundary. Here, we get
$$\lim_{r\uparrow\infty}\left(\frac{\mu}{\sigma} - \frac12\frac{\partial\sigma}{\partial r}\right) < 0, \qquad \lim_{r\downarrow 0}\left(\frac{\mu}{\sigma} - \frac12\frac{\partial\sigma}{\partial r}\right) > 0.$$
These conditions are satisfied provided that $4\gamma - \sigma^2 > 0$. Note that the stronger condition $2\gamma - \sigma^2 \ge 0$ guarantees that neither boundary is attracting.

(2) A.7: The mixing property follows from (1), where $\alpha_i = \rho^i$ for some $0 < \rho < 1$. Denote $\theta = (\gamma, \sigma^2, \kappa)$ and let $\Theta$ be a compact subset of $(\mathbb{R}_+)^3$. To see that $f_\theta(r_t|r_{t-1})$ is continuously differentiable, it suffices to examine the expression of the conditional likelihood (Zhou, 2001):
$$f_\theta(r_t|r_{t-1}) = \sum_{j=0}^{\infty}\mathrm{Gamma}(r_t\,|\,j+\lambda,\,1)\,\mathrm{Poisson}(j\,|\,c\,r_{t-1}e^{-\kappa}) = \sum_{j=0}^{\infty}\frac{r_t^{j+\lambda-1}e^{-r_t}}{\Gamma(j+\lambda)}\,\frac{(c\,r_{t-1}e^{-\kappa})^j e^{-c\,r_{t-1}e^{-\kappa}}}{j!} \equiv \sum_{j=0}^{\infty}u_{\theta j} \qquad (B.23)$$
with $\lambda = 2\gamma/\sigma^2$.


A.8(i): By developing the square we obtain
$$E^{\theta_0}\!\left[\left(\frac{f_\theta(r_t|r_{t-1})}{f_{\theta_0}(r_t|r_{t-1})} - 1\right)^2\right] = \iint \frac{f_\theta(r_t|r_{t-1})^2}{f_{\theta_0}(r_t|r_{t-1})}\,f_{\theta_0}(r_{t-1})\,dr_t dr_{t-1} - 1.$$
Using the notation introduced in (B.23), we have
$$\frac{f_\theta(r_t|r_{t-1})^2}{f_{\theta_0}(r_t|r_{t-1})} = \frac{\left(\sum_j u_{\theta j}\right)^2}{\sum_j u_{\theta_0 j}} \le \sum_j \frac{u_{\theta j}^2}{u_{\theta_0 j}}.$$
Replacing $u_{\theta j}$ and $u_{\theta_0 j}$ by their expressions, we obtain
$$\sum_j \frac{u_{\theta j}^2}{u_{\theta_0 j}} = \sum_j \left\{\frac{r_t^{j+\tilde\lambda-1}e^{-r_t}}{\Gamma(j+\tilde\lambda)}\right\}\frac{\Gamma(j+\lambda_0)\Gamma(j+\tilde\lambda)}{\Gamma(j+\lambda)^2}\,\frac{\left(r_{t-1}(c^2/c_0)e^{-(2\kappa-\kappa_0)}\right)^j e^{-r_{t-1}(2ce^{-\kappa}-c_0e^{-\kappa_0})}}{j!},$$
where $\tilde\lambda = 2\lambda - \lambda_0$. Remark that the first factor of the sum integrates to $1$ with respect to $r_t$ provided $\tilde\lambda > 0$ (which imposes a restriction on $\lambda$ and therefore on $\Theta$). The marginal pdf of $r_{t-1}$ is a Gamma:
$$f_\theta(r_{t-1}) = \frac{\omega^\lambda}{\Gamma(\lambda)}\,r_{t-1}^{\lambda-1}e^{-\omega r_{t-1}},$$
where $\omega = 2\kappa/\sigma^2$. Regrouping the terms yields
$$\iint \frac{f_\theta(r_t|r_{t-1})^2}{f_{\theta_0}(r_t|r_{t-1})}\,dr_t\,f_{\theta_0}(r_{t-1})\,dr_{t-1} \le \mathrm{const}\times\sum_j \frac{\Gamma(j+\lambda_0)\Gamma(j+\tilde\lambda)}{\Gamma(j+\lambda)^2\,j!}\left(\frac{c^2 e^{-(2\kappa-\kappa_0)}}{c_0}\right)^{\!j}\int r_{t-1}^{j+\lambda_0-1}e^{-r_{t-1}(2ce^{-\kappa}-c_0e^{-\kappa_0}+\omega_0)}\,dr_{t-1}.$$
Remark that
$$\int r_{t-1}^{j+\lambda_0-1}e^{-r_{t-1}(2ce^{-\kappa}-c_0e^{-\kappa_0}+\omega_0)}\,dr_{t-1} = \frac{\Gamma(j+\lambda_0)}{\left(2ce^{-\kappa}-c_0e^{-\kappa_0}+\omega_0\right)^{j+\lambda_0}}.$$
Hence, it follows that
$$\iint \frac{f_\theta(r_t|r_{t-1})^2}{f_{\theta_0}(r_t|r_{t-1})}\,f_{\theta_0}(r_{t-1})\,dr_t dr_{t-1} \le \mathrm{const}\times\sum_j \frac{\Gamma(j+\lambda_0)^2\,\Gamma(j+\tilde\lambda)}{\Gamma(j+\lambda)^2\,j!}\,q^j, \qquad (B.24)$$


where
$$q = \frac{c^2\,e^{-(2\kappa-\kappa_0)}}{c_0\,(2ce^{-\kappa}-c_0e^{-\kappa_0}+\omega_0)}.$$
The sum (B.24) is finite provided $|q|<1$. Let $\Delta = c-c_0$ and $\Delta' = \kappa-\kappa_0$, where $\Delta$ and $\Delta'$ may be positive or negative. We want to show that $0<q<1$ for values of $\Delta$ and $\Delta'$ around 0. $0<q<1$ holds if
$$\frac{c^2}{c_0}\,e^{-(2\kappa-\kappa_0)} < 2ce^{-\kappa}-c_0e^{-\kappa_0}+\omega_0,$$
which is equivalent to
$$0 < c_0^2\,(2-e^{\Delta'}-e^{-\Delta'}) + 2\Delta c_0\,(1-e^{-\Delta'}) + \Delta^2 e^{-\Delta'} + \frac{c_0^2}{1-e^{-\kappa_0}} \equiv g(\Delta,\Delta').$$
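Both claims about (B.24) — that $0<q<1$ near $\theta_0$ and that the series converges when $|q|<1$ because the summand ratio tends to $q$ — are easy to probe numerically. The sketch below is illustrative; the parameter values are our own choices, not the paper's calibration.

```python
import math

def q_value(c, c0, kap, kap0, omega0):
    """q from (B.24): c^2 e^{-(2k - k0)} / (c0 (2 c e^{-k} - c0 e^{-k0} + omega0))."""
    return (c * c * math.exp(-(2.0 * kap - kap0))
            / (c0 * (2.0 * c * math.exp(-kap) - c0 * math.exp(-kap0) + omega0)))

def term_ratio(j, lam, lam0, q):
    """Ratio t_{j+1}/t_j of the summand Gamma(j+lam0)^2 Gamma(j+lam~) q^j / (Gamma(j+lam)^2 j!),
    with lam~ = 2*lam - lam0.  It converges to q, so the series is finite when |q| < 1."""
    lam_t = 2.0 * lam - lam0
    return ((j + lam0) ** 2 * (j + lam_t)) / ((j + lam) ** 2 * (j + 1.0)) * q
```

At $\theta=\theta_0$ the expression collapses to $q = c_0e^{-\kappa_0}/(c_0e^{-\kappa_0}+\omega_0)\in(0,1)$, and `term_ratio` approaches $q$ as $j$ grows, which is the ratio test behind the finiteness claim.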

Note that $g(0,0) = c_0^2/(1-e^{-\kappa_0})>0$. By continuity, $g(\Delta,\Delta')$ is positive on a neighborhood of $(0,0)$. This shows that there exists a compact $\Theta$ that contains $\theta_0$ as an interior point and such that the first inequality of A8(i) holds for all $\theta\in\Theta$. The proof of the second inequality follows the same lines and is omitted here.

A8(ii) and (iii): Remark that the CCF can be written as $\psi_\theta = b(\theta,t)\,e^{itg(\theta,t)r_t}$, where $b$ and $g$ are twice continuously differentiable. Then
$$|\nabla_\theta\psi_\theta| \le |\nabla_\theta b| + |\nabla_\theta g|\,|t|\,|r_t|\,|\psi_\theta|,$$
$$\begin{aligned}
\nabla_{\theta\theta}\psi_\theta &= (\nabla_{\theta\theta}b)\,e^{itg(\theta)r_t} + itr_t\,(\nabla_\theta b)(\nabla_\theta g)'\,e^{itg(\theta)r_t} + itr_t\,(\nabla_\theta\psi_\theta)(\nabla_\theta g)' + \psi_\theta\, itr_t\,\nabla_{\theta\theta}g\\
&= (\nabla_{\theta\theta}b)\,e^{itg(\theta)r_t} + 2itr_t\,(\nabla_\theta b)(\nabla_\theta g)'\,e^{itg(\theta)r_t} - (tr_t)^2\,(\nabla_\theta g)(\nabla_\theta g)'\,\psi_\theta + \psi_\theta\, itr_t\,\nabla_{\theta\theta}g.
\end{aligned}$$
Recall that $|\psi_\theta|\le 1$ and
$$b = \left(1-\frac{it}{c}\right)^{-2\gamma/\sigma^2}, \qquad g = \frac{e^{-\kappa}}{1-it/c}.$$
Note that
$$\frac{1}{1-it/c} = \frac{1+it/c}{1+t^2/c^2} \quad\text{and}\quad \left|\frac{1}{1-it/c}\right|^2 = \frac{1}{1+t^2/c^2}\le 1,$$
and hence $|b|\le 1$.


$$\begin{aligned}
\frac{\partial b}{\partial\gamma} &= -\frac{2}{\sigma^2}\ln\!\left(1-\frac{it}{c}\right) b,\\
\frac{\partial b}{\partial\kappa} &= -\frac{2\gamma}{\sigma^2}\,\frac{(it/c^2)\,(\partial c/\partial\kappa)}{1-it/c}\, b,\\
\frac{\partial b}{\partial\sigma^2} &= \frac{2\gamma}{\sigma^4}\left(\frac{it/c}{1-it/c} + \ln\!\left(1-\frac{it}{c}\right)\right) b,\\
\frac{\partial g}{\partial\gamma} &= 0,\\
\frac{\partial g}{\partial\kappa} &= -e^{-\kappa}\,\frac{(it/c^2)\,(\partial c/\partial\kappa) + (1-it/c)}{(1-it/c)^2},\\
\frac{\partial g}{\partial\sigma^2} &= -e^{-\kappa}\,\frac{(it/c^2)\,(\partial c/\partial\sigma^2)}{(1-it/c)^2}.
\end{aligned}$$
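The first of these derivatives is simple enough to verify against a finite difference in complex arithmetic. The sketch below treats $c$ as fixed when differentiating in $\gamma$ (true for the CIR normalization, where $c$ depends only on $\kappa$ and $\sigma^2$); the parameter values are illustrative.

```python
import cmath

def b_fn(gam, sig2, c, t):
    """b(theta, t) = (1 - it/c)^(-2*gam/sig2) from the CIR conditional CF."""
    return (1.0 - 1j * t / c) ** (-2.0 * gam / sig2)

def db_dgamma(gam, sig2, c, t):
    """Analytic derivative: db/dgamma = -(2/sig2) * ln(1 - it/c) * b."""
    return -(2.0 / sig2) * cmath.log(1.0 - 1j * t / c) * b_fn(gam, sig2, c, t)
```

A central difference $(b(\gamma+h)-b(\gamma-h))/2h$ with small $h$ matches the analytic expression to high precision, which is a useful guard when coding the score of the CCF.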

As $r_t$ admits second moments, $t$ admits fourth moments (that is, $\int t^4\,\pi(t)\,dt<\infty$), and $\Theta$ is compact and bounded away from $(0,0,0)$, $\|\nabla_\theta b\|$ and $\|\nabla_\theta g\|$ are bounded and condition A8(ii) is satisfied. The expressions of the second derivatives are omitted here. A8(iii) requires the existence of the fourth moment of $r_t$ and of the tenth moment of $t$, which is true. $\square$

References

Altissimo, F., Mele, A., 2005. Simulated nonparametric estimation of dynamic models with applications to finance. Working paper, London School of Economics.
Andrews, D., 1991. Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817–858.
Bates, D., 2003. Maximum likelihood estimation of latent affine processes. Review of Financial Studies, forthcoming.
Broze, L., Scaillet, O., Zakoian, J.-M., 1998. Quasi-indirect inference for diffusion processes. Econometric Theory 14, 161–186.
Carrasco, M., Florens, J.-P., 2000. Generalization of GMM to a continuum of moment conditions. Econometric Theory 16, 797–834.
Carrasco, M., Florens, J.-P., 2002. Efficient GMM estimation using the empirical characteristic function. Working paper, CREST, Paris.
Carrasco, M., Florens, J.-P., 2004. On the asymptotic efficiency of GMM. Working paper, University of Rochester.
Carrasco, M., Hansen, L.P., Chen, X., 1999. Time deformation and dependence. Working paper, University of Chicago.
Chacko, G., Viceira, L., 2005. Spectral GMM estimation of continuous-time processes. Journal of Econometrics 116, 259–292.
Chen, X., White, H., 1998. Central limit and functional central limit theorems for Hilbert space-valued dependent processes. Econometric Theory 14, 260–284.
Chen, X., Hansen, L.P., Carrasco, M., 1999. Nonlinearity and temporal dependence. Working paper, University of Chicago.
Conley, T., Hansen, L.P., Luttmer, E., Scheinkman, J., 1997. Short term interest rates as subordinated diffusions. Review of Financial Studies 10, 525–578.
Cox, J.C., Ingersoll, J., Ross, S.A., 1985. A theory of the term structure of interest rates. Econometrica 53, 385–408.
Darolles, S., Florens, J.-P., Renault, E., 2002. Nonparametric instrumental regression. Working paper 05-2002, CRDE.
Detemple, J., Garcia, R., Rindisbacher, M., 2002. Asymptotic properties of Monte Carlo estimators of diffusion processes. Working paper, University of Montreal.


Donald, S., Newey, W., 2001. Choosing the number of instruments. Econometrica 69, 1161–1191.
Duffie, D., Singleton, K., 1993. Simulated moments estimation of Markov models of asset prices. Econometrica 61, 929–952.
Dunford, N., Schwartz, J., 1988. Linear Operators, Part II: Spectral Theory. Wiley, New York.
Fermanian, J.-D., Salanie, B., 2004. A nonparametric simulated maximum likelihood estimation method. Econometric Theory 20, 701–734.
Feuerverger, A., 1990. An efficiency result for the empirical characteristic function in stationary time-series models. Canadian Journal of Statistics 18, 155–161.
Feuerverger, A., McDunnough, P., 1981. On the efficiency of empirical characteristic function procedures. Journal of the Royal Statistical Society B 43, 20–27.
Florens, J.-P., Mouchart, M., Rolin, J.-M., 1993. Noncausality and marginalization of Markov processes. Econometric Theory 9, 239–260.
Gallant, A.R., Long, J.R., 1997. Estimating stochastic differential equations efficiently by minimum chi-square. Biometrika 84, 125–141.
Gallant, A.R., Tauchen, G., 1996. Which moments to match? Econometric Theory 12, 657–681.
Gallant, A.R., Tauchen, G., 1998. Reprojecting partially observed systems with application to interest rate diffusions. Journal of the American Statistical Association 93, 10–24.
Genz, A., Keister, B.D., 1996. Fully symmetric interpolatory rules for multiple integrals over infinite regions with Gaussian weight. Journal of Computational and Applied Mathematics 71, 299–309.
Gouriéroux, C., Monfort, A., 1996. Simulation Based Econometric Methods. CORE Lectures, Oxford University Press, New York.
Gouriéroux, C., Monfort, A., Renault, E., 1993. Indirect inference. Journal of Applied Econometrics 8, S85–S118.
Hall, P., Horowitz, J., 1996. Bootstrap critical values for tests based on generalized-method-of-moments estimators. Econometrica 64, 891–916.
Hansen, L., 1985. A method of calculating bounds on the asymptotic covariance matrices of generalized method of moments estimators. Journal of Econometrics 30, 203–238.
Horn, R., Johnson, C., 1985. Matrix Analysis. Cambridge University Press, Cambridge, UK.
Jiang, G., Knight, J., 2002. Estimation of continuous time processes via the empirical characteristic function. Journal of Business and Economic Statistics 20, 198–212.
Parzen, E., 1957. On consistent estimates of the spectrum of a stationary time series. Annals of Mathematical Statistics 28, 329–348.
Parzen, E., 1970. Statistical inference on time series by RKHS methods. In: Pyke, R. (Ed.), 12th Biennial Seminar Canadian Mathematical Congress Proceedings. Canadian Mathematical Society, Montreal.
Politis, D., Romano, J., 1994. Limit theorems for weakly dependent Hilbert space valued random variables with application to the stationary bootstrap. Statistica Sinica 4, 451–476.
Singleton, K., 2001. Estimation of affine pricing models using the empirical characteristic function. Journal of Econometrics 102, 111–141.
van der Vaart, A., 1998. Asymptotic Statistics. Cambridge University Press, Cambridge, UK.
White, H., 1994. Estimation, Inference and Specification Analysis. Cambridge University Press, Cambridge, UK.
Yu, J., 2001. Empirical characteristic function estimation and its applications. Working paper, University of Auckland.
Zhou, H., 2001. Finite sample properties of EMM, GMM, QMLE and MLE for a square-root interest rate diffusion model. Journal of Computational Finance 5, 89–122.