Tilted Nonparametric Estimation of Volatility Functions


Ke-Li Xu† and Peter C. B. Phillips‡

April 13, 2008

∗ The authors thank Donald W.K. Andrews, Zongwu Cai, Yuichi Kitamura, and Taisuke Otsu for helpful comments. Phillips acknowledges partial research support from a Kelly Fellowship and the NSF under Grant Nos. SES 04-142254 and 06-47086.

† Corresponding author. Department of Finance and Management Science, University of Alberta School of Business, Business Building 3-40N, Edmonton, Alberta, T6G 2R6, Canada. E-mail address: [email protected].

‡ Yale University, University of Auckland, University of York and Singapore Management University. Address: Department of Economics, Cowles Foundation for Research in Economics, Yale University, P. O. Box 208281, New Haven, CT 06520, USA. E-mail address: [email protected].


Abstract

This paper proposes a novel positive nonparametric estimator of the volatility function that does not rely on logarithmic or other transformations. The basic idea is to apply the re-weighted Nadaraya-Watson estimator of Hall and Presnell (1999, Journal of the Royal Statistical Society B) to squared mean regression residuals. The new volatility estimator is asymptotically equivalent to the local linear estimator and is restricted to be positive in finite samples. We also show its adaptiveness to the unknown mean function. Simulations compare the new estimator with other existing estimators, and empirical applications, including a jump diffusion model, illustrate the usefulness of the proposed method.

JEL Classification: C13; C14; C22.

Keywords: Conditional variance function; Empirical likelihood; Conditional heteroskedasticity; Jump diffusions; Local linear estimator; Nonparametric regression; Volatility.


1 Introduction

Nonparametric approaches provide flexible alternatives to traditional parametric modeling methods. Consider the following nonparametric heteroskedastic regression model

$$Y_i = m(X_i) + \sigma(X_i)\varepsilon_i, \qquad (1)$$

where {(Xi, Yi), i = 1, · · · , n} are observations on two random variables X and Y, and {εi} are innovations satisfying E(εi|Xi) = 0 and Var(εi|Xi) = 1. The conditional mean function m(x) = E(Y|X = x) and the conditional variance function (or volatility function) σ²(x) = Var(Y|X = x) > 0 are left unspecified and are the main objects of statistical investigation. The model (1) is of fundamental importance in financial econometrics because it allows for nonlinearity and conditional heteroskedasticity in financial time series (Engle, 1982, Tong, 1990, Bossaerts et al., 1996). It can also be regarded as the discretized version of the nonparametric continuous-time stochastic diffusion model, which is commonly used in financial derivative pricing (Ait-Sahalia, 1996, Stanton, 1997, Bandi and Phillips, 2003). This paper focuses on estimation of the volatility function σ²(·), which is crucial for inference on the conditional mean function m(·), e.g. in constructing confidence intervals and selecting data-driven bandwidths. It is also of great importance in practical applications, e.g. in volatility or risk measurement in finance (Shephard, 2005). See Martins-Filho and Yao (2006, 2007) for recent applications of volatility estimation to the estimation of value-at-risk and expected shortfall functions for a financial asset and of production frontiers. Other applications of variance estimation are discussed in Carroll and Ruppert (1988). Earlier contributions on the estimation of σ²(·) in nonparametric contexts include Carroll (1982), Müller and Stadtmüller (1987) and Hall and Carroll (1989), among others, which are mainly concerned with iid data. Assuming Xi = Yi−1, Härdle and Tsybakov (1997) proposed a local polynomial estimation procedure for σ²(·) based on a variance decomposition. Their method was criticized by Fan and Yao (1998), who pointed out that it is not fully adaptive to the unknown mean function, with the bias term depending on the derivative of m(·). To tackle this problem, Fan and Yao (1998) proposed a residual-based, fully adaptive volatility estimator (see also Ruppert et al., 1997).¹ Applying the local linear technique to squared residuals of a first-stage nonparametric mean regression, they showed that the resulting volatility estimator is asymptotically as efficient as the oracle estimator, which assumes knowledge of the mean function m(·). Although fitting a linear function locally is appealing compared with conventional local level (Nadaraya-Watson) estimation, as demonstrated by Fan and Gijbels (1996), it is not guaranteed to give non-negative values in finite samples when estimating a variance. The tendency to produce negative variance estimates is even stronger when larger bandwidths are used. The negativity problem can lead to many practical difficulties. Consequently, it is commonly recommended in applications to use the (theoretically inferior) Nadaraya-Watson estimator when fitting the volatility function (Chen and Qin, 2002, Porter, 2003), especially at the design points where the local linear estimators give negative results. The dilemma has been widely recognized among practitioners, and several alternatives to local linear volatility estimators have been proposed. Ziegelmann (2002) proposed fitting an exponential function locally (rather than a linear function as in the local linear estimator) within the general locally parametric nonparametric framework of Hjort and Jones (1996). More recently, assuming iid Gaussian errors, Yu and Jones (2004) maximized a localized likelihood in which the mean and the variance functions are parameterized locally as a linear function and an exponential function, respectively. In both the Ziegelmann and the Yu and Jones methods, the logarithm of the variance, rather than the variance itself, is estimated, so that the resulting estimator is always positive by construction.
However, there seems to be little intuitive justification for fitting the exponential form without knowledge of the underlying model. Furthermore, the logarithmic transformation introduces an additional term in the bias expression, which can increase the squared bias and hence the mean squared error, e.g. when the second derivative of the volatility function is negative.

¹ See also Dahl and Levine (2006), who deal with nonparametric volatility estimation with serially dependent innovations εi.

This paper proposes a novel volatility function estimator that preserves the appealing properties of local linear estimators while always being positive. It is based on the intentionally biased bootstrap method of Hall and Presnell (1999). The idea is to adjust the conventional Nadaraya-Watson estimator by minimally tilting the empirical distribution subject to a discrete bias-reducing moment condition that is satisfied by the local linear estimator. The new estimator, called here the re-weighted Nadaraya-Watson or tilted estimator, inherits non-negativity from the usual Nadaraya-Watson estimator, while possessing the superior bias, boundary correction and minimax efficiency properties of the local linear estimator. We also show its adaptiveness to the unknown mean function, i.e. we can estimate the volatility function asymptotically as well as if we knew the true mean function. Computationally, unlike the logarithmic transformation-based variance estimators mentioned above, the proposed estimator has an explicit representation (up to a scalar Lagrange multiplier that solves a one-dimensional equation) and requires no multivariate optimization, so it is relatively easy to use. Furthermore, the estimator is constructed without knowledge of the error distribution and is robust to possible misspecification of that distribution. The re-weighting idea used here is also useful in other contexts, e.g. in estimating the regression function (Hall and Presnell, 1999, and Cai, 2001), the conditional distribution function (Hall, Wolff and Yao, 1999), quantiles (Cai, 2002) and the conditional density function (De Gooijer and Zerom, 2003), and in continuous-time functional diffusion estimation (Xu, 2006). A similar methodology was used by Hall and Huang (2001) to monotonize nonparametric regression function estimates. The remainder of the paper is organized as follows. Section 2.1 describes the residual-based re-weighted Nadaraya-Watson estimator of the conditional variance.
The asymptotic distributional theory is developed for stationary and mixing time series in Section 2.2 at both interior and boundary points, and a consistent estimator of the asymptotic variance is suggested. In Section 3 we compare the new estimator with the competitors suggested by Ziegelmann (2002) and Yu and Jones (2004), and simulation evidence is reported in Section 4.1. Two empirical applications, including a continuous-time jump diffusion model, are provided in Sections 4.2 and 4.3 to illustrate the usefulness of the proposed methodology. Section 5 concludes, and all proofs are collected in an appendix.

2 Main Results

2.1 The estimator

The residual-based nonparametric estimator of the volatility function σ²(·) is built on a first-stage kernel-weighted least squares estimate of the conditional mean function m(·). Let W(·) and K(·) be kernel functions, and let $h_0 = h_0(n) > 0$ and $h = h(n) > 0$ be bandwidth parameters that control the degree of smoothing. The local linear fitting of m(·) solves

$$(\hat\gamma_1, \hat\gamma_2) = \arg\min_{(\gamma_1, \gamma_2)} \sum_{i=1}^n \left(Y_i - \gamma_1 - \gamma_2(X_i - x)\right)^2 W\left(\frac{X_i - x}{h_0}\right) \qquad (2)$$

and then estimates m(x) by $\hat m(x) = \hat\gamma_1$ for a design point x. The use of different bandwidths in mean and variance estimation has been stressed by several authors (Ruppert et al., 1997, and Yu and Jones, 2004), and in what follows we use h0 in the mean regression (2) and h in the variance estimation. Instead of fitting the squared residuals $\hat r_i^2 = \left(Y_i - \hat m(X_i)\right)^2$ to $X_i$ using a second-stage local linear smoother as in Ruppert et al. (1997) and Fan and Yao (1998), we consider the following re-weighted Nadaraya-Watson estimator of σ²(x):

$$\hat\sigma^2(x) = \frac{\sum_{i=1}^n \hat w_i(x) K\left(\frac{X_i - x}{h}\right) \hat r_i^2}{\sum_{i=1}^n \hat w_i(x) K\left(\frac{X_i - x}{h}\right)}, \qquad (3)$$

where $\hat w_i(x)$ solves

$$\{\hat w_i(x)\} = \arg\max_{\{w_i(x)\}} l_n(w_1(x), \cdots, w_n(x)), \qquad (4)$$

with $l_n(w_1(x), \cdots, w_n(x)) = \sum_{i=1}^n \log w_i(x)$, subject to the restrictions $w_i(x) \ge 0$, $\sum_{i=1}^n w_i(x) = 1$ and

$$\sum_{i=1}^n w_i(x)(X_i - x)K_h(X_i - x) = 0, \qquad (5)$$

where $K_h(\cdot) = K(\cdot/h)/h$. See Hall and Presnell (1999) and Cai (2001) for the motivation of such re-weighted Nadaraya-Watson (NW) estimators. One may interpret $\hat w(x) = (\hat w_1(x), \cdots, \hat w_n(x))$ as revised estimates of the probability masses placed on the observations $(X_1, \cdots, X_n)$, and $l_n(w_1(x), \cdots, w_n(x))$ as the log empirical likelihood (Owen, 2001).
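A minimal Python sketch of the first stage (2) and of the squared residuals $\hat r_i^2$ entering (3) follows. It assumes the Epanechnikov kernel for W (the choice made in Section 4) and a user-supplied h0; the helper names `local_linear_mean` and `squared_residuals` are hypothetical, and each local fit is solved directly as a 2 × 2 weighted least squares problem.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel 0.75 (1 - u^2) on (-1, 1), zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u**2), 0.0)

def local_linear_mean(X, Y, x, h0, kernel=epanechnikov):
    """m_hat(x) = gamma_1_hat from the weighted least squares problem (2)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    d = X - x
    W = kernel(d / h0)                          # W((X_i - x) / h0)
    Z = np.column_stack([np.ones_like(d), d])   # regressors 1 and (X_i - x)
    ZtW = Z.T * W                               # Z' diag(W)
    gamma = np.linalg.solve(ZtW @ Z, ZtW @ Y)   # (gamma_1_hat, gamma_2_hat)
    return gamma[0]

def squared_residuals(X, Y, h0, kernel=epanechnikov):
    """r_hat_i^2 = (Y_i - m_hat(X_i))^2, with m_hat evaluated at each X_i."""
    m_hat = np.array([local_linear_mean(X, Y, xi, h0, kernel) for xi in X])
    return (np.asarray(Y, dtype=float) - m_hat) ** 2
```

Because a local linear fit reproduces straight lines exactly, the residuals vanish up to rounding error when Y is an exact linear function of X, which makes a quick sanity check.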

The estimator defined jointly by (3), (4) and (5) belongs to a wide class of re-weighted estimators of the form

$$\hat g(x) = \frac{\sum_{i=1}^n \hat w_i(x) A_i(x) Y_i}{\sum_{i=1}^n \hat w_i(x) A_i(x)}, \qquad (6)$$

where $A_i(x)$ is the original weighting function and $Y_i$ is the response variable (Hall and Huang, 2001). The probability vector $\hat w(x)$ is chosen to minimize its distance D(w(x)) from the uniform distribution $w_{unif}(x) = (1/n, \cdots, 1/n)$ subject to the desired constraints, thereby ensuring that the original estimator $\sum_{i=1}^n A_i(x)Y_i / \sum_{i=1}^n A_i(x)$ is modified only to the extent needed to satisfy the constraints. Several distance measures are discussed in Hall and Huang (2001). Here we adapt the re-weighted estimator (6) to the conditional variance estimation context with $A_i(x) = K\left(\frac{X_i - x}{h}\right)$ and $Y_i = \hat r_i^2$, and choose the distance $D(w(x)) = -2\sum_{i=1}^n \log(n w_i(x))$, which can be regarded as the empirical log-likelihood ratio.

The constraint (5) is a discrete bias-reducing moment condition satisfied by the local linear smoothing weights,² for which $w_i(x) = S_{n,2} - (X_i - x)S_{n,1}$, where $S_{n,j} = \sum_{i=1}^n (X_i - x)^j K_h(X_i - x)$, j = 1, 2. Thus, we expect the constructed estimator to behave like the local linear estimator while preserving the non-negative weights of the NW estimator. Without the constraint (5), the re-weighted NW estimator reduces to the usual NW estimator, since $l_n(w_1(x), \cdots, w_n(x))$ is maximized at $w_{unif}(x) = (1/n, \cdots, 1/n)$. We can also choose the weights $\hat w_i(x)$ subject to the constraint $d\hat\sigma^2(x)/dx \ge 0$ or $d^2\hat\sigma^2(x)/dx^2 \ge 0$ to ensure monotonicity or convexity of the estimated variance function, as needed.
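The identity behind (5) is easy to check numerically. The sketch below, assuming the Epanechnikov kernel and a uniform design on (0, 1), forms the local linear weights $S_{n,2} - (X_i - x)S_{n,1}$ at a point near the left boundary, confirms that they satisfy (5) while the uniform NW weights do not, and shows that some of them are negative there, which is how negative local linear variance estimates arise. The function names are illustrative, not from the paper.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel 0.75 (1 - u^2) on (-1, 1), zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u**2), 0.0)

def local_linear_weights(X, x, h, kernel=epanechnikov):
    """w_i(x) = S_{n,2} - (X_i - x) S_{n,1}, the weights that the local
    linear smoother attaches to K((X_i - x)/h)."""
    d = np.asarray(X, dtype=float) - x
    Kh = kernel(d / h) / h                      # K_h(X_i - x)
    S1 = np.sum(d * Kh)
    S2 = np.sum(d**2 * Kh)
    return S2 - d * S1

def moment_lhs(w, X, x, h, kernel=epanechnikov):
    """Left-hand side of the bias-reducing moment condition (5)."""
    d = np.asarray(X, dtype=float) - x
    return np.sum(w * d * kernel(d / h) / h)

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, 500)
x, h = 0.02, 0.2                                # a point near the left boundary
w_ll = local_linear_weights(X, x, h)
in_window = epanechnikov((X - x) / h) > 0
print(moment_lhs(w_ll, X, x, h))                # ~ 0, rounding error only
print(moment_lhs(np.ones_like(X), X, x, h))     # uniform NW weights: clearly non-zero
print(np.any(w_ll[in_window] < 0))              # True: some local linear weights are negative
```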

² See Section 3.3.2 in Fan and Gijbels (1996).

A closed-form expression of the weights $\hat w_i(x)$ in (3) can be obtained via the Lagrange multiplier method, viz.

$$\hat w_i(x) = \frac{1}{n\left(1 + \lambda(X_i - x)K_h(X_i - x)\right)}, \qquad (7)$$

where the Lagrange multiplier λ satisfies

$$\sum_{i=1}^n \frac{(X_i - x)K_h(X_i - x)}{1 + \lambda(X_i - x)K_h(X_i - x)} = 0. \qquad (8)$$
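Because the left-hand side of (8) is strictly decreasing in λ on the interval where all denominators in (7) are positive, λ can be found by a one-dimensional root search. The following Python sketch does this by bisection (our choice; any bracketing root finder would do), assuming the Epanechnikov kernel and that observations fall on both sides of x within the bandwidth, which (8) needs in order to have a solution. The names `el_weights` and `rnw_variance` are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel 0.75 (1 - u^2) on (-1, 1), zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u**2), 0.0)

def el_weights(X, x, h, kernel=epanechnikov, iters=200):
    """Weights (7), with the Lagrange multiplier solving (8) by bisection."""
    X = np.asarray(X, dtype=float)
    g = (X - x) * kernel((X - x) / h) / h        # g_i = (X_i - x) K_h(X_i - x)
    if g.max() <= 0 or g.min() >= 0:
        raise ValueError("(8) needs observations on both sides of x within h")
    # All denominators 1 + lam * g_i stay positive iff -1/max(g) < lam < -1/min(g);
    # on that interval the left side of (8) decreases from +inf to -inf.
    lo, hi = -1.0 / g.max(), -1.0 / g.min()
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if np.sum(g / (1.0 + lam * g)) > 0:
            lo = lam                             # root lies to the right
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return 1.0 / (X.size * (1.0 + lam * g))

def rnw_variance(X, r2, x, h, kernel=epanechnikov):
    """Re-weighted NW estimate (3) of sigma^2(x) from squared residuals r2."""
    X = np.asarray(X, dtype=float)
    a = el_weights(X, x, h, kernel) * kernel((X - x) / h)
    return float(np.sum(a * np.asarray(r2, dtype=float)) / np.sum(a))
```

Since every weight is positive, the estimate is a convex combination of the non-negative $\hat r_i^2$. And because the weights satisfy (5), the estimate reproduces any linear function of X exactly at x, even near the boundary.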

2.2 Limit theory

To derive the asymptotic distribution of $\hat\sigma^2(x)$, we make the following assumptions. Let f(·) be the density function of X.

Assumptions. (i) For a given design point x, the functions f(x) > 0, σ²(x) > 0, E(Y³|X = x) and E(Y⁴|X = x) are continuous at x, and $\ddot m(z) = d^2 m(z)/dz^2$ and $\ddot\sigma^2(z) = d^2\sigma^2(z)/dz^2$ are uniformly continuous on an open set containing x; (ii) $E|Y|^{4(1+\delta)} < \infty$ for some δ ≥ 0; (iii) there exists a constant M < ∞ such that $|g_{1,i}(y_1, y_2|x_1, x_2)| \le M$ for all i ≥ 2, where $g_{1,i}(y_1, y_2|x_1, x_2)$ is the conditional density of $Y_1$ and $Y_i$ given $X_1 = x_1$ and $X_i = x_2$; (iv) the kernel functions W(·) and K(·) are symmetric density functions, each with bounded support [−1, 1], and each of the functions f(·), W(·) and K(·) satisfies a Lipschitz condition; (v) the process {(Xi, Yi)} is strictly stationary and absolutely regular,³ with mixing coefficients β(j) satisfying $\sum_{j=1}^\infty j^2\beta^{\delta/(1+\delta)}(j) < \infty$, where δ is the same as in (ii); (vi) as n → ∞, h, h0 → 0, $\liminf_{n\to\infty} nh^4 > 0$ and $\liminf_{n\to\infty} nh_0^4 > 0$.

³ See, e.g., Davidson (1994, p. 209) for the definition of an absolutely regular process.

The asymptotic distribution of the re-weighted NW estimator of the volatility is given in the following theorem at both interior and boundary points.

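Theorem 1. (i) Suppose that x is such that x ± h is in the support of f(x). Under the Assumptions stated, as n → ∞,

$$\sqrt{nh}\left(\hat\sigma^2(x) - \sigma^2(x) - \frac{h^2}{2}K_1\ddot\sigma^2(x)\right) \xrightarrow{d} N\left(0, \frac{K_2\sigma^4(x)\xi^2(x)}{f(x)}\right), \qquad (9)$$

where $K_1 = \int_{-1}^1 u^2K(u)du$, $K_2 = \int_{-1}^1 K^2(u)du$, $\xi^2(x) = E\left((\varepsilon^2 - 1)^2 \mid X = x\right)$ and $\varepsilon = \frac{Y - m(X)}{\sigma(X)}$.

(ii) Suppose that f(x) has bounded support [a, b] and x = a + ch (0 < c < 1). Under the Assumptions stated, as n → ∞,

$$\sqrt{nh}\left(\hat\sigma^2(a+ch) - \sigma^2(a+ch) - \frac{h^2\bar K_1}{2\bar K_0}\ddot\sigma^2(a+ch)\right) \xrightarrow{d} N\left(0, \frac{\bar K_2\sigma^4(a)\xi^2(a)}{\bar K_0^2 f(a)}\right), \qquad (10)$$

where

$$\bar K_0 = \int_{-1}^c \frac{K(u)}{1 - \lambda_c uK(u)}du, \qquad \bar K_1 = \int_{-1}^c \frac{u^2K(u)}{1 - \lambda_c uK(u)}du, \qquad \bar K_2 = \int_{-1}^c \left(\frac{K(u)}{1 - \lambda_c uK(u)}\right)^2 du,$$

and $\lambda_c$ satisfies $L_c(\lambda_c) = 0$ with

$$L_c(\lambda) = \int_{-1}^c \frac{uK(u)}{1 - \lambda uK(u)}du.$$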
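The boundary constants in (10) depend only on K and c, so they are easy to evaluate numerically. A minimal sketch for the Epanechnikov kernel follows; it uses a midpoint rule and bisection (our choices) and implements $L_c$, $\bar K_0$, $\bar K_1$ and $\bar K_2$ as written in Theorem 1 (ii), with integrals over [−1, c]. At c = 1 it should return λc = 0 together with the interior values 1, K1 = 1/5 and K2 = 3/5. The function name is hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel 0.75 (1 - u^2) on (-1, 1), zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u**2), 0.0)

def boundary_constants(c, N=100_000, iters=200):
    """lambda_c and (K0bar, K1bar, K2bar) of Theorem 1 (ii), for 0 < c <= 1."""
    if not 0.0 < c <= 1.0:
        raise ValueError("c must lie in (0, 1]")
    du = (c + 1.0) / N
    u = -1.0 + du * (np.arange(N) + 0.5)         # midpoint grid on [-1, c]
    K = epanechnikov(u)
    uK = u * K

    def L(lam):                                  # L_c(lambda), increasing in lambda
        return np.sum(uK / (1.0 - lam * uK)) * du

    # L_c(0) <= 0 for c <= 1, and on [0, 1/max(uK)) every denominator
    # 1 - lam * u * K(u) is positive on the grid, so the root is bracketed.
    lo, hi = 0.0, 1.0 / uK.max()
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if L(mid) < 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    q = K / (1.0 - lam * uK)                     # K(u) / (1 - lambda_c u K(u))
    return lam, np.sum(q) * du, np.sum(u**2 * q) * du, np.sum(q**2) * du
```

Evaluating the function for a few values of c also illustrates the remark below that λc decreases in c and approaches zero as c goes to 1.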

Remarks. Theorem 1 shows that the re-weighted NW volatility estimator is asymptotically fully adaptive to the unknown conditional mean function, a property shared by other residual-based volatility estimators (see Ruppert et al., 1997, Fan and Yao, 1998, Ziegelmann, 2002 and Yu and Jones, 2004). Theorem 1 also shows that the bias of the re-weighted NW estimator on the boundary is of the same order as the bias in the interior, so no boundary correction is needed. This feature can be better appreciated through the following heuristic argument. From the proof of Theorem 1 in the appendix, with smaller-order terms neglected, the bias of $\hat\sigma^2(x)$ is approximately accounted for by the term $\sum_{i=1}^n p_i(x)K\left(\frac{X_i - x}{h}\right)\left(\sigma^2(X_i) - \sigma^2(x)\right)$, where $p_i(x) = \left(\sum_{j=1}^n \hat w_j(x)K\left(\frac{X_j - x}{h}\right)\right)^{-1}\hat w_i(x)$. By a second-order Taylor expansion of $\sigma^2(X_i)$ at x, and because the first-order term vanishes by the discrete moment condition (5),

$$\sum_{i=1}^n p_i(x)K\left(\frac{X_i - x}{h}\right)\left(\sigma^2(X_i) - \sigma^2(x)\right) = \sum_{i=1}^n p_i(x)K\left(\frac{X_i - x}{h}\right)\left(\frac12\ddot\sigma^2(x)(X_i - x)^2 + \text{higher order terms}\right) = \begin{cases} \dfrac{h^2}{2}K_1\ddot\sigma^2(x) + o_p(h^2), & \text{if } x \text{ is in the interior;} \\[6pt] \dfrac{h^2\bar K_1}{2\bar K_0}\ddot\sigma^2(a+ch) + o_p(h^2), & \text{if } x = a + ch \text{ is on the boundary.} \end{cases}$$

The bias term of order h is eliminated by condition (5) for any n, at both interior and boundary points, just as for the local linear smoother. This differs from the conventional NW estimator, which eliminates the order-h bias term only in the limit, through the symmetry of the kernel, at interior points; at boundary points this term does not vanish. The constant λc is decreasing in c and approaches zero as c goes to 1. Theorem 1 (ii) also holds for c ≥ 1, i.e. when x is in the interior, since $\bar K_0$ and $\bar K_i$ (i = 1, 2) then reduce to 1 and $K_i$ (i = 1, 2), respectively. A result similar to (10) holds at the right boundary point b − ch.

The following theorem gives a consistent estimator of the asymptotic variance of $\hat\sigma^2(x)$ at both interior and boundary points, thereby allowing the construction of pointwise confidence intervals. Let

$$\hat\Omega(x) = \hat f^{-2}(x)\hat V(x),$$

where

$$\hat V(x) = \frac{n}{h}\sum_{i=1}^n \hat w_i^2(x)K^2\left(\frac{X_i - x}{h}\right)\left(\hat r_i^2 - \hat\sigma^2(x)\right)^2, \qquad \hat f(x) = \frac{1}{h}\sum_{i=1}^n \hat w_i(x)K\left(\frac{X_i - x}{h}\right),$$

with $\hat w_i(x)$ defined in (7).

Theorem 2. Assume $E|Y|^{8(1+\delta)} < \infty$ for some δ ≥ 0.

(i) Under the conditions of Theorem 1 (i), as n → ∞, $\hat\Omega(x) \xrightarrow{p} \dfrac{K_2\sigma^4(x)\xi^2(x)}{f(x)}$;

(ii) Under the conditions of Theorem 1 (ii), as n → ∞, $\hat\Omega(x) \xrightarrow{p} \dfrac{\bar K_2\sigma^4(a)\xi^2(a)}{\bar K_0^2 f(a)}$.
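In practice Theorem 2 suggests approximate pointwise intervals of the form $\hat\sigma^2(x) \pm z_{1-\alpha/2}\sqrt{\hat\Omega(x)/(nh)}$. This interval form is our assumption: it ignores the $O(h^2)$ bias in (9) and (10), so it is only appropriate when that bias is negligible, for example under undersmoothing. The Python sketch below computes $\hat\sigma^2(x)$, $\hat\Omega(x)$ and such an interval, re-implementing the weights (7)-(8) by bisection so that the block is self-contained; all function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def epanechnikov(u):
    """Epanechnikov kernel 0.75 (1 - u^2) on (-1, 1), zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u**2), 0.0)

def el_weights(X, x, h, iters=200):
    """Weights (7); the multiplier solves (8) by bisection."""
    g = (X - x) * epanechnikov((X - x) / h) / h
    lo, hi = -1.0 / g.max(), -1.0 / g.min()
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if np.sum(g / (1.0 + lam * g)) > 0:
            lo = lam
        else:
            hi = lam
    return 1.0 / (X.size * (1.0 + 0.5 * (lo + hi) * g))

def rnw_and_omega(X, r2, x, h):
    """Return sigma2_hat(x) from (3) and Omega_hat(x) = f_hat(x)^-2 V_hat(x)."""
    X, r2 = np.asarray(X, dtype=float), np.asarray(r2, dtype=float)
    n = X.size
    w = el_weights(X, x, h)
    k = epanechnikov((X - x) / h)
    sigma2 = np.sum(w * k * r2) / np.sum(w * k)
    f_hat = np.sum(w * k) / h
    V_hat = n / h * np.sum(w**2 * k**2 * (r2 - sigma2) ** 2)
    return sigma2, V_hat / f_hat**2

def pointwise_interval(X, r2, x, h, level=0.95):
    """Normal-approximation interval sigma2_hat +/- z * sqrt(Omega_hat / (n h))."""
    sigma2, omega = rnw_and_omega(X, r2, x, h)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half = z * np.sqrt(omega / (np.asarray(X).size * h))
    return sigma2 - half, sigma2 + half
```

As a check on the formula: with σ² ≡ 1, X ~ Unif(−2, 2) and Gaussian errors, ξ² = 2 and f = 1/4, so for the Epanechnikov kernel (K2 = 3/5) the limit in Theorem 2 (i) is 0.6 × 2 / 0.25 = 4.8, and $\hat\Omega(x)$ computed from oracle squared residuals should be close to this for large n.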

3 Other Estimators

Several positive nonparametric volatility estimators have been proposed as alternatives to the local linear estimator. The residual-based local exponential (LE) volatility estimator of Ziegelmann (2002), denoted by $\hat\sigma^2_{LE}$, belongs to a wide class of local nonlinear estimators (Hjort and Jones, 1996, Gozalo and Linton, 2000). Generally, this class has the form $\vartheta(z, \hat\varphi)$, where ϑ is a known function and $\hat\varphi$ minimizes the following weighted sum of squares, viz.

$$\hat\varphi = \arg\min_{\varphi}\sum_{i=1}^n \left(\hat r_i^2 - \vartheta(X_i, \varphi)\right)^2 K\left(\frac{X_i - x}{h}\right),$$

where the $\hat r_i^2$'s are the squared residuals from the nonparametric regression. To ensure positivity of the resulting volatility estimator, Ziegelmann (2002) proposed using the exponential function $\vartheta(z, \varphi) = \exp(\varphi_1 + \varphi_2 z)$ rather than the linear function $\vartheta(z, \varphi) = \varphi_1 + \varphi_2 z$ used in local linear smoothers. With this re-parameterization,

$$\hat\sigma^2_{LE} = \exp(\hat\psi_1), \qquad (11)$$

where $(\hat\psi_1, \hat\psi_2)$ solves

$$\arg\min_{(\psi_1, \psi_2)}\sum_{i=1}^n \left(\hat r_i^2 - \exp(\psi_1 + \psi_2(X_i - x))\right)^2 K\left(\frac{X_i - x}{h}\right).$$
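Unlike (3), the local exponential fit (11) is a nonlinear least squares problem that must be solved iteratively at every x. The sketch below uses a damped Gauss-Newton iteration started from the logarithm of the local constant fit; this solver, the Epanechnikov kernel and the function name are our assumptions, not necessarily the implementation used by Ziegelmann (2002).

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel 0.75 (1 - u^2) on (-1, 1), zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u**2), 0.0)

def local_exponential_variance(X, r2, x, h, iters=100):
    """LE estimate exp(psi_1_hat) of (11), via damped Gauss-Newton."""
    u = np.asarray(X, dtype=float) - x
    k = epanechnikov(u / h)
    keep = k > 0                                 # only points inside the window matter
    u, k, r2 = u[keep], k[keep], np.asarray(r2, dtype=float)[keep]
    Z = np.column_stack([np.ones_like(u), u])    # (1, X_i - x)
    sse = lambda p: np.sum(k * (r2 - np.exp(Z @ p)) ** 2)
    psi = np.array([np.log(np.sum(k * r2) / np.sum(k)), 0.0])
    for _ in range(iters):
        mu = np.exp(Z @ psi)
        J = Z * mu[:, None]                      # Jacobian of exp(psi_1 + psi_2 u)
        JtW = J.T * k
        step = np.linalg.solve(JtW @ J, JtW @ (r2 - mu))
        t = 1.0
        while sse(psi + t * step) > sse(psi) and t > 1e-10:
            t *= 0.5                             # step halving keeps the SSE from rising
        psi = psi + t * step
        if np.max(np.abs(t * step)) < 1e-12:
            break
    return float(np.exp(psi[0]))
```

When the squared residuals are exactly of the form exp(ψ1 + ψ2(X − x)), the fit recovers exp(ψ1).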

In a recent paper, Yu and Jones (2004) adopted a local maximum likelihood (LML) framework (see, e.g., Tibshirani and Hastie, 1987, Loader, 1999) and proposed a slightly different estimator, denoted by $\hat\sigma^2_{LML}$. Assuming iid normal errors, a localized normal log-likelihood for estimating the conditional mean and variance functions given X = x is

$$-\frac{1}{h}\sum_{i=1}^n \left(\frac{(Y_i - c(X_i))^2}{d(X_i)} + \log(d(X_i))\right)K\left(\frac{X_i - x}{h}\right), \qquad (12)$$

where c(·) and d(·) are functions to be fitted locally. The local maximum likelihood estimator amounts to maximizing (12) after an appropriate parameterization of the functions c(·) and d(·). Yu and Jones (2004) used a shortcut version, replacing $(Y_i - c(X_i))^2$ by $\hat r_i^2$ and using a linear form for the logarithm of d(·), viz.

$$\hat\sigma^2_{LML} = \exp(\hat d_1), \qquad (13)$$

where $(\hat d_1, \hat d_2)$ solves

$$\arg\min_{(d_1, d_2)}\sum_{i=1}^n \left(\hat r_i^2\exp(-d_1 - d_2(X_i - x)) + d_1 + d_2(X_i - x)\right)K\left(\frac{X_i - x}{h}\right).$$

The local exponential estimator $\hat\sigma^2_{LE}$ and the local maximum likelihood estimator $\hat\sigma^2_{LML}$ have the same asymptotic variance as the local linear estimator, but one extra term in the bias. Both estimators essentially estimate the logarithm of the variance, rather than the variance itself, to ensure positivity. This logarithmic transformation complicates the bias term, which may be detrimental depending on the nature of the true variance function (see the discussion in Yu and Jones, 2004).
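In contrast with (11), the objective in (13) is convex in (d1, d2): its Hessian, $\sum_i \hat r_i^2 e^{-d_1 - d_2(X_i - x)}(1, X_i - x)^\top(1, X_i - x)K\left(\frac{X_i - x}{h}\right)$, is positive semidefinite, so Newton's method with step halving is a natural solver. The sketch assumes the Epanechnikov kernel; the solver and the function name are our choices rather than those of Yu and Jones (2004).

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel 0.75 (1 - u^2) on (-1, 1), zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u**2), 0.0)

def lml_variance(X, r2, x, h, iters=100):
    """Shortcut local ML estimate exp(d1_hat) of (13), via damped Newton steps."""
    u = np.asarray(X, dtype=float) - x
    k = epanechnikov(u / h)
    keep = k > 0
    u, k, r2 = u[keep], k[keep], np.asarray(r2, dtype=float)[keep]
    Z = np.column_stack([np.ones_like(u), u])            # (1, X_i - x)
    objective = lambda b: np.sum(k * (r2 * np.exp(-(Z @ b)) + Z @ b))
    coef = np.array([np.log(np.sum(k * r2) / np.sum(k)), 0.0])
    for _ in range(iters):
        e = k * r2 * np.exp(-(Z @ coef))
        grad = Z.T @ (k - e)                             # gradient of the objective
        hess = (Z * e[:, None]).T @ Z                    # positive semidefinite Hessian
        step = np.linalg.solve(hess, grad)
        t = 1.0
        while objective(coef - t * step) > objective(coef) and t > 1e-10:
            t *= 0.5                                     # step halving
        coef = coef - t * step
        if np.max(np.abs(t * step)) < 1e-12:
            break
    return float(np.exp(coef[0]))
```

The gradient vanishes exactly when $\hat r_i^2 e^{-d_1 - d_2(X_i - x)}$ equals 1 on average in the appropriate weighted sense, so noise-free exponential data are recovered exactly, as with the LE sketch.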

4 Examples

We provide several numerical examples in this section using simulations and real data. In all applications, the Epanechnikov kernel $K(u) = 0.75(1 - u^2)I(-1 < u < 1)$ is used for both kernels W and K, and the bandwidth parameter h0 in the mean estimation is selected by least squares cross-validation.
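Least squares cross-validation for h0 can be sketched as follows: each observation is predicted by the local linear fit (2) computed without it, and h0 minimizes the average squared prediction error over a user-chosen grid. The grid, the helper names and the vectorized O(n²) implementation are illustrative assumptions, not a description of the code used for the examples below.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel 0.75 (1 - u^2) on (-1, 1), zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u**2), 0.0)

def loo_cv_score(X, Y, h0):
    """Mean of (Y_i - m_hat_{-i}(X_i))^2, with m_hat_{-i} the local linear
    fit (2) computed without observation i."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    D = X[None, :] - X[:, None]                  # D[i, j] = X_j - X_i
    Kmat = epanechnikov(D / h0)
    np.fill_diagonal(Kmat, 0.0)                  # leave observation i out
    S1 = np.sum(Kmat * D, axis=1)
    S2 = np.sum(Kmat * D**2, axis=1)
    L = Kmat * (S2[:, None] - D * S1[:, None])   # local linear smoother weights
    denom = L.sum(axis=1)                        # equals S0 * S2 - S1^2 >= 0
    if not np.all(denom > 1e-12 * denom.max()):
        return np.inf                            # some X_i has < 2 distinct neighbours
    pred = (L @ Y) / denom
    return float(np.mean((Y - pred) ** 2))

def select_h0(X, Y, grid):
    """Return the grid value with the smallest CV score, and all scores."""
    scores = [loo_cv_score(X, Y, h) for h in grid]
    return grid[int(np.argmin(scores))], scores
```

Bandwidths too small to leave every point with at least two neighbours are scored as infinite rather than raising an error, so the grid can safely include small values.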

4.1 Simulations

This subsection reports simulation experiments comparing the small-sample performance of the following three positive estimators of the volatility function: the re-weighted NW (RNW) estimator proposed here, Ziegelmann's local exponential (LE) estimator and Yu and Jones's local maximum likelihood (LML) estimator, given by (3), (11) and (13), respectively, along with the local linear estimator as a benchmark. The data generating process (DGP) is

$$Y_i = 0.5\left(X_i + 2\exp(-16X_i^2)\right) + \sigma(X_i)\varepsilon_i, \qquad (14)$$

where $X_i \overset{iid}{\sim} \mathrm{Unif}(-2, 2)$ and $\varepsilon_i \overset{iid}{\sim} N(0, 1)$. We use two volatility functions, of exponential and linear form respectively:

$$\sigma_1^2(x) = \left(0.4\exp(-2x^2) + 0.2\right)^2, \qquad \sigma_2^2(x) = 0.2x + 0.45.$$

The sample size used is 100. Representative samples for the two DGPs are plotted in Figure 1. The first experimental design, (14) with σ1²(x), was also used by Fan and Yao (1998). Each of the four estimators is evaluated at 37 equally spaced grid points on [−1.8, 1.8] for 500 replications. Figures 2 and 3 compare boxplots of the mean absolute deviation errors (MAD),

$$\mathrm{MAD} = \frac{1}{37}\sum_{i=1}^{37}\left|\hat\sigma^2(x_i) - \sigma^2(x_i)\right|,$$

where $x_i$ is the i-th grid point, for the four estimators with 4 fixed bandwidths h = 0.4, 0.5, 0.6, 0.7, when the variance functions are exponential and linear, respectively. For all estimators considered, larger bandwidths lead to estimates with smaller variability. The RNW method generally performs very similarly to its asymptotic analog, the local linear estimator. Compared with the LE and LML estimators, the RNW estimator has less bias for all bandwidths considered. These outcomes are not surprising, since a logarithmic transformation can have an adverse effect on the quality of estimation. We also consider the case of heavy-tailed errors, where εi has a t distribution with 5 degrees of freedom. The corresponding boxplot comparisons are shown in Figure 4 for the exponential variance function. The advantage of the RNW estimator over its two competitors is similar.
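One replication of this design can be sketched as below: draw a sample from (14), compute squared residuals from a local linear first stage, evaluate the RNW estimate (3) at the 37 grid points and record the MAD. To keep the block short we fix h0 = 0.3 instead of selecting it by cross-validation and use the Epanechnikov kernel throughout; these settings and all names are assumptions for illustration, and the sketch is not the code behind Figures 2-4.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel 0.75 (1 - u^2) on (-1, 1), zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u**2), 0.0)

def sigma2_exp(x):                               # sigma_1^2 in Section 4.1
    return (0.4 * np.exp(-2.0 * x**2) + 0.2) ** 2

def sigma2_lin(x):                               # sigma_2^2 in Section 4.1
    return 0.2 * x + 0.45

def simulate(n, sigma2, rng):
    """One sample from DGP (14) with X ~ Unif(-2, 2) and N(0, 1) errors."""
    X = rng.uniform(-2.0, 2.0, n)
    m = 0.5 * (X + 2.0 * np.exp(-16.0 * X**2))
    return X, m + np.sqrt(sigma2(X)) * rng.standard_normal(n)

def local_linear_at_data(X, Y, h0):
    """First-stage local linear fit (2) evaluated at every X_i."""
    D = X[None, :] - X[:, None]                  # D[i, j] = X_j - X_i
    Kmat = epanechnikov(D / h0)
    S1 = np.sum(Kmat * D, axis=1)
    S2 = np.sum(Kmat * D**2, axis=1)
    L = Kmat * (S2[:, None] - D * S1[:, None])
    return (L @ Y) / L.sum(axis=1)

def rnw(X, r2, x, h, iters=200):
    """RNW estimate (3); weights (7) with lambda from (8) by bisection."""
    k = epanechnikov((X - x) / h)
    g = (X - x) * k / h
    if g.max() <= 0 or g.min() >= 0:
        raise ValueError("no observations on one side of x within the bandwidth")
    lo, hi = -1.0 / g.max(), -1.0 / g.min()
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if np.sum(g / (1.0 + lam * g)) > 0:
            lo = lam
        else:
            hi = lam
    w = 1.0 / (1.0 + 0.5 * (lo + hi) * g)        # the common factor 1/n cancels in (3)
    return np.sum(w * k * r2) / np.sum(w * k)

def mad(X, Y, sigma2, h, h0, grid=np.linspace(-1.8, 1.8, 37)):
    """MAD of the RNW estimates over the 37 grid points of Section 4.1."""
    r2 = (Y - local_linear_at_data(X, Y, h0)) ** 2
    est = np.array([rnw(X, r2, x, h) for x in grid])
    return float(np.mean(np.abs(est - sigma2(grid)))), est
```

Repeating `mad` over many independent samples, and over the bandwidths h = 0.4, ..., 0.7, yields the kind of MAD distributions summarized by the boxplots.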

4.2 Prestige vs. income

We use the data from Fox (2002) to study the relationship between prestige and the average income of Canadian occupations.⁴ The dataset consists of 102 occupations, and the prestige of each occupation is measured by the Pineo-Porter prestige score from a social survey. Figure 5 (a) shows the scatterplot and a local linear mean fit with the bandwidth h0 = 5809 chosen via cross-validation (Li and Racine, 2004). Variance estimates are also useful here, e.g. for constructing pointwise confidence intervals for the mean fit or for rule-of-thumb bandwidth selection. Figure 5 (b) plots the squared residuals against the explanatory variable (average income), together with the fitted curves giving functional variance estimates by the local linear, RNW and conventional NW methods. To make the comparison among these fitted curves clearer, five points with large residuals are not displayed, and the fitted curves are calculated over 186 levels of average income equally spaced from x = 711 to 19211. For illustration, we use the bandwidth h = 5000. The local linear variance estimates are clearly negative at small values of average income, and the conventional NW estimates suffer from large biases. The RNW estimates strike a good compromise between these two, and describe the declining variances in a reasonable way (being always positive) when the level of average income is low. At moderate and high average incomes, the RNW variance estimates are very close to the local linear estimates, which is not surprising given their first-order asymptotic equivalence. Figure 6 shows the sensitivity of the various functional variance estimates to the smoothing bandwidth h. We estimate σ²(x) at two levels of average income, x = 1000 and 6000, using 91 bandwidths equally spaced from h = 1000 to 10000. At the boundary point x = 1000, negative local linear estimates occur within the quite reasonable bandwidth range between 4000 and 6000, which empirical researchers could well choose. The RNW estimates generally lie between the local linear and the conventional NW estimates, and appear quite stable over the various bandwidths. At x = 6000, which is farther from the boundary, the three fitted curves are much closer to each other, and the RNW and LL curves are almost indistinguishable over all bandwidths considered.

⁴ The dataset is named Prestige in the R package car (Fox, 2002). See also Li and Racine (2007, Chapter 2.6).

4.3 Jump diffusion

The re-weighting idea developed in this paper can also be used to estimate the nonparametric continuous-time jump diffusion model. Jump diffusion models are widely used in finance to account for discontinuities in the sample path, and they are more flexible than single-factor or multi-factor pure diffusion models in generating higher moments that match those typically observed in financial time series (see, e.g., Bakshi et al., 1997, Pan, 2002, Johannes, 2004). We use T = 54 years of daily secondary market quotes for the 3-month T-bill from January 4, 1954 to March 13, 2008, containing n = 13538 observations,⁵ plotted in Figure 7 (a). The spot rate rt is assumed to follow the jump diffusion process

$$d\log(r_t) = \mu(r_t)dt + \sigma(r_{t-})dW_t + d\left(\sum_{n=1}^{N_t} Z_n\right),$$

where $r_{t-} = \lim_{s\uparrow t} r_s$, $W_t$ is a standard Brownian motion, $N_t$ is a doubly stochastic point process with stochastic intensity $\lambda(r_t)$, and $Z_n \sim N(0, \sigma_z^2)$. We have assumed that the mean jump size is zero. The four quantities of interest (the drift function μ(r), the diffusion function σ²(r) and the jump intensity λ(r) at an interest rate level r, and the jump variance σz²) can be identified, for a sufficiently small sampling interval Δ, from estimates of $M_j(r) = E\left((\log(r_{t+\Delta}/r_t))^j \mid r_t = r\right)/\Delta$ for j = 1, 2, 4, 6 via the following approximate moment conditions:

$$M_1(r) \simeq \mu(r), \qquad M_2(r) \simeq \sigma^2(r) + \lambda(r)\sigma_z^2, \qquad M_4(r) \simeq 3\lambda(r)\sigma_z^4, \qquad M_6(r) \simeq 15\lambda(r)\sigma_z^6.$$

⁵ The dataset is available from the website http://research.stlouisfed.org/fred2 (Source: Board of Governors of the Federal Reserve System).

We use local linear fitting to estimate M1(r), and apply the re-weighted NW method to estimate the even-order moments M2(r), M4(r) and M6(r), to avoid the occasional but unreasonable negative estimates that result from local linear fitting. The estimates are denoted as $\hat M_j(r)$, j = 1, 2, 4, 6. As in Johannes (2004), we first estimate σz² by integrating the ratio of the sixth to the fourth moment over the stationary density, using the same bandwidth $h_4 = 1.7\hat sT^{-1/5} = 2.1\%$ for the fourth and sixth moments, where $\hat s$ is the standard deviation of the sample. The estimate $\hat\sigma_z^2$ is $2.39 \times 10^{-3}$. Then, to estimate λ(r), we consider bandwidths $h_4^{(j)} = 1.2^j h_4$ (j = 0, 1, 2) in estimating $\hat M_4(r)$. To estimate σ²(r), we use the bandwidth h4 in estimating $\hat M_4(r)$ and bandwidths $h_2^{(j)} = 1.2^j h_2$ (j = 0, 1, 2), where $h_2 = 1.3\hat sT^{-1/5} = 1.7\%$, in estimating $\hat M_2(r)$. Lastly, μ(r) is estimated by $\hat M_1(r)$ using the bandwidths $h_1^{(j)} = 1.2^j h_1$, j = 0, 1, 2, where $h_1 = 2.8\hat sT^{-1/5} = 3.5\%$. We express the bandwidths in terms of the time span T (instead of the sample size n) since the convergence rates of $\hat M_j(r)$ depend on T (or, more generally, on the local time), as shown by Bandi and Nguyen (2003); the scale constants are chosen so that the resulting bandwidths are close to those reported in empirical studies of US short rate dynamics.

The estimated curves $\hat\mu(r)$, $\hat\lambda(r)$ and $\hat\sigma^2(r)$ are plotted in Figure 7 (b), Figure 8 (a) and Figure 8 (b), respectively. They are expected to have smaller biases than the estimates of Johannes (2004) and Bandi and Nguyen (2003), which are based on local constant estimation of the four moments.⁶ Figure 8 (b) also contains the estimates (the upper three lines) of the total volatility σ²(r) + λ(r)σz². These imply that, for most short rate levels, the diffusion component explains about two thirds of the total volatility and the jump component about one third. This can be compared with Johannes (2004), who used a subset of our data and found that jumps typically generate more than half of the volatility of interest rate changes, and with Eraker et al. (2003), who found that jumps in equity indices explain 10-15 percent of the return volatility.

⁶ Limit theories for the local linear and the re-weighted NW estimators of the four moments in the jump diffusion model are not yet available in the literature but can be studied along the lines of Bandi and Nguyen (2003). For pure diffusion models (where σz² = 0), the asymptotic theories for these two methods are studied by Moloche (2001), Fan and Zhang (2003) and Xu (2006).
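The approximate moment conditions of Section 4.3 can be inverted in closed form: M6(r)/M4(r) ≃ 5σz², so σz² can be estimated by averaging $\hat M_6(r)/(5\hat M_4(r))$ against the stationary density, after which λ(r) ≃ M4(r)/(3σz⁴) and σ²(r) ≃ M2(r) − λ(r)σz². The sketch below applies these steps on a grid of rate levels, assuming the moment estimates and density values on that grid are supplied; replacing the integral by a density-weighted average over the grid is our simplification, the function name is hypothetical, and the check uses exact synthetic moments rather than the T-bill data.

```python
import numpy as np

def jump_diffusion_components(M2, M4, M6, density):
    """Invert M2 ~ sigma^2 + lambda sigma_z^2, M4 ~ 3 lambda sigma_z^4 and
    M6 ~ 15 lambda sigma_z^6 on a grid of rate levels r.

    M2, M4, M6: moment estimates on the grid (M4 and M6 assumed positive).
    density: stationary density (or other non-negative weights) on the same grid.
    """
    M2, M4, M6, density = (np.asarray(a, dtype=float) for a in (M2, M4, M6, density))
    # M6 / M4 = 5 sigma_z^2 at every r; average the ratio against the density.
    sigma_z2 = np.sum(M6 / (5.0 * M4) * density) / np.sum(density)
    lam = M4 / (3.0 * sigma_z2**2)               # jump intensity lambda(r)
    sigma2 = M2 - lam * sigma_z2                 # diffusion function; not guaranteed >= 0
    return sigma_z2, lam, sigma2
```

Positive RNW estimates of M4 and M6 make the implied intensity positive, but the diffusion estimate is a difference and can still be negative, so it is worth checking in applications.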

5 Summary

This paper provides a new nonparametric approach to estimating the volatility function, based on maximizing the empirical likelihood subject to a bias-reducing moment restriction. The resulting estimator is asymptotically fully adaptive to the unknown mean function. Its construction does not depend on the error distribution, and it is applicable in quite general time series settings. The new estimator preserves the appealing design-adaptive, bias and automatic boundary correction properties of the local linear estimator, while being guaranteed to be non-negative in finite samples. Moreover, compared with other positive variance estimators previously proposed in the literature, it has a simpler form and is easier to use. Numerical examples suggest that the new estimator performs well in finite samples and is a promising competitor for estimating conditional variance functions.


6 Appendix

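This section provides the proofs of Theorems 1 and 2. For simplicity, we write the weights $\hat w_i(x)$ in the re-weighted NW estimator as $w_i$.

Proof of Theorem 1. Note that $\hat r_i = Y_i - \hat m(X_i) = [m(X_i) - \hat m(X_i)] + \sigma(X_i)\varepsilon_i$ and so

$$\hat r_i^2 = \sigma^2(X_i)\varepsilon_i^2 + 2\sigma(X_i)\varepsilon_i\left(m(X_i) - \hat m(X_i)\right) + \left(m(X_i) - \hat m(X_i)\right)^2. \qquad (15)$$

Thus by (3)

$$\hat\sigma^2(x) - \sigma^2(x) = \sum_{j=1}^4 T_j, \qquad (16)$$

where

$$T_1 = \frac{\sum_{i=1}^n w_iK\left(\frac{X_i - x}{h}\right)\sigma^2(X_i)(\varepsilon_i^2 - 1)}{\sum_{i=1}^n w_iK\left(\frac{X_i - x}{h}\right)}, \qquad T_2 = \frac{\sum_{i=1}^n w_iK\left(\frac{X_i - x}{h}\right)\left(\sigma^2(X_i) - \sigma^2(x)\right)}{\sum_{i=1}^n w_iK\left(\frac{X_i - x}{h}\right)},$$

$$T_3 = \frac{2\sum_{i=1}^n w_iK\left(\frac{X_i - x}{h}\right)\sigma(X_i)\varepsilon_i\left(m(X_i) - \hat m(X_i)\right)}{\sum_{i=1}^n w_iK\left(\frac{X_i - x}{h}\right)} \quad \text{and} \quad T_4 = \frac{\sum_{i=1}^n w_iK\left(\frac{X_i - x}{h}\right)\left(m(X_i) - \hat m(X_i)\right)^2}{\sum_{i=1}^n w_iK\left(\frac{X_i - x}{h}\right)}.$$

(i). Suppose that x is such that x ± h is in the support of f(x). Since an absolutely regular time series is α-mixing, Lemma A2 in Cai (2001) holds under our assumptions, i.e.

$$w_i = n^{-1}\left(1 - \frac{hK_1f'(x)}{\upsilon_2 f(x)}(X_i - x)K_h(X_i - x)\right)^{-1}(1 + o_p(1)), \qquad (17)$$

where $\upsilon_2 = \int_{-1}^1 u^2K^2(u)du$, and $\lambda = -\frac{hK_1f'(x)}{\upsilon_2 f(x)} + O_{a.s.}(h^3)$.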

Consider the term T2 first. The denominator of T2 times 1/h is

$$\frac{1}{h}\sum_{i=1}^n w_iK\left(\frac{X_i - x}{h}\right) = \frac{1}{nh}\sum_{i=1}^n K\left(\frac{X_i - x}{h}\right) + o_p(1) \xrightarrow{p} f(x), \qquad (18)$$

by (17) and an application of Birkhoff's ergodic theorem (see, e.g., Shiryaev, 1995), provided that $E\left(\frac{1}{h}K\left(\frac{X_i - x}{h}\right)\right) = \frac{1}{h}\int K\left(\frac{u - x}{h}\right)f(u)du \to f(x)$ as h → 0, after a change of variables. By a Taylor expansion of $\sigma^2(X_i)$ at x and the discrete moment condition (5), which removes the first-order term, the numerator of T2 times 1/h is

$$\frac{1}{h}\sum_{i=1}^n w_iK\left(\frac{X_i - x}{h}\right)\left(\sigma^2(X_i) - \sigma^2(x)\right) = \frac{1}{h}\sum_{i=1}^n w_iK\left(\frac{X_i - x}{h}\right)\left(\frac12\ddot\sigma^2(x)(X_i - x)^2 + o((X_i - x)^2)\right) = \frac{h^2}{2}f(x)K_1\ddot\sigma^2(x) + o_p(h^2), \qquad (19)$$

by (17) and the ergodic theorem. Combining (18) and (19) gives

$$T_2 = \frac{h^2}{2}K_1\ddot\sigma^2(x) + o_p(h^2).$$

Noting (17) and (18), it follows from Fan and Yao (1998, the proof of Theorem 1, (b)-(d)) that $\sqrt{nh}\,T_1 \xrightarrow{d} N\left(0, \frac{K_2\sigma^4(x)\xi^2(x)}{f(x)}\right)$ and $T_3, T_4 = o_p(h^2 + h_0^2)$. Hence, by (16), Theorem 1 (i) holds.

(ii). Suppose that $f$ has bounded support $[a, b]$ and $x = a + ch$ $(0 < c < 1)$. By Lemma A.3 in Cai (2001),

$$w_i = \frac{1 + o_p(1)}{n\left(1 - \lambda_c (X_i - a - ch)K_h(X_i - a - ch)\right)}.$$

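Consider the term $T_2$ in (16). Note that

$$\frac1h\sum_{i=1}^n w_i K\left(\frac{X_i-a-ch}{h}\right) = \frac{1}{nh}\sum_{i=1}^n \frac{K\left(\frac{X_i-a-ch}{h}\right)}{1 - \lambda_c(X_i-a-ch)K_h(X_i-a-ch)} + o_p(1) \xrightarrow{p} \bar K_0 f(a), \qquad (20)$$

by the ergodic theorem, provided that

$$E\left[\frac1h\,\frac{K\left(\frac{X_i-a-ch}{h}\right)}{1 - \lambda_c(X_i-a-ch)K_h(X_i-a-ch)}\right] = \frac1h\int_a^b \frac{K\left(\frac{z-a-ch}{h}\right)}{1 - \lambda_c(z-a-ch)K_h(z-a-ch)}\,f(z)\,dz \to \int_{-c}^{1}\frac{K(u)}{1-\lambda_c uK(u)}\,du\; f(a) = \bar K_0 f(a),$$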

as $h \to 0$, after a change of variables. By a Taylor expansion of $\sigma^2(X_i)$ at $a + ch$ and the discrete moment condition (5),

$$\frac1h\sum_{i=1}^n w_i K\left(\frac{X_i-a-ch}{h}\right)\left[\sigma^2(X_i)-\sigma^2(a+ch)\right] = \frac1h\sum_{i=1}^n w_i K\left(\frac{X_i-a-ch}{h}\right)\left[\frac12\ddot\sigma^2(a+ch)(X_i-a-ch)^2 + o\left((X_i-a-ch)^2\right)\right] = \frac{h^2}{2}\bar K_1 f(a)\ddot\sigma^2(a+ch) + o_p(h^2),$$

again by the ergodic theorem. Thus, by (20), $T_2 = \frac{h^2\bar K_1}{2\bar K_0}\ddot\sigma^2(a+ch) + o_p(h^2)$. Following the proof of Theorem 1 in Fan and Yao (1998), it can be shown that $T_3, T_4 = o_p(h^2 + h'^2)$ and that $T_1$ is asymptotically normal with mean zero and variance $1/(nh)$ times (noting (20))

$$\frac{1}{h\bar K_0^2 f^2(a)}\,E\left[nw_iK\left(\frac{X_i-a-ch}{h}\right)\sigma^2(X_i)(\varepsilon_i^2-1)\right]^2 = \frac{1}{h\bar K_0^2 f^2(a)}\,E\left[\frac{K\left(\frac{X_i-a-ch}{h}\right)}{1-\lambda_c(X_i-a-ch)K_h(X_i-a-ch)}\,\sigma^2(X_i)(\varepsilon_i^2-1)\right]^2 + o(1)$$

$$\to \frac{1}{\bar K_0^2 f^2(a)}\int_{-c}^{1}\left(\frac{K(u)}{1-\lambda_c uK(u)}\right)^2du\;\sigma^4(a)\xi^2(a)f(a) = \frac{\bar K_2\,\sigma^4(a)\xi^2(a)}{\bar K_0^2 f(a)}.$$

So, by (16), the proof of (ii) is complete.
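The boundary constants in (ii) are integrals over the truncated kernel window and can be computed numerically. The sketch below assumes an Epanechnikov kernel on $[-1, 1]$ and reads the constants off the limits above: $\lambda_c$ is taken to solve the limiting moment condition $\int_{-c}^{1} uK(u)/\{1 - \lambda_c uK(u)\}\,du = 0$ (our reading, since Lemma A.3 of Cai (2001) is not restated here), $\bar K_0$ and $\bar K_2$ are the integrals in (20) and in the limit variance of $T_1$, and $\bar K_1 = \int_{-c}^{1} u^2K(u)/\{1 - \lambda_c uK(u)\}\,du$ is the constant produced by the Taylor-expansion step. The helper `boundary_constants` is hypothetical. For $c \ge 1$ the window is not truncated and the interior values $\lambda_c = 0$, $\bar K_0 = 1$, $\bar K_1 = 0.2$, $\bar K_2 = 0.6$ should be recovered.

```python
import numpy as np


def epan(u):
    """Epanechnikov kernel supported on [-1, 1] (an illustrative choice)."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def trap(y, u):
    """Trapezoidal rule on a grid."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(u)) / 2.0)


def boundary_constants(c, num=200001, tol=1e-12):
    """Hypothetical helper: (lambda_c, K0bar, K1bar, K2bar, moment residual) at x = a + c h."""
    u = np.linspace(-min(c, 1.0), 1.0, num)  # kernel window truncated at the boundary
    K = epan(u)
    g = u * K
    lo, hi = 1.0 / g.min(), 1.0 / g.max()     # keeps 1 - lam * u K(u) > 0 on the grid

    def moment(lam):
        return trap(g / (1.0 - lam * g), u)   # increasing in lam

    while hi - lo > tol:                      # bisection for the root of moment()
        mid = 0.5 * (lo + hi)
        if moment(mid) > 0:
            hi = mid
        else:
            lo = mid
    lam = 0.5 * (lo + hi)
    q = K / (1.0 - lam * g)
    return lam, trap(q, u), trap(u**2 * q, u), trap(q**2, u), moment(lam)


for c in (1.0, 0.5):
    lam, K0b, K1b, K2b, resid = boundary_constants(c)
    bias_factor = K1b / (2.0 * K0b)  # T2 is about h^2 * bias_factor * sigma2''(a + c h)
    var_factor = K2b / K0b**2        # nh Var(T1) is about var_factor * sigma^4 xi^2 / f(a)
    print(c, lam, K0b, K1b, K2b, bias_factor, var_factor)
```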

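Proof of Theorem 2. (i). We write $\hat V(x) = \hat V_1(x) + \hat V_2(x) + \hat V_3(x)$, where

$$\hat V_1(x) = \frac nh\sum_{i=1}^n w_i^2K^2\left(\frac{X_i-x}{h}\right)\hat r_i^4, \qquad \hat V_2(x) = -\frac{2n\hat\sigma^2(x)}{h}\sum_{i=1}^n w_i^2K^2\left(\frac{X_i-x}{h}\right)\hat r_i^2, \qquad \hat V_3(x) = \frac{n\hat\sigma^4(x)}{h}\sum_{i=1}^n w_i^2K^2\left(\frac{X_i-x}{h}\right).$$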

Consider the term $\hat V_1(x)$ first. By (15), we have

$$\hat r_i^4 = \sigma^4(X_i)\varepsilon_i^4 + 4\sigma^2(X_i)\varepsilon_i^2[m(X_i)-\hat m(X_i)]^2 + [m(X_i)-\hat m(X_i)]^4 + 4\sigma^3(X_i)\varepsilon_i^3[m(X_i)-\hat m(X_i)] + 2\sigma^2(X_i)\varepsilon_i^2[m(X_i)-\hat m(X_i)]^2 + 4\sigma(X_i)\varepsilon_i[m(X_i)-\hat m(X_i)]^3,$$

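and denote $\hat V_1(x) = \sum_{j=1}^{6} S_j$, where

$$\begin{aligned}
S_1 &= \frac nh\sum_{i=1}^n w_i^2K^2\left(\frac{X_i-x}{h}\right)\sigma^4(X_i)\varepsilon_i^4,\\
S_2 &= \frac{4n}{h}\sum_{i=1}^n w_i^2K^2\left(\frac{X_i-x}{h}\right)\sigma^2(X_i)\varepsilon_i^2\,[m(X_i)-\hat m(X_i)]^2,\\
S_3 &= \frac nh\sum_{i=1}^n w_i^2K^2\left(\frac{X_i-x}{h}\right)[m(X_i)-\hat m(X_i)]^4,\\
S_4 &= \frac{4n}{h}\sum_{i=1}^n w_i^2K^2\left(\frac{X_i-x}{h}\right)\sigma^3(X_i)\varepsilon_i^3\,[m(X_i)-\hat m(X_i)],\\
S_5 &= \frac{2n}{h}\sum_{i=1}^n w_i^2K^2\left(\frac{X_i-x}{h}\right)\sigma^2(X_i)\varepsilon_i^2\,[m(X_i)-\hat m(X_i)]^2,\\
S_6 &= \frac{4n}{h}\sum_{i=1}^n w_i^2K^2\left(\frac{X_i-x}{h}\right)\sigma(X_i)\varepsilon_i\,[m(X_i)-\hat m(X_i)]^3.
\end{aligned}$$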
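The six terms $S_1, \ldots, S_6$ come from squaring the three-term expansion (15). Writing $d_i = m(X_i) - \hat m(X_i)$ as a local shorthand and $\hat r_i^2 = A_i + B_i + C_i$ with $A_i = \sigma^2(X_i)\varepsilon_i^2$, $B_i = 2\sigma(X_i)\varepsilon_i d_i$ and $C_i = d_i^2$, the bookkeeping is:

$$\begin{aligned}
A_i^2 &= \sigma^4(X_i)\varepsilon_i^4 &&\to S_1,\\
B_i^2 &= 4\sigma^2(X_i)\varepsilon_i^2 d_i^2 &&\to S_2,\\
C_i^2 &= d_i^4 &&\to S_3,\\
2A_iB_i &= 4\sigma^3(X_i)\varepsilon_i^3 d_i &&\to S_4,\\
2A_iC_i &= 2\sigma^2(X_i)\varepsilon_i^2 d_i^2 &&\to S_5,\\
2B_iC_i &= 4\sigma(X_i)\varepsilon_i d_i^3 &&\to S_6.
\end{aligned}$$

Only $S_1$ is free of the first-stage error $d_i$, which is why it alone survives in the probability limit of $\hat V_1(x)$.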

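Similar to the analysis of the term $T_1$ in the proof of Theorem 1(i), we have

$$\frac{n\sqrt{n}}{\sqrt{h}}\sum_{i=1}^n w_i^2K^2\left(\frac{X_i-x}{h}\right)\sigma^4(X_i)\left(\varepsilon_i^4 - (\xi^2(x)+1)\right) = O_p(1),$$

provided that

$$E\left|w_i^2K^2\left(\frac{X_i-x}{h}\right)\sigma^4(X_i)\left(\varepsilon_i^4 - (\xi^2(x)+1)\right)\right|^{2+\delta/2} < \infty,$$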

which holds under our assumptions. Thus $S_1 = \tilde S_1 + o_p(1)$, where

$$\tilde S_1 = (\xi^2(x)+1)\,\frac nh\sum_{i=1}^n w_i^2K^2\left(\frac{X_i-x}{h}\right)\sigma^4(X_i) \xrightarrow{p} (\xi^2(x)+1)K_2\sigma^4(x)f(x)$$

by the ergodic theorem. In view of (17), it follows from Fan and Yao (1998, the proof of Theorem 1(c)) that $S_j = o_p(1)$ for $j = 2, \ldots, 6$. Thus, $\hat V_1(x) \xrightarrow{p} (\xi^2(x)+1)K_2\sigma^4(x)f(x)$.

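Similarly, using (15) and the consistency of $\hat\sigma^2(x)$ established in Theorem 1, we can show that $\hat V_2(x) \xrightarrow{p} -2K_2\sigma^4(x)f(x)$. Lastly, $\hat V_3(x) \xrightarrow{p} K_2\sigma^4(x)f(x)$ by noting (17). So $\hat V(x) \xrightarrow{p} \xi^2(x)K_2\sigma^4(x)f(x)$, and Theorem 2(i) follows from (18).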

(ii). This can be proved as in (i) using the arguments in the proof of Theorem 1 (ii).
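Theorem 2 suggests a plug-in estimate of the asymptotic variance in Theorem 1(i): $\hat V(x)$ divided by the square of the denominator $D$ in (18). The sketch below assumes an Epanechnikov kernel, an i.i.d. $N(0,1)$ design, Gaussian errors with $\sigma \equiv 1$, and a known mean, so that $\hat r_i^2 = \varepsilon_i^2$; all of these are illustrative choices. It checks the algebraic identity $\hat V_1 + \hat V_2 + \hat V_3 = \frac nh\sum_i w_i^2K^2(\cdot)(\hat r_i^2 - \hat\sigma^2(x))^2$ and compares $\hat V(x)/D^2$ with the target $\xi^2K_2\sigma^4/f(0) = 2 \times 0.6 \times \sqrt{2\pi} \approx 3.01$ for this design. `rnw_weights` is the same hypothetical bisection helper used earlier.

```python
import numpy as np


def epan(u):
    """Epanechnikov kernel supported on [-1, 1] (an illustrative choice)."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def rnw_weights(X, x, h, tol=1e-12, max_iter=200):
    """Hypothetical helper: w_i = 1 / (n (1 + lam g_i)) with sum_i w_i g_i = 0."""
    n = len(X)
    g = (X - x) * epan((X - x) / h) / h
    lo, hi = -1.0 / g.max(), -1.0 / g.min()
    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        if np.sum(g / (1.0 + lam * g)) > 0:
            lo = lam
        else:
            hi = lam
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return 1.0 / (n * (1.0 + lam * g)), lam


rng = np.random.default_rng(1)
n, h, x = 50000, 0.3, 0.0
X = rng.standard_normal(n)
r2 = rng.standard_normal(n) ** 2  # hypothetical: sigma = 1 and known mean, so r_i^2 = eps_i^2

w, _ = rnw_weights(X, x, h)
K = epan((X - x) / h)
s2 = np.sum(w * K * r2) / np.sum(w * K)  # sigma2_hat(x)
a = w**2 * K**2
V1 = n / h * np.sum(a * r2**2)
V2 = -2.0 * n * s2 / h * np.sum(a * r2)
V3 = n * s2**2 / h * np.sum(a)
V = V1 + V2 + V3
D = np.sum(w * K) / h                     # denominator in (18), estimates f(x)
avar_hat = V / D**2                       # estimates xi^2 K2 sigma^4 / f(x)
target = 2.0 * 0.6 / (1.0 / np.sqrt(2.0 * np.pi))
print(avar_hat, target)
```

Under these assumptions, $\hat\sigma^2(x) \pm 1.96\sqrt{\hat V(x)/(nhD^2)}$ would be the natural pointwise interval, although its finite-sample coverage is not examined here.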


References

Aït-Sahalia, Y., 1996, Nonparametric pricing of interest rate derivative securities. Econometrica 64, 527-560.

Bakshi, G., C. Cao and Z. Chen, 1997, Empirical performance of alternative option pricing models. Journal of Finance 52, 2003-2049.

Bandi, F. and T. Nguyen, 2003, On the functional estimation of jump-diffusion processes. Journal of Econometrics 116, 293-328.

Bandi, F. and P.C.B. Phillips, 2003, Fully nonparametric estimation of scalar diffusion models. Econometrica 71, 241-283.

Bossaerts, P., W. Härdle and C. Hafner, 1996, Foreign exchange-rates have surprising volatility, in: P.M. Robinson (Eds.), Athens conference on applied probability and time series, Vol. 2, Springer, New York, pp. 55-72.

Cai, Z., 2001, Weighted Nadaraya-Watson regression estimation. Statistics and Probability Letters 51, 307-318.

Cai, Z., 2002, Regression quantiles for time series. Econometric Theory 18, 169-192.

Carroll, R.J., 1982, Adapting for heteroscedasticity in linear models. Annals of Statistics 10, 1224-1233.

Carroll, R. and D. Ruppert, 1988, Transforming and weighting in regression. Chapman and Hall, London.

Chen, S.X. and Y. Qin, 2002, Confidence interval based on a local linear smoother. Scandinavian Journal of Statistics 29, 89-99.

Dahl, C.M. and M. Levine, 2006, Nonparametric estimation of volatility models with serially dependent innovations. Statistics and Probability Letters 76, 2007-2016.

Davidson, J., 1994, Stochastic limit theory: An introduction for econometricians. Oxford University Press, Oxford.

De Gooijer, J.G. and D. Zerom, 2003, On conditional density estimation. Statistica Neerlandica 57, 159-176.

Engle, R.F., 1982, Autoregressive conditional heteroscedasticity with estimates of the variance of U.K. inflation. Econometrica 50, 987-1008.

Eraker, B., M. Johannes and N. Polson, 2003, The impact of jumps in volatility and returns. Journal of Finance 58, 1269-1300.

Fan, J. and I. Gijbels, 1996, Local polynomial modeling and its applications. Chapman and Hall, London.

Fan, J. and Q. Yao, 1998, Efficient estimation of conditional variance functions in stochastic regression. Biometrika 85, 645-660.

Fan, J. and C. Zhang, 2003, A re-examination of Stanton's diffusion estimations with applications to financial model validation. Journal of the American Statistical Association 98, 118-134.

Fox, J., 2002, An R and S-PLUS companion to applied regression. Sage, Thousand Oaks.

Gozalo, P. and O. Linton, 2000, Local nonlinear least squares: Using parametric information in nonparametric regression. Journal of Econometrics 99, 63-106.

Härdle, W., 1990, Applied nonparametric regression. Cambridge University Press, Cambridge.

Härdle, W. and A.B. Tsybakov, 1997, Local polynomial estimators of the volatility function in nonparametric autoregression. Journal of Econometrics 81, 223-242.

Hall, P. and R.J. Carroll, 1989, Variance estimation in regression: The effect of estimating the mean. Journal of the Royal Statistical Society B 51, 3-14.

Hall, P. and L.-S. Huang, 2001, Nonparametric kernel regression subject to monotonicity constraints. Annals of Statistics 29, 624-647.

Hall, P. and B. Presnell, 1999, Intentionally biased bootstrap methods. Journal of the Royal Statistical Society B 61, 143-158.

Hall, P., R.C.L. Wolff and Q. Yao, 1999, Methods for estimating a conditional distribution function. Journal of the American Statistical Association 94, 154-163.

Hjort, N.L. and M.C. Jones, 1996, Local parametric nonparametric density estimation. Annals of Statistics 24, 1619-1647.

Johannes, M., 2004, The statistical and economic role of jumps in continuous-time interest rate models. Journal of Finance 59, 227-260.

Li, Q. and J.S. Racine, 2004, Cross-validated local linear nonparametric regression. Statistica Sinica 14, 485-512.

Li, Q. and J.S. Racine, 2007, Nonparametric econometrics: Theory and practice. Princeton University Press, Princeton.

Loader, C., 1999, Local regression and likelihood. Springer-Verlag, New York.

Martins-Filho, C. and F. Yao, 2006, Estimation of value-at-risk and expected shortfall based on nonlinear models of return dynamics and extreme value theory. Studies in Nonlinear Dynamics and Econometrics 10 (2), Article 4.

Martins-Filho, C. and F. Yao, 2007, Nonparametric frontier estimation via local linear regression. Journal of Econometrics 141, 283-319.

Moloche, G., 2001, Local nonparametric estimation of scalar diffusions. Unpublished paper, MIT.

Müller, H.G. and U. Stadtmüller, 1987, Estimation of heteroscedasticity in regression analysis. Annals of Statistics 15, 610-625.

Owen, A., 2001, Empirical likelihood. Chapman and Hall/CRC.

Pan, J., 2002, The jump-risk premia implicit in options: Evidence from an integrated time-series study. Journal of Financial Economics 63, 3-50.

Porter, J., 2003, Estimation in the regression discontinuity model. Working paper, Department of Economics, University of Wisconsin.

Ruppert, D., M.P. Wand, U. Holst and O. Hössjer, 1997, Local polynomial variance function estimation. Technometrics 39, 262-273.

Shephard, N., 2005, Stochastic volatility: Selected readings. Oxford University Press, Oxford.

Shiryaev, A.N., 1995, Probability. Springer-Verlag, New York.

Stanton, R., 1997, A nonparametric model of term structure dynamics and the market price of interest rate risk. Journal of Finance 52, 1973-2002.

Tibshirani, R. and T. Hastie, 1987, Local likelihood estimation. Journal of the American Statistical Association 82, 559-568.

Tong, H., 1990, Nonlinear time series analysis: A dynamic approach. Oxford University Press, Oxford.

Xu, K.-L., 2006, Re-weighted functional estimation of nonlinear diffusions. Working paper, Yale University.

Yu, K. and M.C. Jones, 2004, Likelihood-based local linear estimation of the conditional variance function. Journal of the American Statistical Association 99, 139-144.

Ziegelmann, F.A., 2002, Nonparametric estimation of volatility functions: The local exponential estimator. Econometric Theory 18, 985-991.

[Figure 1 panels: (a) Exponential Variances; (b) Linear Variances. Each plots Y(i) against X(i).]

Figure 1: Two representative samples of 100 observations and the true mean curve with exponential variances (Panel a) and linear variances (Panel b).

[Figure 2 panels: (a) h = 0.4; (b) h = 0.5; (c) h = 0.6; (d) h = 0.7. Each shows boxplots of MAD for LL, RNW, LE and LML.]

Figure 2: Exponential variances: Boxplots of the mean absolute deviation errors (MAD) for the local linear (LL), re-weighted NW (RNW), local exponential (LE) and local maximum likelihood (LML) estimators with 4 fixed bandwidths: (a) h = 0.4; (b) h = 0.5; (c) h = 0.6; (d) h = 0.7, based on 500 replications. The sample size is n = 100 and $\varepsilon_i \overset{iid}{\sim} N(0, 1)$.

[Figure 3 panels: (a) h = 0.4; (b) h = 0.5; (c) h = 0.6; (d) h = 0.7. Each shows boxplots of MAD for LL, RNW, LE and LML.]

Figure 3: Linear variances: Boxplots of the mean absolute deviation errors (MAD) for the local linear (LL), re-weighted NW (RNW), local exponential (LE) and local maximum likelihood (LML) estimators with 4 fixed bandwidths: (a) h = 0.4; (b) h = 0.5; (c) h = 0.6; (d) h = 0.7, based on 500 replications. The sample size is n = 100 and $\varepsilon_i \overset{iid}{\sim} N(0, 1)$.

[Figure 4 panels: (a) h = 0.4; (b) h = 0.5; (c) h = 0.6; (d) h = 0.7. Each shows boxplots of MAD for LL, RNW, LE and LML.]

Figure 4: Exponential variances: Boxplots of the mean absolute deviation errors (MAD) for the local linear (LL), re-weighted NW (RNW), local exponential (LE) and local maximum likelihood (LML) estimators with 4 fixed bandwidths: (a) h = 0.4; (b) h = 0.5; (c) h = 0.6; (d) h = 0.7, based on 500 replications. The sample size is n = 100 and $\varepsilon_i \overset{iid}{\sim} (0.6)^{1/2} t_5$.

[Figure 5 panels: (a) Estimates of m(x) (h' = 5809), Prestige against Income (×10^4); (b) Estimates of $\sigma^2(x)$ (h = 5000), Squared Residuals against Income (×10^4), showing LL, RNW, NW and $\hat r^2$.]

Figure 5: Prestige vs. Income: (a) local linear estimation of the mean function using the bandwidth h' = 5809; (b) estimates of the variance function based on the squared residuals using the local linear (LL), re-weighted Nadaraya-Watson (RNW) and conventional Nadaraya-Watson (NW) methods with the bandwidth h = 5000.

[Figure 6 panels: (a) Estimates of $\sigma^2(x)$, x = 1000; (b) Estimates of $\sigma^2(x)$, x = 6000. Each plots the LL, RNW and NW estimates against Bandwidth h.]

Figure 6: Prestige vs. Income: estimates of the variance function over bandwidths using LL, RNW and NW methods when the design point is (a) x = 1000; (b) x = 6000.

[Figure 7 panels: (a) T-bill Rate Level (Percent) against Year; (b) Drift Coefficient (×10^-3) against Interest Rate Level (Percent) for bandwidths $h_1^{(1)}$, $h_1^{(2)}$, $h_1^{(3)}$.]

Figure 7: (a) The time series of daily 3-month Treasury bill rates (secondary market rates) from January 4, 1954 to March 13, 2008; (b) the local linear estimators of the drift function using three bandwidths 3.5%, 4.2% and 5.0%.

[Figure 8 panels: (a) and (b) plotted against Interest Rate Level (percent); y-axes Jump Intensity and Diffusion Coefficient, three bandwidths per panel.]

Figure 8: (a) The re-weighted NW estimators of the jump intensity using three bandwidths; (b) the re-weighted NW estimators of the second moment $\widehat{M}_2(r)$ (the higher three lines) and the diffusion coefficient over three bandwidths, respectively.