Bayesian Semiparametric Estimation of Elliptic Densities Alessio Sancetta∗ January 14, 2009

Abstract The class of elliptic distributions is fully determined by a scaling matrix and a function with domain in the positive reals. It contains the Gaussian distribution and the t-distribution often used in applications, as well as many others. This class is flexible enough to account for fat tails and complex forms of dependence, e.g. tail dependence. At the same time it retains the conceptual simplicity and interpretability of the Gaussian distribution. Here we establish weak conditions for posterior consistency (in a frequentist sense) of the Bayesian semiparametric estimator of elliptic densities when the scaling matrix is full rank and the generator possesses the consistency property (Kano, 1994). Conditions for consistency when the observations are pre-whitened are also given. This is particularly useful in large dimensional time series problems, where it is easier to model the marginal dynamics first.

1 Introduction

Multivariate distributions are complex objects and it is difficult to model the distribution of a multivariate random variable X = (X_1, ..., X_K). When K is large, nonparametric methods are not suitable and parametric restrictions are necessary. Here, the class of possible multivariate densities will be restricted to the elliptic class. Elliptic distributions are routinely used in multivariate analysis (e.g. Fang et al., 2002, Hult and Lindskog, 2002, and references therein). This class partially retains the simplicity of a Gaussian density, but can also account for a more complex dependence structure, like tail dependence (e.g. Frahm et al., 2003, Hult and Lindskog, 2002, for a discussion and references). Tail dependence is of particular importance in financial applications (e.g. Malevergne and Sornette, 2003) and flexible methods that allow estimation of families of distributions exhibiting this dependence property should be welcomed. We shall only consider elliptic densities with full rank scaling matrix Σ, which is also an unknown parameter. Moreover, the density will have to satisfy the consistency property. This means that if the K dimensional elliptic density is in some subclass (e.g. Gaussian) then the K − 1 dimensional marginals are also in that same subclass.

∗ Address for Correspondence: Via Flaminia Nuova 213, 00191 Roma, Italy. E-mail: ; webpage: . Acknowledgment: I thank Professor Stephen G. Walker for explaining to me the extension of one of his results.

The

consistency property has been defined and investigated in Kano (1994), and further results (using a different terminology) have been reported in Gómez-Sánchez-Manzano et al. (2006). Liebscher (2005) details an estimation procedure based on kernel estimators without the requirement of the consistency property. Definition of this estimator is not trivial and requires careful asymptotic analysis. Indeed, the estimator is based on a functional transformation of the data, where the transform needs to satisfy certain asymptotic conditions. Imposing the consistency property, we lose generality, but also make the estimation procedure simpler and amenable to Bayesian estimation. Indeed, the goal of the paper is to define a Bayesian semiparametric estimator for this class of densities and provide conditions under which the posterior is strongly consistent. There is a rich literature on Bayesian estimation for infinite dimensional parameters (e.g. Barron et al., 1999, Ghosal et al., 1999, 2000, Ghosal and van der Vaart, 2007a, 2007b, Kleijn and van der Vaart, 2006, Lijoi et al., 2005, Walker, 2004, Walker et al., 2007); see Walker (2004) for a review of results up to 2004. These results are directly relevant to this study. However, considerable effort is required to establish primitive, weak and easy to verify conditions that lead to consistency for the Bayesian semiparametric estimator of an elliptic density. In the multivariate case, details differ. Moreover, here we are interested in a mixture with respect to the scaling parameter rather than the location parameter. Hence, existing results cannot be used directly (e.g. Ghosal et al., 1999, Lijoi et al., 2005). It is worth stressing that the conditions under which we achieve consistency are very weak. Among other reasons, this is possible by invoking an argument in Lijoi et al. (2005); details will be given in Section 2.2.2. The Bayesian procedure considered here relies on independent identically distributed (iid) data. However, we shall also allow the data to be subject to estimation error and show that consistency is still possible. This is particularly useful when a prewhitening filter is applied to non-iid data. To the author's knowledge, this is the first study that proposes a consistent Bayesian estimator for the class of elliptic densities with the consistency property where the data are also subject to estimation error. Below, we review some basic definitions concerning elliptic densities. Section 2 formally introduces the estimation problem and studies the frequentist properties of the Bayesian estimator of the density when the data are iid. Conditions for posterior strong consistency when the data are subject to error are also established; these are useful when we deal with dependent observations. Section 3 contains further brief remarks on estimation and approximate estimation procedures which can be easily implemented using standard statistical packages like R. The proofs are deferred to Section 4.

1.1 Background on Elliptic Densities

The class of K dimensional elliptic densities depends on a K × K scaling matrix Σ and an infinite dimensional parameter g, a positive function with support in the positive reals. The function g : R → R is called the generator of the elliptic distribution. Let X = (X_1, ..., X_K) be a mean zero random variable with values in R^K. Then, X is elliptically distributed with full rank scaling matrix Σ if and only if its density at x is

|Σ|^{-1/2} g(x'Σ^{-1}x)   (1)

where the prime stands for the transpose (see Fang et al., 2002, for more details on the above). Note that if Σ is not full rank, the representation used above does not hold (see Hult and Lindskog, 2002, for the general case). In this paper, attention is restricted to the subclass of elliptic densities with full rank scaling matrix Σ. By the properties of elliptic random variables (Hult and Lindskog, 2002), X has elliptic distribution (1) if and only if the following stochastic representation holds

X =_d RAS   (2)

where =_d is equality in distribution, S is uniformly distributed on the unit hypersphere {s ∈ R^K : s's = 1}, A is full rank such that AA' = Σ and R is a positive real random variable independent of S. The generator g is uniquely determined by R up to a scaling factor; i.e. X with scaling matrix Σ and generator g(y) has the same distribution as Z with scaling matrix vΣ and generator g(yv). In particular,

g(y) = [Γ(K/2) / (2π^{K/2})] y^{-(K-1)/2} f_R(y^{1/2}),

where f_R is the density function of R (e.g. Cambanis et al., 1981, eq. 20). While the notation does not make it explicit, the above display shows that the generator g depends on K; e.g. for a Gaussian density g(y) = (2π)^{-K/2} exp{-y/2}. To introduce the following definition, let us make this dependence explicit by writing g = g_K for the generator of a K dimensional elliptic density. If X = (X_1, ..., X_K) has density with generator g_K and X_{-k} := (X_1, ..., X_{k-1}, X_{k+1}, ..., X_K) has density with generator g_{K-1} for any k, then the elliptic density is said to possess the consistency property. Many elliptic densities have this property (e.g. the Gaussian, the t-distribution, etc.), but not all (see Kano, 1994, for a discussion and examples; see also Gómez-Sánchez-Manzano et al., 2006). In applications it can be desirable to enforce the consistency property. Using the above notation, the consistency property ensures that if X has, say, a t-distribution with r degrees of freedom, then X_{-k} will also have a t-distribution with r degrees of freedom. In this paper, we shall restrict attention to elliptic densities with the consistency property (see Liebscher, 2005, for kernel density estimation of elliptic densities without requiring the consistency property).
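The stochastic representation (2) also gives a direct way to simulate from an elliptic distribution. The following minimal sketch (in Python with NumPy, which are not used in the paper; all names and numbers are illustrative) draws X = RAS in the Gaussian case, where R^2 is chi-square with K degrees of freedom, and checks that the sample covariance of the draws is close to Σ.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n = 3, 200_000

# a full rank scaling matrix Sigma and a square root A with A A' = Sigma
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
A = np.linalg.cholesky(Sigma)

# S uniform on the unit hypersphere: normalise standard Gaussian draws
Z = rng.standard_normal((n, K))
S = Z / np.linalg.norm(Z, axis=1, keepdims=True)

# Gaussian case: R^2 is chi-square with K degrees of freedom
R = np.sqrt(rng.chisquare(df=K, size=n))

# stochastic representation X = R A S, applied row by row
X = R[:, None] * (S @ A.T)

# for the Gaussian generator the covariance of X equals Sigma
print(np.round(np.cov(X, rowvar=False), 2))
```

Other generators correspond to other laws for R, with S and A left unchanged.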

1.2 Modelling Time Series Observations

In the iid framework of the present paper, an elliptic density can still be used to construct parsimonious multivariate models for high dimensional data in a time series context. To provide some motivation for the estimation of an elliptic density, and for the sake of definiteness, we focus on a generalisation of the Constant Conditional Correlation GARCH model of Bollerslev (1990). To this end, suppose Z_i = (Z_{i1}, ..., Z_{iK}) has components each following a univariate t-GARCH model with conditional mean vector μ_i and with the same degrees of freedom. This implies that

Z_i = μ_i + diag{σ_{i1}, ..., σ_{iK}} X_i   (3)

where diag{σ_{i1}, ..., σ_{iK}} is the diagonal matrix of conditionally heteroskedastic processes {(σ_{ik})_{i>0} ; k = 1, ..., K} and (X_i)_{i>0} is a sequence of iid t-distributed random variables with mean zero, correlation matrix proportional to the scaling matrix Σ and r degrees of freedom. By the condition on (X_i)_{i>0}, it must follow that Σ is time invariant, as postulated in Bollerslev's (1990) model, and that the degrees of freedom r are also time invariant. Restrictions are also imposed depending on the specific problem. For example, in finance applications, Z_i represents the log return on a vector of assets and, for each k, μ_{ik} = 0 (using unpredictability of returns) and σ_{ik} is often assumed to depend on (Z_{sk})_{s<i} only. Here we shall allow for the case where, for each i > 0, we observe realisations from X̂_i, a noisy version of X_i, e.g. X̂_i = diag{σ̂_{i1}, ..., σ̂_{iK}}^{-1}(Z_i − μ̂_i), where μ̂_i, σ̂_{i1}, ..., σ̂_{iK} are consistent estimators of the true means and univariate scaling values obtained by using some procedure, not necessarily Bayesian; a sketch of this filtering step is given below. This clearly covers two stage estimation of the density of Z_i. We conclude stressing that the (conditional and unconditional) time invariance of the scaling matrix Σ and the generator g is assured by the conditions on (X_i)_{i>0}, which is supposed to be iid. For example, within the context of the model in (3), this seems to be overlooked by some authors (e.g. see the DCC model of Engle, 2002, where, in the conditionally Gaussian case, Σ is erroneously treated as a time varying conditional correlation matrix; see Sancetta and Nikandrova, 2008, section 2.1.1, for a more general discussion of this and related issues on modelling time series and cross-sectional dependence separately). Dropping the identically distributed condition on the data, both Σ and g can be time inhomogeneous. However, in this paper we will not allow any heterogeneity in either the scaling matrix or the elliptic generator and we restrict attention to (X_i)_{i>0} being iid. Frequentist methods that assume local stationarity often allow one to deal with time inhomogeneous parameters under suitable conditions (e.g. Dahlhaus, 1997).
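To illustrate the two stage filtering just described, the sketch below (Python; the function name prewhiten and the use of an exponentially weighted moving average in place of the univariate t-GARCH filters are assumptions made purely for illustration) removes an estimated mean from each series and divides by an estimated conditional scale, returning the surrogate observations X̂_i.

```python
import numpy as np

def prewhiten(Z, lam=0.94):
    """Two stage filter in the spirit of (3): subtract an estimate of the mean of
    each series and divide by an estimate of its conditional scale, here a simple
    EWMA recursion standing in for the univariate GARCH filters of the text."""
    Z = np.asarray(Z, dtype=float)           # observations, shape (n, K)
    mu_hat = Z.mean(axis=0)                  # crude mean estimate (zero for log returns)
    U = Z - mu_hat
    sig2 = np.empty_like(U)
    sig2[0] = U.var(axis=0)                  # initialise at the unconditional variance
    for i in range(1, len(U)):
        sig2[i] = lam * sig2[i - 1] + (1.0 - lam) * U[i - 1] ** 2
    return U / np.sqrt(sig2)                 # X_hat_i = diag{sigma_hat_i}^{-1} (Z_i - mu_hat)

# usage: X_hat = prewhiten(returns); X_hat is then treated as approximately iid elliptic data
```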

2 Estimation Problem

The goal of this paper is to give weak conditions for posterior consistency of the elliptic density f_g(u|Σ). We address this issue restricting attention to g that possesses the consistency property. Then, we have the following.

Theorem 1 Any elliptic density with the consistency property, mean zero and full rank scaling matrix Σ can be written as

F(x|Σ) = E Φ(x|Σ/V)   (4)

where Φ(x|Σ/V) is the Gaussian density with random covariance matrix Σ/V, and V is some positive random variable.

The proof is immediately deduced from Theorem 1(iv.) in Kano (1994). By Theorem 1, the problem of consistently estimating the density F(x|Σ) is equivalent to the problem of estimating the finite dimensional parameter Σ and the law of V, which we denote by P, an infinite dimensional parameter. Next we define the Bayesian semiparametric estimator and show that its posterior is consistent.
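Theorem 1 can be checked numerically in a simple case. The sketch below (Python with NumPy/SciPy, not part of the paper; scipy.stats.multivariate_t requires a recent SciPy) evaluates the scale mixture E Φ(x|Σ/V) in (4) by Monte Carlo with V = χ²_(r)/r and compares it with the closed form multivariate t density with r degrees of freedom.

```python
import numpy as np
from scipy.stats import multivariate_t

rng = np.random.default_rng(0)
K, r = 3, 5.0
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
x = np.array([0.4, -0.2, 1.1])

# Gaussian density with covariance Sigma/v, written out so that it vectorises over v
q = x @ np.linalg.solve(Sigma, x)                       # x' Sigma^{-1} x
det_Sigma = np.linalg.det(Sigma)
def phi_mix(v):
    return (v / (2 * np.pi)) ** (K / 2) / np.sqrt(det_Sigma) * np.exp(-0.5 * v * q)

# Monte Carlo evaluation of E Phi(x | Sigma/V) with V = chi2_r / r,
# which should recover the multivariate t density with r degrees of freedom
V = rng.chisquare(df=r, size=500_000) / r
print(phi_mix(V).mean())                                           # mixture estimate
print(multivariate_t.pdf(x, loc=np.zeros(K), shape=Sigma, df=r))   # exact t density
```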


2.1 Nonparametric Bayesian Estimation

By Theorem 1, the problem reduces to the estimation of a mixture of Gaussian densities. Bayesian nonparametric estimation of mixture models requires selecting a prior measure Π on the set P of distributions with support, in this case, in the positive real line. If Σ is also unknown, the prior Π is assumed to have support in Θ = C × P, where C is a suitable subset of the set of positive definite matrices (details to be given in due course). The prior Π, through the map (Σ, P) =: θ ↦ f_θ, induces a prior on the set of elliptic densities

{ f_θ(u) = ∫_0^∞ φ(u|Σ/v) P(dv) : (Σ, P) ∈ C × P = Θ }.   (5)

Then, the posterior Π_n induces a random density as follows

f_n(x) = ∫_Θ f_θ(x) Π_n(dθ)   (6)

where, for A ⊆ Θ,

Π_n(A) := ∫_A ∏_{i=1}^n f_θ(X_i) Π(dθ) / ∫_Θ ∏_{i=1}^n f_θ(X_i) Π(dθ)

and (X_i)_{i>0} are iid random variables with values in R^K and joint density in (5). By the remarks in Section 1.2, if we assume a previsible affine transform as in (3) and the parameters in the transformation are known, there is no loss of generality in assuming that the data are iid. The case when the data are subject to estimation error will be considered later; this covers the case when the parameters of the transform are estimated. A toy numerical illustration of the posterior reweighting in (6) over a finite-support prior is given below.
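The mechanics of the posterior Π_n in (6) can be illustrated with a toy finite-support prior: put equal prior mass on a few candidate generators (here multivariate t densities with different degrees of freedom and a common, known Σ) and reweight by the likelihood. This sketch (Python/SciPy; the grid of candidates and all numbers are illustrative assumptions, not the nonparametric prior studied in the paper) shows the posterior concentrating on the data-generating candidate.

```python
import numpy as np
from scipy.stats import multivariate_t

rng = np.random.default_rng(1)
K = 2
Sigma = np.array([[1.0, 0.4],
                  [0.4, 1.0]])

# toy finite-support prior: each theta is a t generator with given degrees of freedom
dfs = np.array([2.0, 5.0, 10.0, 30.0, 1e6])     # 1e6 is essentially the Gaussian limit
prior = np.full(len(dfs), 1.0 / len(dfs))

# iid data from the "true" theta (t with 5 degrees of freedom)
X = multivariate_t.rvs(loc=np.zeros(K), shape=Sigma, df=5.0, size=500, random_state=rng)

# posterior over the grid: Pi_n(theta) proportional to prior(theta) * prod_i f_theta(X_i)
loglik = np.array([multivariate_t.logpdf(X, loc=np.zeros(K), shape=Sigma, df=d).sum() for d in dfs])
logpost = np.log(prior) + loglik
w = np.exp(logpost - logpost.max())
print(dict(zip(dfs, np.round(w / w.sum(), 3))))
```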

2.2 Consistency of Posterior

We need to recall some definitions. For two arbitrary measures P and Q on some set X, with densities p and q w.r.t. some dominating measure μ, their Kullback-Leibler distance is defined as D(P, Q) = ∫_X ln(p/q) dP, while their Hellinger and total variation distances are defined as d_H(P, Q) = (∫_X |√p − √q|^2 dμ)^{1/2} and d_TV(P, Q) = ∫_X |p − q| dμ; they are just the L_2 distance of the square roots of the two densities and the L_1 distance of the densities. The following relations are known (e.g. Pollard, 2002, p. 61-62):

d_H(P, Q)^2 ≤ d_TV(P, Q) ≤ d_H(P, Q) ≤ D(P, Q)^{1/2}   (7)

showing that d_TV and d_H are topologically equivalent. The relation between total variation and Kullback-Leibler distance is often called Pinsker's inequality. We define the following sets

K_ε := {θ ∈ Θ : D(F_{θ_0}, F_θ) ≤ ε}   (8)
A_ε := {θ ∈ Θ : d_TV(F_{θ_0}, F_θ) > ε}.   (9)

The support of Π is in Θ = (C, P) and we write Π_Σ(•) = Π(•, P) and Π_P(•) = Π(C, •) for the marginals. An element P_0 ∈ P is said to be in the support of Π_P if every open neighbourhood of P_0 in the weak topology is given positive Π_P measure. We shall establish a.s. convergence to zero of the posterior over A_ε with respect to the “true” ∞-fold product measure F_{θ_0}^∞, the infinite product measure induced by the density f_{θ_0}, assumed to be the true density of the data. We shall use linear functional notation for the expectation w.r.t. Π. Hence, Π(A) means expectation of A under the prior Π, where A can be a set or some other object for which the expectation makes sense (i.e. measurable). Introduce the following conditions, which will be discussed at length after the statement of the main results. Recall that K is the dimension of X_i.

Condition 2 (X_i)_{i>0} is iid mean zero with elliptic distribution satisfying the consistency property and scaling matrix Σ_0.

Condition 3 ∫_0^∞ v^{(K/2+α)} P_0(dv) < ∞ for some α > 0 and P_0({0}) = 0.

Condition 4 Σ_0 ∈ C, where C is the class of positive definite matrices Σ such that max_{i,j} |Σ_{ij}| ≤ C for some finite absolute constant C (Σ_{ij} is the (i, j)th entry of Σ).

Condition 5 (i.) For a → ∞, a^{(K/2+α)} Π_P(P([a, ∞))) → 0 for some α > 0; (ii.) for M → ∞, M^{K+α} Π_Σ({Σ ∈ C : |Σ|^{-1} > M}) → 0 for some α > −1 (|Σ| is the determinant of Σ).

These conditions are sufficient for strong consistency of the posterior.

Theorem 6 Suppose that θ_0 is in the support of Π. Under Conditions 2, 3, 4, and 5, Π_n(A_ε) → 0, F_{θ_0}^∞-a.s.

Remark 7 The above conditions imply that if θ_0 is in the support of Π then Π(K_ε) > 0, i.e. θ_0 is also in the Kullback-Leibler support of Π.

A common example of nonparametric prior is the Dirichlet process D(ν, G), where ν ∈ (0, ∞) is a scaling parameter controlling the confidence in the mean prior probability measure G with support [0, ∞). In particular, we derive consistency using D(ν, G) as prior. It is convenient to use the following hierarchical representation

X_i | V_i, Σ ∼ Φ(u|Σ/V_i)
V_i | P ∼ P
P ∼ D(ν, G)
Σ ∼ P_Σ.

Corollary 8 For the above hierarchical representation, suppose that v^{(K/2+α)} G([v, ∞)) → 0, as v → ∞, for some α > 0, and P_Σ is inverse Wishart. Then, Π_n(A_ε) → 0, F_{θ_0}^∞-a.s. for any P_0 satisfying Condition 3 and any Σ_0 ∈ C.
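The hierarchical representation can be made concrete by simulating from the prior. The sketch below (Python; the base measure G, taken here to be a Gamma distribution on (0, ∞), the truncation level N and all hyperparameters are assumptions for illustration) draws P from a truncated Dirichlet process via stick breaking, then V_i | P, and finally X_i | V_i, Σ as a mean-zero Gaussian with covariance Σ/V_i.

```python
import numpy as np

rng = np.random.default_rng(2)
K, n, N, nu = 2, 1000, 50, 1.0
Sigma = np.array([[1.0, 0.4],
                  [0.4, 1.0]])
A = np.linalg.cholesky(Sigma)

# stick-breaking draw of P ~ D(nu, G), truncated at N atoms; G is Gamma(2, rate 2) here
beta = rng.beta(1.0, nu, size=N)
W = beta * np.concatenate(([1.0], np.cumprod(1.0 - beta)[:-1]))
W /= W.sum()                                   # renormalise after truncation
V_atoms = rng.gamma(shape=2.0, scale=0.5, size=N)

# V_i | P ~ P  and  X_i | V_i, Sigma ~ N(0, Sigma / V_i)
V = rng.choice(V_atoms, size=n, p=W)
X = (rng.standard_normal((n, K)) @ A.T) / np.sqrt(V)[:, None]
print(X.shape, np.round(np.cov(X, rowvar=False), 2))
```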

2.2.1 Remarks on Conditions 3 and 4

Conditions 3 and 4 restrict the class of elliptic distributions. In particular, Condition 3 restricts the mixing measure but does not rule out tail dependence (e.g. Joe, 1997, for definitions). To appreciate this point, note that if X is a K dimensional Student t random variable with r degrees of freedom (d.f.) and scaling matrix Σ, then X =_d N(0, Σ)/√V where V = χ²_(r)/r (χ²_(r) is chi-square with r d.f.), so that X given V is Gaussian with covariance Σ/V as in (4). Since P_0 is the law of V, then ∫_0^∞ v^{K/2+α} P_0(dv) < ∞ for any r > 0 and K (finite). Note that the condition P_0({0}) = 0 rules out degeneracies and implies that we can restrict the parameter space P to be the set of measures with positive support such that if P ∈ P then P({0}) = 0. Finally, the full rank condition on Σ is required for the representation in (4) to be true and cannot be weakened.

2.2.2 Remarks on Condition 5

Condition 5(i.) is weak. Mutatis mutandis, the literature usually assumes exponential tails for the prior of the mixing distribution; e.g. Ghosal et al. (1999). Here the mixing variable is the scaling parameter rather than the location parameter as in the Bayesian nonparametric literature. Lijoi et al. (2005) show how to weaken the requirement to a polynomial rate. It seems reasonable that if the true mixing measure P_0 needs to satisfy Condition 3, then the mean of P under the prior Π_P also satisfies Condition 3, as implied by Condition 5(i.). Under the Dirichlet process prior D(ν, G), this implies that G satisfies Condition 3. In many practical situations, Condition 5(ii.) is weak enough to avoid worrying about the tail condition when choosing the prior Π_Σ. By Markov's inequality, Condition 5(ii.) requires

Π_Σ({Σ ∈ C : |Σ|^{-1} > M}) ≤ Π_Σ(|Σ|^{-(K+α)}) / M^{(K+α)} < ∞

for some α > −1 (recall we are using linear functional notation). The inverse Wishart distribution is of particular interest, as it is a conjugate prior. When Π_Σ is inverse Wishart with inverse scaling matrix Ψ and degrees of freedom m, by direct computation of the expectation of |Σ|^{-(K+α)}, using the trick that a density integrates to one, we have

Π_Σ(|Σ|^{-(K+α)}) = 2^{(K+α)K} Γ_K(m/2 + (K + α)) / (|Ψ|^{(K+α)} Γ_K(m/2))

where Γ_K(·) is the multivariate gamma function (e.g. James, 1964, for its definition). Hence, the expectation is finite and Condition 5(ii.) does not constrain the hyperparameters of the prior.
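As a quick numerical sanity check of the last display, one can draw from an inverse Wishart prior and verify that the Monte Carlo average of |Σ|^{-(K+α)} settles to a finite value. The sketch below (Python/SciPy; the choices m = K + 2 and Ψ = I_K are assumptions made for illustration) does exactly this.

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(3)
K, alpha = 3, 0.0                 # any alpha > -1 is allowed in Condition 5(ii.)
m, Psi = K + 2, np.eye(K)         # assumed inverse Wishart hyperparameters

# Monte Carlo estimate of Pi_Sigma(|Sigma|^{-(K+alpha)})
draws = invwishart.rvs(df=m, scale=Psi, size=100_000, random_state=rng)
vals = np.linalg.det(draws) ** (-(K + alpha))
for n in (1_000, 10_000, 100_000):
    print(n, round(float(vals[:n].mean()), 3))   # stabilises, so the expectation is finite
```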

2.3 Consistency when Data is Subject to Error

It was assumed that the data are iid. However, in applied problems, it is often the case that data can be made approximately iid after some preliminary filtering (e.g. estimating the parameters in a scale location model derived from (3)). Then, it is natural to ask if the results of this paper continue to hold. The answer is yes, clearly under suitable conditions. To set the scene for the affirmative solution of the problem, let (X_i)_{i>0} be the unobservable sequence of iid elliptic random variables and (X̂_i)_{i>0} be the observable surrogate sequence subject to error. By error we mean that X̂_i = X_i (a.s.) does not necessarily hold for all i > 0. It is assumed that both sequences are defined on the same probability space.

Define the posterior based on (X̂_i)_{i>0} as

Π̂_n(A) := ∫_A ∏_{i=1}^n f_θ(X̂_i) Π(dθ) / ∫_Θ ∏_{i=1}^n f_θ(X̂_i) Π(dθ).

Given that both (X_i)_{i>0} and (X̂_i)_{i>0} are defined on the same probability space, denote this probability space by (Ω, P). The following condition, together with the ones used previously, is sufficient for strong consistency of Π̂_n(A).

Condition 9 For any ε > 0,

P({ω ∈ Ω : lim_{n→∞} sup_{i≥n} |X_i(ω) − X̂_i(ω)| > ε}) = 0.

Condition 10 There is an α > 0 such that, for Π-almost all θ,

sup_{i>0} E |ln[f_θ(X̂_i(ω)) / f_θ(X_i(ω))]|^{1+α} < ∞.

Theorem 11 Suppose that θ_0 is in the support of Π. Under Conditions 2, 3, 4, 5, 9 and 10, Π̂_n(A_ε) → 0, P-a.s.

2.3.1 Checking Conditions 9 and 10

Suppose a parametric model in the class (3) is used and we estimate its parameters using past observations. Then, by the continuous mapping theorem, Condition 9 is satisfied if the unknown parameter estimates converge a.s. to the true values. Finally, Condition 10 can be checked using the following.

Lemma 12 Suppose that for any k = 1, ..., K,

sup_{i>0} E[ |X̂_{ik}(ω)|^{(2+α)} + |X_{ik}(ω)|^{(2+α)} ] < ∞

for some α > 0. Then, under Condition 3, Condition 10 is satisfied.

3 Further Remarks

One of the fastest growing areas of research in Bayesian nonparametrics is concerned with computational issues. In particular, the method of Escobar (1994) opened the way for Markov Chain Monte Carlo (MCMC) estimation using Dirichlet processes (see also Walker, 2005, and references therein for a review). A recent appealing approach is to use Sethuraman's constructive definition of a Dirichlet process:

P(dv) = Σ_{s>0} W_s δ_{V_s}(dv)

where (W_s)_{s>0} are random variables in the infinite unit simplex derived by the stick breaking construction (e.g. Sethuraman, 1994), (V_s)_{s>0} is a sequence of iid random variables from the measure G of the Dirichlet process D(ν, G), and δ_V(•) is the point measure at V. It is common to define

P_N(dv) = Σ_{s=1}^N W_s δ_{V_s}(dv),   (10)

with N finite (e.g. N = 50), and approximate the posterior using P_N as prior rather than P. The error incurred in this approximation is O(n exp{−N/ν}) (Ishwaran and James, 2001, 2002, for details); a sketch of this truncation is given at the end of this section. Once N is finite, the model is put in hierarchical form and estimated by Gibbs sampling (Ishwaran and James, 2001, for a list of methods). In particular, the block Gibbs estimator of Ishwaran and James (2001) is very efficient as long as the prior Π_Σ is conjugate to a Gaussian likelihood, which implies that Π_Σ is inverse Wishart. One of the steps in the block Gibbs estimator requires sampling from an inverse Wishart distribution. While this is simple, the procedure is computationally intensive. In this case, approximations can be used to speed up calculations. The scaling matrix can be estimated separately and the estimate used in place of the true scaling matrix Σ. Indeed, under the identification condition EV^{-1} = 1 (V as in Theorem 1), Σ is just the covariance matrix, which follows directly from Theorem 1 and Fubini's Theorem. A two step approximation uses an estimate of Σ. In high dimensions, this estimate can be based on frequentist methods using either shrinkage or factor methods (e.g. Ledoit and Wolf, 2004, Sancetta, 2008a, Fan et al., 2008, and references therein). Alternatively, a Bayesian estimator that uses a Gaussian likelihood rather than a mixture can also be employed, leading to consistent estimation of Σ (e.g. Kleijn and van der Vaart, 2006; see also Theorem 2 in Sancetta, 2008b, among others). The consistent estimator can then be used in place of Σ. When Σ is the covariance matrix, then V should satisfy EV^{-1} = 1, as just mentioned above. Unfortunately, it is difficult to impose this constraint. Hence the estimation can be implemented without the constraint. This disadvantage is offset by the gain in simplicity. For the sake of concreteness, consider the block Gibbs sampling procedure of Ishwaran and James (2001). Then, using a plug-in estimator for Σ avoids the last step in their procedure (Ishwaran and James, 2001, step (d), p. 168). In this case, publicly available software can be used directly without much programming effort (e.g. WinBUGS, R). Finally, while the conditions in this paper are very weak, a major obstacle in practice is the assumption of time homogeneity. In some applications it can be interesting to allow for unknown time inhomogeneous parameters. For weak versions of predictive density consistency (Barron, 1986, section 4, for a discussion; see also Barron, 1998), it is possible to modify the posterior to account for time inhomogeneous finite dimensional parameters (e.g. Sancetta, 2008b, using the results of Bousquet and Warmuth, 2002). It would be interesting to investigate notions of posterior consistency in this more general framework. This seems particularly important in financial applications.
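A minimal sketch of the truncation in (10) is given below (Python; the helper names truncation_level and truncated_dp and the Gamma base measure are illustrative assumptions, not the procedure of Ishwaran and James, 2001). The first function picks the smallest N for which the quoted O(n exp{−N/ν}) truncation error falls below a tolerance; the second draws the weights W_s by stick breaking and the atoms V_s from the base measure.

```python
import numpy as np

def truncation_level(n, nu, tol=1e-6):
    """Smallest N with n * exp(-N / nu) <= tol, based on the O(n exp{-N/nu}) error bound."""
    return int(np.ceil(nu * np.log(n / tol)))

def truncated_dp(nu, N, base_sampler, rng):
    """Draw the finite stick-breaking measure P_N in (10): weights W_s and atoms V_s."""
    beta = rng.beta(1.0, nu, size=N)
    W = beta * np.concatenate(([1.0], np.cumprod(1.0 - beta)[:-1]))
    return W / W.sum(), base_sampler(N)

rng = np.random.default_rng(4)
print(truncation_level(n=1000, nu=1.0))        # about 21 atoms already suffice here
W, V = truncated_dp(nu=1.0, N=50, base_sampler=lambda N: rng.gamma(2.0, 0.5, size=N), rng=rng)
print(np.round(W[:5], 3), np.round(V[:5], 3))
```

With Σ replaced by a plug-in estimate (e.g. the sample covariance, or a shrinkage estimator in high dimensions), only the finite mixture over (W_s, V_s) remains to be updated, which is the simplification discussed above.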

4 Proofs

We find it convenient to collect here the notation used in the lemmata and the proofs and to refer to it when required. The reader can just browse back to it whenever unfamiliar or undefined notation is found in the proofs.

Notation 13 P is the set of measures on the positive reals such that if P ∈ P, then P({0}) = 0; C is as in Condition 4; ≲ (≳) is inequality up to a finite absolute constant, i.e. the left hand side is proportionally smaller (larger) than the right hand side, while ≍ means that the two sides are proportional to each other; Φ is the standard Gaussian distribution and φ its density; if the argument of φ is a vector, then φ will denote the standard multivariate Gaussian density; for a positive definite matrix Σ, write Σ^{1/2} for the matrix such that (Σ^{1/2})(Σ^{1/2})' = Σ, |Σ| stands for its determinant, |Σ|_∞ := max_{1≤i,j≤K} |Σ_{ij}|, Λ(Σ) is the matrix of eigenvalues of Σ and λ_k(Σ) is its kth eigenvalue. For a > 0, δ ∈ (0, 1) and constants m < M, define the following classes of mixtures of normals

F := { ∫_0^∞ v^{K/2} φ(v^{1/2} Σ^{-1/2} x) P(dv) : P ∈ P, Σ ∈ C }
F_P := { ∫_0^∞ v^{K/2} φ(v^{1/2} Σ^{-1/2} x) P(dv) : Σ ∈ C }
F^Σ_{a,b} := { ∫_0^∞ v^{K/2} φ(v^{1/2} Σ^{-1/2} x) P(dv) : P([a, b]) = 1 }
F^Σ_{a,b,δ} := { ∫_0^∞ v^{K/2} φ(v^{1/2} Σ^{-1/2} x) P(dv) : P([a, b]) > 1 − δ }
F^M_P := { ∫_0^∞ v^{K/2} φ(v^{1/2} Σ^{-1/2} x) P(dv) : |Σ|^{-1} ≤ M }
F^{m,M}_P := { ∫_0^∞ v^{K/2} φ(v^{1/2} Σ^{-1/2} x) P(dv) : m < |Σ|^{-1} ≤ M }
F^{m,M}_{a,b,δ} := { ∫_0^∞ v^{K/2} φ(v^{1/2} Σ^{-1/2} x) P(dv) : P([a, b]) > 1 − δ, m < |Σ|^{-1} ≤ M }.

For any set of functions G, and δ > 0, N(δ, G) is the δ-covering number of G under the L_1 norm (van der Vaart and Wellner, 2000, for details). The lemmata are numbered sequentially and proofs may refer to technical results that are only stated subsequently, in order not to interrupt the flow of the main steps of each proof.

4.1 Proof of Theorem 6

Proof of Theorem 6. Let A_ε ⊂ Θ be as in (9), i.e. if θ ∈ A_ε then d_TV(F_{θ_0}, F_θ) > ε. According to a slight extension of Theorem 4 in Walker (2004) (see the proof of Theorem 1 in Lijoi et al., 2005), it is enough to show that: (1.) Π(K_ε) > 0 and (2.) for any δ > 0 there is a countable partition (A_j)_{j>0} of A_ε ⊂ Θ such that A_j := {θ ∈ Θ : d_TV(F_{θ_j}, F_θ) < δ} (and d_TV(F_{θ_0}, F_{θ_j}) > ε) for any j, and Σ_j Π^β(A_j) < ∞ for some β ∈ (0, 1). This is accomplished using Lemmata 14 and 17, respectively.

4.1.1 Statement and Proof of Lemma 14

At some stage in the proof we recall the following fact: an open set centered at P_0 in the weak topology of P can be metrized by the Bounded Lipschitz metric (Dudley metric)

{ P ∈ P : sup_{f ∈ BL_1} |∫ f d(P − P_0)| ≤ ε },   (11)

where BL_1 is the class of functions whose elements f satisfy ‖f‖_BL = ‖f‖_L + ‖f‖_∞ ≤ 1, ‖f‖_L being the Lipschitz constant of f and ‖f‖_∞ its L_∞ norm (Dudley, 2002, ch. 11.2 for further details).

Lemma 14 Under Conditions 3, 4 and 5(i.), if θ_0 := (P_0, Σ_0) is in the support of Π, then Π(K_ε) > 0 (K_ε as in (8)).

Proof. Define

f_0(x) := ∫_0^∞ v^{K/2} φ(v^{1/2} Σ_0^{-1/2} x) P_0(dv),
f_P(x) := ∫_0^∞ v^{K/2} φ(v^{1/2} Σ_0^{-1/2} x) P(dv),
f_{PΣ}(x) := ∫_0^∞ v^{K/2} φ(v^{1/2} Σ^{-1/2} x) P(dv).

Then,

D(F_{θ_0}, F_θ) = ∫_{R^K} [f_0(x)/|Σ_0|^{1/2}] ln[ f_0(x) |Σ|^{1/2} / (f_{PΣ}(x) |Σ_0|^{1/2}) ] dx
= ∫_{R^K} [f_0(x)/|Σ_0|^{1/2}] ln[f_0(x)/f_P(x)] dx + ∫_{R^K} [f_0(x)/|Σ_0|^{1/2}] ln[f_P(x)/f_{PΣ}(x)] dx + ln[|Σ|^{1/2}/|Σ_0|^{1/2}]
≤ sup_{x∈R^K} |f_0(x) − f_P(x)| + ∫_{R^K} [f_0(x)/|Σ_0|^{1/2}] ln[f_P(x)/f_{PΣ}(x)] dx + ln[|Σ|^{1/2}/|Σ_0|^{1/2}]   [by Lemma 15]
= I + II + III

and we bound each term separately.

Control over I. Define

Γ_0 := { γ > 0 : ∫_γ^∞ v^{K/2} P_0(dv) < ε },   Γ_P := { γ > 0 : ∫_γ^∞ v^{K/2} P(dv) < ε }.

Condition 3 assures that Γ_0 is not empty, and, by Lemma 16, Γ_P is also not empty Π-a.s., but for convenience we will suppress the a.s. qualifier throughout as, by the statement of the lemma, we are only interested in Π-non-null sets. Hence, define

a_P = inf{ γ > 0 : γ ∈ Γ_0 ∩ Γ_P }.   (12)

Then,

I = sup_{x∈R^K} | [∫_0^{a_P} + ∫_{a_P}^∞] v^{K/2} φ(v^{1/2} Σ_0^{-1/2} x) (P − P_0)(dv) |
≤ sup_{x∈R^K} | ∫_0^{a_P} v^{K/2} φ(v^{1/2} Σ_0^{-1/2} x) (P − P_0)(dv) | + ∫_{a_P}^∞ v^{K/2} (P + P_0)(dv)   [because φ < 1]
≤ sup_{x∈R^K} | ∫_0^{a_P} v^{K/2} φ(v^{1/2} Σ_0^{-1/2} x) (P − P_0)(dv) | + 2ε

by definition of a_P. We shall deal with the remaining term considering, at first, an arbitrary positive scalar a in place of a_P. Then note that the family of functions

{ v ↦ v^{K/2} φ(v^{1/2} Σ_0^{-1/2} x) 1{v ∈ [0, a]} : x ∈ R^K }   (13)

is bounded by some constant proportional to a^{K/2} and Lipschitz with Lipschitz constant less than or equal to

sup_{v∈[0,a], x∈R^K} | (d/dv)[ v^{K/2} φ(v^{1/2} Σ_0^{-1/2} x) ] |
= sup_{v∈[0,a], x∈R^K} [v^{(K−2)/2}/2] φ(v^{1/2} Σ_0^{-1/2} x) (v x'Σ_0^{-1}x + K)   [differentiating and rearranging]
≲ a^{(K−2)/2}

because φ(v^{1/2} Σ_0^{-1/2} x)(v x'Σ_0^{-1}x + K) is bounded for any v ∈ [0, a] and x ∈ R^K as long as K ≥ 2. Then, each element, say f, in (13) satisfies ‖f‖_{BL_1} ≲ a^{(K−2)/2} + a^{K/2} ≲ a^{K/2}. This implies that the family of functions in (13) is equicontinuous. Define weak neighbours of P_0 of diameter δ in terms of the Dudley metric in (11). Choosing δ ≲ ε a^{−K/2} assures that we can find a P in the support of Π_P such that

sup_{x∈R^K} | ∫_0^∞ v^{K/2} φ(v^{1/2} Σ_0^{-1/2} x) 1{v ∈ (0, a]} (P − P_0)(dv) | ≤ ε.

This result holds for any a ∈ (0, ∞) because a is arbitrary. Hence, by Egoroff's Theorem (e.g. Theorem 7.5.1 in Dudley, 2000),

sup_{a∈A} sup_{x∈R^K} | ∫_0^∞ v^{K/2} φ(v^{1/2} Σ_0^{-1/2} x) 1{v ∈ (0, a]} (P − P_0)(dv) | ≲ ε,

for some set A so large that for any ε > 0, P(A^c) < ε, A^c being the complement of A. Given that P_0 is fixed, choosing ε small enough, we also have P_0(A^c) < ε (by an application of the Portmanteau Theorem, e.g. van der Vaart and Wellner, 2000). Hence, by definition of a_P it must be the case that a_P ∈ A, implying

sup_{x∈R^K} | ∫_0^∞ v^{K/2} φ(v^{1/2} Σ_0^{-1/2} x) 1{v ∈ (0, a_P]} (P − P_0)(dv) | ≲ ε,

by the previous display, so that I ≲ ε.

Control over II. Let Σ_{ij} be the (i, j) entry of Σ and similarly for Σ_{0ij}. Since Σ_0 is in the support of Π_Σ, we can choose Σ_{ij} = (1 + ε) Σ_{0ij}, ε > 0. Then, Σ^{-1} = (1 + ε)^{-1} Σ_0^{-1}, implying

Σ_0^{-1} − Σ^{-1} = [ε/(1 + ε)] Σ_0^{-1}.   (14)

Hence,

ln[f_P(x)/f_{PΣ}(x)] ≤ sup_{v≥0} ln[ φ(v^{1/2} Σ_0^{-1/2} x) / φ(v^{1/2} Σ^{-1/2} x) ]
= sup_{v≥0} −(v/2) x'(Σ_0^{-1} − Σ^{-1})x
= sup_{v≥0} −[vε/(2(1 + ε))] x'Σ_0^{-1}x   [by (14)]
= 0

because ε > 0 and x'Σ_0^{-1}x ≥ 0, Σ_0^{-1} being positive definite. This implies that II ≤ 0.

Control over III. By the choice of Σ as in Control over II, since each term of Σ is a (1 + ε) multiple of Σ_0, we have |Σ| = (1 + ε)^K |Σ_0|, implying III ≲ ε. Putting together I, II and III, we deduce that the set K_ε has positive probability under Π.

Lemma 15 Suppose F_0 and F are two distributions with densities f_0/c_0 and f/c w.r.t. some dominating measure μ with support X, and c_0, c are constants of integration. Then,

D(F_0, F) ≤ [(c_0 + c)/(2c_0)] sup_{x∈X} |f_0(x) − f(x)| + ln(c/c_0).

Proof. By Taylor expansion with integral remainder,

ln(x/y) = ∫_0^1 [(x − y)/(x + τ(y − x))] dτ.   (15)

Then, by (15) and the definition of the K-L distance,

D(F_0, F) = ∫_{R^K} [f_0(x)/c_0] [ ln(f_0(x)/f(x)) + ln(c/c_0) ] dx
≤ [ ∫_{R^K} (f_0(x)/c_0) ∫_0^1 [((1 − τ)f_0(x) + τ f(x))/f_0(x)] dτ dx ] sup_{x∈X} |f_0(x) − f(x)| + ln(c/c_0)   [by (15) and a simple upper bound]
= ∫_0^1 [((1 − τ)c_0 + τ c)/c_0] dτ · sup_{x∈X} |f_0(x) − f(x)| + ln(c/c_0),

using Fubini's Theorem because the integrals are finite. Computing the integral w.r.t. τ gives the result.

*∞ Proof. It is sufficient to show that 0 v K/2 P (dv) < ∞, ΠP -a.s. because integrability *∞ would imply that eventually for γ large enough γ v K/2 P (dv) < %, ΠP -a.s.. By Condition 5, ΠP (P ([v, ∞))) ! v −K/2−α for some α > 0. Then, 0 /( ∞ 0 / ( ∞ K v K/2−1 P ([v, ∞)) dv ΠP v K/2 P (dv) = ΠP 2 0 0 [integrating by parts, e.g. Petrov (1995, Lemma 2.4)] ( K ∞ K/2−1 −K/2−α ≤ v v dv 2 0 [by Fubini’s Theorem and Condition 5 (i.) ] < ∞. Since for any random variable Y , EY < ∞ implies that Y < ∞ a.s., then the result follows. 4.1.2

4.1.2 Statement and Proof of Lemma 17

Lemma 17 Under Conditions 4 and 5, for any δ > 0 there exists a δ-cover (A_j)_{j>0} of A_ε ⊂ Θ such that, for some β ∈ (0, 1),

Σ_{j>0} Π^β(A_j) < ∞.

The proof of Lemma 17 is long and requires several intermediate results, which are derived sequentially next. We shall make heavy use of Notation 13. The reader is recommended to consult Notation 13 while reading the statement of each lemma.

Lemma 18 For any δ ∈ (0, 1), there exists a finite absolute constant C_{K,δ}, depending on δ and K only, such that N(δ, F^Σ_{a,b,δ}) ≤ C_{K,δ} (b/a).

Proof. For δ ∈ (0, 1) define a sequence (r_i)_{i≥0} such that r_i = a exp{iδ/K}. From this sequence define a countable partition (A_i)_{i≥1} of (a, b) where A_i = {v : r_i < v ≤ r_{i+1}}. For arbitrary but fixed i ≥ 1, when v ∈ A_i, by Taylor expansion with integral remainder, setting r(τ) = r_i + τ(v − r_i) with τ ∈ [0, 1], we have the first equality in the next display

I := |Σ|^{-1/2} ∫_{R^K} | v^{K/2} φ(v^{1/2} Σ^{-1/2} x) − r_i^{K/2} φ(r_i^{1/2} Σ^{-1/2} x) | dx
= |Σ|^{-1/2} ∫_{R^K} | ∫_0^1 r(τ)^{K/2} φ(r(τ)^{1/2} Σ^{-1/2} x) [ K/(2r(τ)) − x'Σ^{-1}x/2 ] (v − r_i) dτ | dx
≤ |Σ|^{-1/2} (v − r_i) ∫_{R^K} ∫_0^1 r(τ)^{K/2} φ(r(τ)^{1/2} Σ^{-1/2} x) [ K/(2r(τ)) + x'Σ^{-1}x/2 ] dτ dx   [because Σ is positive definite and r_i ≤ v ∈ A_i]
= |Σ|^{-1/2} (v − r_i) ∫_0^1 ∫_{R^K} r(τ)^{K/2} φ(r(τ)^{1/2} Σ^{-1/2} x) [ K/(2r(τ)) + x'Σ^{-1}x/2 ] dx dτ   [by Fubini's Theorem because the double integral is finite]
= (v − r_i) ∫_0^1 [K/r(τ)] dτ   [performing the integration with the change of variable r(τ)^{1/2} Σ^{-1/2} x ↦ y]
= K ln(v/r_i) ≤ δ,

by direct integration and algebraic simplification. Hence, when v ∈ A_i with i ≥ 1, we have I ≤ δ. Using the same arguments as in Ghosal et al. (1999, proof of Lemma 1, p. 156) together with the estimate I (just put this estimate in the last display of p. 156 in Ghosal et al., 1999), we deduce that we can find a 2δ-cover of F^Σ_{a,b} consisting of discrete probabilities with atoms at (r_i)_{i∈{0,...,I}} where I is the smallest integer greater than or equal to min{i > 0 : r_i > b} and we can choose

I = 1 + ⌊(K/δ) ln(b/a)⌋,

⌊x⌋ being the integer part of x. Mutatis mutandis, by Ghosal et al. (1999, p. 157; see also Barron et al., 1999), for the subset P_I of discrete probabilities with I atoms such that P_I ⊂ P, we have N(δ, P_I) ≤ exp{[1 + ln((1 + δ)/δ)] I}. By these remarks, N(2δ, F^Σ_{a,b}) ≤ N(δ, P_I) ≤ C_{K,δ} (b/a) for some C_{K,δ} < ∞ depending only on K and δ > 0. Mutatis mutandis, following the proof of Lemma 2 in Ghosal et al. (1999), we also deduce that N(3δ, F^Σ_{a,b,δ}) ≤ N(δ, F^Σ_{a,b}) and the lemma is proved because δ ∈ (0, 1) is arbitrary. To use the argument in the proof of Lemma 2 of Ghosal et al. (1999) we have relied on the following:

∫_{R^K} | ∫_0^∞ v^{K/2} φ(v^{1/2} Σ^{-1/2} x) [ P(dv) − 1{[a, ∞)}(v) P(dv)/P([a, ∞)) ] | dx / |Σ|^{1/2}
≤ ∫_{R^K} ∫_0^∞ v^{K/2} φ(v^{1/2} Σ^{-1/2} x) | P(dv) − 1{[a, ∞)}(v) P(dv)/P([a, ∞)) | dx / |Σ|^{1/2}
≤ 2 P((0, a]) ≲ δ

by Condition 3 for appropriate choice of a > 0.

The next two lemmata will be used in the proof of Lemma 21 and the reader may wish to skip them and look at them while reading the proof of Lemma 21.

Lemma 19 Fix an arbitrary δ ∈ (0, 1). Let A ⊂ C be a set such that for any Σ ∈ A there are constants s_k^{(1)}, s_k^{(2)} satisfying

[1 − (δ/4)]^{2/K} ≤ s_k^{(1)}/s_k^{(2)}   (16)

such that s_k^{(1)} ≤ λ_k(Σ^{-1}) ≤ s_k^{(2)}, k = 1, ..., K. Moreover, for Σ_1, Σ_2 ∈ A, their matrices of orthonormal eigenvectors D_1, D_2 are assumed to satisfy |D_1 − D_2|_∞ ≤ δ^2/(4C^2 K^4 max_k s_k^{(2)}) (C is as in Condition 4). Then, for any f_1, f_2 ∈ {F_P ∩ A} = {F_P : Σ ∈ A},

∫_{R^K} |f_1(x) − f_2(x)| dx ≤ δ.

Proof. For any two matrices Σ_1, Σ_3 ∈ A, let Σ_2 ∈ A be a matrix similar to Σ_1 and with the same eigenvectors as Σ_3, i.e.

Σ_1^{-1} = D_1 Λ_1 D_1',   Σ_2^{-1} = D_2 Λ_1 D_2',   Σ_3^{-1} = D_2 Λ_2 D_2'   (17)

where D_1 and D_2 are matrices of orthonormal eigenvectors. For r = 1, 2, 3, define

f_r(x|v) = |Σ_r|^{-1/2} v^{K/2} φ(v^{1/2} Σ_r^{-1/2} x)

where Σ_1, Σ_2, Σ_3 are as in (17). We claim that for any v ≥ 0,

∫_{R^K} |f_1(x|v) − f_3(x|v)| dx ≤ δ.   (18)

Assuming for the moment that (18) holds, then for any Σ_1, Σ_3 ∈ A, we deduce the following bound using the triangle inequality and Fubini's Theorem, which can be applied because the double integral below is finite,

∫_{R^K} | ∫_0^∞ f_1(x|v) P(dv) − ∫_0^∞ f_3(x|v) P(dv) | dx ≤ ∫_0^∞ ∫_{R^K} |f_1(x|v) − f_3(x|v)| dx P(dv)   [by Jensen's inequality and Fubini's Theorem]
≤ δ ∫_0^∞ P(dv)   [by (18)]
= δ

and the lemma is proved. Hence, it remains to show that (18) holds. If v = 0, (18) is obvious, as the densities are both zero. Hence, we assume v > 0. Then, by the triangle inequality,

∫_{R^K} |f_1(x|v) − f_3(x|v)| dx ≤ ∫_{R^K} |f_1(x|v) − f_2(x|v)| dx + ∫_{R^K} |f_2(x|v) − f_3(x|v)| dx = I + II.

We shall control each term separately.

Control over I. The eigenvectors are orthonormal, hence their entries are bounded by one in absolute value. By this remark derive the following inequality,

|Σ_1^{-1} − Σ_2^{-1}|_∞ = |D_1 Λ_1 (D_1 − D_2)' + (D_1 − D_2) Λ_1 D_2'|_∞
≤ 2CK^2 max_{1≤k≤K} λ_k(Λ_1) |D_1 − D_2|_∞   [by the previous remark and definition of C]
≤ 2CK^2 max_{1≤k≤K} s_k^{(2)} |D_1 − D_2|_∞   [by the eigenvalues for matrices in A]
≤ δ^2/(2CK^2)   (19)

by the eigenvectors for matrices in A. Then,

I^2 ≤ ∫_{R^K} |Σ_1|^{-1/2} v^{K/2} φ(v^{1/2} Σ_1^{-1/2} x) ln[ |Σ_1|^{-1/2} v^{K/2} φ(v^{1/2} Σ_1^{-1/2} x) / ( |Σ_2|^{-1/2} v^{K/2} φ(v^{1/2} Σ_2^{-1/2} x) ) ] dx   [by (7)]
= ∫_{R^K} |Σ_1|^{-1/2} v^{K/2} φ(v^{1/2} Σ_1^{-1/2} x) [ −(v/2) x'(Σ_1^{-1} − Σ_2^{-1})x ] dx + ln[ |Σ_1|^{-1/2} / |Σ_2|^{-1/2} ]
= ∫_{R^K} φ(y) [ −(1/2) (Σ_1^{1/2} y)'(Σ_1^{-1} − Σ_2^{-1})(Σ_1^{1/2} y) ] dy   [by the change of variables v^{1/2} Σ_1^{-1/2} x ↦ y and noting that Σ_1, Σ_2 have the same eigenvalues]
= −(1/2) trace( (Σ_1^{-1} − Σ_2^{-1}) Σ_1 )   [computing the integral]
≤ (K^2/2) C |Σ_1^{-1} − Σ_2^{-1}|_∞   [by Condition 4]
≤ δ^2/4

by (19), so that I ≤ δ/2.

Control over II. Define S_1 as the matrix with eigenvalues equal to {s_k^{(1)}, k = 1, ..., K} and matrix of eigenvectors equal to D_2: S_1 = D_2 Λ(S_1) D_2'. Define S_2 similarly, but with eigenvalues {s_k^{(2)}, k = 1, ..., K}. For Σ_2, Σ_3 ∈ A as in (17), (S_2 − Σ_2^{-1}) and (S_2 − Σ_3^{-1}) are positive definite. Moreover, |S_1| ≤ |Σ_2|^{-1} and |S_1| ≤ |Σ_3|^{-1}. These two remarks imply

|Σ_3|^{-1/2} v^{K/2} φ(v^{1/2} Σ_3^{-1/2} x) ≥ |S_1|^{1/2} v^{K/2} φ(v^{1/2} S_2^{1/2} x)

and

|Σ_2|^{-1/2} v^{K/2} φ(v^{1/2} Σ_2^{-1/2} x) ≥ |S_1|^{1/2} v^{K/2} φ(v^{1/2} S_2^{1/2} x).

By the last displays and the triangle inequality,

II ≤ ∫_{R^K} | f_2(x|v) − |S_1|^{1/2} v^{K/2} φ(v^{1/2} S_2^{1/2} x) | dx + ∫_{R^K} | f_3(x|v) − |S_1|^{1/2} v^{K/2} φ(v^{1/2} S_2^{1/2} x) | dx
= ∫_{R^K} [ f_2(x|v) − |S_1|^{1/2} v^{K/2} φ(v^{1/2} S_2^{1/2} x) ] dx + ∫_{R^K} [ f_3(x|v) − |S_1|^{1/2} v^{K/2} φ(v^{1/2} S_2^{1/2} x) ] dx
= 2 [ 1 − |S_1|^{1/2}/|S_2|^{1/2} ]
≤ δ/2   (20)

because the determinant is the product of the eigenvalues, which are chosen to satisfy (16) in the statement of the lemma. Putting together I and II proves (18) as required.

Lemma 20 Under Condition 4, we have: min_k λ_k(Σ^{-1}) ≳ 1/K, |Σ|^{-1} ≳ K^{-K}, and if |Σ|^{-1} ≲ M, then max_k λ_k(Σ^{-1}) ≲ M [min_k λ_k(Σ^{-1})]^{-(K-1)}, implying max_k λ_k(Σ^{-1}) ≲ M K^{K-1}.

Proof. By Condition 4, the sum of the eigenvalues of a K dimensional full rank matrix in C is proportionally less than K and these eigenvalues must all be positive. Hence, max_k λ_k(Σ) ≲ K, implying that min_k λ_k(Σ^{-1}) ≳ K^{-1}. Since the determinant of a matrix is the product of its eigenvalues, we deduce that |Σ|^{-1} ≳ ∏_{k=1}^K K^{-1}. Finally, if |Σ|^{-1} ≲ M, since min_k λ_k(Σ^{-1}) ≳ K^{-1}, deduce max_k λ_k(Σ^{-1}) ≲ M [min_k λ_k(Σ^{-1})]^{-(K-1)}, implying max_k λ_k(Σ^{-1}) ≲ M K^{K-1}.

Lemma 21 Let (M_i)_{i≥0} be any strictly increasing sequence. Then, we can choose M_0 ≍ K^{-K} such that F_P ⊆ ∪_{i>0} F_P^{M_{i-1},M_i} and

N(δ, F_P^{M_{i-1},M_i}) ≲ ln^K(M_i) M_i^{K-2}

for any P ∈ P.

M ,M FP i−1 i

C

M

i>0

FP i−1

,Mi

and the

first part is proved. To show the cardinality of a δ cover of we shall eventually ! " use Lemma 19. To this end, for arbitrary, but fixed δ ∈ (0, 1) define the sequences sj(k) j(k)≥0 ! " −2j(k)/K where s0 := mink λk Σ−1 " 1/K (by Lemma 20), sj(k) := s0 [1 − (δ/4)] , k = 1, ..., K. Let # ! " $ Bj := Σ ∈ C : sj(k) ≤ λk Σ−1 < sj(k)+1 , k = 1, ..., K . (21) Note that (Bj )j∈NK is a countable cover of C because, by definition and Lemma 20, the % & B smallest eigenvalue of Σ−1 is s0 " 1/K. Let Er j be a countable cover for Bj r>0

18

B

j (j ∈ NK ) such that if Σ1 , Σ matrices of eigenvectors D1 , D2 ! 2 ∈2 E4r , their orthonormal " 2 satisfy |D1 − D2 |∞ ≤ δ / 4C K maxk sj(k)+1 =: δ " .% We &claim that there is a finite CNj Bj B Nj = N (Bj ) such that Bj = r=1 Er , implying that Er j is a finite cover

r∈{1,...,Nj }

of Bj . To find this Nj , consider the set of orthonormal matrices # $ E := D ∈ RK×K : DD" = D" D = IK

where IK is the K dimensional identity matrix. For each given matrix of eigenvalues, the set of possible eigenvectors of a matrix in C is a strict subset of E. To see this, recall Condition 4. Moreover, given any K dimensional vector e, there exist only 2K−1 possible sets of orthonormal vectors orthogonal to e. This is clear for K = 2 and for K > 2 it follows by induction. Hence, the cardinality of a δ " cover for the set of unit vectors in RK is proportional to the cardinality of a δ " cover for E. Given that a unit vector in RK is a point on the unit dimensional sphere in RK , it is sufficient to find the cardinality of a δ " cover for the the surface of the unit sphere in RK , proportionally equivalent to the K−1 cardinality of a δ " cover for [−1, 1] . By these remarks it follows that " K−1

Nj + (1/δ ) % & B when Er j

r>0

+

/

max sj(k)+1 /δ k

2

0(K−1)

(22)

is as defined above; for ease of notation we have suppressed dependence

of Nj on C and K, hence on s0 because finite and not required for the final result. Proved that Nj < ∞, we then claim that there is a ' finite integer J depending on )M . % & B −1 only, such that Σ ∈ C : |Σ| ≤ M can be covered by Er j : j ≤ (J, ..., J) for r≤Nj

some positive integer J (j ≤ (J, ..., J) elementwise). This integer J must be such ! is meant " that, for j (k) ≥ J, sj(k) ≥ maxk λk Σ−1 , for any k = 1, ..., K. Moreover, by Lemma 20, ! " −1 if |Σ| ≤ M , we can choose J such that for, j (k) ≥ J, sj(k) " M/sK−1 " maxk λk Σ−1 , 0 for any k = 1, ..., K. Hence, to find the smallest J satisfying the previous inequalities, take logs on both sides of sJ " M/sK−1 and solve for J to find that 0 < ; K (ln M + K ln K) . J !1+ 2 [ln 4 − ln (4 − δ)] Therefore, our cover has cardinality -! . " # ErBj r≤Nj : j ≤ (J, ..., J) ! J K NJ + lnK (M ) M K−1

using the previous display, (22) and the fact that maxk maxj(k)≤J sj(k)+1 + M , where we have suppressed on δ and K. To finish the proof, note that FM P ⊆ % dependence & C C Bj Bj and that Er satisfies the conditions of Lemma 19 (i.e. j≤(J,...,J) r≤Nj FP ∩ Er '% ) & B B set Er j = A where A is as in Lemma 19). Hence, FP ∩ Er j : j ≤ (J, ..., J) is r≤Nj M ,M of FP i−1 i ,

FM P .

a δ-cover in total variation for Finally, the covering number is % & % & M i N δ, FM − N δ, FP i−1 ! lnK (Mj ) MjK−2 (Mj − Mj−1 ) ! lnK (Mj ) MjK−2 P 19

using a first order Taylor series approximation, and the fact that 0 < Mj − Mj−1 < ∞. We can now prove Lemma 17. Proof Lemma 17. We show that we can find a δ-cover of F satisfying the statement of the lemma. Let (bi )i>0 and (Mj )j>0 be sequences of increasing positive numbers going to infinity and (ai )i>0 a sequence of decreasing positive numbers going to zero. Note that D M ,M j−1 j F⊆ Fai ,b i ,δ i,j>0

and



 Fm,M a,b,δ =

D

P :P ([a,b])>1−δ

 

D

Σ:m<|Σ|−1 ≤M

 

'(

0



) % & v K/2 φ v 1/2 Σ−1/2 x P (dv) ,

where, by Lemmata 18 and 21, and the triangle inequality, % & b N δ, Fm,M ! M K−2 lnK (M ) . a,b,δ a Define the following sequence of sets (Gi,j,δ )i,j>0 ,

(23)

Gi,j,δ := '( ∞ ) % & −1 v K/2 φ v 1/2 Σ−1/2 x P (dv) : P ([ai−1 , bi−1 ]) ≤ 1 − δ, P ([ai , bi ]) > 1 − δ, Mj−1 < |Σ| ≤ Mj . 0

Mutatis mutandis, by the arguments in Lijoi et al. (2005, p.1295), for% any δ ∈ (0,&1), M ,Mj there is an integer N large enough such that for i > 0, N (δ, Gi,j,δ ) ≤ N δ, FaNj−1 ! ,bN ,δ K−2 (b lnK (Mj ) using (23) for the right hand term. Then, note that F ⊆ CN /aN ) Mj " i,j>0 Gi,j,δ . As in Lijoi et al. (2005) we also note that that there is a δ > δ such that

{P : P ([ai−1 , bi−1 ]) ≤ 1 − δ, P ([ai , bi ]) > 1 − δ} ⊂ {P : P ((0, bi−1 ]) ≤ 1 − δ, P ((0, bi ]) > 1 − δ} ⊂ {P : P ((bi−1 , ∞)) > δ " }

Hence, there exists a countable δ-cover (As )s>0 of Ac" such that, for some β ∈ (0, 1), 3 Πβ (As ) s>0

! ≤

3 bN MjK−2 lnK (Mj ) Πβ (Gi,j,δ ) a i,j>0 N

% & 3 bN −1 MjK−2 lnK (Mj ) ΠβP (P ([bi−i , ∞)) > δ " ) ΠβΣ |Σ| > Mj−1 a i,j>0 N

[by independence of P and Σ and (24)] % & 3 bN Πβ (P ([bi−i , ∞))) 3 −1 β K K−2 P M ln (M ) Π |Σ| > M ≤ j j−1 j Σ a δβ i>0 N j>0 [by Markov inequality for P ]

< ∞

by Condition 5. 20

(24)

4.2

Proof of Theorem 11

Proof of Theorem 11. By Lemma 22 (below), the set {ω ∈ Ω : Π (Bδ (ω)) = 1}, where Bδ (ω) is as in (25), has P-probability one for any δ > 0. Hence, multiplying and dividing by fθ (Xi ), % & > + * =+ n n ˆ f X /f (X ) [ i=1 fθ (Xi )] Π (dθ) i θ i i=1 θ A! ˆ % & > + Πn (A" ) = * =+ n n ˆ i=1 fθ Xi /fθ (Xi ) [ i=1 fθ (Xi )] Π (dθ) Θ =+ % & > * n ˆ i /fθ (Xi ) [+n fθ (Xi )] Π (dθ) f X θ i=1 i=1 A! ∩Bδ (ω) =+ % & > + = * n n ˆ i=1 fθ Xi /fθ (Xi ) [ i=1 fθ (Xi )] Π (dθ) Θ∩Bδ (ω) [ by the previous remarks about Bδ (ω) ] =+ % & > n ˆ i /fθ (Xi ) supθ∈Bδ (ω) f X θ i=1 % & > =+ ≤ n ˆ i /fθ (Xi ) inf θ∈Bδ (ω) f X θ i=1 * +n i=1 fθ (Xi ) Π (dθ) × *A! +n i=1 fθ (Xi ) Π (dθ) Θ ≤ exp {2nδ} Πn (A" )

P-a.s. by the remarks about Bδ (ω) for all but finitely many n. By Theorem 6 and Walker (2004, proof of Theorem 4, p. 2036), deduce that Πn (A" ) ! exp {−nγ} a.s. for some γ > 0. Then, choose 2δ < γ in the definition of Bδ (ω). Lemma 22 Under Conditions 9 and 10, for any δ > 0, P ({ω ∈ Ω : Π (Bδ (ω)) = 1}) = 1, where   H n J1/n   & I % ˆ i (ω) /fθ (Xi (ω)) Bδ (ω) := θ ∈ Θ : lim fθ X ∈ [exp {−δ} , exp {δ}] . n→∞   i=1

(25)

Proof. We argue along the lines of Lemma 3 in Barron et al. (1999). Let   H n J1/n   & I % ˆ i (ω) /fθ (Xi (ω)) Bδ := ω ∈ Ω, θ ∈ Θ : lim fθ X ∈ [exp {−δ} , exp {δ}] . n→∞   i=1

Obvious readaptations of Lemma 11 10 in Barron et al. (1999), using % and % Lemma & & 4n 1 ˆ i (ω) /fθ (Xi (ω)) is a measurable function Condition 10 imply that n i=1 ln fθ X from Ω × Θ to R. A continuous function of a measurable function is measurable, then =+ % & >1/n n ˆ i (ω) /fθ (Xi (ω)) fθ X is also measurable. Then, the set Bδ is measurable i=1

21

and we can apply Fubini’s Theorem. Let Bδ (θ) be as in (26) below. By Lemma 23, P (Bδ (θ)) = 1 for every θ ∈ Θ and integrating both sides of this equality w.r.t. Π, ( 1 = P (Bδ (θ)) Π (dθ) (Θ = Π (Bδ (ω)) P (dω) Ω

by Fubini’s Theorem. Since Π and P have range [0, 1], the last display implies the statement of the lemma. Lemma 23 For any δ > 0,

P (Bδ (θ)) = 1,

where   % & 1/n    ˆ n f   I θ Xi (ω)  Bδ (θ) := ω ∈ Ω : lim  ∈ [exp {−δ} , exp {δ}] . n→∞   f (Xi (ω))   i=1 θ

(26)

Proof. Define N := Nn + ln n. Then, % & 1/n ˆ n f I θ Xi   f (X ) θ i i=1 

% &    ˆi  1 3 3 fθ X   ln = exp + n fθ (Xi )  i≤N

i>N

= exp {I + II}

and we shall upper and lowerbound each term separately. To this end, there is an α > 0 such that % & ,1+α , , ˆ , f 1+α 3 θ Xi , ,1 N , , P (|I| > %) ≤ ln 1+α E , N fθ (Xi ) ,, (%n) , i≤N

N

1+α



(%n)



0

1+α

% & ,1+α , , f X ˆ i ,, θ , , , sup E ,ln , i>0 , fθ (Xi ) , 1+α

using Condition 10. Since, by definition of N , (N/n) is summable in n, the BorelCantelli lemma gives |I| ≤ %, P-a.s.. Finally, , , % & , ˆ i (ω) − ln fθ (Xi (ω)),, |II| ≤ lim sup ,ln fθ X →

N →∞ i>N

0

P-a.s. by Condition 9, using the Continuous Mapping Theorem as the density is continuous in its arguments for any θ. Since % > 0 is arbitrary, the lemma follows.


4.3 Proof of Lemma 12

Proof of Lemma 12. Define

f(x|v) = v^{K/2} φ(v^{1/2} Σ^{-1/2} x).

Then,

E |ln[f_θ(X̂_i(ω))/f_θ(X_i(ω))]|^{1+α} = E | ln ∫_0^∞ f(X̂_i(ω)|v) P(dv) − ln ∫_0^∞ f(X_i(ω)|v) P(dv) |^{1+α}
≲ E | ln ∫_0^∞ f(X̂_i(ω)|v) P(dv) |^{1+α} + E | ln ∫_0^∞ f(X_i(ω)|v) P(dv) |^{1+α}
=: I + II.

Define the set

A := {v ∈ R : v ∈ [a, b]}

for constants 0 < a < b < ∞. Moreover, note that for any x, y > 0, |ln(x + y)| ≤ |ln(x)| + y. Then, writing A^c for the complement of A,

I ≤ E | ln ∫_A f(X̂_i(ω)|v) P(dv) |^{1+α} + E | ∫_{A^c} f(X̂_i(ω)|v) P(dv) |^{1+α}
≤ E | ln ∫_A f(X̂_i(ω)|v) P(dv) |^{1+α} + | ∫_0^∞ v^{K/2} P(dv) |^{1+α}   [because φ < 1]
≲ E sup_{v∈A} | ln f(X̂_i(ω)|v) |^{1+α} + | ∫_0^∞ v^{K/2} P(dv) |^{1+α}
≲ ln^{K/2}(1/a) + b E | X̂_i(ω)' Σ^{-1} X̂_i(ω) |^{1+α} + | ∫_0^∞ v^{K/2} P(dv) |^{1+α}.

The first term is clearly finite. To bound the second term, note that

E | X̂_i' Σ^{-1} X̂_i |^{1+α} ≲ Σ_{k=1}^K E | X̂_{ik} |^{2(1+α)},

which is finite by the conditions of the lemma. To bound the third term, it is enough to show that

Π_P( | ∫_0^∞ v^{K/2} P(dv) |^{1+α} ) < ∞.

To this end, using the same argument as in the proof of Lemma 16, and writing α′ for the α in Condition 3 to distinguish it from the one used here,

Π_P( | ∫_0^∞ v^{K/2} P(dv) |^{1+α} ) ≤ Π_P( ∫_0^∞ v^{(1+α)K/2} P(dv) )   [by Jensen's inequality]
≲ ∫_0^1 v^{(1+α)K/2−1} dv + ∫_1^∞ v^{(1+α)K/2−1} v^{−K/2−α′} dv   [arguing as in the proof of Lemma 16]
< ∞,

choosing αK/2 < α′. Control over II is similar, hence the lemma is proved.

References

[1] Barron, A.R. (1986) Discussion on Diaconis and Freedman: The Consistency of Bayes Estimates. Annals of Statistics 14, 26-30.
[2] Barron, A.R. (1998) Information-Theoretic Characterization of Bayes Performance and the Choice of Priors in Parametric and Nonparametric Problems. In J.M. Bernardo, J.O. Berger, A.P. Dawid and A.F.M. Smith (eds), Bayesian Statistics 6, 27-52. Oxford University Press.
[3] Barron, A., M.J. Schervish and L. Wasserman (1999) The Consistency of Posterior Distributions in Nonparametric Problems. Annals of Statistics 27, 536-561.
[4] Bollerslev, T. (1990) Modeling the Coherence in Short-run Nominal Exchange Rates: A Multivariate Generalized ARCH Model. Review of Economics and Statistics 72, 498-505.
[5] Bousquet, O. and M.K. Warmuth (2002) Tracking a Small Set of Experts by Mixing Past Posteriors. Journal of Machine Learning Research 3, 363-396.
[6] Dahlhaus, R. (1997) Fitting Time Series Models to Nonstationary Processes. Annals of Statistics 25, 1-37.
[7] Engle, R. (2002) Dynamic Conditional Correlation: A Simple Class of Multivariate Generalized Autoregressive Conditional Heteroskedasticity Models. Journal of Business and Economic Statistics 20, 339-350.
[8] Engle, R.F., V.K. Ng and M. Rothschild (1990) Asset Pricing with a Factor-Arch Covariance Structure: Empirical Estimates for Treasury Bills. Journal of Econometrics 45, 213-237.
[9] Escobar, M.D. (1994) Estimating Normal Means with a Dirichlet Process Prior. Journal of the American Statistical Association 89, 268-277.
[10] Fan, J., Y. Fan and J. Lv (2008) High Dimensional Covariance Matrix Estimation Using a Factor Model. Journal of Econometrics 147, 186-197.
[11] Fang, H.-B., K.-T. Fang and S. Kotz (2002) The Meta-Elliptical Distributions with Given Marginals. Journal of Multivariate Analysis 82, 1-16.
[12] Ferguson, T.S. (1973) A Bayesian Analysis of Some Nonparametric Problems. Annals of Statistics 1, 209-230.
[13] Ghosal, S., J.K. Ghosh and R.V. Ramamoorthi (1999) Posterior Consistency of Dirichlet Mixtures in Density Estimation. Annals of Statistics 27, 143-158.
[14] Ghosal, S., J.K. Ghosh and A. van der Vaart (2000) Convergence Rates of Posterior Distributions. Annals of Statistics 28, 500-531.
[15] Ghosal, S. and A. van der Vaart (2007a) Convergence Rates of Posterior Distributions for Noniid Observations. Annals of Statistics 35, 192-223.
[16] Ghosal, S. and A. van der Vaart (2007b) Posterior Convergence Rates of Dirichlet Mixtures at Smooth Densities. Annals of Statistics 35, 697-723.
[17] Hult, H. and F. Lindskog (2002) Multivariate Extremes, Aggregation and Dependence in Elliptical Distributions. Advances in Applied Probability 34, 587-608.
[18] Ishwaran, H. and L.F. James (2002) Approximate Dirichlet Process Computing in Finite Normal Mixtures: Smoothing and Prior Information. Journal of Computational and Graphical Statistics 11, 508-532.
[19] Ledoit, O. and M. Wolf (2004) A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices. Journal of Multivariate Analysis 88, 365-411.
[20] Lijoi, A., I. Prünster and S.G. Walker (2005) On Consistency of Nonparametric Normal Mixtures for Bayesian Density Estimation. Journal of the American Statistical Association 100, 1292-1296.
[21] James, A.T. (1964) Distributions of Matrix Variates and Latent Roots Derived from Normal Samples. Annals of Mathematical Statistics 35, 475-501.
[22] Kano, Y. (1994) Consistency Property of Elliptical Probability Density Functions. Journal of Multivariate Analysis 51, 139-147.
[23] Kleijn, B.J.K. and A. van der Vaart (2006) Misspecification in Infinite-Dimensional Bayesian Statistics. Annals of Statistics 34, 837-877.
[24] Sancetta, A. (2008a) Sample Covariance Shrinkage for High Dimensional Dependent Data. Journal of Multivariate Analysis 99, 949-967.
[25] Sancetta, A. (2008b) Universality of Bayesian Predictions. Preprint.
[26] Sethuraman, J. (1994) A Constructive Definition of Dirichlet Priors. Statistica Sinica 4, 639-650.
[27] Walker, S. (2004a) New Approaches to Bayesian Consistency. Annals of Statistics 32, 2028-2043.
[28] Walker, S.G. (2004b) Modern Bayesian Asymptotics. Statistical Science 19, 111-117.
[29] Walker, S.G. (2005) Bayesian Nonparametric Inference. In D.K. Dey and C.R. Rao (eds.) Handbook of Statistics 25, 339-371. Amsterdam: Elsevier.
[30] Walker, S.G., A. Lijoi and I. Prünster (2007) On Rates of Convergence for Posterior Distributions in Infinite-Dimensional Models. Annals of Statistics 35, 738-746.
[31] Wang, Y., Q. Yao, P. Li and J. Zou (2007) High Dimensional Volatility Modeling and Analysis for High-Frequency Financial Data. Preprint.
[32] van der Vaart, A.W. and J.A. Wellner (2000) Weak Convergence of Empirical Processes. New York: Springer.
