Abstract Decomposing the covariance matrix of a high-dimensional dataset into collections of eigenvalues and eigenvectors has numerous applications, notably for Principal Component Analysis and Factor Analysis. In this paper, and for Itô semimartingale dynamics, we derive the asymptotic distribution of a likelihood-ratio-type test statistic for the purpose of identifying the structure of the eigenvalues of the integrated covariance matrix estimated using high-frequency data. Unlike existing approaches where the cross-section dimension grows to infinity, our test does not require a large cross-section and thus opens the door to a wide variety of applications. We also show how our test can be used to test for some specific factor decompositions of the integrated covariance matrix that are very useful for the empirical analysis of financial data. Furthermore, a test for ‘unexplained’ quadratic variation is proposed to investigate whether a given set of factors ‘explains’ at least a given proportion of integrated variance in the continuous part of the underlying process. These tests, however, are based on non-standard asymptotic distributions with many nuisance parameters. Another contribution of this paper consists in proposing a wild bootstrap method to approximate the asymptotic distribution. While standard bootstrap methods focus on resampling point-wise returns, the proposed method replicates the features of the asymptotic approximation of the statistics of interest that guarantee its validity. A Monte Carlo simulation study shows that the bootstrap-based test controls size and has very good power even in samples of moderate size. Finally, we consider two empirical applications. The first pertains to optimal hedging of spot crude oil with futures contracts, while the second concerns the extraction and usefulness of pricing factors from S&P 500 high-frequency stock prices.

Keywords: Itô semimartingale, high-frequency data, large data cross sections, eigenfunctions of integrated covariance matrix, PCA, Factor Analysis, bootstrap. ∗

Economics Department, Concordia University, 1455 de Maisonneuve Blvd. West, H 1155, Montreal, Quebec, Canada H3G 1M8. Tel: +1 (514) 848 2424 ext. 3479. Email: [email protected] † Department of Economics and Finance, Durham University Business School. Address: Mill Hill Lane, Durham, DH1 3LB, UK. Tel: +44 (0) 191 33 45423. E-mail: [email protected] ‡ Department of Economics and Finance, Durham University Business School. Address: Mill Hill Lane, Durham, DH1 3LB, UK. Tel: +44 (0) 191 33 45301. E-mail: [email protected]

1 Introduction

Decomposing a q-dimensional covariance matrix into a set of eigenvalues and eigenvectors, referred to as the eigen- or spectral decomposition, is an important part of a number of statistical methods, notably Principal Component Analysis (PCA) and Factor Analysis (FA). The ability of PCA and FA to summarize the variation of high-dimensional data via a smaller number of factors while retaining most of the variance makes them popular methods in many fields, including economics and finance. Depending on the structural assumptions, the factors may be correlated or uncorrelated, and in the latter setting the factors are often referred to as principal components. Empirical work often raises the question of deciding on the number Q of factors (principal components) to retain for data modelling. Such a decision is based on testing the clustering structure of the eigenvalues of the covariance matrix of interest. Specifically, if π is a targeted proportion of data dispersion supported by the principal components (say, π = 0.90 or 0.95), Q can be found by testing whether the ratio of the sum of the largest Q eigenvalues to the sum of all eigenvalues is at least π. Moreover, particular structures of the eigenvalues can be informative about the underlying data generating process. For instance, the equality of the m smallest eigenvalues is a testable implication of a factor representation of the primitive vector process with q − m factors and uncorrelated idiosyncratic shocks of the same magnitude. While the covariance matrix is a well-accepted measure of multivariate dispersion for data observed at low frequency, the so-called integrated covariance matrix (hereafter IV) plays a similar role for data observed at high frequency and has established itself as an important component of financial portfolios’ risk measures.
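To fix ideas, the eigenvalue-share rule for choosing Q sketched above can be written in a few lines. This is only an illustrative helper (the function name is ours, and it implements the plug-in ratio, not the formal test developed in this paper):

```python
import numpy as np

def n_components_for_share(cov, pi=0.90):
    """Smallest Q such that the largest Q eigenvalues of `cov`
    account for at least a share `pi` of the total variance.
    Illustrative plug-in rule, not the paper's test statistic."""
    eig = np.sort(np.linalg.eigvalsh(cov))[::-1]  # non-increasing order
    share = np.cumsum(eig) / eig.sum()            # cumulative variance share
    return int(np.searchsorted(share, pi) + 1)

# A covariance with one dominant direction: the largest eigenvalue
# alone explains 8 / (8 + 1 + 1) = 80% of total variance, so two
# components are needed to reach the 90% target.
C = np.diag([8.0, 1.0, 1.0])
print(n_components_for_share(C, 0.90))  # 2
```

In practice the sample (or realized) covariance matrix would be plugged in, and the statistical question is precisely whether the estimated share is significantly at least π.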
The eigenvalue structure of IV has the same importance for PCA and FA of the continuous part of high-frequency vectors of stock prices as does that of the covariance matrix for low-frequency data. While testing the eigenvalue structure of covariance matrices estimated with low-frequency data has been extensively studied in the literature, to the best of our knowledge, no such tests exist for IV. Our main goal in this paper is to provide statistical inference for the structure of the eigenvalues of IV estimated using high-frequency data. Early tests for the structure of eigenvalues go back to Anderson (1963), who proposes a likelihood ratio test statistic for the equality of adjacent eigenvalues of a population covariance matrix when the data are independent and identically distributed as Gaussian. Further extensions to non-Gaussian and to dependent data have since been proposed; see Waternaux (1976), Davis (1977), Tyler (1981), Muirhead (1982), Eaton and Tyler (1991), Cook and Setodji (2003), and Onatski (2010), to name a few. The literature on FA and PCA of high-frequency data is rather recent. Papers on this topic include Todorov and Bollerslev (2010), Aït-Sahalia and Xiu (2017a,b) and Pelger (2015). This strand of research considers some class of continuous-time multivariate stochastic processes, namely the class of Itô semimartingale processes, with the aim of describing them by a factor representation or PCA through some dispersion characteristic such as IV. Inference results are based on so-called infill asymptotics, where the process is supposed to be observed over a fixed time interval more and more frequently. Todorov and Bollerslev (2010) formulate a two-factor model for a continuous-time process of financial


asset prices that allows them to disentangle and estimate assets’ exposure to the diffusive and jump components of the market systematic risk. Aït-Sahalia and Xiu (2017b) and Pelger (2015) propose estimators of the number of factors in continuous-time factor models. In the spirit of Bai and Ng (2002), the estimator of Aït-Sahalia and Xiu (2017b) minimizes a criterion that involves estimated eigenvalues of IV and a penalty term, whereas Pelger (2015) follows Onatski (2010) and Ahn and Horenstein (2013) by proposing an estimator that maximizes the ratio of perturbed adjacent estimated eigenvalues of IV. Similarly to the approximate factor model theory of Chamberlain and Rothschild (1983), these estimators are all shown to be consistent when the cross-section dimension increases to infinity. Such estimators are therefore reliable only in empirical applications involving a large number of assets. Aït-Sahalia and Xiu (2017a) develop a methodology to conduct PCA with high-frequency data. They propose estimators of realized eigenvalues, eigenvectors and principal components, along with their asymptotic distributions, in a setup with fixed cross-section dimension. However, their results are useful only if the clustering structure of the so-called integrated eigenvalues is known to the researcher. In this paper, we consider the class of Itô semimartingale processes and propose tests for the structure of the eigenvalues of the associated integrated covariance matrix (IV). As already mentioned, the ability to test for the eigenvalue structure of IV has practical implications for the principal component analysis of the continuous part of the process of interest.
Moreover, it can also help identify a factor representation; namely, it can be used to test for the decomposition of the IV matrix into a factor structure with a few large eigenvalues plus remaining eigenvalues of the same magnitude, reflecting the common magnitude of the idiosyncratic shocks. This structure is very useful for the empirical analysis of financial data, as it represents the factor structure displayed by the integrated covariance matrix of the S&P 100 index constituents; see Aït-Sahalia and Xiu (2017a,b) and Pelger (2015). The test that we propose builds on the work of Anderson (1963). We first consider the subclass of continuous Lévy processes, since this class offers a parametric setting in which the likelihood ratio criterion can be fully derived for the hypothesised structure of the eigenvalues of IV. We show that, as the observation frequency increases, the likelihood ratio test statistic converges in distribution to a standard chi-squared distribution with (1/2)(q_k − 1)(q_k + 2) degrees of freedom, where q_k is the number of equal eigenvalues of IV in the k-th cluster. This result is reminiscent of that of Anderson (1963) for the standard Gaussian framework. Moving to the more general class of Itô semimartingales, we use the same likelihood ratio test statistic and derive its asymptotic distribution (under the null), which is now noticeably non-standard with many nuisance parameters. Using consistent estimators of these nuisance parameters, we indicate how an approximation of the asymptotic distribution can be simulated. However, it may be computationally costly to deal with the numerous nuisance parameters involved. Another main contribution of this paper consists in proposing a bootstrap method to approximate this asymptotic distribution and establishing its first-order asymptotic validity. The bootstrap procedure that we introduce is simple to implement and does not require the estimation of the nuisance parameters.
Even though inference on the structure of the eigenvalues of IV is our primary interest, we also derive some results for the eigenvalue structure of the correlation matrix R, with


R_{i,j} = IV_{i,j} / √(IV_{i,i} IV_{j,j}). Since this is a smooth function of IV, the validity of our bootstrap method extends easily to R. Since the eigenvalues (eigenvectors and principal components) are functions of the estimator of the integrated covariance matrix, an important step towards bootstrapping our test statistic consists in bootstrapping the IV matrix estimator itself; the question is which type of bootstrap better approximates the asymptotic distributions of these quantities. The asymptotic distribution of the estimator of the IV matrix and some of its functions has been the object of bootstrap estimation in the recent literature. On the one hand, Dovonon, Gonçalves, and Meddahi (2013) have applied the nonparametric i.i.d. bootstrap to approximate the distribution of the so-called realized beta and realized correlation between assets, but they point out that this type of bootstrap cannot generally reproduce the exact asymptotic distribution of estimators of the IV matrix. On the other hand, generalizing the work of Hounyo, Gonçalves, and Meddahi (2017), Hounyo (2017) proposes a wild blocks of blocks bootstrap for estimating the distribution of various integrated covariance matrix estimators. We show that the first-order asymptotic approximation of our test statistics of interest can be written as a function of the estimation error of IV. Therefore, using a variant of the blocks of blocks bootstrap of Hounyo, Gonçalves, and Meddahi (2017), valid bootstrap statistics are derived for our tests. In connection to the literature, it is worthwhile to mention that our results complement those of Aït-Sahalia and Xiu (2017b) and Pelger (2015) by being valid in cases of fixed cross-section dimension, allowing for empirical applications with a small number of assets.
Our results also complement those of Aït-Sahalia and Xiu (2017a), particularly in the case of constant volatility, by providing a test of the structure of eigenvalues that their theory takes as known. The finite sample properties of the results obtained have been investigated through extensive Monte Carlo simulation studies in which several data generating processes have been considered, as well as small and large cross-section dimensions. The results reveal that the bootstrap test has very good size and power performance. We also report the rejection rates based on the standard chi-squared asymptotic distribution, which is valid only if the underlying process is continuous Lévy. It turns out that the latter systematically overrejects the null except, as expected, in the case of Lévy dynamics. Finally, we consider two empirical applications to illustrate different facets of the analysis that is feasible within our framework. In the first application - where we look at relatively low-dimensional data - we use our bootstrap-based test to build a hedging strategy for spot crude oil using futures contracts. The results show that the most effective hedging strategy is the one based on the number of factors selected using our test, followed by the weekly rebalanced static factors approach, and then the naive rebalancing. In the second application - where we look at relatively high-dimensional data - we use our ratio test of ‘unexplained’ quadratic variation to select the number of factors from a large cross-section of stock prices. These factors are then used to explain an even larger number of individual stock prices. We then test the statistical significance of pairwise correlations between the pricing errors, i.e. what is left after regressing the returns on our factors.
After comparing the performance of our methodology with several standard asset pricing models, such as the CAPM, the Fama and French (1993) three-factor (FF3) model and the liquidity factor model of Pástor and Stambaugh (2003), we find that our factor-based asset pricing model leads to a reduction


in the number of statistically significant pairwise correlations between pricing errors, and thus performs better than the benchmark models. The structure of this paper is as follows. Section 2 presents the theoretical framework and recalls the main existing results. In Section 3, the likelihood ratio test statistic is derived for the test of equality of eigenvalues in the case of a continuous Lévy process, and its asymptotic distribution is provided under general dynamics. Applications of the test to the detection of a factor structure in the IV matrix and to the proportion of volatility supported by the main principal components are introduced. The section concludes with the analysis of the eigenvalue structure of the correlation matrix. Section 4 presents our bootstrap methodology and establishes its validity. The Monte Carlo experiments are reported in Section 5. Section 6 contains the empirical applications. Finally, Section 7 provides some concluding remarks on the scope of the test. All proofs are relegated to the appendix.

2 Set-up and existing results

Let X be a q-dimensional Itô semimartingale defined on a filtered probability space (Ω, F, (F_t)_{t≥0}, P), with Grigelionis decomposition:

X_t = X_0 + ∫_0^t b_s ds + ∫_0^t σ_s dW_s + (δ1_{‖δ‖≤1}) ⋆ (μ − ν)_t + (δ1_{‖δ‖>1}) ⋆ μ_t,    (1)

where W is a q-dimensional Wiener process, μ is a Poisson measure with compensator ν(dt, dz) = dt ⊗ λ(dz), with λ the Lebesgue measure on R^q; δ is a real function on Ω × R_+ × R^q, and σ_s is the volatility process. We let c_s = σ_s σ_s' denote the spot covariance matrix. We assume that X satisfies the following assumption for some r ∈ [0, 2]:

Assumption (H-r). b_t is locally bounded and σ_t is càdlàg, and ‖δ(ω, t, z)‖ ∧ 1 ≤ γ_n(z) for all (ω, t, z) with t ≤ τ_n(ω), where (τ_n) is a localizing sequence of stopping times and each function γ_n satisfies ∫ γ_n(z)^r λ(dz) < ∞.

The process X represents the vector of log-prices of q assets, which we assume are observed at a regular time interval ∆_n over a time period [0, T]. The main object of interest in this paper is the integrated covariance matrix of X over the time interval [0, T],

IV_T = ∫_0^T c_s ds,

which corresponds to the quadratic variation of the continuous part X^c of the process X at time T, that is, IV_T = [X^c, X^c]_T, where X_t^c = X_0 + ∫_0^t b_s ds + ∫_0^t σ_s dW_s. Let ∆_i^n X = X_{i∆_n} − X_{(i−1)∆_n} be the log-return over ((i − 1)∆_n, i∆_n] for i = 1, . . . , n ≃ ⌊T/∆_n⌋, where ⌊x⌋ is the largest integer smaller than or equal to x. The integrated covariance matrix IV_T is estimated by

\widehat{IV}^n = Σ_{i=1}^{⌊T/∆_n⌋} (∆_i^n X)(∆_i^n X)' 1_{‖∆_i^n X‖ ≤ α∆_n^ϖ},    (2)

for some α > 0 and ϖ ∈ (0, 1/2). Under mild conditions, \widehat{IV}^n is shown to be a consistent estimator of IV_T, where the setting of the asymptotic analysis is so-called infill asymptotics, in which prices are supposed to be sampled more and more often over the same time interval [0, T], i.e. ∆_n → 0.
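The truncation in (2) simply discards returns whose norm exceeds the threshold α∆_n^ϖ, so that jumps do not contaminate the estimate of the continuous quadratic variation. A minimal numerical sketch (the tuning values α and ϖ below are illustrative choices, not prescriptions of the paper):

```python
import numpy as np

def truncated_rcov(increments, delta_n, alpha=3.0, varpi=0.49):
    """Truncated realized covariance, a sketch of estimator (2):
    sum of outer products of the returns whose Euclidean norm is
    below alpha * delta_n**varpi; larger returns (jumps) are dropped."""
    thr = alpha * delta_n ** varpi
    keep = np.linalg.norm(increments, axis=1) <= thr
    kept = increments[keep]
    return kept.T @ kept  # sum_i r_i r_i'

rng = np.random.default_rng(0)
n, q, delta_n = 1000, 3, 1.0 / 1000
r = rng.normal(scale=np.sqrt(delta_n), size=(n, q))  # diffusive increments
r[500] += 5.0                                        # inject one large jump
iv_hat = truncated_rcov(r, delta_n)
print(iv_hat.shape)  # (3, 3); the jump return is excluded by the threshold
```

Without truncation the single jump would dominate the estimate; with it, the output stays close to the integrated covariance of the diffusive part.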

Our interest resides in the asymptotic distribution of the characteristic roots and vectors of \widehat{IV}^n and in tests for equality of all or some roots of IV_T. These are useful inputs to carry out inference on the importance of the principal components of IV_T as determined by its eigenvectors. They are also useful to test for some specific factor decompositions of IV_T, as we shall highlight. We next introduce some existing asymptotic theory results, followed by some extensions that lay the groundwork for our main contributions, which appear in the next section.

The asymptotic behaviour of \widehat{IV}^n is well known [see e.g. Theorem A.16 of Aït-Sahalia and Jacod (2014)]. If Assumption (H-r) holds for some r ∈ [0, 1) and the truncation level ϖ ∈ [1/(2(2−r)), 1/2), then

√(1/∆_n) (\widehat{IV}^n − IV_T) →^{L-s} W_T,    (3)

where W_T is a random vector defined on an extension of the original probability space; conditionally on F, it is Gaussian with conditional mean 0 and conditional variance-covariance given by

E[W_T^{ij} W_T^{kl} | F] = ∫_0^T (c_s^{ik} c_s^{jl} + c_s^{il} c_s^{jk}) ds,    (4)

with i, j, k, l = 1, . . . , q. ‘L-s’ stands for convergence stable in distribution. We refer to Aït-Sahalia and Jacod (2014, Section 3.2) for further details on this mode of convergence.

Let M_q denote the Euclidean space of all q × q real-valued symmetric matrices, and M_q^+ (M_q^{++}) the subset of all positive semidefinite (definite) elements of M_q. Most of the quantities of interest in this paper are continuously differentiable functions of the integrated covariance matrix. Thus, using the delta-method, the large-sample behaviour of their estimators can be based on (3). For this, let ϕ be a generic function defined on M_q^+ with values in R^r. Assuming that ϕ is continuously differentiable on the support of IV_T, we have:

√(1/∆_n) (ϕ(\widehat{IV}^n) − ϕ(IV_T)) →^{L-s} W_T^ϕ,    (5)

where, similarly to W_T, W_T^ϕ is defined on an extension of the original probability space and, conditionally on F, is centered Gaussian with conditional covariance matrix given by:

E[W_T^ϕ (W_T^ϕ)' | F] = Σ_{u,v,k,l=1}^q (∂ϕ/∂M_{uv})(IV_T) (∂ϕ/∂M_{kl})(IV_T)' ∫_0^T (c_s^{uk} c_s^{vl} + c_s^{ul} c_s^{vk}) ds.    (6)
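In the special case of a constant spot covariance, c_s ≡ c, the time integral in (6) reduces to T times the bracket, and the conditional asymptotic variance of a scalar functional can be evaluated directly from its gradient. A sketch under that simplifying assumption (the function name is ours):

```python
import numpy as np

def avar_smooth_fn(G, c, T):
    """Conditional asymptotic variance (6) of a scalar functional
    phi(IV-hat) when the spot covariance is constant, c_s = c:
    sum_{u,v,k,l} G_uv G_kl * T * (c_uk c_vl + c_ul c_vk),
    where G_uv = d phi / d M_uv evaluated at IV_T."""
    # build the quartic bracket c_uk c_vl + c_ul c_vk, then contract with G twice
    bracket = np.einsum('uk,vl->uvkl', c, c) + np.einsum('ul,vk->uvkl', c, c)
    return T * np.einsum('uv,kl,uvkl->', G, G, bracket)

# Example: phi = trace, so G is the identity; for q = 1 this reduces to
# the familiar asymptotic variance 2*T*c^2 of univariate realized variance.
c = np.array([[2.0]])
print(avar_smooth_fn(np.eye(1), c, T=1.0))  # 2 * 1 * 2^2 = 8.0
```

For time-varying c_s, the bracket would have to be integrated over [0, T] (e.g. by a Riemann sum over spot covariance estimates) instead of multiplied by T.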

Let λ denote the eigenvalue function defined on M_q^+, with non-increasing elements and values in a suitable subset of R^q. Let A ∈ M_q^+ with r clusters L_k (for k = 1, . . . , r) of q_k-repeated eigenvalues with common values λ_k, where L_k is the collection of the ranks of the eigenvalues (sorted from largest to smallest) of A equal to λ_k. The components of λ(A) ≡ (δ_i)_{1≤i≤q} have the structure:

δ_1 = δ_2 = · · · = δ_{q_1} = λ_1,
δ_{q_1+1} = δ_{q_1+2} = · · · = δ_{q_1+q_2} = λ_2,
  ⋮
δ_{q−q_r+1} = δ_{q−q_r+2} = · · · = δ_q = λ_r,    (7)

with λ_1 > λ_2 > · · · > λ_r. The eigenvalue function λ(·) is locally Lipschitz continuous on M_q^+ and differentiable only at points A of M_q^+ with no repeated eigenvalues, i.e. r = q [see Tao, 2012]. The fact that λ(·) is not differentiable in general rules out the use of (5) to obtain the asymptotic distribution of λ(\widehat{IV}^n) in most cases of empirical relevance, in which multiple roots are expected. Nevertheless, even though λ(·) is not in general differentiable, some relevant functions of λ(·) are. Consider again A ∈ M_q^+ with eigenvalue structure given by (7). There exists a neighborhood of A on which, for k = 1, . . . , r, the functions

ϕ_k : M ∈ M_q^+ ↦ ϕ_k(M) = Σ_{i=1}^{q_k} λ_{ι_{k−1}+i}(M),

with ι_0 = 0 and ι_k = Σ_{i=1}^k q_i, are continuously differentiable. Note that ϕ_k is the sum of the eigenvalues with ranks in the same cluster L_k. It follows that S_k(·) = Σ_{i=1}^k ϕ_i(·) is differentiable in a neighborhood of A as well, with S_r(·) being the trace operator. We refer to Chu (1990) and Hiriart-Urruty and Ye (1995) for more details on these standard results on eigenvalue functions. Corollary 3.11 of Hiriart-Urruty and Ye (1995) establishes that

(∂ϕ_k/∂M)(A) = U_{ι_{k−1}+1:ι_k} U'_{ι_{k−1}+1:ι_k},

where U_{a:b} is the submatrix of the a-th through b-th columns of any orthogonal matrix U diagonalizing A, i.e. any (q, q)-matrix U satisfying

U'U = I_q   and   U'AU = diag(λ_1(A), . . . , λ_q(A)),

where diag(v) (with v ∈ R^q) is the diagonal (q, q)-matrix with the elements of v as diagonal elements.

Aït-Sahalia and Xiu (2017a) have studied the asymptotic behaviour of the estimation error of λ(IV_T) through differentiable functions of eigenvalues obtained by averaging eigenvalues over suitable clusters. Specifically, assuming that IV_T has an eigenvalue structure as given in (7), they consider the function:

ϕ^λ(M) = ( (1/q_1) ϕ_1(M), (1/q_2) ϕ_2(M), . . . , (1/q_r) ϕ_r(M) )'.    (8)

Of course, ϕ^λ is differentiable in a neighborhood of IV_T and, relying on (5), they show that:

√(1/∆_n) (ϕ^λ(\widehat{IV}^n) − ϕ^λ(IV_T)) →^{L-s} W_T^{ϕ^λ},    (9)

where W_T^{ϕ^λ} is defined similarly to W_T^ϕ with

(∂ϕ_k^λ/∂M)(IV_T) = (1/q_k) U_{T, ι_{k−1}+1:ι_k} U'_{T, ι_{k−1}+1:ι_k},   k = 1, . . . , r,

with U_T being any orthogonal matrix such that U_T' IV_T U_T = diag(λ_1(IV_T), . . . , λ_q(IV_T)).

If for some i ∈ {1, . . . , q}, λ_i(A) is a simple eigenvalue, the function γ_i(A) returning the i-th eigenvector of A defines, up to the sign, a differentiable function in a neighborhood of A [see Magnus, 1985, Th. 1]. Commonly, γ_i(·) is made a well-defined function by restricting its h-th entry to be nonnegative, where h is the first nonzero entry of γ_i(A). Upon such a restriction, if the i-th eigenvalue of IV_T is simple, one can use (5) to obtain the asymptotic distribution of the estimator γ_i(\widehat{IV}^n) of the eigenvector γ_i(IV_T) associated with the eigenvalue λ_i(IV_T). Aït-Sahalia and Xiu (2017a) show that:

√(1/∆_n) (γ_i(\widehat{IV}^n) − γ_i(IV_T)) →^{L-s} W_T^{γ_i},    (10)

where W_T^{γ_i} is defined similarly to W_T^ϕ with

(∂γ_i/∂(vec[M])')(IV_T) = γ_i(IV_T)' ⊗ [λ_i(IV_T) I_q − IV_T]^+,

where A^+ is the Moore-Penrose inverse of A and vec[M] is the standard vectorizing operator that transforms the matrix M into a vector by stacking its columns.
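The cluster-averaged eigenvalue map ϕ^λ in (8), and the projector form of the derivative ∂ϕ_k/∂M, are straightforward to evaluate numerically. An illustrative sketch (cluster sizes are assumed known, as in the theory above; function names are ours):

```python
import numpy as np

def cluster_avg_eigs(M, sizes):
    """phi^lambda in (8): average of the (sorted, non-increasing)
    eigenvalues of M within each cluster of sizes q_1, ..., q_r."""
    eig = np.sort(np.linalg.eigvalsh(M))[::-1]
    out, pos = [], 0
    for q in sizes:
        out.append(eig[pos:pos + q].mean())
        pos += q
    return np.array(out)

def cluster_projector(M, sizes, k):
    """Derivative of phi_k at M (Hiriart-Urruty & Ye, Cor. 3.11):
    U_{a:b} U_{a:b}', the orthogonal projector onto the k-th
    eigen-cluster (0-based index k)."""
    eig, U = np.linalg.eigh(M)
    U = U[:, ::-1]  # columns reordered to match non-increasing eigenvalues
    a = sum(sizes[:k])
    b = a + sizes[k]
    return U[:, a:b] @ U[:, a:b].T

A = np.diag([5.0, 5.0, 1.0])
print(cluster_avg_eigs(A, [2, 1]))  # [5. 1.]
```

Note that although the eigenvectors within a repeated cluster are not unique, the projector U_{a:b} U_{a:b}' is, which is precisely why ϕ_k is differentiable while λ(·) itself is not.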

3 Testing for the eigenvalue structure and applications

The results in the previous section take the structure of the eigenvalues of IV_T - such as the one in (7) - as known to the researcher; in particular, the ranks and multiplicities of the eigenvalues. In this case, confidence intervals can be built for eigenvalues or their averages within the framework of Aït-Sahalia and Xiu (2017a) recalled in the previous section. These results can also be used to test some restrictions on the true eigenvalues, and such a test would be asymptotically valid if the maintained eigenvalue structure is correct. However, this structure is not known in general, and the possibility of testing for equality of eigenvalues is useful to learn about the eigenvalue structure of IV_T. We introduce in this section a test statistic for this purpose and derive its asymptotic distribution under the null hypothesis. Further, we show how a variant of this test can be used to test for some specific factor decompositions of the IV matrix. In addition, a test is proposed to investigate whether a given set of factors “explains” at least a certain proportion of integrated variance in the continuous part of the process X (the vector of log-prices). An extension to tests for the eigenvalue structure of correlation matrices is also provided. To build a test for the eigenvalue structure, we first consider a simpler version of the stochastic process X in (1). Namely, we assume that X is a continuous Lévy process, that is, δ ≡ 0 and b_s ≡ b and σ_s = σ are constant. This gives rise to a parametric model in which the ∆_i^n X (for i = 1, . . . , n) are independent and identically distributed N(∆_n b, ∆_n c), with c = σσ' and IV_T = Tc = Tσσ'.
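Under this Gaussian-increment parametric model, the maximum likelihood estimators of the drift b, the spot covariance c, and IV_T take closed form (they are given in the text below). A minimal numerical sketch (the function name is ours):

```python
import numpy as np

def levy_mle(increments, delta_n, T):
    """Maximum likelihood estimators under the continuous Levy
    (i.i.d. Gaussian increment) specification: drift b~, spot
    covariance c~, and the implied IV~ = T * c~."""
    n = increments.shape[0]
    b_tilde = increments.sum(axis=0) / (n * delta_n)
    centred = increments - delta_n * b_tilde
    c_tilde = centred.T @ centred / (n * delta_n)
    return b_tilde, c_tilde, T * c_tilde

# Increments of a driftless 2-d Brownian motion with identity spot covariance.
rng = np.random.default_rng(1)
dn = 1.0 / 2000
dx = rng.normal(scale=np.sqrt(dn), size=(2000, 2))
b_t, c_t, iv_t = levy_mle(dx, dn, T=1.0)
print(c_t.shape)  # (2, 2), numerically close to the identity
```

The estimator c̃ divides by n∆_n = T, so increasing the sampling frequency over the fixed window [0, T] sharpens the covariance estimate even though the drift remains hard to pin down.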


By definition, the eigenvalues of IV_T are then T times those of c, whereas the corresponding eigenvectors are the same. Even though the variance of ∆_i^n X tends to zero with the sample size, the fact that the increments are independent and normally distributed allows us to draw on the work of Anderson (1963) to build the aforementioned tests. In the case of dynamics for X more complex than continuous Lévy, the same test statistics will be used and their asymptotic distributions will be derived. These asymptotic distributions are typically intractable, as we shall see, which motivates the bootstrap approximations proposed in the next section.

Returning to the continuous Lévy process, the likelihood function of the model in terms of b and c is given by

L(b, c) = (2π∆_n)^{−qn/2} |c|^{−n/2} exp( −(1/(2∆_n)) Σ_{i=1}^n tr( c^{−1} (∆_i^n X − ∆_n b)(∆_i^n X − ∆_n b)' ) ),

where |c| is the determinant of c and tr is the usual trace operator. It is not hard to see that the maximum likelihood estimators of b, c and IV_T are:

b̃ = (1/(n∆_n)) Σ_{i=1}^n ∆_i^n X,   c̃ = (1/(n∆_n)) Σ_{i=1}^n (∆_i^n X − ∆_n b̃)(∆_i^n X − ∆_n b̃)',   and   \widetilde{IV}^n = T c̃,    (11)

with n = ⌊T/∆_n⌋. We will make throughout the standard simplifying assumption that T/∆_n is an integer. Note that b̃ is an unbiased estimator of b, while c̃ is a consistent estimator of c. The log-likelihood of this model can also be expressed in terms of the eigenvalues of Tc, i.e. IV_T, with the restriction that the latter has the eigenvalue structure in (7). The log-likelihood maximized in the direction of b is

−(qn/2) log(2π∆_n) − (n/2) log|c| − (1/(2∆_n)) tr(n∆_n c^{−1} c̃).

Hence, up to a constant independent of the model parameters, the log-likelihood is equal to

−(n/2) log|IV_T| − (n/2) tr( (IV_T)^{−1} \widetilde{IV}^n ).

This is a similar expression to that of Equation (3.2) of Anderson (1963) and, by the same arguments as those leading to his Equation (3.5), we can claim that, up to a constant (in the model parameters) term, the log-likelihood of the model in terms of the eigenvalues of IV_T is given by

log L(λ_1, . . . , λ_r) = cst − (n/2) Σ_{k=1}^r q_k log λ_k − (n/2) Σ_{k=1}^r Σ_{i∈L_k} d̃_i/λ_k,    (12)

where d̃ = λ(\widetilde{IV}^n) is the vector of eigenvalues of \widetilde{IV}^n. We have the following result:

Proposition 3.1. Let X be a continuous Lévy process.

(a) If the characteristic roots of IV_T are λ_1 > · · · > λ_r > 0 with multiplicities q_1, . . . , q_r, respectively,


the maximum likelihood estimate of λ_k is:

λ̂_k = (1/q_k) Σ_{i∈L_k} d̃_i,   for k = 1, . . . , r,

where L_k is the set of integers q_1 + · · · + q_{k−1} + 1, . . . , q_1 + · · · + q_k.

(b) The likelihood ratio criterion for testing the roots of IV_T with rank indexes in L_k:

H_0 : δ_{q_1+···+q_{k−1}+1} = · · · = δ_{q_1+···+q_k} = λ_k,

where λ_k is unknown, i.e. the k-th row in the eigenvalue structure (7), is given by

ℓ̃_k = [ ∏_{i∈L_k} d̃_i / ( q_k^{−1} Σ_{j∈L_k} d̃_j )^{q_k} ]^{n/2}.    (13)

(c) The likelihood ratio test statistic for H_0 is \widetilde{LR}_k = −2 log ℓ̃_k and is asymptotically distributed as a χ² with (1/2)(q_k − 1)(q_k + 2) degrees of freedom. If δ_{q_1+···+q_{k−1}+1} ≠ δ_{q_1+···+q_k}, then \widetilde{LR}_k → ∞, in probability.

(d) The likelihood ratio criterion for testing:

H_0(λ) : δ_{q_1+···+q_{k−1}+1} = · · · = δ_{q_1+···+q_k} = λ,

with λ specified, is given by

ℓ̃_{k,λ} = ( ∏_{i∈L_k} d̃_i^{n/2} / λ^{q_k n/2} ) exp( −(n/2) ( Σ_{i∈L_k} d̃_i/λ − q_k ) ),    (14)

and, under the null, the likelihood ratio test statistic \widetilde{LR}_{k,λ} = −2 log ℓ̃_{k,λ} is asymptotically distributed as a χ² with (1/2) q_k (q_k + 1) degrees of freedom. Under the alternative, \widetilde{LR}_{k,λ} → ∞, in probability.
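Both statistics are simple functions of the estimated eigenvalues within a cluster. A numerical sketch (names ours) of \widetilde{LR}_k and \widetilde{LR}_{k,λ} together with their chi-squared degrees of freedom:

```python
import numpy as np

def lr_equal_roots(d_cluster, n):
    """LR~_k = -2 log l~_k from (13): test that the eigenvalues in one
    cluster are equal, with the common value unspecified.  The chi^2
    limit has (q-1)(q+2)/2 degrees of freedom.  Equals
    n * (q * log(mean d) - sum log d), which is >= 0 by the AM-GM
    inequality and zero iff all cluster eigenvalues coincide."""
    d = np.asarray(d_cluster, dtype=float)
    q = d.size
    stat = n * (q * np.log(d.mean()) - np.log(d).sum())
    df = (q - 1) * (q + 2) // 2
    return stat, df

def lr_specified_roots(d_cluster, lam, n):
    """LR~_{k,lambda} = -2 log l~_{k,lambda} from (14): the common value
    lambda is specified.  chi^2 limit with q(q+1)/2 degrees of freedom."""
    d = np.asarray(d_cluster, dtype=float)
    q = d.size
    stat = n * ((d / lam).sum() - q - np.log(d / lam).sum())
    df = q * (q + 1) // 2
    return stat, df

print(lr_equal_roots([2.0, 2.0, 2.0], n=500))    # statistic 0, df 5
print(lr_specified_roots([1.0, 1.0], 1.0, 100))  # statistic 0, df 3
```

In practice the cluster eigenvalues d_cluster would be taken from the spectrum of an estimate of IV_T, and the statistic compared with the chi-squared critical value at the reported degrees of freedom (valid, per the proposition, in the continuous Lévy case).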

The proof of Proposition 3.1 is relegated to the appendix and is an adaptation of the result of Anderson (1963) to the infill asymptotic framework that we consider, with ∆_n → 0. Interestingly, the chi-squared asymptotic distributions obtained for the likelihood ratio test statistics in Parts (c) and (d) under the null hypothesis are the same as those derived by Anderson (1963) for the case where the sample is assumed to be independent and identically normally distributed with fixed variance. To give an intuition for this result, let Γ be the matrix of normalized eigenvectors of IV_T:

Γ' IV_T Γ = ∆   and   Γ'Γ = I_q,    (15)

where ∆ is the diagonal matrix containing δ_1 ≥ · · · ≥ δ_q > 0, the eigenvalues of IV_T, satisfying the structure in (7). The asymptotic distributions of \widetilde{LR}_k and \widetilde{LR}_{k,λ} in Parts (c) and (d) of Proposition 3.1 are deduced from the asymptotic distribution, say U, of

Ũ = √(1/∆_n) ( Γ' \widetilde{IV}^n Γ − ∆ ),

as we show that

\widetilde{LR}_k = (T/(2λ_k²)) ( 2 Σ_{i,j∈L_k, i<j} ũ_{ij}² + Σ_{i∈L_k} ũ_{ii}² − (1/q_k) ( Σ_{i∈L_k} ũ_{ii} )² ) + o_P(1),

and

\widetilde{LR}_{k,λ} = (T/(2λ²)) Σ_{i,j∈L_k} ũ_{ij}² + o_P(1),

where the ũ_{ij} are the entries of Ũ. We also show that the limit distribution U of Ũ has entries u_{ij} such that u_{ij} = u_{ji}, the {u_{ij}, j ≤ i} are pairwise independent with u_{ii} ∼ N(0, 2λ_k²/T) and u_{ij} ∼ N(0, λ_k²/T), for i < j, which yields the claimed distributions. Hence, likelihood ratio test statistics obtained using the eigenvalues of any estimator of IV_T that is asymptotically equivalent to \widetilde{IV}^n would be asymptotically equivalent to \widetilde{LR}_k and \widetilde{LR}_{k,λ}, respectively. In particular, the following equivalences (which we establish in an appendix):

√(1/∆_n) ( \widetilde{IV}^n − \overline{IV}^n ) = o_P(1)   and   √(1/∆_n) ( \widetilde{IV}^n − \widehat{IV}^n ) = o_P(1),    (16)

where \overline{IV}^n = Σ_{i=1}^n (∆_i^n X)(∆_i^n X)' and \widehat{IV}^n is given by (2), ensure that √(1/∆_n)(Γ' \overline{IV}^n Γ − ∆) and √(1/∆_n)(Γ' \widehat{IV}^n Γ − ∆) are asymptotically equivalent to Ũ, implying that likelihood ratio test statistics using the related eigenvalues have the same asymptotic distribution as in (c) and (d). The last statements in (c) and (d) emphasize the consistency of the respective tests.

It is worth mentioning that several lines (eigenvalue clusters) of (7) can be tested jointly. The corresponding likelihood ratio criterion for such a joint hypothesis is simply the product of the likelihood ratio criteria ℓ̃_k over the relevant values of k, and the asymptotic distribution of the resulting likelihood ratio test statistic is chi-squared with degrees of freedom equal to the sum over the relevant k's of (1/2)(q_k − 1)(q_k + 2). Several lines can also be tested likewise if the λ_k's are specified for each of them.
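As a numerical sanity check of the Lévy-case limit, one can draw the symmetric Gaussian matrix U_kk and verify that the quadratic form matches the stated chi-squared law. The sketch below is ours; the variance convention used (diagonal entries with variance 2λ_k²/T, off-diagonal entries with variance λ_k²/T) is the one that reproduces the (q_k − 1)(q_k + 2)/2 degrees of freedom:

```python
import numpy as np

def simulate_levy_limit(q, lam=2.0, T=1.0, n_sim=200_000, seed=7):
    """Monte Carlo sketch of the Levy-case limit of LR~_k: draw the
    symmetric Gaussian matrix U (diagonal variance 2*lam^2/T,
    off-diagonal variance lam^2/T, entries independent up to symmetry)
    and evaluate (T / (2*lam^2)) * (tr(U^2) - tr(U)^2 / q), which
    should follow chi^2 with (q-1)(q+2)/2 degrees of freedom."""
    rng = np.random.default_rng(seed)
    off = rng.normal(scale=lam / np.sqrt(T), size=(n_sim, q, q))
    U = np.triu(off, 1)
    U = U + np.swapaxes(U, 1, 2)  # symmetrize the off-diagonal part
    diag = rng.normal(scale=lam * np.sqrt(2.0 / T), size=(n_sim, q))
    U[:, np.arange(q), np.arange(q)] = diag
    tr_U = np.trace(U, axis1=1, axis2=2)
    tr_U2 = np.einsum('nij,nji->n', U, U)  # tr(U @ U) per draw
    return (T / (2 * lam**2)) * (tr_U2 - tr_U**2 / q)

stats = simulate_levy_limit(q=3)
# the mean of a chi^2 variate equals its degrees of freedom: here (3-1)(3+2)/2 = 5
print(round(stats.mean(), 1))
```

The quadratic form is nonnegative by construction (it is a within-cluster variance decomposition), and its simulated mean and variance can be compared with the chi-squared moments as a quick check of the degrees-of-freedom count.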

We now turn our attention to more general dynamics of the process X. We assume that X has the Itô semimartingale representation in (1), and we aim to use the likelihood ratio test settings of Proposition 3.1 to carry out inference about the eigenvalue structure of the quadratic variation over [0, T] of the continuous part of X, that is, IV_T. As already mentioned, IV_T is consistently estimated by \widehat{IV}^n in (2), which is the sum of outer products of returns after removing jumps by truncation. Let d = λ(\widehat{IV}^n) be the estimator of δ = λ(IV_T), the eigenvalues of IV_T. Let ℓ_k and ℓ_{k,λ} be the same as ℓ̃_k and ℓ̃_{k,λ} in (13) and (14), respectively, with d̃ replaced by d, and let LR_k = −2 log ℓ_k and


LRk,λ = −2 log `k,λ . We next derive the asymptotic distributions of LRk and LRk,λ . Note that, the representation of X being only partially parametric implies that `k and `k,λ cannot in general enjoy the interpretation of likelihood ratio criteria. Nevertheless, these test statistics can be relied upon once we are able to characterize their asymptotic distributions under the null hypothesis. We refer to Waternaux (1976) and Davis (1977) who have studied the large sample behaviour of these types of test statistics in the context of non normal data. To obtain the asymptotic distribution of LRk and LRk,λ , let Γ be the orthogonal matrix of normalized eigenvectors of IVT and ∆ be the diagonal matrix of eigenvalues of IVT , respectively [see Equation (15)]. From (3) and using the delta-method, we have: √

1 0cn L−s Γ IV Γ − ∆ −→ UT , ∆n

(17)

where $U_T = \Gamma' W_T \Gamma$ and $W_T$ is given by (3). We have the following result.

Theorem 3.1. Let $X$ be an Itô semimartingale represented by (1). If Assumption (H-r) holds for some $r \in [0, 1)$ and the truncation level $\varpi \in \left[\frac{1}{2(2-r)}, \frac{1}{2}\right)$, then:

(a) Under $H_0$ as in Proposition 3.1(b),

$$LR_k \stackrel{L-s}{\longrightarrow} \frac{T}{2\lambda_k^2}\left(\mathrm{tr}(U_{kk}^2) - \frac{1}{q_k}\left(\mathrm{tr}(U_{kk})\right)^2\right),$$

where $U_{kk}$ is the $(q_k, q_k)$-submatrix of $U_T$ at the intersection of the $(q_1 + \cdots + q_{k-1} + 1)$-th through the $(q_1 + \cdots + q_k)$-th rows and columns.

(b) Under the alternative (i.e. if $H_0$ does not hold), $LR_k \to \infty$, in probability.

(c) Under $H_0(\lambda)$ as in Proposition 3.1(d),

$$LR_{k,\lambda} \stackrel{L-s}{\longrightarrow} \frac{T}{2\lambda^2}\,\mathrm{tr}(U_{kk}^2),$$

and under the alternative, $LR_{k,\lambda} \to \infty$, in probability.

Theorem 3.1 generalizes the results in Proposition 3.1(c,d) to the class of Itô semimartingales. The asymptotic distributions of $LR_k$ and $LR_{k,\lambda}$ are no longer guaranteed to be pivotal as previously. Indeed, they depend on nuisance parameters such as the common value $\lambda_k$ of the relevant cluster of eigenvalues of $IV_T$, the conditional variance-covariance matrix of $W_T$, which is equal to $\int_0^T \left(c_s^{il} c_s^{jm} + c_s^{im} c_s^{jl}\right)ds$, for $i, j, l, m = 1, \ldots, q$, and the matrix $\Gamma$ of normalized eigenvectors of $IV_T$. As already mentioned, $\lambda_k$ is consistently estimated by $\frac{1}{q_k}\sum_{i \in L_k} d_i$. Estimators of the conditional

variance of $W_T$ have been proposed by Barndorff-Nielsen and Shephard (2004); see also Jacod and Protter (2012) for jump-robust estimators. If $\Gamma$ could be consistently estimated, then this asymptotic distribution could be simulated to generate critical values for inference. However, the presence of multiple roots makes it impossible to consistently estimate $\Gamma$, even if the identifying restriction that the elements of its main diagonal are positive is maintained. Nevertheless, the fact that only the trace of $U_{kk}^h$, for $h = 1, 2$, is useful for these asymptotic distributions offers some possibility of simulating these distributions, as we describe below.

Write $\Gamma = (\Gamma_1 \cdots \Gamma_r)$, where $\Gamma_k$, for $k = 1, \ldots, r$, corresponds to the eigenvectors associated to the sorted eigenvalues with rank indexes in the cluster $L_k$, so that $U_{kk} = \Gamma_k' W_T \Gamma_k$. Consider $A_n = \Gamma'\,\widehat{IV}^n\,\Gamma$ and let $\hat E$ be the matrix of its normalized eigenvectors with main diagonal elements restricted to be nonnegative. $A_n$ and $\widehat{IV}^n$ have the same set of eigenvalues, and $\hat\Gamma = \Gamma\hat E$ is a matrix of normalized eigenvectors of $\widehat{IV}^n$.

Write $\hat E = (\hat E_{kl})_{1 \le k,l \le r}$, where, for $k, l = 1, \ldots, r$, $\hat E_{kl}$ is the block $(q_k, q_l)$-submatrix of $\hat E$ at the intersection of its rows and columns in $L_k$ and $L_l$, respectively. Proposition A.1 in Appendix A shows that $\hat E_{kl} = o_P(1)$ for $k \ne l$ and $\hat E_{kk}\hat E_{kk}' = I_{q_k} + o_P(1)$. Thus, writing $\hat\Gamma = (\hat\Gamma_1 \cdots \hat\Gamma_r)$ with $\hat\Gamma_k$ defined similarly to $\Gamma_k$ ($k = 1, \ldots, r$), we have

$$\hat\Gamma = \Gamma\hat E, \quad \text{and} \quad \hat\Gamma_k = \Gamma_k\hat E_{kk} + o_P(1). \qquad (18)$$
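The sign-identifying restriction on eigenvector matrices used throughout this construction is easy to implement in practice. A minimal sketch (the function name is ours; the normalization flips each eigenvector so the corresponding diagonal entry of the eigenvector matrix is nonnegative):

```python
import numpy as np

def sorted_sign_normalized_eigvecs(A):
    """Eigen-decomposition of a symmetric matrix A with eigenvalues
    sorted in non-increasing order and each eigenvector's sign fixed so
    that the main-diagonal entries of the eigenvector matrix are
    nonnegative (the identifying restriction discussed in the text)."""
    vals, vecs = np.linalg.eigh(A)        # eigh returns ascending order
    order = np.argsort(vals)[::-1]        # re-sort to non-increasing
    vals, vecs = vals[order], vecs[:, order]
    for j in range(vecs.shape[1]):
        if vecs[j, j] < 0:                # flip sign of column j
            vecs[:, j] = -vecs[:, j]
    return vals, vecs

# Toy usage on a small symmetric matrix standing in for an IV estimate.
A = np.array([[2.0, 0.3], [0.3, 1.0]])
d, Gamma_hat = sorted_sign_normalized_eigvecs(A)
assert d[0] >= d[1] and Gamma_hat[0, 0] >= 0 and Gamma_hat[1, 1] >= 0
```

Each column flip affects only that column's diagonal entry, so the restriction can always be enforced without changing the spanned eigenspaces.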

Even though $\hat\Gamma$ is not a consistent estimator of $\Gamma$ (unless $q_k = 1$), this estimator is useful to consistently simulate the asymptotic distributions of $LR_k$ and $LR_{k,\lambda}$. Indeed, (18) implies that

$$\left(\hat\Gamma_k' W_T \hat\Gamma_k\right)^h = \hat E_{kk}'\left(\Gamma_k' W_T \Gamma_k\right)^h \hat E_{kk} + o_P(1), \quad \text{for } h = 1, 2,$$

which, in turn, implies that

$$\mathrm{tr}\left[\left(\hat\Gamma_k' W_T \hat\Gamma_k\right)^h\right] = \mathrm{tr}\left(U_{kk}^h\right) + o_P(1).$$

It follows that, if one can simulate from the distribution of $W_T$ (or its approximate distribution if the variance of $W_T$ is estimated), $\hat\Gamma_k$ can be used to obtain consistent simulations from the distribution of $\mathrm{tr}(U_{kk}^h)$, which in turn can be used to generate asymptotically valid critical values for the tests of interest. Even though useful, we expect this direct simulation approach to be tedious to implement. This motivates the bootstrap approach that we introduce in the next section.

Before introducing the bootstrap tests, it is worth discussing some relevant applications of the tests presented in Theorem 3.1 beyond their usefulness for the characterization of the eigenvalue structure in (7).

Application 1: Factor structure in $IV_T$. From both statistical modeling and financial management perspectives, it is of interest to know whether asset returns have some factor representation. The test of equality of latent roots of $IV_T$ is useful to test whether the following factor decomposition holds for $IV_T$:

$$IV_T = \Lambda_T F_T \Lambda_T' + \xi_T^2 I_q, \qquad (19)$$

where $\Lambda_T$ is a possibly random $(q, m)$-matrix, $F_T$ is a random $(m, m)$-matrix and $\xi_T^2$ is a positive random scalar, all measurable with respect to $\mathcal{F}_T$, with $m \le q$. This decomposition is a testable


consequence of the factor representation of the continuous part $X^c$ of the Itô semimartingale process $X$, i.e.

$$X_t^c = \Lambda_T f_t + e_t, \quad \text{with} \quad [f, e]_t \equiv 0, \quad [e, e]_t = s_t^2 I_q, \quad \forall t \in [0, T], \qquad (20)$$

where $[f, e]_t$ is the quadratic covariation between the processes $f$ and $e$ over the time interval $[0, t]$. In connection to (19), we have $F_t = \int_0^t [f, f]_a\,da$ and $\xi_t^2 = \int_0^t s_a^2\,da$. Aït-Sahalia and Xiu (2017b) have stressed the empirical value of the factor decomposition in (19) by pointing out that the integrated variance matrix of S&P 100 index constituents tends to display a few eigenvalues that are large and distinct, whereas the remaining ones seem to be of the same magnitude, reflecting a common magnitude of the idiosyncratic shocks. For an overview of factor models for high-frequency data, we refer to Aït-Sahalia and Xiu (2017a,b), Pelger (2015), and Li, Todorov, and Tauchen (2017).

If Equation (19) is true, then the $q - m$ smallest eigenvalues of $IV_T$ are all equal to $\xi_T^2$. This is a testable assumption using the result in Theorem 3.1. Similar tests for factor structure have been discussed in the standard low-frequency analysis setting by Anderson (1963) and Davis (1977), among others. The decomposition in (19) can be tested and the number of factors $m = m^*$ can be determined as a byproduct of the sequential test of

$$H_0(m): \delta_{m+1} = \delta_q \quad \text{against} \quad H_1(m): \delta_{m+1} \ne \delta_q,$$

for $m = 0$ through $m^* \le q - 2$. Note that $m^*$ is the smallest value of $m$ for which $H_0(m)$ is not rejected. If $H_0(m)$ is rejected through $m = q - 2$, then the factor decomposition in (19) is not a likely correct description of the data. For this sequential test, one shall aim at controlling the family-wise error rate by relying, for instance, on the Holm-Bonferroni procedure (see Lehmann and Romano, 2005).

Application 2: Test for the ratio of 'unexplained' volatility. By analogy with the principal component analysis of variance matrices, it is of interest to quantify the proportion of volatility captured by the principal components of $X^c$, the continuous part of $X$, associated to the largest eigenvalues of $IV_T$. Formally, given a ratio $\pi \in (0, 1)$, we would like to test whether the total amount of volatility not captured by the first $Q$ principal components does not exceed $\pi$. This can be stated as:

$$H_0^\pi: \sum_{i=Q+1}^{q} \delta_i \le \pi \sum_{i=1}^{q} \delta_i, \qquad (21)$$

where $\delta = \lambda(IV_T)$ is the vector of eigenvalues of $IV_T$ sorted in non-increasing order. Let us consider the test statistic $Z_n$ defined as follows:

$$Z_n = \frac{1}{\sqrt{\Delta_n}}\left(\sum_{i=Q+1}^{q} d_i - \pi \sum_{i=1}^{q} d_i\right), \qquad (22)$$

where $d = \lambda(\widehat{IV}^n)$. We have the following result.

Theorem 3.2. Let $X$ be an Itô semimartingale represented by (1). Assume that Assumption (H-r) holds for some $r \in [0, 1)$ and the truncation level $\varpi \in \left[\frac{1}{2(2-r)}, \frac{1}{2}\right)$. Assume that $\delta_Q > \delta_{Q+1}$ and let $\pi \in (0, 1)$.

(a) If $H_0^\pi$ holds with equality, then

$$Z_n \stackrel{L-s}{\longrightarrow} Z \equiv \mathrm{tr}\left(U_{Q+1:q,Q+1:q;T}\right) - \pi\cdot\mathrm{tr}\left(U_T\right),$$

where $U_T$ is defined in Equation (17) and $U_{Q+1:q,Q+1:q;T}$ is the bottom-right $(q-Q, q-Q)$-submatrix of $U_T$ determined by its last $q - Q$ rows and columns.

(b) If $H_0^\pi$ holds with strict inequality, then

$$\lim_n P(Z_n > c_{1-\alpha}) = 0,$$

with $\alpha \in (0, 1)$ and $c_{1-\alpha}$ the $(1-\alpha)$-quantile of $Z$.

(c) If $H_0^\pi$ does not hold, then $Z_n \to \infty$, in probability.

Theorem 3.2 derives the asymptotic distribution of $Z_n$ when the exact ratio of unexplained volatility is $\pi$. If $X$ is a continuous Lévy process, this asymptotic distribution is a centered Gaussian with variance derived from the result in Equation (42) in Appendix A:

$$2T(1-\pi)^2\sum_{k=k_1}^{r} q_k\lambda_k^2 + 2T\pi^2\sum_{k=1}^{k_1-1} q_k\lambda_k^2,$$

where $k_1$ is such that $Q+1 \in L_{k_1}$. Even in the simple case of a Lévy process, this asymptotic distribution is not pivotal in general, and its simulation presents a similar challenge to that discussed after Theorem 3.1. Direct simulation can be performed from some approximation of the distributions of $\mathrm{tr}(U_T)$ and $\mathrm{tr}(U_{Q+1:q,Q+1:q;T})$ using the matrix $\hat\Gamma$ of normalized eigenvectors of $\widehat{IV}^n$. If approximate copies of $W_T$ can be generated, then the equalities

$$\mathrm{tr}(\hat\Gamma' W_T \hat\Gamma) = \mathrm{tr}(U_T) + o_P(1) \quad \text{and} \quad \mathrm{tr}(\hat\Gamma_{Q+1:q}' W_T \hat\Gamma_{Q+1:q}) = \mathrm{tr}(U_{Q+1:q,Q+1:q;T}) + o_P(1),$$

with $\hat\Gamma_{Q+1:q}$ collecting the last $q-Q$ columns of $\hat\Gamma$, provide useful copies of $\mathrm{tr}(U_T)$ and $\mathrm{tr}(U_{Q+1:q,Q+1:q;T})$. However, as already mentioned, simulating from the distribution of $W_T$ can be tedious, and one shall rely on the bootstrap method that we propose in the next section. The divergence to infinity of $Z_n$ under the alternative guarantees that the test is consistent.

Eigenvalue structure in correlation. Correlation matrices are insensitive to the scale of the data (diagonal rescalings), and this makes them more appealing than variance matrices for principal component analysis in many applications. We derive results similar to those in Theorem 3.1 for testing the eigenvalue structure of correlation matrices. Let $X$ be an Itô semimartingale described by (1) with integrated


variance $IV_T$ over $[0, T]$. We define the correlation matrix $R_T$ of $X$ over $[0, T]$ by

$$R_T = G(IV_T), \qquad (23)$$

where $G: \mathcal{M}_q^{++} \to \mathcal{M}_q^{++}$ and, $\forall A \in \mathcal{M}_q^{++}$, $G(A) = S(A)\,A\,S(A)$, with $S(A)$ the diagonal matrix with diagonal elements $1/\sqrt{A_{ii}}$, for $i = 1, \ldots, q$.

Since $G$ is differentiable, $\hat R_n \equiv G(\widehat{IV}^n)$ is a consistent estimator of $R_T$, and thanks to (5), we have

$$\frac{1}{\sqrt{\Delta_n}}\left(\hat R_n - R_T\right) \stackrel{L-s}{\longrightarrow} W_T^G, \qquad (24)$$

with $W_T^G$ defined as in Equation (5) for $\varphi = G$. We propose a test of the eigenvalue structure of the correlation matrix $R_T$ using the same statistics as those used for $IV_T$. Assume that $R_T$ has an eigenvalue structure as in (7), and let $LR_k^\rho$ and $LR_{k,\lambda}^\rho$ be defined as $LR_k$ and $LR_{k,\lambda}$, respectively, but using $d = \lambda(\hat R_n)$ as estimator of the vector of eigenvalues $\delta$ of $R_T$.

Let $\Gamma^\rho$ be the $(q, q)$-orthogonal matrix such that $\Gamma^{\rho\prime} R_T \Gamma^\rho = \Delta^\rho$, where $\Delta^\rho$ is the diagonal matrix with diagonal elements equal to the eigenvalues $\delta_i$ of $R_T$. By the delta method and using (24), we can claim that

$$\frac{1}{\sqrt{\Delta_n}}\left(\Gamma^{\rho\prime}\hat R_n\Gamma^\rho - \Delta^\rho\right) \stackrel{L-s}{\longrightarrow} U_T^\rho, \qquad (25)$$

where $U_T^\rho = \Gamma^{\rho\prime} W_T^G \Gamma^\rho$. We can state the following result.

Theorem 3.3. Assume that the conditions of Theorem 3.1 hold. Then the results (a), (b) and (c) of Theorem 3.1 also hold when $LR_k$, $LR_{k,\lambda}$ and $U_{kk}$ are replaced by $LR_k^\rho$, $LR_{k,\lambda}^\rho$ and $U_{kk}^\rho$, respectively, with $H_0$ and $H_0(\lambda)$ involving restrictions on the eigenvalues of $R_T$.

As in Theorem 3.1, the asymptotic distributions provided in Theorem 3.3 are non-standard and difficult to simulate directly. The bootstrap methods that we introduce in the next section will be useful to generate asymptotically correct critical values for these tests. Applications 1 and 2 also extend to the correlation matrix $R_T$ with the same testing procedures as those described for $IV_T$.
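The map $G$ in (23) is immediate to implement. A toy sketch (the function name is ours):

```python
import numpy as np

def G(A):
    """Correlation matrix G(A) = S(A) A S(A), where S(A) is diagonal
    with S(A)_ii = 1 / sqrt(A_ii), as in Equation (23)."""
    s = 1.0 / np.sqrt(np.diag(A))
    return A * np.outer(s, s)

# A toy integrated-variance matrix and its correlation counterpart;
# the diagonal of the result is all ones by construction.
IV = np.array([[4.0, 1.0], [1.0, 9.0]])
R = G(IV)
print(np.round(R, 4))
```

The eigenvalue tests for $R_T$ then proceed exactly as for $IV_T$, with the eigen-decomposition applied to `G(IV)` instead of `IV`.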

4 The bootstrap

In this section, we introduce some bootstrap procedures to approximate the asymptotic distribution of eigenvalue and eigenvector estimates when the latter are associated with simple eigenvalues. Bootstrap approximations of the asymptotic distribution of the test of eigenvalue structure and of its applications to factor representation and principal component analysis are also proposed. Bootstrap inference on both the integrated variance matrix $IV_T$ and the correlation matrix $R_T$ is covered.

We observe that all the statistics of interest in this paper are functions of the integrated covariance matrix estimator $\widehat{IV}^n$. Therefore, an important step towards bootstrapping these statistics consists in bootstrapping $\widehat{IV}^n$ itself. The asymptotic distributions of estimators of $IV_T$ and of some of its functions have been the object of bootstrap approximation in the recent literature. Dovonon, Gonçalves, and Meddahi (2013) have applied the non-parametric i.i.d. bootstrap to approximate the distribution of the so-called realized beta and realized correlation between assets. However, as they point out, the non-parametric i.i.d. bootstrap is not capable, in general, of reproducing the exact asymptotic distribution of estimators of $IV_T$. Hounyo (2017) has generalized to the multivariate setting the idea of the wild blocks of blocks bootstrap of Hounyo, Gonçalves, and Meddahi (2017), which is of interest to us. While standard bootstrap methods focus on sampling point-wise returns, the wild blocks of blocks bootstrap of Hounyo (2017) samples the terms that appear in the summation expression leading to some variant of $\widehat{IV}^n$, thereby giving such a bootstrap scheme the possibility to successfully approximate the exact asymptotic distribution of the quantity of concern. Note that the bootstrap method of Hounyo (2017) is designed to approximate the asymptotic distribution of estimators of $IV_T$ that are robust to market microstructure noise with asynchronous data. The complexity of his data structure and model justifies his resort to block bootstrap schemes.

We follow Hounyo (2017) to bootstrap $\widehat{IV}^n$ by adapting his bootstrap scheme to the problem of primary interest in this paper. Since we are concerned with synchronous price observations free of microstructure noise, we rely on a version of the wild bootstrap that does not involve blocks. Even though, for simplicity of exposition, we do not account for noise and non-synchronicity in this paper, the test statistics that we introduce in the previous section can be based on noise-robust estimators of $IV_T$. In this case, one shall rely on the full wild blocks of blocks bootstrap method of Hounyo (2017) to obtain an accurate estimation of the asymptotic distributions.

To introduce the wild bootstrap for $\widehat{IV}^n$, we first introduce some notation. For $i = 1, \ldots, n$, let

$$y_i = \Delta_i^n X\,\mathbf{1}_{\{\|\Delta_i^n X\| \le \alpha\Delta_n^\varpi\}}, \qquad Z_i = y_i y_i',$$

and let $\eta_i$, for $i = 1, \ldots, n$, be a sequence of independent and identically distributed random variables, all independent of the $y_i$'s, such that $E(\eta_i) = 1$ and $Var(\eta_i) = 1/2$. Consider the wild bootstrap sample $Z_i^*$ ($i = 1, \ldots, n$) of $Z_i$ ($i = 1, \ldots, n$), which is given by

$$Z_i^* = Z_{i+1} + (Z_i - Z_{i+1})\,\eta_i \quad \text{if } i = 1, \ldots, n-1, \quad \text{and} \quad Z_n^* = Z_n. \qquad (26)$$

Let $\widehat{IV}^{*n} = \sum_{i=1}^{n} Z_i^*$ be the bootstrap analogue of $\widehat{IV}^n$ and let

$$S_n^* = \frac{1}{\sqrt{\Delta_n}}\left(\widehat{IV}^{*n} - \widehat{IV}^n\right)$$

be the bootstrap analogue of

$$S_n = \frac{1}{\sqrt{\Delta_n}}\left(\widehat{IV}^n - IV_T\right).$$
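The construction in (26) is a few lines of array code. A minimal sketch (variable names are ours; the $NID(1, 1/2)$ choice for $\eta$ anticipates the algorithm section below):

```python
import numpy as np

def truncated_outer_products(returns, alpha, varpi, delta_n):
    """Z_i = y_i y_i', where y_i is the i-th return vector truncated at
    the threshold alpha * delta_n**varpi (jump removal by truncation)."""
    keep = np.linalg.norm(returns, axis=1) <= alpha * delta_n**varpi
    y = returns * keep[:, None]
    return np.einsum('ij,ik->ijk', y, y)          # (n, q, q) stack

def wild_bootstrap_IV(Z, eta):
    """One bootstrap draw IV^{*n} = sum_i Z*_i, with
    Z*_i = Z_{i+1} + (Z_i - Z_{i+1}) * eta_i for i < n and Z*_n = Z_n,
    as in Equation (26); eta_i are i.i.d. with mean 1, variance 1/2."""
    Zstar = Z.copy()
    Zstar[:-1] = Z[1:] + (Z[:-1] - Z[1:]) * eta[:-1, None, None]
    return Zstar.sum(axis=0)

rng = np.random.default_rng(0)
n, q = 78, 3                                       # one 5-minute day
returns = rng.normal(0.0, 0.01, size=(n, q))
Z = truncated_outer_products(returns, alpha=5.0, varpi=0.49, delta_n=1 / n)
IV_n = Z.sum(axis=0)

eta = rng.normal(1.0, np.sqrt(0.5), size=n)        # NID(1, 1/2) choice
IV_star = wild_bootstrap_IV(Z, eta)                # one bootstrap replicate

# Degenerate eta = 1 must reproduce IV_n exactly, since then Z*_i = Z_i.
assert np.allclose(wild_bootstrap_IV(Z, np.ones(n)), IV_n)
```

Repeating the draw of `eta` gives the bootstrap replicates of $\widehat{IV}^{*n}$ from which $S_n^*$ is formed.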

Under some regularity conditions, we can show that $S_n^*$ has the same asymptotic distribution as $S_n$ under the bootstrap measure, making the wild bootstrap first-order asymptotically valid. Before stating this result formally in Proposition 4.1 below, we first recall the following standard notation related to bootstrap theory. We let $P^*$, $E^*$ and $Var^*$ denote the probability measure, the expected value and the variance, respectively, induced by the bootstrap resampling conditional on the original sample. Let $Y_n^*$ be a sequence of bootstrap statistics indexed by $n$. We say that $Y_n^* \stackrel{P^*}{\to} 0$ in prob-$P$ (also denoted by $Y_n^* = o_{P^*}(1)$ in prob-$P$) if, for any $\varepsilon > 0$, $P^*(|Y_n^*| > \varepsilon) \to 0$ in probability as $n \to \infty$. Similarly, we say that $Y_n^* = O_{P^*}(1)$ in prob-$P$ if $\sup_n P^*(|Y_n^*| > M) \to 0$ in probability as $M \to \infty$. Finally, we write $Y_n^* \stackrel{d^*}{\to} Y$ in prob-$P$ if, conditionally on the sample, $Y_n^*$ converges weakly to $Y$ under the measure $P^*$, and this for all samples contained in a set with probability $P$ converging to one. We can claim the following result.

Proposition 4.1. Let $X$ be an Itô semimartingale represented by (1). Assume that Assumption (H-r) holds for some $r \in (0, 1)$ and $\varpi \in \left[\frac{1}{2(2-r)}, \frac{1}{2}\right)$, and assume that $E|\eta_i|^{2+\epsilon} < \infty$ for some $\epsilon > 0$. Then

$$\sup_{x\in\mathbb{R}^{q(q+1)/2}}\left|P^*\left(\mathrm{vech}(S_n^*) \le x\right) - P\left(\mathrm{vech}(S_n) \le x\right)\right| \stackrel{P^*}{\longrightarrow} 0,$$

in probability.

Let $\varphi: \mathcal{M}_q^+ \to \mathbb{R}^k$ be a smooth function and let

$$T_n = \frac{1}{\sqrt{\Delta_n}}\left(\varphi\left(\widehat{IV}^n\right) - \varphi(IV_T)\right) \quad \text{and} \quad T_n^* = \frac{1}{\sqrt{\Delta_n}}\left(\varphi\left(\widehat{IV}^{*n}\right) - \varphi\left(\widehat{IV}^n\right)\right).$$

$T_n$ converges stably in law to $W_T^\varphi$ [see Equation (5)], with

$$W_T^\varphi = \sum_{u,v=1}^{q}\frac{\partial\varphi}{\partial M_{uv}}(IV_T)\,W_{T,uv},$$

with $W_T$ defined in (3) and (4). The next result derives from Proposition 4.1 by application of the delta method. In particular, it states that $T_n$ and $T_n^*$ have the same asymptotic distribution.

Corollary 4.1. Under the same conditions as Proposition 4.1, if $W_T^\varphi$ has a continuous distribution on $\mathbb{R}^k$, then

$$\sup_{x\in\mathbb{R}^k}\left|P^*(T_n^* \le x) - P(T_n \le x)\right| \stackrel{P^*}{\longrightarrow} 0,$$

in probability.

Corollary 4.1 establishes the validity of the proposed bootstrap to approximate the asymptotic distribution of any smooth function of $\widehat{IV}^n$. The most practical benefit of this result is that there is no need to estimate any of the multiple nuisance parameters, including $\partial\varphi(IV_T)/\partial M_{uv}$, on which the asymptotic distribution of $T_n$ depends, in order to estimate its quantiles. Bootstrap quantiles obtained from bootstrap replications of $T_n^*$ can serve as asymptotically valid quantiles for $T_n$.

Corollary 4.1 has some immediate applications for inference on the eigenvalues and eigenvectors of $IV_T$. If the eigenvalue structure of $IV_T$ is known to be that displayed in (7) and $\varphi^\lambda$ is the eigenvalue function defined in (8), then, by smoothness of $\varphi^\lambda$, we can claim using (9) that:

$$T_n^{\lambda*} \equiv \frac{1}{\sqrt{\Delta_n}}\left(\varphi^\lambda\left(\widehat{IV}^{*n}\right) - \varphi^\lambda\left(\widehat{IV}^n\right)\right) \stackrel{d^*}{\longrightarrow} W_T^{\varphi^\lambda},$$

in probability. This means that $T_n^{\lambda*}$ provides an asymptotically valid approximation to the distribution of $W_T^{\varphi^\lambda}$ that can be used to carry out inference about any component of $\varphi^\lambda(IV_T)$.

In the same way, if the $i$-th largest eigenvalue of $IV_T$ is simple, then, up to some identifying sign restriction, the function $\gamma_i(A)$, equal to the eigenvector associated to the $i$-th largest eigenvalue of $A$, is smooth in a neighborhood of $IV_T$ and, once again, thanks to (10), we can claim that:

$$\frac{1}{\sqrt{\Delta_n}}\left(\gamma_i\left(\widehat{IV}^{*n}\right) - \gamma_i\left(\widehat{IV}^n\right)\right) \stackrel{d^*}{\longrightarrow} W_T^{\gamma_i},$$

in probability, and, from (10), inference on $\gamma_i(IV_T)$ or on any of its components can be carried out using the bootstrap.

We now turn to bootstrapping the test statistics in Theorems 3.1 and 3.2 and their related applications. A natural way to bootstrap these test statistics would consist in using bootstrap analogues of (13), (14) and (22), with the vector of bootstrap eigenvalues $d^* = \lambda(\widehat{IV}^{*n})$ as input. However, such bootstrap procedures would fail, since they would intrinsically test for the structure of the eigenvalues of $\widehat{IV}^n$, which is different from that of $IV_T$. Indeed, considering the asymptotic distribution of $\widehat{IV}^n$ in (3), conditionally on $\mathcal{F}$, $\widehat{IV}^n$ is a random matrix with any pair of eigenvalues different with probability approaching one. To obtain valid bootstrap test statistics, we rely on an adequate sampling of $\widehat{IV}^n$

and on an analytic approximation of the asymptotic distribution of the test statistics of interest.

As previously, let us write the matrix $\Gamma$ of normalized eigenvectors of $IV_T$ as $\Gamma = (\Gamma_1\,\Gamma_2\cdots\Gamma_r)$, where $\Gamma_k$ is associated to the $q_k$-multiple eigenvalue $\lambda_k$, for $k = 1, \ldots, r$. Let $\Gamma_0$ be equal to $\Gamma_k$, or to $(\Gamma_{k_1}\cdots\Gamma_r)$, or even to $\Gamma$, and let $Q$ be the integer defined such that $\bigcup_{j=k_1}^{r} L_j = \{Q+1, \ldots, q\}$. We observe that all the asymptotic distributions of interest in the previous theorems are functions of

$$\mathrm{tr}\left[U_{T,\Gamma_0}^h\right] \equiv \mathrm{tr}\left[\left(\Gamma_0' W_T \Gamma_0\right)^h\right], \qquad h = 1, 2,$$

where $W_T$ is the asymptotic distribution of $\widehat{IV}^n$, defined in Equation (3).

If $\Gamma$ were known, then, using Corollary 4.1, the distribution of $\Gamma' W_T \Gamma$ could be estimated by that of $\Gamma' S_n^* \Gamma$. However, as already mentioned, $\Gamma$ is unknown and cannot be consistently estimated in general. We have also seen in Section 3 that, because the asymptotic distribution of $U_T$ appears through the trace operator, some estimator of $\Gamma$ is still useful for direct simulation. We have namely used $\hat\Gamma$, the matrix of normalized eigenvectors of $\widehat{IV}^n$ [see Equation (18)]. Let $\hat\Gamma_0$ be defined from $\hat\Gamma$ as $\Gamma_0$ is defined from $\Gamma$. Using Equation (18), we have

$$\hat\Gamma_0 = \Gamma\hat E_0 = \Gamma_0\check E_0 + o_P(1),$$

where $\hat E_0$ is the matrix equal to the collection of columns of $\hat E$ indexed by $L_k$ or $\bigcup_{k=k_1}^{r} L_k$, or is equal to $\hat E$, depending on $\Gamma_0$; and $\check E_0 = \hat E_{kk}$, or a block-diagonal matrix with $\hat E_{kk}$ ($k = k_1, \ldots, r$) on the main diagonal, or $\check E_0 = \hat E$, also depending on $\Gamma_0$. The order of magnitude above is obtained from the properties of $\hat E$ outlined in Equation (18); see Proposition A.1 in Appendix A. This proposition also ensures that $\check E_0'\check E_0 = I + o_P(1)$. Thus,

$$\mathrm{tr}\left[\left(\hat\Gamma_0' S_n^*\hat\Gamma_0\right)^h\right] = \mathrm{tr}\left[\check E_0'\left(\Gamma_0' S_n^*\Gamma_0\right)^h\check E_0\right] + o_{P^*}(1) = \mathrm{tr}\left[\left(\Gamma_0' S_n^*\Gamma_0\right)^h\right] + o_{P^*}(1), \qquad h = 1, 2.$$

in probability; showing that tr

b 00 Sn∗ Γ b0 Γ

b0 S ∗ Γ b Γ 0 n 0

h

h

d∗

→ tr

h

Γ00 WT Γ0

h i

(27)

is an asymptotically valid estimator of the distribution

h tr(UT,Γ ). With this insight, we can now introduce the bootstrap statistics for the tests of interest. 0 Γ0

Let

1 b 0 c n∗ b b 0 S ∗ Γ, b Γ IV Γ − D = Γ n ∆n n c , the vector of sorted eigenvalues where D is the diagonal matrix with diagonal equal to d = λ IV ∗ ∗ ∗ ˆk = 1 P c n . Let λ of IV i∈Lk di , and Ukk and UQ+1:q,Q+1:q be, respectively, the (qk , qk )-submatrix of U qk U∗ = √

at the intersection of the (q1 + · · · + qk−1 + 1)-th through the (q1 + · · · + qk )-th rows and columns and the lower-right (q − Q, q − Q)-submatrix of U ∗ . We consider the following bootstrap test statistics: LRk∗ = ∗ LRk,λ =

T ˆ2 2λ k

2

∗ )− tr(Ukk

1 qk

∗ ))2 (tr(Ukk

T ∗2 ) tr(Ukk 2λ2

(28)

∗ − π · tr (Sn∗ ) , Zn∗ = tr UQ+1:q,Q+1:q

for some π ∈ (0, 1).

Note that tr (Sn∗ ) = tr (U ∗ ). These bootstrap test statistics can be seen as the bootstrap analogues of the first-order asymptotic approximation of the original test statistics. The next result shows that these bootstrap test statistics are asymptotically valid. Theorem 4.1. Under the same conditions as in Proposition 4.1 and letting LRk , LRk,λ be defined as in Theorem 3.1 and Zn as in Theorem 3.2, we have: (a) Under H0 , as in Proposition 3.1(b), P∗

sup |P ∗ (LRk∗ ≤ x) − P (LRk ≤ x)| → 0, x∈R

in probability. (b) Under H0 (λ), as in Proposition 3.1(d), P∗ ∗ sup P ∗ LRk,λ ≤ x − P (LRk,λ ≤ x) → 0, x∈R

in probability. (c) Under H0π , as in (21) and if δQ > δQ+1 , P∗

sup |P ∗ (Zn∗ ≤ x) − P (Zn ≤ x)| → 0, x∈R

20

in probability. This theorem establishes the asymptotic validity of the bootstrap when the specified bootstrap statistics are used. Also, these bootstrap statistics are all bounded in probability even under the alternative so that the bootstrap tests are consistent. Remark 1. It is worth reiterating that by construction, the proposed bootstrap method targets the replication of the first-order asymptotic approximation of the test statistics of interest as opposed to mimicking the original test statistics - which, we know, leads to invalid bootstrap approximations. In that respect, we may not be able to obtain the standard higher order refinement properties for these bootstrap tests as this essentially amounts to a match of the higher-order cumulants of original and bootstrap test statistics. Nevertheless, as illustrated by the simulation results in the next section, the bootstrap approximation displays a satisfactory level of accuracy even for sample sizes as small as 78, which corresponds to 5-minute observations within a 6.5 hour-trading day. Remark 2. The above results carry over to our asymptotic analysis of the correlation matrix. The ∗ and Zn∗ in (28) but using useful bootstrap test statistics are defined similarly to LRk∗ , LRk,λ ∗

Uρ = √

1 b ρ0 c n ∗ b ρ Γ − Dρ Γ G IV ∆n

(instead of U ∗ ), where G is defined as in (23), Dρ is the diagonal matrix of the non increasing b ρ is the orthogonal matrix of eigenvectors of R bn . These bootstrap bn = G IV c n , and Γ eigenvalues of R ˆ k obtained from R bn instead of IV c n. test statistics also use λ Before ending this section, we provide detailed algorithms of the implementation of the bootstrap tests useful in Applications 1 and 2 in Section 3. Bootstrap algorithm for testing H0 (m) against H1 (m) in Application 1. c n as given by (2). 1. Compute IV c n ) and Γ b the vector of sorted eigenvalues and the associated orthogonal 2. Compute d = λ(IV b 0 IV c nΓ b = D. matrix of eigenvectors and let D be the diagonal matrix such that Γ 3. Compute the test statistic: LRm = −2 log `r , where `r is given as in (13) but using di (i = m + 1, . . . , q). 4. Bootstrap approximation of the asymptotic distribution of LRm : (a) Draw n independent copies of ηi such that E(ηi ) = 1 and V ar(ηi ) = 1/2. One possibility is to take ηi = (v1i + v2i )/4 with vi ∼ i.i.d.χ2 (2); and another one is to take: ηi ∼ NID(1, 1/2). (b) Get the bootstrap sample by computing Zi∗ (i = 1, . . . , n) using (26) ∗ c n = Pn Z ∗ and (c) Get IV i=1 i √ 0 n∗ b IV c Γ b−D . U∗ = n Γ 21

∗ is the lower-right (q − m, q − m) block of U ∗ . (d) U22

(e) Get a bootstrap copy of LRm as: ∗ LRm

ˆ m+1 = with λ

1 q−m

Pq

=

i=m+1 di

T ˆ2 2λ m+1

∗ 2 tr[(U22 ) ]−

1 ∗ 2 [tr(U22 )] , q−m

c n. is the average of the q − m smallest eigenvalues of IV

∗ , 5. Repeat Step 4 B times (e.g. B = 399) to get as many bootstrap copies of LRm : LRm,b

b = 1, . . . , B. ∗ 6. Use this bootstrap sample to obtain the (1 − α)-quantile, say lm,1−α , of LRm . ∗ 7. Reject H0 at the level α if LRm > lm,1−α .

Bootstrap algorithm for testing for ratio of ‘unexplained’ quadratic variation in Application 2. 1. Choose Q and π ∈ (0, 1): the number of factors to be tested for carrying at least a proportion π of variation in IVT . 2. Perform Steps 1 and 2 of the previous algorithm. 3. Compute Zn , the test statistic for H0π as given in (22). 4. Bootstrap approximation of the distribution of Z: (a) Same as 4.(a) in the previous algorithm. (b) Same as 4.(b) in the previous algorithm. (c) Same as 4.(c) in the previous algorithm. ∗ , the lower-right (q − Q, q − Q)-submatrix of U ∗ . (d) Obtain U22

(e) Get a bootstrap copy of Zn : ∗ Zn∗ = tr(U22 ) − π · tr(U ∗ ). ∗ , b = 1, . . . , B. 5. Repeat Step 4 B times (e.g. B = 399) to get as many bootstrap copies of Zn : Zn,b

6. Use this bootstrap sample to obtain the (1 − α)-quantile, say c∗1−α , of Zn . 7. Reject H0π at the level α if Zn > c∗1−α .

22

5

Monte Carlo simulations

We conduct a Monte Carlo simulation study to investigate the finite sample performance of the tests proposed in Sections 3 and 4. Our primary focus is on assessing the empirical size and power of the asymptotic test in Proposition 3.1 and the bootstrap-based test in Theorem 4.1 under a variety of data generating processes (DGPs) and for different sample sizes. We consider the continuous time vector process of prices Xt ∈ Rq with the standard factor model dynamics: Xt = Λ0 ft + et ,

(29)

where ft = (f1,t , . . . , fm,t )0 is an m-length vector of factors in Rm (m < q), and et = (e1,t , . . . , eq,t )0 is an Rq -valued vector of uncorrelated idiosyncratic noise process independent of ft . Λ is a (q, m)-matrix of loadings that capture the exposure of Xt to the systematic factors ft . The components of the signal (ft ) and noise (et ) processes are, respectively, pairwise independent semi-martingale processes driven by a Stochastic-Volatility Jump-Diffusion model [hereafter SVJD] with a constant jump intensity; see Pelger (2015). Generically, factors and noise are driven by the following stochastic volatility jump diffusion model: q z z J J2 ¯ z , with J z i.i.d. dzk,t = hzk,t dWk,t + Jk,t dN k,t k,t ∼ N (µz , σz ), q z,h , with dhzk,t =κz (ξz − hzk,t )dt + σz hzk,t dWk,t z,h z i = ρz , , Wk,t hWk,t

(30) (31) (32)

where z ∈ {f, e} represents the factor f and noise e processes, with their components indexed by ¯ z , for z ∈ {e, f }, are a set of independent Poisson k ∈ {1, . . . , m} and k ∈ {1, . . . , q}, respectively. N k,t

processes with intensity parameters νz . Finally, h·, ·i denotes the cross variation of two Brownian motions. Under the considered DGPs, a mild assumption that jumps in X from the components of ft do not offset those from the components of et ensures that the continuous part of X, X c has the factor decomposition: Xtc = Λftc + ect

and IVT ≡ [X c , X c ]T = Λ[f c , f c ]T Λ0 + ξT2 Iq .

We will test this factor structure by testing whether the q − m smallest eigenvalues of IVT are equal [see Application 1 in Section 3]. As previously indicated, these tests have power against the alternative that less than q − m smallest eigenvalues are equal meaning that there are in fact more than m factors. The choice of parameters is given in Table 1 and is driven by an attempt to replicate reasonably well realistic asset price dynamics for both futures contracts and equities, which we will look at in our empirical analysis and follows closely those chosen in Pelger (2015). However, we do vary the specification to attempt to uncover the impact of adding more complexity, such as jumps, to the DGPs of both the factor (signal) and the idiosyncratic noise. Table 1 reports the numerical values of parameters for different combinations of DGPs of signal-noise using L´evy process (L´evy), Stochastic Volatility (SV) model, and Stochastic Volatility Jump diffusion (SVJD) model. In particular, we

23

consider the following combinations for signal-noise: L´evy-L´evy, SV-L´evy, SV-SV, SV-SVJD, SVJDL´evy, SVJD-SV, and SVJD-SVJD.1 At each simulation, entries of Λ are drawn independently from the standard normal distribution. Our simulation experiments have two parts. First, for a standard asset pricing set-up (three factors with 20 and 100 assets) we perform an analysis of the size of the tests, when the data is generated under the null and the test is looking specifically at that null, that is three factors versus more than three factors, implied by the eigenvalue structure. Second, we simulate the power curves of these tests by testing various number of factors in the hypothetical factor representation including the true number of factors simulated. Historically, in the factor analysis literature, most proposed tests tend to overestimate the number of factors leading to a lack of parsimony in factor selection when there is either a) a very short sample or b) the data is subject to deviations from i.i.d. normal settings. We will illustrate these effects and then demonstrate that our bootstrap test effectively corrects this phenomenon, even in relatively short samples. In all of our experiments, tests are performed at the level α = 5%. Throughout, rejection rates are based on 10,000 replications and bootstrap critical values are based on 399 bootstrap samples.

5.1

Experiment 1: Rejection rate under the null

In our first simulation study, we fix the number of factors in the simulated data to be equal to three - a popular choice in both equity and term structure models. This is a case where the ratio of the number of factors to the total number of assets, say m/q, is very small: 3/20 and 3/100 for 20 and 100 assets, respectively. The latter is close to the scenario in Chamberlain and Rothschild (1983), which relies on relative ratio of the candidate factor eigenvalues to the trailing eigenvalues to converge to some limit. We then vary the number of available observations (sampling frequency and time horizon) and the data generating process according to the available options in Table 1. Tables 2 and 3 present Monte-Carlo results for each of the DGPs under consideration when the time horizon T is fixed to one day and the sampling frequency changes from 5 minutes to 1/2 minute and when the sampling frequency is fixed to 5 minutes and the time horizon changes from one day (approximately a sample size of 80 observations) to one month (approximately a sample size of 1500 observations), respectively. When the time horizon T is fixed, on the one hand, the upper panel of Table 2 provides the rejection rates for the asymptotic (chi-squared) test. The results in this panel show that the asymptotic test is generally oversized. The best performance of this test occurs, as expected, when factors and noise are independent L´evy processes corresponding to a L´evy price process. In this case, for q = 20, the rejection rates vary from 15.23% for 5-minute data (78 observations) to around 6% for 1-minute and 0.5-minute data (390 and 780 observations, respectively). However, as the cross section dimension becomes large, the rejection rate also increases substantially and is as large as 19.30% and 17.00% for 1

¹ We do have some differences in specification relative to the parameter selection in Pelger. First, the size of jumps in the factor and idiosyncratic noise is set to 5% (as opposed to 50%), as preliminary analysis shows that our empirical data does not contain such large deviations. Second, we tune the jump intensity so that, on average, we have one jump within the time frame of analysis; this ensures that we observe sufficient jumps to discriminate between the SV and SVJD models.


Table 1: Summary of simulation experiments and parameters.

Parameter          Lévy-Lévy   SV-Lévy   SV-SV   SV-SVJD   SVJD-Lévy   SVJD-SV   SVJD-SVJD
Factor parameters
  κ_f              0.25²       5         5       5         5           5         5
  ξ_f              -           0.1       0.1     0.1       0.1         0.1       0.1
  σ_f              -           0.5       0.5     0.5       0.5         0.5       0.5
  µ_f^J            -           -         -       -         -0.1        -0.1      -0.1
  σ_f^J            -           -         -       -         0.05        0.05      0.05
  ν_f              -           -         -       -         1/T         1/T       1/T
  ρ_f              -           -0.8      -0.8    -0.8      -0.8        -0.8      -0.8
Noise parameters
  κ_e              0.05²       0.05²     5       5         0.05²       5         5
  ξ_e              -           -         0.1     0.1       -           0.1       0.1
  σ_e              -           -         0.5     0.5       -           0.5       0.5
  µ_e^J            -           -         -       0         -           -         0
  σ_e^J            -           -         -       0.05      -           -         0.05
  ν_e              -           -         -       1/T       -           -         1/T
  ρ_e              -           -         -0.3    -0.3      -           -0.3      -0.3

Simulation Experiment 1 (Test under the null): m = 3 and q = 20, 100. Simulation Experiment 2 (Power curves): m = 6, q = 20 and m = 30, q = 100.

Note: For parsimony, we presume that the parameters for each factor and each noise process are identical, hence θ_{i,f} = θ_f for all i ∈ {1, . . . , m} and θ_{i,e} = θ_e for all i ∈ {1, . . . , q}, for θ ∈ {κ, ξ, σ, µ^J, σ^J, ν, ρ}. The stochastic volatility model is simulated using the discretization procedure described in Andersen, Jäckel, and Kahl (2010), with the jump component simulated using the approach of Lord, Koekkoek, and Dijk (2010). For the Lévy process, we set h_t^z = κ_z for all t ∈ (0, ∞). All values are in annualized equivalents. The time frame T represents the total business time within the market; hence, the total hours of operation equal five minutes times the number of observations: for instance, for daily data, 6.5 hours of trading is 390 minutes, or 78 five-minute blocks. Our empirical analysis uses weekly and monthly data, hence 390 five-minute blocks for a week (5 times 6.5 hours of trading for the NYSE and Nasdaq data) and approximately 1,700 five-minute blocks for a month. Futures on the Chicago Mercantile Exchange trade for 23 hours and 15 minutes, but active trading usually occupies an eight-hour block. To ensure we observe some jumps in the cross section, we set the intensity to 1/T, where T is the trading time in fractions of a year. For our first simulation experiment we use n = 80, 160, 250, 500 when q = 20 and n = 160, 250, 500, 1500 when q = 100; we then fix n = 500 for our second simulation experiment for both q = 20 and q = 100.

1-minute and 0.5-minute data, respectively, with q = 100. In the other DGPs, we can see that the test is oversized for both cross section dimensions q = 20 and 100 and all sampling frequencies, especially at lower sampling frequency (small number of observations) and large cross section. The empirical size reaches 100% when we add jumps in either or both processes of factors and noise. On the other hand, the lower panel of Table 2 reports the rejection rates for the bootstrap test. We can see that this test is oversized at lower sampling frequency, but has a far lower rejection rate than the asymptotic test. The bootstrap test is generally correctly sized for moderate or high sampling frequencies such as 1-minute and 0.5-minute. The real advantage of the bootstrap test becomes clearer as we increase the complexity of the DGP. As noted at the end of Section 3, the distribution of the test statistic under a more general DGP is non standard (not chi-squared) and difficult to simulate. The degree of deviation between asymptotic and bootstrap-based tests is apparent in the comparison 25

Table 2: Empirical size (in %) of asymptotic (chi-squared) and bootstrap-based tests, with three factors and different sampling time scales.

Asymptotic (chi-squared) test
Time Scale     Lévy-Lévy   SV-Lévy   SV-SV   SV-SVJD   SVJD-Lévy   SVJD-SV   SVJD-SVJD
q = 20
  5-minute     15.23       45.21     45.23   65.44     42.68       66.56     57.55
  1-minute      6.25       31.82     100.0   100.0     88.12       100.0     100.0
  0.5-minute    6.03       30.04     97.12   100.0     85.90       100.0     100.0
q = 100
  1-minute     19.30       31.82     100.0   100.0     88.12       100.0     100.0
  0.5-minute   17.00       30.04     97.12   100.0     85.90       100.0     100.0

Bootstrap test
Time Scale     Lévy-Lévy   SV-Lévy   SV-SV   SV-SVJD   SVJD-Lévy   SVJD-SV   SVJD-SVJD
q = 20
  5-minute      8.66       10.12     11.52   18.06      8.91       19.44     21.90
  1-minute      4.70        4.45      4.49    5.61      4.12        4.87      8.91
  0.5-minute    4.67        4.99      5.48    6.50      5.01        6.02      6.77
q = 100
  1-minute      5.84        6.79     12.17    8.02     10.24       15.02     17.88
  0.5-minute    4.10        4.35      4.89    4.07      4.16        4.31      4.69

Note: This table reports the rejection rates at the 5% level of the standard asymptotic (chi-squared) test (see Proposition 3.1) and the bootstrap-based test (see Theorem 4.1). Simulation results fix T to one day and adjust the time scale from an observation every 5 minutes (78 observations per day) to 1 minute (390 observations per day) and 0.5 minute (780 observations per day). The top two sections report the rejection rates of the standard chi-squared likelihood ratio test for cross sections of 20 and 100 simulated assets, respectively. The bottom two sections report the rejection rates using the bootstrap to generate the critical value (one-sided, at 5%). The simulation uses 10,000 Monte Carlo replications and bootstrap critical values are based on 399 bootstrap samples.

of the upper and lower panels of Table 2. Here, we see evidence that the size of the asymptotic test will not converge to 5% even when the number of available observations increases substantially; the same holds for Table 3. For cases where jumps and stochastic volatility are present in both the noise and the factors, the rejection rate for the asymptotic test approaches unity; hence this test will overstate the number of factors. By contrast, even for short sample sizes, the bootstrap test is correctly sized or very close to it, and the trend suggests that with a greater number of observations (higher sampling frequencies) the correct size will be attained. Regarding the case where the time horizon T changes and the sampling frequency is fixed to 5 minutes, Table 3 reports the rejection rates for both the asymptotic (upper panel) and bootstrap (lower panel) tests. Similarly to the results in Table 2, the asymptotic test performs best in the case of a Lévy price process, as its rejection rate lies between 5.33% and 4.92% for n = 160 and 500, respectively, with q = 20. As the cross section dimension increases to q = 100, the rejection rate of this test is, as expected, worse at n = 160 (55.37%), but decreases as n gets larger, settling at 5.79% and 5.77% for n = 500 and n = 1500, respectively. Across all the other DGPs, the asymptotic test is oversized even for large n, and more so as the cross section dimension increases. In contrast, the bootstrap-based test generally controls size for all DGPs and cross section dimen-


Table 3: Empirical size (in %) of asymptotic (chi-squared) and bootstrap-based tests, with three factors and for an increasing T.

Asymptotic (chi-squared) test
n        Lévy-Lévy   SV-Lévy   SV-SV   SV-SVJD   SVJD-Lévy   SVJD-SV   SVJD-SVJD
q = 20
  160     5.33        5.20      6.54   20.17     18.93       18.94     27.77
  250     5.12        5.32      7.46   17.34     13.29       15.34     22.91
  500     4.92        4.98      7.27   12.49     10.88       14.35     15.96
q = 100
  160    55.37       79.39     79.88   82.17     58.12       85.91     88.87
  250     9.82       19.62     29.72   29.83     31.25       32.37     41.87
  500     5.79        7.14     27.11   25.39     17.45       26.18     27.34
  1500    5.77        9.81     18.29   17.15     15.11       19.15     23.51

Bootstrap test
n        Lévy-Lévy   SV-Lévy   SV-SV   SV-SVJD   SVJD-Lévy   SVJD-SV   SVJD-SVJD
q = 20
  160     5.06        6.69      6.87    7.09      6.98        7.49      7.85
  250     4.99        5.29      6.00    5.20      5.10        5.44      5.24
  500     5.08        5.86      5.53    6.48      5.63        5.21      5.62
q = 100
  160     9.94       10.15     10.39   10.27     10.78       10.46     11.86
  250     7.17        6.84      7.62    7.15      7.44        8.69      8.83
  500     5.68        6.20      6.66    5.73      5.14        6.51      5.73
  1500    5.61        5.93      5.01    5.91      5.12        5.92      5.41

Note: This table reports the rejection rates at the 5% level of the standard asymptotic (chi-squared) test (see Proposition 3.1) and the bootstrap-based test (see Theorem 4.1) over a variety of sample sizes for simulated 5-minute data under the null (three factors versus an alternative of more than three factors). The time scale is kept at a presumed 5 minutes and the time horizon T is increased (n = 80, 160, 250, 500 and 1500 observations correspond approximately to T = 1, 2, 3, 6 and 18 trading days, respectively). The simulation is conducted under the same conditions as the simulation in Table 2.

sions. For both q = 20 and q = 100, the test over-rejects slightly for n = 160, with rejection rates reaching a maximum of 7.85% (q = 20) and 11.86% (q = 100) when the factor and noise processes are both SVJD (stochastic volatility with jumps). The over-rejection seems to disappear as n increases, with rejection rates close to the nominal 5% in samples of moderate to large size.

5.2 Experiment 2: Power curves

In the second simulation study, we consider a slightly different set of experimental conditions. In particular, we increase the number of factors in the DGPs for the small (q = 20) and large (q = 100) simulated cross sections to m = 6 and m = 30, respectively. We do not suggest that this is a typical number of factors for a model, but it provides a feasible upper bound. We then test, from large to small, the number of identical smallest eigenvalues and hence infer the eigenvalue structure and, by construction, the number of factors. In each case, the number of observations is fixed at n = 500. For simplicity of exposition, we select a subset of the DGPs from the first simulation experiment to illustrate the performance of the asymptotic and bootstrap tests.
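To make the sequential testing idea concrete, the sketch below pairs a classical likelihood-ratio-type statistic for the equality of the q − m₀ smallest eigenvalues (the chi-squared form valid under i.i.d. normality, in the spirit of Anderson's classical test; the paper's statistic and, crucially, its bootstrap calibration differ) with an incremental search that stops at the first non-rejection.

```python
import numpy as np
from scipy.stats import chi2

def tail_equality_stat(eigvals, m, n):
    """LR-type statistic for H0: the q - m smallest eigenvalues are equal.

    Classical chi-squared form under i.i.d. normality; illustrative only,
    since the paper's statistic and bootstrap critical values differ.
    """
    tail = np.sort(eigvals)[: len(eigvals) - m]   # the q - m smallest
    k = len(tail)
    stat = n * (k * np.log(tail.mean()) - np.log(tail).sum())
    df = (k + 2) * (k - 1) // 2                   # degrees of freedom
    return stat, df

def select_num_factors(eigvals, n, alpha=0.05):
    """Increment the candidate number of factors m0 until H0 is not rejected."""
    q = len(eigvals)
    for m0 in range(q - 1):
        stat, df = tail_equality_stat(eigvals, m0, n)
        if stat <= chi2.ppf(1 - alpha, df):
            return m0
    return q - 1

# Hypothetical spectrum: three dominant eigenvalues and a flat unit tail.
eigvals = np.array([10.0, 5.0, 3.0] + [1.0] * 17)
print(select_num_factors(eigvals, n=500))  # → 3
```

With a perfectly flat tail the statistic is exactly zero, so the search stops at the true value of three factors.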

Figures 1 to 3 present the rejection rates (power functions) of the asymptotic versus bootstrap tests for the eigenvalue structure for the small (left) and large (right) cross sections. In each figure, the solid line presents the power curve for the asymptotic test and the dot-dashed line the power curve for the bootstrap test. The vertical dotted line demarcates the actual eigenvalue structure implicit in the parameterization of the DGP, whilst the horizontal line demarcates 5%, the anticipated rejection rate under the null. Note that the null holds only at the point denoted by the vertical line; hence a correctly sized test should pass through the intersection of the vertical and horizontal lines.

Figure 1 provides the base case (Lévy-Lévy), where both the factors and noise are Lévy processes. This yields the anticipated baseline result that both the asymptotic and bootstrap tests are correctly sized for the small cross section. However, for the large cross section and when the number of factors in the DGP is large (m = 30), the asymptotic test performs poorly and is oversized, whereas the bootstrap test remains very close to the nominal level. Note also that the bootstrap is more discriminatory for the large cross section, as its power curve rises more sharply (power converges to one) than when using the chi-squared critical value from the asymptotic test. As in the first simulation experiment, which only examined the null with a very small number of factors, the poor performance of the asymptotic test for large cross sections with large numbers of factors is well documented, even for normally distributed data. The bootstrap, however, performs as anticipated in providing more power to select the most parsimonious model.
For a relatively large number of factors, Figure 2 illustrates the power functions of the asymptotic and bootstrap tests for the most challenging DGP (SVJD-SVJD), in which the underlying process exhibits both stochastic volatility and jumps in both the factors and the idiosyncratic noise. This is clearly the case for which the asymptotic (chi-squared) test is not designed, and, as anticipated, it is substantially oversized for both the small and large cross section cases. The bootstrap test, however, performs in the anticipated manner: it is still correctly sized and has very good power to discriminate in both cases. Finally, for comparison, Figure 3 presents the power functions of the asymptotic and bootstrap tests when the data is generated from the DGP labelled SVJD-Lévy, in which only the factor structure exhibits stochastic volatility and jumps, while the noise is generated by i.i.d. normal increments. The results are materially similar to those in Figure 2, illustrating the theoretical observation made in Section 3 that a deviation from normality in either the factor structure or the idiosyncratic noise alone is enough to force the test statistic substantially away from being chi-squared, thereby invalidating the asymptotic test. We have also examined the power functions of both the asymptotic and bootstrap-based tests under the other combinations of data generating processes for signal-noise: SVJD-SV, SV-SVJD, SV-SV, and SV-Lévy. The results (not reported and available upon request) are quite similar to those reported in Figures 1-3.


Figure 1: Size/Power comparison for the bootstrap versus the chi-squared asymptotic tests for H0 : m = m0. The two panels present the low and high dimensional cases with q = 20, m∗ = 6 and q = 100, m∗ = 30 simulated assets, respectively. The DGPs correspond to the Lévy-Lévy case. See Table 1 for the parameter specification. The black solid line represents the standard asymptotic chi-squared test (see Proposition 3.1(c)) and the dashed line corresponds to the bootstrap test specified in Section 4. The vertical dot-dashed line denotes the dimension (q) minus the number of factors (m∗). The horizontal line represents the 5% rejection rate under the null hypothesis. The number of observations is n = 500.

Figure 2: Size/Power Comparison for Bootstrap versus the chi-squared asymptotic tests. The DGPs correspond to the SVJD-SVJD case. For more details see description of Figure 1.

6 Empirical applications

We outline two empirical applications that illustrate two different facets of the analysis that is feasible within our framework. In the first, we examine relatively low-dimensional data on the term structure of futures contracts, where the individual assets are the prices of futures contracts with particular maturities. In this application, we use our bootstrap test to build a hedging strategy for spot oil using futures contracts. In the second, we analyze relatively high-dimensional data on equity prices. In particular, we use our ratio test of ‘unexplained’ quadratic variation to illustrate the usefulness of PCA from high-frequency data (five-minute sampling) for generating factors at a comparatively low frequency (monthly returns), and embed them in a standard two-pass Fama-MacBeth regression framework.


Figure 3: Size/Power Comparison for Bootstrap versus the chi-squared asymptotic tests. The DGPs correspond to the SVJD-L´evy case. For more details see description of Figure 1.

6.1 Optimal hedging ratios using PCA from Crude Oil Futures

The goal of this empirical application is to build a hedging strategy for spot Crude Oil using futures contracts. In other words, for each $1 invested in the spot, we would like to know how much to invest in futures contracts to minimize the risk of exposure. To answer this question, we estimate the optimal hedge ratio using factors from the term structure of Crude Oil Futures. Typically, in a hedging problem there is a log spot price $p_{s,t}$, which is possibly observed infrequently (for instance, a periodic LIBOR fixing or a physical delivery price of crude oil), and a vector of hedging futures contracts $p_{f,t}$, which incompletely offsets the variation in the spot price by forming a portfolio $\tilde{\Pi}_t = p_{s,t} - \delta_{\dagger}' p_{f,t}$. Various hedging strategies can be postulated, but most can be approximated as follows:

$$\min_{\delta_{\dagger}} \sum_{n=1}^{N} E_t\big[(\Delta p_{s,n} - \delta_{\dagger}' \Delta p_{f,n})^2\big], \qquad (33)$$

where $\delta_{\dagger}$ is a vector of hedging ratios and $n \in \{1, \ldots, N\}$ indexes some discrete number of steps within the planning horizon. That is, we minimize the anticipated quadratic variation of a portfolio of spot and hedging instruments over $N$ time periods. When the cross section of hedging instruments is formed of a small number of factors $f_t$, the portfolio is formed by $\tilde{\Pi}_t = p_{s,t} - \delta' f_t$, and if the cross section of hedging contracts is a linear combination of the underlying factors, then $f_t = \Lambda p_{f,t}$ and $\delta_{\dagger} = \Lambda \delta$.

6.1.1 Methodology

We first extract all best bid and ask prices for WTI futures contracts traded electronically on the CME/NYMEX platform. For each day, we observe at ultra-high frequency 120 Crude Oil Futures prices, one for each delivery month for up to ten years, although trading is typically infrequent beyond approximately two years. Figure 4 plots the evolution of WTI futures prices from the first of January 1996 to the first of January 2017. For each day, we plot the end-of-day traded price of each contract against its delivery date. The nearest delivery future is plotted on the day of trading to proxy the spot price. Figure 4 shows that active trading in longer maturity futures has grown steadily since 1996, when hedging was constrained to maturities within five months. After 2004, we regularly see futures trading out to just under three to five years, which is between 28 and 53 actively traded contracts. Thus, each

Figure 4: This figure illustrates the end-of-day WTI Crude Futures Prices from the nearest delivery [black unbroken line], which proxies the spot price, to the maximum maturity traded.

day of the post-2004 period has 28 to a maximum of 53 concurrent, continuously traded time series. Formally, let $p_f(t, T) = \log(P_f(t, T))$ be the log futures price at time $t$ for a contract delivering a pre-set quantity of an asset at time $T$ for price $P_f(t, T)$. Let $\mathbf{T} = [T_n]_{n=1}^{q}$ be a vector of delivery dates such that $t < T_1 < T_2 < \cdots < T_q$, where $T_q$ is the longest maturity future. Typically we assume that $T_1 - t \to 0$, that is, the nearest delivery future proxies the spot price $p_s(t)$. The collection of futures prices at time $t$ and for all $q$ maturities can be compactly written using the vector $P_{f,t} = [p_f(t, T_1), \ldots, p_f(t, T_q)]'$. Futures prices are typically modelled using a set of $m$ uncorrelated driving factors $f_t \in \mathbb{R}^m$, via the relation

$$P_{f,t} = \Lambda f_t + e_t, \qquad (34)$$

where $\Lambda$ is a matrix of factor loadings that captures the exposure of futures prices $P_{f,t}$ to the systematic factors $f_t$, and $e_t \in \mathbb{R}^q$ is the process of idiosyncratic components of prices, independent of $f_t$. The noise vector $e_t$ is assumed to have semimartingale dynamics as in Equation (1), with a diagonal integrated variance matrix with equal diagonal elements. The vector of factors $f_t$ is also assumed to be an Itô semimartingale. We now define the returns on the futures prices. Let $\Delta_n$ be the discrete time increment as defined in Section 2. We denote by $r^i_{\Delta_n}(T_j) = p_f(i\Delta_n, T_j) - p_f((i-1)\Delta_n, T_j)$ the log-return of the futures price at maturity $T_j$, for $j = 1, \ldots, q$, over the period $((i-1)\Delta_n, i\Delta_n]$, for $i = 1, \ldots, \lfloor T/\Delta_n \rfloor$. The collection of futures returns across all maturities, except the first maturity $T_1$, can be


Figure 5: This figure illustrates a complex [left panel] and a simple [right panel] term structure for Crude Oil Futures prices at the end of day on June 15, 2011 and August 9, 2016, respectively.

compactly written using the vector $r_i = [r^i_{\Delta_n}(T_2), \ldots, r^i_{\Delta_n}(T_q)]'$. The vector $r_i$ will be used to compute the estimated integrated covariance matrix of futures returns as defined in Equation (2), with $\Delta^i_n X$ replaced by the return $r_i$ across all $q$ maturities. The factors $f_t$ are expected to summarize the variations in the term structure of futures prices; thus, they will be extracted using the estimated integrated covariance matrix $\widehat{IV}_n$. Let $\tilde{\Gamma}$ be the $(m, q)$-matrix formed by the columns of eigenvectors associated with the first $m$ largest eigenvalues of $\widehat{IV}_n$, and denote by $\tilde{f}_i$ the $m$-vector of extracted factors of the following form:

$$\tilde{f}_i = \tilde{\Gamma} \tilde{r}_i, \qquad (35)$$

where $\tilde{r}_i$ is the vector of returns $r_i$ truncated for jumps; $\tilde{f}_i$ represents the $m$-vector of orthogonal principal components that we extract from the estimated integrated covariance matrix $\widehat{IV}_n$. Next, our main interest is to construct minimum variance hedges of the following form:

$$\tilde{\beta}^* = \arg\min_{\tilde{\beta}} \sum_{i=1}^{\lfloor T/\Delta_n \rfloor} \big(r^i_{\Delta_n,-}(T_1) - \tilde{\beta}' \tilde{f}_{i,-}\big)^2, \qquad (36)$$

where $r^i_{\Delta_n,-}(T_1)$ is the return on the futures price at the short maturity $T_1$, which proxies the return on the spot price. The subscript "$-$" in $r^i_{\Delta_n,-}(T_1)$ and $\tilde{f}_{i,-}$ indicates that the returns $r^i_{\Delta_n}(T_1)$ and factors $\tilde{f}_i$ are computed using data on the previous period's prices; we then roll the optimal loadings $\tilde{\beta}^*$ and weighting matrix $\tilde{\Gamma}$ forward over the following period. In our application we use weekly prices, hence $r^i_{\Delta_n,-}(T_1)$ and $\tilde{f}_{i,-}$ are the returns and factors computed using the previous week's prices. After setting $r_{1,-} = [r_{1,-}(T_1), \ldots, r_{\lfloor T/\Delta_n \rfloor,-}(T_1)]'$ to be a $\lfloor T/\Delta_n \rfloor$-vector of futures returns at the short maturity and $\tilde{f}_- = [\tilde{f}_{1,-}, \ldots, \tilde{f}_{\lfloor T/\Delta_n \rfloor,-}]'$ to be a $(\lfloor T/\Delta_n \rfloor, m)$-matrix of factors, the minimization problem in (36) implies that the loadings $\tilde{\beta}$ can be computed using the Least Squares regression

$$\tilde{\beta}^* = \big(\tilde{f}_-' \tilde{f}_-\big)^{-1} \tilde{f}_-' r_{1,-}. \qquad (37)$$
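Putting (35)-(37) together, the factor extraction and least-squares hedge can be sketched as follows; the returns below are synthetic placeholders for the truncated five-minute futures returns, and the dimensions (q = 10 maturities, m = 3 factors) are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical previous-week 5-minute log-returns: q = 10 futures maturities,
# n = 390 intraday observations.
n, q, m = 390, 10, 3
r = rng.standard_normal((n, q)) * 1e-3

r1 = r[:, 0]            # shortest maturity, proxies the spot return
rq = r[:, 1:]           # remaining maturities, used for the PCA

# Realized (integrated) covariance estimate and its leading eigenvectors.
iv_hat = rq.T @ rq
eigvals, eigvecs = np.linalg.eigh(iv_hat)      # eigenvalues in ascending order
gamma = eigvecs[:, ::-1][:, :m].T              # (m, q-1): top-m eigenvectors

f = rq @ gamma.T                               # extracted factors, eq. (35)

# Least-squares hedge loadings, eq. (37).
beta_star, *_ = np.linalg.lstsq(f, r1, rcond=None)

# Hedging effectiveness via the coefficient of determination, eq. (38).
resid = r1 - f @ beta_star
r2 = 1 - resid @ resid / (r1 @ r1)
```

Because the residuals of a least-squares fit are orthogonal to the factors, the resulting R² always lies between zero and one.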

The previous period's discrete futures returns $r_{1,-}$ can be of arbitrary length relative to $[0, T]$; for simplicity of exposition, however, we keep this to weekly return blocks and set $\Delta_n$ to a five-minute return. Finally, the reduction in variance, and hence the effectiveness of the hedge from the factors, can be ascertained over a given sample period via the following coefficient of determination:

$$R^2 = 1 - \frac{(r_1 - \tilde{f}\tilde{\beta}^*)'(r_1 - \tilde{f}\tilde{\beta}^*)}{r_1' r_1}, \qquad (38)$$

where $r_1 = [r_1(T_1), \ldots, r_{\lfloor T/\Delta_n \rfloor}(T_1)]'$ and $\tilde{f} = [\tilde{f}_1, \ldots, \tilde{f}_{\lfloor T/\Delta_n \rfloor}]'$. Hence, the usefulness of the PCA approach is that it reduces the hedging problem to a simple portfolio of calendar spreads, simultaneously holding different maturity contracts, that can be kept constant over a given period $[0, T]$.

6.1.2 Results

The empirical results from employing our bootstrap methodology on WTI Light Crude futures are presented in Figures 5 to 7. Figure 5 reports the term structure of Crude Oil Futures prices on two different days: June 15, 2011 and August 9, 2016. The former is a day on which the world economy was still affected by the financial crisis of 2007-2008, whereas the latter belongs to a period in which the world economy was recovering. The figure indicates that the number of factors might not be constant over time. The left panel of Figure 5 suggests that many factors are likely needed to approximate the term structure of futures prices during the financial crisis, whereas the right panel indicates that possibly only a few factors are needed to summarize the term structure when the economy is stable. Consequently, the number of factors has to be selected dynamically at each point of the sample, hence the result in Figure 6. Figure 6 shows the number of factors selected every week using the bootstrap test developed in Section 4. This test is applied in a sequential procedure, incrementing the null of m∗ factors from 0 until the null is not rejected. We can see that this number is time-varying and has increased since the financial crisis, especially during the European sovereign debt crisis in 2009. This might be explained by the fact that the term structure of Crude Oil Futures prices becomes noisy during financial turmoil, which might affect the goodness of fit of the Least Squares regression in Equation (36) and, consequently, the performance of the hedging strategy. Moreover, since the number of factors increased dramatically during the 2009 period, the factor model under consideration might be misspecified for those periods, and the bootstrap-based ratio test of Section 4 can be used instead to select a smaller number of factors.
Finally, Figure 7 plots the week-on-week coefficient of determination R² of the out-of-sample hedging strategy in (36). It compares three different approaches: (i) weekly rebalanced dynamic factors, (ii) weekly rebalanced static factors, and (iii) naive rebalancing. The first approach (weekly rebalanced dynamic factors) consists of using our bootstrap-based test to select the number of factors dynamically from the previous week's prices, and then computing $\tilde{\beta}^*$ in (37) using the next week's spot price. The second approach (weekly rebalanced static factors) is popular with practitioners. It consists of extracting the same number of factors (three) every week, using past weekly spot and futures prices to compute $\tilde{\beta}^*$ in (37). The final approach (naive rebalancing) does not use factor analysis; instead, it

Figure 6: The time evolution of the number of factors selected every week from the term structure of Crude Oil Futures prices using the bootstrap-based test in Proposition 4.1. The sample runs from the first of January 1996 to the first of January 2017.

computes the optimal hedge ratio by regressing the nearest spot price on the futures price at maturity T1 (one-month maturity). The results in Figure 7 show that the approach leading to the highest R² is the one where our bootstrap-based test is used to select the number of factors, followed by the weekly rebalanced static factors approach, and then naive rebalancing. Related to the discussion in the previous paragraphs, the R² decreased after the financial crisis of 2007-2008, especially during the European sovereign debt crisis in 2009. This can again be explained by the fact that the term structure of futures prices is noisy during financial turmoil, so the factor structure does not capture the market uncertainty needed to better predict the spot price and minimize the risk of exposure.

6.2 Asset pricing using PCA from S&P 500 stocks

In this second application, we use a sequential application of our ratio test of ‘unexplained’ quadratic variation to select the number of factors from large cross-sectional data on stock prices. These factors are then used to explain an even larger number of individual stock prices. Thereafter, we test the statistical significance of the pairwise correlations between the pricing errors, i.e., the residuals left after regressing the returns of these stocks on our factors. Theoretically, if these factors provide the common information for the stocks, we expect the pairwise correlations to equal zero. We also compare the performance of our methodology with several classical asset pricing models, such as the CAPM, the Fama and French (1993) three-factor (FF3) model, the liquidity factor model of

Figure 7: Weekly R² for a nearest delivery hedging strategy. This figure illustrates the week-on-week R² of the out-of-sample hedging strategy. It reports the weekly rebalanced dynamic factors, weekly rebalanced static factors, and naive rebalancing. The sample runs from the first of January 1996 to the first of January 2017. The objective is to hedge the nearest delivery future (excluded from the PCA estimation) using factors built from the other tenors. The weekly rebalanced dynamic strategy uses our PCA test to adjust the number of factors; hence the calendar spread of futures in each portfolio changes weekly. Static factors simply use the first three factors from the previous quarter's end-of-day returns. The January 1, 1996 to May 1, 2003 period is used as a ‘burn-in’ period for this model, hence the truncation. Naïve rebalancing only uses the next shortest maturity future to hedge the nearest delivery.

Pástor and Stambaugh (2003), and a model based on eleven sector portfolios formed from the stocks associated with Standard Industrial Classification codes. The best empirical asset pricing model is the one that leads to insignificant pairwise correlations. Thus, the objective of this empirical exercise is to illustrate the usefulness of high-frequency factor analysis in a low-frequency cross-sectional asset pricing setting.

6.2.1 Methodology

We use two different data sets for this analysis: (i) the historical record of tick-by-tick data for 564 members of the S&P 500 between January 1, 1996 and December 31, 2014, from the Thomson Reuters Tick History Time and Sales files, and (ii) for the same time period, the monthly CRSP returns data file for US stocks, covering 20,011 firms. The tick-by-tick prices (first dataset) are used to extract and select the number of factors using our statistical procedure. These factors are then used to explain the monthly CRSP returns (second dataset). We next compare the performance of this approach to a set of standard asset pricing models using simple pairwise sphericity tests. Our analysis is performed as follows. For the tick-by-tick data, we remove: (i) all zero prices and instant reversions from the bid and ask series; (ii) any records where the standing bid price is higher than the standing ask price; and (iii) any records where the bid/ask price is more than 500% different

Figure 8: Comparison of the monthly cumulative return of the first principal component, normalized such that the sum of the eigenvector is unity and sampled from high-frequency data at the end of each month, with the value-weighted US stock market index from CRSP.

from the daily median bid/ask. We next use the bid-ask tick series to compute the tick-by-tick mid price and record the tick times. We block each firm's mid prices into calendar months from January 1996 to December 2014. For each month, there is a set of price changes for the available stocks in the S&P 500 cross section. Thereafter, for each firm we compute the number of informative price changes, that is, the number of ticks with a price different from the previous tick. For each month, we rank the stocks by informative price updates. Following Aït-Sahalia and Xiu (2017b), we take the 100 stocks with the largest number of informative price changes for a particular month, use a “previous nearest neighbour” interpolation to construct a five-minute business time grid, place the selected stock price data into a one-month by 100-stock block, and compute the returns. After cleaning the tick-by-tick data, we compute the estimated integrated covariance matrix (2) and record its eigenvalues, sorted from largest to smallest, together with their eigenvectors. We next apply the ratio test proposed in Section 4 in a sequential procedure, setting the required proportion of explained cross-sectional quadratic variation to π = 95% of the total variation, and record the suggested number of principal components contained in the 100 assets. We then run our bootstrap-based test for the ratio of ‘unexplained’ quadratic variation over the entire set of months in the data set (227 calendar months) and find the maximum and minimum number of principal components indicated by the test. Thereafter, we construct the component portfolios using as weights the eigenvectors corresponding to the highest number indicated by the ratio test over the sample, and we record the end-of-month value of the component portfolios in a vector ft.
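The “previous nearest neighbour” step can be sketched as follows: for each point of a regular five-minute grid, take the last tick price observed at or before that time. The tick times and prices below are hypothetical.

```python
import numpy as np

def previous_tick(tick_times, tick_prices, grid):
    """Previous-tick ("previous nearest neighbour") interpolation: for each
    grid time, return the last observed price at or before it.

    tick_times : sorted array of observation times (seconds since the open).
    grid       : regular sampling times, e.g. every 300 seconds.
    """
    idx = np.searchsorted(tick_times, grid, side="right") - 1
    # Grid points before the first tick fall back to the first price.
    idx = np.clip(idx, 0, None)
    return tick_prices[idx]

# Hypothetical tick data and a 5-minute (300 s) grid over a 6.5-hour session.
tick_times = np.array([0.0, 12.0, 290.0, 310.0, 900.0])
tick_prices = np.array([100.0, 100.2, 100.1, 100.4, 100.3])
grid = np.arange(0.0, 6.5 * 3600 + 1, 300.0)      # 79 grid points
prices_5min = previous_tick(tick_times, tick_prices, grid)
log_returns = np.diff(np.log(prices_5min))
```

For instance, the grid point at 300 s picks up the tick at 290 s (price 100.1), and every grid point after 900 s carries the last observed price forward.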



Figure 9: The left panel of this figure illustrates the number of factors selected using our ratio test, applied to the returns of the 100 tick-by-tick mid-price series with the required proportion of explained cross-sectional variation set to 95%. The right panel illustrates the proportion of variation explained by several factors: the first, second, third, and fourth factors, the 50th factor, and the 90th factor.

In Figure 8, we plot the value weighted return index for the US stock market from CRSP and the estimated cumulative return for the first component of $f_t$, which represents the portfolio of the 100 most active stocks per month weighted by the eigenvector corresponding to the largest eigenvalue of the covariance matrix. An interesting point to note here is that whilst there are some common trends between the two variables, there are notable differences, particularly in the downturns. After building the monthly component portfolios (our factors), we compare their effectiveness in explaining the cross-sectional variation of a large number of stocks from the second data set (CRSP). The following describes our procedure for preparing the CRSP dataset and illustrates the effect of the component portfolios on the degree of sphericity in CRSP returns. From the CRSP data file, we extract the returns, which include dividends and/or other profits received by investors. We eliminate any stock that has less than 75 months of returns over the period 1996 to 2014, which leaves us with 9,255 stocks in the sample. We next run the following time series regression for each stock $i$:
$$r_{i,t}^m = \beta_i' f_t^m + \varepsilon_{i,t}, \quad i = 1, \ldots, 9255, \qquad (39)$$
where $r_{i,t}^m$ is the monthly CRSP stock return and $f_t^m$ is the cumulative return over a month for the

vector of factors. We then compute the pairwise covariances between pricing errors and record them in a matrix $S = [\mathrm{cov}(e_{i,t}, e_{j,t})]$. Notice that because the elements of $S$ are constructed pairwise, the matrix $S$ is not guaranteed to be positive semi-definite; it is inherently of reduced rank, as the number of variables is larger than the number of observations. We next calculate the pairwise correlations between pricing errors, $\rho_{ij} = \mathrm{cov}(e_{i,t}, e_{j,t})/\sqrt{\mathrm{var}(e_{i,t})\,\mathrm{var}(e_{j,t})}$, for $i \neq j$ and $i, j = 1, \ldots, 9255$, and record them in a matrix $S^\rho$. Thereafter, for each correlation $\rho_{ij}$, we use an adjusted standard t-statistic to test $H_0: \rho_{ij} = 0$ against $H_1: \rho_{ij} \neq 0$, and in a matrix $M$ we record 1 if the null is rejected at the 5% significance level, and 0 otherwise. Following Aït-Sahalia and Xiu (2017b), we apply pair-wise sphericity tests and adjust the individual p-values to match the joint power of these tests; see Onatski, Moreira, and Hallin (2013). In this instance, 9,255 stocks yield 42,822,885 pairwise correlations. Using a p-value of 0.05/42,822,885 - the most conservative version of the test - yields a critical statistic of around 6.22 for 227 months of data and 6.36 for 147 months, the average number of available returns. We then construct matrix plots with points representing the significance of the pairwise correlations. The stocks are sorted as follows. First, they are placed into sector bins following the example in Aït-Sahalia and Xiu (2017b) and second, they are sorted (largest to smallest) by the absolute value of the weighting of the stock in the eigenvector corresponding to the largest eigenvalue of $S$. This exercise is repeated for each of the standard models and for four models based on our PCA analysis. Notice that unlike in the previous application, where the model is run 'pseudo out-of-sample', this approach follows the tradition of Cochrane (1996); Gomes, Kogan, and Zhang (2003); Jagannathan and Wang (2007); Cooper, Gulen, and Schill (2008), but using individual stock returns in the spirit of Ang, Liu, and Schwarz (2008), with the objective of assessing to what extent the factors from the high frequency PCA mimic the well established factors in orthogonalizing a very large cross section of lower frequency returns.

Figure 10: This figure illustrates the outcome of the tests for significant pairwise correlations (see Section 6.2.1) between pricing errors $\varepsilon_i = r_{i,\text{monthly}} - \beta_i' F_{\text{monthly}}$ from the asset pricing models CAPM, FF3, FF3+Liq, and Sector portfolios across all CRSP stocks traded over the Jan 1996 to Dec 2014 period, using the CRSP total return measure. We exclude stocks with less than 75 available monthly returns. The dots in this figure indicate rejection of the null hypothesis of no pairwise correlation. In each subfigure, the total number of tested pairwise correlations is 42,822,885, i.e. the number of off-diagonal correlations ($(N^2 - N)/2$, with $N = 9255$). We set the p-value to 0.05/42,822,885, which yields a critical value of around 6.22 for the t-test. 'nz' is the number of statistically significant pairwise correlations.
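The Bonferroni-style cut-off described above can be reproduced with a short standard-library sketch. It uses the normal approximation to the Student-t cut-off and omits the further power adjustment of Onatski, Moreira, and Hallin (2013); the sample-size choice below is an assumption for illustration:

```python
from statistics import NormalDist

N = 9255                               # stocks in the CRSP sample
n_pairs = N * (N - 1) // 2             # off-diagonal correlations: (N^2 - N) / 2
alpha = 0.05 / n_pairs                 # Bonferroni-style per-test size
z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # two-sided normal cut-off, about 6.1
# The exact Student-t cut-off with roughly 225 degrees of freedom is slightly
# larger, consistent with the value of about 6.22 reported in the text.

def reject_zero_correlation(r, n_obs, crit):
    """t-statistic for H0: rho = 0 based on a sample correlation r over n_obs months."""
    t_stat = r * ((n_obs - 2) / (1.0 - r * r)) ** 0.5
    return abs(t_stat) > crit
```

With 227 monthly observations, a pairwise correlation must exceed roughly 0.38 in absolute value before it is flagged at this family-wise level.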


Figure 11: Significant pairwise correlations for our principal component factors derived from the 100 most actively traded assets in the S&P 500 cross section. The monthly factors are computed from five-minute data. For the definition of the points in the plot, see the caption of Figure 10.

6.2.2 Results

The number of factors that we should select each month according to our ratio test is illustrated in the left panel of Figure 9, whereas the right panel reports the proportion of variation explained by each factor. The left panel shows that the selected number of factors is time-varying, as it changes every month. If we set the required proportion of explained cross-sectional variation to 95%, then for the 100 tick-by-tick mid-price return series and the time period under consideration, this number varies between a minimum of one factor and a maximum of 14 factors. The pre-2001 period is characterized by a high number of factors, followed by the 2003-2011 period where this number is low and more or less stable, and then the post-2011 period where this number is higher again. This motivates the idea of considering an asset pricing model where the number of factors changes over time. For a fair comparison with the standard benchmark models - where the number of factors is assumed to be the same over time - when analysing the effectiveness of our factors in explaining the cross-sectional variation of CRSP stocks, we assume that the number of factors is the same over the entire set of months in the data set. This number is equal to the maximum number of factors suggested by our ratio test, which is 14 factors. In Appendix B, we estimate a Fama-MacBeth (Fama and MacBeth, 1973) cross sectional regression on the individual stocks in the spirit of Ang, Liu, and Schwarz (2008), and we compare the PCA factors from high frequency prices against those from the standard asset pricing models. The estimation results are reported in Table 4 of Appendix B. We can see that the $R^2$ for the second pass Fama-MacBeth regressions are quite comparable to those of the standard models (improving in most cases). Hence, the PCA from high frequency prices appears useful, as it can generate an arbitrary number of factors to suit the requirements of the econometrician. In particular, our largest specification runs with 14 factors, which is the maximum number of factors detected using our ratio test. However, the 4- and 11-factor models have nearly identical $R^2$, and each is better than the equivalent sized standard asset pricing model. We next compare the effectiveness of our factor-based asset pricing models with those of standard benchmark models. For each model, the total number of tested pairwise correlations is 42,822,885. The results for the benchmark models and for our models are reported in Figures 10 and 11, respectively. The squares in these two figures highlight the CRSP stocks that belong to the same sector, and the dots represent the statistical significance of the pairwise correlations. If the factors capture the information common to the stocks, we expect the pairwise correlations to be equal to zero. Thus, roughly speaking, an effective model is one that leads to fewer dots in the figure. The comparison between Figure 10 and Figure 11 yields some interesting observations. First, the value weighted market portfolio outperforms the first PCA factor in terms of the number of pairwise correlations that are significantly different from zero. Second, in contrast, once we get to four factors, the PCA factors from high frequency prices beat the FF3 and Liquidity factor models.
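The sample analogue of the factor-count selection rule - choose the smallest number of leading eigenvalues whose share of total variation reaches π - can be sketched as follows. Note that the paper's actual procedure assesses this sequentially with bootstrap critical values rather than by the raw sample share, and the function name and inputs below are illustrative:

```python
import numpy as np

def n_factors_explaining(eigenvalues, pi=0.95):
    """Smallest k such that the k largest eigenvalues account for at least a
    fraction pi of total variation (sample analogue of the ratio rule)."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # largest first
    share = np.cumsum(lam) / lam.sum()                         # cumulative explained share
    return int(np.searchsorted(share, pi) + 1)

# One dominant factor: the first eigenvalue explains 90%, the first two 96%.
print(n_factors_explaining([9.0, 0.6, 0.4], pi=0.95))   # -> 2
```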
Finally, a comparison of 11 PCA factors versus the value weighted sector portfolios illustrates a very significant improvement, with only 43,156 significant correlations versus 82,692 for the sectors. The lack of performance of the sector portfolios as factors versus the CAPM is partially due to the fact that sectors only partly reflect the common factor structure and thus appear to add more noise into the first pass regressions. Indeed, if we consider the maximum number of indicated factors over the 1996-2015 time frame, which is 14, the number of significant pairwise correlations is substantially lower again, at 42,734.

7 Conclusion

We consider the class of Itô semimartingale processes and propose a likelihood-ratio-type test for the structure of eigenvalues of the integrated covariance matrix. Unlike the existing approaches, where the cross-section dimension grows to infinity, our test does not require a large cross-section and thus opens the door to a wide variety of applications. In particular, our test can be applied to test for specific factor representations, such as the one where the integrated covariance matrix is decomposed into the sum of a factor structure exhibiting a few large eigenvalues plus remaining eigenvalues of a common magnitude, reflecting the magnitude of the idiosyncratic shocks. This structure is very useful for the empirical analysis of financial data, as it is the factor structure displayed by the integrated covariance matrix of the S&P 100 index constituents. Furthermore, a test for 'unexplained' quadratic variation is proposed to investigate whether a given set of factors 'explains' at least a given proportion of integrated variance in the continuous part of the underlying process. We derive the asymptotic distribution (under the null) of our likelihood ratio test statistic and find that the latter is non-standard, with many nuisance parameters. Another main contribution of this paper consists in proposing a variant of the blocks of blocks bootstrap of Hounyo, Gonçalves, and Meddahi (2017) to approximate the above asymptotic distribution, and in establishing its first-order asymptotic validity. The bootstrap procedure that we introduce does not require the estimation of the nuisance parameters, provides a better approximation of the asymptotic distribution of our test statistic, and is simple to implement. The finite sample properties of the asymptotic and bootstrap tests have been investigated in an extensive Monte Carlo simulation study, where several data generating processes have been considered, as well as small and large cross-section dimensions. The results reveal that the bootstrap test has very good size and power performance. However, the test based on the standard chi-squared asymptotic distribution - which is valid only if the underlying process is a continuous Lévy process - systematically overrejects the null except, as expected, in the case of Lévy dynamics. Finally, we consider two empirical applications to illustrate different facets of the analysis that is feasible within our framework. In our first application, we apply our bootstrap-based test to relatively low dimensional data to build a hedging strategy for the spot price of crude oil using futures contracts. Comparing our method to several standard approaches, the results show that the most effective hedging strategy is the one based on the number of factors selected using our test. In our second application, we apply our ratio test of 'unexplained' quadratic variation to select the number of factors from relatively high dimensional cross-sectional data on stock prices. These factors are then used to explain an even larger number of individual stock prices.
We then test the statistical significance of the pairwise correlations between the pricing errors, i.e. what is left after regressing the returns on our factors. After comparing the performance of our methodology with several standard asset pricing models (benchmark models), we find that our factor-based asset pricing model leads to a reduction in the number of statistically significant pairwise correlations between pricing errors, and thus performs better than the benchmark models.

References

Aït-Sahalia, Y., and Jacod, J. (2014). High-Frequency Financial Econometrics. Princeton University Press.

Aït-Sahalia, Y., and Xiu, D. (2017a). 'Principal component analysis of high frequency data', Journal of the American Statistical Association, forthcoming.

Aït-Sahalia, Y., and Xiu, D. (2017b). 'Using principal component analysis to estimate a high dimensional factor model with high-frequency data', Journal of Econometrics, forthcoming.

Andersen, L. B., Jäckel, P., and Kahl, C. (2010). 'Simulation of square-root processes', Encyclopedia of Quantitative Finance.

Anderson, T. W. (1963). 'Asymptotic theory for principal component analysis', The Annals of Mathematical Statistics, 34(1): 122–148.

Ang, A., Liu, J., and Schwarz, K. (2008). 'Using individual stocks or portfolios in tests of factor models'.

Barndorff-Nielsen, O. E., and Shephard, N. (2004). 'Econometric analysis of realized covariation: High frequency based covariance, regression, and correlation in financial economics', Econometrica, 72(3): 885–925.

Chamberlain, G., and Rothschild, M. (1983). 'Arbitrage, factor structure, and mean-variance analysis on large asset markets', Econometrica, 51(5): 1281–1304.

Chu, K.-W. E. (1990). 'On multiple eigenvalues of matrices depending on several parameters', SIAM Journal on Numerical Analysis, 27(5): 1368–1385.

Cochrane, J. H. (1996). 'A cross-sectional test of an investment-based asset pricing model', Journal of Political Economy, 104(3): 572–621.

Cook, R. D., and Setodji, C. M. (2003). 'A model-free test for reduced rank in multivariate regression', Journal of the American Statistical Association, 98(462): 340–351.

Cooper, M. J., Gulen, H., and Schill, M. J. (2008). 'Asset growth and the cross-section of stock returns', The Journal of Finance, 63(4): 1609–1651.

Davis, A. (1977). 'Asymptotic theory for principal component analysis: Non-normal case', Australian & New Zealand Journal of Statistics, 19(3): 206–212.

Dovonon, P., Gonçalves, S., and Meddahi, N. (2013). 'Bootstrapping realized multivariate volatility measures', Journal of Econometrics, 172(1): 49–65.

Eaton, M. L., and Tyler, D. E. (1991). 'On Wielandt's inequality and its application to the asymptotic distribution of the eigenvalues of a random symmetric matrix', Annals of Statistics, pp. 260–271.

Fama, E. F., and French, K. R. (1993). 'Common risk factors in the returns on stocks and bonds', Journal of Financial Economics, 33(1): 3–56.

Fama, E. F., and MacBeth, J. D. (1973). 'Risk, return, and equilibrium: Empirical tests', Journal of Political Economy, 81(3): 607–636.

Gomes, J., Kogan, L., and Zhang, L. (2003). 'Equilibrium cross section of returns', Journal of Political Economy, 111(4): 693–732.

Hiriart-Urruty, J.-B., and Ye, D. (1995). 'Sensitivity analysis of all eigenvalues of a symmetric matrix', Numerische Mathematik, 70(1): 45–72.

Hounyo, U. (2017). 'Bootstrapping integrated covariance matrix estimators in noisy jump-diffusion models with non-synchronous trading', Journal of Econometrics, 197(1): 130–152.

Hounyo, U., Gonçalves, S., and Meddahi, N. (2017). 'Bootstrapping pre-averaged realized volatility under market microstructure noise', Econometric Theory, 33(4): 791–838.

Jacod, J., and Protter, P. (2012). Discretization of Processes. Springer-Verlag, Berlin Heidelberg.

Jagannathan, R., and Wang, Y. (2007). 'Lazy investors, discretionary consumption, and the cross-section of stock returns', The Journal of Finance, 62(4): 1623–1661.

Lehmann, E. L., and Romano, J. P. (2005). 'Generalizations of the familywise error rate', Annals of Statistics, pp. 1138–1154.

Li, J., Todorov, V., and Tauchen, G. (2017). 'Jump regressions', Econometrica, 85(1): 173–195.

Lord, R., Koekkoek, R., and Dijk, D. V. (2010). 'A comparison of biased simulation schemes for stochastic volatility models', Quantitative Finance, 10(2): 177–194.

Muirhead, R. J. (1982). Aspects of Multivariate Statistical Analysis. John Wiley & Sons.

Onatski, A. (2010). 'Determining the number of factors from empirical distribution of eigenvalues', The Review of Economics and Statistics, 92(4): 1004–1016.

Onatski, A., Moreira, M. J., and Hallin, M. (2013). 'Asymptotic power of sphericity tests for high-dimensional data', Annals of Statistics, 41(3): 1204–1231.

Pástor, L., and Stambaugh, R. F. (2003). 'Liquidity risk and expected stock returns', Journal of Political Economy, 111(3): 642–685.

Pelger, M. (2015). 'Large-dimensional factor modeling based on high-frequency observations', Discussion paper, Stanford University.

Tyler, D. E. (1981). 'Asymptotic inference for eigenvectors', Annals of Statistics, pp. 725–736.

Waternaux, C. M. (1976). 'Asymptotic distribution of the sample roots for a nonnormal population', Biometrika, 63(3): 639–645.


A Proofs of Theorems, Propositions and Lemmas

First of all, we introduce the following result, which characterizes the asymptotic distribution of estimated eigenvalues and normalized eigenvectors. Let us first introduce some notation. Let $B_n$ be a $(q,q)$-matrix that consistently estimates a symmetric positive definite matrix $\Sigma$, and $\Delta$ the $(q,q)$-diagonal matrix with the eigenvalues $\delta_1 \geq \cdots \geq \delta_q$ of $\Sigma$ as diagonal elements. Assume that these eigenvalues have the structure in (7). Let $\Gamma$ be an orthogonal matrix of normalized eigenvectors of $\Sigma$, i.e.
$$\Gamma\Gamma' = I_q \quad \text{and} \quad \Gamma'\Sigma\Gamma = \Delta.$$
Let $A_n = \Gamma' B_n \Gamma$. Note that $A_n$ and $B_n$ have the same eigenvalues $d_1 \geq \cdots \geq d_q$; let $D_n$ be the diagonal matrix containing those eigenvalues and $\hat{E}$ an orthogonal matrix of normalized eigenvectors of $A_n$ with nonnegative diagonal elements. Let
$$U_n = r_n\,(A_n - \Delta) \quad \text{and} \quad \hat{H} = r_n\,(D_n - \Delta),$$
with $r_n \to \infty$ as $n \to \infty$. Let $\hat{E}_{kl}$ and $U_{n,kl}$ denote the $(q_k, q_l)$-submatrices of $\hat{E}$ and $U_n$, respectively, with elements at the intersection of rows and columns with index in $L_k$ and $L_l$, respectively, for $k, l = 1, \ldots, r$. Let $\hat{H}_k$ be defined as $\hat{E}_{kk}$, but from $\hat{H}$, and let $\hat{F}_{kl} = r_n \hat{E}_{kl}$, for $k \neq l$. We have the following result, which is a simple adaptation of the results of Anderson (1963) [see also Davis (1977)].

Proposition A.1. If the eigenvalues of $\Sigma$ have the structure in (7) almost surely, $U_n \xrightarrow{L\text{-}s} U$, and the functionally unrelated elements of $U$ have a joint distribution that is continuous with respect to the Lebesgue measure in $\mathbb{R}^{q(q+1)/2}$, then, for $k, l = 1, \ldots, r$,
$$\hat{E}_{kk}\hat{E}_{kk}' = I_{q_k} + O_P(r_n^{-2}),$$
$$U_{n,kk} = \hat{E}_{kk}\hat{H}_k\hat{E}_{kk}' + O_P(r_n^{-1}),$$
$$U_{n,kl} = \lambda_k\,\hat{E}_{kk}\hat{F}_{lk}' + \lambda_l\,\hat{F}_{kl}\hat{E}_{ll}' + O_P(r_n^{-1/2}), \quad k \neq l,$$
$$0 = \hat{E}_{kk}\hat{F}_{lk}' + \hat{F}_{kl}\hat{E}_{ll}' + O_P(r_n^{-1/2}), \quad k \neq l, \qquad (40)$$
and $\hat{E}_{kk}$, $\hat{H}_k$ and $\hat{F}_{kl}$ converge stably in law to limiting distributions $E_{kk}$, $H_k$ and $F_{kl}$, respectively, uniquely defined in terms of $U$ by the equations:
$$E_{kk}E_{kk}' = I_{q_k}, \quad U_{kk} = E_{kk}H_kE_{kk}', \quad U_{kl} = \lambda_k\,E_{kk}F_{lk}' + \lambda_l\,F_{kl}E_{ll}',\ k \neq l, \quad 0 = E_{kk}F_{lk}' + F_{kl}E_{ll}',\ k \neq l, \qquad (41)$$
where $H_k$ is diagonal and $E_{kk}$ is restricted to have nonnegative diagonal elements; and $U_{kl}$ is defined similarly to $U_{n,kl}$ but from $U$.


The proof of this proposition follows readily from Anderson (1963), with $\sqrt{n}$ replaced by $r_n$. The stable convergence in law deduced for $\hat{E}_{kk}$, $\hat{H}_k$ and $\hat{F}_{kl}$ follows from the stable convergence in law of $U_n$. The last two equalities in (40) imply that $\hat{F}_{kl} = O_P(1)$; therefore, $\hat{E}_{kl} = o_P(1)$ for $k \neq l$.

Proof of Proposition 3.1:
(a) The maximum likelihood estimator $\hat{\lambda}_k$ of $\lambda_k$ ($k = 1, \ldots, r$) is obtained by solving the first order condition associated with the log-likelihood function in (12), and it is straightforward to get $\hat{\lambda}_k = \frac{1}{q_k}\sum_{i \in L_k}\tilde{d}_i$.
(b) Since the log-likelihood in (12) is additively separable in the $\lambda_k$'s, it is maximized under $H_0$ by
$$C - \frac{n}{2}\,q_k\log\tilde{\lambda}_k - \frac{n}{2}\sum_{i \in L_k}\frac{\tilde{d}_i}{\tilde{\lambda}_k}, \quad \text{with } \tilde{\lambda}_k = \frac{1}{q_k}\sum_{i \in L_k}\tilde{d}_i,$$
where $C$ is the maximum of the part of the log-likelihood that depends on $\lambda_s$, for $s \neq k$. The unrestricted likelihood is maximized by
$$C - \frac{n}{2}\sum_{i \in L_k}\log\tilde{\lambda}_i - \frac{n}{2}\sum_{i \in L_k}\frac{\tilde{d}_i}{\tilde{\lambda}_i}, \quad \text{with } \tilde{\lambda}_i = \tilde{d}_i.$$
The expression of the likelihood ratio criterion in (13) follows by straightforward derivations.
(c) Note that $\tilde{b}$ has finite mean and variance and therefore is $O_P(1)$. As a result,
$$\widetilde{IV}_n - IV_T = \sum_{i=1}^n (\Delta_i^n X)(\Delta_i^n X)' - IV_T + O_P(\Delta_n) = \sum_{i=1}^n y_i y_i' - IV_T + O_P(\Delta_n),$$
where $y_i = \Delta_i^n X - \Delta_n b \sim N\big(0, \frac{\Delta_n}{T}\,IV_T\big)$. In this derivation we use the simplification that $T/\Delta_n$ is an integer, hence equal to $n$. Let $\Gamma$ be the matrix of normalized eigenvectors of $IV_T$ defined such that
$$\Gamma'\Gamma = I_q \quad \text{and} \quad \Gamma'\,IV_T\,\Gamma = \Delta,$$
where $\Delta$ is the diagonal matrix with diagonal vector $\delta = \lambda(IV_T)$. Let
$$U_n = \frac{1}{\sqrt{\Delta_n}}\left(\Gamma'\,\widetilde{IV}_n\,\Gamma - \Delta\right).$$
We have
$$U_n = \frac{1}{\sqrt{\Delta_n}}\sum_{i=1}^n\left(z_i - \frac{\Delta}{n}\right) + o_P(1), \quad \text{with } z_i = (\Gamma' y_i)(\Gamma' y_i)'.$$
We can easily verify the Lyapunov central limit theorem conditions and deduce that
$$U_n \xrightarrow{d} U, \qquad (42)$$
where $U$ is normally distributed with mean 0 and covariance
$$\mathrm{Cov}(u_{ij}, u_{gh}) = \frac{\delta_i\delta_j}{T}\left(\delta_{ig}\delta_{jh} + \delta_{ih}\delta_{jg}\right),$$
with $u_{ab}$ denoting a generic component of $U$ and $\delta_{ab} = 1$ if $a = b$ and 0 otherwise.

We now derive the asymptotic distribution of $\widetilde{LR}_k$. We have
$$\widetilde{LR}_k = -n\left(\sum_{i \in L_k}\log\tilde{d}_i - q_k\log\Big(\frac{1}{q_k}\sum_{i \in L_k}\tilde{d}_i\Big)\right). \qquad (43)$$
Let $\hat{H} = \frac{1}{\sqrt{\Delta_n}}\big(\tilde{D} - \Delta\big)$, where $\tilde{D}$ is the diagonal matrix containing $\tilde{d} = \lambda(\widetilde{IV}_n)$ as diagonal. Let $\hat{E}$ be the matrix of normalized eigenvectors of $A_n = \Gamma'\,\widetilde{IV}_n\,\Gamma$, that is,
$$\hat{E}\hat{E}' = I_q, \quad \hat{E}' A_n \hat{E} = \tilde{D},$$
and $\hat{E}_{kk}$ ($U_{n,kk}$, $U_{kk}$) is the submatrix of $\hat{E}$ ($U_n$, $U$) with rows and columns with indexes in $L_k$. From Proposition A.1, we have
$$\hat{H}_k = O_P(1), \quad U_{n,kk} = \hat{E}_{kk}\hat{H}_k\hat{E}_{kk}' + o_P(1) \quad \text{and} \quad \hat{E}_{kk}\hat{E}_{kk}' = I_{q_k} + o_P(1). \qquad (44)$$

Hence, with $h_i \equiv \hat{H}_{ii}$ for $i \in L_k$, we have
$$\widetilde{LR}_k = -n\left(\sum_{i \in L_k}\log\Big(1 + \sqrt{\Delta_n}\,\frac{h_i}{\lambda_k}\Big) - q_k\log\Big(1 + \sqrt{\Delta_n}\,\frac{\sum_{i \in L_k}h_i}{q_k\lambda_k}\Big)\right)$$
$$= -n\left[\sqrt{\Delta_n}\sum_{i \in L_k}\frac{h_i}{\lambda_k} - \frac{\Delta_n}{2\lambda_k^2}\sum_{i \in L_k}h_i^2 - q_k\left(\sqrt{\Delta_n}\,\frac{\sum_{i \in L_k}h_i}{q_k\lambda_k} - \frac{\Delta_n}{2q_k^2\lambda_k^2}\Big(\sum_{i \in L_k}h_i\Big)^2\right) + o_P(\Delta_n)\right]$$
$$= \frac{n\Delta_n}{2\lambda_k^2}\left(\mathrm{tr}(\hat{H}_k^2) - \frac{1}{q_k}\big(\mathrm{tr}(\hat{H}_k)\big)^2\right) + o_P(1) = \frac{T}{2\lambda_k^2}\left(\mathrm{tr}(U_{n,kk}^2) - \frac{1}{q_k}\big(\mathrm{tr}(U_{n,kk})\big)^2\right) + o_P(1), \qquad (45)$$
where the second and third equalities follow from a second order Taylor expansion and Equation (44), respectively. Thus $\widetilde{LR}_k$ converges in distribution to
$$\frac{T}{2\lambda_k^2}\left(\mathrm{tr}(U_{kk}^2) - \frac{1}{q_k}\big(\mathrm{tr}(U_{kk})\big)^2\right) = \frac{T}{2\lambda_k^2}\left(2\sum_{\substack{i<j \\ i,j \in L_k}}u_{ij}^2 + \sum_{i \in L_k}u_{ii}^2 - \frac{1}{q_k}\Big(\sum_{i \in L_k}u_{ii}\Big)^2\right).$$
Thanks to Equation (42), $u_{ij} = u_{ji}$ and is independent of all the other entries of $U$. Moreover, $u_{ii} \sim N(0, 2\lambda_k^2/T)$ and $u_{ij} \sim N(0, \lambda_k^2/T)$ for $i \neq j$. Therefore, it follows that
$$\frac{T}{2\lambda_k^2}\left(\sum_{i \in L_k}u_{ii}^2 - \frac{1}{q_k}\Big(\sum_{i \in L_k}u_{ii}\Big)^2\right) \sim \chi^2_{q_k - 1}$$
and is independent of
$$\frac{T}{\lambda_k^2}\sum_{\substack{i<j \\ i,j \in L_k}}u_{ij}^2 \sim \chi^2_{\frac{1}{2}q_k(q_k - 1)}.$$
As a result, $\widetilde{LR}_k$ is asymptotically distributed as a $\chi^2_{\frac{(q_k-1)(q_k+2)}{2}}$.

We now show that $\widetilde{LR}_k$ diverges to infinity under the alternative. Let $\lambda_{k,i}$, for $i = 1, \ldots, q_k$, be the eigenvalues of $IV_T$ in the cluster $L_k$. Under the alternative, at least two of them are distinct. As previously, let $h_i = \frac{1}{\sqrt{\Delta_n}}\big(\tilde{d}_i - \lambda_{k,i}\big)$, $i \in L_k$. From Equation (43), we have
$$\widetilde{LR}_k = -n\left(\sum_{i \in L_k}\log\big(\lambda_{k,i} + \sqrt{\Delta_n}\,h_i\big) - q_k\log\Big(\frac{1}{q_k}\sum_{i \in L_k}\big(\lambda_{k,i} + \sqrt{\Delta_n}\,h_i\big)\Big)\right)$$
$$= -n\left(\sum_{i \in L_k}\Big(\log\lambda_{k,i} + \sqrt{\Delta_n}\,\frac{h_i}{\lambda_{k,i}} - \frac{1}{2}\Delta_n\,\frac{h_i^2}{\lambda_{k,i}^2}\Big) - q_k\log\Big(\frac{\sum_{i \in L_k}\lambda_{k,i}}{q_k}\Big) - q_k\sqrt{\Delta_n}\,\frac{\sum_{i \in L_k}h_i}{\sum_{i \in L_k}\lambda_{k,i}} + \frac{1}{2}\Delta_n\,q_k\Big(\frac{\sum_{i \in L_k}h_i}{\sum_{i \in L_k}\lambda_{k,i}}\Big)^2 + o_P(\Delta_n)\right)$$
$$= n q_k\left(\log\frac{\sum_{i \in L_k}\lambda_{k,i}}{q_k} - \frac{1}{q_k}\sum_{i \in L_k}\log\lambda_{k,i}\right) + O_P\big(n\sqrt{\Delta_n}\big),$$
where the second equality is obtained from a second order Taylor expansion. Since $\log$ is strictly concave, it follows that $\log\Big(\frac{1}{q_k}\sum_{i \in L_k}\lambda_{k,i}\Big) > \frac{1}{q_k}\sum_{i \in L_k}\log\lambda_{k,i}$ and $\widetilde{LR}_k \to \infty$ as $n \to \infty$, in probability.
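The divergence argument rests on strict concavity of the logarithm: the drift term $q_k\big(\log\bar{\lambda} - \overline{\log\lambda}\big)$ is strictly positive whenever the cluster eigenvalues are not all equal. A quick numerical check of this Jensen gap, with illustrative values rather than any quantity from the paper:

```python
import numpy as np

def jensen_gap(lam):
    """q * (log of arithmetic mean - mean of logs): the per-observation drift
    term that makes the LR statistic diverge under the alternative."""
    lam = np.asarray(lam, dtype=float)
    return lam.size * (np.log(lam.mean()) - np.log(lam).mean())

assert abs(jensen_gap([2.0, 2.0, 2.0])) < 1e-12   # equal eigenvalues: no drift
assert jensen_gap([3.0, 2.0, 1.0]) > 0.0          # distinct eigenvalues: positive drift
```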

(d) Similar to the proof of (b), under $H_0(\lambda)$, the log-likelihood in (12) is maximized by
$$C - \frac{n}{2}\,q_k\log\lambda - \frac{n}{2}\sum_{i \in L_k}\frac{\tilde{d}_i}{\lambda},$$
and the expression of the likelihood ratio criterion in (14) follows easily.

The asymptotic distribution of $\widetilde{LR}_{k,\lambda}$ is obtained similarly to that of $\widetilde{LR}_k$ derived above. We have
$$\widetilde{LR}_{k,\lambda} = -n\sum_{i \in L_k}\log\tilde{d}_i + n q_k\log\lambda + \frac{n}{\lambda}\sum_{i \in L_k}\tilde{d}_i - n q_k$$
$$= -n\sum_{i \in L_k}\log\Big(1 + \sqrt{\Delta_n}\,\frac{h_i}{\lambda}\Big) + \frac{n\sqrt{\Delta_n}}{\lambda}\sum_{i \in L_k}h_i$$
$$= -n\sum_{i \in L_k}\left(\sqrt{\Delta_n}\,\frac{h_i}{\lambda} - \frac{\Delta_n h_i^2}{2\lambda^2}\right) + \frac{n\sqrt{\Delta_n}}{\lambda}\sum_{i \in L_k}h_i + o_P(1)$$
$$= \frac{n\Delta_n}{2\lambda^2}\sum_{i \in L_k}h_i^2 + o_P(1) = \frac{T}{2\lambda^2}\,\mathrm{tr}\big(U_{n,kk}^2\big) + o_P(1), \qquad (46)$$
which converges in distribution to $\frac{T}{2\lambda^2}\Big(2\sum_{\substack{i<j,\ i,j \in L_k}}u_{ij}^2 + \sum_{i \in L_k}u_{ii}^2\Big)$. Recalling the distribution of $u_{ij}$ as given in the proof of (c) above, we can claim that $\widetilde{LR}_{k,\lambda}$ converges in distribution to $\chi^2_{\frac{1}{2}q_k(q_k+1)}$.

We next show that $\widetilde{LR}_{k,\lambda}$ diverges under the alternative to $H_0(\lambda)$. Similar calculations to those in the proof of divergence of $\widetilde{LR}_k$ above lead to
$$\widetilde{LR}_{k,\lambda} = n\sum_{i \in L_k}\left(\frac{\lambda_{k,i}}{\lambda} - 1 - \log\frac{\lambda_{k,i}}{\lambda}\right) + O_P\big(n\sqrt{\Delta_n}\big).$$
Note that $x \mapsto x - 1 - \log x$ is nonnegative on $(0, +\infty)$ and takes value 0 only at $x = 1$. Since under the alternative $\lambda_{k,i}/\lambda \neq 1$ for at least one $i \in L_k$, we can conclude that $\widetilde{LR}_{k,\lambda} \to \infty$, in probability.

Proof of Equation (16): First of all, recall that $\Delta_i^n X \sim N(\Delta_n b, \Delta_n c)$. We have
$$\widetilde{IV}_n = \sum_{i=1}^n\big(\Delta_i^n X - \Delta_n\tilde{b}\big)\big(\Delta_i^n X - \Delta_n\tilde{b}\big)' = \sum_{i=1}^n(\Delta_i^n X)(\Delta_i^n X)' - \Delta_n T\,\tilde{b}\tilde{b}' = \widehat{IV}_n + O_P(\Delta_n).$$
To prove the second equality, observe that
$$\widetilde{IV}_n - \widehat{IV}_n = \sum_{i=1}^n\big(y_i - \Delta_n\tilde{b}\big)\big(y_i - \Delta_n\tilde{b}\big)'\,1_{\{\|y_i\| > \alpha\Delta_n^\varpi\}} + O_P(\Delta_n),$$
where $y_i \equiv \Delta_i^n X$. Hence,
$$\frac{1}{\sqrt{\Delta_n}}\big(\widetilde{IV}_n - \widehat{IV}_n\big) = \sqrt{\Delta_n}\,c^{1/2}\left(\Delta_n^{-1}\sum_{i=1}^n c^{-1/2}\big(y_i - \Delta_n\tilde{b}\big)\big(y_i - \Delta_n\tilde{b}\big)'c^{-1/2}\,1_{\{\|y_i\| > \alpha\Delta_n^\varpi\}}\right)c^{1/2} + O_P\big(\sqrt{\Delta_n}\big).$$
The first term on the right hand side is then of the same order of magnitude as
$$\sqrt{\Delta_n}\sum_{i=1}^n z_i z_i'\,1_{\{\|y_i\| > \alpha\Delta_n^\varpi\}} + O_P\big(\sqrt{\Delta_n}\big),$$
where the $z_i$'s are $N(0, I_q)$. By the Cauchy-Schwarz inequality, we have
$$\Big\|\sqrt{\Delta_n}\sum_{i=1}^n z_i z_i'\,1_{\{\|y_i\| > \alpha\Delta_n^\varpi\}}\Big\| \le \sqrt{\Delta_n}\sum_{i=1}^n \|z_i\|^2\,1_{\{\|y_i\| > \alpha\Delta_n^\varpi\}}.$$
To conclude the claimed order of magnitude, it suffices to show that the right hand side of the above inequality converges in absolute mean to 0. By the Cauchy-Schwarz and Markov inequalities, there exists a (generic) constant $C > 0$ such that, for any $\ell > 0$,
$$E\left(\sqrt{\Delta_n}\sum_{i=1}^n \|z_i\|^2\,1_{\{\|y_i\| > \alpha\Delta_n^\varpi\}}\right) \le C\sqrt{\Delta_n}\sum_{i=1}^n \big(P(\|y_i\| > \alpha\Delta_n^\varpi)\big)^{1/2} \le C\,\Delta_n^{\frac{1}{2} - \frac{\ell\varpi}{2}}\sum_{i=1}^n \big(E\|y_i\|^{\ell}\big)^{1/2}.$$
Note that
$$\big(E\|y_i\|^{\ell}\big)^{1/2} = \Big(E\big\|\sqrt{\Delta_n}\,c^{1/2}z_i + \Delta_n\tilde{b}\big\|^{\ell}\Big)^{1/2} \le C\left(\Delta_n^{\frac{\ell}{4}}\big(E\|z_i\|^{\ell}\big)^{1/2} + \Delta_n^{\frac{\ell}{2}}\big(E\|\tilde{b}\|^{\ell}\big)^{1/2}\right),$$
where we have used the Cauchy-Schwarz and the $C_r$ inequalities. Hence, the leading term of $\Delta_n^{\frac{1}{2} - \frac{\ell\varpi}{2}}\sum_{i=1}^n\big(E\|y_i\|^{\ell}\big)^{1/2}$ is at most of order
$$n\,\Delta_n^{\frac{1}{2} - \frac{\ell\varpi}{2} + \frac{\ell}{4}} = T\,\Delta_n^{-\frac{1}{2} - \frac{\ell\varpi}{2} + \frac{\ell}{4}},$$
and the expected convergence to 0 is warranted since we can find $\ell$ such that $\varpi < \frac{1}{2} - \frac{1}{\ell}$.

Proof of Theorem 3.1: Since
$$U_n \equiv \frac{1}{\sqrt{\Delta_n}}\left(\Gamma'\,\widehat{IV}_n\,\Gamma - \Delta\right) = \Gamma'\left(\frac{1}{\sqrt{\Delta_n}}\big(\widehat{IV}_n - IV_T\big)\right)\Gamma$$
converges stably in law to $U_T$, which has a continuous distribution, Proposition A.1 can be applied as in the proof of Proposition 3.1, and we can claim that the expansion of $\widetilde{LR}_k$ in Equation (45) also holds for $LR_k$ and that of $\widetilde{LR}_{k,\lambda}$ in Equation (46) holds for $LR_{k,\lambda}$. We can therefore write
$$LR_k = \frac{T}{2\lambda_k^2}\left(\mathrm{tr}\big(U_{n,kk}^2\big) - \frac{1}{q_k}\big(\mathrm{tr}[U_{n,kk}]\big)^2\right) + o_P(1) \quad \text{and} \quad LR_{k,\lambda} = \frac{T}{2\lambda^2}\,\mathrm{tr}\big(U_{n,kk}^2\big) + o_P(1),$$
where $U_{n,kk}$ is the submatrix of $U_n$ at the intersection of rows and columns with indexes in $L_k$. The claimed result then holds by the continuous mapping theorem. The proof of divergence of these test statistics follows the exact same lines as the proof of their respective counterparts of Proposition 3.1.
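The $\chi^2$ limit just derived can be checked by a small Monte Carlo experiment in a toy Gaussian setting (constant identity spot covariance, no drift, no jumps); this is a sanity-check sketch, not the simulation design used in the paper:

```python
import numpy as np

rng = np.random.default_rng(7)
q, n, T, n_rep = 3, 1000, 1.0, 2000
dt = T / n
df = (q - 1) * (q + 2) // 2          # chi-squared degrees of freedom under the null

stats_lr = np.empty(n_rep)
for rep in range(n_rep):
    y = rng.normal(0.0, np.sqrt(dt), (n, q))   # Brownian increments, identity spot covariance
    d = np.linalg.eigvalsh(y.T @ y)            # eigenvalues of the realized covariance
    # LR statistic for one cluster of q (equal) eigenvalues:
    stats_lr[rep] = -n * (np.log(d).sum() - q * np.log(d.mean()))

# Under the null, the statistic is asymptotically chi-squared with
# (q-1)(q+2)/2 degrees of freedom, so its mean should be close to df = 5.
print(stats_lr.mean())
```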

Proof of Theorem 3.2: Under the conditions of the theorem, $U_n = \frac{1}{\sqrt{\Delta_n}}\big(\Gamma'\,\widehat{IV}_n\,\Gamma - \Delta\big)$ converges stably in law to $U_T = \Gamma' W_T \Gamma$; see Equation (17). Assume without loss of generality that the eigenvalues of $IV_T$ have the structure in Equation (7) and let $L_k$ ($k = k_1, \ldots, r$) be the clusters of the $q - Q$ smallest eigenvalues of $IV_T$. We have
$$\sum_{i=Q+1}^q d_i = \sum_{k=k_1}^r\sum_{i \in L_k}d_i = \sum_{k=k_1}^r\left(\sum_{i \in L_k}[d_i - \lambda_k] + q_k\lambda_k\right).$$
Using the notations leading to Proposition A.1, with $B_n = \widehat{IV}_n$, etc., this proposition allows us to claim that
$$\frac{1}{\sqrt{\Delta_n}}\sum_{i \in L_k}(d_i - \lambda_k) = \mathrm{tr}(\hat{H}_k) = \mathrm{tr}(U_{n,kk}) + o_P(1).$$
Thus,
$$\frac{1}{\sqrt{\Delta_n}}\sum_{i=Q+1}^q d_i = \sum_{k=k_1}^r \mathrm{tr}(U_{n,kk}) + \frac{1}{\sqrt{\Delta_n}}\sum_{k=k_1}^r q_k\lambda_k + o_P(1) = \mathrm{tr}\big(U_{n,Q+1:q,Q+1:q}\big) + \frac{1}{\sqrt{\Delta_n}}\sum_{k=k_1}^r q_k\lambda_k + o_P(1).$$
Similarly,
$$\frac{1}{\sqrt{\Delta_n}}\sum_{i=1}^q d_i = \mathrm{tr}(U_n) + \frac{1}{\sqrt{\Delta_n}}\sum_{k=1}^r q_k\lambda_k + o_P(1).$$
As a result,
$$T_n = \mathrm{tr}\big(U_{n,Q+1:q,Q+1:q}\big) - \pi\cdot\mathrm{tr}(U_n) + \frac{1}{\sqrt{\Delta_n}}\left(\sum_{k=k_1}^r q_k\lambda_k - \pi\cdot\sum_{k=1}^r q_k\lambda_k\right) + o_P(1).$$
(a) If the null hypothesis holds with equality, the claimed asymptotic distribution is obtained thanks to the continuous mapping theorem. (b) If the null holds with strict inequality, $T_n$ diverges to $-\infty$. (c) If the null does not hold, $T_n$ diverges to $+\infty$; thus (c).

Proof of Theorem 3.3: The proof of this theorem follows the same lines as that of Theorem 3.1 and uses the continuity of the asymptotic distribution $U_T^\rho$ in (25).

Proofs of Proposition 4.1 and Corollary 4.1: We rely on Theorems 3.1 and 3.2 of Hounyo (2017) to establish these results. For this, it suffices to check Condition A in Hounyo (2017). That is:
(i) For $k, l, k', l' = 1, \ldots, q$,
$$\frac{n}{2}\sum_{i=1}^{n-1}\big(y_{i,k}y_{i,l} - y_{i+1,k}y_{i+1,l}\big)\big(y_{i,k'}y_{i,l'} - y_{i+1,k'}y_{i+1,l'}\big)$$
converges in probability to $\int_0^T\big[c_s^{kk'}c_s^{ll'} + c_s^{kl'}c_s^{lk'}\big]\,ds$;
(ii) $n^{1+\frac{\epsilon}{2}}\sum_{i=1}^n |y_{i,k}y_{i,l}|^{2+\epsilon} = O_P(1)$, for $k, l = 1, \ldots, q$ and some $\epsilon > 2$; and
(iii) $n^{\frac{2+\epsilon}{2}}/n^{1+\epsilon} = o(1)$.
Both (i) and (ii) follow from Theorem 9.4.1 of Jacod and Protter (2012), while (iii) is obvious.

Proof of Theorem 4.1: $U_{kk}^*$ and $U_{Q+1:q,Q+1:q}^*$ can both be written as $\hat{\Gamma}_0'\,S_n^*\,\hat{\Gamma}_0$ with $\hat{\Gamma}_0 = \Gamma\hat{E}_0$, where $\hat{E}_0$ is a matrix equal to the collection of columns of $\hat{E}$ indexed by $L_k$ or $\bigcup_{k=k_1}^r L_k = \{Q+1, \ldots, q\}$. Thanks to the arguments leading to Equation (27), we can claim that $LR_k^*$, $LR_{k,\lambda}^*$ and $Z_n^*$ converge in distribution to the same limit distributions as $LR_k$, $LR_{k,\lambda}$ and $Z_n$, as given in Theorem 3.1(a) and (b) and Theorem 3.3(a), respectively. The claimed uniform consistencies in parts (a), (b) and (c) of Theorem 4.1 follow from the continuity of these asymptotic distributions.


B Online Appendix: Fama and MacBeth Regressions

Second pass Fama and MacBeth cross sectional regressions for the 9,255 stocks with more than 75 months of returns in CRSP for the January 1, 1996 to December 31, 2014 time period. The first four models are for comparison purposes and present results for a standard set of factors. The second four models look at one, four, eleven and fourteen factors extracted using our tests from the 100 most actively traded stocks in the S&P 500 cross section. Eleven and fourteen factors represent the mid and upper bounds for the number of factors extracted each month from five-minute return data.

Table 4: Cross Sectional Factor Regressions (dependent variable: $E[R_i - r_f]$; standard errors in parentheses)

|       | CAPM | FF3 | FF3 + Liq. | Sectors | PCA(1) | PCA(4) | PCA(11) | PCA(14) |
|-------|------|-----|------------|---------|--------|--------|---------|---------|
| γ1  | 0.0678*** (0.0023) | 0.0676*** (0.0024) | 0.0668*** (0.0023) | 0.0952*** (0.0017) | 0.0899*** (0.0014) | 0.0857*** (0.0016) | 0.0746*** (0.0017) | 0.0707*** (0.0018) |
| γ2  | 0.0068*** (0.0002) | 0.0056*** (0.0003) | 0.0056*** (0.0003) | 0.0012*** (0.0002) | 0.0710*** (0.0027) | 0.0639*** (0.0028) | 0.0569*** (0.0026) | 0.0578*** (0.0027) |
| γ3  | | 0.0024*** (0.0002) | 0.0023*** (0.0002) | 0.0002* (0.0002) | | 0.0027 (0.0022) | 0.0030* (0.0022) | 0.0049** (0.0021) |
| γ4  | | 0.0002*** (0.0000) | 0.0001*** (0.0000) | 0.0012*** (0.0002) | | 0.0313*** (0.0018) | 0.0172*** (0.0013) | 0.0121*** (0.0016) |
| γ5  | | | 0.0002** (0.0001) | 0.0001 (0.0001) | | 0.0102*** (0.0013) | 0.0020* (0.0014) | 0.0023* (0.0014) |
| γ6  | | | | 0.0004*** (0.0001) | | | 0.0203*** (0.0014) | 0.0216*** (0.0013) |
| γ7  | | | | 0.0003*** (0.0001) | | | 0.0090*** (0.0008) | 0.0065*** (0.0008) |
| γ8  | | | | 0.0005*** (0.0001) | | | 0.0068*** (0.0010) | 0.0064*** (0.0010) |
| γ9  | | | | 0.0003*** (0.0001) | | | 0.0001 (0.0009) | 0.0011 (0.0010) |
| γ10 | | | | 0.0006*** (0.0001) | | | 0.0046*** (0.0008) | 0.0043*** (0.0008) |
| γ11 | | | | 0.0004*** (0.0001) | | | 0.0004 (0.0008) | 0.0008 (0.0008) |
| γ12 | | | | | | | 0.0021*** (0.0008) | 0.0036*** (0.0008) |
| γ13 | | | | | | | | 0.0035*** (0.0007) |
| γ14 | | | | | | | | 0.0027*** (0.0005) |
| γ15 | | | | | | | | 0.0060*** (0.0007) |
| R̄²  | 0.042 | 0.047 | 0.046 | 0.058 | 0.045 | 0.058 | 0.057 | 0.057 |
| Obs. | 9,255 | 9,255 | 9,255 | 9,255 | 9,255 | 9,255 | 9,255 | 9,255 |

Notes: Cross sectional Fama and MacBeth regressions for 9,255 CRSP selected stocks with more than 75 months of data within the 1996 to 2015 period. The stock returns are inclusive of all payments; risk free rates are extracted from the WRDS Fama and French monthly factor series. CAPM refers to the capital asset pricing model, FF3 to the Fama and French three factor model, FF3 + Liq. to the Fama and French three factor model with the Pastor-Stambaugh liquidity factor, and 'Sectors' refers to the 12 digit SIC code sectors recorded in the CRSP data set. PCA(#) refers to the number of principal component portfolios constructed from five-minute business time returns for a month using the 100 most actively traded stocks from the S&P 500; the eigenvectors corresponding to the sorted (from largest to smallest) eigenvalues are used to build the PCA portfolios at a monthly frequency, using the corresponding stock return including all payments from the CRSP dataset. The linear form of the Fama and MacBeth cross sectional regression is as follows: $E[R_i - r_f] = \gamma_1 + \gamma' b_i + \xi_i$, with $\gamma = [\gamma_i]$, for $i \in \{2, \ldots, q\}$, for $q$ factors, and $\xi_i$ an i.i.d. disturbance term. Removal of the alpha does not materially affect the results. All regressions are estimated with heteroskedasticity robust standard errors. All of the $b_i$ are estimated in the first stage. Recall that the presumption is that $\gamma_1 = 0$; hence the intercept for the CAPM model of the security market line is at the risk free rate.
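The two-pass procedure behind Table 4 can be sketched as follows. The factor returns, loadings, dimensions, and noise levels are simulated placeholders, not the CRSP/S&P 500 data used in the paper, and standard errors are omitted:

```python
import numpy as np

rng = np.random.default_rng(1)
n_months, n_stocks, n_factors = 227, 50, 3                # toy dimensions

f = rng.normal(0.0, 0.04, (n_months, n_factors))          # monthly factor returns
beta_true = rng.normal(1.0, 0.5, (n_stocks, n_factors))   # true loadings
r = f @ beta_true.T + rng.normal(0.0, 0.05, (n_months, n_stocks))

# First pass: time-series regression of each stock's return on the factors
# (with intercept) to estimate the loadings b_i.
X = np.column_stack([np.ones(n_months), f])
b_hat = np.linalg.lstsq(X, r, rcond=None)[0][1:, :].T     # n_stocks x n_factors

# Second pass: cross-sectional regression of average returns on the estimated
# loadings, giving the risk-premium estimates gamma = [gamma_1, gamma_2, ...].
Z = np.column_stack([np.ones(n_stocks), b_hat])
gamma = np.linalg.lstsq(Z, r.mean(axis=0), rcond=None)[0]
```

In the paper the second pass is run on the 9,255 individual CRSP stocks, with the first-pass loadings estimated on either the standard factors or the high-frequency PCA component portfolios.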
