
Estimating the Number of Common Factors in Serially Dependent Approximate Factor Models∗

Chirok Han, Department of Economics, Korea University

Ryan Greenaway-McGrevy, Bureau of Economic Analysis, Washington, D.C.

Donggyu Sul, Department of Economics, University of Texas at Dallas

April 2009

Abstract

When panel data have considerable serial dependence, the Bai-Ng criteria are shown to overestimate the true number of common factors. To overcome this problem, we suggest filtering the data before applying the Bai-Ng method. Despite possible bias and model misspecification, both the AR(1) least squares dummy variable (LSDV) and the first-differencing filtering methods are shown to be consistent in both weakly and strongly serially correlated panels. According to simulations, these filtering methods considerably improve on the finite sample performance of the Bai-Ng selection methods applied to the original data.

Keywords: Factor Model, Selection Criteria, Serial Correlation, Weakly Integrated Process, Weak Unit Root, Principal Components Estimator, Bai-Ng Criteria.

JEL Classification: C33

∗ The views expressed herein are those of the authors and not necessarily those of the Bureau of Economic Analysis or the Department of Commerce. Han gratefully acknowledges support from the Marsden Fund. The authors thank Donald Andrews, Yoosoon Chang, In Choi, Yoonseok Lee and H. Roger Moon for helpful comments. Corresponding author: Chirok Han, Department of Economics, Korea University, Anam-dong Seongbuk-Gu, Seoul, Korea, tel. +82 2 3290-2205, fax +82 2 928-4948, email [email protected].


1

Introduction

The precise estimation of the number of common factors is a cornerstone of the latent factor model literature, and several recent studies have suggested methods for selecting the factor number (e.g., Connor and Korajczyk, 1993; Forni et al., 2000; Bai and Ng, 2002; Bai, 2004; Stock and Watson, 2005; Hallin and Liska, 2005; Amengual and Watson, 2007; and Onatski, 2007). In this paper we focus on the impact of serial correlation on Bai and Ng's (2002) information criteria method. In the approximate factor model (Bai and Ng, 2002), weak serial (and cross-sectional) dependence is permitted in the idiosyncratic component as long as N (the cross-section dimension) and T (the time-series dimension) are large. This is because dependence due to the factor structure asymptotically dominates any weak dependence in the idiosyncratic component, so well-designed criteria (e.g., Bai and Ng, 2002) eventually determine the dimension of the factors as both N and T grow. However, if the idiosyncratic component exhibits strong serial dependence relative to the given sample size, or equivalently if the sample is not large enough for the given degree of serial dependence, then the Bai-Ng factor number estimate may differ from the truth with considerable probability. To investigate this issue, we examine the Bai-Ng method in a local asymptotic setting in which the autoregressive parameter lies in a neighborhood of unity that shrinks as T increases. As a prototype, we consider panels $X_{it}$ that follow

$$X_{it} = \lambda_i' F_t + e_{it}, \qquad e_{it} = \rho_T e_{i,t-1} + \varepsilon_{it}, \qquad \varepsilon_{it} \sim \text{iid},$$

where $\rho_T \to 1$ as T increases at a controlled rate.¹ In a dedicated example (Example 1 in Section 2), we show that the Bai-Ng criteria applied to $X_{it}$ can overestimate the true factor number with probability approaching one if $1 - \rho_T = O(T^{-\alpha})$ for any $\alpha > 0$. In essence, consistency of the Bai-Ng criteria requires weaker serial correlation than that implied by $1 - \rho_T = O(T^{-\alpha})$ for any $\alpha > 0$. To overcome this problem, we provide a simple linear filtering method such that the Bai-Ng criteria applied to the transformed data produce consistent estimates for a wide range of degrees of serial dependence in the idiosyncratic component. The filter may be nonrandom or data-dependent. An obvious nonrandom filter is first differencing, which is already widely used in practice. The first-differenced data $\Delta X_{it}$ give a consistent estimate because $\Delta X_{it}$ generally satisfies the regularity conditions of Bai and Ng (2002). But, as one may well guess, first differencing risks over-differencing and thus inducing negative serial correlation in the transformed data. This is particularly the case when $e_{it}$ is close to white noise. A more effective filter in this case is based on pooled least squares dummy variable (LSDV) fitting. In spite of possible model misspecification and estimator bias, the pooled LSDV AR(1) filter is shown to yield a consistent factor number estimate for a wide range of processes with considerable serial dependence. We show that the filter should be common to all cross sections in order to preserve the factor number in the transformed panel. Thus in the LSDV setting, pooling is crucial regardless of whether the 'true' parameters governing serial dependence in $e_{it}$ are heterogeneous. By means of a Monte Carlo study, we show that these approaches work quite well in finite samples. The rest of the paper is organized as follows. In Section 2 we provide an intuitive explanation of why the Bai-Ng criteria applied to the original data may not work well in finite samples. Section 3 provides general theoretical results for the consistency of the estimates based on both the simple AR(1) and first differencing filters. Section 4 discusses how to apply the method to dynamic factor models. Section 5 provides Monte Carlo simulation results and discusses some practically important issues. Section 6 contains concluding remarks. Mathematical proofs and lemmas are gathered in the appendix.

¹ For example, we can let $\rho_T = 1 - cT^{-\alpha}$ for some $c > 0$. The unit-root case corresponds to $\alpha = \infty$. Processes with $\alpha = 1$ have gained special attention in the literature (e.g., Phillips, 1987; Elliott et al., 1996; Cavanagh et al., 1995; Moon and Phillips, 2000 and 2004; Phillips et al., 2001, among others). When $\alpha \in (0, 1)$, the process is said to be nearly stationary (Giraitis and Phillips, 2004; Phillips and Magdalinos, 2007) or weakly integrated (Park, 2003).


2

Inconsistency under Strong Serial Dependence

For the approximate factor model $X_{it} = \lambda_i' F_t + e_{it}$, Bai and Ng (2002) propose estimating the factor number r by minimizing

$$PC(k) = V_{NT}(k) + k\,g(N,T), \qquad V_{NT}(k) = \min_{\{\lambda_i^k\},\{F_t^k\}} \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\big(X_{it} - \lambda_i^{k\prime}F_t^k\big)^2, \tag{1}$$

or some variant thereof, with respect to $k \in \{0, 1, \dots, k_{\max}\}$, where $g(N,T)$ is a penalty function and $\lambda_i^k, F_t^k \in \mathbb{R}^k$. We hereafter refer to the above as the PC criterion. The eigenvalue-eigenvector decomposition of the covariance matrix of either $(X_{i1}, \dots, X_{iT})'$ or $(X_{1t}, \dots, X_{Nt})'$ is usually used to solve the above minimization problem. The $V_{NT}(k)$ term is the average squared residual from a least squares projection of $X_{it}$ onto the column space of the first k eigenvectors, and it decreases monotonically as more eigenvectors are allowed (that is, as k increases). The PC criterion balances $V_{NT}(k)$, decreasing in k, against the penalty $k\,g(N,T)$, increasing in k, so that the criterion is asymptotically minimized at the true factor number r. More precisely, according to Bai and Ng (2002), $V_{NT}(k) - V_{NT}(r) \not\to 0$ for $k < r$, so (1) is not (asymptotically) minimized at any $k < r$ if $g(N,T) \to 0$. On the other hand, if $k > r$, then $V_{NT}(k) - V_{NT}(r) \to 0$ at a certain rate, $C_{NT}^{-2}$ say. So if $g(N,T)$ diminishes to zero at a rate slower than $C_{NT}^{-2}$, the penalty eventually dominates and overfitting is prevented. Under the stationarity assumption (allowing only weak serial and cross-sectional dependence in $e_{it}$), Bai and Ng (2002) show that $C_{NT} = \min\{N, T\}^{1/2}$, so $g(N,T) = C_{NT}^{-2}\log C_{NT}^2$ (and variants thereof) give consistent estimates. When $e_{it}$ shows considerable serial correlation but the sample size is not sufficiently large, the PC criterion may overfit. To see this, we consider a simple example with no common factors and weakly integrated idiosyncratic errors:

Example 1. Let there be no common factors, so $r = 0$, and let $e_{it} = \rho_T e_{i,t-1} + \varepsilon_{it}$, where $\varepsilon_{it} \sim$ iid $N(0,1)$ for simplicity. Assume that $T(1 - \rho_T) \to \infty$ (e.g., $\rho_T = 1 - T^{-\alpha}$ with $0 < \alpha < 1$).

Let $\hat\ell_1, \dots, \hat\ell_T$ be the eigenvalues of $(NT)^{-1}\sum_{i=1}^{N} X_i X_i'$ ordered from largest to smallest, where $X_i = (X_{i1}, \dots, X_{iT})'$. Then

$$V_{NT}(k) - V_{NT}(r) = -\sum_{j=r+1}^{k}\hat\ell_j.$$

In particular, when $r = 0$ we have $V_{NT}(1) - V_{NT}(0) = -\hat\ell_1$, so $PC(1) - PC(0) = -\hat\ell_1 + g(N,T)$. When $N \ge T$ and $T(1 - \rho_T) \to \infty$, we have

$$\hat\ell_1 \ge \frac{c\,(1 - \rho_T^T)}{T(1 - \rho_T)}, \qquad \text{where } \rho_T^T = \Big([1 - (1 - \rho_T)]^{1/(1-\rho_T)}\Big)^{T(1-\rho_T)} \to 0, \tag{2}$$

for some uniform constant $c > 0$ with arbitrarily high probability (see the appendix for the proof). So when $g(N,T) = C_{NT}^{-2}\log C_{NT}^2$ with $C_{NT} = \min\{N,T\}^{1/2} = T^{1/2}$ (assuming $N \ge T$), we asymptotically have

$$PC(1) - PC(0) \le -cT^{-1}(1 - \rho_T)^{-1} + T^{-1}\log T = T^{-1}\big[\log T - c(1 - \rho_T)^{-1}\big],$$

which eventually becomes negative if $(1 - \rho_T)^{-1}$ grows faster than $\log T$ (e.g., if $\rho_T = 1 - T^{-\alpha}$ for any $\alpha > 0$) as $T \to \infty$. This means that $PC(1) < PC(0)$ eventually, and hence the PC criterion overestimates r. This inconsistency extends easily to all the specific penalty functions considered by Bai and Ng (2002). The overfitting arises because, when $e_{it} = \rho_T e_{i,t-1} + \varepsilon_{it}$ with $\rho_T \to 1$, some of Bai and Ng's (2002) regularity conditions are violated. In particular, the variance of $e_{it}$ is of order $(1 - \rho_T^2)^{-1}$ and the autocorrelation $\mathrm{cor}(e_{it}, e_{i,t-k}) = \rho_T^k$, both of which increase as $\rho_T$ approaches unity (violating Assumption C2 in Bai and Ng, 2002). Note that dividing $X_{it}$ by its sample standard deviation does not solve the problem, because the standardized idiosyncratic error $e_{it}/\mathrm{sd}_i(e_{it})$ still exhibits strong autocorrelation.
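Example 1 can be reproduced numerically. The Python/NumPy sketch below is illustrative only. It uses the ICp2 variant of the Bai-Ng criterion (the variant used in the Monte Carlo section), $\ln V_{NT}(k) + k\frac{N+T}{NT}\ln C_{NT}^2$, computes $V_{NT}(k)$ as the sum of all but the k largest eigenvalues $\hat\ell_j$, and assumes each series is demeaned first. The helper names `icp2` and `simulate_example1`, and the fixed value ρ = 0.99 as a finite-sample stand-in for $\rho_T \to 1$, are hypothetical choices.

```python
import numpy as np


def icp2(X, kmax):
    """Bai-Ng ICp2 factor-number estimate for a T x N panel X (time in rows).

    V(k) is the sum of all but the k largest eigenvalues of Xc Xc' / (N T),
    where Xc is X with each series demeaned over time.
    """
    T, N = X.shape
    Xc = X - X.mean(axis=0)                               # demean each cross section
    eig = np.linalg.eigvalsh(Xc @ Xc.T / (N * T))[::-1]   # eigenvalues, largest first
    penalty = (N + T) / (N * T) * np.log(min(N, T))       # C_NT^2 = min{N, T}
    ic = [np.log(eig[k:].sum()) + k * penalty for k in range(kmax + 1)]
    return int(np.argmin(ic))


def simulate_example1(N, T, rho, seed=0):
    """Example 1 (hypothetical helper): r = 0, e_it = rho e_{i,t-1} + eps_it, e_i0 = 0."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((T, N))
    e = np.zeros((T, N))
    e[0] = eps[0]
    for t in range(1, T):
        e[t] = rho * e[t - 1] + eps[t]
    return e


# Neither panel has common factors, so the true r is 0 in both cases.
k_white = icp2(simulate_example1(100, 100, rho=0.0), kmax=7)
k_persistent = icp2(simulate_example1(100, 100, rho=0.99), kmax=7)
print(k_white, k_persistent)
```

With white-noise errors the criterion should typically return 0, while the highly persistent panel typically yields a positive estimate, in line with the overfitting argument above.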


3

Consistent Filtering Procedures

The whitening filter has been used in many areas of econometrics. The basic idea is to remove most of the temporal dependence in the data (usually by an autoregression) in order to make the transformed data closer to white noise. (See Andrews and Monahan, 1992, for references to the use of whitening filters in the existing literature.) We employ autoregressive filtering, and as such we must first address two preliminary specification issues: (i) whether to perform individual or pooled filtering, and (ii) the AR lag order. To address the first issue, consider the transformed data

$$Z_{it} = X_{it} - \sum_{j=1}^{p}\phi_{ij}X_{i,t-j}, \qquad X_{it} = \lambda_i'F_t + e_{it},$$

where the filter coefficients $\phi_{ij}$ are permitted to differ across i. Let r be the true factor number. Writing $Z_{it}$ as

$$Z_{it} = \lambda_i'\Big(F_t - \sum_{j=1}^{p}\phi_{ij}F_{t-j}\Big) + e_{it} - \sum_{j=1}^{p}\phi_{ij}e_{i,t-j},$$

we see that if $\phi_{ij} = \phi_j$ (i.e., homogeneous for all i), then the common factors of $Z_{it}$ are $F_t - \sum_{j=1}^{p}\phi_jF_{t-j}$ and the dimension of the factors is preserved under the transformation. (Note that we consider static factors here; dynamic factors are considered in the next section.) Without the homogeneity restriction, the filtered common component $\lambda_i'(F_t - \sum_{j=1}^{p}\phi_{ij}F_{t-j})$ cannot generally be expressed as a factor structure with the same dimension as $F_t$, though we can write it in terms of the $r(p+1)$ common factors $(F_t', F_{t-1}', \dots, F_{t-p}')'$. Thus the filter must have coefficients common to all i in order to preserve the factor number in the filtered data $Z_{it}$. The second issue is the choice of the lag order p. Conveniently, an AR(1) fitting (p = 1),

$$Z_{it} = X_{it} - \phi X_{i,t-1}, \tag{3}$$

is sufficient for consistent factor number estimation for many common panel processes, as we show below. Other orders p can of course also be used, but we do not see any particular advantage in using more lags. Hence we focus only on AR(1) filtering throughout the paper. Note that φ need not be a true AR(1) coefficient, in the sense that $X_{it} - \phi X_{i,t-1}$ may not be independently distributed over t for any given i. We now turn to deriving pooled AR(1) filters that lead to consistent factor number estimates. We consider two methods for choosing φ. The first is a nonrandom filter with φ = 1 (first difference); the second is a data-dependent filter that uses the least squares dummy variable estimator, $\hat\phi_{lsdv}$. (More specifically, the LSDV estimator considered has an idiosyncratic constant for every cross section, thereby permitting the time-series mean of $X_{it}$ to differ across i. We do not include dummies for time effects.) For this latter filter, we demonstrate below that consistent factor-number estimation does not require $\hat\phi_{lsdv}$ to be consistent for a "true" AR(1) coefficient. The rest of this section is organized as follows. We first derive the consistency conditions for the first differencing filter in Section 3.1. We then consider other nonrandom filters in Section 3.2 as an intermediate step toward deriving the conditions required for the consistency of the LSDV filtering method, which are addressed in the final subsection. Our consistency results are built on the main findings of Bai and Ng (2002). More precisely, we show that properly transformed data satisfy their assumptions (Assumptions A–D of Bai and Ng). For completeness we present these assumptions here as the definition of "regularity":
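The two pooled AR(1) filters just described can be sketched in Python/NumPy as follows. This is a minimal illustration, assuming a balanced T × N panel stored with time in rows; the function names `lsdv_ar1`, `ar1_filter` and `simulate_ar1_panel` are hypothetical. The LSDV estimate includes one intercept per cross section (implemented by within-demeaning) and no time dummies, and a single φ is applied to every i, as required to preserve the factor number.

```python
import numpy as np


def lsdv_ar1(X):
    """Pooled LSDV estimate of phi in X_it = a_i + phi * X_{i,t-1} + u_it.

    X is a T x N array. The individual intercepts a_i are swept out by
    demeaning each cross section over t = 2..T; no time dummies are used.
    """
    y = X[1:] - X[1:].mean(axis=0)      # X_it, demeaned within i
    x = X[:-1] - X[:-1].mean(axis=0)    # X_{i,t-1}, demeaned within i
    return float((x * y).sum() / (x * x).sum())


def ar1_filter(X, phi):
    """Pooled AR(1) filter Z_it = X_it - phi * X_{i,t-1} (one phi for all i)."""
    return X[1:] - phi * X[:-1]


def simulate_ar1_panel(N, T, rho, burn=100, seed=0):
    """Hypothetical helper: N independent AR(1) series with coefficient rho."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((T + burn, N))
    e = np.zeros((T + burn, N))
    for t in range(1, T + burn):
        e[t] = rho * e[t - 1] + eps[t]
    return e[burn:]


X = simulate_ar1_panel(N=200, T=100, rho=0.5)
phi_hat = lsdv_ar1(X)                 # close to, and typically slightly below, 0.5
Z_lsdv = ar1_filter(X, phi_hat)       # LSDV-filtered panel
Z_fd = ar1_filter(X, 1.0)             # first differences (phi = 1)
```

The small downward deviation of `phi_hat` typically seen in such runs reflects the finite-T bias of the LSDV estimator; the consistency argument in the text does not require $\hat\phi_{lsdv}$ to be unbiased or consistent.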

Definition (BN-regularity). A common factor process $\{F_t\}$, idiosyncratic error processes $\{e_{it}\}$, and factor loadings $\{\lambda_i\}$ are said to be BN-regular if

A. $E\|F_t\|^4 < \infty$ and both $T^{-1}\sum_{t=1}^{T}F_tF_t'$ and $T^{-1}\sum_{t=1}^{T}E(F_tF_t')$ converge to a nonrandom positive definite matrix as $T \to \infty$;

B. $\|\lambda_i\| \le \bar\lambda < \infty$ and $N^{-1}\sum_{i=1}^{N}\lambda_i\lambda_i'$ converges to a nonsingular matrix as $N \to \infty$;

C. There exists a positive constant $M < \infty$ such that for all N and T,
1. $E(e_{it}) = 0$ and $E|e_{it}|^8 \le M$;
2. $N^{-1}\sum_{i=1}^{N}Ee_{is}e_{it} = \gamma_N(s,t)$, $|\gamma_N(s,s)| \le M$ for all s, and $T^{-1}\sum_{s=1}^{T}\sum_{t=1}^{T}|\gamma_N(s,t)| \le M$;
3. $E(e_{it}e_{jt}) = \tau_{ij,t}$ with $|\tau_{ij,t}| \le |\tau_{ij}|$ for some $\tau_{ij}$ and for all t; in addition, $N^{-1}\sum_{i=1}^{N}\sum_{j=1}^{N}|\tau_{ij}| \le M$;
4. $E(e_{it}e_{js}) = \tau_{ij,ts}$ and $(NT)^{-1}\sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{t=1}^{T}\sum_{s=1}^{T}|\tau_{ij,ts}| \le M$;
5. for every (t, s), $E\big|N^{-1/2}\sum_{i=1}^{N}(e_{is}e_{it} - Ee_{is}e_{it})\big|^4 \le M$;

D. $E\Big(\frac{1}{N}\sum_{i=1}^{N}\Big\|\frac{1}{\sqrt{T}}\sum_{t=1}^{T}F_te_{it}\Big\|^2\Big) \le M$.

3.1

Consistency of the First Differencing Filter

Let $\hat k(\phi)$ be the factor number estimate obtained by applying a Bai-Ng criterion to the filtered data $X_{it} - \phi X_{i,t-1}$. In this subsection we establish the consistency of $\hat k(1)$. To do so we make the following assumption:

Assumption 1. $\{\lambda_i\}$, $\{\Delta F_t\}$ and $\{\Delta e_{it}\}$ are BN-regular.

This assumption is generally satisfied if the common factors and the idiosyncratic errors are at most once integrated. When this assumption is violated, higher order autoregressive filtering may solve the problem, and the extension to this case would be straightforward.

Theorem 1 (FD filter). Under Assumption 1, $\hat k(1) \to_p r$.

Theorem 1 follows directly from Bai and Ng (2002). According to this theorem, applying the Bai-Ng criteria to the differenced data produces a consistent factor number estimator. However, unless the idiosyncratic error is exactly a random walk, first differencing does not eliminate the dependence in the transformed data. Indeed, if the idiosyncratic error is close to white noise, first differencing creates negative serial correlation in the filtered panel. As we illustrate in simulations later (Table 1), this over-differencing reduces the frequency of selecting the correct number of factors in finite samples when the serial dependence in $e_{it}$ is weak and T is small. To avoid this problem we consider filtering based on an LSDV fitting. However, as an intermediate step, we next consider nonrandom but parameter-dependent filtering methods. This step is instructive because it shows how much bias and misspecification is permissible in a data-dependent procedure if the Bai-Ng criteria are to be consistent.
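The size of the over-differencing effect is easy to quantify in the extreme case where the idiosyncratic error is white noise, $e_{it} = \varepsilon_{it}$ iid $(0, \sigma^2)$ (an illustrative special case, not a model used elsewhere in the text):

$$\operatorname{Var}(\Delta e_{it}) = 2\sigma^2, \qquad \operatorname{Cov}(\Delta e_{it}, \Delta e_{i,t-1}) = \operatorname{Cov}(\varepsilon_{it} - \varepsilon_{i,t-1},\ \varepsilon_{i,t-1} - \varepsilon_{i,t-2}) = -\sigma^2, \qquad \operatorname{corr}(\Delta e_{it}, \Delta e_{i,t-1}) = -\tfrac{1}{2}.$$

So first differencing turns a serially uncorrelated error into an MA(1) error with first-order autocorrelation of −1/2.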

3.2

Nonrandom Filters

Let $\phi_T$ be a nonrandom number, possibly dependent on the time-series dimension. We ask under what conditions this nonrandom filter yields a consistent factor number estimate $\hat k(\phi_T)$. To investigate this issue, we start by rewriting the filtered data as

$$Z_{it} = (X_{it} - X_{i,t-1}) + (1 - \phi_T)X_{i,t-1} = \Delta X_{it} + (1 - \phi_T)X_{i,t-1}. \tag{4}$$

For stationary and once integrated processes, we may assume that the first term satisfies BN-regularity (i.e., $\lambda_i$, $\Delta F_t$, and $\Delta e_{it}$ are BN-regular). Next we outline the required regularity for the second term in (4). Let $\sigma_{e,T}^2 = (NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}Ee_{it}^2$ and $\Sigma_{FF,T} = T^{-1}\sum_{t=1}^{T}E(F_tF_t')$. Note that $\sigma_{e,T}^2$ may depend on N as well as T, but the N subscript is omitted for notational brevity. (It does not depend on N if the random variables are iid across i.) Also, $\sigma_{e,T}^2$ and $\Sigma_{FF,T}$ may diverge as $T \to \infty$. Further define $e_{it}^* = \sigma_{e,T}^{-2}e_{it}$ and $F_t^* = \Sigma_{FF,T}^{-1}F_t$. It is worth noting that $e_{it}$ and $F_t$ are divided by their variances rather than their standard deviations in the definition of $e_{it}^*$ and $F_t^*$, so $(NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}Ee_{it}^{*2} = \sigma_{e,T}^{-2}$ and $T^{-1}\sum_{t=1}^{T}E(F_t^*F_t^{*\prime}) = \Sigma_{FF,T}^{-1}$. The reason for this normalization is to ensure both that $e_{it}^*$ and $F_t^*$ behave regularly when the original processes $e_{it}$ and $F_t$ are stationary, and that $e_{it}^*$ and $F_t^*$ are negligible when $e_{it}$ and $F_t$ are integrated. Hence the downscaled variables $F_{t-1}^*$ and $e_{i,t-1}^*$ do not by themselves endanger the validity of the Bai-Ng method. Now $Z_{it}$ of (4) can be rewritten as

$$Z_{it} = \lambda_i'\big[\Delta F_t + (1 - \phi_T)\Sigma_{FF,T}F_{t-1}^*\big] + \big[\Delta e_{it} + (1 - \phi_T)\sigma_{e,T}^2e_{i,t-1}^*\big], \tag{5}$$

so the common factors of the transformed series $Z_{it}$ are $\Delta F_t + (1 - \phi_T)\Sigma_{FF,T}F_{t-1}^*$ and the idiosyncratic component is $\Delta e_{it} + (1 - \phi_T)\sigma_{e,T}^2e_{i,t-1}^*$. If $\phi_T$ is chosen such that $(1 - \phi_T)\Sigma_{FF,T}$ and $(1 - \phi_T)\sigma_{e,T}^2$ are bounded, then these new common factors and idiosyncratic components are likely to satisfy BN-regularity. For a rigorous treatment along these lines, we make the following regularity assumption and then present several remarks showing when Assumption 2 is usually satisfied.

Assumption 2. For any constants $b_1$ and $b_2$, $\{\lambda_i\}$, $\{\Delta F_t + b_1F_{t-1}^*\}$ and $\{\Delta e_{it} + b_2e_{i,t-1}^*\}$ are BN-regular.

Remark 1. If $\{F_t\}$ itself is BN-regular, then $\Delta F_t + b_1F_{t-1}^*$ is also BN-regular for any given $b_1$ as long as $T^{-1}\sum_{t=1}^{T}F_tF_{t-1}'$ follows a law of large numbers, which is usually the case. Similarly, if $\{e_{it}\}$ itself is BN-regular, then $\Delta e_{it} + b_2e_{i,t-1}^*$ is also BN-regular for any $b_2$.

Remark 2. When $F_t$ is highly serially correlated, $F_t$ itself may violate Condition A of BN-regularity. However, even in this case $\{\Delta F_t + b_1F_{t-1}^*\}$ satisfies the condition for any constant $b_1$. To see this, let $F_t$ be a scalar, let $T^{-1}\sum_{t=1}^{T}(\Delta F_t)^2$ follow a law of large numbers, and let $F_t \sim I(1)$ be such that $T^{-1/2}F_t$ follows an invariance principle. Let $\tilde F_t = \Delta F_t + b_1F_{t-1}^*$. Then

$$\frac{1}{T}\sum_{t=1}^{T}\tilde F_t^2 = \frac{1}{T}\sum_{t=1}^{T}(\Delta F_t)^2 + \frac{2b_1}{\Sigma_{FF,T}}\cdot\frac{1}{T}\sum_{t=1}^{T}\Delta F_tF_{t-1} + \frac{Tb_1^2}{\Sigma_{FF,T}^2}\cdot\frac{1}{T^2}\sum_{t=1}^{T}F_{t-1}^2,$$

where $T^{-1}\sum_{t=1}^{T}\Delta F_tF_{t-1}$ and $T^{-2}\sum_{t=1}^{T}F_{t-1}^2$ are $O_p(1)$. But $\Sigma_{FF,T}^{-1} = O(T^{-1})$, so

$$\frac{1}{T}\sum_{t=1}^{T}\tilde F_t^2 = \frac{1}{T}\sum_{t=1}^{T}(\Delta F_t)^2 + O_p(T^{-1}),$$

thus Condition A of BN-regularity is satisfied by $\tilde F_t$.

Remark 3. The key condition that would be violated by $e_{it}$ when it is strongly serially correlated is Condition C2 of BN-regularity. But $\Delta e_{it} + b_2e_{i,t-1}^*$ still satisfies this condition for any constant $b_2$ even when $e_{it}$ is quite persistent. To see this, let $e_{it}$ be iid across i, and let $\tilde e_{it} = \Delta e_{it} + b_2e_{i,t-1}^*$, where $e_{i,t-1}^* = \sigma_{e,T}^{-2}e_{i,t-1}$ and $\sigma_{e,T}^2 = T^{-1}\sum_{t=1}^{T}Ee_{it}^2$. We want to check whether Condition C2 of BN-regularity is satisfied by $e_{it}^*$, i.e., whether

$$\frac{1}{T}\sum_{t=1}^{T}\sum_{s=1}^{T}\Big|\frac{1}{N}\sum_{i=1}^{N}Ee_{it}^*e_{is}^*\Big| = \frac{1}{T}\sum_{t=1}^{T}\sum_{s=1}^{T}|Ee_{it}^*e_{is}^*| \le M < \infty.$$

Suppose that $e_{it} = \sum_{j=0}^{\infty}c_{Tj}\varepsilon_{i,t-j}$, where the $\varepsilon_{it}$ are iid and the $c_{Tj}$ may depend on T so that local asymptotic analysis can be included. Then $Ee_{it}^2 = \sum_{j=0}^{\infty}c_{Tj}^2$, and

$$\frac{1}{T}\sum_{t=1}^{T}\sum_{s=1}^{T}|Ee_{it}^*e_{is}^*| \le \frac{\sum_{j=0}^{\infty}c_{Tj}^2 + 2\sum_{k=1}^{\infty}\sum_{j=0}^{\infty}|c_{Tj}c_{T,j+k}|}{\big(\sum_{j=0}^{\infty}c_{Tj}^2\big)^2} \le \frac{1}{\sum_{j=0}^{\infty}c_{Tj}^2}\left[1 + \frac{2\sum_{j=0}^{\infty}|c_{Tj}|\sum_{k=1}^{\infty}|c_{T,j+k}|}{\sum_{j=0}^{\infty}c_{Tj}^2}\right].$$

This is bounded if $\sum_{j=0}^{\infty}|c_{Tj}|$ and $\sum_{j=0}^{\infty}c_{Tj}^2$ are of the same order (as T increases), which is the case for general square-summable processes. (Nonsummable processes cannot be handled by this argument.) More importantly, this bound converges to zero if $\sum_{j=0}^{\infty}c_{Tj}^2 \to \infty$ as $T \to \infty$ (though it is finite for all T) and if $\sum_{k=1}^{\infty}|c_{T,j+k}| \le M\sup_{k\ge0}|c_{T,j+k}|$ for some $M < \infty$, which happens if, for example, $e_{it}$ is weakly integrated. Once the regularity of $e_{it}^*$ is shown, it is straightforward to show that $\Delta e_{it} + b_2e_{i,t-1}^*$ satisfies Condition C2 of BN-regularity.

Remark 4. If $e_{it} = e_{i0} + \sum_{s=1}^{t}\varepsilon_{is}$ (i.e., integrated), where the $\varepsilon_{it}$ are iid and independent of $e_{i0}$, then $\sigma_{e,T}^2 = O(\sigma_{0,T}^2) + O(T^2)$ (and this is the exact order) and $T^{-1}\sum_{t=1}^{T}\sum_{s=1}^{T}|Ee_{it}e_{is}|$ is $O(T\sigma_{0,T}^2) + O(T^2)$, where $\sigma_{0,T}^2 = Ee_{i0}^2$. So

$$\frac{1}{T}\sum_{t=1}^{T}\sum_{s=1}^{T}|Ee_{it}^*e_{is}^*| = O\!\left(\frac{T\sigma_{0,T}^2 + T^2}{[\sigma_{0,T}^2 + T^2]^2}\right) = O\!\left(\frac{\sigma_{0,T}^2/T + 1}{[\sigma_{0,T}^2/T + T]^2}\right) \to 0,$$

thus validating Condition C2 of BN-regularity for $\Delta e_{it} + b_2e_{i,t-1}^*$ for any constant $b_2$.

Remark 5. Assumption 2 can be satisfied even if the persistence differs across i. For example, suppose that $e_{it} = \rho_ie_{i,t-1} + \varepsilon_{it}$, where $\rho_i = 1 - c_i/T^{\alpha}$ with $0 < \underline{c} \le c_i \le \bar c < \infty$ and $\varepsilon_{it} \sim$ iid $(0, \sigma_\varepsilon^2)$. Also suppose that the $c_i$ are iid. Then $\sigma_{e,T}^2 = \sigma_\varepsilon^2E[(1 - \rho_i^2)^{-1}] = \sigma_\varepsilon^2T^{\alpha}E[c_i^{-1}(1 + \rho_i)^{-1}]$. When T is large, so that $\rho_i > 0$ for all i, we furthermore have

$$\frac{1}{T}\sum_{t=1}^{T}\sum_{s=1}^{t}|Ee_{it}e_{is}| \le \sum_{r=0}^{\infty}|Ee_{it}e_{i,t-r}| = E\left[\frac{\sigma_\varepsilon^2}{(1 - \rho_i^2)(1 - \rho_i)}\right] = \sigma_\varepsilon^2T^{2\alpha}E[c_i^{-2}(1 + \rho_i)^{-1}].$$

Therefore,

$$\frac{1}{T}\sum_{t=1}^{T}\sum_{s=1}^{t}|Ee_{it}^*e_{is}^*| \le \frac{E[c_i^{-2}(1 + \rho_i)^{-1}]}{\sigma_\varepsilon^2\big(E[c_i^{-1}(1 + \rho_i)^{-1}]\big)^2} \le \frac{4E(c_i^{-2})}{\sigma_\varepsilon^2(Ec_i^{-1})^2} \le \frac{4\underline{c}^{-2}}{\sigma_\varepsilon^2\bar c^{-2}} = \frac{4(\bar c/\underline{c})^2}{\sigma_\varepsilon^2} < \infty,$$

and Condition C2 of BN-regularity is satisfied.

Given Assumption 2, the following is true.

Theorem 2 (Nonrandom filtering). Under Assumption 2, if (i) $(1 - \phi_T)\Sigma_{FF,T}$ converges to a finite limit, and (ii) $(1 - \phi_T)\sigma_{e,T}^2 = O(1)$, then $\hat k(\phi_T) \to_p r$.

An intuitive explanation of the result is as follows. According to (5), the common factor of $X_{it} - \phi_TX_{i,t-1}$ is $\Delta F_t + (1 - \phi_T)\Sigma_{FF,T}F_{t-1}^*$ and the idiosyncratic error is $\Delta e_{it} + (1 - \phi_T)\sigma_{e,T}^2e_{i,t-1}^*$. Assumption 2 states that $\Delta F_t + b_1F_{t-1}^*$ and $\Delta e_{it} + b_2e_{i,t-1}^*$ are BN-regular for any constants $b_1$ and $b_2$. Conditions (i) and (ii) of the theorem restrict $\phi_T$ so that the terms corresponding to $b_1$ and $b_2$ behave as required under Assumption 2 in the limit as $T \to \infty$. If $e_{it}$ is integrated, then $\sigma_{e,T}^2$ grows at rate T. So if, for example, $\phi_T = 1 - cT^{-1}$, then conditions (i) and (ii) of the theorem are satisfied, and the Bai-Ng method applied to $X_{it} - \phi_TX_{i,t-1}$ yields a consistent factor number estimate. Although this filter is not practically useful (because in practice T is given, and the performance would depend on the constant c), it tells us how much bias is allowed when we consider a data-dependent filtering procedure. This is the topic we investigate next.
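As a worked check of the $\phi_T = 1 - cT^{-1}$ example, consider the simplest integrated case, assuming for illustration $e_{i0} = 0$ and $e_{it} = \sum_{s=1}^{t}\varepsilon_{is}$ with $\varepsilon_{is}$ iid $(0, \sigma_\varepsilon^2)$:

$$Ee_{it}^2 = t\,\sigma_\varepsilon^2, \qquad \sigma_{e,T}^2 = \frac{1}{T}\sum_{t=1}^{T}t\,\sigma_\varepsilon^2 = \frac{(T+1)\,\sigma_\varepsilon^2}{2}, \qquad (1 - \phi_T)\,\sigma_{e,T}^2 = \frac{c}{T}\cdot\frac{(T+1)\,\sigma_\varepsilon^2}{2} \to \frac{c\,\sigma_\varepsilon^2}{2},$$

so condition (ii) of Theorem 2 holds for any fixed c.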

3.3

Consistency of the LSDV Filter

In this subsection we consider a specific data-dependent filter, namely the LSDV filter. AR(1) LSDV prewhitening raises the following issues. First, the AR(1) model may be misspecified in the sense that there is no φ such that $X_{it} - \phi X_{i,t-1}$ is independent over time; second, the LSDV estimator is biased (toward zero) even when $X_{it}$ follows an AR(1); and third, the LSDV estimator is random. The problems of misspecification and bias were discussed in the previous subsection in a general context. In this subsection, we show that the center of the LSDV estimator satisfies the conditions given in Section 3.2. We then address the randomness of the LSDV filter $\hat\phi_{lsdv}$. In this way we verify that $\hat\phi_{lsdv}$ satisfies all the conditions required for the prewhitened data $\hat Z_{it} := X_{it} - \hat\phi_{lsdv}X_{i,t-1}$ to satisfy BN-regularity. Let us rewrite the filtered data as

$$\hat Z_{it} = \Delta X_{it} + (1 - \hat\phi_{lsdv})X_{i,t-1} = \Delta X_{it} + (1 - \phi_T)X_{i,t-1} - (\hat\phi_{lsdv} - \phi_T)X_{i,t-1}, \tag{6}$$

where $\phi_T$ can be understood as the center of $\hat\phi_{lsdv}$. Comparing (6) to (4), we see that in the data-dependent framework the filtered data involve an additional term compared to the nonrandom framework, namely $(\hat\phi_{lsdv} - \phi_T)X_{i,t-1}$. Hence, for the Bai-Ng criteria to be consistent when applied to $\hat Z_{it}$, we require not only that the center $\phi_T$ satisfies conditions (i) and (ii) of Theorem 2, but also that the variability of $\hat\phi_{lsdv}$ around $\phi_T$ is limited. The third term can be written as $(\hat\phi_{lsdv} - \phi_T)\sigma_{X,T}^2\cdot\sigma_{X,T}^{-2}X_{i,t-1}$, where $\sigma_{X,T}^2 = (NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}EX_{it}^2$ (with the N subscript again suppressed for notational brevity). Because $\sigma_{X,T}^{-2}X_{i,t-1}$ is bounded, we can expect that if $(\hat\phi_{lsdv} - \phi_T)\sigma_{X,T}^2 = o_p(1)$, then this third term has a negligible impact on the behavior of $\hat Z_{it}$ (see Theorem 3 and Remark 7). As discussed above, the filter is employed to reduce the serial dependence in $e_{it}$. However, for this method to work, any strong serial correlation in $F_t$ and $e_{it}$ should be explained by an AR(1) structure. This is satisfied by a wide class of processes, including integrated and nearly integrated processes, but it is not always possible, in particular if the process is I(2). Formally, we make the following assumption.

Assumption 3. The common factors $F_t$ and idiosyncratic errors $e_{it}$ satisfy

$$\frac{1}{T}\sum_{t=1}^{T}EF_{t-1}\Delta F_t' = O(1) \qquad\text{and}\qquad \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}Ee_{i,t-1}\Delta e_{it} = O(1).$$

Remark 6. Assumption 3 is satisfied by general stationary and integrated (of order one) processes. For example, if $e_{it}$ is integrated, then for each i the limit of $T^{-1}\sum_{t=1}^{T}e_{i,t-1}\Delta e_{it}$ can be expressed as a stochastic integral, and Assumption 3 is satisfied. If $e_{it} = \rho_Te_{i,t-1} + \varepsilon_{it}$ for some absolutely summable linear stationary process $\varepsilon_{it}$, then $Ee_{i,t-1}\varepsilon_{it} = \sum_{j=0}^{\infty}\rho_T^j\gamma_{j+1}$ and $Ee_{i,t-1}^2 = \sum_{j=0}^{\infty}\rho_T^{2j}\gamma_0 + 2\sum_{k=1}^{\infty}\sum_{j=0}^{\infty}\rho_T^{2j+k}\gamma_k$, where $\gamma_k = E\varepsilon_{it}\varepsilon_{i,t-k}$. Because $\sum_{k=1}^{\infty}|\gamma_k| < \infty$ (absolute summability), $|Ee_{i,t-1}\varepsilon_{it}| \le \sum_{j=0}^{\infty}|\gamma_j| < \infty$ and

$$(1 - \rho_T)Ee_{i,t-1}^2 \le \frac{\gamma_0}{1 + \rho_T} + \frac{2}{1 + \rho_T}\sum_{k=1}^{\infty}|\gamma_k| = O(1).$$

Since $Ee_{i,t-1}\Delta e_{it} = -(1 - \rho_T)Ee_{i,t-1}^2 + Ee_{i,t-1}\varepsilon_{it}$, Assumption 3 holds for $e_{it}$. The common factors $F_t$ are treated similarly. Note that Assumption 3 is not satisfied if $e_{it}$ or $F_t$ is more than once integrated. We have the following result.

Theorem 3 (LSDV filtering). Under Assumptions 2 and 3, if $\sigma_{e,T}^2 = o(T)$ and $\Sigma_{FF,T} = o(T)$, then $\hat k(\hat\phi_{lsdv}) \to_p r$.

The theorem states that the Bai-Ng factor number estimator based on the AR(1) LSDV-filtered data $X_{it} - \hat\phi_{lsdv}X_{i,t-1}$ is consistent under suitable regularity. Unlike Assumptions 2 and 3, the conditions $\sigma_{e,T}^2 = o(T)$ and $\Sigma_{FF,T} = o(T)$ are binding, as the following remark shows.

Remark 7. Suppose that $e_{it} = \rho_Te_{i,t-1} + \varepsilon_{it}$, where $\varepsilon_{it}$ is stationary. Let $\gamma_k = E\varepsilon_{it}\varepsilon_{i,t-k}$ and further assume $\varepsilon_{it} = \sum_{j=0}^{\infty}c_jw_{i,t-j}$ with $w_{it}$ iid and $\sum_{j=0}^{\infty}|c_j| < \infty$. Then $\sum_{k=0}^{\infty}|\gamma_k| < \infty$. If $T(1 - \rho_T) \to \infty$, then $\sigma_{e,T}^2 = o(T)$: indeed $Ee_{it}^2 = O((1 - \rho_T)^{-1})$, so $\sigma_{e,T}^2 = O((1 - \rho_T)^{-1}) = o(T)$. If $1 - \rho_T = cT^{-\alpha}$ for some $c > 0$, this condition holds if $\alpha < 1$. The process for the common factor $F_t$ is treated similarly.

The AR(1) LSDV based method consistently estimates the factor number if the process is reasonably far from a unit root (e.g., $1 - \rho_T = cT^{-\alpha}$ with $\alpha < 1$), while first differencing works well when the process is closer to a unit root (e.g., $1 - \rho_T = O(T^{-1})$) or even integrated. On the other hand, LSDV prewhitening performs better than first differencing when the serial correlation is not too strong. Since the strength of the dependence is unknown in practice, a simple strategy for combining these two methods is discussed in Section 5. According to simulations, the small sample performance of this combined method is better than that of first differencing or LSDV alone. Other values of φ may also satisfy Assumptions 1 and 2, but the validity of each should be checked before it is employed.
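The threshold $\alpha < 1$ in Remark 7 can be made explicit in the simplest case, assuming for illustration iid $\varepsilon_{it}$ with variance $\sigma_\varepsilon^2$, a stationary start, and $\rho_T = 1 - T^{-\alpha}$:

$$\sigma_{e,T}^2 = \frac{\sigma_\varepsilon^2}{1 - \rho_T^2} = \frac{\sigma_\varepsilon^2}{(1 - \rho_T)(1 + \rho_T)} = \frac{\sigma_\varepsilon^2\,T^{\alpha}}{2 - T^{-\alpha}}, \qquad \frac{\sigma_{e,T}^2}{T} = \frac{\sigma_\varepsilon^2\,T^{\alpha-1}}{2 - T^{-\alpha}} \to \begin{cases} 0, & \alpha < 1, \\ \sigma_\varepsilon^2/2, & \alpha = 1, \\ \infty, & \alpha > 1. \end{cases}$$

So $\sigma_{e,T}^2 = o(T)$ exactly when $\alpha < 1$, which is the region covered by Theorem 3 for the LSDV filter; at $\alpha \ge 1$ the first differencing filter is the relevant one.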

4

Filtering Procedure for Dynamic Factor Models

The proposed filtering procedures can also be used to estimate the number of dynamic factors underlying a panel. Following Amengual and Watson (2007), we begin with the vector autoregressive dynamics for the factors, $F_t = \sum_{j=1}^{p}\Pi_jF_{t-j} + G\eta_t$, where $\eta_t$ is $q \times 1$ and G is $r \times q$ with full column rank (see Amengual and Watson, 2007, for details). When $X_{it} = \lambda_i'F_t + e_{it}$, estimating the number q of dynamic factor shocks $\eta_t$ requires a consistent first-step estimate of the number r of static factors $F_t$ (see Amengual and Watson, 2007, or Bai and Ng, 2007, for more details on the second step in estimating q). Filtering the data using the methods suggested above preserves the dimensions of both the static and dynamic factors because

$$F_t - \phi F_{t-1} = \sum_{j=1}^{p}\Pi_j(F_{t-j} - \phi F_{t-j-1}) + G(\eta_t - \phi\eta_{t-1}).$$

Here the transformed static factors $F_t - \phi F_{t-1}$ are still $r \times 1$, while the transformed dynamic factors $\eta_t - \phi\eta_{t-1}$ are still $q \times 1$. When the autoregressive coefficients $\Pi_j$ are such that $F_t$ is nearly integrated, the Bai-Ng criteria applied to the raw $X_{it}$ will estimate the factor number inconsistently, as we have seen in Section 2. In that case, the number of static factors can be consistently estimated using the prewhitening method explained in the previous section. That is, applying the Bai-Ng criteria to the filtered data consistently estimates the number of transformed static factors $F_t - \phi F_{t-1}$ required in the first step.
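The dimension-preservation argument can be checked numerically: applying one common φ to the common component $\lambda_i'F_t$ leaves its rank at r, whereas cross-section-specific coefficients $\phi_i$ raise it to $r(p+1)$, here $2r$ for p = 1, as noted in Section 3. The Python/NumPy sketch below is illustrative; the sample sizes, the iid factor draws, and the uniform draws for $\phi_i$ are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, r = 200, 50, 2
F = rng.standard_normal((T, r))         # static factors F_t (rows are t)
lam = rng.normal(1.0, 1.0, (N, r))      # loadings lambda_i
C = F @ lam.T                           # common component lambda_i' F_t, T x N

phi = 0.6                               # one coefficient shared by all i
C_pooled = C[1:] - phi * C[:-1]         # equals (F_t - phi F_{t-1}) lambda_i'

phi_i = rng.uniform(0.0, 1.0, N)        # cross-section-specific coefficients
C_indiv = C[1:] - phi_i * C[:-1]        # column i is filtered with its own phi_i

rank_pooled = np.linalg.matrix_rank(C_pooled)
rank_indiv = np.linalg.matrix_rank(C_indiv)
print(rank_pooled, rank_indiv)          # r = 2 and 2r = 4
```

With the common φ the filtered common component still has r factors, which is why the filter must be pooled.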

5

Monte Carlo Studies

As discussed at the end of Section 3, the first differencing filter works well when the idiosyncratic process is close to a unit root, while the AR(1) LSDV filter works well with less persistence. In practice the degree of persistence is unknown and hard to gauge, particularly when the sample size is small. Hence a simple way to enhance small sample performance is to choose the minimum of the factor number estimates from first differencing and LSDV filtering, i.e.,

$$\hat k_{\min} = \min\big\{\hat k(1),\ \hat k(\hat\phi_{lsdv})\big\}. \tag{7}$$
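The minimum rule (7) can be sketched as follows. This self-contained Python/NumPy illustration is hypothetical, not the authors' code: it uses the ICp2 variant of the Bai-Ng criterion, demeans each filtered series before extracting eigenvalues, and simulates a two-factor panel with AR(1) factors (θ = 0.5) and persistent AR(1) idiosyncratic errors (ρ = 0.9). These settings and the names `icp2`, `lsdv_ar1`, `min_rule` and `simulate_static` are assumptions.

```python
import numpy as np


def icp2(X, kmax):
    """Bai-Ng ICp2 estimate for a T x N panel (each series demeaned first)."""
    T, N = X.shape
    Xc = X - X.mean(axis=0)
    eig = np.linalg.eigvalsh(Xc @ Xc.T / (N * T))[::-1]   # largest first
    penalty = (N + T) / (N * T) * np.log(min(N, T))       # C_NT^2 = min{N, T}
    ic = [np.log(eig[k:].sum()) + k * penalty for k in range(kmax + 1)]
    return int(np.argmin(ic))


def lsdv_ar1(X):
    """Pooled LSDV AR(1) coefficient with individual intercepts, no time dummies."""
    y = X[1:] - X[1:].mean(axis=0)
    x = X[:-1] - X[:-1].mean(axis=0)
    return float((x * y).sum() / (x * x).sum())


def min_rule(X, kmax=7):
    """Minimum rule (7): the smaller of the FD- and LSDV-filtered estimates."""
    k_fd = icp2(np.diff(X, axis=0), kmax)               # phi = 1
    phi_hat = lsdv_ar1(X)
    k_lsdv = icp2(X[1:] - phi_hat * X[:-1], kmax)       # phi = phi_hat_lsdv
    return min(k_fd, k_lsdv), k_fd, k_lsdv


def simulate_static(N, T, rho, theta=0.5, burn=100, seed=0):
    """Hypothetical DGP: two AR(1) factors, N(1, 1) loadings, AR(1) errors."""
    rng = np.random.default_rng(seed)
    S = T + burn
    v = rng.standard_normal((S, 2))
    u = rng.standard_normal((S, N))
    F = np.zeros((S, 2))
    e = np.zeros((S, N))
    for t in range(1, S):
        F[t] = theta * F[t - 1] + v[t]
        e[t] = rho * e[t - 1] + u[t]
    lam = rng.normal(1.0, 1.0, (N, 2))
    return (F @ lam.T + e)[burn:]


X = simulate_static(N=100, T=100, rho=0.9)
k_min, k_fd, k_lsdv = min_rule(X)
print(k_min, k_fd, k_lsdv)
```

Taking the minimum is cheap because both filtered panels are already needed; the rule relies on serial correlation pushing the estimates upward rather than downward.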

This 'minimum rule' is justified by the fact that serial correlation usually causes overestimation rather than underestimation of the factor number. (One may be concerned that the minimum rule underestimates the factor number, but our Monte Carlo study shows that it almost never does so if N and T are moderately large.) We conduct simulations to illustrate our results. We consider various data generating processes but, to save space, report only the AR(1) case with two (dynamic) factors here.² We consider data generating processes nested by

$$\begin{aligned} X_{it} &= \lambda_{1i}F_{1t} + \lambda_{2i}F_{2t} + \delta_{1i}F_{1,t-1} + \delta_{2i}F_{2,t-1} + e_{it}, \\ F_{st} &= \theta F_{s,t-1} + v_{st}, \quad s = 1, 2, \\ e_{it} &= \rho_ie_{i,t-1} + \varepsilon_{it}, \qquad \varepsilon_{it} = \sum_{j=-J,\,j\ne0}^{J}\beta u_{i-j,t} + u_{it}, \quad J = \mathrm{int}(N^{1/3}). \end{aligned} \tag{8}$$

² More detailed results for the Monte Carlo simulations, including AR(2) and r = 1 cases, can be found at http://greenawaymcgrevy.com/research/MC1.xls
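Equation (8) can be simulated directly. The Python/NumPy sketch below is one possible implementation, not the authors' code. It draws $v_{st}$ and $u_{it}$ from N(0,1) and $\lambda_{si}$, $\delta_{si}$ from N(1,1), matching the simulation settings described in this section; the burn-in length, zero starting values, and extra $u$ draws for indices $i - j$ outside $1, \dots, N$ are assumptions where the text is silent. $J = \mathrm{int}(N^{1/3})$ is computed with integer arithmetic, because a floating-point cube root can fall just below an exact integer (for example, `64 ** (1/3)` evaluates to slightly less than 4 in Python). The function names are hypothetical.

```python
import numpy as np


def cube_root_floor(n):
    """Largest integer J with J**3 <= n, i.e. int(n ** (1/3)) without rounding error."""
    j = int(round(n ** (1.0 / 3.0)))
    while j ** 3 > n:
        j -= 1
    while (j + 1) ** 3 <= n:
        j += 1
    return j


def simulate_dgp8(N, T, rho, beta=0.0, theta=0.5, dynamic=False, burn=100, rng=None):
    """Simulate a T x N panel from DGP (8); rho may be a scalar or a length-N array."""
    rng = np.random.default_rng() if rng is None else rng
    rho = np.broadcast_to(np.asarray(rho, dtype=float), (N,))
    J = cube_root_floor(N)
    S = T + burn + 1                                  # one extra period for F_{s,t-1}

    # Factors: F_st = theta * F_{s,t-1} + v_st, s = 1, 2.
    v = rng.standard_normal((S, 2))
    F = np.zeros((S, 2))
    for t in range(1, S):
        F[t] = theta * F[t - 1] + v[t]

    # eps_it = u_it + beta * sum over 0 < |j| <= J of u_{i-j,t}; column c of u is index c - J.
    u = rng.standard_normal((S, N + 2 * J))
    eps = u[:, J:J + N].copy()
    for j in range(1, J + 1):
        eps += beta * (u[:, J - j:J - j + N] + u[:, J + j:J + j + N])

    # Idiosyncratic AR(1): e_it = rho_i * e_{i,t-1} + eps_it.
    e = np.zeros((S, N))
    for t in range(1, S):
        e[t] = rho * e[t - 1] + eps[t]

    lam = rng.normal(1.0, 1.0, (N, 2))                # lambda_si ~ N(1, 1)
    X = F @ lam.T + e
    if dynamic:
        delta = rng.normal(1.0, 1.0, (N, 2))          # delta_si ~ N(1, 1)
        X[1:] += F[:-1] @ delta.T                     # adds delta_i' F_{t-1}
    return X[-T:]


# DGP I with rho = 0.7: beta = 0, no lagged factors, homogeneous rho.
X = simulate_dgp8(N=50, T=50, rho=0.7, rng=np.random.default_rng(0))
```

Passing an array for `rho` gives the heterogeneous $\rho_i$ case, and `dynamic=True` adds the lagged-factor terms $\delta_{si}F_{s,t-1}$.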

The DGP permits a maximum of four static and two dynamic factors. Note that eit can exhibit both weak cross-sectional and weak time-series dependence. Indeed, the DGP for eit mimics that employed in the Monte Carlo study of Bai and Ng (2002).³ However, in contrast to Bai and Ng (2002), we allow the serial dependence of the idiosyncratic component to be heterogeneous across i through ρi. In many applications the Bai-Ng criteria are applied to panels that exhibit considerable heterogeneity in the persistence of each cross section. For example, Stock and Watson (2002) use a panel of 215 series spanning various sectors of the economy, such as output, employment, monetary aggregates and prices. In such datasets the persistence of the idiosyncratic components is likely to vary considerably across series (even after transformations to ensure stationarity).

Throughout all simulations, vst are drawn from iid N(0, 1) for s = 1, 2 and all t, and uit are drawn from iid N(0, 1) for all i and t. The factor loadings λsi are drawn from iid N(1, 1) for s = 1, 2, and, where applicable, the loadings on the lagged factors δsi are drawn from iid N(1, 1). We report simulations with θ = 0.5.⁴ The maximum number of common factors considered is set to 7 in all simulations. We compare the conventional Bai-Ng ICp2 estimator applied to the original data, the first-differencing filter estimator, the LSDV filter estimator, and the minimum rule estimator. For each method we report the frequency of correct factor number selections over the 1000 replications; a selection frequency equal to one thus indicates that the criterion selected the correct factor number in every replication. We begin with a static factor DGP:

³ Bai and Ng (2002) set J = max{10, N/20} in their Table VII. However, since we focus on small N, we use a different J. Also, the Bai-Ng simulations do not consider weak cross-sectional and time-series dependence simultaneously, as we do here.
⁴ For other values of θ, see www.greenawaymcgrevy.com/reasearch/MC1.xls


DGP I (static factors, cross-sectional independence, and homogeneous AR(1)): Xit follows (8) with β = 0, δsi = 0 and ρi = ρ for all i and s = 1, 2.

In this case there are two factors. We consider ρ ∈ {0, 0.1, 0.3, 0.5, 0.7, 0.9}. Figure 1 highlights our simulations with ρ = 0.7 and N, T ∈ {25, 50, 100, 200}. For the unfiltered case (Panel A in Figure 1), the Bai-Ng criterion works rather poorly and usually overestimates the number of common factors when N and T are small. Nonetheless, as N and T increase the selection frequency converges to one. Meanwhile, for the AR(1) prewhitening case (Panel B in Figure 1), the selection frequency exceeds 0.95 even with N = T = 25. For the first-difference filter and the minimum rule (not shown), the probability of correct selection is almost identical to that of AR(1) LSDV filtering at this moderate degree of correlation (ρ = 0.7).

For the remaining simulations we set T, N ∈ {20, 30, 40, 50} in order to investigate finite sample performance. Table 1 presents the selection frequencies. Evidently, the ICp2 criterion applied to the unfiltered data rarely selects the true number of common factors under high serial correlation (ρ ≥ 0.7), even when T and N are as large as 50 (the selection frequency does not exceed 0.12). In contrast, the prewhitened ICp2 criterion improves finite sample performance dramatically. When LSDV fitting is used for prewhitening, the selection frequency is about 0.80 or higher even when T is as low as 20, as long as ρ ≤ 0.7. The first-differencing prewhitening method has a selection frequency exceeding 0.80 as long as ρ ≥ 0.3. The minimum rule gives the best overall results across ρ: as long as N, T ≥ 30, it almost always chooses the correct factor number regardless of the value of ρ.

Next we consider the dynamic factor model:

DGP II (dynamic factors, cross-sectional independence, and homogeneous AR(1)): Xit follows (8) with β = 0 and ρi = ρ for all i.

In this case there are four static and two dynamic factors. After preliminary estimation of the static factor number, we use the Amengual-Watson method to estimate the number of dynamic factors. Table 2 presents the frequencies of correct selection of the dynamic factor number. The results are similar to those in Table 1. The IC2 criterion applied to the unfiltered data has a selection frequency below 0.40 when serial correlation is high (ρ ≥ 0.7), even when N = T = 50. Note that the minimum rule selects the correct factor number almost always as long as N, T ≥ 30.

To consider the case in which eit is weakly cross-sectionally dependent, we have:

DGP III (cross-sectional dependence and homogeneous AR(1)): Xit follows (8) with δsi = 0 and ρi = ρ for all i.

In this case we set β = 0.2, so that eit is correlated with ekt whenever 0 < |i − k| ≤ 2J. As shown in Table 3, all estimators perform worse than under DGP I, indicating that weak cross-sectional dependence in eit also reduces accuracy. Nonetheless, the AR(1) filtering and minimum rule methods outperform the unfiltered method for all N and T provided ρ ≥ 0.3. As above, the minimum rule performs best among the filter-based methods across all values of ρ considered.

To consider heterogeneous serial correlation in eit, we have:

DGP IV (heterogeneous AR coefficients): Xit follows (8) with β = 0, δsi = 0, and
CASE (a): ρi ∼ iid U(−0.15, 0.15) for i = 1, ..., N/2, and ρi ∼ iid U(0.65, 0.95) for i = N/2 + 1, ..., N;
CASE (b): ρi ∼ iid U(0, 0.95) for i = 1, ..., N.

In case (a), half of the cross sections of eit are close to white noise, while the other half are highly serially dependent but stationary.⁵ Case (b) spreads ρi uniformly over [0, 0.95]. As discussed in Section 3, a common filter must be applied even if the ρi are heterogeneous, in order to preserve the true factor number in the filtered data. As Table 4 in the appendix shows, the LSDV, first-difference and minimum rule filtering methods all continue to outperform the unfiltered method by a substantial margin. Note that the Bai-Ng criteria applied to the original data appear to perform worse as N and T increase, but this appears to be a small-sample phenomenon. (In additional simulations performed by the authors, the IC2 selection frequency is about 0.96 when N = T = 100.)

⁵ We thank an anonymous referee for suggesting this DGP.

For the considered data generating processes, the filtering methods are generally more accurate in estimating the true number of static factors than the Bai-Ng method applied to the unfiltered data. Among them, the minimum rule shows the best small-sample performance.
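Two points about DGP IV can be illustrated with a short sketch: how the ρi are drawn in cases (a) and (b), and why a common filter is needed. For a noise-free common component λi'Ft, filtering every series with the same φ gives λi'(Ft − φFt−1), which still has r factors, whereas unit-specific coefficients give λi'Ft − φiλi'Ft−1, which in general requires 2r factors. The NumPy code below checks this through matrix rank; the function names `draw_rho_dgp4` and `filtered_rank` are hypothetical, and i.i.d. factors are used because the rank argument does not depend on θ.

```python
import numpy as np


def draw_rho_dgp4(N, case, rng):
    """AR(1) coefficients rho_i for DGP IV."""
    if case == "a":
        half = N // 2
        # i = 1..N/2 close to white noise; i = N/2+1..N highly persistent but stationary
        return np.concatenate([rng.uniform(-0.15, 0.15, half),
                               rng.uniform(0.65, 0.95, N - half)])
    if case == "b":
        return rng.uniform(0.0, 0.95, N)         # uniform spread of rho_i
    raise ValueError("case must be 'a' or 'b'")


def filtered_rank(F, Lam, phi):
    """Rank of the noise-free common component after AR(1) filtering.

    Column i of F @ Lam.T is lambda_i'F_t; it becomes
    lambda_i'F_t - phi_i * lambda_i'F_{t-1}. A scalar phi is a common filter.
    """
    C = F @ Lam.T
    phi = np.broadcast_to(np.asarray(phi, dtype=float), (Lam.shape[0],))
    Z = C[1:] - C[:-1] * phi                     # column i filtered with phi_i
    return int(np.linalg.matrix_rank(Z))


rng = np.random.default_rng(3)
T, N = 30, 20
F = rng.standard_normal((T, 2))                  # r = 2 factors
Lam = rng.normal(1.0, 1.0, size=(N, 2))          # loadings ~ N(1, 1)
rho_a = draw_rho_dgp4(N, "a", rng)
print(filtered_rank(F, Lam, 0.5))                # common filter
print(filtered_rank(F, Lam, rho_a))              # unit-specific filters
```

For generic draws the common filter leaves a rank-2 component (r = 2), while filtering each series with its own ρi yields rank 4 (= 2r), i.e., the filtered panel would appear to contain spurious extra factors.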

6 Empirical Example

In this section we demonstrate the importance of our prewhitening procedures in practice. In the US, the headline CPI-U inflation rate published by the Bureau of Labor Statistics is calculated by taking the average inflation rate over 27 metropolitan areas. This measure may not reflect changes in the price level for all cities within the US, and hence practitioners may wish to consider a second method of measuring the ‘representative’ inflation rate. One such measure could be constructed from a handful of common factors underlying the panel of city inflation rates. Of course, as a first step the factor number must be determined. We use annual CPI-U data spanning 1984–2007 for 23 metropolitan areas (in our application Washington-Baltimore, Tampa, Miami and Phoenix are omitted because sampling only begins in 1997, 1998, 1978 and 2002, respectively). This dataset has been used by, among others, Cecchetti et al. (2002) and Phillips and Sul (2007) to test for convergence in regional prices. We take logs and first-difference the indices to obtain the panel of inflation rates. Figure 2 shows the median, maximum and minimum inflation rates across the 23 cities over time. Evidently, all series look stationary, and in fact the estimated AR(1) coefficient is far less than unity. The Bai-Ng IC2 criterion detects five common factors. In contrast, when applied to either the AR(1)-filtered or the first-differenced data, the IC2 criterion selects only one common factor. This is consistent with the small sample performance observed in the Monte Carlo study.
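The procedure in this section (log-differencing the price indices, then applying the Bai-Ng criterion to the original, AR(1)-filtered and first-differenced panels) can be sketched in NumPy as below. The CPI-U data are not reproduced here, and the function names are hypothetical. The sketch assumes that the "IC2" label in this section refers to the ICp2 criterion used in the simulations, with penalty k(N + T)/(NT)·ln min(N, T); that each series is demeaned before the criterion is applied; that the AR(1) filter uses the pooled LSDV coefficient; and that the minimum rule is the smaller of the two filtered estimates, as the table label Min{AR(1),FD} suggests. It is an illustration under these assumptions, not the authors' code.

```python
import numpy as np


def inflation_panel(cpi):
    """Log first differences of a (years x cities) panel of price indices."""
    return np.diff(np.log(np.asarray(cpi, dtype=float)), axis=0)


def icp2(X, kmax=7):
    """ICp2 estimate of the number of static factors in a T x N panel (series demeaned first)."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)
    T, N = X.shape
    eig = np.sort(np.linalg.eigvalsh(X @ X.T))[::-1]    # eigenvalues of XX', descending
    total = float((X ** 2).sum())                       # tr(X'X)
    penalty = (N + T) / (N * T) * np.log(min(N, T))     # ICp2 penalty per factor
    # V(k) = (NT)^{-1} [tr(X'X) - sum of the k largest eigenvalues]
    crit = [np.log((total - eig[:k].sum()) / (N * T)) + k * penalty
            for k in range(kmax + 1)]
    return int(np.argmin(crit))


def lsdv_ar1(X):
    """Pooled within-group (LSDV) slope from regressing X_it on X_i,t-1."""
    lag = X[:-1] - X[:-1].mean(axis=0)
    cur = X[1:] - X[1:].mean(axis=0)
    return float((lag * cur).sum() / (lag ** 2).sum())


def factor_numbers(X, kmax=7):
    """Factor-number estimates from the raw, AR(1)-filtered and first-differenced panels."""
    X = np.asarray(X, dtype=float)
    phi = lsdv_ar1(X)
    k_ar1 = icp2(X[1:] - phi * X[:-1], kmax)    # common filter X_it - phi_hat X_i,t-1
    k_fd = icp2(np.diff(X, axis=0), kmax)        # first-difference filter
    return {"phi_lsdv": phi, "unfiltered": icp2(X, kmax),
            "ar1_lsdv": k_ar1, "first_diff": k_fd,
            "min_rule": min(k_ar1, k_fd)}
```

With a 24 × 23 array `cpi` of annual index levels (1984–2007, 23 cities), `factor_numbers(inflation_panel(cpi))` would return the estimates compared in the text.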

7 Conclusion

Factor models are increasingly used in empirical econometrics, often to summarize comovements in a glut of data using a handful of estimated factors. An integral part of estimating the factors is estimating the dimension of the factor space, i.e., the number of common factors underlying the panel. Existing factor selection criteria require large N and T for consistency, and may be inaccurate when one or both dimensions of the panel are moderate or small. Using a local alternative approach, we analyze the impact of serial correlation on the popular Bai and Ng factor-number selection criteria. We demonstrate that even a moderate degree of serial correlation in the idiosyncratic errors (relative to the given sample size) can cause the Bai-Ng criteria to overestimate the true number of factors. To overcome this problem, we suggest filtering the panel prior to applying Bai and Ng’s method. We analyze theoretically how the filtering method works for general processes with serial correlation, and verify the applicability of the method by simulations.


A Mathematical Proofs

Proof of (2). Let $\Sigma_{ee} = Ee_ie_i'$ and $\hat\Sigma_{ee} = N^{-1}\sum_{i=1}^N e_ie_i'$. Let $A$ be a matrix such that $AA' = \Sigma_{ee}$, and let $\zeta_i = A^{-1}e_i \sim N(0, I_T)$. Then $\hat\Sigma_{ee} = A\hat\Sigma_{\zeta\zeta}A'$, where $\hat\Sigma_{\zeta\zeta} = N^{-1}\sum_{i=1}^N \zeta_i\zeta_i'$, and

$$\frac{\xi'\hat\Sigma_{ee}\xi}{\xi'\xi} = \frac{\eta'\hat\Sigma_{\zeta\zeta}\eta}{\eta'\eta}\cdot\frac{\xi'\Sigma_{ee}\xi}{\xi'\xi}, \qquad \eta = A'\xi,$$

for any conformable $\xi$ satisfying $\xi'\xi \neq 0$. So

$$\max_{\xi}\frac{\xi'\hat\Sigma_{ee}\xi}{\xi'\xi} \ \ge\ \max_{\eta}\frac{\eta'\hat\Sigma_{\zeta\zeta}\eta}{\eta'\eta}\cdot\min_{\xi}\frac{\xi'\Sigma_{ee}\xi}{\xi'\xi}. \tag{9}$$

(To see this, let $h = fg$ with $f \ge 0$ and $\min g > 0$. Then $f = h/g$, so $\max f \le \max h/\min g$, i.e., $\max h \ge \max f \cdot \min g$.) By Yin, Bai and Krishnaiah (1988), the first term on the right-hand side converges to $(1 + \sqrt{\lim T/N})^2$, and by the Perron–Frobenius theorem, the second term is no less than the smallest row sum of $\Sigma_{ee}$, which is of order $T^\alpha$ for large $T$ when $\rho_T \ge 0$, as is eventually the case. Thus the left-hand side of (9) is of order $T^\alpha$. The result is obtained by dividing it by $T$.

Proof of Theorem 1. See Theorem 2 of Bai and Ng (2002).

Proof of Theorem 2.

Let $Z_{it} = X_{it} - \phi_T X_{it-1}$ as before. Let $b_{1,T} = (1-\phi_T)\Sigma_{FF,T}$ and $b_{2,T} = (1-\phi_T)\sigma^2_{e,T}$. By (5), we have

$$Z_{it} = \lambda_i'\left(\Delta F_t + b_{1,T}F^*_{t-1}\right) + \left(\Delta e_{it} + b_{2,T}e^*_{it-1}\right).$$

Assumption 2 and condition (i) of the theorem imply that $\{\Delta F_t + b_{1,T}F^*_{t-1}\}$ is BN-regular, and Assumption 2 and condition (ii) of the theorem imply that $\{\Delta e_{it} + b_{2,T}e^*_{it-1}\}$ is also BN-regular. The result again follows from Bai and Ng (2002, Theorem 2).

Before proving Theorem 3, we present a slightly more general result on data-dependent prewhitening. Let $\hat\phi$ be a random variable and $\phi_T$ a nonrandom quantity. The result below states that if $\hat\phi$ and $\phi_T$ are sufficiently close, then filtering based on $\hat\phi$ and filtering based on $\phi_T$ give the same probability limit. Let $\hat Z_{it} = X_{it} - \hat\phi X_{it-1}$. For panel data $X_{it}$, let

$$V_{NT}(k; X_{it}) = \min_{F\in\mathbb{R}^{T\times k}} \frac{1}{NT}\sum_{i=1}^N X_i' M_F X_i,$$

where $X_i = (X_{i1}, \ldots, X_{iT})'$ and $M_F = I_T - F(F'F)^{-1}F'$. Let $h_{NT}(k; X_{it}) = V_{NT}(k; X_{it}) - V_{NT}(r; X_{it})$.
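In computations, the minimum over F in $V_{NT}$ has a closed form: it is attained when the columns of F span the k leading eigenvectors of $\sum_i X_iX_i'$, so $V_{NT}(k)$ equals $(NT)^{-1}$ times $\sum_i X_i'X_i$ minus the k largest eigenvalues of $\sum_i X_iX_i'$. The NumPy sketch below computes $V_{NT}$ and $h_{NT}$ this way and compares the result with the direct $M_F$ formula. The function names are hypothetical, and the panel is stored as a T × N array whose columns are the $X_i$.

```python
import numpy as np


def V_NT(X, k):
    """V_NT(k; X) for a T x N array whose columns are X_i = (X_i1, ..., X_iT)'."""
    T, N = X.shape
    eig = np.linalg.eigvalsh(X @ X.T)[::-1]      # eigenvalues of sum_i X_i X_i', descending
    return float(((X ** 2).sum() - eig[:k].sum()) / (N * T))


def V_at(X, F):
    """(NT)^{-1} sum_i X_i' M_F X_i for a given T x k matrix F of full column rank."""
    T, N = X.shape
    M_F = np.eye(T) - F @ np.linalg.solve(F.T @ F, F.T)
    return float(np.einsum("ti,ts,si->", X, M_F, X) / (N * T))


def h_NT(X, k, r):
    """h_NT(k; X) = V_NT(k; X) - V_NT(r; X)."""
    return V_NT(X, k) - V_NT(X, r)


rng = np.random.default_rng(5)
X = rng.standard_normal((25, 40))
_, vecs = np.linalg.eigh(X @ X.T)
F_opt = vecs[:, ::-1][:, :3]                     # three leading eigenvectors
print(V_NT(X, 3), V_at(X, F_opt))                # equal up to rounding
print(V_at(X, rng.standard_normal((25, 3))) >= V_NT(X, 3))   # any other F does no better
```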

Lemma A.1 Under the assumptions for Theorem 2, if $(\hat\phi - \phi_T)\sigma^2_{X,T} \to_p 0$, then $\hat k(\hat\phi) \to_p r$.

Proof. For notational brevity let $\hat a = (\hat\phi - \phi_T)\sigma^2_{X,T}$, so that $\hat a \to_p 0$ under the supposition of the lemma. Also let $\hat h(k) = h_{NT}(k; \hat Z_{it})$ and $h(k) = h_{NT}(k; Z_{it})$. The goal is to show that (i) $\hat h(k)$ does not shrink to zero for $k < r$, and (ii) $\hat h(k) = O_p(C_{NT}^{-2})$ for $k > r$ (see Bai and Ng's, 2002, proof of Theorem 2). Note that $\hat Z_{it} = Z_{it} - \hat a X^*_{it-1}$, where $X^*_{it} := X_{it}/\sigma^2_{X,T}$.

(i) When $k < r$, $h(k)$ does not shrink to zero by Assumption 2, so it suffices to show that $\hat h(k) - h(k) \to_p 0$. But $\hat h(k) - h(k) = \hat\xi_r - \hat\xi_k$, where

$$\hat\xi_j = \max_{F\in\mathbb{R}^{T\times j}}\frac{1}{NT}\sum_{i=1}^N \hat Z_i' P_F \hat Z_i - \max_{F\in\mathbb{R}^{T\times j}}\frac{1}{NT}\sum_{i=1}^N Z_i' P_F Z_i.$$

So the proof is complete once we show that $\hat\xi_j \to_p 0$ for every $j \le r$. This part is easy: because $|\max f - \max g| \le \max|f - g|$, we have

$$|\hat\xi_j| \le \max_{F\in\mathbb{R}^{T\times j}}\left|\frac{1}{NT}\sum_{i=1}^N\left(\hat Z_i' P_F \hat Z_i - Z_i' P_F Z_i\right)\right| \to_p 0,$$

where we used the fact that $\hat a \to_p 0$ and all the averages are stochastically bounded.

(ii) For the case $k > r$, write $\hat Z_{it}$ as

$$\hat Z_{it} = \lambda_i'(F_t - \hat\phi F_{t-1}) + (e_{it} - \phi_T e_{it-1}) - \hat a e^*_{it-1}.$$

The common factors $F_t - \hat\phi F_{t-1}$ can be written as $(F_t - \phi_T F_{t-1}) - \hat a F^*_{t-1}$, which satisfies Assumption A of Bai and Ng (2002) because $F_t - \phi_T F_{t-1}$ satisfies it and $\hat a = o_p(1)$. Next, both $u_{it} := e_{it} - \phi_T e_{it-1}$ and $e^*_{it-1}$ satisfy the assumptions of Bai and Ng (2002), and the idiosyncratic error of $\hat Z_{it}$ is $u_{it} - \hat a e^*_{it-1}$. Then Theorem 1 of Bai and Ng (2002) still holds with this idiosyncratic error, but part of the proof of Bai and Ng's Lemma 4 must be redone. More precisely, we need to show that

$$\max_{F\in\mathbb{R}^{T\times k}}\frac{1}{NT}\sum_{i=1}^N (u_i - \hat a e^*_{i,-1})' P_F (u_i - \hat a e^*_{i,-1}) = O_p(C_{NT}^{-2}),$$

which corresponds to (1) of Bai and Ng (2006). But this holds because both $u_{it}$ and $e^*_{it-1}$ satisfy the assumptions of Bai and Ng (2002, 2006) and $\hat a \to_p 0$.

Now we prove that, under suitable assumptions, the LSDV estimator $\hat\phi_{lsdv}$ obtained by regressing $X_{it}$ on $X_{it-1}$ satisfies the assumptions for Theorem 2 and Lemma A.1. Let

$$\hat\Gamma_0 = \frac{\sigma^{-2}_{X,T}}{NT}\sum_{i=1}^N\sum_{t=1}^T \tilde X_{it-1}^2, \qquad \hat\Gamma_1 = \frac{\sigma^{-2}_{X,T}}{NT}\sum_{i=1}^N\sum_{t=1}^T \tilde X_{it-1}\tilde X_{it},$$

where the tilde denotes the within-group transformation. Note that $E\hat\Gamma_0$ is nonsingular. The AR(1) LSDV estimator is $\hat\phi_{lsdv} = \hat\Gamma_0^{-1}\hat\Gamma_1$. Let $\phi_{lsdv} = \Gamma_{0,T}^{-1}\Gamma_{1,T}$, where $\Gamma_{j,T} = E\hat\Gamma_j$. We will show that $\phi_{lsdv}$ and $\hat\phi_{lsdv}$ satisfy the conditions for Theorem 2 and Lemma A.1 under regularity conditions.

Lemma A.2 If $T^{-1}\sigma^2_{X,T} = O(1)$, then under Assumption 3, $(1 - \phi_{lsdv})\sigma^2_{X,T} = O(1)$.

Proof. We have $(1 - \phi_{lsdv})\sigma^2_{X,T} = \Gamma_{0,T}^{-1}(\Gamma_{0,T} - \Gamma_{1,T})\sigma^2_{X,T}$. Because $\Gamma_{0,T}^{-1}$ is finite, it suffices to show that $(\Gamma_{0,T} - \Gamma_{1,T})\sigma^2_{X,T} = O(1)$, i.e.,

$$\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T E\tilde X_{it-1}\Delta\tilde X_{it} = O(1).$$

We use

$$\frac{1}{T}\sum_{t=1}^T E\tilde X_{it-1}\Delta\tilde X_{it} = \frac{1}{T}\sum_{t=1}^T EX_{it-1}\Delta X_{it} - \frac{1}{T^2}\sum_{t=1}^T\sum_{s=1}^T EX_{it-1}\Delta X_{is}.$$

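This is bounded by Assumption 3. Let $X^*_{it} = X_{it}/\sigma^2_{X,T}$ as before.

Lemma A.3 Suppose that (i) $\mathrm{var}(X_{it-1}^2) \le M\sigma^4_{X,T}$ and (ii) $\sum_{k=1}^\infty \mathrm{cov}(X_{it}^2, X_{it+k}^2) \le M\sigma^4_{X,T}$ for all $i$ and $t$, for some $M < \infty$. If $T^{-1}\sigma^2_{X,T} = o(1)$, then $(\hat\phi_{lsdv} - \phi_{lsdv})\sigma^2_{X,T} = o_p(1)$.

Proof. Let $\hat\phi = \hat\phi_{lsdv}$ and $\phi = \phi_{lsdv}$ for notational simplicity. We have $\hat\phi - \phi = \hat\Gamma_0^{-1}(\hat\Gamma_1 - \phi\hat\Gamma_0)$. Note that $\Gamma_{1,T} - \phi\Gamma_{0,T} = 0$. Because $\hat\Gamma_0^{-1}$ is $O_p(1)$, we shall show that $(\hat\Gamma_1 - \phi\hat\Gamma_0)\sigma^2_{X,T} = o_p(1)$, i.e.,

$$\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T(X_{it-1} - \bar X_{i,-1})(U_{it} - \bar U_i) = o_p(1), \qquad U_{it} = X_{it} - \phi X_{it-1}. \tag{10}$$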

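Because $U_{it} = \Delta X_{it} + aX^*_{it-1}$, where $a = (1-\phi)\sigma^2_{X,T} = O(1)$ by Lemma A.2, (10) can be proved by showing that

$$Y_a := \frac{1}{N}\sum_{i=1}^N Y_{ai} = o_p(1), \qquad Y_{ai} = \frac{1}{T}\sum_{t=1}^T\left(X_{it-1}\Delta X_{it} - EX_{it-1}\Delta X_{it}\right),$$
$$Y_b := \frac{a}{N}\sum_{i=1}^N Y_{bi} = o_p(1), \qquad Y_{bi} = \frac{1}{T}\sum_{t=1}^T\left(X_{it-1}X^*_{it-1} - EX_{it-1}X^*_{it-1}\right),$$
$$Y_c := \frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T Y_{cit} = o_p(1), \qquad Y_{cit} = \frac{1}{T}\sum_{s=1}^T\left(X_{it-1}\Delta X_{is} - EX_{it-1}\Delta X_{is}\right),$$
$$Y_d := \frac{a}{NT}\sum_{i=1}^N\sum_{t=1}^T Y_{dit} = o_p(1), \qquad Y_{dit} = \frac{1}{T}\sum_{s=1}^T\left(X_{it-1}X^*_{is-1} - EX_{it-1}X^*_{is-1}\right).$$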

Because $Y_a$, $Y_b$, $Y_c$ and $Y_d$ are averages over $i$, we will show that $Y_{ji} = o_p(1)$ for $j = a, b, c, d$, where the convergence holds uniformly over all $i$. Furthermore, because $EY_{ji} = 0$ for $j = a, b, c, d$, we will show that $EY_{ji}^2 \to 0$ for $j = a, b, c, d$, where the convergence and boundedness are uniform in $i$. For $Y_{ai}$, we have $Y_{ai} = T^{-1}[X_{it-1}(X_{iT} - X_{i0}) - EX_{it-1}(X_{iT} - X_{i0})]$, so

$$EY_{ai}^2 \le T^{-2}\,\mathrm{var}(X_{it-1}^2) \le M(T^{-1}\sigma^2_{X,T})^2 \to 0.$$

Next,

$$EY_{bi}^2 = \frac{1}{T^2}\sum_{t=1}^T\mathrm{var}(X_{it-1}X^*_{it-1}) + \frac{2}{T^2}\sum_{t=1}^{T-1}\sum_{s=t+1}^T\mathrm{cov}(X_{it-1}X^*_{it-1}, X_{is-1}X^*_{is-1}).$$

But $\mathrm{var}(X_{it}X^*_{it}) = \mathrm{var}(X_{it}^2)/\sigma^4_{X,T} = O(1)$ by (i), so the first term is $O(T^{-1})$, and the second term is also $O(T^{-1})$ by (ii). Next, $Y_{cit} = T^{-1}[X_{it-1}(X_{iT} - X_{i0}) - EX_{it-1}(X_{iT} - X_{i0})]$, and the proof is similar to that for $Y_{ai}$. Finally, $Y_{dit}$ is handled in the same way as $Y_{bi}$. Note that the convergences are uniform in $i$ and $t$.

Proof of Theorem 3. The first-differenced process $\Delta X_{it}$ clearly gives a consistent estimate. For $X_{it} - \hat\phi_{lsdv}X_{it-1}$, we note that the assumptions $T^{-1}\sigma^2_{e,T} = o(1)$ and $T^{-1}\Sigma_{F,T} = o(1)$ imply that $T^{-1}\sigma^2_{X,T} = o(1)$. Then it is straightforward to see that the conditions for Lemmas A.2 and A.3 are satisfied under the regularity Assumptions 1–3. The result follows from Lemmas A.2 and A.3.
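Several algebraic steps in the proofs of Lemmas A.2 and A.3 can be checked numerically: the scaling $\sigma^{-2}_{X,T}$ cancels in $\hat\phi_{lsdv} = \hat\Gamma_0^{-1}\hat\Gamma_1$; the within-group decomposition used for Lemma A.2 holds exactly for sample averages; and, for any $\phi$, $(\hat\Gamma_1 - \phi\hat\Gamma_0)\sigma^2_{X,T}$ equals the double average in (10), with $U_{it} = \Delta X_{it} + aX^*_{it-1}$. The NumPy sketch below uses hypothetical function names and a simulated AR(1) panel; it illustrates the identities and is not the authors' code.

```python
import numpy as np


def lsdv_gammas(X, sigma2_X=1.0):
    """hat Gamma_0, hat Gamma_1 for a (T+1) x N panel with rows X_i0, ..., X_iT."""
    lag, cur = X[:-1], X[1:]
    T, N = lag.shape
    lag_w = lag - lag.mean(axis=0)               # within transform of X_{i,t-1}
    cur_w = cur - cur.mean(axis=0)               # within transform of X_it
    g0 = (lag_w ** 2).sum() / (sigma2_X * N * T)
    g1 = (lag_w * cur_w).sum() / (sigma2_X * N * T)
    return g0, g1


def phi_lsdv(X, sigma2_X=1.0):
    g0, g1 = lsdv_gammas(X, sigma2_X)
    return g1 / g0                               # the sigma_{X,T}^{-2} factor cancels


def within_decomposition(x):
    """Both sides of the sample version of the decomposition used for Lemma A.2."""
    lag, dx = x[:-1], np.diff(x)
    T = lag.size
    lhs = ((lag - lag.mean()) * (dx - dx.mean())).sum() / T
    rhs = (lag * dx).sum() / T - np.outer(lag, dx).sum() / T ** 2
    return lhs, rhs


def eq10_sides(X, phi, sigma2_X):
    """(hat Gamma_1 - phi hat Gamma_0) sigma^2 and the double average in (10)."""
    g0, g1 = lsdv_gammas(X, sigma2_X)
    lag = X[:-1]
    T, N = lag.shape
    a = (1.0 - phi) * sigma2_X
    U = np.diff(X, axis=0) + a * lag / sigma2_X  # U_it = dX_it + a X*_{i,t-1}
    lhs = (g1 - phi * g0) * sigma2_X
    rhs = ((lag - lag.mean(axis=0)) * (U - U.mean(axis=0))).sum() / (N * T)
    return lhs, rhs


# Simulated stationary AR(1) panel with phi = 0.5 (illustrative only)
rng = np.random.default_rng(6)
N, T, phi = 200, 200, 0.5
X = np.zeros((T + 1, N))
X[0] = rng.standard_normal(N) / np.sqrt(1.0 - phi ** 2)
for t in range(1, T + 1):
    X[t] = phi * X[t - 1] + rng.standard_normal(N)
print(phi_lsdv(X), phi_lsdv(X, sigma2_X=7.3))    # same value: the scaling cancels
print(within_decomposition(X[:, 0]))             # the two sides agree
print(eq10_sides(X, 0.3, 4.0))                   # the two sides agree
```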

References
Amengual, D. and M. W. Watson, 2007, Consistent estimation of the number of dynamic factors in a large N and T panel, Journal of Business and Economic Statistics 25, 91–96.
Andrews, D. W. K. and J. C. Monahan, 1992, An improved heteroskedasticity and autocorrelation consistent covariance matrix estimator, Econometrica 60, 953–966.
Bai, J., 2004, Estimating cross section common stochastic trends in non-stationary panel data, Journal of Econometrics 122, 137–183.
Bai, J. and S. Ng, 2002, Determining the number of factors in approximate factor models, Econometrica 70, 191–221.
Bai, J. and S. Ng, 2006, Determining the number of factors in approximate factor models, Errata, available at http://www.columbia.edu/~sn2294/papers/correctionEcta2.pdf.
Bai, J. and S. Ng, 2007, Determining the number of primitive shocks in factor models, Journal of Business and Economic Statistics 26, 52–60.
Cavanagh, C. L., G. Elliott and J. H. Stock, 1995, Inference in models with nearly integrated regressors, Econometric Theory 11, 1131–1147.
Cecchetti, S. G., N. C. Mark and R. Sonora, 2002, Price index convergence in United States cities, International Economic Review 43, 1081–1099.
Connor, G. and R. A. Korajczyk, 1993, A test for the number of factors in approximate factor models, Journal of Finance 48, 1263–1291.
Elliott, G., T. J. Rothenberg and J. H. Stock, 1996, Efficient tests for an autoregressive unit root, Econometrica 64, 813–836.
Forni, M., M. Hallin, M. Lippi and L. Reichlin, 2000, The generalized dynamic-factor model: Identification and estimation, Review of Economics and Statistics 82, 540–554.
Giraitis, L. and P. C. B. Phillips, 2004, Uniform limit theory for stationary autoregression, Cowles Foundation Discussion Paper No. 1475.
Hallin, M. and R. Liska, 2007, Determining the number of factors in the generalized dynamic factor model, Journal of the American Statistical Association 102, 603–617.
Moon, H. R. and P. C. B. Phillips, 2000, Estimation of autoregressive roots near unity using panel data, Econometric Theory 16, 927–88.
Moon, H. R. and P. C. B. Phillips, 2004, GMM estimation of autoregressive roots near unity with panel data, Econometrica 72, 467–522.
Onatski, A., 2007, Determining the number of factors from empirical distribution of eigenvalues, Working paper, Columbia University.
Park, J. Y., 2003, Weak unit roots, mimeo, Rice University.
Phillips, P. C. B., 1987, Towards a unified asymptotic theory for autoregression, Biometrika 74, 535–47.
Phillips, P. C. B. and T. Magdalinos, 2007, Limit theory for moderate deviations from a unit root, Journal of Econometrics 136, 115–130.
Phillips, P. C. B., H. R. Moon and Z. Xiao, 2001, How to estimate autoregressive roots near unity, Econometric Theory 17, 29–69.
Phillips, P. C. B. and D. Sul, 2007, Transition modelling and econometric convergence tests, Econometrica 75, 1771–1855.
Stock, J. H. and M. W. Watson, 2002, Macroeconomic forecasting using diffusion indices, Journal of Business and Economic Statistics 20, 147–62.
Stock, J. H. and M. W. Watson, 2005, Implications of dynamic factor models for VAR analysis, NBER Working Paper No. W11467.
Yin, Y. Q., Z. D. Bai and P. R. Krishnaiah, 1988, On the limit of the largest eigenvalue of large dimensional sample covariance matrix, Probability Theory and Related Fields 78, 509–521.

Figure 1: Performance of Bai and Ng’s Criteria ICp2 (ρ = 0.7, θ = 0.5 in DGP I)
[Figure: Panel A: Non-prewhitening; Panel B: Prewhitening. Vertical axis: frequency (%); horizontal axis: estimated number of common factors (0 to 7); separate series for N, T = 25, 50, 100, 200.]

Figure 2: US metro-area CPI-U inflation rates
[Figure: maximum, median and minimum inflation rates across cities; horizontal axis: year; vertical axis: inflation.]

Table 1: Frequency of selecting correct number of factors (static factor case; DGP I)
r = 2, θ = 0.5, kmax = 7. Rows: ρ and N; columns: T. AR(1) fitting, first differencing and Min{AR(1),FD} are prewhitening methods.

        | Non-prewhitening    | AR(1) fitting       | First differencing  | Min{AR(1),FD}
ρ   N\T |  20   30   40   50 |  20   30   40   50 |  20   30   40   50 |  20   30   40   50
0.0  20 | .99    1    1    1 | .94  .99    1    1 | .53  .89  .95  .98 | .93  .98    1    1
     30 |   1    1    1    1 | .97    1    1    1 | .51  .99    1    1 | .97    1    1    1
     40 |   1    1    1    1 | .99    1    1    1 | .49  .99    1    1 | .99    1    1    1
     50 |   1    1    1    1 | .98    1    1    1 | .45    1    1    1 | .98    1    1    1
0.1  20 | .99    1    1    1 | .96    1    1    1 | .69  .93  .98  .99 | .96  .99    1    1
     30 |   1    1    1    1 | .99    1    1    1 | .73    1    1    1 | .98    1    1    1
     40 |   1    1    1    1 |   1    1    1    1 | .75    1    1    1 |   1    1    1    1
     50 |   1    1    1    1 |   1    1    1    1 | .77    1    1    1 |   1    1    1    1
0.3  20 | .93  .98  .99    1 | .97    1    1    1 | .87  .98  .99    1 | .97    1    1    1
     30 | .95    1    1    1 | .99    1    1    1 | .94    1    1    1 |   1    1    1    1
     40 | .97    1    1    1 |   1    1    1    1 | .96    1    1    1 |   1    1    1    1
     50 | .96    1    1    1 |   1    1    1    1 | .98    1    1    1 |   1    1    1    1
0.5  20 | .46  .70  .81  .91 | .97    1    1    1 | .94    1    1    1 | .98    1    1    1
     30 | .33  .88  .94  .98 | .98    1    1    1 | .99    1    1    1 |   1    1    1    1
     40 | .24  .84  .98  .99 | .99    1    1    1 |   1    1    1    1 |   1    1    1    1
     50 | .16  .78  .97    1 | .99    1    1    1 |   1    1    1    1 |   1    1    1    1
0.7  20 | .02  .02  .04  .08 | .84  .98  .99    1 | .97    1    1    1 | .98    1    1    1
     30 | .00  .04  .05  .12 | .83  .99    1    1 |   1    1    1    1 |   1    1    1    1
     40 | .00  .00  .06  .09 | .80  .99    1    1 |   1    1    1    1 |   1    1    1    1
     50 | .00  .00  .01  .08 | .78  .98    1    1 |   1    1    1    1 |   1    1    1    1
0.9  20 | .00  .00  .00  .00 | .43  .59  .68  .74 | .98    1    1    1 | .98    1    1    1
     30 | .00  .00  .00  .00 | .32  .61  .71  .78 |   1    1    1    1 |   1    1    1    1
     40 | .00  .00  .00  .00 | .24  .46  .67  .80 |   1    1    1    1 |   1    1    1    1
     50 | .00  .00  .00  .00 | .19  .39  .57  .76 |   1    1    1    1 |   1    1    1    1

Table 2: Frequency of selecting correct number of dynamic factors (dynamic factor case; DGP II)
q = 2, r = 4, θ = 0.5, kmax = 7. Rows: ρ and N; columns: T. AR(1) fitting, first differencing and Min{AR(1),FD} are prewhitening methods.

        | Non-prewhitening    | AR(1) fitting       | First differencing  | Min{AR(1),FD}
ρ   N\T |  20   30   40   50 |  20   30   40   50 |  20   30   40   50 |  20   30   40   50
0.0  20 | .85  .91  .92  .93 | .64  .89  .90  .90 | .13  .74  .82  .84 | .64  .89  .90  .91
     30 | .95  .99    1    1 | .70  .98  .99  .99 | .06  .92  .96  .96 | .70  .98  .99  .99
     40 | .98    1    1    1 | .71  .99    1    1 | .02  .94  .99  .99 | .71  .99    1    1
     50 | .98    1    1    1 | .70    1    1    1 | .01  .94    1    1 | .70    1    1    1
0.1  20 | .81  .88  .90  .93 | .70  .90  .91  .91 | .21  .77  .85  .85 | .69  .90  .92  .92
     30 | .91  .99    1    1 | .80  .99    1  .99 | .14  .94  .98  .97 | .80  .98    1  .99
     40 | .95    1    1    1 | .85  .99    1    1 | .07  .97    1  .99 | .85  .99    1    1
     50 | .97    1    1    1 | .88    1    1    1 | .04  .97    1    1 | .88    1    1    1
0.3  20 | .59  .76  .79  .77 | .73  .90  .92  .92 | .40  .84  .88  .89 | .74  .91  .93  .93
     30 | .65  .93  .95  .96 | .88  .99    1    1 | .43  .97  .99  .99 | .89  .99    1    1
     40 | .67  .96  .99  .99 | .94    1    1    1 | .38  .99    1    1 | .94    1    1    1
     50 | .70  .98    1    1 | .95    1    1    1 | .39    1    1    1 | .96    1    1    1
0.5  20 | .31  .52  .52  .52 | .64  .87  .90  .89 | .58  .87  .90  .90 | .75  .92  .94  .93
     30 | .25  .67  .73  .73 | .78  .97  .99  .99 | .70  .98  .99  .99 | .88  .99    1    1
     40 | .15  .66  .84  .84 | .85  .99    1    1 | .76    1    1    1 | .94    1    1    1
     50 | .10  .65  .82  .93 | .87    1    1    1 | .82    1    1    1 | .96    1    1    1
0.7  20 | .12  .29  .25  .24 | .50  .73  .77  .77 | .70  .89  .92  .90 | .76  .92  .94  .93
     30 | .05  .39  .37  .32 | .57  .88  .92  .94 | .85  .99  .99    1 | .89    1    1    1
     40 | .02  .27  .40  .38 | .60  .90  .96  .96 | .92    1    1    1 | .94    1    1    1
     50 | .01  .16  .29  .39 | .61  .92  .97  .99 | .96    1    1    1 | .97    1    1    1
0.9  20 | .07  .10  .03  .01 | .35  .63  .59  .56 | .72  .90  .91  .90 | .75  .91  .92  .91
     30 | .02  .11  .03  .01 | .40  .76  .78  .72 | .89    1    1    1 | .90    1    1    1
     40 | .01  .03  .01  .00 | .40  .83  .81  .78 | .96    1    1    1 | .96    1    1    1
     50 | .00  .01  .00  .00 | .42  .87  .85  .86 | .98    1    1    1 | .98    1    1    1

Table 3: Frequency of selecting correct number of factors (cross section dependence case; DGP III)
r = 2, θ = 0.5, β = 0.2, kmax = 7. Rows: ρ and N; columns: T. AR(1) fitting, first differencing and Min{AR(1),FD} are prewhitening methods.

        | Non-prewhitening    | AR(1) fitting       | First differencing  | Min{AR(1),FD}
ρ   N\T |  20   30   40   50 |  20   30   40   50 |  20   30   40   50 |  20   30   40   50
0.0  20 | .48  .40  .36  .28 | .31  .27  .26  .19 | .05  .08  .07  .06 | .31  .28  .26  .19
     30 | .28  .49  .30  .22 | .14  .34  .23  .14 | .01  .13  .07  .03 | .14  .35  .23  .14
     40 | .38  .62  .74  .65 | .20  .43  .58  .48 | .00  .13  .24  .18 | .20  .43  .59  .48
     50 | .54  .74  .87  .89 | .30  .54  .70  .79 | .01  .14  .31  .43 | .30  .54  .70  .79
0.1  20 | .46  .38  .34  .27 | .35  .32  .29  .23 | .08  .11  .09  .09 | .35  .33  .30  .23
     30 | .26  .45  .28  .22 | .17  .39  .26  .18 | .02  .17  .10  .06 | .18  .39  .27  .18
     40 | .35  .59  .70  .61 | .26  .50  .64  .54 | .01  .20  .35  .23 | .26  .50  .65  .55
     50 | .49  .71  .83  .88 | .37  .62  .76  .83 | .02  .22  .42  .53 | .37  .62  .77  .84
0.3  20 | .30  .22  .21  .17 | .42  .41  .35  .28 | .17  .19  .16  .13 | .43  .43  .36  .29
     30 | .12  .30  .18  .12 | .22  .45  .31  .21 | .07  .25  .15  .09 | .24  .47  .32  .22
     40 | .16  .34  .51  .42 | .32  .58  .71  .63 | .06  .33  .48  .37 | .33  .60  .73  .64
     50 | .20  .45  .60  .70 | .47  .70  .84  .89 | .12  .40  .58  .69 | .48  .72  .85  .90
0.5  20 | .06  .05  .04  .03 | .39  .40  .36  .28 | .28  .28  .24  .20 | .46  .45  .40  .32
     30 | .02  .07  .03  .03 | .20  .45  .31  .22 | .12  .34  .22  .15 | .25  .51  .35  .26
     40 | .01  .06  .14  .10 | .31  .57  .72  .62 | .16  .43  .59  .49 | .35  .64  .77  .67
     50 | .00  .04  .15  .24 | .40  .70  .83  .89 | .26  .55  .71  .79 | .47  .77  .88  .91
0.7  20 | .00  .00  .00  .00 | .26  .27  .27  .21 | .38  .37  .32  .25 | .44  .45  .39  .33
     30 | .00  .00  .00  .00 | .12  .35  .23  .18 | .18  .40  .27  .19 | .24  .51  .36  .26
     40 | .00  .00  .00  .00 | .16  .43  .60  .52 | .27  .53  .66  .57 | .33  .63  .77  .68
     50 | .00  .00  .00  .00 | .20  .52  .69  .79 | .39  .65  .80  .85 | .44  .75  .87  .92
0.9  20 | .00  .00  .00  .00 | .09  .09  .10  .06 | .43  .42  .35  .30 | .45  .44  .37  .31
     30 | .00  .00  .00  .00 | .03  .12  .07  .05 | .23  .44  .32  .22 | .24  .47  .34  .24
     40 | .00  .00  .00  .00 | .04  .11  .20  .17 | .35  .60  .73  .63 | .35  .61  .75  .66
     50 | .00  .00  .00  .00 | .04  .12  .17  .29 | .49  .74  .84  .89 | .50  .75  .85  .91

Table 4: Frequency of selecting correct number of factors (heterogeneous AR(1) coefficient case; DGP IV)
r = 2, θ = 0.5, kmax = 7. Rows: N; columns: T. AR(1) fitting, first differencing and Min{AR(1),FD} are prewhitening methods.

        | Non-prewhitening    | AR(1) fitting       | First differencing  | Min{AR(1),FD}
    N\T |  20   30   40   50 |  20   30   40   50 |  20   30   40   50 |  20   30   40   50
Case (a): ρi ∼ iid U(−0.15, 0.15) for i = 1, ..., N/2; ρi ∼ iid U(0.65, 0.95) for i = N/2 + 1, ..., N
     20 | .13  .05  .02  .01 | .84  .91  .91  .93 | .74  .88  .94  .97 | .93  .98  .99    1
     30 | .12  .15  .07  .05 | .89  .97  .97  .98 | .82  .99    1    1 | .97    1    1    1
     40 | .03  .02  .01  .01 | .80  .92  .97  .98 | .84    1    1    1 | .96    1    1    1
     50 | .01  .01  .00  .00 | .79  .89  .92  .96 | .87    1    1    1 | .97    1    1    1
Case (b): ρi ∼ iid U(0, 0.95) for all i = 1, ..., N
     20 | .09  .06  .04  .04 | .87  .94  .95  .96 | .90  .97  .99    1 | .97    1    1    1
     30 | .07  .14  .09  .08 | .90  .98  .98  .99 | .94    1    1    1 | .99    1    1    1
     40 | .01  .02  .04  .02 | .83  .96  .98  .99 | .98    1    1    1 |   1    1    1    1
     50 | .00  .01  .01  .01 | .82  .93  .96  .96 | .99    1    1    1 |   1    1    1    1
