http://www.econometricsociety.org/

Econometrica, Vol. 82, No. 5 (September, 2014), 1799–1851 IDENTIFICATION USING STABILITY RESTRICTIONS LEANDRO M. MAGNUSSON Business School, The University of Western Australia, Crawley, WA 6009, Australia SOPHOCLES MAVROEIDIS Institute New Economic the Oxford Martin School, University of Department of for Economics andThinking INET atatthe Oxford, Oxford, OX1 3UQ, U.K.

The copyright to this Article is held by the Econometric Society. It may be downloaded, printed and reproduced only for educational or research purposes, including use in course packs. No downloading or copying may be done for any commercial purpose without the explicit permission of the Econometric Society. For such commercial purposes contact the Office of the Econometric Society (contact information may be found at the website http://www.econometricsociety.org or in the back cover of Econometrica). This statement must be included on all copies of this Article that are made available electronically or in any other format.

Econometrica, Vol. 82, No. 5 (September, 2014), 1799–1851

IDENTIFICATION USING STABILITY RESTRICTIONS BY LEANDRO M. MAGNUSSON AND SOPHOCLES MAVROEIDIS1 This paper studies inference in models that are identified by moment restrictions. We show how instability of the moments can be used constructively to improve the identification of structural parameters that are stable over time. A leading example is macroeconomic models that are immune to the well-known (Lucas (1976)) critique in the face of policy regime shifts. This insight is used to develop novel econometric methods that extend the widely used generalized method of moments (GMM). The proposed methods yield improved inference on the parameters of the new Keynesian Phillips curve. KEYWORDS: GMM, identification, structural stability, Lucas critique.

1. INTRODUCTION TWO MAJOR CONCERNS in the macroeconometrics literature are, on the one hand, the problem of limited variation in the data that often leads to weak identification, and on the other hand, the widespread parameter instability in empirical relations. In this paper, we show how one of these problems can be turned on its head to solve the other. In particular, parameter instability can be used constructively to identify structural relations that are time-invariant. The contribution of this paper is twofold. First, it makes a formal case for using stability restrictions (e.g., immunity to the well-known Lucas (1976) critique) as a source of identification of the stable structural parameters in economic models. The key insight is that changes in the distribution of the data induced by, for example, policy regime shifts, provide additional exogenous variation that can be usefully exploited for inference. This information is ignored by the usual generalized method of moments (GMM) approach that relies only on full-sample exclusion or cross-equation restrictions to identify the structural parameters of the model. The current practice can be justified if there are no breaks in the data generating process, or if breaks are small relative to the information coming from full-sample restrictions (Li (2008), Li and Müller (2009)). We argue that these assumptions are too strong in many contexts, since there is widespread evidence of parameter instability (Stock and 1 We would like to thank Daniela Scida for research assistance. We also thank Don Andrews, Debopam Bhattacharya, Steve Bond, Xu Cheng, Victor Chernozhukov, Guillaume Chevillon, Bruce Hansen, Michael Jansson, Frank Kleibergen, Blaise Melly, Ulrich Müller, Whitney Newey, Serena Ng, Barbara Rossi, Argia Sbordone, and Jim Stock for helpful comments and discussion. We also benefited from comments by seminar participants at Columbia, MIT, the NBER summer institute, the Greater New York Metropolitan Area workshop at Princeton, the North American summer meeting of the Econometric Society at Boston University, the CIREQ conference in Montreal, and the World Congress of the Econometric Society in Shanghai. Financial support by NSF Grant SES-1022623 is gratefully acknowledged. Mavroeidis would like to thank the European Commission for research support under FP7 Marie Curie Fellowship CIG 293675. Magnusson would like to thank Tulane University support under a Research Summer Fellowship.

© 2014 The Econometric Society

DOI: 10.3982/ECTA9612

1800

L. M. MAGNUSSON AND S. MAVROEIDIS

Watson (1996), Clarida, Galí, and Gertler (2000), Sims and Zha (2006)) and weak identification (Stock, Wright, and Yogo (2002), Canova and Sala (2009)). The second contribution is to develop new econometric methods for structural inference that exploit the information in stability restrictions and require only mild assumptions about the nature of instability in the distribution of the data. Specifically, our methods do not require any prior knowledge about the incidence, number, and timing of breaks. Our main assumption, which is used in the literature on structural breaks (cf. Perron (2006)), is that partial-sample moments satisfy a functional central limit theorem. Because no assumptions about identification are required, the main regularity conditions are strictly weaker than those used to justify the stability tests that are widely used in applied work, for example, Andrews (1993), Andrews and Ploberger (1994), Elliott and Müller (2006). Therefore, the scope of the proposed methods is very wide. We focus mostly on methods that are robust to weak identification, but we also discuss the case of strong identification, as well as the case in which some parameters are strongly identified and others are possibly weakly identified. We examine the empirical relevance of the proposed methods by applying them to a widely used macroeconomic model, the New Keynesian Phillips curve (NKPC). This model is known to suffer from problems of weak identification; see Kleibergen and Mavroeidis (2009) and the references therein. Weak identification robust confidence intervals that use only full-sample information are very wide, in some cases containing the entire parameter space. However, methods that exploit the stability restrictions yield drastically smaller confidence sets on the parameters. This paper relates to Rossi (2005), who proposed GMM-based methods for testing parametric restrictions jointly with the hypothesis of stability of the parameters. Rossi did not consider the implications of stability restrictions for the identification of structural parameters, but focused instead on the implication of stability restrictions for nested model comparisons; see also Giacomini and Rossi (2009). Our proposed methods also differ from theirs in that they are robust to identification failure. The paper also relates to the literature on identification via heteroskedasticity; see Lewbel (2012), Rigobon (2003), and Klein and Vella (2010). These papers obtained identification by exploiting a certain heterogeneity in the data generating process. In the case of Rigobon (2003), this heterogeneity came from changes in the variance of the shocks in structural vector autoregressions. These changes might be small, but while this suggests identification might be weak, heretofore the implications of small changes for inference had not been investigated. By specifying the moment conditions appropriately, our framework nests Rigobon’s method, and is more general, in that it does not require any rank condition for identification, nor knowledge about the timing of breaks.

IDENTIFICATION USING STABILITY RESTRICTIONS

1801

The outline of the paper is as follows. Section 2 presents our assumptions and motivating examples and describes the proposed methods. Section 3 provides the underlying asymptotic theory. Section 4 reports asymptotic power comparisons of the tests and simulation results on their size in finite samples. Section 5 presents an empirical application followed by a brief conclusion. Proofs, tables of critical values, and additional empirical results are given in the Supplemental Material (Magnusson and Mavroeidis (2014)), which is available online. We use the following notation: [x] denotes the integer part of a scalar x, 1B is the indicator function that takes the value of 1 when event B is true and 0 otherwise, PX = X(X  X)−1 X  , and A1/2 denotes the symmetric square root of a positive definite matrix A. Unless otherwise stated, all limits are taken as T → ∞. 2. ASSUMPTIONS AND TESTS Consider a p-dimensional vector of structural parameters θ whose parameter region Θ is a subset of Rp , and suppose that we observe a sample of size T given by a triangular array of random variables {YTt : t ≤ T T ≥ 1}. The triangular array construction is used to account for instabilities in the data generating process. For notational convenience, we will drop the dependence of random variables in the sample on T where no confusion arises. We assume that economic theory gives rise to a set of moment conditions that can be represented in terms of a k-dimensional function of data and parameters f (θ YTt ), abbreviated as ft (θ) dropping the dependence on T for convenience, whose expectation vanishes at the true value of θ, that is,   E ft (θ) = 0 for all t ≤ T T ≥ 1 (1) For example, a typical Euler equation model with G equations gives rise to a set of conditional moment restrictions of the form E[ht (θ)|It ] = 0, where ht (θ) is a G-dimensional function of data and parameters, for example, a vector of residuals or structural errors, and It is the information set at time t. Given any set of instrumental variables Zt ∈ G×k in It , the conditional moment restrictions can be converted to unconditional restrictions in (1) by defining ft (θ) = Zt ht (θ). Let the partial sums of the moment function ft (θ) be denoted by FsT (θ) =

[sT ] 

ft (θ)

t=1

where s ∈ [0 1]. The moment conditions (1) are equivalent to E[FsT (θ)] = 0 for all s ∈ [0 1]. We will refer to FsT (θ) as partial-sample moments, and FT (θ) ≡ F1T (θ) as full-sample moments.

1802

L. M. MAGNUSSON AND S. MAVROEIDIS

Our interest lies in testing the null and alternative hypotheses (2)

H0 : θ = θ0

and

H1 : θ = θ0 

using tests with significance level α.2 The robustness requirement is that α-level tests should not reject H0 more often than the nominal level asymptotically for a wide range of data generating processes (DGPs), satisfying a multivariate invariance principle for the sample moments; see Müller (2011) for a motivation. There is a large class of tests that meet this requirement, and there is generally no uniformly most powerful (UMP) test. We shall therefore address the question of efficiency by means of weighted average power (WAP) criteria. The key observation motivating this paper is that the moment conditions (1) together with the null hypothesis H0 : θ = θ0 give rise to the kT identifying restrictions E[FsT (θ0 )] = 0 for all s ∈ [0 1]. These restrictions can be written equivalently as the restriction that E[ft (θ0 )] is zero on average, that is, E[FT (θ0 )] = 0, and the restriction that E[ft (θ0 )] is stable over t. By defining (3)

sT (θ0 ) = FsT (θ0 ) − sFT (θ0 ) F

sT (θ0 )] = 0 for the stability restriction can be equivalently expressed as E[F all s ∈ [0 1]. The usual approach to inference on the hypothesis (2) utilizes only the restrictions E[FT (θ0 )] = 0. We show in the next section that this approach wastes information unless E[ft (θ0 )] is constant over t under H1 , that is, sT (θ0 )] = 0 holds for all s ∈ [0 1] and every possible true value of θ. E[F 2.1. Identification An important requirement for robustness is that tests should control size in cases where θ may be arbitrarily weakly identified. This covers situations in which identification remains weak when we consider both the full-sample and the stability restrictions. For the full-sample restrictions, weak identification is characterized using the local-to-zero asymptotic nesting of Stock and Wright (2000). For the stability restrictions, weak identification corresponds to small instability/breaks under the alternative. To make these precise, define the step function mT (θ r) ≡ Eθ [f[rT ] (θ0 )], r ∈ [0 1], where the subscript in the expectation operator is used to highlight that expectation is taken with respect to the distribution of the data at the true value of the parameter θ. We distinguish four cases, which are summarized in Table I: strong full-sample identification; weak full-sample identification—this is the nesting of Stock and Wright (2000); large breaks (strong identification via stability restrictions); and small breaks (weak identification via stability restrictions). Thus, there are four possibilities to consider in total: (i) weak fullsample identification/small breaks; (ii) weak full-sample identification/large 2

Tests of general nonlinear hypotheses are described in the Supplemental Material.

IDENTIFICATION USING STABILITY RESTRICTIONS

1803

TABLE I DIFFERENT IDENTIFICATION SETTINGSa Identification

Full-sample

Nesting

Strong Weak

Stability

Strong Weak

mT (θ) → m(θ), and m(θ) = 0 iff θ = θ0 √ T mT (θ) → m(θ), with m(θ0 ) = 0 T (θ r) → m (θ r), m (θ r) = 0 for all r ∈ [0 1] iff θ = θ0 m √ T (θ r) → m (θ r), with m (θ0  r) = 0 for all r ∈ [0 1] Tm

1 a m (θ r) ≡ E [f T (θ r) = mT (θ r) − mT (θ). Cases: (i) fullθ [rT ] (θ0 )], r ∈ [0 1]; mT (θ) = 0 mT (θ r) dr ; m T

sample: weak, stability: weak; (ii) full-sample: weak, stability: strong; (iii) full-sample: strong, stability: weak; (iv) fullsample: strong, stability: strong.

breaks; (iii) strong full-sample identification/small breaks; (iv) strong fullsample identification/large breaks. Case (i) corresponds to (overall) weak identification, whereas cases (ii)–(iv) correspond to strong identification. We focus on the case of weak identification, which is appropriate for robustness in several applications. Strong identification will be treated in Section 3.4. The case in which some of the parameters are weakly identified and others are strongly identified is discussed in Section 3.5. Weak identification is characterized by the following assumption. s ASSUMPTION 1: Eθ [T −1/2 FsT (θ0 )] → 0 m(θ r) dr uniformly in s, where the function m(θ ·) belongs to Dk[01] , the space of functions on [0 1] that are rightcontinuous with finite left limits (also known as cadlag), and m(θ0  r) = 0 for all r ∈ [0 1]. At s = 1, Assumption 1 corresponds to the weak identification assumption in Stock and Wright (2000).3 This assumption makes precise the notion that the moment conditions (1) are nearly satisfied even when the true value θ is far from the hypothesized value θ0 . The key addition to Stock and Wright’s framework is that Assumption 1 allows us to characterize the behavior of the moment conditions also over subsamples, and thus model time variation in Eθ [ft (θ0 )] under H1 . The special case in which Eθ [ft (θ0 )] is approximately constant to order T −1/2 in large samples corresponds to m(θ s) being constant in terms of s. Assumption 1 implies that any time variation in Eθ [ft (θ0 )] under H1 is of the same order of magnitude as the full-sample moment conditions Eθ [FT (θ0 )]. This ensures that the informational content of stability restrictions is comparable to that of the fullsample moment conditions. 3 Because we do not seek to characterize the behavior of estimators of θ, we do not need uniform convergence and differentiability of m(θ; s) with respect to θ.

1804

L. M. MAGNUSSON AND S. MAVROEIDIS

The function m(θ ·) in Assumption 1 can accommodate most types of instability that have been used in the literature on structural change. Specifically, m(θ ·) can be a step function with a finite number of discontinuities, corresponding to a fixed number of structural “breaks” or distinct “regimes,” as in Andrews (1993), Sowell (1996), or Bai and Perron (1998). It can also be a realization of a continuous stochastic process, such as a martingale process, as in Stock and Watson (1996), or the general persistent time variation process studied in Elliott and Müller (2006), representing slow continuous time variation. It could also be a smooth deterministic function of time, such as a spline, representing a smooth transition between different regimes. 2.2. Examples EXAMPLE IV—Linear IV Regression: The model consists of a structural equation (4) and a reduced-form (first-stage) equation (5): (4)

y1t = Y2t θ + ut 

(5)

Y2t = Zt Πt + V2t 

t = 1     T

where (y1t  Y2t ) is a 1 × (1 + p) random vector, ut is a (structural) error, θ ∈ p is the unknown structural parameter vector, Zt ∈ R1×k is the observed vector of instrumental variables, V2t ∈ R1×p is a (reduced-form) error vector, and Πt ∈ Rk×p , t = 1     T is a sequence of unknown parameters. Let v1t = ut + V2t θ. The model can be written compactly as (6)

Yt = Zt Πt A + Vt  Yt = (y1t  Y2t )

t = 1     T A = (θ Ip )

where and Vt = (v1t  V2t )

The reduced-form errors Vt satisfy E(Zt Vt ) = 0. The identifying restrictions are given by (1) with ft (θ) = Zt (y1t − Y2t θ), and Eθ [ft (θ0 )] = mT (θ Tt ) = E(Zt Zt )Πt (θ − θ0 ). If we assume that E(Zt Zt ) = ΣZZ for all t, then mT (θ) = T T (θ Tt ) = ΣZZ (Πt − Π T ) × ΣZZ Π T (θ − θ0 ), where Π T = T1 t=1 Πt , and m −1/2 (θ − θ0 ). Assumption 1 is satisfied if Πt = T C( Tt ), for some non-stochastic cadlag function C : [0 1] → k×p , such that m(θ s) = ΣZZ C(s)(θ − θ0 ). EXAMPLE RS—Identification Through Policy Regime Shifts: Consider the structural model (7)

yt = βE[yt+1 |It ] + γxt + εt 

where E[·|It ] denotes expectations conditional on the information set It , and εt is an unobserved shock, which is assumed to be uncorrelated with lags of the observables, y and x. The above equation can be thought of as (a possibly linearized version of) some Euler equation that determines the optimal choice of yt by an economic agent given their objective function. The parameters θ =

IDENTIFICATION USING STABILITY RESTRICTIONS

1805

(β γ) will then be directly related to some “deep” structural parameters that characterize the objective function. We want to do inference on θ using the identifying assumption E[Zt ht (θ)] = 0, ht (θ) = yt − βyt+1 − γxt , with Zt a 1 × k vector containing lags of yt and xt . Identification depends on the distribution of xt . Suppose that xt is a policy variable determined according to an inertial feedback rule of the form (8)

xt = ρxt−1 + (1 − ρ)ϕyt + ηt 

where ηt is an unobserved “policy” shock. Then, in a determinate rational exy pectations equilibrium, the dynamics of yt and xt are given by yt = α1 xt−1 + vt , y xt = ρ1 xt−1 + vtx , where vt  vtx are innovations.4 So, there is only one relevant instrument, xt−1 , and θ is underidentified. Now, suppose that policy changes over time, for example, ϕ becomes ϕt , but the parameters in (7) remain stable, that is, immune to the Lucas critique. Then, a single change in the policy parameters at date tb , say, suffices to induce identification: interacting xt−1 with the indicator 1{t
γρ1 The reduced-form parameters and errors are α1 = 1−βρ , ρ1 = 1

vt =

εt +(βα1 +γ)ηt , 1−(1−ρ)ϕ(βα1 +γ)

and vtx =

ηt +(1−ρ)ϕεt ; 1−(1−ρ)ϕ(βα1 +γ)



1+βρ−γϕ(1−ρ)−

see the Supplemental Material.

(1+βρ−γϕ(1−ρ))2 −4βρ , 2β

1806

L. M. MAGNUSSON AND S. MAVROEIDIS

(γ − γ0 )ω11t − (βγ − β0 γ0 )ω12t . If there is more than one volatility regime, then the model is identified on the order condition. The rank condition is satisfied if the equations are not linearly dependent; see Rigobon (2003, Proposition 1). Weak identification occurs when time variation in the volatilities is small, or when volatilities are nearly proportional across regimes. Rigobon (2003) assumed strong identification, and suggested a GMM estimator that is based on either known or estimated regime dates. 2.3. Proposed Tests To use the information in the stability restrictions, a natural approach is to model the instability in the reduced form, derive additional moment restrictions, and apply GMM. Indeed, this is the approach used in the identificationby-heteroskedasticity literature; see Example HET and Rigobon (2003). When identification is strong, this approach leads to efficient tests under some regularity conditions; see Section 3.4. But when identification is weak, there are other procedures that might have better power. The general point is that there is no uniformly most powerful (UMP) or nearly UMP test in this context, and so there is scope for looking at various alternatives. To gain some intuition for this, consider the following example. EXAMPLE IV—Continued: Suppose there is a single endogenous regressor in (4), p = 1, and a single break in the first-stage regression (5) occurring at time [τT ], where τ ∈ (0 1). Define the dummy variable dtτ = 1{t<[τT ]} . Then, Zt Πt = Zt dtτ Π1 + Zt (1 − dtτ )Π2 in (5), so the model can be written as a constant-parameter model with 2k “split-sample” instruments Z t (τ) =  [Zt dtτ  Zt (1−dtτ )] and 2k×1 first-stage regression coefficients Π = [Π1  Π2 ] . If τ is known, then it seems natural to use an optimal test in the constantparameter formulation, such as Moreira’s (2003) conditional likelihood ratio (CLR) test with the 2k instruments Z t (τ), which is nearly efficient based on Andrews, Moreira, and Stock (2006). Indeed, we find that evaluating this test at an estimated break date, [τT ˆ ], is also efficient when the break date is large. However, we show numerically below that other procedures can have better power when the break date is estimated and the break is small. The intuition for this is that, in those cases, the break date will be imprecisely estimated, so ˆ will be using suboptimal instruments with possibly high tests based on Z t (τ) probability. Alternative procedures that consider every possible break point within a range that includes the true τ may therefore have better power. Our objective is to do inference that is widely applicable, so we start with weak assumptions and strengthen them progressively. This leads us to consider the following two different approaches to inference on (2). The first approach requires assumptions about the limiting distribution only of the partial-sample

IDENTIFICATION USING STABILITY RESTRICTIONS

1807

moments FsT (θ0 ), and leads to generalizations of the S test and confidence set of Stock and Wright (2000). The second approach requires assumptions about the joint limiting distribution of FsT (θ0 ) and its Jacobian ∂FsT (θ0 )/∂θ , and leads to generalizations of the conditional score and conditional likelihood ratio tests, Kleibergen (2002, 2005) and Moreira (2003). Both approaches yield inference that is robust to weak identification, and neither of them dominates the other (there is no UMP test). However, we show below that the second approach leads to more efficient tests when identification is strong. The test statistics we derive can be written generically in the form “A-B,” where “A” refers to the instability treatment and “B” refers to the full-sample treatment. 2.3.1. Generalized S Tests Let XT (s) = T −1/2 FsT (θ0 ) denote the partial-sample moments at θ0 , and ⇒ denote weak convergence of the underlying probability measures. We make the following high-level assumption about the large sample behavior of XT under both H0 and H1 . ASSUMPTION 2: (i) The process XT (s) = T −1/2 FsT (θ0 ) satisfies XT (·) − E[XT (·)] ⇒ Vff1/2 Wf (·), where Wf is a standard k × 1 Wiener process, and Vff is a positive definite k × k matrix. (ii) There exists a consistent estimator of Vff , denoted V ff (θ0 ). Assumption 2 strengthens Stock and Wright (2000, Assumption A), which corresponds to the special case of s = 1, above. Primitive conditions for the high-level Assumption 2 can be found in various papers in the stability literature, for example, Andrews (1993) and Sowell (1996). For instance, when the moment functions are given by ft (θ) = Zt ht (θ), Assumption 2 will be satisfied when ht (θ0 ) is strong mixing with finite moments of order greater than 2, and Zt is asymptotically mse-stationary; see Hansen (2000).5 It is important to acknowledge that Assumption 2 excludes permanent changes in the variance of the moment conditions. This assumption is shared by all tests of structural change proposed in the literature, so it does not limit the applicability of our results any more than for any one of the other stability tests. Technically, this assumption is necessary for the proposed tests to control size asymptotically; see Hansen (2000). However, it does not preclude all changes in the variance of the moment conditions. For example, it is sufficient to assume that the magnitude of any changes in the variance of the sample moments converges to zero as the sample increases, following the 5 Asymptotic mse-stationarity is weaker than strict stationarity and allows for nonpermanent changes in the marginal distribution of Zt .

1808

L. M. MAGNUSSON AND S. MAVROEIDIS

approach used by Bai and Perron (1998) to obtain pivotal statistics for inference on break dates. Therefore, Assumption 2 does not preclude changes in the variance that can be detected with possibly high probability. In addition, the results in Hansen (2000) indicate that the size distortions induced by such departures from Assumption 2 are modest, and our numerical results below confirm that. Finally, it is possible to relax Assumption 2 to allow for permanent changes in the variance, as long as the dates at which the variance matrix changes are known or consistently estimable and the variance matrix is consistently estimable in each subperiod over which it stays constant. EXAMPLE IV—Continued: Using (6), the moment function can be expressed as ft (θ0 ) = Zt Zt Πt (θ − θ0 ) + Zt u0t , where u0t = v1t − V2t θ0 . Assumption 2 holds if mT (θ Tt ) = E(Zt Zt )Πt (θ − θ0 ) is uniformly bounded, and Zt u0t satisfies a FCLT, sufficient conditions for which are widely avail[sT ] p able. For example, if sups∈[01] T1 t=1 Zt Zt − sΣZZ → 0, where ΣZZ is nonsingular, Zt u0t is mixing with φ of size −r/2(r − 1), r ≥ 2, or α of size −r/(r − 2), r > 2, E|Zit u0t |r < Δ < ∞ for all t and i = 1     k; and T limT →∞ var(T −1/2 t=1 Zt u0t ) = Vff , where Vff is finite and nonsingular, then Assumption 2 follows by White (2001, Theorem 7.30).6 Using the approach proposed by Müller (2011), we show in the next section that asymptotically efficient tests based on Assumption 2 can be expressed as joint tests of (i) the full-sample moment restrictions E[FT (θ0 )] = 0 and (ii) the restriction that E[ft (θ0 )] is stable. We denote the statistics for testing the fullST , respectively. The forsample moment and stability restrictions by ST and  mer is identical to the S statistic proposed by Stock and Wright (2000), and the sT (θ0 ); see (3). The proposed generlatter depends primarily on the process F alized S statistics can be written generically as (9)

gen-STc˜ c¯ (θ0 ) = gen- STc˜ (θ0 ) +

c¯ ST (θ0 ) 1 + c¯

where gen- STc˜ (θ0 ) is the stability component, ST (θ0 ) is the full-sample compo˜ c¯ are nonnegative constants that determine the weights the innent, and c vestigator places a priori on the information coming from stability and fullsample moment restrictions, respectively. Our criterion is maximizing WAP, with weights placed upon the different possible distributions of the data under the alternative, that is, the function m(θ ·) in Assumption 1. This approach has a long tradition in the related literature on testing for structural change. 6 For a general nonlinear moment function ft (θ0 ), the above assumption can be restated by replacing Zt u0t with ξt ≡ ft (θ0 ) − E[ft (θ0 )].

IDENTIFICATION USING STABILITY RESTRICTIONS

1809

The S statistic in (9) is given by (10)

ST (θ0 ) =

1 FT (θ0 ) V ff (θ0 )−1 FT (θ0 ) T

where V ff (θ0 ) is a consistent estimator of the variance of T −1/2 FT (θ0 ). So, the S test of Stock and Wright (2000) is a special case of the proposed tests when the investigator places zero weight on the stability restrictions. The stability statistic is obtained by maximizing WAP given alternative distributions of weights over the process m(θ ·). For reasons that we discuss below, we recommend the use of a statistic that is optimal against persistent time variation (Stock and Watson (1996, 1998) and Elliott and Müller (2006)). For completeness, we also discuss the alternative of a fixed number of breaks (Andrews and Ploberger (1994), Sowell (1996), Bai and Perron (1998)). Persistent time variation is characterized by a distribution of weights over m(θ ·) that make it a martingale process with variance proportional to Vff ; see Assumption 5 below. Using such weights, the stability-only component of the generalized S statistic is very similar to the qLL statistic proposed by Elliott and Müller (2006) to test against persistent time variation in regression coefficients (“qLL” stands for quasi local level). Following their recommendation, we set c˜ = 10, and refer to the corresponding generalized S statistic that puts equal weight on stability and full-sample restrictions as qLL-S and to the stabilityonly S statistic as qLL- S. These statistics can be computed using the following steps: 1. Compute vt = V ff (θ0 )−1/2 ft (θ0 ) (k × 1), and denote the ith element by vit , i = 1     k. 2. For i = 1     k, generate the series {wit }Tt=1 as wi1 = vi1 and wit = . r˜wit−1 + vit , for t = 2     T , with r˜ = 1 − 10 T 3. Regress {wit }Tt=1 on {˜r t }Tt=1 and obtain the squared residuals, sum over all i = 1     k, and multiply by r˜ . T k T 4. Compute i=1 t=1 (vit − v¯ i )2 , where v¯ i = T −1 t=1 vit , and subtract the quantity in step 3 from it to get qLL- ST (θ0 ). 5. Compute qLL-S using the formula (11)

ST (θ0 ) + qLL-ST (θ0 ) = qLL-

10 ST (θ0 ) 11

The qLL-S (- S) test rejects H0 in (2) for large values of the statistic qLLST (θ0 ) (- ST (θ0 )). Asymptotic critical values for those tests can be obtained by simulation using Theorem 1; see the Supplemental Material. Next, we turn to tests that are optimal against a fixed number of breaks. In the case of a single break at an unknown date, the proposed generalized S statistics are denoted exp-S and ave-S. They can be obtained using the

1810

L. M. MAGNUSSON AND S. MAVROEIDIS

formulae (12)

exp-ST (θ0 ) = exp- ST (θ0 ) + ST (θ0 )

(13)

ST (θ0 ) + ST (θ0 ) ave-ST (θ0 ) = ave-

where, with a slight abuse of notation,   exp- ST (θ0 ) = log exp  (14) ST (θ0  s) dνs  (15)

ave- ST (θ0 ) =

ς



 ST (θ0  s) dνs 

νs ∼ Uniform over ς ⊂ (0 1)

ς

and (16)

−1

1  Vff (θ0 )  sT (θ0 ) F ST (θ0  s) = F sT (θ0 ) T s(1 − s)

The integrals in (14) and (15) are computed by averaging over all time periods [sT ] for s ∈ ς. The tests require trimming at the end points of the sample; see Andrews and Ploberger (1994). Asymptotic critical values are nonstandard, but can be computed by simulation, and they are reported in the Supplemental Material. Statistics of the form (9) for particular weighting schemes were proposed previously by Rossi (2005) for a related testing problem. Specifically, Rossi considered the problem of testing a constant null hypothesis on parameters against time-varying alternatives of the same parameters. As we discuss below, our proposed tests have power against such alternatives, too. The novel idea here is that these statistics can be used to improve the identification of θ when it is constant under the alternative. More specifically, in exactly identified models, the exp-S and ave-S tests of (2) given in (12) and (13) are the same as the Exp-Wald∗T and Mean-Wald∗T tests of Rossi (2005, equations 25 and 26), respectively, but the hypotheses and assumptions are different. In general, the exp-S and ave-S tests are equivalent to the Exp-Wald∗T and Mean-Wald∗T tests, respectively, of the null hypothesis μt = 0 against single-break alternatives in the auxiliary “local level” model ft (θ0 ) = μt + ut , where μt = E[ft (θ0 )] and ut = ft (θ0 ) − μt .7 This connection is analogous to the fact that the S test of Stock and Wright (2000) can be seen as a Wald test of the null hypothesis E[FT (θ0 )] = 0 against E[FT (θ0 )] = 0.8 Of the above generalized S tests, we recommend using the qLL-S test against the fixed-break alternatives for a number of reasons. First, it has good overall 7 The term “auxiliary model” is used here in the sense of Dufour (2003, p. 789), who used it with reference to the Anderson and Rubin (1949) test in the linear IV regression model. 8 Section S.5 of the Supplemental Material provides further discussion of this point.

IDENTIFICATION USING STABILITY RESTRICTIONS

1811

power even against other common types of time variation; see Section 4. Second, it performs very well in our application, and we expect this to be the case in several other applications where there is evidence of persistent time variation. Third, it is simpler and faster to compute, and this is an important advantage in practice: the qLL-S test does not require the estimation of a sequence of statistics or any trimming at the end points of the sample, and it takes almost the same time to compute as the S test, which is an important advantage when inverting the test to compute confidence sets. Interpretation of Rejection and Confidence Sets. The generalized S tests have power against violation of the moment conditions (1) evaluated at θ = θ0 , which is the joint hypothesis that H0 : θ = θ0 and that the overidentifying restrictions are valid in every subsample. Thus, generalized S sets can be empty. The moment conditions (1) may fail to hold under H0 because some instruments are invalid (for at least some part of the sample), or because the true value of θ is time-varying. This point can be illustrated using the IV example given by (4) and (5). Suppose that the structural parameters are timevarying under H1 , such that θt − θ0 = h( Tt ) ∈ D[0 1]p for all t ≤ T T ≥ 1, while the first-stage regression coefficients are constant, and local to zero, that is, Πt = T −1/2 C for all t ≤ T and some nonstochastic matrix C. Then, under this time-varying parameter alternative, E[T −1/2 FsT (θ0 )] = ΣZZ Ch(s). Thus, instability in the moments E[ft (θ0 )] under the alternative may come either from instability in the first-stage regression coefficients when the true value of θ is constant, as in Example IV above, or from instability in θ (or both). This property can be an advantage if one is interested in testing the hypothesis that θ is stable before obtaining a confidence set on θ, because it avoids any pre-testing bias that may be induced by using separate stability tests, and which may be significant; see Elliott and Müller (2010). There are many applications in which persistent time variation in at least some of the parameters makes economic sense, such as a time-varying inflation target in the NKPC. Obviously, if this possibility is part of the maintained hypothesis, one would not estimate the model over the full sample treating the parameters as constant, but would either model precisely or make allowances for time variation in some parts of θ. The proposed methods can then be applied to those parameters that are assumed stable a priori. We discuss this further in the context of our empirical application in Section 5. The above property of generalized S sets also requires care in interpreting results when confidence sets are small but nonempty. As in the standard S set, this could happen either because the model is well-specified and precisely estimated, or because it is misspecified but the evidence is too weak to reject it entirely; see Stock and Wright (2000, p. 1065). This issue is analogous to the problem of using instruments that locally violate the exogeneity assumption; see Guggenberger (2012a, 2012b). The split-sample tests that we introduce next do not have power against violations of the overidentifying restrictions, so they do not have the aforementioned problem of interpretation.

1812

L. M. MAGNUSSON AND S. MAVROEIDIS

2.3.2. Split-Sample Tests When the moment functions ft (θ) are linear, for example, ft (θ) = ft (θ0 ) + qt (θ − θ0 ), information from stability restrictions arises from time variation in the expectation of their Jacobian E(qt ). (When ft (θ) is nonlinear, this applies to local alternatives.) Consider the leading case of a one-time change in the expected Jacobian at some date tb . If tb were known, we could generate split-sample instruments at tb and proceed with GMM estimation using the additional k moment conditions generated by the break. The resulting “split-sample” continuously-updated GMM criterion function would be a “split-sample” S statistic,9 or simply split-S, which is a special case of the exp-S and ave-S statistics in (12) and (13) when ς = {tb /T }.10 Moreover, under Assumption 2, asymptotic critical values for the split-S test can be obtained from a χ2 (2k) distribution. In addition to the split-S test, under an additional mild assumption on the Jacobian, as in Kleibergen (2005, Assumption 1), we could also obtain splitsample conditional score (split-KLM) and split-sample conditional likelihood ratio (split-CLR) tests. The motivation for considering such tests is that they are asymptotically more powerful than the split-S test under strong identification; see Section 3.4 below. Since we typically do not know the break date, we can obtain feasible versions of the aforementioned tests by evaluating them at an estimated break date. For this purpose, Assumption 2 is insufficient, because the break date is not identified under the null from the distribution of the sample moments alone. Therefore, we need an assumption about the joint distribution of the sample moments and their Jacobian. FsT (θ0 ) [sT ] ASSUMPTION 3: Let XTQ (s) = T −1/2 vec[Q , where QsT (θ0 ) = t=1 qt (θ0 ) sT (θ0 )] and qt (θ0 ) = ∂ft (θ0 )/∂θ . We assume that: (i) there exists a nonstochastic positive definite k(p + 1)-dimensional square matrix (17)



V = lim var XTQ (1) = T →∞



Vff Vqf

Vf q Vqq



and an estimator V T (θ0 ) =



Vff (θ0 ) V f q (θ0 ) p →V; V qf (θ0 ) V qq (θ0 )

9 The use of the continuously-updated GMM criterion is important for robustness to weak identification, for the reasons discussed in Stock and Wright (2000, p. 1064) and Kleibergen (2005, p. 1107). 10 Think of the analogy between the Chow test and the Quandt likelihood ratio test.

IDENTIFICATION USING STABILITY RESTRICTIONS

and   (ii) X (·) − E XTQ (·) ⇒ Q T



Vff1/2 Vqf Vff−1/2

0 1/2 Vqqf



1813

Wf (·)  Wq (·)

where Wf and Wq are independent standard k × 1 and kp × 1 Wiener processes, and Vqqf = Vqq − Vqf Vff−1 Vf q . Assumption 3 is a stronger version of Assumptions 1 and 2 in Kleibergen (2005). The latter correspond to the special case of s = 1 in the former. Assumption 3 avoids placing any restrictions on the (infinitely dimensional) nuisance parameter limT →∞ T −1 E[QsT (θ0 )], which are difficult to verify. In particular, we avoid making any assumptions about identification or the incidence and magnitude of breaks. EXAMPLE IV—Continued: The Jacobian of the moment function ft (θ) = Zt (y1t − Y2t θ) is qt = −Zt Y2t . Assumption 3 is satisfied if E(Zt Zt )Πt is uniformly bounded and the process vec(Zt Vt ) satisfies a multivariate FCLT. Primitive conditions are analogous to those given for Assumption 2, extended to apply to the entire k(p + 1)-vector vec(Zt Vt ), as opposed to just the k linear combinations Zt Vt (1 −θ0 ) . Under Assumption 3, we can evaluate standard weak-identification robust GMM tests based on the partial-sample GMM objective function, where the break date has been estimated. We refer to such tests as split-sample tests. The robustness objective is that the split-sample tests should control size irrespective of whether there has been a break or not, or whether the break date is consistently estimable, or even when the nature of instability has been misspecified. This is achieved by the following procedure. 1. Estimate of the break date. Specify a range of break dates ς ⊂ (0 1), for example, [015 085]. For each τ ∈ ς, compute the two subsample moments FT1 (θ0  τ) = FτT (θ0 ) and FT2 (θ0  τ) = FT (θ0 ) − FτT (θ0 ), their Jacobians QT1 (θ0  τ) = QτT (θ0 ) and QT2 (θ0  τ) = QT (θ0 ) − QτT (θ0 ), and V T (θ0 ). Compute the k × p matrices DiT (θ0  τ), i = 1 2, from     (18) vec DiT (θ0  τ) = vec QTi (θ0  τ) − V qf (θ0 )V ff (θ0 )−1 FTi (θ0  τ) and estimate their variance by V qqf (θ0 ) = V qq (θ0 ) − V qf (θ0 )V ff (θ0 )−1 V qf (θ0 ) . Thus, obtain the restricted estimator of the break date tb = [τT ] by (19)

   tb i tˆb (θ0 ) = arg max V qqf (θ0 )−1 T vec DT θ0  tb ∈Tς T i=1    tb i  × vec DT θ0  T 2 

−1 i

1814

L. M. MAGNUSSON AND S. MAVROEIDIS

where Tς = {t = [sT ] : s ∈ ς}, T1 = tb , and T2 = T − tb . This can be expressed equivalently as τ(θ ˆ 0 ) = tˆb (θ0 )/T . 2. Split-sample test statistics. The weak-identification robust split-sample statistics are given by (20)

split-ST (θ0  τ) =

2 

Ti−1 FTi (θ0  τ) V ff (θ0 )−1 FTi (θ0  τ)

i=1

split-KLM T (θ0  τ) =

2 

Ti−1 FTi (θ0  τ) V ff (θ0 )−1/2 PDi

i=1

× V ff (θ0 )−1/2 FTi (θ0  τ)  = V ff (θ0 ) D i

−1/2

where

i T

D (θ0  τ)

split-JKLM T (θ0  τ) = split-ST (θ0  τ) − split-KLM T (θ0  τ) split-CLRT (θ0  τ) =

and

1 split-ST (θ0  τ) − rkθ (θ0  τ) 2

2 + split-ST (θ0  τ) + rkθ (θ0  τ)

1/2  − 4split-JKLM T (θ0  τ)rkθ (θ0  τ) 

where rkθ (θ τ) is a statistic that tests for a lower rank value of the Jacobian of the moment conditions w.r.t. θ, for example, the one proposed by Kleibergen and Paap (2006). Evaluate the above statistics at τ = τ(θ ˆ 0 ). 3. Critical values. Conditional critical values for the split-sample tests evaluated at τ(θ ˆ 0 ) are given by the asymptotic distributions that would arise if τ(θ ˆ 0 ) were non-random. This is because, under Assumption 3 and H0 : θ = θ0 , τ(θ ˆ 0 ) is asymptotically independent of T −1/2 F·T (θ0 ). In the case of the split-S, split-KLM, and split-JKLM statistics, the asymptotic distributions are χ2 (2k), χ2 (p), and χ2 (2k − p), respectively. For the split-CLR test, critical values conditional on rkθ (θ0  τ(θ ˆ 0 )) can be computed by simulation. Of the split-sample tests introduced above, we recommend using the splitCLR test, because it is more efficient than the split-S test under strong identification (see Section 3.4), and does not suffer from the spurious declines in power of the split-KLM test at points of inflection of the split-sample GMM objective function under weak identification.11 Note that if there is indeed a break and the true break date [τT ] happens to be outside the specified range in step 1 above, this may affect the power of the split-sample tests, but it will have no effect on their asymptotic sizes, since these tests control size irrespective of whether the break date is estimated consistently or not. 11 This is analogous to the power issue with the standard KLM test discussed in Kleibergen (2005).

IDENTIFICATION USING STABILITY RESTRICTIONS

1815

3. ASYMPTOTIC THEORY In this section, we make use of Müller’s (2011) notion of asymptotic efficiency based on a weak convergence criterion. This approach is similar to the well-known limit of experiments framework of LeCam; see van der Vaart (1998) for details. Müller (2011) provided a detailed discussion of the difference between these two alternative asymptotic efficiency frameworks. In Müller’s framework, we consider models defined directly in terms of a weak convergence property (see Assumptions 1 and 2); that is, we consider the class of data generating processes that satisfy a certain multivariate invariance principle. We obtain efficient tests in the limiting problem and evaluate them at their sample analogues. Müller (2011) articulated the sense in which such tests are efficient. To make use of LeCam’s theory, we need to specify the likelihood ratio process in finite samples, obtain its Gaussian limit, and invoke LeCam’s third lemma. This latter approach is conventional in the literature; see, for example, Elliott, Rothenberg, and Stock (1996) and Andrews, Moreira, and Stock (2006). However, Müller (2011) showed that the two approaches often yield the same efficient tests, and since his approach is simpler to implement, we work in his framework. So, “efficient” in this section is meant in the Müller (2011) sense. Assumption 2 and the continuous mapping theorem (White (2001, Theorem 7.20)) imply that, under H0 , G(XT (s)) ⇒ G(W (s)) for any continuous functional G(·) on Dk[01] . Thus, there exists a large class of pivotal statistics that can be used to test the null hypothesis H0 : θ = θ0 . Define the random element X, such that XT ⇒ X, and let ν0 and ν1 denote the probability measures of X under H0 and H1 , respectively. We shall obtain efficient tests for the limiting problem of testing ν0 against ν1 , and then we will evaluate these tests at their sample analogue using XT and an estimator of the long-run variance VX . Under Assumptions 1 and 2, ν1 is determined by the stochastic differential equation dX(s) = m(θ s) ds + VX1/2 dW (s), and ν0 is determined by dX(s) = VX1/2 dW (s). Therefore, ν1 is absolutely continuous with respect to ν0 , and the Radon–Nikodym derivative of ν1 with respect to ν0 conditional on the entire path of m(θ ·) is given by  1  1 1  −1  −1 ξ(m) = exp (21) m(θ s) VX dX(s) − m(θ s) VX m(θ s) ds  2 0 0 Under the maintained assumptions, the process X(s) can be decomposed  ≡ X(s) − sX(1). Recall the definitions orthogonally into X ≡ X(1) and X(s) 1 (θ s) = m(θ s) − m(θ), and note that the random m(θ) = 0 m(θ s) ds and m function ξ(m) in (21) factors into the product of   1  −1  −1 ¯ (22) ξ(m) = exp m(θ) VX X − m(θ) VX m(θ) 2

1816

L. M. MAGNUSSON AND S. MAVROEIDIS

and (23)

  1 1 1  −1   −1 ˜ξ( (θ s) VX d X(s) − (θ s) VX m (θ s) ds  m m m) = exp 2 0 0

It follows that the statistic X is sufficient for θ if and only if m(θ ·) is constant,  (θ ·) = 0 for every θ. The finite-sample analogue of the statistic X(s) so that m is XT (s) − sXT (1) or T −1/2 [FsT (θ0 ) − sFT (θ0 )], and captures subsample variation in the moment functions ft (θ0 ) that is asymptotically independent of the full-sample moments T −1/2 FT (θ0 ). Therefore, we have established the following result. PROPOSITION 1: Under Assumptions 1 and 2, the statistic FT (θ0 ) is asymptotically sufficient for θ if and only if T −1/2 E[FsT (θ0 ) − sFT (θ0 )] → 0 for all θ and s ∈ [0 1]. Proposition 1 shows that ignoring the stability restrictions implicit in the moment conditions E[ft (θ0 )] = 0 for all t ≤ T cannot be optimal, except in the special case when E[ft (θ0 )] is constant, that is, when the stability restrictions are vacuous. EXAMPLE IV—Continued: The constant-parameter IV regression model is a special case of the model given by (4) and (5) with Πt = Π for all t. The assumption of normality of the errors implies the availability of the lowT dimensional sufficient statistic, t=1 Zt Yt ∈ Rk×(1+p) , where Yt = (y1t  Y2t ); see Andrews, Moreira, and Stock (2006). When Πt = Π, the aforementioned statistic is no longer sufficient for θ. From the factorization theorem, a sufficient statistic is given by the sequence {Zt Yt }Tt=1 . The limiting problem of testing H0 : θ = θ0 against the composite alternative H1 : θ = θ0 is equivalent to testing H0 : m(θ s) = m(θ0  s) = 0 for all s against H1 : m(θ s) = 0 for some s. An asymptotically point optimal test in the limiting problem is given by the statistic ξ(m) in (21). Let νm denote a probability measure for the process m(θ s) under H1 . A WAP maximizing test in the limiting problem is then given by the likelihood ratio (LR) statistic ξ(m) dνm . The finite-sample counterpart of the LR statistic is obtained by substituting T −1/2 FsT (θ0 ) and V ff (θ0 ) for X(s) and VX , respectively, in (21). Alternative WAP maximizing tests differ in the specification of νm . Since (θ s), corresponding to violation of m(θ s) is a linear function of m(θ) and m the full-sample moment restrictions and the stability restrictions, respectively, (θ s). Bewe can equivalently specify νm as a joint measure over m(θ) and m  it (θ s) correspond to the independent statistics X and X, cause m(θ) and m is reasonable to specify independent distributions of weights over m(θ) and (θ s), so the joint measure is given by the product of νm and νm . m

IDENTIFICATION USING STABILITY RESTRICTIONS

1817

¯ X ), For νm , we will use the conventional weight distribution m(θ) ∼ N(0 cV which puts equal weights over alternatives that are “equally hard to detect.” This yields tests that are invariant to rotations of the moment functions FT (θ0 ). The scalar parameter c¯ measures the magnitude of the violation of the fullsample moment conditions. This is the distribution used to motivate the standard Wald (1943) test. ASSUMPTION 4: The distribution of weights νm for m(θ) under the alternative ¯ ff ) for some constant c¯ ≥ 0. is N(0 cV For νm , which corresponds to the stability restrictions, we will consider the two leading alternatives in the stability literature: (i) persistent time variation, as in Stock and Watson (1998) and Elliott and Müller (2006); and (ii) a fixed number of breaks at unknown break dates, as in Andrews (1993), Andrews and Ploberger (1994), and Sowell (1996). In both cases, we will index νm by a scalar parameter c˜ that measures the magnitude of the instability under H1 . Power can be directed toward stability restrictions versus full-sample moment ¯ restrictions by varying c˜ relative to c. 3.1. Persistent Time Variation We consider the drifting parameter approach to modeling instability that was followed by Stock and Watson (1996, 1998) and Elliott and Müller (2006). (θ ·) under ASSUMPTION 5: The distribution of weights νm for the process m 1 the alternative is given by the distribution of Wm (·) − 0 Wm (s) ds, where Wm (·) is ˜ ff , for some constant c˜ ≥ 0. a k × 1 Wiener process with variance cV Derivation of the optimal test in this problem is facilitated by looking at a particular member of the class of data generating processes that satisfy Assumptions 1 and 2, for which the WAP maximizing test can be derived analytically. For this purpose, we use the Gaussian multivariate local level model, following Elliott and Müller (2006). The theory in Müller (2011) can then be invoked to show that the resulting test will be asymptotically efficient in a wider sense. Specifically, consider the model yt = μt + ut , for t = 1     T , where yt , μt , ut ∈ Rk . Assume ut ∼ iid N(0 Σ) for some positive definite matrix Σ, such that u ∼ N(0 IT ⊗ Σ), where u = (u1      uT ) ∈ RT k . The density of y = (y1      yT ) conditional on μ = (μ1      μT ) is given by  

1 −T k/2 −T/2  −1 |Σ| exp − (y − μ) IT ⊗ Σ (y − μ)  f (y|μ) = (2π) 2 We want to test the hypothesis H0 : μ = 0, against the alternative of persistent time variation specified below. H0 can be decomposed into H01 : μ¯ = 0, where

1818

L. M. MAGNUSSON AND S. MAVROEIDIS

T μ¯ = T −1 t=1 μt , and H02 : μt − μ¯ = 0 for all t, or, equivalently, H02 : μ˜ = 0, where μ˜ = (Be ⊗ Ik )μ, Be is a T × (T − 1) matrix such that Be e = 0, Be Be = IT −1 , Be Be = Me , Me = IT − e(e e)−1 e , and e is the T × 1 vector of ones. Conditional on μ, the ratio f (y|μ)/f (y|0) is given by  

1   −1 −1 ξ(μ) = exp y IT ⊗ Σ μ − μ IT ⊗ Σ μ  2  and the likelihood ratio is given by ξ(μ) dνμ , where dνμ is the density of μ. We specify independent weights over μ¯ and μ˜ under the alternative, with densi√ ¯ and T μ˜ ∼ N(0 [Be FF Be ⊗ cΣ]), ˜ ties given by T μ¯ ∼ N(0 cΣ) where F = [fij ] is a T × T lower triangular matrix of ones, that is, fij = 1 for all i ≤ j and 0 oth˜ and can be erwise. The resulting likelihood ratio statistic depends on c¯ and c, written as the product of the statistics   1 c¯ c¯ ¯ −k/2 exp T y¯  Σ−1 y¯ LRT = (1 + c) 2 1 + c¯ and (24)

 c˜ = LR T



1 − rc2T ˜ T (1 − rc2˜ )rcT˜ −1

−k/2

 k 1  exp v (Me − Gc˜ )vi  2 i=1 i 

T where y¯ = T1 t=1 yt , vi = (IT ⊗ ιki Σ−1/2 )y, i = 1     k, ιki is a k × 1 vector with 1 at position i, and 0 otherwise, and Gc = Hc−1 − Hc−1 e(e Hc−1 e)−1 e Hc−1 , with Hc = rc−1 FAc Ac F , Ac is a T × T matrix with ones in the main diagonal, −rc in its subdiagonal, and zeros otherwise, that is, its (i j)th element is aij = 1 if i = j, −rc if i = j + 1, and 0 otherwise, and rc = 12 (2 + c 2 T −2 − √  c˜ follows the T −1 4c 2 + c 4 T −2 ) = 1 − cT −1 + o(T −1 ). The derivation of LR T same calculations as in the proof of Elliott and Müller (2006, Lemma 1). Taking logs, multiplying by 2, and dropping the constants, the joint loglikelihood ratio statistic can be written as ˜ c¯ c T

LR

=

k  i=1

vi (Me − Gc˜ )vi +

c¯ ¯ T y¯  Σ−1 y 1 + c¯

The parameters c¯ and c˜ govern the weights given to deviations from H0 in the direction μ¯ = 0 and μ˜ = 0, respectively. LR0c T coincides with the usual Wald statistic for μ = 0, which is independent of c, while LRc0 T is a pure stability test, conditional on μ¯ = 0.12 12 In this case, apart from the sign, the main difference of LR0c T from the qLL statistic of Elliott and Müller (2006) is that the latter uses demeaned y.

IDENTIFICATION USING STABILITY RESTRICTIONS

1819

The resulting test of H0 : E[ft (θ0 )] = 0 against a persistent time-varying alternative is obtained by replacing yt by ft (θ0 ), and Σ by V ff (θ0 ). The resulting ¯ c, ˜ is given by statistic, indexed by the weights c (25)

qLL-STc˜ c¯ (θ0 ) =

k 

vˆ i (Me − Gc˜ )vˆ i +

i=1

c¯ ST (θ0 ) 1 + c¯

 vˆ i = f1 (θ0 ) V ff (θ0 )−1/2 ιki      fT (θ0 ) V ff (θ0 )−1/2 ιki  

The large sample properties of qLL-STc˜ c¯ are given by the following result. THEOREM 1: Under Assumption 2 and H0 : θ = θ0 , (26)

qLL-STc˜ c¯ (θ0 ) ⇒ ψc˜ +

c¯ ψ  1 + c¯ k

where ψk ∼ χ2 (k) independent of ψc , and ψc =

k  

cJi (1) + c 2

1

2

Ji (s)2 ds 0

i=1

 2 1 2c −c −cs + e Ji (1) + c e Ji (s) ds 1 − e−2c 0  2  1  − Ji (1) + c Ji (s) ds 0

Ji (s) is the ith element of the k-dimensional Ornstein–Uhlenbeck process J(s) = s W (s) − c 0 e−c(s−r) W (r) dr, and W is a k × 1 standard Wiener process. Moreover, a test that rejects for large values of the qLL-STc˜ c¯ (θ0 ) statistic is asymptotically efficient against persistent time variation of the moment conditions given by Assumptions 4 and 5. 3.2. One Break at an Unknown Date We focus on the leading case of a single break at an unknown date τ ∈ ς ⊂ (0 1), defining two regimes in m(θ s) = m1 (θ)1{s<τ} + m2 (θ)1{s≥τ} , for some τ ∈ ς ⊂ (0 1). The extension to multiple breaks is theoretically straightforward, but computationally demanding. This assumption implies m(θ) = (θ s) = (1{s<τ} − τ)(m1 (θ) − m2 (θ)), and δ(θ τ) = τm  τ 1 (θ) + (1 − τ)m2 (θ), m m(θ s) ds = τ(1 − τ)(m 1 (θ) − m2 (θ)). Weights are given by the following 0 assumption.

1820

L. M. MAGNUSSON AND S. MAVROEIDIS

(θ ·) under the ASSUMPTION 6: The distribution of weights νm for the process m alternative is described by νδ|τ × ντ , where δ(θ τ) = τ(1 − τ)(m1 (θ) − m2 (θ)), νδ|τ ˜ is N(0 cτ(1 − τ)Vff ) for some constant c˜ ≥ 0, and ντ is a distribution supported on ς ⊂ (0 1). Dropping the explicit dependence of δ on θ and τ for simplicity, we have H1 ˜ m) in (23) can be written in terms of δ and  X(τ) ∼ N(δ τ(1 − τ)VX ). Then, ξ( τ as     −1 1  ˜ξ(δ τ) = exp δ τ(1 − τ)VX −1 X(τ)  − δ τ(1 − τ)VX δ  2 We obtain WAP maximizing tests using Assumption 6 and the fact that Vff = VX : +∞ c  LR(δ τ) dνδ|τ dντ LR = ς

−∞

= (1 + c) c˜

  −1 1 c    X(τ) τ(1 − τ)VX X(τ) dντ  exp 2 (1 + c) ς 

−k/2



 × LR , where So, LRc˜ c¯ = LR (27)

c



−k/2

LR = (1 + c)

 1 c  −1 exp X VX X  2 (1 + c)

By the Neyman–Pearson lemma, a test that rejects for large values of LRc˜ c¯ is an optimal test for testing H0 in the limiting problem against the point alternative ¯ respectively, given by the probability measures νδ|τ , νm , indexed by c˜ and c, and ντ . It is conventional to use a uniform distribution for ντ ; see Andrews and Ploberger (1994). ˜ c¯ measure the importance of the full-sample versus the The parameters c, stability restrictions. The LR0c¯ test, which places zero weight on instability, is  equivalent to a test that rejects for large values of X VX−1 X. The finite-sample analogue of this statistic is the S statistic (10). Therefore, the S test is asymptotically efficient under Assumption 2 only when there is no instability under the alternative, in accordance with Proposition 1. ˜ By setting c˜ = c¯ = c, For c˜ > 0, the optimal test generally depends on c¯ and c. we put equal weights on the two alternatives, and the finite-sample analogue of the LRcc statistic can be written as   1 c cc −k ST (θ0  τ) dντ  exp LRT (θ0 ) = (1 + c) 2 (1 + c) ς

1821

IDENTIFICATION USING STABILITY RESTRICTIONS

where

ST (θ τ) =

1 T



 ×

FτT (θ) FT (θ) − FτT (θ) FτT (θ) FT (θ) − FτT (θ)



⎛ V 1 (θ τ)−1 ⎜ ⎝

ff

τ 0



⎞ 0

⎟ ⎠ V ff2 (θ τ)−1 1−τ

is the continuously updated version of the “partial-sample” GMM objective function of Andrews (1993). For the estimators V ff1 (θ τ), V ff2 (θ τ), we can use either respective partial-sample estimators, or a full-sample estimator V ff (θ); see Andrews (1993). ST (θ0  τ) is the split-S statistic in (20) that arises when we split the sample at date [τT ] and use the resulting 2k moment conditions [FτT (θ0 )  FT (θ0 ) − FτT (θ0 ) ] . The split-sample statistic ST (θ0  τ) can be decomposed orthogonally into the full-sample S statistic and the statistic  ST (θ0  τ) = ST (θ0  τ) − ST (θ0 ) sT (θ0 ) in (3), and therefore, has power only against that depends primarily on F instability. When c¯ > 0, the joint LR test can be based equivalently on the statistic   1 c˜  c¯ ST (θ0 ) (28) gen-STc˜ c¯ (θ0 ) = 2 log exp ST (θ0  τ) dντ + 2 1 + c˜ 1 + c¯ ς Setting c¯ = 0, we obtain tests of the stability restrictions. The large sample properties of gen-STc˜ c¯ (θ0 ) are given by the following result. THEOREM 2: Under Assumption 2 and H0 : θ = θ0 ,   1 c˜  c¯ ˜ c¯ c ψk (τ) dντ + ψ  gen-ST (θ0 ) ⇒ 2 log exp 2 1 + c˜ 1 + c¯ k ς  (τ) a standard k × 1 Brownian Bridge process, k (τ) = W (τ) W (τ) , with W where ψ τ(1−τ) 2  (τ). Moreover, a and ψk is a χ (k) distributed random variable independent of W ˜ c¯ c test that rejects H0 for large values of gen-ST (θ0 ) is asymptotically efficient against the alternative given by Assumptions 4 and 6. The statistics exp-S and ave-S defined by (12) and (13) in Section 2 are special cases of (28) with c˜ = c¯ = c, and c → ∞ and c → 0, respectively. Their

1822

L. M. MAGNUSSON AND S. MAVROEIDIS

interpretation is analogous to the corresponding statistics in Andrews and Ploberger (1994). 3.3. Tests Based on Estimating Break Dates We now examine the alternative procedures that are based on estimates of the break date. It is instructive to consider first the case of the linear IV model with time-varying first stage. 3.3.1. Finite-Sample Analysis for a Special Case The model is given by (6), with the distributional assumption that Vt =  [v1t  V2t ] ∼ N(0 Ω).13 This is a generalization of the canonical constantparameter linear IV model studied by Andrews, Moreira, and Stock (2006). Consider the assumption of a single break in Πt occurring at time [τT ] (the analysis can be generalized to multiple breaks). Define Z(τ) ∈ RT ×2k by  ] {Zt }[τT 0 t=1  Z(τ) = 0 {Zt }Tt=[τT ]+1 stack Yt  Vt into Y V ∈ RT ×(1+p) , respectively, and define Π ∈ R2k×p by Π =  [Π1  Π2 ] . Then, the model (6) can be written as (29)

Y = Z(τ)ΠA + V 

When Ω is known, the log-likelihood function is given by (up to a constant) (30)





 1  L(θ Π τ) = − tr Ω−1 Y − Z(τ)ΠA Y − Z(τ)ΠA 2  −1  = tr Ω AΠ  Z(τ) Y −

  1  1  −1 tr Ω AΠ  Z(τ) Z(τ)ΠA − tr Ω−1 Y  Y  2 2

Since {Zt } is non-random, the likelihood depends on the data only through the process Z(s) Y , s ∈ [0 1], which is the sufficient statistic. This 2k × (p + 1) process can be decomposed orthogonally into the processes

−1/2  −1/2 F(s) = Z(s) Z(s) Z(s) Y b0 b0 Ωb0   −1/2

−1/2 D(s) = Z(s) Z(s) Z(s) Y Ω−1 A0 A0 Ω−1 A0  where

b0 = 1 −θ0  13

 A0 = (θ0  Ip )

Normality is assumed for the purpose of discussing optimality.

IDENTIFICATION USING STABILITY RESTRICTIONS

1823

Also, define the following quantities:  −1/2 μΠτ (s) = Z(s) Z(s) Z(s) Z(τ)Π ∈ R2k×p 

−1/2 ∈ Rp  and cθ = (θ − θ0 ) b0 Ωb0

−1/2 dθ = A Ω−1 A0 A0 Ω−1 A0 ∈ Rp×p  Then, the following result arises as a straightforward extension of Andrews, Moreira, and Stock (2006, Lemma 2). LEMMA 1: For the model given by (29): 1. F(s) is a Gaussian process with mean μΠτ (s)cθ , and covariance kernel K(s1  s2 ) = [Z(s1 ) Z(s1 )]−1/2 Z(s1 ) Z(s2 )[Z(s2 ) Z(s2 )]−1/2 . 2. D(s) is a Gaussian process with mean μΠτ (s)dθ , and covariance kernel K(s1  s2 ), same as for F(s). 3. F(s1 ) and D(s2 ) are independent for all s1  s2 . We can now write the log-likelihood function (30) in terms of the statistics F(s) and D(s): 1 L(θ Π τ) = F(τ) μΠτ (τ)cθ − cθ μΠτ (τ) μΠτ (τ)cθ 2   1   + tr D(τ) μΠτ (τ)dθ − tr dθ μΠτ (τ) μΠτ (τ)dθ  2 Under H0 , cθ = 0, and (31)

  1   L(θ0  Π τ) = tr D(τ) μΠτ (τ)dθ0 − tr dθ 0 μΠτ (τ) μΠτ (τ)dθ0  2

In other words, the process D(·) is sufficient for Π and τ (or F(·) is specific ancillary for Π, τ under H0 ). Thus, the restricted maximum likelihood estimator (MLE) of Π τ given θ = θ0 can be obtained by minimizing (31) w.r.t. Π and τ. Concentrating (31) w.r.t. Π, we obtain 12 tr[D(τ) D(τ)], and therefore, the MLE for τ is (32)

  τˆ = arg max tr D(τ) D(τ)  τ

We can obtain similar tests of H0 : θ = θ0 either based on pivotal statistics, or based on non-pivotal statistics by conditioning. In the linear IV model with homoskedasticity, the S statistic coincides with the Anderson and Rubin (1949)

1824

L. M. MAGNUSSON AND S. MAVROEIDIS

statistic. The split-S, -KLM, and -JKLM statistics are given by split-S(s) = F(s) F(s)

 −1 split-KLM(s) = F(s) D(s) D(s) D(s) D(s) F(s)

and

split-JKLM(s) = split-S(s) − split-KLM(s) The JKLM statistic was introduced by Kleibergen (2005) in the standard GMM setting as a way to improve the power of the KLM test. It is included here for completeness. A test based on the JKLM statistic will also have power against violation of the model’s overidentifying restrictions, similarly to the S test. The split-CLR statistic, which is analytically available only in the case p = 1 (see Moreira (2003)), can be written as (33)

split-CLR(s) =

1 split-S(s) − rk(s) 2    2 + split-S(s) + rk(s) − 4split-JKLM(s)rk(s) 

where rk(s) = D(s) D(s). For p > 1, we will use the generalization of the CLR statistic derived by Kleibergen (2005), which is given by (33) with rk(s) being a statistic that tests that the rank of the matrix Π is p − 1 under H0 , and which is only a function of D(s). Since F(·) is orthogonal to D(·), and τˆ is only a function of D(·), we have the following result. THEOREM 3: Let τˆ = arg maxs D(s) D(s). Then, under H0 , 1. split-S(τ) ˆ is distributed as χ2 (2k). 2. split-KLM(τ) ˆ is distributed as χ2 (p). 3. split-JKLM(τ) ˆ = split-S(τ) ˆ − split-KLM(τ) ˆ is distributed as χ2 (2k − p). 4. split-KLM(τ) ˆ and split-JKLM(τ) ˆ are independent. 5. The distribution of split-CLR(τ) ˆ conditional on rk(τ) ˆ is the same as the disˆ + [ψp + ψ2k−p − rk(τ)] ˆ 2 + 4ψ2k−p rk(τ)] ˆ contribution of 12 [ψp + ψ2k−p − rk(τ) ditional on rk(τ), ˆ where ψp  ψ2k−p are independent random variables distributed as χ2 (p) and χ2 (2k − p), respectively. 3.3.2. Asymptotic Analysis for the General Case The previous analysis extends to any data generating process that satisfies Assumption 3. First, notice that under Assumption 3 and H0 : θ = θ0 , the entire partial-sample moments F·T (θ0 ) are asymptotically ancillary for τ, since their asymptotic distribution does not depend on it.

IDENTIFICATION USING STABILITY RESTRICTIONS

1825

For every break point s, define the two subsample moments FT1 (θ0  s) = FsT (θ0 ) and FT2 (θ0  s) = FT (θ0 ) − FsT (θ0 ), and their Jacobians QT1 (θ0  s) = QsT (θ0 ) and QT2 (θ0  s) = QT (θ0 ) − QsT (θ0 ), and let V i (θ0  s), i = 1 2, denote (possibly HAC) estimators of their k(p + 1) × k(p + 1) asymptotic variance V defined by (17). V i (θ0  s) could be based on the respective subsamples or it could be a common full-sample estimator, that is, V 1 (θ0  s) = V 2 (θ0  s) = V (θ0 ). Next, define the k × p matrices DiT (θ0  s), i = 1 2, by     vec DiT (θ0  s) = vec QTi (θ0  s) − V qfi (θ0  s)V ffi (θ0  s)−1 FTi (θ0  s) These are the subsample analogues of the full-sample matrix DT (θ0 ) defined in Kleibergen (2005, equation 16). Stack the elements of DiT (θ0  s) into the 2kp vector

 vec D1T (θ0  s) v

DT (θ0  s) = vec D2T (θ0  s) and consider the following estimator of its asymptotic variance:  1 sVqqf (θ0  s) 0 V D (θ0  s) =  2 (θ0  s) 0 (1 − s)V qqf where i (θ0  s) = V qqi (θ0  s) − V qfi (θ0  s)V ffi (θ0  s)−1 V qfi (θ0  s)  V qqf

i = 1 2

Consider the following estimator of τ: (34)

τ(θ ˆ 0 ) = arg max DT (θ0  s) V D (θ0  s)−1 DT (θ0  s) v

v

s∈ς

This is a generalization of the estimator given by (32) for the linear IV model. v Under Assumption 3 and H0 : θ = θ0 , DT (θ0  ·) is asymptotically independent ˆ 0 ). Therefore, we obtain the following result. of F·T (θ0 ), and hence, so is τ(θ THEOREM 4: When Assumption 3 and H0 : θ = θ0 hold, the limiting distributions of the split-S, -KLM, -JKLM, and -CLR statistics, defined in (20), evaluated at τˆ 0 ≡ τ(θ ˆ 0 ) in (34), are given by d

split-ST (θ0  τˆ 0 ) → ψp + ψ2k−p  d

split-KLM T (θ0  τˆ 0 ) → ψp  d

split-JKLM T (θ0  τˆ 0 ) → ψ2k−p 

1826

L. M. MAGNUSSON AND S. MAVROEIDIS

split-CLRT (θ0  τˆ 0 )|rkθ (θ0  τˆ 0 ) τˆ 0 d 1 → ψp + ψ2k−p − rkθ (θ0  τˆ 0 ) 2 

2  + ψp + ψ2k−p + rkθ (θ0  τˆ 0 ) − 4ψ2k−p rkθ (θ0  τˆ 0 )  where ψp and ψ2k−p are independently distributed χ2 (p) and χ2 (2k − p) random variables. 3.4. Strong Identification We now turn to the case of strong identification and local alternatives, which we characterize using the following high-level assumption. ASSUMPTION 7: (a) The true value of the parameter is θT = θ0 + B/T 1/2 , for some constant B ∈ p . (b) XT (·) ≡ T −1/2 [F·T (θ0 )] ⇒ X(·), and the probability measure of the random element X is given by the stochastic differential equation dX(s) = J(s)B ds + Vff1/2 dW (s), where J(·) is some nonstochastic k × p matrix function that is unis formly bounded, and there exist s ∈ [0 1] such that rank[ 0 J(r) dr] = p; Vff is some nonstochastic positive definite k × k matrix; and W (·) is a k-dimensional standard Brownian motion. (c) ft (θ0 ) is differentiable in some neighborhood of θ0 a.s. for t ≤ T , T ≥ 1, [sT ] −1 QsT (θ0 ) − qt (θ0 ) = ∂ft (θ0 )/∂θ , QsT (θ0 ) = t=1 qt (θ0 ), and sups∈[01] T s p J(r) dr → 0. 0 Assumption 7(a) is a standard specification of local alternatives; see Andrews, Moreira, and Stock (2006, Assumption SIV-LA(a)). Under the null B = 0, Assumption 7(b) is the same as Assumption 2. For B = 0, it can be established using Assumption 7(c) and appropriate smoothness conditions for mT (θT  t/T ) = EθT [ft (θ0 )] in some neighborhood  s of θ0 . Strong local identification is assumed through the condition rank[ 0 J(r) dr] = p. This covers both the case of strong identification via full-sample restric1 tions, rank[ 0 J(r) dr] = p, as well as identification via stability restrictions, s 1 rank[ 0 J(r) dr − s 0 J(r) dr] = p for some s ∈ (0 1). EXAMPLE IV—Continued: Since the moment function ft (θ) = Zt (y1t − Y2t θ) is linear, its Jacobian exists a.s. for all θ, qt = −Zt Y2t . As[sT ] p sumption 7 is satisfied if sups∈[01] T1 t=1 Zt Zt − sΣZZ → 0, where ΣZZ is nonsingular, and Πt = C(t/T ) for some constant function C(·) ∈ D[0 1]k×p , so that J(s) = ΣZZ C(s).

IDENTIFICATION USING STABILITY RESTRICTIONS

1827

Under Assumption 7, the Radon–Nikodym derivative of the measure of X under H1 w.r.t. to its measure under H0 is given by   1 1 1 (35) J(s) VX−1 dX(s) − B J(s) VX−1 J(s) dsB  ξ(B) = exp B 2 0 0 1 where VX = Vff . From (35), it follows that the p × 1 statistic X ∗ = 0 J(s) × VX−1 dX(s) is sufficient for B in the limiting problem, which is regular in the sense that it is a full exponential Gaussian location model. The 1 statistic X ∗ is Gaussian with mean 0 J(s) VX−1 J(s) dsB and variance VX ∗ = 1 J(s) VX−1 J(s) ds. Thus, WAP maximizing tests of hypotheses on B in the 0 limiting problem can be readily derived by standard methods as functions of X ∗ , say ϕ(X ∗ ). For example, the usual Wald criterion of equal power over “equally distant” alternatives, that is, ellipses defined by the variance of the sufficient statistic, yields a test that rejects H0 : B = 0 against H1 : B = 0 for large values of X ∗ VX−1∗ X ∗ ∼ χ2 (p ), where the second argument,  = 1 B 0 J(s) VX−1 J(s) dsB, is the noncentrality parameter. Feasible versions of those tests can be obtained if we can find a statistic XT∗ with the property that XT∗ ⇒ X ∗ , and a consistent estimator of its asymptotic variance V X ∗ . We discuss sufficient conditions complementing Assumption 7 under which this can be done when there is at most one large break (this can be generalized to any fixed number of breaks). ˆ 0 ) given by (34), T1 = [τˆ 0 T ], and T2 = T − T1 . ASSUMPTION 8: Let τˆ 0 ≡ τ(θ Suppose that (a) sups∈[01] V ffi (θ0  s) = Op (1), i = 1 2; (b) sups∈[01] V ffi (θ0  s)− p

Vff → 0, i = 1 2 and Assumption 7 holds with either (c) (no large breaks): J(s) = J for all s ∈ [0 1]; or (d) (one large break): (i) J(s) = J1 1{s≤τ} + J2 1{s>τ} , 0 < τ < 1, J1 = J2 ; (ii) τˆ 0 is a T -consistent estimator of τ, that is, for every η > 0, there exist C < ∞ s.t., for all large T , Pr(|T (τˆ 0 − τ)| > C) < η; (iii) T −1/2 supt≤T ft (θ0 ) = op (1); and (iv) T −1 supt≤T qt (θ0 ) = op (1). Assumptions 8(a) and (b) are standard in the structural change testing literature. Uniformity in s holds trivially for full-sample estimators of Vff , which are typical (this is what we use in our applications below). Assumption 8(c) is consistent with no breaks in the Jacobian, or with breaks that vanish asymptotically. Assumption 8(d) requires that the break point τ ∈ (0 1) be consistently estimated, sufficient conditions for which can be found, for example, in Bai and Perron (1998).14 14 Assumptions 8(d.iii) and (d.iv) have been used in Li and Müller (2009, Lemma 2 and Condition 1, resp.).

1828

L. M. MAGNUSSON AND S. MAVROEIDIS

EXAMPLE IV—Continued: Suppose p = 1, so that Y2t is a scalar, the firststage regression is given by (5) with Πt = Π1 1{t≤[τT ]} + Π2 1{t>[τT ]} , 0 < τ < 1, and τˆ 0 is given by (32). Sufficient conditions for Assumption 8(d.ii) can be found in Bai and Perron (1998, Section 2, Assumptions A1–A5). THEOREM 5: Suppose that Assumption 8 holds, and let XT∗ =

2  DiT (θ0  τˆ 0 ) i F i (θ0  τˆ 0 ) Vff (θ0  τˆ 0 )−1 T √ Ti T i=1

and V X ∗ = τˆ 0

D1T (θ0  τˆ 0 ) V ff1 (θ0  τˆ 0 )−1 D1T (θ0  τˆ 0 )

+ (1 − τˆ 0 )

T12 D2T (θ0  τˆ 0 ) V ff2 (θ0  τˆ 0 )−1 D2T (θ0  τˆ 0 ) T22



p d where T1 = [τˆ 0 T ] and T2 = T − T1 . Then, XT∗ → X ∗ and V X ∗ → VX ∗ .

COMMENTS: 1. It follows that the split-KLM test that rejects H0 : θ = θ0 against a twosided alternative for large values of the split-KLM T (θ0  τˆ 0 ) statistic, defined in (20), is efficient. The split-CLR test described in Section 2 is asymptotically equivalent to the split-KLM test (because the rk statistic diverges), so it is also efficient. 2. The asymptotic distribution of split-KLM T (θ0  τˆ 0 ) is noncentral χ2 (p ), with noncentrality parameter  = B [τJ1 Vff−1 J1 + (1 − τ)J2 Vff−1 J2 ]B. When breaks are small, J1 = J2 = J, this  is identical to the noncentrality of standard full-sample GMM tests,  = B J  Vff−1 JB, but not otherwise. Hence, the splitsample tests weakly dominate their full-sample GMM counterparts, since they have the same power when there is no information in the stability restrictions but strictly higher power when there is. 3. Comment 2 generalizes to any fixed number of sample splits, with appropriate modification of Assumption 8(d). However, asymptotics may become unreliable if the number of breaks is large relative to the sample size. 4. Split-KLM/CLR tests are asymptotically equivalent to their full-sample counterparts when there is strong identification and no large breaks. However, the split-S test is not asymptotically equivalent to its full-sample counterpart: under H0 it has a χ2 (2k) rather than χ2 (k) distribution. 5. Generalized S tests are not asymptotically equivalent to split-KLM and split-CLR tests (this follows trivially from their different asymptotic distributions under H0 ), hence they are not asymptotically efficient under Assumption 7.

IDENTIFICATION USING STABILITY RESTRICTIONS

1829

6. Efficient tests can also be obtained if we replace DiT (θ0  τˆ 0 ) in Theorem 5 with another consistent estimator of the Jacobian of the moment conditions, T −1 ∂FTi (θ0  τˆ 0 )/∂θ . However, these tests are not robust to weak identification. 7. Other WAP maximizing tests can be obtained as functionals of XT∗ . For example, in the case of p = 1, efficient tests of H0 : θ = θ0 against the one-sided alternative H1 : θ > θ0 can be based on a signed Lagrange Multiplier (LM) statistic, such as XT∗ / V X ∗ . 3.5. Concentrating Out Strongly Identified Parameters We consider now the situation when the model contains an additional set of parameters that are known (or assumed) a priori to be strongly identified. In the linear IV example, this corresponds to coefficients on included exogenous regressors, or coefficients on endogenous regressors for which researchers are confident that the instruments are strong. Let the strongly identified parameters be denoted by ζ of dimension pζ × 1. With slight abuse of notation, assume that the moment conditions are given by E[ft (θ ζ)] = 0 for all t ≤ T , where ft (θ ζ) is a k-dimensional function of data and parameters, with k ≥ pζ , and define the restricted estimator of ζ ζˆ 0 = arg min ζ

T  t=1

ft (θ0  ζ) WT

T 

ft (θ0  ζ)

t=1

where WT is an efficient weight matrix, as in two-step or continuously updated T p GMM, that is, WT → Vff−1 , where Vff = limT →∞ var[ √1T t=1 ft (θ ζ)]. We make the following high-level assumption. ASSUMPTION 9: Under H0 : θ = θ0 , the following conditions hold: (i) the [sT ] process XT (s) = T −1/2 t=1 ft (θ0  ζ0 ) satisfies XT (·) ⇒ Vff1/2 W (·), where W is a standard k × 1 Wiener process, Vff is a positive definite k × k matrix;  p

T (s) = T −1/2 [sT ] ft (θ0  ζˆ 0 ) satisfies (ii) WT → Vff−1 ; and (iii) the process X t=1

T (·) − E[XT (·)] ⇒ V 1/2 [W (·) − sPW (1)], where P = V −1/2 Γ (Γ  V −1 Γ )−1 × X ff ff ff Γ  Vff−1/2 for some full rank k × pζ matrix Γ . Parts (i) and (ii) of Assumption 9 correspond to Assumption 2 above. Part (iii) is a special case of Sowell (1996, Theorem 1) and Li and Müller (2009, Theorem 1(iii)), where sufficient conditions for it can be found, for example p ζˆ 0 → ζ0 , ft (θ0  ζ) is differentiable w.r.t. ζ in some neighborhood of ζ0 a.s. for [sT ] p all t ≤ T and T > 0, and sups∈[01] T −1 t=1 ∂ft (θ0  ζ0 )/∂ζ  − sΓ → 0. These conditions are standard in the stability literature; see Andrews (1993), Sowell (1996), and Elliott and Müller (2006). Note that Assumption 9 allows for instability in the strongly identified parameters, provided the instability is T 1/2 -estimable. For example, if there are

1830

L. M. MAGNUSSON AND S. MAVROEIDIS

breaks in the strongly identified parameters, and the break dates are known or the breaks are large enough so that the break dates are consistently estimable by the methods of Bai and Perron (1998), then Assumption 9 applies with ζ denoting all the parameters corresponding to the different regimes. Assumption 9 precludes moderate instabilities in ζ, that is, instabilities that cannot be detected with probability 1. These can be dealt with by appropriate modification of the stability part of the generalized S statistics, following the approach of Li and Müller (2009, Section 2.3), though the resulting tests need not have any asymptotic optimality properties. Replace ft (θ0 ) by ft (θ0  ζˆ 0 ) in the definition of the generalized S statistics described in (9) above, and denote the resulting statistics generically by STc˜ (θ0  ζˆ 0 ) + gen-STc˜ c¯ (θ0  ζˆ 0 ) = gen-

c¯ ST (θ0  ζˆ 0 ) 1 + c¯

The following result shows that the asymptotic distribution of the resulting qLL/exp/ave-S tests evaluated at ζˆ 0 is the same as in the case when there are no estimated nuisance parameters, given in Theorems 1 and 2, except for a degree of freedom correction in the limiting χ2 distribution of the full-sample S statistic. THEOREM 6: When Assumption 9 and H0 : θ = θ0 hold, ST (θ0  ζˆ 0 ) and gend  STc˜ (θ0  ζˆ 0 ) are asymptotically independent, ST (θ0  ζˆ 0 ) → χ2 (k − pζ ), and the asymptotic distribution of gen- STc˜ (θ0  ζˆ 0 ) is the same as in Theorems 1 and 2. Split-sample GMM test statistics can be defined accordingly by replacing ft (θ0 ) by ft (θ0  ζˆ 0 ) in the formulae (20). A suitable extension of Assumption 3 enables us to obtain their limiting distribution. [sT ] ASSUMPTION 10: Let FsT (θ ζ) = t=1 ft (θ ζ), qt (θ ζ) = ∂ft (θ ζ)/∂θ , FsT (θ0 ζ0 ) [sT ]

TQ (s) = T −1/2 × QsT (θ ζ) = t=1 qt (θ ζ), XTQ (s) = T −1/2 vec[Q , and X sT (θ0 ζ0 )] FsT (θ0 ζˆ 0 ) . We assume that: (i) there exists a positive definite k(p + 1)vec[QsT (θ0 ζˆ 0 )] dimensional square matrix 

Vff Vf q  V = lim var XTQ (1) = T →∞ Vqf Vqq

WT → Vff−1 and there exist consistent estimators V f q and V qq of Vf q and Vqq ; and  

TQ (s) − E XTQ (s) (ii) X   Vff1/2 0 Wf (s) − sPWf (s) ⇒  1/2 Wq (s) Vqf Vff−1/2 Vqqf p

IDENTIFICATION USING STABILITY RESTRICTIONS

1831

where Wf and Wq are independent standard k × 1 and kp × 1 Wiener processes, Vqqf = Vqq − Vqf Vff−1 Vf q , and P = Vff−1/2 Γ (Γ  Vff−1 Γ )−1 Γ  Vff−1/2 for some full rank k × pζ matrix Γ . The part of Assumption 10 that refers to FsT (θ0  ζˆ 0 ) is identical to Assumption 9. The part referring to the Jacobian QsT (θ0  ζˆ 0 ) has no counterpart in the stability literature. Relative to Assumption 3, which restricts the behavior of the Jacobian at ζ0 , this assumption can be verified with the additional rep quirement sups∈[01] QsT (θ0  ζˆ 0 ) − QsT (θ0  ζ0 ) → 0. This holds trivially when QsT (θ ζ) does not depend on ζ (e.g., when ζ is an intercept). THEOREM 7: When Assumption 10 and H0 : θ = θ0 hold, the limiting distributions of the split-S, -KLM, -JKLM, and -CLR statistics evaluated at ζˆ 0 are given by d split-ST (θ0  ζˆ 0  τˆ 0 ) → ψp + ψ2k−p−pζ  d split-KLM T (θ0  ζˆ 0  τˆ 0 ) → ψp 

split-JKLM T (θ0  ζˆ 0  τˆ 0 ) → ψ2k−p−pζ  d

θ  τˆ 0 split-CLRT (θ0  ζˆ 0  τˆ 0 )|rk d



1

θ ψp + ψ2k−p−pζ − rk 2  

θ )2 − 4ψ2k−p−p rk

θ  + (ψp + ψ2k−p−pζ + rk ζ

θ ≡ rkθ (θ0  ζˆ 0  τˆ 0 ), ψp and ψ2k−p−p are independently ˆ 0 ) and rk where τˆ 0 ≡ τ(θ ζ 2 distributed χ (p) and χ2 (2k − p − pζ ) random variables. 4. NUMERICAL RESULTS 4.1. Asymptotic Power Comparisons We compare the methods derived in the previous section in terms of asymptotic power. We first compare the power of the generalized S tests to the power envelope in the case of persistent time variation (PTV) and single-break alternatives. Next, we compare the power of generalized S tests to the splitsample tests in the model given in Example IV above. The numerical analysis in Andrews, Moreira, and Stock (2006) provides useful benchmarks, for example, the power of the “oracle” test when break dates are known. Without loss of generality, we normalize Vff to the identity, since the tests are invariant to rotations of the moment conditions. In all experiments, asymptotic results are

1832

L. M. MAGNUSSON AND S. MAVROEIDIS

approximated using a sample of 2,000 observations, and the number of Monte Carlo simulations is 20,000. 4.1.1. Power Under Different Alternatives We compute asymptotic power curves for (generalized) S tests of the null m(θ r) = 0 for all r ∈ [0 1] against m(θ r) = 0 for some r ∈ [0 1], where the distance of the alternative from the null is measured by the scalar parameter (θ s) + m(θ), and consider the two cases for c ≥ 0. We decompose m(θ s) = m (θ s) given by Assumptions 5 and 6, with c˜ = ωc, ω ∈ [0 1] and τ drawn from m a uniform on [015 085] where needed, and m(θ) given by Assumption 4 with c¯ = (1 − ω)c. The parameter ω allows us to vary the weight on the violation of the stability versus the full-sample restrictions. We use significance level of 5% and we set k = 1. In √ the canonical linear IV example with ω = 0, this setup corresponds to m(θ) = λ(θ √ − θ0 ), where λ is the concentration√parameter (see below), the distance of λ(θ − θ0 ) from zero is measured by c, and the power function is symmetric about θ0 . The left- and right-hand columns of Figure 1 plot the power curves of the S, ave-S, exp-S, and qLL-S tests for the cases of a single break at unknown point and of the persistent time variation. We also plot the power envelope constructed using a sequence of point-optimal tests for each alternative. As expected, when the information is coming only from the stability restriction (ω = 1), the exp-S and the qLL-S power curves are the closest to the power envelope in the single break and in the PTV cases, respectively. The power of the ave-S test is lower, while the S test has trivial power. It is interesting to note that the qLL-S has very similar power to the exp-S in the case of a single break, and it dominates the latter in case of PTV. In the case when the information is split evenly across the two statistics (ω = 1/2), all the generalized S tests have power very close to the envelope, as expected, since they are designed with ˜ In the other polar case when there is no instability (ω = 0), the ave-S c¯ = c. power dominates the exp-S and qLL-S, and the S test is optimal. However, even in the case ω = 0, the loss of power of the generalized S tests relatively to the power envelope is noticeably small. In fact, the power of the generalized S tests is relatively insensitive to the source of information. Analogous results (not reported here) are obtained when k is larger. 4.1.2. Generalized S versus Split-Sample Tests We compare the power of the generalized S tests to the split-CLR test in the linear IV regression model with time-varying first stage. The data generating process is given by (6) above, with a single endogenous regressor, p = 1, and Ω = ρ1 ρ1 . It is well known that in the constant-parameter IV regression model, the amount of information in the data about the structural parameters (or the quality of instruments) can be characterized using a unitless measure known

IDENTIFICATION USING STABILITY RESTRICTIONS

1833

FIGURE 1.—Asymptotic power curves of point-optimal and generalized S tests, under single-break and persistent time variation (PTV) alternatives. c describes the distance of the null from the alternative hypothesis, and ω ∈ [0 1] gives the relative contribution of stability restrictions. The number of moment conditions is k = 1. The lines are point-optimal (thin solid), S (thick solid), qLL-S (dash-dotted), exp-S (dashed), ave-S (dotted).

T as the “concentration parameter,” which is t=1 Π  Zt Zt Π. We can think of the contribution of each observation to the identification of θ as being equal to Π  Zt Zt Π; however, when Πt is time-varying, the incremental informa-

1834

L. M. MAGNUSSON AND S. MAVROEIDIS

T tion is Πt Zt Zt Πt , and so the total amount of information is t=1 Πt Zt Zt Πt . Under weak-instrument asymptotics, we have Π[sT ] = T −1/2 C(s), where C(·) is a non-stochastic cadlag function, and in conjunction with the uniform [sT ] p limit T −1 t=1 Zt Zt → sΣZZ , the asymptotic concentration parameter is λ = 1 C(s) ΣZZ C(s) ds. 0 We consider the case of a single break at some point τ, C(s) = 1{s<τ} C1 + 1{s≥τ} C2 . Because all of the statistics we consider are invariant to the class of transformation Zt → Zt G for any nonsingular matrix G, we can normalize ΣZZ = Ik and all but the first entries of C1  C2 to zero, without loss of gen2 2 τ + C21 (1 − τ) = λF + λS , where λF = [C11 τ + C21 (1 − τ)]2 erality. Then, λ = C11 2 and λS = (C11 − C21 ) τ(1 − τ) measure the information in the full-sample and the stability restrictions, respectively. In all the experiments, we fix λ = 5, to match the results of Andrews, Moreira, and Stock (2006), and we set λS = ωλ, where ω ∈ [0 1] measures the relative contribution of the stability restrictions, as in the previous subsection. We compare the power of the generalized S tests with the split-sample tests based on estimating the break date. Since the distribution of the CLR statistic under the alternative depends on ρ, so that its power function depends on ρ, we report results for two cases ρ = 020 and 095. Figure 2 compares the power curves of the qLL/exp/ave-S tests with the split-CLR and the oracleCLR tests in the two polar cases ω = 1 and ω = 0. The number of instruments k is equal to 2 and 5. The oracle-CLR test is computed assuming that the break date is known. We see that no test dominates the others in terms of power. Unreported results, available upon request, show that this is also true for the split-S and KLM statistics. Interestingly, we notice that the ave-S and exp-S dominate the remaining gen-S tests when ω = 0 and ω = 1, respectively. 4.1.3. Power Loss When There Is No Instability One may worry about the possible loss of power of the proposed tests relative to standard GMM tests when the underlying data generating process is stable, since the stability restrictions will not be informative in this case. This issue is entirely analogous to the use of irrelevant instruments in standard GMM. To gain some insight into this problem, we compare the power loss of each of the proposed tests relative to the optimal test in the case of no instability. We use the linear IV model described in the previous subsection, with concentration parameter λF = 5 and no instability. The number of instruments is 2, and ρ = 095 (results for other settings are similar). For each of the proposed tests, we calculate the average and maximum difference of the power curves at the 5% level of significance from the power of the optimal benchmark. For the generalized S tests, the relevant benchmark is the full-sample S (Anderson– Rubin) test with the same number of instruments, which makes no assumption about the first-stage regression. For the split-CLR test, the relevant benchmark is the full-sample CLR. (The difference between the power of the full-sample S

IDENTIFICATION USING STABILITY RESTRICTIONS

1835

FIGURE 2.—Power function of tests of the null hypothesis H0 : θ = θ0 in linear IV model using 5% significance level with break at τ = 05; concentration parameter is 5 and ω ∈ [0 1] measures relative contribution of stability restrictions. The curves correspond to: oracle-CLR (thin solid line); split-CLR (thin dashed line); exp-S (dashed line); ave-S (dotted line); and qLL-S (dashdotted line).

1836

L. M. MAGNUSSON AND S. MAVROEIDIS TABLE II

POWER LOSS FROM USING STABILITY RESTRICTIONS WHEN THEY ARE IRRELEVANTa S, k = 2

ave-S exp-S qLL-S S, k = 3 S, k = 4 S, k = 6

Average

Max.

0.021 0.046 0.084 0.025 0.048 0.078

0.054 0.114 0.204 0.065 0.120 0.191

CLR, k = 2

S, k = 2 split-CLR CLR, k = 4 CLR, k = 5 CLR, k = 6

Average

Max.

0.025 0.032 0.022 0.026 0.032

0.101 0.199 0.125 0.168 0.194

a Constant-parameter linear IV regression with one structural parameter and correlation of reduced-form errors ρ = 095. Except where otherwise stated, statistics are computed with k = 2 instruments. For the first three columns, the benchmark is the S (Anderson–Rubin) test with k = 2; for the last three columns, it is the CLR with k = 2.

and CLR tests gives the loss of power of the S tests from ignoring information about the functional form.) We also report the corresponding results for the S and CLR test with additional irrelevant instruments. The results are given in Table II. We notice that the use of the ave-, exp-, and qLL-S tests entails the same loss of power as including one, two, and four extra instruments, respectively. The loss of power of the split-CLR is comparable to using four extra instruments. This is larger than the power loss of the “oracle”-CLR, which in this case would split the sample at some arbitrary point, and whose power loss is the same as the CLR with 2k instruments. In other words, there is an extra cost to estimating the break date when there is no break and identification is weak. However, as the strength of identification, λ, increases, the power loss of the split-CLR goes to zero in accordance to Comment 2 on Theorem 5 (results not reported for brevity). 4.2. Size in Finite Samples We study the finite-sample rejection frequencies of the proposed tests using a simulation experiment based on the structural model given by (7) in Example RS in Section 2.2 above. This is a prototypical example of a forward-looking model that is commonly used in macroeconomics and finance. We calibrate our simulations to a leading macroeconomic application, the New Keynesian Phillips curve, where yt denotes inflation, and xt is the labor share; see Galí and Gertler (1999). We assume that xt follows xt = ρ1 xt−1 + ρ2 xt−2 + vt . We assume the shocks vt and εt are jointly Normal with zero mean, variances σε2 , σv2 , and covariance σεv . In this simple version of the New Keynesian model, the parameter β is a , where α represents the price rigidity in discount factor, while γ = (1−α)(1−βα) α the economy. The parameters are set to β = 1 and α = 23 (γ = 16 ), while the remaining nuisance parameters are calibrated to quarterly post-1960 U.S. data. We find ρ1 = 09, ρ2 = 0063, σε2 = 03, σv2 = 0011, and σεv = −0012. Several

IDENTIFICATION USING STABILITY RESTRICTIONS

1837

authors have argued that there was a structural change in the U.S. economy around 1984. Estimating the reduced-form parameters over the two subsamples, we find that the first-order autocorrelation ρ = ρ1 /(1 + ρ2 ) is constant, but ρ2 goes from −009 to 021, with a standard error of 0.15. We therefore set ρ2t = 0063 + 0075 × κ(−1)1{t<1984q1} and ρ1t = ρ(1 + ρ2t ), with ρ = 09. The parameter κ is used to vary the magnitude of the change in the coefficients in terms of standard errors from zero, with κ = 2 corresponding to the subsample estimates of ρ2 . There is also evidence of a break in the variance of the shocks over that period (see, e.g., McConnell and Perez-Quiros (2000)), a phenomenon known as the “great moderation.” Indeed, we find that σε2 falls significantly after 1984, although σv2 and σεv remain constant over the two periods. Specifically, σε2 drops from 0.5 before 1984q1 to 0.1 thereafter, with a standard error of 0.068. It is important to check the implications of a change in the variance, since permanent changes in the variance are not covered by Assumptions 2 and 3. Thus, large changes in the variance may lead to size distortion of our tests in finite sam2 = 03 + 0034 × φ(−1)1{t<1984q1} , where the ples. To examine this issue, we set σεt scalar φ is used to vary the magnitude of the change in the variance in terms of standard errors from zero, with φ = 6 corresponding to the subsamples estimates.15 Table III reports the null rejection frequencies of the following tests: S (10), qLL-S (11), exp-S (12), ave-S (13), and split-S/KLM/JKLM/CLR (20), of H0 : γ = 16 in the model (7) for a sample of T = 180, computed using 20,000 Monte Carlo replications. The instruments used are xt−1 and xt−2 , and the variance estimator used is Newey and West (1987) with prewhitening. We consider six cases: for the magnitude of the parameter break, we consider κ = 0 2, and 4, and for the magnitude of the change in the variance, we consider φ = 0 and 6. The rejection frequencies for all of the proposed tests are close to their nominal level. Some tests appear to be undersized, the most severe being the split-CLR test. Only the exp-S test appears somewhat oversized when there are large changes both in the variance and in the coefficients (κ = 4, φ = 6), but the size distortion is modest: 12.26% at the 10% nominal level, and 7.5% at the nominal 5% level. The important message is that the size is almost unaffected by the changes in the coefficients or in the variance. Figure 3 reports further evidence on the size of the tests as the magnitude of the break in the coefficients ρ1 and ρ2 varies. The figure plots the rejection frequencies of 5% nominal tests as a function of the magnitude of the break in the coefficients in standard error units κ. In this calculation, the variance 15 Note that the estimates of σε2 from the data are conditional on the assumed values of the structural parameters β, α, since σε2 is not independently identified.

1838

L. M. MAGNUSSON AND S. MAVROEIDIS TABLE III FINITE-SAMPLE NULL REJECTION FREQUENCIES OF TESTSa κ=0 φ=0

Nominal Level:

10%

S

5%

κ=2 φ=6

φ=0

7.81 3.84

768

393 8.46 4.20

qLL-S ave-S exp-S

7.30 3.23 6.39 3.16 6.20 3.08

970 781 963

498 7.90 3.64 1009 5.45 408 6.96 3.48 851 4.49 539 6.75 3.44 1007 5.56

split-S split-KLM split-JKLM split-CLR

7.97 6.93 8.79 6.73

8.20 8.32 8.14 8.13

10%

φ=0

5%

557 36 607 364

5%

φ=6

10%

3.93 991 3.29 713 4.35 109 3.15 705

10%

κ=4

5%

845 4.23

10%

φ=6

5%

959 4.93

10%

5%

837 4.00

912 4.39 1173 6.74 813 4.20 938 5.39 807 4.08 1229 7.52

4.20 1014 5.60 901 4.25 896 4.78 1072 3.95 1006 5.58 813 4.09 877 4.68 1037

4.81 1146 6.85 5.79 914 5.02 3.82 1151 6.69 5.54 938 4.93

a The null hypothesis is H : γ = 1 in the NKPC model y = βE(y t 0 t+1 |It ) + γxt + εt , where xt = ρ1t xt−1 + 6 ρ2t xt−2 + vt , the sample size is T = 180, and the variance of the structural error εt and the reduced-form coefficients ρ1t and ρ2t change by φ and κ s.e.’s, respectively, in the middle of the sample. Computed using 20,000 Monte Carlo replications.

is kept fixed (φ = 0) and we consider changes of up to 6 standard errors in the coefficients. The results indicate that the size is affected very little by the magnitude of the instability, never exceeding 7% at the 5% nominal level. In Figure 4, we report the corresponding rejection frequencies as functions of changes in the variance σε2 , in φ standard error units. The left panel shows

FIGURE 3.—Rejection probability of 5% significance level tests of H0 : γ = 16 in the NKPC model yt = βE(yt+1 |It ) + γxt + εt , where xt = ρ1t xt−1 + ρ2t xt−2 + vt , the sample size is T = 180, and the reduced-form coefficients ρ1t and ρ2t change by κ s.e.’s in the middle of the sample. Computed using 20,000 Monte Carlo replications.

IDENTIFICATION USING STABILITY RESTRICTIONS

1839

FIGURE 4.—Rejection probability of 5% significance level tests of H0 : γ = 16 in the NKPC model yt = βE(yt+1 |It ) + γxt + εt , where xt = ρ1t xt−1 + ρ2t xt−2 + vt , the sample size is T = 180, and the variance of the structural error εt and the reduced-form coefficients ρ1t and ρ2t change by φ and κ s.e.’s, respectively, in the middle of the sample. The left panel is for the case κ = 0 and the right panel for the case κ = 2. Computed using 20,000 Monte Carlo replications.

the case when only the variance of the shock changes, that is, the coefficients are constant, and the right panel reports results when the coefficients also change by two standard errors (as in the data). Even though the size of the tests increases with φ, the increase is modest, and becomes noticeable only when the break is more than 6 standard errors from zero. Even in those cases, the exp-S test is the only test that exhibits noticeable over-rejection, while the qLL-S and ave-S tests appear little affected. These results are not particularly surprising, in view of the evidence reported by Hansen (2000), who studied this issue in a related context. 5. EMPIRICAL APPLICATION The new Keynesian Phillips curve is a forward-looking model of inflation dynamics that plays a central role in modern macroeconomic policy analysis. We consider the version of the model derived from Calvo (1983) pricing with one-quarter indexation, studied by Sbordone (2005). The model is given by:16 (36)

 t + εt  πt − $πt−1 = βE(πt+1 − $πt |It ) + λmc

where πt is inflation, mct is real marginal costs and we use the convention that hatted variables denote log-deviations from steady state, the information set It includes variables known at time t, εt is an unobserved disturbance, β is a 16

See Galí (2008, Chapter 3) for a detailed derivation.

1840

L. M. MAGNUSSON AND S. MAVROEIDIS

discount factor, $ is the fraction of prices that are indexed to past inflation υ, α is the probability that when they cannot be optimally reset, λ = (1−α)(1−βα) α , μ is the desired mark-up of a price will be fixed in a given period, υ = a(μ−1) μ−a prices over marginal costs under flexible prices, and a is the labor elasticity of output in the Cobb–Douglas production function. Specification (36) is obtained by log-linearizing firms’ optimizing conditions around a zero-inflation steady state, also known as trend inflation. Below, we analyze a more general specification, derived by Cogley and Sbordone (2008), that allows for nonzero time-varying trend inflation. In accordance with the literature (see Kleibergen and Mavroeidis (2009)), we impose the restriction β = 1.17 We measure real marginal costs using the labor share, following Galí and Gertler (1999) and Sbordone (2002, 2005) and inflation by the GDP deflator.18 Hence, our baseline estimable specification of the model is (37)

$πt = E(πt+1 |It ) +

(1 − α)2 υxt + κ + εt  α

where xt is the log of the labor share and κ is an unrestricted constant that captures the steady-state value of the real marginal costs. The moment conditions are given by E(Zt ut ) = 0, where (38)

ut = $πt − πt+1 −

(1 − α)2 υxt − κ α

and Zt is a vector of instruments that consists of a constant, two lags of πt , and three lags of the labor share, following Kleibergen and Mavroeidis (2009). The parameter υ is unidentifiable and it is calibrated to 0.25.19 The sample period is 1966q1–2010q4. 5.1. Results for the Baseline Specification Confidence sets at the 90% and 95% level for the deep structural parameters (α $) in (37) are plotted in Figure 5. We report results based on inverting the full-sample S and CLR tests, as well as qLL-S, exp-S, and qLL- S. Results for the other tests are similar and they are reported in the Supplemental Material. The following conclusions can be drawn. First, the 95%-level confidence intervals for the coefficient $ based on the full-sample S and CLR tests cover the entire parameter space, so this parameter is completely unidentified by information in the full-sample restrictions. 17

Results are qualitatively very similar for other values of β in the range 0.9 to 1.05. A detailed description of the data is given in the Supplemental Material. 19 This value corresponds to a = 2/3 and μ = 12. This is the midpoint of a range of plausible estimates from the literature (1–1.4), for example, Rotemberg and Woodford (1993) and Basu and Fernald (1997). 18

IDENTIFICATION USING STABILITY RESTRICTIONS

1841

FIGURE 5.—90% and 95% confidence sets for α (Calvo parameter) and $ (indexation param2 eter) in the NKPC model: $πt = Et (πt+1 ) + (1−α) x˜ t + κ + εt , x˜ t is 0.25 times the log of the α labor share. Instruments: constant, two lags of πt , and three lags of x˜ t . Newey and West (1987) HAC with prewhitening. Period: 1966q1–2010q4.

1842

L. M. MAGNUSSON AND S. MAVROEIDIS

This conclusion remains robust to changes in the sample and specification; see below. Second, the qLL-S and exp-S sets are a fraction of the full-sample S set, and similarly for the split-CLR versus the full-sample CLR set. Using our proposed tests, we find evidence against full indexation $ = 1 that was used in Christiano, Eichenbaum, and Evans (2005), while we cannot reject the pure forward-looking specification in Sbordone (2002). The generalized S sets are also slightly more informative about the price rigidity parameter α, though the difference is not as dramatic as in the case of $. The average duration of prices, 1 computed from 1−α , is unbounded, which is consistent with the finding in the literature; see Kleibergen and Mavroeidis (2009). It is interesting to look at the qLL- S set, which only uses information in the stability restrictions. This set is also considerably smaller than the confidence sets based on the full-sample statistics. This lends further support to the view that stability restrictions are an important source of identification in this application. To shed some further light on the informational content of the stability restrictions, we look at the estimate of the break date for the split sample tests, and the contribution of the stability restrictions through the first-stage regression. The break date is re-estimated for every value under the null hypothesis, so there are as many estimates as there are points in the grid, but all of them cluster at 1975q1. The additional split-sample instruments are generated by interacting a dummy variable 1{t≥1975q1} with the original instruments. The instruments seem pretty strong for the labor share (the robust first-stage F statistic is over 100 with or without split-sample instruments), so the contribution of the stability restrictions seems to come mainly through changes in the reduced-form equation for inflation, where the first-stage F statistic goes from 6.3 without the split-sample instruments to 10.2 after those instruments are included.20 5.2. Results for Post-1983 Sample Two concerns about the validity of the previous results are: (i) a break in the variance of the moment vector, arising from a possible break in the variance of the structural shock in the early 1980s (the Great Moderation); and (ii) possible instability in the coefficients of the NKPC before 1984; see Kleibergen and Mavroeidis (2009). To address those, we recompute the previous confidence sets over the later sample period 1984q1–2010q4. For brevity, we report only the S, qLL-S, CLR, and split-CLR confidence sets in Figure 6 (the remaining ones can be found in the Supplemental Material). The difference between the full-sample and the qLL-S and split-CLR sets is actually even larger than before. This shows that the previous findings remain robust to the aforementioned concerns. 20 The robust first-stage F statistic is computed with the same HAC estimator as the one used in the computation of the confidence sets.

IDENTIFICATION USING STABILITY RESTRICTIONS

1843

FIGURE 6.—90% and 95% confidence sets for α (Calvo parameter) and $ (indexation param2 eter) in the NKPC model: $πt = Et (πt+1 ) + (1−α) x˜ t + κ + εt , x˜ t is 0.25 times the log of the α labor share. Instruments: constant, two lags of πt , and three lags of x˜ t . Newey and West (1987) HAC with prewhitening. Period: 1984q1–2010q4.

5.3. Time-Varying Trend Inflation The baseline specification of the NKPC given by (36) is derived by loglinearizing firms’ optimizing conditions around a zero-inflation steady state. The steady state of inflation corresponds to expected long-run inflation when the latter is constant. More generally, expected long-run inflation is called trend inflation, and we denote it by π¯ t to allow for the possibility that it may change over time. Trend inflation may be time-varying due to a time-varying inflation target (see Cogley and Sbordone (2008)), and does not contradict the hypothesis that the deep structural parameters underlying the NKPC are stable over time, that is, that the underlying sticky-price model is potentially immune to the Lucas critique. However, the presence of a nonzero and time-varying inflation target in general alters the specification of the NKPC. As shown in Cogley and Sbordone (2008), (36) remains valid only in the special case when

1844

L. M. MAGNUSSON AND S. MAVROEIDIS

there is full-indexation to π¯ t .21 Otherwise, the NKPC has some additional terms and time-varying coefficients that are functions of π¯ t and the underlying deep structural parameters α $ a, and μ, which are a priori considered invariant to shifts in π¯ t ; see Cogley and Sbordone (2008, p. 2105). Specifically, (39)

t πˆ t = ρt (πˆ t−1 − π¯ t ) + ζt mc + b1t E(πˆ t+1 |It ) + b2t

∞ 

ϕ1t E(πˆ t+j |It ) j−1

j=2

 ∞

+ b3t

ϕ1t E(rt+jt+j+1 + yt+j+1 |It ) + εt  j

j=0

where πˆ t = πt − π¯ t , ρt , ζt , b1t , b2t , b3t , and ϕ1t are functions of π¯ t , α $ a, and μ (see the Supplemental Material), rtt+1 is the real interest rate from period t to t + 1, and yt is real GDP growth, and we have imposed the assumption that the steady-state value of the discount factor rtt+1 + yt+1 is 1. We therefore need to modify our analysis to make the time-varying inflation target part of the maintained hypothesis. In the interest of robustness, we will not impose any specific model for π¯ t . Instead, it will suffice to assume time variation in π¯ t is moderate in the sense that it can be detected with possibly high probability, but not with certainty. Specifically, we assume that π¯ t = O(T −1/2 ) and π¯ t = o(π¯ t ), for all t ≤ T , T ≥ 1, which covers a wide range of time paths for π¯ t . This approach has been used by Li and Müller (2009) in a related context, and it is common in the literature on estimation of time-varying parameter models; see Stock and Watson (1998). We focus on moderate instability because this is the case that is problematic for inference. When instabilities are large, the generalized S sets become empty and this is far from what we found in the previous section. For simplicity, we have assumed a zero steady-state inflation target. The analysis can be generalized to allow for a nonzero steady-state inflation target π, ¯ included as an unknown parameter in θ, at the cost of more complicated algebra and higher computational intensity, due to the increase in the dimension of the parameter space. Alternatively, this assumption can be motivated as full indexation to any perfectly predictable long-run inflation target, as in Yun (1996). Terms of order lower than T −1/2 are asymptotically negligible for all the tests (this is analogous to Li and Müller (2009)). Using a first-order Taylor expansion with respect to π¯ t of the coefficients in (39), dropping all terms that are o(π¯ t ) (including π¯ t ), and rearranging yields the following specifi21

See also Yun (1996) and Ascari (2004).

IDENTIFICATION USING STABILITY RESTRICTIONS

1845

cation: (40)

$πt = E(πt+1 |It ) +

(1 − α)2 υxt + κ α

  (1 − α)2  υxt+1 − κIt + αE $πt+1 − πt+2 − α + t (θ)π¯ t + εt  2

where κ = (1−α) υ ln μ, υ = a(μ−1) , and t (θ) is a particular function of obα μ−a servable data and parameters given in the Supplemental Material. Equation (40) nests the baseline specification (37) when π¯ t = 0, because in that case 2 the term E($πt+1 − πt+2 − (1−α) υxt+1 − κ|It ) is identically equal to zero. α When π¯ t = 0, the baseline specification is misspecified, and the generalized S tests have power against this misspecification, as does the full-sample S test because the full-sample moment conditions are violated, too. So, the worry is that the S sets from the baseline specification (37) reported earlier may be too small because of this. This problem can be addressed using the more general specification (40). Notice that the parameters a and μ are not jointly identified, so we will calibrate the former to 2/3, which is standard. Any inference on θ based on (40) that ignores the unobservable term t (θ)π¯ t is generally invalid; see the Supplemental Material. This includes our proposed methods as well as the standard full-sample GMM tests. The intuition for this is that the correlation between t (θ) and the instruments is, in general, nonzero and this induces a nonzero mean to the asymptotic distribution of the moment vector T −1/2 FsT (θ0 ) under the null hypothesis θ = θ0 for all subsamples s ∈ [0 1]. However, because the term t (θ0 ) is observed, and E[Zt t (θ0 )] is consistently estimable under H0 and standard regularity conditions, the invalidity of the moment conditions can be corrected by recentering them appropriately. Specifically, the necessary correction involves an orthogonal projection of the moment vector ft (θ) = Zt (ut − αut+1 ), with ut defined in (38), onto the space spanned by E[Zt t (θ0 )] (see the Supplemental Material for the details). We obtain three-dimensional 95%-level confidence sets for the parameters α $ μ. The results are plotted in Figure 7. We notice that the confidence sets are very large, comprising almost the entire parameter space when we only use the full-sample restrictions. The volumes of the S and CLR sets as a fraction of the parameter space we considered are 90% and 88%, respectively. In contrast, the qLL-S and split-CLR sets are 21% and 24%, respectively, that is, less than a third of their full-sample counterparts (results for the other tests are given in the Supplemental Material). Relative to the baseline results given earlier, we notice that the requirement of robustness to time-varying trend inflation has a dramatic effect on the iden-

1846

L. M. MAGNUSSON AND S. MAVROEIDIS

FIGURE 7.—95% confidence sets for α, $, and μ in the NKPC allowing for time-varying trend inflation. The forcing variable is the log of the labor share. Instruments: constant, two lags of πt , and three lags of xt . Newey and West (1987) HAC with prewhitening. Period: 1966q1–2010q4.

tification of the deep parameters. This is consistent with the view that there is considerable time variation in trend inflation, so that excluding π¯ t from the model and using it as an “instrument” through the stability restrictions improves identification. The alternative would be to make specific assumptions about the time variation of π¯ t , which will enable us to use its variation for the identification of the deep parameters; see Cogley and Sbordone (2008). But if we wish to be robust to any possible misspecification of the path of π¯ t , then we need to orthogonalize the moment conditions with respect to it, thus effectively shutting off any information coming from π¯ t . We show in the Supplemental Material that this leads to the lack of identification near α = 0. Specifically, as α → 0, the moment vector ft (θ) becomes perfectly collinear with Zt t (θ) for all values of θ and for any instrument choice Zt , and therefore identification fails completely when using only full-sample restrictions. The identifica-

IDENTIFICATION USING STABILITY RESTRICTIONS

1847

tion problem is alleviated when we use stability restrictions. It can be shown that, as α → 0, ft (θ) becomes proportional to xt + ln μ, and stability restrictions remain informative about μ, though not about $. This explains why the qLL-S and split-CLR sets are so much smaller than their full-sample counterparts, and why they remain completely uninformative about $ when α is near 0. 5.4. Autocorrelated Mark-Up Shocks Another threat to the validity of the baseline results is the possibility that the error term εt in the NKPC (37), often interpreted as a mark-up shock, is autocorrelated. When this is the case, lags of the variables dated t − 1 are no longer valid instruments. It is straightforward to modify the moment conditions to account for autocorrelated errors. To do that, we need to make assumptions about the nature of autocorrelation in εt . If it is up to some finite order q, say, valid instruments are any variables dated before t − q. If the autocorrelation is of an autoregressive form, valid moment conditions can be obtained by quasi-differencing the data. However, allowing for autocorrelated errors is likely to increase the uncertainty about the parameters, since identification of the NKPC is based primarily on restrictions on the dynamics of the inflation process (exclusion restrictions on further lags of inflation and the labor share), and allowing for autocorrelated errors weakens those restrictions. We consider here the case of first-order autocorrelation: εt = φεt−1 + et , where E(et |It−1 ) = 0. The moment vector is given by ft (θ) = Zt (ut − φut−1 ), where ut is given by (38), and the instruments are the same as before. The parameter vector θ includes α $, and φ. Three-dimensional 95% confidence sets for those parameters are plotted in Figure 8. Even though all confidence sets are generally larger than when the restriction φ = 0 is imposed, the full-sample confidence sets remain considerably larger than the qLL-S and split-CLR sets. Specifically, the volume of the full-sample S and CLR sets is approximately 50% larger than that of qLL-S and split-CLR sets, respectively (see the Supplemental Material for details). So, stability restrictions remain relatively informative. It also appears that the autocorrelation parameter φ is very poorly identified and not significantly different from zero, since all confidence sets include φ = 0. Imposing this restriction enables us to rule out full indexation using the stability restrictions; see Figure 5. 6. CONCLUSIONS The contribution of this paper is twofold. First, it points out that the typical orthogonality conditions in time-series models estimated by GMM involve a set of stability restrictions that can be useful for identification of the parameters but have heretofore been unexploited by existing limited-information methods of inference. Second, it develops new limited-information methods

1848

L. M. MAGNUSSON AND S. MAVROEIDIS

FIGURE 8.—95%-level confidence sets for α, $, and φ in the NKPC with autocorrelated markup shocks εt = φεt−1 + et . The forcing variable is the log of the labor share. Instruments: constant, two lags of πt , and three lags of x˜ t . Newey and West (1987) HAC with prewhitening. Period: 1966q1–2010q4.

of inference that exploit the identifying information in the stability restrictions using only mild assumptions about the nature of instability. A notable implication of our proposed methods is that they allow for identification of the parameters in models where the usual GMM order condition for identification fails, for example, when the number of instruments is smaller than the number of parameters, because stability restrictions provide the additional information that is needed. This can be useful in situations where alternative exclusion restrictions may be controversial. REFERENCES ANDERSON, T. W., AND H. RUBIN (1949): “Estimation of the Parameters of a Single Equation in a Complete System of Stochastic Equations,” The Annals of Mathematical Statistics, 20, 46–63. [1810,1823]

IDENTIFICATION USING STABILITY RESTRICTIONS

1849

ANDREWS, D. W. K. (1993): “Tests for Parameter Instability and Structural Change With Unknown Change Point,” Econometrica, 61 (4), 821–856. [1800,1804,1807,1817,1821,1829]
ANDREWS, D. W. K., AND W. PLOBERGER (1994): “Optimal Tests When a Nuisance Parameter Is Present Only Under the Alternative,” Econometrica, 62 (6), 1383–1414. [1800,1809,1810,1817,1820,1822]
ANDREWS, D. W. K., M. J. MOREIRA, AND J. H. STOCK (2006): “Optimal Two-Sided Invariant Similar Tests for Instrumental Variables Regression,” Econometrica, 74 (3), 715–752. [1806,1815,1816,1822,1823,1826,1831,1834]
ASCARI, G. (2004): “Staggered Prices and Trend Inflation: Some Nuisances,” Review of Economic Dynamics, 7 (3), 642–667. [1844]
BAI, J., AND P. PERRON (1998): “Estimating and Testing Linear Models With Multiple Structural Changes,” Econometrica, 66, 47–78. [1804,1808,1809,1827,1828,1830]
BASU, S., AND J. G. FERNALD (1997): “Returns to Scale in U.S. Production: Estimates and Implications,” Journal of Political Economy, 105 (2), 249–283. [1840]
CALVO, G. A. (1983): “Staggered Prices in a Utility Maximizing Framework,” Journal of Monetary Economics, 12, 383–398. [1839]
CANOVA, F., AND L. SALA (2009): “Back to Square One: Identification Issues in DSGE Models,” Journal of Monetary Economics, 56 (4), 431–449. [1800]
CHRISTIANO, L. J., M. EICHENBAUM, AND C. EVANS (2005): “Nominal Rigidities and the Dynamic Effects of a Shock to Monetary Policy,” Journal of Political Economy, 113, 1–45. [1842]
CLARIDA, R., J. GALÍ, AND M. GERTLER (2000): “Monetary Policy Rules and Macroeconomic Stability: Evidence and Some Theory,” Quarterly Journal of Economics, 115, 147–180. [1800]
COGLEY, T., AND A. M. SBORDONE (2008): “Trend Inflation and Inflation Persistence in the New Keynesian Phillips Curve,” American Economic Review, 98 (5), 2101–2126. [1840,1843,1844,1846]
DUFOUR, J.-M. (2003): “Identification, Weak Instruments and Statistical Inference in Econometrics,” Canadian Journal of Economics, 36 (4), 767–808. [1810]
ELLIOTT, G., AND U. K. MÜLLER (2006): “Efficient Tests for General Persistent Time Variation in Regression Coefficients,” Review of Economic Studies, 73 (4), 907–940. [1800,1804,1809,1817,1818,1829]
ELLIOTT, G., AND U. K. MÜLLER (2010): “Pre and Post Break Parameter Inference,” Technical Report, Department of Economics, Princeton University. [1811]
ELLIOTT, G., T. J. ROTHENBERG, AND J. H. STOCK (1996): “Efficient Tests for an Autoregressive Unit Root,” Econometrica, 64 (4), 813–836. [1815]
GALÍ, J. (2008): Monetary Policy, Inflation, and the Business Cycle: An Introduction to the New Keynesian Framework. Princeton: Princeton University Press. [1839]
GALÍ, J., AND M. GERTLER (1999): “Inflation Dynamics: A Structural Econometric Analysis,” Journal of Monetary Economics, 44, 195–222. [1836,1840]
GIACOMINI, R., AND B. ROSSI (2009): “Model Comparisons in Unstable Environments,” ERID Working Paper 30, Duke University. [1800]
GUGGENBERGER, P. (2012a): “A Note on the Relation Between Local Power and Robustness to Misspecification,” Economics Letters, 116 (2), 133–135. [1811]
GUGGENBERGER, P. (2012b): “On the Asymptotic Size Distortion of Tests When Instruments Locally Violate the Exogeneity Assumption,” Econometric Theory, 28 (2), 387–421. [1811]
HANSEN, B. E. (2000): “Testing for Structural Change in Conditional Models,” Journal of Econometrics, 97 (1), 93–115. [1807,1808,1839]
KLEIBERGEN, F. (2002): “Pivotal Statistics for Testing Structural Parameters in Instrumental Variables Regression,” Econometrica, 70, 1781–1803. [1807]
KLEIBERGEN, F. (2005): “Testing Parameters in GMM Without Assuming That They Are Identified,” Econometrica, 73 (4), 1103–1123. [1807,1812–1814,1824,1825]
KLEIBERGEN, F., AND S. MAVROEIDIS (2009): “Weak Instrument Robust Tests in GMM and the New Keynesian Phillips Curve,” Journal of Business & Economic Statistics, 27 (3), 293–311. [1800,1840,1842]
KLEIBERGEN, F., AND R. PAAP (2006): “Generalized Reduced Rank Tests Using the Singular Value Decomposition,” Journal of Econometrics, 133 (1), 97–126. [1814]
KLEIN, R., AND F. VELLA (2010): “Estimating a Class of Triangular Simultaneous Equations Models Without Exclusion Restrictions,” Journal of Econometrics, 154 (2), 154–164. [1800]
LEWBEL, A. (2012): “Using Heteroskedasticity to Identify and Estimate Mismeasured and Endogenous Regressor Models,” Journal of Business & Economic Statistics, 30 (1), 67–80. [1800]
LI, H. (2008): “Estimation and Testing of Euler Equation Models With Time-Varying Reduced-Form Coefficients,” Journal of Econometrics, 142 (1), 425–448. [1799]
LI, H., AND U. K. MÜLLER (2009): “Valid Inference in Partially Unstable GMM Models,” Review of Economic Studies, 76, 343–365. [1799,1827,1829,1830,1844]
LUCAS, R. E., JR. (1976): “Econometric Policy Evaluation: A Critique,” in The Phillips Curve and Labor Markets, ed. by K. Brunner and A. Meltzer. Carnegie–Rochester Conference Series on Public Policy. Amsterdam: North-Holland. [1799]
MAGNUSSON, L. M., AND S. MAVROEIDIS (2014): “Supplement to ‘Identification Using Stability Restrictions’,” Econometrica Supplemental Material, 82, http://www.econometricsociety.org/ecta/supmat/9612_miscellaneous.pdf; http://www.econometricsociety.org/ecta/supmat/9612_data_and_programs.zip. [1801]
MCCONNELL, M. M., AND G. PEREZ-QUIROS (2000): “Output Fluctuations in the United States: What Has Changed Since the Early 1980’s?” American Economic Review, 90 (5), 1464–1476. [1837]
MOREIRA, M. J. (2003): “A Conditional Likelihood Ratio Test for Structural Models,” Econometrica, 71, 1027–1048. [1806,1807,1824]
MÜLLER, U. K. (2011): “Efficient Tests Under a Weak Convergence Assumption,” Econometrica, 79 (2), 395–435. [1802,1808,1815,1817]
NEWEY, W. K., AND K. D. WEST (1987): “A Simple, Positive Semidefinite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix,” Econometrica, 55 (3), 703–708. [1837,1841,1843,1846,1848]
PERRON, P. (2006): “Dealing With Structural Breaks,” in Palgrave Handbook of Econometrics, Vol. 1, ed. by K. Patterson and T. C. Mills. London: Palgrave Macmillan, 278–352. [1800]
RIGOBON, R. (2003): “Identification Through Heteroskedasticity,” Review of Economics and Statistics, 85 (4), 777–792. [1800,1806]
ROSSI, B. (2005): “Optimal Tests for Nested Model Selection With Underlying Parameter Instability,” Econometric Theory, 21 (5), 962–990. [1800,1810]
ROTEMBERG, J. J., AND M. WOODFORD (1993): “Dynamic General Equilibrium Models With Imperfectly Competitive Product Markets,” Working Paper 4502, National Bureau of Economic Research. [1840]
SBORDONE, A. M. (2002): “Prices and Unit Labor Costs: A New Test of Price Stickiness,” Journal of Monetary Economics, 49, 265–292. [1840,1842]
SBORDONE, A. M. (2005): “Do Expected Future Marginal Costs Drive Inflation Dynamics?” Journal of Monetary Economics, 52 (6), 1183–1197. [1839,1840]
SIMS, C. A., AND T. ZHA (2006): “Were There Regime Switches in U.S. Monetary Policy?” American Economic Review, 96 (1), 54–81. [1800]
SOWELL, F. (1996): “Optimal Tests for Parameter Instability in the Generalized Method of Moments Framework,” Econometrica, 64 (5), 1085–1107. [1804,1807,1809,1817,1829]
STOCK, J. H., AND M. W. WATSON (1996): “Evidence on Structural Instability in Macroeconomic Time Series Relations,” Journal of Business & Economic Statistics, 14 (1), 11–30. [1799,1804,1809,1817]
STOCK, J. H., AND M. W. WATSON (1998): “Median Unbiased Estimation of Coefficient Variance in a Time-Varying Parameter Model,” Journal of the American Statistical Association, 93 (441), 349–358. [1809,1817,1844]
STOCK, J. H., AND J. H. WRIGHT (2000): “GMM With Weak Identification,” Econometrica, 68 (5), 1055–1096. [1802,1803,1807–1812]
STOCK, J. H., J. H. WRIGHT, AND M. YOGO (2002): “A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments,” Journal of Business & Economic Statistics, 20, 518–529. [1800]
VAN DER VAART, A. W. (1998): Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics, Vol. 3. Cambridge: Cambridge University Press. [1815]
WALD, A. (1943): “Tests of Statistical Hypotheses Concerning Several Parameters When the Number of Observations Is Large,” Transactions of the American Mathematical Society, 54, 426–482. [1817]
WHITE, H. (2001): Asymptotic Theory for Econometricians. San Diego: Academic Press. [1808,1815]
YUN, T. (1996): “Nominal Price Rigidity, Money Supply Endogeneity, and Business Cycles,” Journal of Monetary Economics, 37 (2), 345–370. [1844]

Dept. of Economics, Business School, The University of Western Australia, 35 Stirling Highway—M251, Crawley, WA 6009, Australia; [email protected] and Dept. of Economics and Institute for New Economic Thinking at the Oxford Martin School, University of Oxford, Manor Road, Oxford, OX1 3UQ, U.K.; [email protected].

Manuscript received October, 2010; final revision received March, 2014.
