A Narrative Approach to a Fiscal DSGE Model This version: August 8, 2017 First version: February 2014 Thorsten Drautzburg ∗

Abstract Structural DSGE models are used for analyzing both policy and the sources of business cycles. Conclusions based on full structural models are, however, potentially affected by misspecification. A competing method is to use partially identified VARs based on narrative shocks. This paper asks whether both approaches agree. Specifically, I use narrative data in a DSGE-VAR to partially identify policy shocks in the VAR and assess the fit of the DSGE model relative to this narrative benchmark. In doing so, I make four contributions. First, I adapt the existing methods for shock identification with external instruments in VARs for Bayesian inference in the textbook SUR framework. Second, I prove that this narrative VAR approach is valid in a class of DSGE models with Taylor-type policy rules. I also extend the DSGE-VAR framework to incorporate instruments in the estimation. Third, I collect Greenbook data on expectations that I use to extend a proxy for short-term government spending shocks and to address fiscal foresight. Fourth, I estimate a standard quantitative DSGE-VAR model with fiscal rules. I find that the DSGE model identification is at odds with the narrative information as measured by the marginal likelihood. I trace this discrepancy to differences in impulse responses, identified historical shocks, and policy rules. The results indicate monetary accommodation of fiscal shocks. Keywords: Fiscal policy, monetary policy, DSGE model, Bayesian estimation, narrative shocks, Bayesian VAR. JEL classifications: C32, E32, E52, E62.

∗ Federal Reserve Bank of Philadelphia. E-mail: tdrautzburg[at]gmail.com. Web: sites.google.com/site/tdrautzburg/. I am grateful to Pooyan Amir-Ahmadi, Jesus Fernandez-Villaverde, Pablo Guerron-Quintana, my discussant Ed Herbst, Jim Nason, Barbara Rossi, Juan Rubio-Ramirez, Frank Schorfheide, Tom Stark, Keith Sill, Harald Uhlig, Mark Watson, Jonathan Wright and audiences at the Atlanta Fed, the 2014 EEA, the ECB, the 2015 ESWC, Emory, Goethe, Madison, the 2015 NBER-NSF SBIES and NBER SI, the Philadelphia Fed, the Riksbank, the 2014 SED, the Fall 2015 System Macro Committee, and Wharton for comments and suggestions. I would also like to thank Valerie Ramey, Karel Mertens, and Morten Ravn for making their code and data publicly available and Nick Zarra for excellent research assistance. All errors are mine. The views expressed herein are my own. They do not necessarily reflect the views of the Federal Reserve Bank of Philadelphia, the Federal Reserve System, or its Board of Governors.

1 Introduction

Dynamic Stochastic General Equilibrium (DSGE) models are a widespread research and policy tool. But as structural models, DSGE models could be misspecified and their results therefore misleading. An alternative approach uses narrative methods to study some of the same shocks and their effects. Narrative methods rely on fewer structural assumptions and incorporate additional information relative to standard macroeconomic time series. I provide a framework for incorporating this information in DSGE model estimation and for quantifying misspecification compared with narrative studies. Despite the success of DSGE models such as Christiano et al. (2005) and Smets and Wouters (2007), concerns about misspecification remain: Faust (2009), for example, argues that the microfoundations of DSGE models are weak and that their quantitative success could mask misspecification. Sims (2005) cautions that DSGE models should not displace alternative identification schemes. Del Negro et al. (2007) use the Del Negro and Schorfheide (2004) framework to quantify model misspecification relative to a reduced-form vector autoregression (VAR) and find a small degree of misspecification. But since their benchmark model is a reduced-form VAR, their metric for comparing models is unaffected by shock identification. 1

In this paper I quantify the misspecification of DSGE models with a focus on shock identification. I extend the Del Negro and Schorfheide (2004) framework for assessing misspecification in reduced-form VARs to structural VARs identified with external instruments: I model traditional macro variables jointly with narrative shocks such as the monetary policy shocks developed by Romer and Romer (2004). With the resulting narrative DSGE-VAR, I quantify the misspecification of the DSGE identification scheme by comparing marginal likelihoods: If a given DSGE model and the narrative VAR agree, including information from the model increases precision and hence the marginal likelihood. 2, 3

As in Park (2011), I distinguish the fit of the DSGE dynamics from the fit of the DSGE identification. The innovation of this paper is to incorporate external data into a DSGE-VAR that partially identifies the VAR and can reveal sources of model misspecification. Specifically, this paper makes four contributions. First, I provide a simple Bayesian implementation of the proxy-VAR framework proposed in frequentist settings by Stock and Watson (2012) and Mertens and Ravn (2013). Contrary to Caldara and Herbst (2015), my implementation is based on the reduced-form representation of the VAR. Similar to a standard VAR such as Uhlig (2005), the reduced-form estimation uses standard textbook methods. The structural implications follow from closed-form expressions based on the reduced form. Applications of the proposed Bayesian proxy-VAR include the analysis of responses to minimum wage shocks (Drautzburg et al., 2017) and unconventional monetary policy (Lhuissier and Szczerbowicz, 2017). Second, I show that comparing DSGE models with narrative VARs is valid for a class of DSGE

1 Canova et al. (2015) use the Del Negro and Schorfheide (2004) DSGE-VAR method to assess whether the assumption of time-invariant parameters matters. 2 Formally, shocks in the standard DSGE-VAR framework are identified only up to multiplication by an orthonormal matrix that does not affect the likelihood. Here, the extra data, and hence the likelihood, also speak to the identification. 3 Waggoner and Zha (2012) confront model misspecification differently. Instead of estimating a constant mixture between the DSGE model and the VAR, they allow for Markov switching between the models. Given the limited number of narrative shock observations, such a Markov switching approach seems too demanding of the data in my application.


models: I provide conditions under which the instrument-identified VAR in Mertens and Ravn (2013) correctly identifies shocks and policy rule coefficients in models with standard Taylor-type policy rules, such as Leeper et al. (2010) and Fernandez-Villaverde et al. (2015), without timing restrictions. This property of the narrative VAR contrasts with traditional VARs that identify shocks through contemporaneous zero restrictions. DSGE models that match the VAR then need to assume that economic agents only react to policy shocks with a delay. The narrative VAR approach is valid if the data are generated from a widely used class of DSGE models, without restricting the timing. The key condition for my result is that the information set in the VAR captures the variables policy-makers pay attention to. This theoretical result mirrors the empirical result in Caldara and Herbst (2015) that credit spreads may be an important policy variable. Third, I use previously unused data on expectations for the identification. Using the historic, publicly available Greenbook records, I digitize quarterly expectations data on the different components of fiscal policy along with the economy as a whole. I use these data to update the instrument for short-term government spending in Ramey (2011), i.e., defense spending surprises. Extending the sample on proxy variables sharpens inference significantly. I also use the data to control for fiscal foresight in a model extension featuring news shocks. Fiscal foresight does affect some estimates, but leaves the main conclusions unchanged. Fourth, I estimate a quantitative version of my model. My application focuses on fiscal and monetary policy rules in a medium-scale DSGE model, with narrative measures for government spending, tax rates, and monetary policy shocks in the baseline specification.
Using estimates of the marginal likelihood, I find that the best-fitting model puts significant weight on DSGE model dynamics, but the data prefers the weakest prior on the DSGE model covariance structure. I interpret these results as indicating that the DSGE model identification does not line up with the narrative identification. These results are robust to estimating the model with news shocks and expectations data, and for other model variants. To provide evidence on why the data dislikes the DSGE model specification, I analyze the implied impulse-response functions, historical shocks, and policy rules. I find that monetary policy accommodates fiscal shocks in the impulse-response function analysis. While the corresponding pure DSGE model can match these qualitative findings through the indirect fiscal effects on inflation and output, my analysis of policy rules reveals a systematic direct response. I also find responses to monetary shocks that resemble a price puzzle. These responses are robust to different specifications such as controlling for expectations. The pure DSGE model cannot match these responses. Still, the pure DSGE model matches the historical monetary policy shocks best, but struggles to match historical tax shocks. The application thus highlights the advantage of using likelihood-based inference: The VAR has implications for several structural objects that are all informative about the success of the DSGE model that I analyze. In contrast, limited-information methods such as impulse-response function matching try to maximize the fit along only one of the structural dimensions. 4

4 Beyond impulse-responses, historical shocks, and policy rule estimates, variance decompositions are often of interest to researchers. Here, too, fully structural and partial identification of fiscal shocks can lead to widely different conclusions about the drivers of business cycles: Rossi and Zubairy (2011, Table 2) document that when applying a Blanchard and Perotti (2002) type identification of government spending shocks in a VAR, the fraction of the variance of GDP driven


Since it provides an intuitive way to incorporate prior information, the narrative DSGE-VAR framework can also be of interest for future narrative studies. In my application, the posterior distributions over the effects of fiscal shocks remain wide when only a few proxy measures are available. Even though I could add government spending shock proxies here through extra data work, this is not possible for other applications, such as taxes. In such cases, incorporating prior information sharpens the posterior by shrinking the estimates toward theory-consistent policy rules. This is similar to the work by Arias et al. (2015), who document that even weak structural priors over policy rules help to identify shocks via sign restrictions. The application to a fiscal DSGE model with monetary policy is important from a substantive point of view: With monetary policy constrained by the zero lower bound (ZLB), “stimulating” fiscal policy has gained a lot of attention, and influential papers such as Christiano et al. (2011b) have used quantitative DSGE models for the analysis of fiscal policies. Since the fiscal building blocks of DSGE models are less well studied than, say, the Taylor rule for monetary policy (e.g., Clarida et al., 2000), assessing the fiscal policy implications of these models is warranted. Even though my results show that fiscal DSGE models can match VAR estimates along some dimensions, my overall results caution against the use of standard DSGE models for fiscal policy analysis.

This paper is structured as follows: Section 2 frames the research question in general and formal terms. The paper proceeds by describing the methods used in the analysis in Section 3. Sections 4 and 5 describe the empirical specification and the empirical results. A web appendix contains the proofs, describes the data, and shows additional empirical results.

2 Framework

To illustrate the main idea, consider the following state-space representation of a linear DSGE model with the vector Y of observables and vector X of potentially unobserved state variables:

Y_t = B^* X^*_{t-1} + A^* \epsilon^*_t,   (2.1a)
X^*_t = D^* X^*_{t-1} + C^* \epsilon^*_t,   (2.1b)

where E[\epsilon^*_t (\epsilon^*_t)'] = I_m and m is the dimensionality of Y_t. In this paper, I am interested in estimating the following VAR(p) approximation to this state-space model:

Y_t = B X_{t-1} + A \epsilon_t,   (2.2a)
X_t = [Y_t' Y_{t-1}' ... Y_{t-(p-1)}']',   (2.2b)

where the dimension of the state vector is typically different across the VAR and the DSGE model, but the shocks \epsilon_t and \epsilon^*_t are assumed to be of the same dimension. Importantly, the observables Y_t are also the same.

by those shocks rises significantly with the forecast horizon, whereas the DSGE model-based variance decomposition in (Smets and Wouters, 2007, Figure 1) implies the opposite pattern.

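The mapping from the state space (2.1) to the VAR approximation (2.2) can be illustrated with a short simulation. The sketch below simulates a small system and fits the VAR by OLS; all matrices are hypothetical numbers chosen for the example, not estimates from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
m, T = 2, 50_000

# Hypothetical state-space matrices in the notation of (2.1)
A_s = np.array([[1.0, 0.0], [0.3, 1.0]])   # A*: impact of shocks on Y_t
B_s = np.array([[0.5, 0.1], [0.0, 0.4]])   # B*: loading of Y_t on X*_{t-1}
C_s = np.array([[0.2, 0.0], [0.0, 0.2]])   # C*: impact of shocks on X*_t
D_s = np.array([[0.7, 0.0], [0.1, 0.6]])   # D*: state persistence

X = np.zeros(m)
Y = np.zeros((T, m))
for t in range(T):
    eps = rng.standard_normal(m)           # eps*_t ~ N(0, I_m)
    Y[t] = B_s @ X + A_s @ eps             # (2.1a)
    X = D_s @ X + C_s @ eps                # (2.1b)

# OLS fit of the VAR(1) approximation Y_t = B Y_{t-1} + v_t from (2.2)
B_hat = np.linalg.lstsq(Y[:-1], Y[1:], rcond=None)[0].T
Sigma_hat = np.cov((Y[1:] - Y[:-1] @ B_hat.T).T)   # forecast error covariance
print(np.round(B_hat, 2), np.round(Sigma_hat, 2))
```

With a finite lag length the VAR is only an approximation to (2.1); how good an approximation depends on the invertibility condition discussed in Section 3.2.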

The typical challenge in VARs is that the observables only identify AA' but not A. The standard DSGE-VAR in Del Negro et al. (2007) therefore uses the rotation scheme that identifies shocks in the DSGE model to also identify shocks in the VAR. In contrast, I bring in additional data to identify a subset of shocks in the VAR. Stock and Watson (2012) and Mertens and Ravn (2013) have shown how to use narrative or external instruments to (partially) identify A for given parameter estimates. In the next section, I first summarize their identification results and show how to use a standard seemingly unrelated regression (SUR) estimator for inference. Second, I provide conditions under which this procedure is correct in a class of DSGE models. Last, I show how to implement a prior for the narrative VAR based on a DSGE model.
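The rotation problem described above can be seen in two lines of code: for any orthonormal Q, the matrices A and AQ imply the same forecast error covariance, so the likelihood alone cannot distinguish them (the numbers below are arbitrary).

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.5, 2.0]])     # one candidate impact matrix
th = 0.7                                    # arbitrary rotation angle
Q = np.array([[np.cos(th), -np.sin(th)],
              [np.sin(th),  np.cos(th)]])  # orthonormal: Q Q' = I

# A and AQ are observationally equivalent: identical AA'
assert np.allclose(A @ A.T, (A @ Q) @ (A @ Q).T)
```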

3 Bayesian VAR estimation with narrative instruments

In this section, I first discuss identification and Bayesian estimation in the instrument-identified VAR. I refer to this model as narrative BVAR in what follows. The second part of this section links identification in the VAR to identification in a DSGE model with Taylor-type policy rules and outlines how to use a DSGE model for prior elicitation and for testing the identifying assumption. 5

3.1 Narrative BVAR

I use the following notation for the statistical model:

y_t = \mu_y + B y_{t-1} + v_t,   (3.1a)
v_t = A \epsilon^{str}_t,   \epsilon^{str}_t \sim iid N(0, I_m),   (3.1b)
z_t = \mu_z + F v_t + \Omega^{-1/2} u_t,   u_t \sim iid N(0, I_k).   (3.1c)

Here, y_t is the observed data, B is a matrix containing the (possibly stacked) lag coefficient matrices of the equivalent VAR(p) model as well as constants and trend terms, v_t is the m-dimensional vector of forecast errors, z_t contains the k narrative shock measures, and \epsilon^{str}_t are the structural shocks. 6

Given B, v_t is data. We can thus also observe Var[v_t] = AA' \equiv \Sigma. A is identified only up to an orthonormal rotation: (\tilde{A}Q)(\tilde{A}Q)' = AA' for \tilde{A} = chol(\Sigma) and QQ' = I. The observation equation for the narrative shocks (3.1c) can alternatively be written as:

z_t = [G 0] \epsilon^{str}_t + \Omega^{-1/2} u_t = [G 0] A^{-1} A \epsilon^{str}_t + \Omega^{-1/2} u_t,   (3.2)

where F \equiv [G 0] A^{-1} and v_t \equiv A \epsilon^{str}_t.

5 Caldara and Herbst (2015) provide an alternative way to estimate a narrative or proxy VAR using Bayesian methods. Their approach factors the likelihood function differently than I do and thereby highlights nicely how the instruments restrict the rotation matrix that maps forecast errors to structural shocks. 6 One could easily generalize the model to include conditioning information in the instrument equation. For example, I use forward-looking information from Ramey (2011) and Fisher and Peters (2010) in robustness checks to make sure the instrument does not pick up anticipated policy measures.


By imposing zero restrictions on the structural representation of the covariance matrix, knowledge of F and AA' identifies the shocks that are not included in (3.2). However, only the covariance matrix F AA' is needed for identification. It is therefore convenient for inference and identification to introduce a shorthand for the covariance matrix between the instruments and the forecast errors. Formally:

Assumption 1. For some invertible square matrix G, the covariance matrix \Gamma can be written as:

\Gamma \equiv Cov[z_t, v_t] = F AA' = [G 0] A'.   (3.3)

The assumption that G is invertible follows Mertens and Ravn (2013) and corresponds to the assumption that the instruments are relevant. The model in (3.1) can then be written compactly as:

[y_t; z_t] | Y^{t-1} \sim N( [\mu_y + B y_{t-1}; \mu_z], [AA', \Gamma'; \Gamma, \tilde{\Omega}] ),   (3.4)

where \tilde{\Omega} = \Omega + F AA' F' is the covariance matrix of the narrative instruments.

3.1.1 Identification given parameters

This section largely follows Mertens and Ravn (2013). It considers the case of as many instruments as shocks to be identified, with k \le m.

Partition A = [\alpha_{[1]}, \alpha_{[2]}] = [\alpha_{11}, \alpha_{12}; \alpha_{21}, \alpha_{22}] and \alpha_{[1]} = [\alpha_{11}', \alpha_{21}']', with \alpha_{11} (m_z \times m_z) being invertible and \alpha_{21} ((m - m_z) \times m_z). Using the definitions of \Gamma and the forecast errors gives, from (3.3):

\Gamma \equiv Cov[z_t, v_t] = Cov[z_t, A \epsilon^{str}_t] = [G 0] A' = G \alpha_{[1]}' = [G \alpha_{11}', G \alpha_{21}'].   (3.5)

Under Assumption 1, G is invertible, and we can partition \Gamma = [\Gamma_1, \Gamma_2] so that G^{-1} = \alpha_{11}' \Gamma_1^{-1}, where \Gamma_1 is m_z \times m_z and \Gamma_2 is m_z \times (m - m_z). From (3.5), we also know that \alpha_{21}' = G^{-1} \Gamma_2 and hence we can write \alpha_{21}' = \alpha_{11}' (\Gamma_1^{-1} \Gamma_2) \equiv \alpha_{11}' \kappa (i.e., as a function of \alpha_{11} and the observable matrix \Gamma). Hence, the (structural) impulse vectors to shocks 1, ..., m_z satisfy:

\alpha_{[1]} = [\alpha_{11}; \alpha_{21}] = [I_{m_z}; \kappa'] \alpha_{11},   (3.6)

This m \times m_z dimensional matrix of impulse vectors is a known function of the m_z^2 parameters in \alpha_{11}. It therefore restricts (m - m_z) m_z elements of A. The VAR implies additional restrictions on the impulse vector through the covariance matrix \Sigma of forecast errors. In the case of a single identified shock, \alpha_{11} is a scalar and the impulse vector is thus already identified up to scale and sign. The covariance matrix \Sigma of forecast errors pins down the correct scale of the shock in this case.
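For the single-instrument case (m_z = 1), this result can be checked numerically: by (3.5), the covariance row \Gamma is proportional to the first column of A, so \kappa recovers the impulse vector up to scale as in (3.6). The impact matrix and instrument equation below are hypothetical, chosen only to illustrate the algebra.

```python
import numpy as np

rng = np.random.default_rng(0)
m, T = 3, 200_000

# Hypothetical impact matrix A and scalar instrument z_t = G*eps_1t + noise
A = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.2, 0.0],
              [-0.3, 0.4, 0.9]])
G, omega = 2.0, 0.5

eps = rng.standard_normal((m, T))                 # structural shocks
v = A @ eps                                       # forecast errors v_t = A eps_t
z = G * eps[0] + omega * rng.standard_normal(T)   # proxy for shock 1 only

Gamma = (v @ z) / T            # sample Cov[z_t, v_t] = G * A[:, 0]  (m-vector)
kappa = Gamma[1:] / Gamma[0]   # kappa = Gamma_1^{-1} Gamma_2 for m_z = 1

# The impulse vector is proportional to [1, kappa']' as in (3.6)
direction = np.r_[1.0, kappa]
assert np.allclose(direction, A[:, 0] / A[0, 0], atol=0.05)
```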

To see the more general restrictions arising from consistency of the impulse vector(s) with the covariance matrix, restate the VAR identification as finding the correct rotation of reduced-form shocks: Given the covariance \Sigma of forecast errors, A is only identified up to multiplication by an orthonormal matrix Q'. Denote the lower Cholesky factor of \Sigma by \tilde{A}, so that A = \tilde{A}Q (e.g., Uhlig, 2005). Multiplying out the partitioned matrix A then shows that \alpha_{[1]} = \tilde{A} q^{[1]}. Now it becomes clear that the fact that Q is a rotation matrix provides additional restrictions. In particular, it holds that:

I_{m_z} = (q^{[1]})' q^{[1]} = \alpha_{11}' [I_{m_z}; \kappa']' \Sigma^{-1} [I_{m_z}; \kappa'] \alpha_{11}  \Rightarrow  \alpha_{11} \alpha_{11}' = ([I_{m_z}; \kappa']' \Sigma^{-1} [I_{m_z}; \kappa'])^{-1}.   (3.7)

This requires m_z(m_z - 1)/2 additional restrictions to identify \alpha_{11} completely. In the case of a single identified shock we thus need no extra restrictions. The following Lemma summarizes the above results:

Lemma 1. (Stock and Watson, 2012; Mertens and Ravn, 2013) Under Assumption 1, the impact of shocks with narrative instruments is generally identified up to an m_z \times m_z scale matrix \alpha_{11} whose outer product \alpha_{11} \alpha_{11}' is known given \Sigma and \kappa, requiring an extra (m_z - 1)m_z/2 identifying restrictions, and the impulse vector is given by (3.6). Proof: See Appendix A.1.

Thus, for m_z > 1, the extra data by itself only identifies a set of IRFs and the statistical fit of the model is invariant to the particular choice of \alpha_{11}. 7 To uniquely characterize impulse responses or historical shocks, however, I need further assumptions to partially identify specific responses. Before discussing identification, it is worth pointing out that identification is not affected by instruments that are missing at random conditional on time t - 1 information, as the following corollary shows.

Corollary 1. Assume that \tilde{z}_{jt} = z_{jt} with probability \phi_j \in (0, 1] and equal to its sample mean otherwise. Define \tilde{\Gamma} = Cov[\tilde{z}_t, v_t] and \tilde{\kappa} = \tilde{\Gamma}_1^{-1} \tilde{\Gamma}_2. Then \tilde{\kappa} = \kappa and identification is unaffected by the presence of \phi_j.

Proof: \tilde{\Gamma} has rows Cov[\tilde{z}_{j,t}, v_t] = \phi_j Cov[z_{j,t}, v_t], so \tilde{\Gamma} = diag([\phi_j]_{j=1}^{m_z}) \Gamma = [diag([\phi_j]_{j=1}^{m_z}) G, 0] A'. Hence \tilde{\kappa} = \tilde{\Gamma}_1^{-1} \tilde{\Gamma}_2 = \Gamma_1^{-1} diag([\phi_j]_{j=1}^{m_z})^{-1} diag([\phi_j]_{j=1}^{m_z}) \Gamma_2 = \Gamma_1^{-1} \Gamma_2 = \kappa. Because \alpha_{[1]} = [I_{m_z}; \kappa'] \alpha_{11} and \alpha_{11} \alpha_{11}' depends on \Sigma and \kappa only, identification is unaffected.

Intuitively, the scale of the covariance between instruments and forecast errors does not affect identification; only the relative effect of instruments on the various forecast errors does. To achieve point-identification of the instrument-identified shocks, a simple Cholesky-type assumption is appropriate for a class of DSGE models: While a Cholesky factorization is not generally compelling, 8 I show in Section 3.2 that this choice is correct for policy instruments under commonly made assumptions. I therefore use this particular choice as my baseline but also provide an alternative statistical factorization.

First, I summarize the baseline factorization, which follows Mertens and Ravn (2013). It is a Cholesky decomposition in a population two-stage least squares (2SLS) representation of the previous

7 Without extra structure, the model is thus only set identified in the sense of Moon and Schorfheide (2012). 8 The countable number of Cholesky orderings does not span the uncountably many possible shock rotations consistent with the data when there are multiple shocks.


problem (i.e., given \Sigma, \Gamma). In this 2SLS, the instruments z_t serve to purge the forecast errors of the first m_z variables in y_t from shocks other than \epsilon^{[1]}_t. Mertens and Ravn (2013) call the resulting residual variance-covariance matrix S_1 S_1' and propose either an upper or a lower Cholesky decomposition of S_1 S_1'. Formally, A^{-1} v_t can be rewritten as the following system of simultaneous equations:

[v_{1,t}; v_{2,t}] = [0, \eta; \kappa, 0] [v_{1,t}; v_{2,t}] + [S_1, 0; 0, S_2] [\epsilon_{1,t}; \epsilon_{2,t}],   (3.8)

where \eta = \alpha_{12} \alpha_{22}^{-1} and \kappa = \alpha_{21} \alpha_{11}^{-1} are functions of \Sigma, \Gamma, given in Appendix A.4. Using that v_t = A \epsilon_t and simple substitution allows me to re-write this system as:

[\alpha_{11}, \alpha_{12}; \alpha_{21}, \alpha_{22}] [\epsilon_{1,t}; \epsilon_{2,t}] = [v_{1,t}; v_{2,t}] = [(I - \eta\kappa)^{-1}; (I - \kappa\eta)^{-1} \kappa] S_1 \epsilon_{1,t} + [(I - \eta\kappa)^{-1} \eta; (I - \kappa\eta)^{-1}] S_2 \epsilon_{2,t}.

This shows how identifying S_1 identifies \alpha_{[1]} up to a Cholesky factorization as:

\alpha_{[1]} = [\alpha_{11}; \alpha_{21}] = [(I - \eta\kappa)^{-1}; (I - \kappa\eta)^{-1} \kappa] chol(S_1 S_1').   (3.9)
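The algebra behind (3.8)-(3.9) can be verified on a known impact matrix. The sketch below builds \eta, \kappa, and S_1 directly from a hypothetical A and confirms that stacking the two blocks recovers the impulse vectors \alpha_{[1]}; in estimation, only chol(S_1 S_1') is available, so \alpha_{[1]} is recovered up to that Cholesky rotation.

```python
import numpy as np

rng = np.random.default_rng(1)
m, mz = 4, 2

# Hypothetical invertible impact matrix, partitioned as in the text
A = rng.standard_normal((m, m)) + 2 * np.eye(m)
a11, a12 = A[:mz, :mz], A[:mz, mz:]
a21, a22 = A[mz:, :mz], A[mz:, mz:]

eta = a12 @ np.linalg.inv(a22)    # eta = alpha_12 alpha_22^{-1}
kappa = a21 @ np.linalg.inv(a11)  # kappa = alpha_21 alpha_11^{-1}
S1 = a11 - eta @ a21              # residual impact matrix from (3.8)

I = np.eye(mz)
# Stacking the two blocks as in (3.9) recovers the impulse vectors alpha_[1]
alpha_1 = np.vstack([np.linalg.inv(I - eta @ kappa),
                     np.linalg.inv(I - kappa @ eta) @ kappa]) @ S1
assert np.allclose(alpha_1, A[:, :mz])   # matches the first m_z columns of A
```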

The second factorization I consider is a generalization of the Cholesky decomposition of \alpha_{11} \alpha_{11}', the contemporaneous forecast error variance attributable to the instrument-identified shocks \epsilon^{[1]}_t. The (lower) Cholesky decomposition of \alpha_{11} \alpha_{11}' implies that all of the one period ahead conditional forecast error is attributable to the first instrument-identified shock. All of the remaining residual variation in the second shock is then attributed to the second instrument-identified shock, and so forth. 9 The procedure described by Uhlig (2003) generalizes this by identifying the first instrument-identified shock as the shock that explains the most conditional forecast error variance in variable i over some horizon {\underline{h}, ..., \bar{h}}. 10

With either factorization scheme, both the IRFs to the m_z policy shocks and the time series of the m_z policy shocks are point-identified given the reduced-form VAR parameters. In my application, I compare both approaches when analyzing IRFs. Estimates based on maximizing the conditional forecast error variance are more uncertain, because the identification scheme depends both on the covariance matrix V and the model dynamics B. I now discuss how to estimate both jointly.

9 This does not imply that only the first shock in \epsilon^{[1]}_t has an impact effect on the variable ordered first; shocks in \epsilon^{[2]}_t generally still have effects. 10 Following the steps in Uhlig (2003), this amounts to solving the following principal components problem:

S \tilde{q}^\alpha_\ell = \lambda_\ell \tilde{q}^\alpha_\ell,   \ell \in {1, ..., m_z},   S = \sum_{h=0}^{\bar{h}} (\bar{h} + 1 - max{h, \underline{h}}) (e_1' B^h [I_{m_z}; \kappa'] \tilde{\alpha}_{11})' (e_1' B^h [I_{m_z}; \kappa'] \tilde{\alpha}_{11}).

Here \tilde{\alpha}_{11} is the Cholesky factorization associated with (3.7) and e_1 is a selection vector with zeros in all except the first position. The desired \alpha_{11} is given by \alpha_{11} = \tilde{\alpha}_{11} q^\alpha_\ell, where the eigenvectors \tilde{q}^\alpha_\ell can be normalized to form an orthogonal matrix because S is symmetric and such that the signs of diag(\alpha_{11}) are positive. Here I used that the variable whose forecast error variance is of interest is among the first m_z variables. Otherwise, redefine \alpha_{11} to include \kappa or reorder variables as long as \alpha_{11} is guaranteed to be invertible.


3.1.2 Posterior uncertainty

Now, I consider the case when the posterior over \Gamma is nondegenerate. 11 Inference is analogous to inference in a SUR model (e.g., Rossi et al., 2005, ch. 3.5). In the special case in which the control variables for Z_t coincide with the variables used in the VAR, the SUR model collapses to a standard hierarchical Normal-Wishart posterior scheme. Stack the vectorized model (3.4) as follows:

Y_{SUR} = X_{SUR} \beta_{SUR} + v_{SUR},   v_{SUR} \sim N(0, V \otimes I_T),   (3.10)

where Y_{SUR} = [vec(Y)', vec(Z)']' and v_{SUR} is defined analogously. In addition, I use the following definitions:

V = [AA', \Gamma'; \Gamma, \tilde{\Omega}],   X_{SUR} = [I_{m_y} \otimes X_y, 0; 0, I_{m_z} \otimes X_z],   \beta_{SUR} = [vec(B); vec(\mu_z)],

X_y = [Y_{-1} ... Y_{-p} 1_T],   X_z = 1_T.

Using these definitions to transform the model makes the errors independently normally distributed. The transformation takes advantage of the block-diagonal structure of the covariance matrix: \tilde{v} = \tilde{Y} - \tilde{X}\beta \sim N(0, I). Standard conditional Normal-Wishart posterior distributions arise from the transformed model. For the transformation, it is convenient to define U as the Cholesky decomposition of V such that U'U = V:

\tilde{X} = ((U^{-1})' \otimes I_T) X_{SUR},   \tilde{Y} = ((U^{-1})' \otimes I_T) Y_{SUR},
N_{XX}(V) = \tilde{X}'\tilde{X},   N_{XY}(V) = \tilde{X}'\tilde{Y},
S_T(\beta) = \frac{1}{\nu_0 + T} [Y - XB, Z - 1_T \mu_z']' [Y - XB, Z - 1_T \mu_z'] + \frac{\nu_0}{\nu_0 + T} S_0.

With these definitions, the following Lemma holds (Rossi et al. (2005), ch. 3.5, or Appendix A.2):

Lemma 2. The conditional likelihoods are, respectively, conditionally conjugate with Normal and Wishart priors. Given independent priors \beta \sim N(\bar{\beta}_0, N_0) and V^{-1} \sim W((\nu_0 S_0)^{-1}, \nu_0) and defining \bar{\beta}_T(V) = (N_{XX}(V) + N_0)^{-1} (N_{XY}(V) + N_0 \bar{\beta}_0), the conditional posterior distributions are given by:

\beta | V, Y^T \sim N(\bar{\beta}_T(V), (N_{XX}(V) + N_0)^{-1}),   (3.11a)
V^{-1} | \beta, Y^T \sim W(S_T(\beta)^{-1}/(\nu_0 + T), \nu_0 + T).   (3.11b)

In general, no closed form posterior is available. The exception occurs when SUR collapses to ordinary least squares (OLS): If X_z = X_y, then X_{SUR} = I_{m_y + m_z} \otimes X_y and N_{XX} = V^{-1} \otimes X_y' X_y, and analogously for N_{XY}. In this special case, closed forms are available for the marginal distribution of V,

11 I abstract from potentially weak instruments, an issue surveyed by Lopes and Polson (2014). I show that in my application, the instruments are highly correlated with the identified shocks.



allowing me to draw directly from the posterior. In general, however, the block closed-form structure gives rise to a natural Gibbs sampler as in the following algorithm:

Algorithm 1 SUR-VAR
1. Initialize V^{(0)} = S_T(\bar{\beta}_T).
2. Repeat for i = 1, ..., n_G:
   (a) Draw \beta^{(i)} | V^{(i-1)} from (3.11a).
   (b) Draw V^{(i)} | \beta^{(i)} from (3.11b).
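Algorithm 1 can be sketched in a few lines for the special OLS-collapse case X_z = X_y discussed above. This is a minimal illustration with simulated data and hypothetical prior values, not the paper's code; for clarity, the Wishart step is sampled by the standard outer-product construction, which is valid for integer degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data: n series stacked as in (3.10), common regressors X
T, n = 200, 3
Yall = rng.standard_normal((T, n))
X = np.column_stack([Yall[:-1], np.ones(T - 1)])   # lagged values + constant
Y = Yall[1:]
k = X.shape[1]

nu0, S0 = n + 2, np.eye(n)        # Wishart prior for V^{-1}
N0 = 1e-4 * np.eye(n * k)         # near-diffuse normal prior for beta
b0 = np.zeros(n * k)

V, draws = np.cov(Y.T), []
for i in range(300):
    # (a) beta | V from (3.11a): N_XX = V^{-1} (x) X'X in the collapse case
    Vi = np.linalg.inv(V)
    Nxx = np.kron(Vi, X.T @ X)
    Nxy = (X.T @ Y @ Vi).flatten(order="F")
    P = np.linalg.inv(Nxx + N0)
    beta = rng.multivariate_normal(P @ (Nxy + N0 @ b0), P)
    B = beta.reshape((k, n), order="F")
    # (b) V^{-1} | beta from (3.11b): Wishart with scale (E'E + nu0*S0)^{-1}
    E = Y - X @ B
    scale = np.linalg.inv(E.T @ E + nu0 * S0)
    Z = rng.multivariate_normal(np.zeros(n), scale, size=nu0 + len(Y))
    V = np.linalg.inv(Z.T @ Z)    # outer-product Wishart draw
    draws.append(V)
```

With enough draws, the posterior mean of V settles near the residual covariance of the simulated data.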

3.1.3 Missing instruments

While Corollary 1 shows that missing instruments do not affect shock identification, a point-mass of instruments at their sample mean will typically affect the model fit. Missing data on instruments is common in macro time series: High-frequency identification of monetary policy shocks using fed funds futures data (e.g., Kuttner, 2001) is restricted to the period after the introduction of this financial instrument in the early 1990s. Mertens and Ravn (2013) include only 13 non-missing instruments for the U.S. personal income tax rate since 1947.

The Bayesian approach provides a natural way of handling missing data that avoids mistakenly specifying a pure normal distribution for shocks when the instruments have mass points in the data. This approach only requires joint normality of the instruments and structural shocks when the shock is actually observed. For all other periods, I simply impute the missing data under the same distributional assumption. Let \iota_{V,t} denote the index of the rows of the covariance matrix V corresponding to the missing instruments in period t. Let \iota^c_{V,t} denote the complementary index. Finally, let \iota_{Z,t} index the rows of missing instruments in the m_z row vector Z_t, with complement \iota^c_{Z,t}. Joint normality then implies that the missing instruments have the following distribution:

\hat{Z}_{\iota_{Z,t}, t} | Y_t, Z_{\iota^c_{Z,t}, t}, V \sim iid N( \mu_{\iota_{Z,t}, z} + [(Y_t - B Y_{t-1} - \mu_y)', Z_{\iota^c_{Z,t}, t} - \mu_{\iota^c_{Z,t}, z}] \beta(\iota_{Z,t}),  V_{\iota_{V,t}, \iota_{V,t}} - \beta(\iota_{Z,t})' V_{\iota^c_{V,t}, \iota^c_{V,t}} \beta(\iota_{Z,t}) ),   (3.12)

where \beta(\iota_{Z,t}) = (V_{\iota^c_{V,t}, \iota^c_{V,t}})^{-1} V_{\iota^c_{V,t}, \iota_{V,t}} is the population OLS regression coefficient of the missing data on the observed data. The Gibbs sampler easily accommodates this imputation, leading to Algorithm 2.
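The imputation step (3.12) is just the conditional mean and variance of a partitioned normal vector. A sketch for a single period, with hypothetical covariance numbers and one missing instrument:

```python
import numpy as np

rng = np.random.default_rng(3)

# Joint covariance V of the stacked vector (v_t, z_t); hypothetical numbers
V = np.array([[1.0, 0.3, 0.2],
              [0.3, 1.0, 0.4],
              [0.2, 0.4, 1.0]])
mu = np.zeros(3)
obs, miss = np.array([0, 1]), np.array([2])   # last instrument missing
w_obs = np.array([0.5, -0.2])                  # observed forecast error and z_1

# beta(iota) = V_obs,obs^{-1} V_obs,miss, as defined below (3.12)
beta = np.linalg.solve(V[np.ix_(obs, obs)], V[np.ix_(obs, miss)])
cond_mean = mu[miss] + beta.T @ (w_obs - mu[obs])
cond_var = V[np.ix_(miss, miss)] - beta.T @ V[np.ix_(obs, obs)] @ beta
z_hat = rng.multivariate_normal(cond_mean, cond_var)  # one imputation draw
```

Within the Gibbs sampler, this draw replaces the missing entries of Z_t before the next \beta and V updates.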

3.2 Narrative DSGE-VAR

Having established identification and estimation in the purely narrative VAR, I now show that the conditional Cholesky method in the narrative BVAR recovers policy rule coefficients in a class of DSGE models. This class of models is invertible and only has Taylor-type policy rules, two conditions


Algorithm 2 SUR-VAR with missing instruments
1. Initialize V^{(0)} = S_T(\bar{\beta}_T) and \hat{Z}^{(0)}_{\iota_{Z,t}, t} = 0_{dim(\iota_{Z,t}) \times 1} \forall t.
2. Repeat for i = 1, ..., n_G:
   (a) Draw \beta^{(i)} | V^{(i-1)}, {\hat{Z}^{(i-1)}_{\iota_{Z,t}, t}}_{t=1}^T from (3.11a).
   (b) Draw V^{(i)} | \beta^{(i)}, {\hat{Z}^{(i-1)}_{\iota_{Z,t}, t}}_{t=1}^T from (3.11b), replacing Z_{\iota_{Z,t}, t} = \hat{Z}^{(i-1)}_{\iota_{Z,t}, t} \forall t.
   (c) Draw {\hat{Z}^{(i)}_{\iota_{Z,t}, t}}_{t=1}^T | \beta^{(i)}, V^{(i)} from (3.12).

I define first. Then I adapt the idea of dummy variables (e.g., Del Negro and Schorfheide, 2004) to the previous SUR framework for estimating the narrative VAR.

3.2.1 Invertibility

A necessary condition for the VAR and DSGE models to agree on the structural shocks is that both models span the same economic shocks. Fernandez-Villaverde et al. (2007) provide succinct sufficient conditions to guarantee that the economic shocks in the state space system (2.1) match those from the VAR (2.2):

Assumption 2. A* is nonsingular, and the matrix C* - D*(A*)^{-1} B* is stable.

Under this condition, it follows that the forecast errors from the VAR and the DSGE model coincide, as summarized in the following lemma: 12

Lemma 3. (Fernandez-Villaverde et al., 2007, p. 1022) Let Y_t be generated by the DSGE economy (2.1). Under Assumption 2, the variance-covariance matrix of the one-step-ahead prediction error in the Wold representation of Y_t is given by \Sigma* = (A*)(A*)'.

This invertibility condition is not just a formality, especially for models with fiscal policy. For example, Ramey (2011) argues that anticipation of fiscal policy invalidates the identification scheme of Blanchard and Perotti (2002). Formally, Hansen et al. (1991) and Leeper et al. (2013) provide examples of economies that may fail to have a VAR representation in terms of the underlying economic shocks. Whether the VAR spans the fundamental shocks depends both on the economy and on what the econometrician sees. In my DSGE-VAR, the ABCD condition states whether the VAR can recover the economic shocks when given the right instruments.

I illustrate the challenge of fiscal foresight and my approach to it in an extension of the textbook New Keynesian model (Galí, 2009, ch. 3). Ramey (2011) addresses fiscal foresight by backing out

12 Intuitively, x_t can be expressed as a square-summable linear combination in terms of y^t. Hence, Var[x_t | y^t] = 0 and the Wold representation of y_t is given by:

y_t = B* \sum_{j=0}^{\infty} (C* - D*(A*)^{-1} B*)^j D*(A*)^{-1} y_{t-1-j} + A* \epsilon_t.

The one-step-ahead prediction error is, therefore, y_t - E[y_t | y^{t-1}] = A* \epsilon_t with variance (A*)(A*)'.


some news shocks from newspapers in her VAR. However, Ramey (2011) also cautions that her series does not capture all the news shocks. I show analytically how observing expectations consistently solves the foresight problem.

Example 1. Consider the labor-only New Keynesian economy in Appendix A.3, characterized by a linearized Phillips curve, a linearized IS curve, a Taylor rule for interest rates with interest rate smoothing, and a government spending rule that depends on current and past output: g_t = γ y_t + κ y_{t-1} + shocks. These policy rules generate persistence in the otherwise purely forward-looking model. Shocks affect technology, interest rates, and spending. If all shocks are independent surprises, the model's locally stable solution has a VAR(1) representation in output, interest rates, and government consumption. This VAR(1) representation breaks down with government spending news shocks. I show that observing government consumption expectations solves the problem whenever the economy has a unique, stable solution. A VAR(2) in output, interest rates, government consumption, and government consumption expectations is fundamental in this economy, because government news shocks are recovered as

\nu_{g,t} = \frac{E_t[g_{t+1}] - \gamma y_r r_t - \kappa \gamma y_y y_t}{1 - \gamma y_g},

where y_r, y_y are the coefficients of the policy rule for output.

This simple example generalizes. Within the confines of the example, the VAR can handle extra news shocks, such as productivity news, when it includes extra observed expectations, such as expected output. Beyond this simple NK model with its quasi-analytic solution, my DSGE model prior allows testing whether the model is invertible for given parameters. In my application, I assume throughout that Assumption 2 holds, so that a VAR(p) can approximate the DSGE model dynamics arbitrarily well and AA′ ≈ A^*(A^*)'. This assumption is not necessarily satisfied and, in general, depends on the observables Y_t. 13 With as many AR(1) shock processes as observables, I found two intuitive cases in my exploratory analysis that violate Assumption 2 for most of the parameter space. First, a model with capital that excludes investment and capital from the observables; this is similar to Chari et al. (2005), who point to the challenge of recovering impulse responses in VAR models in economies with capital. Second, a model with news shocks and without observed expectations. For the estimated models, however, I show in Appendix C.2 that a VAR(4) approximation captures the underlying DSGE model dynamics well.
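For a given parameter draw, checking Assumption 2 is mechanical. The sketch below is a hypothetical illustration (not the paper's code); it assumes the state-space matrices A*, B*, C*, D* of (2.1) are available as NumPy arrays and computes the spectral radius of C* − D*(A*)^{-1}B*:

```python
import numpy as np

def satisfies_abcd(A, B, C, D, tol=1.0 - 1e-9):
    """Check Assumption 2 (the 'ABCD' condition of Fernandez-Villaverde
    et al., 2007): A* nonsingular and C* - D*(A*)^{-1}B* stable, i.e.,
    all eigenvalues strictly inside the unit circle."""
    if np.linalg.cond(A) > 1e12:        # treat near-singular A* as failing
        return False
    M = C - D @ np.linalg.solve(A, B)   # C* - D*(A*)^{-1}B*
    return bool(np.max(np.abs(np.linalg.eigvals(M))) < tol)
```

In the estimation, parameter draws failing a check of this kind are assigned zero density.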

3.2.2 Identification of policy rules using instruments

For more than one instrument, m_z > 1, the narrative VAR identifies shocks only up to an arbitrary rotation. How should one choose among these infinitely many rotations? Mertens and Ravn (2013) argue that, in their application, the factorization is of little practical importance, since two different Cholesky decompositions yield almost identical results. I argue that for a class of policy rules the lower Cholesky decomposition actually recovers the true impact matrix α[1] in the DSGE model. 14 I call the class of policy rules for which the narrative VAR correctly recovers α[1] "observable Taylor-type rules":

13 I also verify this condition for each draw of the DSGE model parameters in my empirical application.
14 Showing that two different Cholesky factorizations do not affect the results substantially does not generally mean that the results are robust to different identifying assumptions (cf. Watson, 1994, fn. 42). If taxes follow what I define as observable Taylor rules, however, the two Cholesky factorizations coincide, rationalizing the exercise in Mertens and Ravn (2013).


Definition 1. An observable Taylor-type rule in economy (2.1) for variable y_{p,t} is of the form:

y_{p,t} = \sum_{i=m_p+1}^{m} \psi_{p,i}\, y_{i,t} + \lambda_p X_{t-1} + \sigma_p \epsilon_{p,t},

where ǫ_{p,t} ⊂ ǫ_{Y,t} is iid and y_{i,t} ⊂ Y_t, i = 1, . . . , n_p.

The policy rules are defined with respect to the set of observables in the structural model. The canonical Taylor rule for monetary policy based on current inflation and the output gap is a useful clarifying example: only when the output gap is constructed from observables is it a "simple" policy rule according to Definition 1.

Example 2. An interest rate rule with observables only is an observable Taylor rule when output y_t and inflation π_t are observed:

r_t = (1 − ρ_r)(γ_π π_t + γ_y y_t) + ρ_r r_{t−1} + (1 − ρ_r)(−γ_y y_{t−1}) + ω_r ǫ^r_t.

This rule maps into Definition 1 with y_{p,t} = r_t, λ_p = e_r ρ_r + (1 − ρ_r)(−γ_y)e_y, ψ_1 = (1 − ρ_r)γ_π, ψ_2 = (1 − ρ_r)γ_y, σ_p = ω_r.

Example 3. Taylor-type rules that depend on an unobserved output gap (i.e., ỹ_t = y_t − y^f_t with y^f_t ⊄ Y_t) are not observable Taylor rules:

r_t = (1 − ρ_r)(γ_π π_t + γ_y ỹ_t) + ρ_r r_{t−1} + ω_r ǫ^r_t
y^f_t = B^*_{y^f} X_{t−1} + A^*_{y^f} ǫ_t

In general, the entire vector A^*_{y^f} is nonzero, violating the exclusion restrictions for an observable Taylor rule.

The key difference between Examples 2 and 3 is that under the rules in Example 3 monetary policy also reflects other policy shocks contemporaneously, despite controlling for output and inflation. But as long as no more than one policy rule violates Definition 1, the narrative VAR with Cholesky factorization still recovers the right IRFs. Formally, the following proposition shows that in the DSGE model with observable policy rules, S1 has a special structure that allows me to identify it uniquely using Γ, Σ, up to a normalization. Equivalently, when the analogue of Assumption 1 holds in the structural model, the narrative VAR recovers the actual policy rules based on the procedure in (3.9).

Proposition 1. Assume Σ = AA′ = A^*(A^*)' and order the policy variables such that the m_p = m_z or m_p = m_z − 1 observable Taylor rules are ordered first and Γ = [G, 0]A^*. Then α[1] defined in (3.9) satisfies α[1] = A^*[I_{m_z}, 0_{m_z×(m−m_z)}]' up to a normalization of signs on the diagonal if

(a) m_z instruments jointly identify shocks to m_p = m_z observable Taylor rules w.r.t. the economy (2.1), or

(b) m_z instruments jointly identify shocks to m_p = m_z − 1 observable Taylor rules w.r.t. the economy (2.1) and ψ_{p,m_z} = 0, p = 1, . . . , m_p.

Proof: See Appendix A.4.

While the proof proceeds by Gauss-Jordan elimination, the intuition in case (a) can be understood using partitioned regression logic: S1 S1′ is the residual variance of the first m_p forecast errors after accounting for the forecast error variance of the last m − m_p observed variables. Including the nonpolicy variables that enter the Taylor rule directly among the observables controls perfectly for the systematic part of the policy rules and leaves only the variance-covariance matrix induced by policy shocks. Since this variance-covariance matrix is diagonal with observable Taylor rules, the Cholesky decomposition in (3.9) works.

Formally, the Cholesky factorization of S1 S1′ proposed by Mertens and Ravn (2013) imposes the m_z(m_z − 1)/2 zero restrictions needed for exact identification. The structure imposed by having observable Taylor rules rationalizes these restrictions in a class of DSGE models. In fact, the mechanics of the proof would carry through if the block of policy rules had a Cholesky structure, confirming that what is needed for identification via instrumental variables in the model is precisely the existence of m_z(m_z − 1)/2 restrictions. 15 More generally, identification requires restrictions on the contemporaneous interaction between policy instruments that need not have the form of observable Taylor rules.
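The partitioned-regression logic can be made concrete in a few lines (an illustrative sketch under the observable-Taylor-rule assumption; the function name is my own, not the paper's). With the m_p policy variables ordered first, the residual covariance of their forecast errors after projecting on the remaining forecast errors is diagonal, so its Cholesky factor recovers the policy-shock impact block up to sign:

```python
import numpy as np

def policy_impact_cholesky(Sigma, m_p):
    """Given the forecast-error covariance Sigma (m x m) with the m_p
    policy variables ordered first, compute the partitioned-regression
    residual covariance S1 S1' = Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21
    and return its lower Cholesky factor."""
    S11 = Sigma[:m_p, :m_p]
    S12 = Sigma[:m_p, m_p:]
    S22 = Sigma[m_p:, m_p:]
    resid_cov = S11 - S12 @ np.linalg.solve(S22, S12.T)
    return np.linalg.cholesky(resid_cov)
```

In the test case below, the policy block loads on the non-policy forecast error plus independent policy shocks, so the residual covariance is diagonal by construction and the Cholesky factor is simply the diagonal matrix of policy-shock standard deviations.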

3.2.3 Prior elicitation

A natural way to elicit a prior over the parameters of the VAR, tracing back to Theil and Goldberger (1961), is through dummy observations. Del Negro and Schorfheide (2004) use this approach to elicit a prior for the VAR based on a prior over structural parameters of a DSGE model. Here I adapt their approach for use within the SUR framework. With normally distributed disturbances, a dummy variable prior no longer yields a closed-form joint prior as in Lemma 2. Using dummy variables, however, still generates conditional posteriors in closed form and conditional priors with the same intuitive interpretation as in Del Negro and Schorfheide (2004). Because the likelihood function of the SUR model is not conjugate with a closed-form joint prior over (β, V^{-1}) unless it collapses to a standard VAR model, the DSGE model implied prior also fails to generate unconditionally conjugate posteriors. Therefore, I consider independent priors for the dynamics and the covariance matrix. In this approach the prior is available in closed form, but the conditional prior variance of β is necessarily independent of V^{-1}, unlike in the standard DSGE-VAR model. To implement the prior, I follow Del Negro and Schorfheide (2004): I generate the prior for B, V^{-1}

15 In the notation of Appendix A.4, D0 need only be lower triangular to achieve identification, not diagonal as implied in Proposition 1. Indeed, there are extra restrictions coming from the assumption of observable Taylor rules rather than allowing for a block-diagonal structure in the Taylor rules. Rather than imposing these overidentifying restrictions, I back out the implied Taylor-type policy rules and analyze the distribution of implied coefficient loadings.


by integrating out the disturbances to avoid unnecessary sampling error. That is, the prior is centered at:

\bar B^y_0(\theta) = E^{DSGE}[X_0 X_0' \mid \theta]^{-1} E^{DSGE}[X_0 Y_0' \mid \theta] \iff \bar\beta^y_0(\theta) = \mathrm{vec}(\bar B^y_0(\theta)), \qquad (3.13a)

\bar V_0(\theta) = \begin{bmatrix} A(\theta)^*(A(\theta)^*)' & A(\theta)^*[\mathrm{diag}([c_1,\ldots,c_{m_z}]), 0]' \\ [\mathrm{diag}([c_1,\ldots,c_{m_z}]), 0](A(\theta)^*)' & \omega_z^2\,\mathrm{diag}(A_1(\theta)^*(A_1(\theta)^*)') + A_1(\theta)^*(A_1(\theta)^*)' \end{bmatrix}, \qquad (3.13b)

where E^{DSGE}[\cdot \mid \theta] denotes the unconditional expectation based on the linear DSGE model (2.1) when the coefficient matrices are generated by the structural parameters θ. \bar V_0(\theta) satisfies Assumption 1 with G = I_{m_z} and measurement error that is independent across instruments. The variance of the measurement error is parametrized to be \omega_z^2 times the univariate shock variance.

The following dummy observations and likelihood implement the prior that β ∼ N(\bar\beta_0, N_{XX}(\bar V_0)) and V^{-1} ∼ W(\bar V_0 T_0^V, T_0^V):

\mathrm{vec}([Y_0^B, Z_0^B]) = \bar X_{0,SUR}(\theta)\bar\beta_0(\theta) + 0, \qquad (3.14a)
\mathrm{vec}([Y_0^B, Z_0^B]) \sim N(\bar X_{0,SUR}(\theta)\bar\beta_0(\theta),\; \bar V_0(\theta) \otimes I_{T_0^B}), \qquad (3.14b)
[Y_0^V, Z_0^V] = 0 \times \beta + \bar V_0(\theta) \otimes I_{T_0^V}, \qquad (3.14c)
\mathrm{vec}([Y_0^V, Z_0^V]) \sim N(0, V \otimes I_{T_0^V}), \qquad (3.14d)

where \bar X_{0,SUR} is the Cholesky factor of the following matrix:

\bar X_{0,SUR}(\theta)'\bar X_{0,SUR}(\theta) = E^{DSGE}[X_{SUR}'(\bar V(\theta)^{-1} \otimes I_{p(m+m_z)})X_{SUR} \mid \theta].

Appendix C.4 provides additional details on the prior densities. The important distinction to a standard DSGE-VAR approach is that the coefficient prior depends on \bar V_0(\theta) and not on the unknown covariance V. The prior incorporates the Normal likelihoods over the dummy observations and the Jeffreys prior over V^{-1}, along with scale factors chosen to make the prior information equivalent to T_0^B observations about B and T_0^V observations about V^{-1}. Lemma 2 implies that the corresponding conditional priors are given by:

\beta \mid V^{-1}, \theta \sim N\left(\bar\beta_0(\theta),\; \left(T_0^B\, \tilde X_0'(\bar V_0(\theta)^{-1} \otimes I)\tilde X_0\right)^{-1}\right), \qquad \tilde X_0 = \mathrm{diag}([X_0^y, \ldots, X_0^y, X_0^z, \ldots, X_0^z]),
V^{-1} \mid \beta, \theta \sim W_{m+m_z}(SSR_0(\beta,\theta)^{-1},\, T_0^V),
SSR_0(\beta,\theta) = T_0^V\, \bar V_0(\theta) + T_0^B\,([Y_0(\theta), Z_0(\theta)] - X_0 B(\beta))([Y_0(\theta), Z_0(\theta)] - X_0 B(\beta))'.

For fixed θ (e.g., θ fixed at its prior mean \bar\theta_0), inference simply proceeds according to Lemma 2. To estimate θ, I need an extra step. Specifically, if θ has a non-degenerate prior distribution, I can simulate its posterior by sampling θ \mid β, V^{-1}. Allowing for a nondegenerate distribution then also yields estimates of the structural parameters of the DSGE model as a byproduct of the DSGE-VAR


estimation. The prior and data density are given by:

\theta \sim \pi(\theta), \qquad (3.15a)
\pi(B, V^{-1} \mid \theta) \propto |V^{-1}|^{-n_y/2}\, \ell(B, V^{-1} \mid Y_0(\theta), Z_0(\theta)) = |V^{-1}|^{-n_y/2} f(Y_0(\theta), Z_0(\theta) \mid B, V^{-1}), \qquad (3.15b)
\tilde f(Y, Z \mid B, V^{-1}, \theta) = f(Y, Z \mid B, V^{-1}). \qquad (3.15c)

The conditional posterior for B, V^{-1} \mid θ is as characterized before. The conditional posterior for θ can be written as:

\pi(\theta \mid B, V^{-1}, Y, Z) = \frac{f(Y, Z \mid B, V^{-1})\,\pi(B, V^{-1} \mid \theta)\,\pi(\theta)}{\int f(Y, Z \mid B, V^{-1})\,\pi(B, V^{-1} \mid \theta)\,\pi(\theta)\, d\theta} = \frac{\pi(B, V^{-1} \mid \theta)\,\pi(\theta)}{\int \pi(B, V^{-1} \mid \theta)\,\pi(\theta)\, d\theta} \propto \pi(B, V^{-1} \mid \theta)\,\pi(\theta), \qquad (3.16)

as in the example in Geweke (2005, p. 77). Adding a Metropolis-within-Gibbs step to the previous Gibbs sampler (Algorithm 2) allows me to simulate from the DSGE-VAR with a nondegenerate prior; see Algorithm 3.

Algorithm 3 DSGE-VAR with missing data

(1) Initialize VAR parameters: Set B_{(0)}, V_{(0)}^{-1} to OLS estimates.

(2) Initialize structural parameters: \theta_{(0)} = \int_\Theta \theta\, \pi(\theta)\, d\theta.

(3) Initialize missing instruments: Z^{(0)}_{\iota_{Z,t},t} = 0_{\dim(\iota_{Z,t}) \times 1}\ \forall t.

(4) Metropolis-Hastings within Gibbs:

(a) Draw a candidate \theta_c from \theta_c \sim F_\Theta(\cdot \mid \theta_{(d-1)}). Assign zero density to unstable draws or draws violating the Fernandez-Villaverde et al. (2007) condition.

(b) With probability \alpha_{d-1,i}(\theta_c), set \theta_{(d)} = \theta_c; otherwise, set \theta_{(d)} = \theta_{(d-1)}, where

\alpha_{d-1,i}(\theta_c) = \min\left\{1,\; \frac{\pi(B_{(d-1)}, V_{(d-1)}^{-1} \mid \theta_c)\,\pi(\theta_c)}{\pi(B_{(d-1)}, V_{(d-1)}^{-1} \mid \theta_{(d-1)})\,\pi(\theta_{(d-1)})}\, \frac{f_\Theta(\theta_{(d-1)} \mid \theta_c)}{f_\Theta(\theta_c \mid \theta_{(d-1)})}\right\}. \qquad (3.17)

(c) Draw B_{(d)} \mid \theta_{(d)}, V_{(d-1)}^{-1}, \{\hat Z^{(d-1)}_{\iota_{Z,t},t}\}_{t=1}^T according to (3.11a), including the dummy observations.

(d) Draw V_{(d)}^{-1} \mid B_{(d)}, \theta_{(d)}, \{\hat Z^{(d-1)}_{\iota_{Z,t},t}\}_{t=1}^T according to (3.11b), including the dummy observations.

(e) Draw \{\hat Z^{(d)}_{\iota_{Z,t},t}\}_{t=1}^T \mid B_{(d)}, \theta_{(d)}, V_{(d)}^{-1} according to (3.12).

(f) If d < D, increase d by one and go back to (a); otherwise exit.
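Two pieces of Algorithm 3 can be made concrete in a short sketch (hypothetical code, not the paper's implementation): the accept/reject decision of step (b), mirroring (3.17) in logs, and the role of the dummy observations in the conditional draw of B in step (c), simplified here to a plain multivariate regression with a Kronecker-structured posterior rather than the full SUR system:

```python
import numpy as np

def mh_accept(log_kernel_cand, log_kernel_curr,
              log_q_curr_given_cand, log_q_cand_given_curr, rng):
    """Step (b): accept with probability min(1, posterior ratio times
    proposal ratio), computed in logs for numerical stability."""
    log_alpha = min(0.0, (log_kernel_cand - log_kernel_curr)
                    + (log_q_curr_given_cand - log_q_cand_given_curr))
    return np.log(rng.random()) < log_alpha

def gibbs_draw_B(X, Y, X0, Y0, V, rng):
    """Step (c), stylized: draw B | V in a multivariate regression where
    the prior enters by stacking dummy observations (X0, Y0) on top of
    the data. Returns the posterior mean and one draw."""
    Xs, Ys = np.vstack([X0, X]), np.vstack([Y0, Y])
    XtX = Xs.T @ Xs
    B_hat = np.linalg.solve(XtX, Xs.T @ Ys)
    # vec(B) ~ N(vec(B_hat), V kron (Xs'Xs)^{-1}), column-major vec
    cov = np.kron(V, np.linalg.inv(XtX))
    beta = rng.multivariate_normal(B_hat.ravel(order="F"), cov)
    return B_hat, beta.reshape(B_hat.shape, order="F")
```

The dummy observations pull the posterior mean toward the prior exactly as if they were extra data points, which is the intuition behind measuring prior tightness in units of sample sizes T_0^B and T_0^V.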

To implement Algorithm 3, I use a random-blocking Metropolis-Hastings step with random walk


proposal density with t-distributed increments. 16 To calibrate the covariance matrix of the proposal density, I use a first burn-in phase with a diagonal covariance matrix for the proposal density. The covariance matrix observed in this first stage is then used, up to scale, in subsequent stages. I use a second burn-in phase to calibrate the scale to yield an average acceptance rate across parameters and draws of 30%. To initialize the Markov chain, I then use a third burn-in phase whose draws are discarded. The order of the parameters is uniformly randomly permuted, and a new block is started with probability 0.15 after each parameter. This Metropolis-Hastings step is essentially a simplified version of the algorithm proposed by Chib and Ramamurthy (2010). Similar to their application to the Smets and Wouters (2007) model, I otherwise obtain a small effective sample size because of the high autocorrelation of draws when using a plain random-walk Metropolis-Hastings step. 17,18

When comparing identification schemes, and to illustrate the role of the prior over B and V, I also show results with a Minnesota prior for the VAR dynamics B. I specify the prior for B via dummy observations (e.g., Del Negro and Schorfheide, 2011) with a random walk prior for non-stationary variables and a sum-of-coefficients prior for all variables. Crucially, I work with a pure Jeffreys prior for the variance-covariance matrix V. While the effects of the Minnesota prior relative to a flat prior are almost unnoticeable for my baseline results, the Minnesota prior smooths the posterior in model variants estimated on smaller samples or with more coefficients that I show for robustness. 19
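The permute-and-split blocking rule just described can be sketched in a few lines (an illustrative implementation of the 0.15 new-block rule, not the paper's code):

```python
import numpy as np

def random_blocks(n_params, new_block_prob=0.15, rng=None):
    """Randomly permute parameter indices and split them into blocks:
    after each parameter, a new block starts with probability
    new_block_prob, as in the simplified Chib-Ramamurthy step."""
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(n_params)
    blocks, current = [], [order[0]]
    for idx in order[1:]:
        if rng.random() < new_block_prob:
            blocks.append(current)
            current = [idx]
        else:
            current.append(idx)
    blocks.append(current)
    return blocks
```

Each Metropolis-Hastings sweep then updates the parameters block by block, which lowers the autocorrelation of the draws relative to a single joint random-walk update.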

3.3 Marginal likelihood

The joint distribution of the data [Y, Z], the missing instruments \hat Z ≡ \{\hat Z_{\iota_{Z,t}}\}_t, the VAR parameters β, V^{-1}, and the DSGE model parameters θ is given by

p(Y, Z, \hat Z, \beta, V^{-1}, \theta) = p(Y, Z, \hat Z \mid \beta, V^{-1})\, p(\beta, V^{-1} \mid \theta)\, p(\theta),

where equation (A.19) in Appendix A.5 spells out the individual components. Integrating out the missing observations, the VAR parameters, and the DSGE parameters gives the marginal likelihood:

p(Y, Z) = \int\!\int\!\int\!\int p(Y, Z, \hat Z, \beta, V^{-1}, \theta)\, d\beta\, dV^{-1}\, d\hat Z\, d\theta. \qquad (3.18)

Because the prior is a function of T_0^V and T_0^B, the marginal likelihood is, implicitly, indexed by these

16 The t-distribution has 15 degrees of freedom as in Chib and Ramamurthy (2010).
17 See also Herbst and Schorfheide (forthcoming).
18 An alternative approach to increasing the efficiency of the sampler could be metropolization: Liu (1995) shows that, in a discrete setting, metropolization of the Gibbs sampler can lead to efficiency increases in terms of variance reduction. More generally, Robert and Casella (2005, ch. 10.3) discuss metropolization of the Gibbs sampler as a possibility to speed up the exploration of the parameter space. Here, metropolization is potentially interesting when the DSGE model prior becomes tight. Intuitively, a tight DSGE model prior means that movements in β become smaller. If the initial β is in an isolated region of the posterior, the resulting θ draws also tend to be drawn from the same region, and crossing over into other regions of the parameter space can be slow. This is not a problem in Del Negro and Schorfheide (2004) because they can draw directly from the marginal distribution of θ.
19 I use λ = 0.2 for the random walk prior and λ = 1 for the sum-of-coefficients prior and the decay parameter of the random walk prior.


prior hyperparameters. I now discuss how to interpret the marginal likelihood as a function of the prior DSGE model weights. Next, I summarize the computation of the marginal likelihood.

3.3.1 Interpretation

Del Negro and Schorfheide (2004) introduce the prior DSGE model weight with the highest marginal likelihood as a diagnostic of misspecification: if the DSGE prior generates observations with properties like the data, these observations are informative and improve the model fit by reducing the weight put on the wrong parameters. Del Negro et al. (2007) show in an AR(1) case with known variance that when DSGE model and sample moments differ, there can be an interior optimum for the prior weight, trading off shrinkage against bias. In other cases, they find that the best-fitting prior weight can diverge, so that a high weight on the pure DSGE model or the (almost) flat prior VAR can emerge as optimal. Thus, the analysis of the prior weight that maximizes the marginal likelihood is meaningful and has a clear interpretation.

My model differs from Del Negro et al. (2007) in two dimensions: first, because of the extra information via instruments, my prior is not conjugate; second, I allow the weights on the covariance matrix and on the dynamics to differ. I therefore characterize the behavior of the marginal likelihood in terms of the prior weight for my model in Appendix A.6.1. For the case of known model dynamics, I characterize the marginal likelihood analytically and prove a lemma characterizing the slope of the marginal likelihood in the scalar case. 20 The appendix also provides a numerical example for the empirically relevant matrix case. 21 In summary, I show in Appendix A.6.1 that when the DSGE model prior fits the sample moments well, the marginal likelihood is strictly increasing in the number of dummy observations. In particular, the analytical results imply that the marginal likelihood is strictly increasing in T_0^V when the prior variance fits well enough. Conversely, when the DSGE model fits the sample variance (sufficiently) poorly, the marginal likelihood is strictly decreasing in the prior weight T_0^V.
A good fit of the model-implied covariance matrix therefore shows up in a higher optimal T_0^V, which may even diverge to +∞.

3.3.2 Computation

I combine the methods of Chib (1995) and Geweke (1999) to compute the marginal likelihood (3.18). Specifically, I first compute the marginal likelihood conditional on a specific θ_d – that is, the three inner integrals in (3.18) – using the method of Chib (1995) for models with fully conditional posteriors. This conditional marginal likelihood, when combined with the prior, gives the kernel of the θ posterior that I use in the Geweke (1999) algorithm. While I find that relatively few draws – that is, 2,000 draws after 1,000 burn-in draws – are accurate to ±0.1 log points given θ_d, the repeated approximation takes

20 See Lemma 5 in Appendix A.6.1. In the analytic characterization, as in Del Negro et al. (2007), I consider the case of a single DSGE model parameter that maps directly into the VAR variance parameters.
21 My analysis of separate prior weights for the covariance matrix and the model dynamics is related to Park (2011), who also parametrizes the prior over model dynamics and the static model part separately. In contrast to my analysis, however, he does not include external information in the VAR that would identify shocks.


time. Since the posterior draws {θd }d are autocorrelated, I subsample every jth draw to yield a more efficient sample of 1,000 posterior draws that economizes on computing time. 22
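The Geweke (1999) step can be illustrated with a modified-harmonic-mean sketch (hypothetical code, not the paper's implementation; it assumes the truncated weight-density values w(θ_d) and the posterior-kernel values have already been evaluated at the posterior draws):

```python
import numpy as np

def log_marginal_likelihood_mhm(log_kernel, log_weight):
    """Modified harmonic mean in the spirit of Geweke (1999): 1/p(Y) is
    estimated by the posterior average of w(theta_d)/kernel(theta_d),
    computed in logs via log-sum-exp for numerical stability."""
    x = np.asarray(log_weight) - np.asarray(log_kernel)
    m = x.max()
    return -(m + np.log(np.mean(np.exp(x - m))))
```

If the kernel equals the weight density times a constant c, the estimator returns log c exactly, which is the identity the method exploits.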

4 Empirical specification

4.1 Data and sample period

I use seven variables in the estimation: government spending, the average labor tax rate, and the effective Federal Funds Rate (FFR) are the fiscal and monetary policy instruments. Real GDP, real investment (including consumer durables), the debt-to-GDP ratio, and GDP inflation are additional standard macroeconomic variables. GDP and its components are in per capita terms. Appendix C describes the details of the data construction, which follows Fernandez-Villaverde et al. (2015) for the tax time series. All fiscal variables aggregate the federal government with state and local governments.

To ensure my sample period covers periods of significant variation in the fiscal variables, I start the estimation in 1947:Q1. This period includes Korean War expenditures as well as episodes of declining and rising debt-to-GDP ratios. This matters because Bohn (1991) has argued that long samples can be important to capture slow-moving debt dynamics. I stop the estimation in 2007:Q4, before the zero lower bound became binding. The resulting sample period is similar to that in Christiano et al. (2011a). Following Francis and Ramey (2009) and Ramey (2011), I allow for a quadratic trend. 23

I use newly digitized Greenbook data as a continuously available proxy for defense spending and to compute an updated series on monetary policy shocks. In addition, I use narrative tax shocks from Mertens and Ravn (2013). Specifically, I follow Ramey (2011) and use one-quarter-ahead defense spending forecast errors as a proxy for exogenous government purchases. Ramey (2011) computes the forecast errors from the Survey of Professional Forecasters, which stopped surveying defense spending in 1982. In contrast, Greenbooks report defense spending forecasts continuously since 1969. 24 I also update the Romer and Romer (2004) series of monetary policy shocks beyond 1996. Appendix C.1 provides details. 25

In an extension, I also use the newly digitized Greenbook data to control for fiscal foresight. Specifically, I use Greenbook forecasts of federal government purchases and revenue four quarters

22 See Appendix A.7 for details. For the best-fitting model I also compute different chains and double the chain length and find that the differences between the three chains are within 6 natural log-points. An alternative to computing the marginal likelihood would be to specify an explicit prior over τ = (T_0^B, T_0^V) and estimate τ, as Adjemian et al. (2008) do for a standard DSGE-VAR. The advantage of my approach is that I can report model fit independent of the prior over the hyperparameters τ.
23 I detrend prior to estimation to match the detrended data with my stationary DSGE model.
24 Restricted to the same sample period as the SPF forecast errors, the posterior uncertainty is larger with the Greenbook forecasts. This is intuitive because model averaging typically improves forecasts. In the full sample, the extra data availability serves to reduce the uncertainty.
25 In unreported robustness checks, I also consider alternative instruments for monetary policy shocks. First, I consider shocks identified by Kuttner (2001) and updated by Gürkaynak et al. (2005) based on federal funds futures changes around Federal Open Market Committee announcements. Second, I apply the same identification strategy to 13-week T-Bill futures that started trading more than a decade earlier, combining the announcement dates from Romer and Romer (2004) with those in Gürkaynak et al. (2005). The source of the daily T-Bill data is Thomson Reuters' "CME-90 DAY US T-BILL CONTINUOUS" series from June 1976 to September 2003.


out as observables in the VAR to control for the possibility of news shocks. Theoretically, I show in Example 1 that including such forecasts makes an otherwise non-fundamental VAR fundamental in a version of the work-horse New Keynesian model. Empirically, the announcement and implementation dates in Yang (2007) suggest that U.S. tax reforms are typically implemented within a year. Thus, expectations of policy four quarters ahead should capture fiscal foresight.
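The quadratic detrending applied before estimation (see footnote 23) is a simple polynomial regression on time; a minimal sketch (my own illustration, not the paper's code):

```python
import numpy as np

def quadratic_detrend(y):
    """Remove a quadratic time trend: regress y on [1, t, t^2] by least
    squares and return the residuals."""
    t = np.arange(len(y), dtype=float)
    X = np.column_stack([np.ones_like(t), t, t ** 2])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return y - X @ coef
```

Applied to a series that is exactly quadratic in time, the residuals are numerically zero, which is the sense in which the stationary DSGE model is matched to detrended data.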

4.2 DSGE model specification

In this section, I outline the empirical specification of the generic DSGE model in (2.1). The model is based on the standard medium-scale New Keynesian model as exemplified by Christiano et al. (2005) and follows closely Smets and Wouters (2007). There is monopolistic competition in intermediate goods markets and the labor market, with Calvo frictions in price and wage adjustment, partial price and wage indexation, and real frictions such as investment adjustment costs and habit formation. I add labor, capital, and consumption taxes as in Drautzburg and Uhlig (2015) and fiscal rules as in Leeper et al. (2010) and Fernandez-Villaverde et al. (2015). Here, I only discuss the specification of fiscal and monetary policy. The remaining model equations are detailed in Appendix A.8.

The monetary authority sets interest rates according to the following standard Taylor rule:

\hat r_t = \rho_r \hat r_{t-1} + (1 - \rho_r)\left(\psi_{r,\pi} \hat\pi_t + \psi_{r,y} \tilde y_t + \psi_{r,\Delta y} \Delta\tilde y_t\right) + \xi_t^r, \qquad (4.1)

where ρ_r controls the degree of interest rate smoothing and ψ_{r,x} denotes the reaction of the interest rate to deviations of variable x from its trend. ỹ denotes the output gap (i.e., the deviation of output from output in a frictionless world). ξ_t^r follows an AR(1) process. 26

The fiscal rules allow for both stabilization of output and the debt burden as well as smoothing of the different fiscal instruments: 27

\hat g_t = \rho_g \hat g_{t-1} + (1 - \rho_g)\left(-\psi_{g,y} \hat y_t - \psi_{g,b} \frac{\bar b}{\gamma \bar y} \hat b_t\right) + \xi_t^g, \qquad (4.2a)
\hat s_t = \rho_s \hat s_{t-1} + (1 - \rho_s)\left(-\psi_{s,y} \hat y_t - \psi_{s,b} \frac{\bar b}{\gamma \bar y} \hat b_t\right) + \xi_t^s, \qquad (4.2b)
\frac{\bar w \bar n}{\bar y} d\tau^n_t = \rho_\tau \frac{\bar w \bar n}{\bar y} d\tau^n_{t-1} + (1 - \rho_\tau)\left(\psi_{\tau^n,y} \hat y_t + \psi_{\tau^n,b} \frac{\bar b}{\gamma \bar y} \hat b_t\right) + \xi_t^{\tau,n}. \qquad (4.2c)

The disturbances ξ_t^◦ follow exogenous AR(1) processes: ξ_t^◦ = ρ_◦ ξ_{t-1}^◦ + ǫ_t^◦. 28 The sign of the coefficients on the expenditure components g_t, s_t is flipped so that positive estimates always imply consolidation in good times (ψ_{◦,y} > 0) or when debt is high (ψ_{◦,b} > 0).
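Read literally, each rule is one linear updating equation per period. The following sketch uses illustrative parameter values (not the paper's estimates; `b_over_gy` stands in for the steady-state ratio b̄/(γȳ)) for the Taylor rule (4.1) and the spending rule (4.2a) with the sign convention above:

```python
def taylor_rule(r_lag, pi_hat, y_gap, dy_gap, xi_r,
                rho_r=0.8, psi_pi=1.5, psi_y=0.12, psi_dy=0.12):
    """One period of the interest-rate rule (4.1)."""
    return (rho_r * r_lag
            + (1.0 - rho_r) * (psi_pi * pi_hat + psi_y * y_gap
                               + psi_dy * dy_gap)
            + xi_r)

def spending_rule(g_lag, y_hat, b_hat, xi_g,
                  rho_g=0.9, psi_gy=0.1, psi_gb=0.05, b_over_gy=1.4):
    """One period of the spending rule (4.2a): positive psi_gy and
    psi_gb imply consolidation in booms and at high debt."""
    return (rho_g * g_lag
            - (1.0 - rho_g) * (psi_gy * y_hat
                               + psi_gb * b_over_gy * b_hat)
            + xi_g)
```

With the flipped sign, a positive output deviation lowers spending, matching the consolidation interpretation of positive coefficient estimates.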

26 Money supply is assumed to adjust to implement the interest rate, and fiscal transfers are adjusted to accommodate monetary policy.
27 Leeper et al. (2010) assume there is no lag in the right-hand-side variables, while Fernandez-Villaverde et al. (2015) use a one-quarter lag.
28 Not only the fiscal policy shocks but all shocks in my specification follow univariate AR(1) processes, unlike in Smets and Wouters (2007), who allow some shocks to follow ARMA(1,1) processes. Ruling out MA(1) components helps to guarantee that a VAR can approximate the DSGE model dynamics, as discussed by Fernandez-Villaverde et al. (2007).


The consolidated government budget constraint is: 29

\frac{\bar b}{\bar r \bar y}(\hat b_t - \hat r_t) + \frac{\bar w \bar n}{\bar y}\left(d\tau^n_t + \bar\tau^n(\hat w_t + \hat n_t)\right) + \bar\tau^c \frac{\bar c}{\bar y}\hat c_t + \frac{(\bar r^k - \delta)\bar k_p}{\bar y}\left(d\tau^k_t + \bar\tau^k \frac{\bar r^k}{\bar r^k - \delta}(\hat r^k_t + \hat k_{t-1})\right) = \hat g_t + \hat s_t + \frac{\bar b}{\gamma \bar y}(\hat b_{t-1} - \hat\pi_t). \qquad (4.3)

Following Christiano et al. (2011a), I include a cost channel of monetary policy: firms have to borrow at the nominal interest rate to pay the wage bill at the beginning of the period. This allows monetary policy to cause inflation in the short run. 30

The debate about variable selection in singular DSGE models is still ongoing; see Guerron-Quintana (2010), Canova et al. (2013), and the comment by Iskrev (2014). When fitting the model to the data, I consider as many structural shocks as observables in the observation equation of the DSGE model (2.1a) and the VAR (2.2a). Including the policy variables naturally suggests including the corresponding policy shocks. Additionally, I include a total factor productivity (TFP) shock and an investment-specific technology shock to explain GDP and investment, a price markup shock to contribute to inflation, and a shock to lump-sum transfers to add variation to the debt-to-GDP ratio.

In the robustness check that allows for fiscal foresight, I model fiscal policy as the sum of a contemporaneous and an anticipated component. The contemporaneous component is given by (4.2) as before. Analogous rules determine the anticipated component two quarters out in terms of current observables. In addition, I allow for a productivity news shock, also revealed two quarters in advance.

Backing out the DSGE model implied historical shocks generally requires Kalman smoothing. Here, I exploit that, under the invertibility Assumption 2, Lemma 3 implies that the data eventually fully reveal the hidden state variables of the DSGE model. Consequently, the contemporaneous uncertainty matrix for the state is zero. I initialize the Kalman filter with this matrix so that the Kalman smoother coincides with the Kalman filter. 31 I use Dynare (Adjemian et al., 2011) to solve the DSGE model for a given set of parameters.

To limit the dimensionality of the estimation problem, I calibrate a number of structural parameters and focus on the estimation of policy rules and shock processes (Table 1). These parameters largely correspond to the prior mean in Smets and Wouters (2007). Average tax rates are calibrated as in Drautzburg and Uhlig (2015). Priors for policy rules and shock processes are standard: I follow Smets and Wouters (2007) for common parameters and choose similar priors for new parameters of the fiscal rules. To parametrize the observation equations (3.2) of the instruments, I assume that both the matrix of loadings G and the covariance matrix of measurement errors are diagonal. My prior for the loadings c_i is that they are centered around unity, given the appropriate scaling of the narrative variables. Both the loadings and the relative standard errors of the measurement error have inverse gamma priors, so that the instruments are relevant for all parameter draws. The prior for the relative standard deviation is a

29 Seigniorage revenue for the government enters negatively in the lump-sum transfer to households ŝ_t.
30 This feature increases the marginal likelihood by about three log points in the benchmark specification.
31 Since convergence to the fixed point is fast, the results are virtually unchanged when I initialize the filter with the stationary variance instead and use the Koopman (1993) Kalman smoother algorithm.


Table 1: Calibrated parameters

Parameter                                  Value
Elasticity of substitution (inverse) σ     1.500
Discount rate (quarterly)                  0.5%
Capital share α                            0.300
Depreciation rate δ                        0.020
Net TFP growth (quarterly)                 0.4%
Steady state gross wage markup             1.150
Kimball parameters                         10.000
Steady state government spending           0.200
Steady state consumption tax rate          0.073
Steady state capital tax rate              0.293
Steady state labor tax rate                0.165

relatively tight prior with a mean of 0.5 and a standard deviation of 0.1. This prior is intended to make the instruments informative enough to influence other parameter estimates: overall, the prior mean implies a signal-to-noise ratio of two in terms of standard deviations. Table C.1 in the Appendix lists all estimated parameters alongside their prior distributions. 32

4.3 DSGE-VAR model specification

I use a VAR with p = 4 lags throughout this paper. The baseline model has seven variables and all three narrative instruments listed earlier. In the model with news shocks, I add expectations of government spending and overall output four quarters ahead as observables, along with shocks to future spending and productivity. 33 The specification of the lags follows Ramey (2011), but I also find that this finite-lag approximation to the underlying DSGE model does well empirically: Figure C.5 for the baseline model and Figure C.7 in the Appendix show that the VAR dynamics match those of the underlying estimated DSGE model well. When I estimate the model with the DSGE model prior, I verify that the invertibility condition in Fernandez-Villaverde et al. (2007) holds for every draw.

In the reported results, I do not control for exogenous variation in the instruments by including regressors X_t^z other than a constant. For robustness, however, I check whether anticipated shocks or commodity prices drive the government spending or monetary policy instruments. This check is simple in the SUR framework: I add extra regressors that can soak up variation in the instruments. The regressors are the Fisher and Peters (2010) defense spending excess returns and the log-change in the producer price commodity index. I find no noticeable change in the results.

To calibrate the Gibbs sampler, I discard the first 50,000 draws as a burn-in period and keep every 20th draw thereafter until accumulating 5,000 draws. With a low prior weight on the DSGE model, this generates negligible autocorrelations of model summary statistics. The sampler is less

32 Increasing the prior standard deviation increases the marginal likelihood, but otherwise leads to qualitatively similar results. In earlier stages, I tested the assumption that the matrix G is diagonal under full information and found evidence in favor of the current parsimonious parametrization.
33 I switch the anticipated tax shock rule and anticipated tax shocks off and give up on identifying current tax shocks. This is both due to the small number of tax shock proxies and because I show in Figure C.7(c) that the approximation quality in the model with news is poor for tax shocks.


efficient with a stronger DSGE model prior but performs well with the above sample size for moderate weights on the DSGE prior. Appendix C.4 also presents evidence on the convergence of the parameter estimates based on the Brooks and Gelman (1998) diagnostic and compares IRFs for chains started at different seeds and with longer lengths.
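The burn-in and thinning scheme just described, together with a Brooks and Gelman (1998)-style convergence check, can be sketched as follows. This is a minimal illustration, not the estimation code; the stand-in sampler `draw_once` and both function names are hypothetical:

```python
import numpy as np

def run_chain(draw_once, n_burn=50_000, thin=20, n_keep=5_000, seed=0):
    """Discard n_burn draws, then store every thin-th draw until n_keep are kept."""
    rng = np.random.default_rng(seed)
    state = None
    for _ in range(n_burn):
        state = draw_once(state, rng)
    kept = []
    while len(kept) < n_keep:
        for _ in range(thin):
            state = draw_once(state, rng)
        kept.append(state)
    return np.asarray(kept)

def psrf(chains):
    """Potential scale reduction factor for a scalar summary statistic.

    chains: (m, n) array of m parallel chains with n stored draws each.
    Values close to one indicate convergence."""
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)   # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # within-chain variance
    var_hat = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_hat / W)
```

For well-mixed chains targeting the same posterior, `psrf` returns a value near one; values far above one flag non-convergence of the summary statistic in question.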

5 Results

I now discuss the fit of the narrative DSGE-VAR along its dynamic and its identifying dimensions, measured by the tightness of the priors over the coefficients B and the covariance matrix V. Then I analyze the implied impulse-response functions, historical shocks, and policy functions in the DSGE-VAR. Last, I analyze the estimated DSGE model parameters.

5.1 Marginal likelihood

Figure 1(a) shows the marginal likelihood for the baseline model as a function of the strength of the prior for DSGE model dynamics, T0B, for different values of T0V. The marginal likelihood initially increases in T0B, then flattens, and eventually falls for all values of T0V. There is thus clear evidence for an interior peak in terms of the model dynamics. However, the data clearly prefer the weakest weight for the prior on the DSGE model identification. The best-fitting model features T0B = 4T and T0V = T/5, i.e., a weight of four sample sizes for the model coefficients B and of one fifth of a sample for the covariance matrix V.34 Thus, a pure DSGE model is misspecified, but less so for model dynamics than for shock identification.

The qualitative finding that DSGE model dynamics agree more with the data than the shock identification is robust to allowing for fiscal foresight: I consider a variant of the model with spending and technology news shocks and observed expectations on output and government spending. Figure 1(b) shows that the results mirror Figure 1(a). Qualitatively, the results are unchanged: The best-fitting model puts an intermediate weight on the DSGE-model prior for model dynamics, but at each point the data prefer a weaker prior on the DSGE-model covariance. The fact that both the likelihood and the relative weight on model dynamics are higher than in the baseline case likely reflects the shorter estimation sample in this extended model.

The result that the data prefer the weakest prior for the DSGE-model covariance also holds true for other model variations. In Figure C.8 in the Appendix, I first drop the cost channel of monetary policy and, second, allow the monetary authority to respond also to the output gap measured relative to a flex-price economy. For both model variants I find that the marginal likelihood is strictly decreasing in the weight on the DSGE-model covariance.
Together, these results suggest that the empirical findings here extend to a larger set of models beyond the specific model put forward. To better understand the model fit along the two dimensions of shock identification and dynamics,

34 For the best-fitting model, I have re-computed the likelihood using different seeds, doubling the number of simulations for the two additional seeds. The three estimates lie within 2 to 6 (natural) log-points of each other. The plots show results averaged across chains.


[Figure 1: two panels plotting the marginal data density against T0B/T (from about 0.2 to ∞) for several values of T0V (T, T/2, T/4, T/5). Panel (a): Baseline model. Panel (b): News shocks and observed expectations.]

The marginal narrative DSGE-VAR likelihood peaks at an interior overall weight on the DSGE model restrictions: For the baseline model in panel (a), T0B = 4T and T0V = T/5 (i.e., adding four full samples worth of observations on DSGE model dynamics but only one fifth of a sample worth on shocks and their covariance) yields the highest data density. Panel (b) shows the analogous plot for a variant of the model with news shocks and observed expectations. The qualitative finding is the same: The model fit is highest with a significant weight on DSGE model dynamics and a small weight on the covariance structure. The estimation sample is 20 years shorter in panel (b), explaining the higher data density and potentially why a larger relative sample size is preferred for the model dynamics. Overall, this is evidence against the DSGE model implied identification via its contemporaneous covariance structure.

Figure 1: Marginal likelihood for varying DSGE model weights: Baseline model and model with news shocks.

I proceed by presenting the IRFs that shape the model dynamics in the baseline model. Subsequently, I turn to the policy shocks directly to gain intuition on the identification. Policy rule estimates and, more generally, DSGE model parameter estimates provide an additional way to understand the results summarized by the likelihood. For presenting the historical shocks and the estimated IRFs, I focus on the best-fitting DSGE-VAR model with a weak prior over V and an informative prior over B, i.e., the model with T0V = T/5 and T0B = 4T.

5.2 IRFs

5.2.1 Benchmark DSGE-VAR estimates

Figure 2 shows responses of private output to the three policy shocks identified in the DSGE-VAR along with the policy instruments themselves: Shown in black and shades of gray are the pointwise posterior median and 68% and 90% credible sets. I find it useful to report results for private output, i.e., overall GDP minus government consumption and investment.35

Overall, the results confirm intuition: Spending increases are expansionary; tax and interest rate increases are contractionary. However, the responses differ in their size, timing, and estimated precision. Start with the government spending shock. A 1% shock leads to an additional build-up in government spending that declines but persists for more than five years. This spending increase causes

35 I compute private output assuming a constant share of government spending in real GDP of 20%: ŷ_t − 0.2ĝ_t.


private sector output to rise persistently, but with a one year delay. Consequently, the impact multiplier is centered around 1, but the credible set rises to a range between 1.25 and 2.0 after five years, as Figure 3 shows.36 The DSGE-VAR estimate is consistent with other papers that estimate multipliers using variation in defense spending: Ramey (2011, p. 31) estimates a 5-year cumulative multiplier of 1.2, and Amir-Ahmadi and Drautzburg (2017, Fig. 4.13) identify an interval of 1.0 to 3.0 consistent with macro sign restrictions and industry-level heterogeneity restrictions at the 5-year horizon.

[Figure 2 panels, top row: "G to unit shock in G" (%), "tax to unit shock in tax" (p.p.), "Real rate to unit shock in FFR" (p.p.); bottom row: "Private Output to G", "Private Output to tax", "Private Output to FFR" (all in %); each plotted over 0–20 quarters with the posterior median, 68%, and 90% posterior credible sets.]

There is a buildup in government spending in response to a government spending shock, causing a significant and lasting increase in output. A tax shock raises tax rates persistently but implies a smooth decline to zero. With a one quarter lag, output drops persistently. An FFR shock increases the real rate and causes a significant output drop after two quarters, with no significant response on impact.

Figure 2: Policy shocks and output responses in the best-fitting DSGE-VAR (T0B = 4T and T0V = T/5).

Tax shocks have a half-life of slightly less than five years and decline smoothly. The estimate of their effect on private output is noisy, but I find that an increase in tax rates by one percentage point leads to a decrease in private sector activity that starts with a one year delay and lasts for about one year – significant with 68% probability. The traditional output multiplier, however, is insignificant: overall output may even rise with 68% probability because of off-setting effects on government spending. The multiplier on private output, in contrast, is weakly positive with 68% probability starting at the six quarter horizon (Figure 3).37

36 I compute PDV multipliers as the ratio of the discounted sum of GDP changes to the discounted sum of government spending, using a discount factor of 0.99 per quarter. I use a share of G in GDP of 20% and of labor taxes in GDP of 10%. For the numerator, I only consider first round effects: For shocks to G, I only sum the increase in G on the expenditure side. For shocks to taxes, I only consider (minus) the increase in tax rates, times the average size of the tax base.
37 The noise surrounding these estimates reflects the sparsity of data on tax shock proxies.
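The PDV multiplier calculation in footnote 36 can be sketched in a few lines. This is illustrative only; the function name and the IRF inputs are hypothetical, and only the G-shock case with a 20% spending share is shown:

```python
import numpy as np

def pdv_multiplier_g(dy, dg, beta=0.99, g_share=0.20):
    """PDV government spending multiplier at each horizon.

    dy: IRF of log GDP in percent; dg: IRF of log G in percent.
    Numerator: discounted cumulative GDP changes (in percent of GDP).
    Denominator: first-round discounted spending increase, g_share * dg."""
    disc = beta ** np.arange(len(dy))
    num = np.cumsum(disc * dy)             # discounted GDP changes
    den = np.cumsum(disc * g_share * dg)   # discounted first-round spending
    return num / den

# A one-period 1% G impulse that raises GDP by 0.2% of GDP in both
# periods yields an impact multiplier of exactly one.
m = pdv_multiplier_g(np.array([0.2, 0.2]), np.array([1.0, 0.0]))
```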


[Figure 3 panels: "PDV Multiplier to G" for overall and private output, and "PDV Multiplier to tax" for overall and private output; each plotted over 0–20 quarters with the posterior median, 68%, and 90% posterior credible sets.]

The increase in output following a government spending shock is initially driven by government spending itself, generating an impact multiplier around 1. The present discounted value multiplier rises over time, with a 68% credible set for the five year multiplier between 1.25 and 2.0. The multiplier for output net of government spending is, mechanically, shifted down by 1. In contrast, for a tax shock, the response of government spending muddies the analysis and the overall multiplier estimate is ambiguous. However, the 68% credible set for the multiplier effect on private output is weakly positive starting about seven quarters after the initial shock.

Figure 3: Present-discounted value multipliers for overall and private output for fiscal shocks in the best-fitting DSGE-VAR (T0B = 4T and T0V = T/5).

Shocks to monetary policy, in contrast to fiscal shocks, have a half-life of only about one year. The rise in the nominal interest rate causes the real rate to rise and leads, with a delay of half a year, to a hump-shaped drop in private output that peaks about 2.5 years after the shock. The effects revert to zero after five years.

These results reflect the prior, the identification scheme, the information set, and the sample period. I now discuss each of these elements and argue that the qualitative results are a robust feature of the data, but that we still need a certain amount of economic reasoning to reach quantitative conclusions.

5.2.2 Robustness

The role of the prior. Here I briefly discuss what the data alone tell us about the responses and

how modest amounts of prior information help to sharpen inference at different horizons. As a concrete example, Figure 4 shows the PDV multiplier for GDP in response to a government spending shock for four different priors. Two lessons emerge: First, the estimates based on a flat prior convey the same qualitative information. The impact multiplier is centered around one and rises significantly above one at the five year horizon. Second, shrinking the VAR coefficients alone by introducing a Minnesota prior on the coefficient matrix B already rules out the most extreme multiplier estimates and shifts the multiplier estimates slightly towards one. The weakest DSGE-model prior I consider cuts the credible set for the impact response by one third relative to both the flat prior VAR and the Minnesota prior for B. Tightening the DSGE model prior for the model dynamics shifts the multiplier estimates down towards zero and narrows the credible sets around longer-horizon estimates while the uncertainty on impact remains largely unchanged. This is intuitive, because the uncertainty on impact mostly reflects uncertainty about shock identification, since model dynamics have little effect on the covariance between shock proxies and forecast errors.

[Figure 4: four panels of "Mult to G shock" over 0–20 quarters, under (i) a flat prior for B and V, (ii) a Minnesota prior for B with a flat prior for V, (iii) a weak DSGE prior (T0B = T0V = T/5), and (iv) the best-fitting model (T0V = T/5, T0B = 4T); posterior median, 68%, and 90% posterior credible sets.]

A weak DSGE model prior tightens the posterior bands substantially, as illustrated here with the present discounted multiplier of a government spending shock. Tightening the prior on DSGE model dynamics, T0B , leaves the uncertainty about the impact response largely unchanged, but narrows the credible set at longer horizons.

Figure 4: Effects of the prior precision on the posterior uncertainty: PDV government spending multipliers.

More generally, the best-fitting DSGE-model prior increases the persistence of government spending and implies an impact multiplier close to unity and above 1.0 at the 5-year horizon with 90% posterior probability, both in the baseline model and the model with news shocks and observed expectations; see Figure C.15 in the Appendix.

The role of the identification scheme. Based on Proposition 1, I identify multiple shocks using

the conditional (lower) Cholesky factorization. If the true model features observable Taylor-type rules for fiscal policy, but not necessarily for monetary policy, this is the correct procedure. But this procedure embodies economic theory beyond the exclusion restriction typical of IV estimation. The identified set in the sense of Moon et al. (2011), based on the instruments alone, is too large to allow inference. To identify policy shocks separately, one thus has to take a stand on what tells them apart.

Figure 5 compares the response of private output in the baseline identification scheme with two alternatives. These other approaches to (point-)identify the three policy shocks differ in how they factor the variance attributable to the policy shocks. Panel (a) shows an upper triangular factorization of the three shocks. I find little change between the two triangular methods, similar to Mertens and Ravn (2013). This is consistent with monetary policy responding only to the variables actually included in the VAR, as opposed to others only spanned by the VAR.38 Panel (b) shows the results when recursively maximizing the conditional FEVD to recover shocks.39 These results for the response of private output to the shocks are similar to the baseline, but more uncertain. Specifically, for the government spending shock little changes. For the tax shock, the credible set widens, but the results

38 If monetary policy responded directly to an output gap measure or inflation expectations, this assumption would be violated.
39 I choose to maximize the cumulative FEVD from 4 to 19 quarters after impact. I exclude the 0 to 3 quarter horizon to avoid imposing zero impact restrictions.


remain significant at the 68% level. Similarly, for the monetary policy shock the credible set widens, but responses remain significant at the 68% level. The wider posterior bands are natural in the FEVD approach because it also depends on the uncertain VAR dynamics. Overall, the qualitative findings appear robust to different identifying assumptions, but identification based on economic theory via observable Taylor-type rules allows the sharpest inference.

[Figure 5: responses of private output to the G, tax, and FFR shocks over 0–20 quarters. Panel (a) compares the conditional lower and upper Cholesky factorizations; panel (b) compares the conditional Cholesky and conditional FEVD identifications.]

This figure shows two analogues to the output responses in the bottom panel of Figure 2 with a flat VAR prior and two different identification schemes. The output responses are qualitatively the same with a flat prior as with the DSGE model prior, but more uncertain. The qualitative results for the fiscal policy shocks are also largely unchanged when the shocks are decomposed by maximizing the cumulative forecast error variance decomposition (FEVD) due to the fiscal shocks at the one to five year horizon, except that the contraction of output in response to a monetary policy shock becomes less significant.

Figure 5: Robustness of output effects across identification schemes with a Minnesota prior on B and flat prior on V .
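The lower and upper conditional Cholesky factorizations compared in panel (a) of Figure 5 differ only in the ordering used to factor the policy-shock block. A minimal numerical sketch, using a made-up 3×3 covariance in place of the estimated policy-shock block:

```python
import numpy as np

# Hypothetical covariance of the three policy-shock components (G, tax, FFR).
S = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.0, 0.2],
              [0.1, 0.2, 1.0]])

# Lower Cholesky factor: G ordered first, FFR last.
A_low = np.linalg.cholesky(S)

# Upper-triangular factor: reverse the ordering, factor, and reverse back.
P = np.eye(3)[::-1]                        # order-reversing permutation
A_up = P @ np.linalg.cholesky(P @ S @ P) @ P

# Both factorizations reproduce the same covariance matrix.
assert np.allclose(A_low @ A_low.T, S)
assert np.allclose(A_up @ A_up.T, S)
```

The two factors imply different contemporaneous orderings of the shocks but an identical reduced-form covariance, which is why the alternative schemes fit the data equally well and can only be distinguished on economic grounds.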

The role of extra instruments. I extend the short-term government spending instrument from

Ramey (2011), the defense-spending forecast errors, by digitizing Greenbook forecast errors. I also update the Romer and Romer (2004) instruments using more recent Greenbook releases. Using this extra variation helps to significantly reduce estimation uncertainty for the identified government spending shocks. To highlight the variation coming from the data alone, I compare the estimates in the flat prior VAR. Comparing columns (a) and (b) in Figure 6 shows that the extra instruments reduce the 90% credible range of 2.0 percentage points after five years by about one quarter and reduce the uncertainty

about the impact response of private output and the multiplier by roughly half. The improvements for monetary policy shocks are small, but render the price puzzle insignificant; see Figure C.11(a) and (b).40

[Figure 6: responses of G, private output, and the PDV overall multiplier to a government spending shock over 0–20 quarters, across four columns: (a) IV from the literature, (b) updated IV, (c) post-1966 sample, and (d) post-1966 sample with observed expectations; posterior median, 68%, and 90% posterior credible sets.]

Figure 6: Effects of the information set and sample period: response of output to government spending.

The role of fiscal foresight. Last, I assess whether the impulse responses are driven by fiscal

foresight or the sample period. Figure 6(c) restricts the estimation sample to the period after 1966, when I have data on expectations. Column (d) also uses expectations of output and government spending four quarters out in the estimation. Both the estimation sample and the information set matter somewhat for the results. However, the high persistence of government spending and the impact multipliers around 1.0 are a robust feature of the data. The differences concern mostly the three to five year horizon and the estimation uncertainty: With the more recent estimation sample, the response of government spending declines towards zero after five years and is estimated more precisely. Private output still increases in a hump-shaped manner, but its response is noisier and the effects subside after three to four years. The distribution of the PDV multiplier is tighter and now includes 1.0 in the 90% credible set at the five year horizon.

Including expectations data has only small effects on the government spending response, but shrinks the response of private output further towards zero. This provides only mild support for crowding in of private activity for a few quarters, starting one year after the shock, when output remains weakly positive with 68% posterior probability. Consequently, the PDV multiplier distribution shifts more towards unity and now includes 1.0 in the 68% credible

40 I omit tax shocks from the comparison (and the underlying estimation) because I could not update the underlying series and the post-1966 sample has too few observations on the instruments to yield informative estimates.


set at the five year horizon. The results for the monetary policy shock change little across samples and when controlling for expectations; see Figure C.16, which addresses fiscal foresight, and Figure C.12, which controls for inflation expectations, bond spreads, and oil prices.

5.2.3 DSGE-VAR vs DSGE model

Having established that the key findings are driven by the data and do not hinge on specific sample periods or information sets, I now turn to comparing the estimated DSGE-VAR with the corresponding DSGE model. The goal is to understand the shortcomings of the DSGE model that lead the data to reject the identifying restrictions implied by the DSGE model and to prefer only a modest weight on the model dynamics. Impulse-response functions reflect both identification and model dynamics, and this section discusses the shortcomings of the DSGE model relative to the DSGE-VAR impulse responses.41 In particular, the pure DSGE model roughly matches the responses of most variables to fiscal shocks, including significant responses of monetary policy to these fiscal shocks. As the DSGE-VAR produces responses to monetary policy shocks reminiscent of the price puzzle, the pure DSGE model fails to match those responses. It also cannot replicate the response of government spending to the other identified shocks.

[Figure 7: responses of government spending, the tax rate, the FFR, inflation, private output, and investment to a government spending shock over 0–20 quarters, comparing the DSGE-VAR and the pure DSGE model.]

The pure DSGE model and the best-fitting DSGE-VAR largely agree on the response to a government spending shock, except for the size of the build-up in government spending. Importantly, the pure DSGE model replicates the drop in the funds rate and the inflation rate following a government spending shock.

Figure 7: Full set of responses to the spending policy shock with the best-fitting model (T0B = 4T, T0V = T/5).

41 I choose to present results based on the state-space representation of the DSGE model. None of the results depend on this, because the DSGE-VAR and the pure DSGE model produce virtually identical responses. See Figure C.5 in the Appendix.


For the government spending shock in Figure 7,42 the first-order mismatch is that the DSGE model overstates the government spending buildup and thus the size of the wealth effect. Otherwise, the DSGE model largely succeeds in matching the DSGE-VAR responses after taking parameter uncertainty into account. For example, the DSGE model matches the zero impact response of private output that turns positive with a delay. The pure DSGE model also matches the implied spending multipliers; see Figure C.10. The second striking fact about the identified responses is that both inflation and the FFR drop in both the DSGE-VAR and the pure DSGE model. This is possible even though monetary policy in the model reacts only to output and inflation.43

Figure 8 shows the responses to a tax rate shock. The responses reveal that the DSGE model matches the dynamics of the tax rate and most responses to it, but cannot replicate the estimated initial increase in government spending following a tax hike. Otherwise, and aided by the large uncertainty surrounding the tax shock estimates, the pure DSGE model matches the qualitative features of the DSGE-VAR responses. The pure DSGE model also matches the pattern of the implied private output multiplier, which is initially close to zero and becomes significantly positive at longer horizons; see Figure C.10. The estimates also indicate, albeit noisily, monetary accommodation of the contractionary tax shock. In the pure DSGE model this response is driven by the underlying reaction to output and inflation.

[Figure 8: responses of government spending, the tax rate, the FFR, inflation, private output, and investment to a tax rate shock over 0–20 quarters, comparing the DSGE-VAR and the pure DSGE model.]

The estimates of the responses to taxes are noisy. Given the uncertainty, the pure DSGE model and the best-fitting DSGE-VAR largely agree on the response to a tax shock, except for the flat response of government spending in the DSGE model.

Figure 8: Full set of responses to the tax rate shock with the best-fitting prior (T0B = 4T, T0V = T/5).

42 The responses of debt to all shocks are shown in Figure C.9 in the Appendix, immediately before the responses in the best-fitting DSGE-VAR.
43 This pattern may be driven by the large wealth effect driving down marginal costs in the DSGE model.


The responses to the monetary policy shock in Figure 9 are the most precisely estimated responses and show the largest discrepancies between the DSGE-VAR and the pure DSGE model. The latter struggles to explain the initial responses of output, investment, and inflation – reminiscent of the price puzzle. This is despite including a cost channel of monetary policy as in Christiano et al. (2011a).44 The DSGE model also fails to match the sizable fiscal contraction at the one-year horizon that the DSGE-VAR implies.

[Figure 9: responses of government spending, the tax rate, the FFR, inflation, private output, and investment to an FFR shock over 0–20 quarters, comparing the DSGE-VAR and the pure DSGE model.]

The estimates of the responses to monetary policy shocks are the most precise. They show that, despite including a cost channel of monetary policy, the pure DSGE model cannot fit the initial responses of private output and inflation, or the delayed decline in government spending.

Figure 9: Full set of responses to the FFR shock with the best-fitting prior (T0B = 4T, T0V = T/5).

5.3 Historical shocks

In addition to impulse responses, historical shocks are an important implication of DSGE models. I now analyze whether the pure DSGE model agrees with the narrative DSGE-VAR. The DSGE-VAR, in turn, agrees with the instruments: I find that the three shock proxies have median correlations between 0.48 and 0.62 with the corresponding identified shocks, with 90% credible sets of 0.37 and higher. Thus, the shock proxies are of reasonably good quality and identify the policy shocks well; see the middle column in Table 2. The structural shocks implied by the DSGE-VAR and their pure DSGE model counterparts line up well overall: The median correlation is 0.76 for spending shocks, 0.51 for tax shocks, and 0.77 for

44 This finding may hint at another dimension of misspecification of monetary policy shock measures. For example, Caldara and Herbst (2015) argue that monetary policy also reacts to credit spread shocks. However, I show in Figure C.16 that the result is robust to fiscal foresight and a shorter sample period, and in Figure C.12 that controlling for inflation expectations or BAA bond spreads and oil prices does not change the result.


Table 2: Historical shocks in best-fitting DSGE-VAR: Correlation with DSGE model and non-zero instruments.

        Structural correlations    "First stage" correlations
        DSGE-VAR vs DSGE           DSGE-VAR vs IV       DSGE vs IV
Shock   Median (90% band)          Median (90% band)    Median (90% band)
G       0.76 (0.58, 0.85)          0.48 (0.40, 0.54)    0.53 (0.51, 0.54)
Tax     0.51 (0.23, 0.76)          0.54 (0.37, 0.68)    0.28 (0.12, 0.40)
FFR     0.77 (0.66, 0.85)          0.62 (0.59, 0.65)    0.52 (0.48, 0.55)

The table shows the posterior median correlations (and posterior credible sets) between the identified DSGE-VAR shocks and the corresponding structural shocks from the DSGE model, as well as the correlation of the structural shocks with the non-missing instrumental variables (IV). Underlying are draws from the joint posterior of the structural parameters θ and the corresponding VAR parameters.
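The correlation summaries in Table 2 can be sketched as follows. This is a schematic with simulated inputs; the function name and arguments are hypothetical, and in the paper the correlations come from draws of the joint posterior:

```python
import numpy as np

def posterior_corr_summary(shock_draws, proxy):
    """Median and 90% band of corr(identified shock, proxy) across draws.

    shock_draws: (n_draws, T) identified shock series, one row per draw.
    proxy: length-T narrative instrument, NaN on missing dates."""
    ok = ~np.isnan(proxy)                  # use non-missing instrument dates only
    corrs = [np.corrcoef(d[ok], proxy[ok])[0, 1] for d in shock_draws]
    return np.percentile(corrs, [5, 50, 95])
```

Applying the same summary to paired draws of DSGE-VAR and pure-DSGE shocks gives the structural correlations in the left column of the table.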

monetary policy shocks. This correlation is robust across sub-samples: Figure 10 shows the time series for the spending shock from the DSGE-VAR in black and from the pure DSGE model in red, along with point-wise credible sets. While the pure DSGE model overstates the size of the positive shock sequence associated with the U.S. involvement in the Korean war starting in the second half of 1950, the two shock series track each other closely.45,46

[Figure 10: government spending shocks (in standard deviations) from the DSGE-VAR and the pure DSGE model. Subsample panels: 1949Q2–1969Q4 (G shock correlation = 0.85), 1970Q1–1989Q4 (correlation = 0.80), 1990Q1–2007Q4 (correlation = 0.86), and the full sample (correlation = 0.76, 90% band (0.58, 0.85)).]

The solid lines represent the DSGE-VAR shocks; the dashed lines the corresponding DSGE model shocks. The plot of the entire sample shows that the shocks are reasonably close to iid. Their correlation is high and robust across subsamples.

Figure 10: Historical government spending shocks.

45 Figures C.13 and C.14 in the Appendix show these subsamples for the tax and monetary policy shocks.
46 Comparing shocks is in the spirit of the work by Rudebusch (1998). Sims (1998) disputes that comparing different historical monetary policy shocks across different VARs in Rudebusch (1998) is meaningful. Their dispute involves VARs with differing information sets, whereas I compare models with identical information sets except for the instruments. Since the instruments do not add information in the DSGE model, forecasts and forecast errors from these models should be close if the VAR approximation is accurate, as I assume in Proposition 1 and verify in Figure C.6. Comparing the subset of identified shocks that I extract from these forecast errors is therefore a meaningful exercise.


For moderate amounts of prior information, the shock correlation between the DSGE-VAR and the pure DSGE model is largely flat: Table 3(a) shows that for prior weights ranging from a weak prior to the best-fitting prior, with a weight on DSGE model dynamics of up to four sample sizes, the shock correlations change little. With a dogmatic prior on DSGE model dynamics and a weak prior on the covariance structure, the median correlation rises to 0.94 for monetary policy shocks and 0.87 for spending shocks, but remains low for tax shocks. This, together with the low correlation of structural shocks from the pure DSGE model with the narrative tax instruments in Table 2, indicates that the DSGE model has trouble fitting the historical path of tax shocks in the U.S. While the correlations increase only slowly with the weight on DSGE model dynamics, Panel 3(b) shows that the correlations with the pure DSGE model increase noticeably when moving from T0V = T/5 to a higher weight on the DSGE model identification with T0V = T. However, I showed above that more weight on the DSGE model identification lowers the model fit.

Table 3: Correlations of structural shocks: DSGE-VAR vs pure DSGE model.

(a) Weak prior over V: T0V = T/5

        T0B = T/5            T0B = T              T0B = 4T             T0B = ∞
Shock   Median (90% band)    Median (90% band)    Median (90% band)    Median (90% band)
G       0.67 (0.51, 0.77)    0.73 (0.59, 0.82)    0.76 (0.58, 0.85)    0.87 (0.61, 0.96)
Tax     0.54 (0.37, 0.67)    0.51 (0.28, 0.69)    0.51 (0.23, 0.76)    0.76 (0.41, 0.91)
FFR     0.78 (0.69, 0.85)    0.75 (0.63, 0.84)    0.77 (0.66, 0.85)    0.94 (0.88, 0.98)

(b) Stronger prior over V: T0V = T

        T0B = T/5            T0B = T              T0B = 4T             T0B = ∞
Shock   Median (90% band)    Median (90% band)    Median (90% band)    Median (90% band)
G       0.70 (0.61, 0.78)    0.76 (0.66, 0.82)    0.81 (0.72, 0.87)    0.92 (0.79, 0.98)
Tax     0.57 (0.45, 0.67)    0.61 (0.41, 0.74)    0.63 (0.23, 0.77)    0.91 (0.78, 0.96)
FFR     0.84 (0.78, 0.89)    0.79 (0.71, 0.86)    0.78 (0.66, 0.86)    0.98 (0.96, 0.99)

The table shows the posterior median correlations (and posterior credible sets) between the identified DSGE-VAR shocks and the corresponding structural shocks from the DSGE model. The estimates are based on draws from the joint posterior of the structural parameters θ and the corresponding VAR parameters.
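The correlations in Table 3 are computed draw by draw: for each posterior draw, the identified DSGE-VAR shock series is correlated with the corresponding DSGE model shock series, and the median and 90% band are then taken across draws. A minimal sketch of this summary step, using simulated stand-ins for the two sets of shock series (all names and numbers are illustrative, not the paper's actual posterior output):

```python
import numpy as np

def correlation_bands(shocks_var, shocks_dsge, bands=(5, 50, 95)):
    """Cross-draw distribution of per-draw shock correlations.

    shocks_var, shocks_dsge: arrays of shape (n_draws, T) holding, for each
    posterior draw, the historical shock series implied by each model.
    Returns the requested percentiles of the correlations across draws.
    """
    n_draws = shocks_var.shape[0]
    corrs = np.empty(n_draws)
    for i in range(n_draws):
        corrs[i] = np.corrcoef(shocks_var[i], shocks_dsge[i])[0, 1]
    return np.percentile(corrs, bands)

# Illustrative data: two noisy measurements of the same latent shock series.
rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 160))            # n_draws x T latent shocks
s_var = latent + 0.5 * rng.standard_normal(latent.shape)
s_dsge = latent + 0.5 * rng.standard_normal(latent.shape)
lo, med, hi = correlation_bands(s_var, s_dsge)
print(round(med, 2))  # near the attenuated value 1 / (1 + 0.25) = 0.8
```

The attenuation logic in the comment follows from the classical measurement-error formula: two series that share a common component with independent noise of variance 0.25 correlate at 1/1.25.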

5.4 Policy rules

Partial identification of policy shocks implies identification of the underlying policy rules. Arias et al. (2015) exploit this to identify shocks with sign restrictions, and it underlies the link between the narrative identification and the DSGE model in Proposition 1. It is therefore instructive to back out the underlying policy rules to understand the mechanics of the DSGE-VAR model. It turns out that the raw data are uninformative about policy rules: The lower panel in Table 4 shows that the policy rule coefficients have wide posterior credible bands that often include zero for the best-fitting model. For example, the response of monetary policy to inflation and output is insignificant. This is where a stronger DSGE model prior over V could help: It pulls the imprecise estimates of the policy rule coefficients toward those implied by the DSGE model prior and renders

Table 4: Policy rule estimates. Entries are posterior medians with 90% credible sets in parentheses; rows are policy rules, columns are the variables they respond to; "--" marks the own instrument.

(a) Best-fitting model: T0B = 4T, T0V = T/5

Rule | G                    | tax                | FFR                | Y                   | Debt                | Inv                 | Inf
G    | --                   | 0.00 (-0.00, 0.00) | 0.00 (-0.00, 0.00) | -1.81 (-5.92, 0.27) | -1.41 (-4.47, 0.24) | 0.03 (-0.30, 0.47)  | -0.15 (-2.84, 2.44)
tax  | 0.01 (-0.24, 0.16)   | --                 | 0.00 (-0.00, 0.00) | -0.30 (-1.05, 0.80) | -0.33 (-0.94, 0.54) | 0.00 (-0.15, 0.16)  | -0.48 (-1.50, 0.63)
FFR  | -0.02 (-0.05, 0.00)  | 0.04 (-0.03, 0.12) | --                 | 0.05 (-0.07, 0.19)  | 0.03 (-0.08, 0.14)  | -0.01 (-0.03, 0.01) | 0.07 (-0.08, 0.22)

(b) Stronger prior: T0B = 4T, T0V = T

Rule | G                    | tax                | FFR                 | Y                   | Debt                | Inv                 | Inf
G    | --                   | -0.00 (-0.00, 0.00)| -0.00 (-0.00, 0.00) | -1.17 (-3.04, 0.02) | -0.90 (-2.26, 0.01) | 0.03 (-0.16, 0.27)  | 0.04 (-1.26, 1.28)
tax  | 0.01 (-0.11, 0.11)   | --                 | 0.00 (-0.00, 0.00)  | -0.08 (-0.52, 0.44) | -0.10 (-0.44, 0.28) | 0.00 (-0.08, 0.09)  | -0.11 (-0.63, 0.38)
FFR  | -0.02 (-0.04, 0.01)  | 0.04 (-0.01, 0.08) | --                  | 0.07 (-0.02, 0.16)  | 0.04 (-0.03, 0.11)  | -0.01 (-0.02, 0.01) | 0.13 (0.03, 0.23)

(c) Dogmatic prior: T0B = T0V = ∞

Rule | G                    | tax                | FFR                 | Y                    | Debt                | Inv                  | Inf
G    | --                   | -0.00 (-0.00, 0.00)| 0.00 (-0.00, 0.00)  | -0.00 (-0.00, -0.00) | -0.00 (-0.00, 0.00) | 0.00 (0.00, 0.00)    | -0.00 (-0.00, 0.00)
tax  | -0.00 (-0.00, -0.00) | --                 | -0.00 (-0.00, 0.00) | 0.02 (0.01, 0.03)    | 0.00 (-0.00, 0.00)  | -0.00 (-0.00, -0.00) | -0.00 (-0.01, 0.01)
FFR  | -0.00 (-0.00, -0.00) | 0.00 (0.00, 0.00)  | --                  | 0.01 (0.01, 0.02)    | -0.00 (-0.00, 0.00) | 0.00 (0.00, 0.00)    | 0.23 (0.20, 0.27)

Shown are the posterior median and 90% credible set of the policy rule coefficients implied by the partially identified DSGE-VAR. The coefficients on policy interactions have a triangular pattern by construction. With a weak prior, the posterior over the coefficients is very dispersed. The best-fitting model implies monetary tightening in response to higher inflation and some indications of accommodating fiscal policy. The estimates for fiscal policy rules are very dispersed and are bounded away from zero only with a very strong prior.

the inflation and output responses positive. Figure 11 illustrates this clearly for the monetary policy rule: With the DSGE model prior, the policy rule estimates imply that the Fed increases interest rates in response to higher inflation or output growth. With a dogmatic prior on the DSGE model, we recover that monetary policy only responds to inflation and output. However, the data clearly prefer the model with the diffuse policy rule estimates.

[Figure 11 here. Panels: (a) Best-fitting model: T0B = 4T, T0V = T/5; (b) Stronger prior: T0B = 4T, T0V = T; (c) Dogmatic prior: T0B = T0V = ∞. Each panel plots the monetary policy rule coefficients on G, tax, FFR, Y, Debt, Inv, and Inf.]

Shown are the posterior median and 68% and 90% credible sets of the monetary policy rule coefficients implied by the partially identified VAR with DSGE model prior. With the best-fitting DSGE-VAR prior, the model is inconclusive about monetary tightening in response to higher inflation but finds evidence of accommodating fiscal policy. Only priors putting more weight on the DSGE-model identification find a significant reaction to output and inflation.

Figure 11: Monetary policy rule estimates

For both fiscal policy shocks, the estimated DSGE-VAR impulse responses showed that monetary policy accommodates: Lower interest rates follow expansionary G shocks and higher interest rates follow the contractionary tax shocks. The pure DSGE model could match this because fiscal shocks affect inflation via marginal costs, indirectly through wealth effects and the pass-through of higher taxes. My policy rule estimate, however, indicates that there is also a direct element of accommodation in

the monetary policy rule. Are these contemporaneous reactions of monetary policy to fiscal policy plausible? Romer and Romer (2014) provide qualitative evidence that the Federal Reserve has indeed considered fiscal policy in its monetary policy decisions. They document staff presentations to the FOMC suggesting monetary accommodation of the 1964 and 1972 tax cuts (p. 38f) as well as monetary easing in response to the 1990 budget agreement. The tax-inflation nexus is also reflected in staff presentations and in comments during at least one FOMC meeting: According to Romer and Romer (2014), the Federal Reserve Bank staff saw social security tax increases in 1966, 1973, and 1981 as exerting inflationary pressure (p. 40). While Romer and Romer (2014) do not reach a clear-cut conclusion about whether monetary policy accommodated fiscal policy, the deliberations they document are consistent with my finding that the systematic component of monetary policy reacts to fiscal policy.
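The mechanics of backing out policy rules, as in Table 4, can be illustrated with a small sketch. Once an impact matrix A with u_t = A ε_t is identified, the structural equation A⁻¹u_t = ε_t can be read row by row; normalizing the row of a policy instrument on its own forecast error and moving the other variables to the right-hand side yields that instrument's contemporaneous reaction coefficients. The numbers below are hypothetical; only the triangular layout mirrors the construction in the text:

```python
import numpy as np

def implied_policy_rule(A, row, names):
    """Read off one structural equation from an identified impact matrix.

    A maps structural shocks to forecast errors, u_t = A eps_t, so row `row`
    of inv(A), normalized on the own instrument, gives the contemporaneous
    reaction coefficients of that policy instrument to the other variables.
    """
    A_inv = np.linalg.inv(A)
    own = A_inv[row, row]
    coeffs = -A_inv[row] / own      # move other variables to the right-hand side
    coeffs[row] = 0.0               # own instrument sits on the left-hand side
    return dict(zip(names, coeffs))

# Illustrative 3-variable system: G, tax, FFR (hypothetical lower-triangular A).
A = np.array([[1.0, 0.0, 0.0],
              [0.2, 1.0, 0.0],
              [0.1, 0.3, 1.0]])
rule = implied_policy_rule(A, row=2, names=["G", "tax", "FFR"])
print(rule)  # FFR reacts to G with 0.04 and to tax with 0.30
```

The printed coefficients can be checked by hand: u_FFR = 0.1 ε_G + 0.3 ε_tax + ε_FFR, and substituting ε_tax = u_tax − 0.2 u_G gives u_FFR = 0.04 u_G + 0.30 u_tax + ε_FFR.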

5.5 Parameter estimates

As a byproduct, the estimation of the DSGE-VAR also yields estimates of the structural DSGE model parameters. For brevity, I present the full estimates only in Table C.1 in the Appendix. One feature that emerges is that the estimated DSGE model implies low persistence for most structural shocks but a high degree of smoothing in the policy rules. This helps to explain the inability of the pure DSGE model to match the responses of government spending to the other policy shocks. With a stronger prior on V, the parameter estimates also feature higher real and nominal frictions in the form of adjustment and fixed costs and Calvo price stickiness. 47 With few exceptions, the posterior differs significantly from the prior, and the policy rule estimates are economically meaningful. The exception is the wage indexation parameter, whose posterior is close to its prior. This is plausible because I do not use wages in the estimation.

[Figure 12 here. Panels: (a) T0V = T/5; (b) T0V = T. Each panel plots the signal-to-noise ratio of the G, tax, and FFR instruments against the prior weight T0B/T (grid: .2, .5, 1, 2, 3, 4, 5, 7.5, ∞).]

The plot shows the signal-to-noise ratio as a function of the prior weight given to the DSGE model. The ratio is defined as the standard deviation of the instrument attributable to the structural shock divided by the standard deviation of the measurement noise. The prior signal-to-noise ratio in the DSGE model is 2. The plot implies that the data are informative about the signal-to-noise ratio, which initially falls quickly and then stabilizes around 0.4 for the two fiscal instruments and falls toward 0.75 for the monetary policy shock.

Figure 12: Signal-to-noise ratio of instruments with varying DSGE model weight

47 Figure C.19 in the Appendix uses the Brooks and Gelman (1998) diagnostic to show that the posterior simulator for θ has converged reasonably well.


I now turn to the parameters that are specific to my narrative DSGE-VAR: the parameters of the observation equation for the narrative instruments in the DSGE model. Figure 12 shows the signal-to-noise ratio estimated for each of the three narrative shocks. For the best-fitting model with T0B = 4T in panel (a), the posterior median signal-to-noise ratio is about 0.5 for all three instruments. With the stronger prior on V, the distribution of the estimated signal-to-noise ratios shifts down somewhat in panel (b) compared to panel (a). Intuitively, because the data prefer the looser prior on the DSGE model identification, the model selects lower signal-to-noise ratios when forced to tighten the prior. The differences concern, however, mostly the center of the distribution, while the credible sets overlap.
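The signal-to-noise ratio in Figure 12 is the standard deviation of the instrument attributable to the structural shock over the standard deviation of the measurement noise. A quick simulation check under a hypothetical observation equation z_t = φ ε_t + σ_ν η_t (φ, σ_ν, and all variable names here are illustrative stand-ins, not the paper's estimates):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100_000
phi, sigma_nu = 0.5, 1.0             # hypothetical loading and noise scale

eps = rng.standard_normal(T)          # structural shock (observed here for the check)
eta = rng.standard_normal(T)          # measurement noise
z = phi * eps + sigma_nu * eta        # narrative instrument

# Signal std = std of the fitted value from projecting z on eps;
# noise std = std of the projection residual.
slope, intercept = np.polyfit(eps, z, 1)
fitted = slope * eps + intercept
ratio = np.std(fitted) / np.std(z - fitted)
print(round(ratio, 2))  # close to phi / sigma_nu = 0.5
```

With standard-normal shock and noise, the population ratio is |φ|/σ_ν, which the sample projection recovers up to estimation error of order 1/√T.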

6 Conclusion

A key question for academics and practitioners using quantitative DSGE models is whether these models agree with methods that “get by with weak identification” (Sims, 2005, p. 2). This paper formally assesses the potential misspecification of DSGE models with respect to the identification of shocks through external instruments, extending earlier work (Del Negro et al., 2007) that does not assess the DSGE model identification. The paper makes two methodological contributions: First, it shows that the VAR identification based on narrative instruments correctly identifies policy rules in DSGE models when these are Taylor rules with limited direct interaction between policy instruments. Second, it shows how to estimate a Bayesian proxy VAR using the standard SUR framework and how to incorporate a DSGE model prior. As an additional contribution, this paper provides data on fiscal shock proxies and expectations. In particular, I use Greenbook data to extend the short-term spending instrument in Ramey (2011) to provide an extended series of defense spending surprises that I use to identify government spending shocks. This dataset also allows me to control for expectations of future spending, revenue, and output. In terms of the substance, I find that a standard medium-scale DSGE model such as Christiano et al. (2005) and Smets and Wouters (2007) augmented with fiscal Taylor rules improves the overall statistical fit of the DSGE-VAR model as measured by the marginal data density. However, by varying separately the prior weight on the DSGE model dynamics and covariance, I show that this fit comes from matching the dynamics, rather than the covariance structure in the data. This finding carries over from the baseline model with contemporaneous shocks to an extended model with news shocks and observed expectations, as well as to two variants of the baseline model with a different monetary policy rule or no cost channel of monetary policy. 
Looking at impulse responses shows that the best-fitting DSGE-VAR and the corresponding pure DSGE model largely agree on the dynamics following fiscal shocks, but the pure DSGE model cannot capture the interaction between policy tools. The model has trouble matching the dynamics following a monetary policy shock. My estimates of policy rules show that systematic monetary policy broadly moves with fiscal policy, reinforcing spending or tax increases.


References

Adjemian, Stephane, Houtan Bastani, Michel Juillard, Ferhat Mihoubi, George Perendia, Marco Ratto, and Sebastien Villemot, "Dynare: Reference Manual, Version 4," Dynare Working Papers 1, CEPREMAP 2011.
Adjemian, Stephane, Matthieu Darracq Paries, and Frank Smets, "A quantitative perspective on optimal monetary policy cooperation between the US and the euro area," Working Paper Series 0884, European Central Bank March 2008.
Amir-Ahmadi, Pooyan and Thorsten Drautzburg, "Identification Through Heterogeneity," Working Papers 17-11, Federal Reserve Bank of Philadelphia May 2017.
Arias, Jonas E., Dario Caldara, and Juan Rubio-Ramirez, "The Systematic Component of Monetary Policy in SVARs: An Agnostic Identification Procedure," 2015. Unpublished, Duke University.
Blanchard, Olivier and Roberto Perotti, "An Empirical Characterization Of The Dynamic Effects Of Changes In Government Spending And Taxes On Output," The Quarterly Journal of Economics, November 2002, 117 (4), 1329–1368.
Bohn, Henning, "Budget balance through revenue or spending adjustments?: Some historical evidence for the United States," Journal of Monetary Economics, 1991, 27 (3), 333–359.
Brooks, Stephen P. and Andrew Gelman, "General methods for monitoring convergence of iterative simulations," Journal of Computational and Graphical Statistics, 1998, pp. 434–455.
Caldara, Dario and Edward Herbst, "Bayesian Proxy SVAR: Theory and Application to Monetary Policy," 2015. Unpublished, Federal Reserve Board of Governors.
Canova, Fabio, Filippo Ferroni, and Christian Matthes, "Choosing the variables to estimate singular DSGE models," CEPR Discussion Papers 9381 March 2013.
Canova, Fabio, Filippo Ferroni, and Christian Matthes, "Approximating Time Varying Structural Models With Time Invariant Structures," 2015. Unpublished, Federal Reserve Bank of Richmond.
Chari, V. V., Patrick J. Kehoe, and Ellen R. McGrattan, "A critique of structural VARs using real business cycle theory," Technical Report 2005.
Chib, Siddhartha, "Marginal likelihood from the Gibbs output," Journal of the American Statistical Association, 1995, pp. 1313–1321.
Chib, Siddhartha and Srikanth Ramamurthy, "Tailored randomized block MCMC methods with application to DSGE models," Journal of Econometrics, 2010, 155 (1), 19–38.
Christiano, Lawrence J., Martin Eichenbaum, and Charles L. Evans, "Nominal Rigidities and the Dynamic Effects of a Shock to Monetary Policy," Journal of Political Economy, February 2005, 113 (1), 1–45.
Christiano, Lawrence J., Mathias Trabandt, and Karl Walentin, "DSGE Models for Monetary Policy Analysis," in Benjamin M. Friedman and Michael Woodford, eds., Handbook of Monetary Economics, Vol. 3 of Handbook of Monetary Economics, Elsevier, 2011, chapter 7, pp. 285–367.
Christiano, Lawrence, Martin Eichenbaum, and Sergio Rebelo, "When Is the Government Spending Multiplier Large?," Journal of Political Economy, 2011, 119 (1), 78–121.

Clarida, Richard, Jordi Galí, and Mark Gertler, "Monetary Policy Rules And Macroeconomic Stability: Evidence And Some Theory," The Quarterly Journal of Economics, February 2000, 115 (1), 147–180.
Del Negro, Marco and Frank Schorfheide, "Priors from General Equilibrium Models for Vars," International Economic Review, 2004, 45 (2), 643–673.
Del Negro, Marco and Frank Schorfheide, Bayesian Macroeconometrics, Oxford Handbooks in Economics, Oxford University Press, 2011.
Del Negro, Marco, Frank Schorfheide, Frank Smets, and Rafael Wouters, "On the Fit of New Keynesian Models," Journal of Business & Economic Statistics, April 2007, 25, 123–143.
Drautzburg, Thorsten and Harald Uhlig, "Fiscal Stimulus and Distortionary Taxation," Review of Economic Dynamics, October 2015, 18 (4), 894–920.
Drautzburg, Thorsten, Jesús Fernández-Villaverde, and Pablo Guerrón-Quintana, "Political Distribution Risk and Aggregate Fluctuations," Working Paper 23647, National Bureau of Economic Research August 2017.
Faust, Jon, "The new macro models: washing our hands and watching for icebergs," Sveriges Riksbank Economic Review, 2009, pp. 45–68.
Fernandez-Villaverde, Jesus, Juan F. Rubio-Ramirez, Thomas J. Sargent, and Mark W. Watson, "ABCs (and Ds) of Understanding VARs," American Economic Review, June 2007, 97 (3), 1021–1026.
Fernandez-Villaverde, Jesus, Pablo A. Guerron-Quintana, Keith Kuester, and Juan Rubio-Ramirez, "Fiscal Volatility Shocks and Economic Activity," Technical Report 11 November 2015.
Fisher, Jonas D.M. and Ryan Peters, "Using Stock Returns to Identify Government Spending Shocks," Economic Journal, May 2010, 120 (544), 414–436.
Francis, Neville and Valerie A. Ramey, "Measures of per Capita Hours and Their Implications for the Technology-Hours Debate," Journal of Money, Credit and Banking, September 2009, 41 (6), 1071–1097.
Galí, Jordi, Monetary Policy, Inflation, and the Business Cycle: An Introduction to the New Keynesian Framework, Princeton University Press, 2009.
Geweke, John, "Using simulation methods for bayesian econometric models: inference, development, and communication," Econometric Reviews, 1999, 18 (1), 1–73.
Geweke, John, Contemporary Bayesian Econometrics and Statistics, Wiley Series in Probability and Statistics, Wiley, 2005.
Guerron-Quintana, Pablo A., "What you match does matter: the effects of data on DSGE estimation," Journal of Applied Econometrics, 2010, 25 (5), 774–804.
Gürkaynak, Refet S., Brian Sack, and Eric Swanson, "The Sensitivity of Long-Term Interest Rates to Economic News: Evidence and Implications for Macroeconomic Models," American Economic Review, March 2005, 95 (1), 425–436.
Hansen, Lars Peter, William Roberds, and Thomas J. Sargent, "Time Series Implications of Present Value Budget Balance and of Martingale Models of Consumption and Taxes," in Lars Peter Hansen and Thomas J. Sargent, eds., Rational Expectations Econometrics, Westview Press Boulder, 1991, chapter 5, pp. 121–162.

Herbst, Edward P. and Frank Schorfheide, "Sequential Monte Carlo Sampling for DSGE Models," Journal of Applied Econometrics, forthcoming.
Iskrev, Nikolay, "Choosing the variables to estimate singular DSGE models: Comment," Dynare Working Papers 41, CEPREMAP October 2014.
Koopman, Siem Jan, "Disturbance Smoother for State Space Models," Biometrika, 1993, 80 (1), 117–126.
Kuttner, Kenneth N., "Monetary policy surprises and interest rates: Evidence from the Fed funds futures market," Journal of Monetary Economics, 2001, 47 (3), 523–544.
Leeper, Eric M., Michael Plante, and Nora Traum, "Dynamics of fiscal financing in the United States," Journal of Econometrics, June 2010, 156 (2), 304–321.
Leeper, Eric M., Todd B. Walker, and Shu-Chun Susan Yang, "Fiscal Foresight and Information Flows," Econometrica, May 2013, 81 (3), 1115–1145.
Lhuissier, Stéphane and Urszula Szczerbowicz, "Corporate Debt Structure and Unconventional Monetary Policy in the United States," unpublished, Banque de France 2017.
Liu, Jun S., "Metropolized Gibbs sampler: An improvement," 1995. Unpublished, Department of Statistics, Stanford University.
Lopes, Hedibert F. and Nicholas G. Polson, "Bayesian Instrumental Variables: Priors and Likelihoods," Econometric Reviews, June 2014, 33 (1-4), 100–121.
Mertens, Karel and Morten O. Ravn, "The Dynamic Effects of Personal and Corporate Income Tax Changes in the United States," American Economic Review, June 2013, 103 (4), 1212–47.
Moon, Hyungsik Roger and Frank Schorfheide, "Bayesian and Frequentist Inference in Partially Identified Models," Econometrica, 2012, 80 (2), 755–782.
Moon, Hyungsik Roger, Frank Schorfheide, Eleonora Granziera, and Mihye Lee, "Inference for VARs Identified with Sign Restrictions," NBER Working Papers 17140, National Bureau of Economic Research, Inc June 2011.
Park, Woong Yong, "Evaluation of DSGE Models: With an Application to a Two-Country DSGE Model," unpublished, Princeton University 2011.
Ramey, Valerie A., "Identifying Government Spending Shocks: It's all in the Timing," The Quarterly Journal of Economics, 2011, 126 (1), 1–50.
Robert, Christian P. and George Casella, Monte Carlo Statistical Methods, Springer Texts in Statistics, Springer, 2005.
Romer, Christina D. and David H. Romer, "A New Measure of Monetary Shocks: Derivation and Implications," American Economic Review, September 2004, 94 (4), 1055–1084.
Romer, Christina D. and David H. Romer, "Transfer Payments and the Macroeconomy: The Effects of Social Security Benefit Changes, 1952-1991," Working Paper 20087, National Bureau of Economic Research May 2014.
Rossi, Barbara and Sarah Zubairy, "What Is the Importance of Monetary and Fiscal Shocks in Explaining U.S. Macroeconomic Fluctuations?," Journal of Money, Credit and Banking, 2011, 43 (6), 1247–1270.


Rossi, Peter E., Greg M. Allenby, and Rob McCulloch, Bayesian Statistics and Marketing, Wiley Series in Probability and Statistics, Wiley, 2005.
Rudebusch, Glenn D., "Do Measures of Monetary Policy in a VAR Make Sense?," International Economic Review, November 1998, 39 (4), 907–31.
Sims, Christopher A., "Comment on Glenn Rudebusch's 'Do Measures of Monetary Policy in a VAR Make Sense?'," International Economic Review, November 1998, 39 (4), 933–41.
Sims, Christopher A., "The state of macroeconomic policy modeling: Where do we go from here?," 2005. Accessed 02/12/14.
Smets, Frank and Rafael Wouters, "Shocks and Frictions in US Business Cycles: A Bayesian DSGE Approach," The American Economic Review, 2007, 97 (3), 586–606.
Stock, James H. and Mark W. Watson, "Disentangling the Channels of the 2007-2009 Recession," Brookings Papers on Economic Activity, 2012, pp. 81–141.
Theil, H. and A. S. Goldberger, "On Pure and Mixed Statistical Estimation in Economics," International Economic Review, 1961, 2 (1), 65–78.
Uhlig, Harald, "What moves GNP?," draft, Humboldt Universität zu Berlin, 2003.
Uhlig, Harald, "What are the effects of monetary policy on output? Results from an agnostic identification procedure," Journal of Monetary Economics, March 2005, 52 (2), 381–419.
Waggoner, Daniel F. and Tao Zha, "Confronting model misspecification in macroeconomics," Journal of Econometrics, 2012, 171 (2), 167–184.
Watson, Mark W., "Vector autoregressions and cointegration," in R. F. Engle and D. McFadden, eds., Handbook of Econometrics, Vol. 4 of Handbook of Econometrics, Elsevier, 1994, chapter 47, pp. 2843–2915.
Yang, Shu-Chun S., "A Chronology of Federal Income Tax Policy: 1947-2009," Working Paper 2007-021, CAEPR 2007.


Technical Appendix – not for publication –

Contents

A Narrative VAR and DSGE-VAR 42
  A.1 Narrative shock identification 42
  A.2 Priors and posteriors 43
  A.3 Simple New Keynesian Model with News Shocks 44
    A.3.1 Equilibrium conditions 45
    A.3.2 Solution 45
  A.4 Narrative policy rule identification 48
  A.5 Dummy variables prior 52
  A.6 Results on the marginal data density in T0V 53
    A.6.1 Analytic results 53
    A.6.2 Numerical example 55
  A.7 Likelihood computation 55
  A.8 DSGE model equations 57
    A.8.1 Households 57
    A.8.2 Production side and price setting 58
    A.8.3 Market clearing 58
    A.8.4 Observation equations 58
B Sampling properties of VAR estimator 60
C Data and additional results 63
  C.1 Data construction 63
  C.2 Approximation quality of VAR representation of DSGE model 67
  C.3 Additional results 71
  C.4 Gibbs sampler 79

A Narrative VAR and DSGE-VAR

A.1 Narrative shock identification

Here I derive how the observables, $\Gamma$ and $\Sigma$, identify the impulse responses of interest up to an extra $\frac{m_z(m_z-1)}{2}$ restrictions, where $m_z$ is the number of instruments and shocks to identify. Define
$$\kappa = (\Gamma_1^{-1}\Gamma_2)', \tag{A.1}$$
so that $\alpha_{21} = \kappa\alpha_{11}$. Then:
$$\Sigma = \begin{bmatrix} \alpha_{11}\alpha_{11}' + \alpha_{12}\alpha_{12}' & \alpha_{11}\alpha_{11}'\kappa' + \alpha_{12}\alpha_{22}' \\ \kappa\alpha_{11}\alpha_{11}' + \alpha_{22}\alpha_{12}' & \kappa\alpha_{11}\alpha_{11}'\kappa' + \alpha_{22}\alpha_{22}' \end{bmatrix} \tag{A.2}$$

The covariance restriction identifies the impulse response (or component of the forecast error) up to an $m_z \times m_z$ square scale matrix $\alpha_{11}$:
$$u_t = A\epsilon_t = \begin{bmatrix} \alpha^{[1]} & \alpha^{[2]} \end{bmatrix}\epsilon_t = \alpha^{[1]}\epsilon_t^{[1]} + \alpha^{[2]}\epsilon_t^{[2]} = \begin{bmatrix} I_{m_z} \\ \kappa \end{bmatrix}\alpha_{11}\epsilon_t^{[1]} + \begin{bmatrix} \alpha_{12} \\ \alpha_{22} \end{bmatrix}\epsilon_t^{[2]}$$

Given that $\epsilon^{[1]} \perp\!\!\!\perp \epsilon^{[2]}$, it follows that:
$$\operatorname{Var}[u_t|\epsilon_t^{[1]}] = \alpha^{[2]}(\alpha^{[2]})' = \begin{bmatrix} \alpha_{12}\alpha_{12}' & \alpha_{12}\alpha_{22}' \\ \alpha_{22}\alpha_{12}' & \alpha_{22}\alpha_{22}' \end{bmatrix}$$
$$\operatorname{Var}[u_t|\epsilon_t^{[2]}] = \alpha^{[1]}(\alpha^{[1]})' = \begin{bmatrix} \alpha_{11}\alpha_{11}' & \alpha_{11}\alpha_{11}'\kappa' \\ \kappa\alpha_{11}\alpha_{11}' & \kappa\alpha_{11}\alpha_{11}'\kappa' \end{bmatrix}$$
$$\Sigma = \operatorname{Var}[u_t] = \operatorname{Var}[u_t|\epsilon_t^{[1]}] + \operatorname{Var}[u_t|\epsilon_t^{[2]}]$$

Note that:
$$u_t^{res} \equiv u_t - \mathrm{E}[u_t|\epsilon_t^{[1]}], \qquad \mathrm{E}[u_t|\epsilon_t^{[1]}] = \begin{bmatrix} I_{m_z} \\ \kappa \end{bmatrix}\alpha_{11}\epsilon_t^{[1]}$$

Any vector in the nullspace of $\begin{bmatrix} I_{m_z} & \kappa' \end{bmatrix}$ satisfies the orthogonality condition. Note that $\begin{bmatrix} I_{m_z} \\ \kappa \end{bmatrix}$, $\begin{bmatrix} \kappa' \\ -I_{m-m_z} \end{bmatrix}$ is an orthogonal basis for $\mathbb{R}^m$. Define
$$Z \equiv \begin{bmatrix} Z^{[1]} & Z^{[2]} \end{bmatrix} \equiv \begin{bmatrix} I_{m_z} & \kappa' \\ \kappa & -I_{m-m_z} \end{bmatrix} \tag{A.3}$$

Note that $Z^{[2]}$ spans the nullspace of $(\alpha^{[1]})'$. Hence, $(Z^{[2]})'v_t$ projects $v_t$ onto the nullspace of the instrument-identified shocks $\epsilon_t^{[1]}$:
$$(Z^{[2]})'v_t = (Z^{[2]})'A\epsilon_t = (Z^{[2]})'\alpha^{[1]}\epsilon_t^{[1]} + (Z^{[2]})'\alpha^{[2]}\epsilon_t^{[2]} = 0\times\epsilon_t^{[1]} + (Z^{[2]})'\alpha^{[2]}\epsilon_t^{[2]} \;\perp\!\!\!\perp\; \epsilon^{[1]},$$
since $\alpha^{[1]} = Z^{[1]}\alpha_{11}$ and $(Z^{[2]})'Z^{[1]} = 0$. Note that $(Z^{[2]})'\alpha^{[2]}$ is of full rank, and I can therefore equivalently consider $\epsilon_t^{[2]}$ or $(Z^{[2]})'v_t$. Thus, the expectation of $v_t$ given $\epsilon_t^{[2]}$ is given by:
$$\mathrm{E}[v_t|\epsilon_t^{[2]}] = \operatorname{Cov}[v_t,(Z^{[2]})'v_t]\operatorname{Var}[(Z^{[2]})'v_t]^{-1}(Z^{[2]})'v_t,$$
$$v_t - \mathrm{E}[v_t|\epsilon_t^{[2]}] = \left(I - \operatorname{Cov}[v_t,(Z^{[2]})'v_t]\operatorname{Var}[(Z^{[2]})'v_t]^{-1}(Z^{[2]})'\right)v_t,$$
$$\operatorname{Cov}[v_t,(Z^{[2]})'v_t] = \Sigma Z^{[2]} = \Sigma\begin{bmatrix} \kappa' \\ -I_{m-m_z} \end{bmatrix},$$
and therefore
$$\operatorname{Var}[v_t|\epsilon_t^{[2]}] = \Sigma - \operatorname{Cov}[v_t,(Z^{[2]})'v_t]\operatorname{Var}[(Z^{[2]})'v_t]^{-1}\operatorname{Cov}[v_t,(Z^{[2]})'v_t]'$$
$$= \Sigma - \Sigma\begin{bmatrix} \kappa' \\ -I_{m-m_z} \end{bmatrix}\left(\begin{bmatrix} \kappa & -I_{m-m_z} \end{bmatrix}\Sigma\begin{bmatrix} \kappa' \\ -I_{m-m_z} \end{bmatrix}\right)^{-1}\begin{bmatrix} \kappa & -I_{m-m_z} \end{bmatrix}\Sigma$$
$$= \begin{bmatrix} \alpha_{11}\alpha_{11}' & \alpha_{11}\alpha_{11}'\kappa' \\ \kappa\alpha_{11}\alpha_{11}' & \kappa\alpha_{11}\alpha_{11}'\kappa' \end{bmatrix} \tag{A.4}$$

This gives a solution for $\alpha_{11}\alpha_{11}'$ in terms of observables: $\Sigma$ and $\kappa = (\Gamma_1^{-1}\Gamma_2)'$. For future reference, note that this also implies that:
$$\operatorname{Var}[v_t|\epsilon_t^{[1]}] = \Sigma - \operatorname{Var}[v_t|\epsilon_t^{[2]}] = \Sigma\begin{bmatrix} \kappa' \\ -I_{m-m_z} \end{bmatrix}\left(\begin{bmatrix} \kappa & -I_{m-m_z} \end{bmatrix}\Sigma\begin{bmatrix} \kappa' \\ -I_{m-m_z} \end{bmatrix}\right)^{-1}\begin{bmatrix} \kappa & -I_{m-m_z} \end{bmatrix}\Sigma \tag{A.5}$$

In general, $\alpha_{11}$ itself is unidentified: Additional $\frac{m_z(m_z-1)}{2}$ restrictions are needed to pin down its $m_z^2$ elements from the $\frac{m_z(m_z+1)}{2}$ independent elements in $\alpha_{11}\alpha_{11}'$. Given $\alpha_{11}$, the impact response to a unit shock is given by:
$$\begin{bmatrix} I_{m_z} \\ \kappa \end{bmatrix}\alpha_{11}$$
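The projection formula can be verified numerically: draw a random impact matrix A, form Σ = AA′ and κ from α₂₁ = κα₁₁, and check that (A.4) recovers the variance attributable to the instrumented shocks, α^[1](α^[1])′. A sketch with m = 3 variables and m_z = 1 instrumented shock (dimensions and the random draw are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
m, mz = 3, 1

A = rng.standard_normal((m, m))               # structural impact matrix, u_t = A eps_t
Sigma = A @ A.T                               # reduced-form covariance
a1 = A[:, :mz]                                # columns hitting the instrumented shocks
alpha11 = a1[:mz, :]                          # top mz x mz block
kappa = a1[mz:, :] @ np.linalg.inv(alpha11)   # so that alpha21 = kappa @ alpha11

# Z2 = [kappa', -I]' spans the nullspace of a1' (eq. A.3).
Z2 = np.vstack([kappa.T, -np.eye(m - mz)])

# Eq. (A.4): Sigma minus the projection on Z2' u_t recovers a1 a1'.
proj = Sigma @ Z2 @ np.linalg.inv(Z2.T @ Sigma @ Z2) @ Z2.T @ Sigma
var_instrumented = Sigma - proj

print(np.allclose(var_instrumented, a1 @ a1.T))  # True
```

The check works because Z2′u_t is an invertible transformation of the non-instrumented shocks, so projecting it out leaves exactly the variance contributed by the instrumented shock.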

A.2 Priors and posteriors

Let $u_t \stackrel{iid}{\sim} N(0, V)$ and let $U = [u_1, \ldots, u_T]'$, where $u_t$ is $m_a \times 1$ and $U$ is $T \times m_a$. Then the likelihood can be written as:
$$\begin{aligned} L &= (2\pi)^{-m_aT/2}|V|^{-T/2}\exp\left(-\frac{1}{2}\sum_{t=1}^T u_t'V^{-1}u_t\right) \\ &= (2\pi)^{-m_aT/2}|V|^{-T/2}\exp\left(-\frac{1}{2}\sum_{t=1}^T \operatorname{tr}(u_t'V^{-1}u_t)\right) \\ &= (2\pi)^{-m_aT/2}|V|^{-T/2}\exp\left(-\frac{1}{2}\operatorname{tr}\left(V^{-1}\sum_{t=1}^T u_tu_t'\right)\right) \\ &= (2\pi)^{-m_aT/2}|V|^{-T/2}\exp\left(-\frac{1}{2}\operatorname{tr}(V^{-1}U'U)\right) \qquad (A.6) \\ &= (2\pi)^{-m_aT/2}|V|^{-T/2}\exp\left(-\frac{1}{2}\operatorname{vec}(U)'(V^{-1}\otimes I_T)\operatorname{vec}(U)\right), \end{aligned}$$

using that $\operatorname{tr}(ABC) = \operatorname{vec}(B')'(A'\otimes I)\operatorname{vec}(C)$ and that $V' = V$. For the SUR model, $[Y, Z] = [X_yB_y, X_zB_z] + U$. Consequently, $Y_{SUR} \equiv \operatorname{vec}([Y, Z]) = X_{SUR}\beta + \operatorname{vec}(U)$, where $\beta \equiv [\operatorname{vec}(B_y)', \operatorname{vec}(B_z)']'$ and
$$X_{SUR} \equiv \begin{bmatrix} I_m\otimes X_y & 0 \\ 0 & I_{m_z}\otimes X_z \end{bmatrix}.$$

The likelihood can then also be written as:
$$\begin{aligned} L &= (2\pi)^{-m_aT/2}|V|^{-T/2}\exp\left(-\frac{1}{2}(Y_{SUR}-X_{SUR}\beta)'(V^{-1}\otimes I_T)(Y_{SUR}-X_{SUR}\beta)\right) \\ &= (2\pi)^{-m_aT/2}|V|^{-T/2}\exp\left(-\frac{1}{2}(\tilde Y_{SUR}-\tilde X_{SUR}\beta)'(\tilde Y_{SUR}-\tilde X_{SUR}\beta)\right) \\ &\propto \exp\left(-\frac{1}{2}(\beta-\tilde\beta_{SUR})'(\tilde X_{SUR}'\tilde X_{SUR})(\beta-\tilde\beta_{SUR})\right), \qquad (A.7) \end{aligned}$$
where $\tilde Y_{SUR}$ and $\tilde X_{SUR}$ denote the GLS-transformed data, $\tilde\beta_{SUR} \equiv (\tilde X_{SUR}'\tilde X_{SUR})^{-1}\tilde X_{SUR}'\tilde Y_{SUR}$, and the last step follows from the normal equations. Note that expression (A.7) for the likelihood is proportional to a conditional normal distribution for $\beta$: $\beta|V^{-1} \sim N(\tilde\beta_{SUR}, (\tilde X_{SUR}'\tilde X_{SUR})^{-1}) \equiv N(\tilde\beta_{SUR}, (X_{SUR}'(V^{-1}\otimes I)X_{SUR})^{-1})$. Alternatively, expression (A.6) for the likelihood is proportional to a conditional Wishart distribution for $V^{-1}$: $V^{-1}|\beta \sim W_{m_a}((U(\beta)'U(\beta))^{-1}, T+m_a+1)$. Premultiplying with a Jeffreys prior over $V$, transformed to $V^{-1}$, is equivalent to premultiplying by $\pi(V^{-1}) \equiv |V^{-1}|^{-\frac{m_a+1}{2}}$ and yields:
$$\pi(V^{-1})\times L = (2\pi)^{-m_aT/2}|V^{-1}|^{(T-m_a-1)/2}\exp\left(-\frac{1}{2}\operatorname{tr}(V^{-1}U'U)\right), \qquad (A.8)$$

which is $V^{-1}|\beta \sim W_{m_a}((SSR(\beta))^{-1}, T)$, with
$$SSR(\beta) \equiv U(\beta)'U(\beta) = \sum_{t=1}^T [y_t - x_{y,t}B_y(\beta),\; z_t - x_{z,t}B_z(\beta)]'[y_t - x_{y,t}B_y(\beta),\; z_t - x_{z,t}B_z(\beta)].$$
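The two conditionals above define a Gibbs sampler. A minimal sketch for the special case in which all equations share one regressor matrix (so the SUR blocks coincide and GLS collapses to OLS), with a flat prior on the coefficients and the Jeffreys prior on V; the data, dimensions, and parameter values are illustrative:

```python
import numpy as np

def draw_wishart(rng, scale, df):
    # W ~ Wishart(scale, df) as a sum of df outer products (exact for integer df >= dim).
    L = np.linalg.cholesky(scale)
    G = rng.standard_normal((df, scale.shape[0])) @ L.T
    return G.T @ G

def gibbs_step(rng, Y, X, V):
    # One sweep of the conditionals derived above:
    #   vec(B) | V ~ N(vec(B_ols), V kron (X'X)^{-1})
    #   V^{-1} | B ~ Wishart((U'U)^{-1}, T)
    T, m = Y.shape
    k = X.shape[1]
    XtX_inv = np.linalg.inv(X.T @ X)
    B_ols = XtX_inv @ X.T @ Y
    cov = np.kron(V, XtX_inv)                    # column-stacked vec ordering
    b_draw = B_ols.flatten(order="F") + np.linalg.cholesky(cov) @ rng.standard_normal(m * k)
    B_draw = b_draw.reshape(k, m, order="F")
    U = Y - X @ B_draw
    V_inv = draw_wishart(rng, np.linalg.inv(U.T @ U), T)
    return B_draw, np.linalg.inv(V_inv)

# Illustrative data: two equations, an intercept and one regressor each.
rng = np.random.default_rng(3)
T = 500
X = np.column_stack([np.ones(T), rng.standard_normal(T)])
B_true = np.array([[0.5, -0.2], [1.0, 0.3]])
Y = X @ B_true + 0.3 * rng.standard_normal((T, 2))
B_draw, V_draw = gibbs_step(rng, Y, X, 0.09 * np.eye(2))
print(B_draw)
```

Iterating `gibbs_step`, feeding each `V_draw` back in, produces draws from the joint posterior; the instruments block of the paper's sampler adds the extra SUR structure and the DSGE prior on top of this skeleton.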

A.3 Simple New Keynesian Model with News Shocks

Consider the framework from Galí (2009, ch. 3), with constant returns to scale at the firm level and simple output in the Taylor rule to simplify. Households have separable utility with inverse IES $\sigma$ and Frisch elasticity of labor supply $\psi$. Firms face a downward-sloping, elastic demand curve. They produce with labor only and can reset prices with Calvo probability $1-\xi$. Their (log) productivity is $a_t$. I depart from the model in Galí (2009) mainly in two dimensions: First, I allow for news shocks. News shocks are known to agents one period in advance, so that shocks that take effect in period t are

collected in the vector $\nu_{t-1}$. Besides these news shocks, there are standard surprise shocks, collected in the vector $\epsilon_t$. Both $\nu_{t-1}$ and $\epsilon_t$ are iid. Second, I allow for a fiscal rule and interest rate smoothing. Both generate endogenous persistence.

A.3.1 Equilibrium conditions

Because the model is largely standard, I only present the log-linear equilibrium conditions in what follows. Optimal price setting, market clearing, labor supply decisions, and aggregation across firms imply the New Keynesian Phillips Curve that relates inflation $\pi_t$ to output $y_t$:
$$\pi_t = \beta\mathrm{E}_t[\pi_{t+1}] + \lambda(\tilde\sigma+\psi)\left(y_t - \frac{1+\psi}{\tilde\sigma+\psi}a_t - \frac{\tilde\sigma}{\tilde\sigma+\psi}g_t\right) + \epsilon_{\pi,t}, \qquad (A.9)$$
where $\lambda \equiv \frac{(1-\xi)(1-\beta\xi)}{\xi}$ increases in the Calvo probability $1-\xi$ with which firms can adjust prices. Here, $\tilde\sigma \equiv \sigma/(1-\bar g/\bar y)$. Market clearing and consumption smoothing yield the New Keynesian intertemporal substitution curve, which relates current output to current and expected government spending, expected output, and real interest rates:
$$y_t = g_t + \mathrm{E}_t[y_{t+1} - g_{t+1}] - \frac{1}{\tilde\sigma}\left(r_t - \mathrm{E}_t[\pi_{t+1}]\right). \qquad (A.10)$$

Interest rates are set according to a Taylor rule with interest rate smoothing:
$$r_t = \rho r_{t-1} + (1-\rho)(\phi_\pi\pi_t + \phi_y y_t) + \epsilon_{r,t} \qquad (A.11)$$
$\epsilon_{r,t}$ denotes the discretionary component of monetary policy. $\phi_\pi$ and $\phi_y$ parametrize how aggressively the monetary authority reacts to deviations of inflation and output from their steady-state values. The fiscal authority sets government consumption as a function of current and past output. In addition, government consumption reflects surprise and news shocks:
$$g_t = \gamma y_t + \kappa y_{t-1} + \epsilon_{g,t} + \nu_{g,t-1} \qquad (A.12)$$
Lump-sum taxes guarantee government budget balance. Technology may also be subject to news shocks, besides a standard surprise shock:
$$a_t = \epsilon_{a,t} + \nu_{a,t-1} \qquad (A.13)$$
In addition, the household's transversality condition has to hold.

A.3.2 Solution

Guess and verify the existence of a minimum state variable solution in the state $x_t \equiv [r_t, y_t, g_t, \nu_{a,t-1}, \nu_{g,t-1}, \nu_{a,t}, \nu_{g,t}]'$ and the current set of surprise shocks $\epsilon_t = [\epsilon_{r,t}, \epsilon_{a,t}, \epsilon_{g,t}]'$ and news shocks $\nu_t = [\nu_{a,t}, \nu_{g,t}]'$:
$$\pi_{t+1} = p_r r_t + p_y y_t + p_a \nu_{a,t} + p_g \nu_{g,t} + p_\epsilon[\epsilon_{t+1}', \nu_{t+1}']' \equiv p_x x_t + p_\epsilon[\epsilon_{t+1}', \nu_{t+1}']'$$
$$y_{t+1} = y_r r_t + y_y y_t + y_a \nu_{a,t} + y_g \nu_{g,t} + y_\epsilon[\epsilon_{t+1}', \nu_{t+1}']' \equiv y_x x_t + y_\epsilon[\epsilon_{t+1}', \nu_{t+1}']'$$
$$g_{t+1} = g_r r_t + g_y y_t + g_a \nu_{a,t} + g_g \nu_{g,t} + g_\epsilon[\epsilon_{t+1}', \nu_{t+1}']' \equiv g_x x_t + g_\epsilon[\epsilon_{t+1}', \nu_{t+1}']',$$
where the state follows a VAR(1) process:
$$x_t = Bx_{t-1} + A[\epsilon_t', \nu_t']', \qquad (A.14)$$
where, in turn, $B$ and $A$ are given by:
$$B = \begin{bmatrix} r_x \\ y_x \\ g_x \\ \cdot \\ \cdot \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} r_r & r_y & 0 & 0 & 0 & r_a & r_g \\ y_r & y_y & 0 & 0 & 0 & y_a & y_g \\ g_r & g_y & 0 & 0 & 0 & g_a & g_g \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \qquad A = \begin{bmatrix} r_\epsilon & r_\nu \\ y_\epsilon & y_\nu \\ g_\epsilon & g_\nu \\ 0_{2\times 3} & 0_{2\times 2} \\ & \\ 0_{2\times 3} & I_2 \\ & \end{bmatrix},$$
so that rows 4-5 of the state (the lagged news) inherit the current news from the previous period, while rows 6-7 (the current news) load one-for-one on $\nu_t$. Note that $x_t$ need not include $g_t$, but carrying it along leads more naturally to the VAR representation of the economy. To verify, substitute the guess for $y_t$ in the government consumption equation (A.12):
$$g_x x_{t-1} + g_\epsilon[\epsilon_t', \nu_t']' = \gamma\left(y_x x_{t-1} + y_\epsilon[\epsilon_t', \nu_t']'\right) + [0, \kappa, 0, 0, 0, 0, 1]x_{t-1} + [0, 0, 1, 0, 0][\epsilon_t', \nu_t']'$$
Matching coefficients with the guess yields:
$$g_x = \gamma y_x + [0, \kappa, 0, 0, 0, 0, 1], \qquad g_\epsilon = \gamma y_\epsilon + [0, 0, 1, 0, 0].$$
From the NKPC (A.9), and using that both $\nu_t$ and $\epsilon_t$ are iid:
$$p_x x_{t-1} + p_\epsilon[\epsilon_t', \nu_t']' = \beta p_x\left(Bx_{t-1} + A[\epsilon_t', \nu_t']'\right) + \lambda(\tilde\sigma+\psi)\left(y_x x_{t-1} + y_\epsilon[\epsilon_t', \nu_t']'\right) - \lambda(1+\psi)a_t - \lambda\tilde\sigma\left(g_x x_{t-1} + g_\epsilon[\epsilon_t', \nu_t']'\right),$$
with $a_t = \nu_{a,t-1} + \epsilon_{a,t} = [0,0,0,0,0,1,0]x_{t-1} + [0,1,0,0,0][\epsilon_t', \nu_t']'$. Matching coefficients:
$$p_x = \beta p_x B + \lambda(\tilde\sigma+\psi)y_x + [0,0,0,0,0,-\lambda(1+\psi),0] - \lambda\tilde\sigma g_x$$
$$\phantom{p_x} = \left(\lambda(\tilde\sigma+\psi)y_x + [0,0,0,0,0,-\lambda(1+\psi),0] - \lambda\tilde\sigma\left(\gamma y_x + [0,\kappa,0,0,0,0,1]\right)\right)(I-\beta B)^{-1}$$
$$\phantom{p_x} = y_x\left(\lambda(\tilde\sigma+\psi) - \lambda\tilde\sigma\gamma\right)(I-\beta B)^{-1} + \left([0,0,0,0,0,-\lambda(1+\psi),0] - \lambda\tilde\sigma[0,\kappa,0,0,0,0,1]\right)(I-\beta B)^{-1},$$
$$p_\epsilon = \beta p_x A + \lambda(\tilde\sigma+\psi)y_\epsilon + [0,-\lambda(1+\psi),0,0,0] - \lambda\tilde\sigma g_\epsilon.$$
The guess implies that the monetary Taylor rule (A.11) can be written as:
$$r_x = [\rho,0,0,0,0,0,0] + (1-\rho)\left(\phi_\pi p_x + \phi_y y_x\right), \qquad r_\epsilon = (1-\rho)\left(\phi_\pi p_\epsilon + \phi_y y_\epsilon\right) + [1,0,0,0,0].$$
Last, the guess implies that the NKIS equation (A.10) can be written as:
$$y_x = g_x - \frac{1}{\tilde\sigma}r_x(I-B)^{-1} + \frac{1}{\tilde\sigma}p_x B(I-B)^{-1}, \qquad y_\epsilon - y_x A = g_\epsilon - g_x A - \frac{1}{\tilde\sigma}r_\epsilon + \frac{1}{\tilde\sigma}p_x A.$$
Plugging in $g_x$ and $r_x$:
$$y_x = \gamma y_x + [0,\kappa,0,0,0,0,1] - \frac{1}{\tilde\sigma}[\rho,0,0,0,0,0,0](I-B)^{-1} - \frac{1-\rho}{\tilde\sigma}\left(\phi_\pi p_x + \phi_y y_x\right)(I-B)^{-1} + \frac{1}{\tilde\sigma}p_x B(I-B)^{-1}$$
Post-multiplying by $(I-B)$ and collecting the terms in $y_x$:
$$y_x\left[(1-\gamma)(I-B) + \frac{(1-\rho)\phi_y}{\tilde\sigma}I\right] = [0,\kappa,0,0,0,0,1](I-B) - \frac{\rho}{\tilde\sigma}[1,0,0,0,0,0,0] + \frac{1}{\tilde\sigma}p_x\left(B - (1-\rho)\phi_\pi I\right),$$
a linear equation that determines $y_x$ given $p_x$.

After substituting for B, px , rx , this equation collapses to a quadratic equation in (yy , yr ). Picking the stable root, if it exists, gives the solution to the system of equations. The remaining coefficients on x and ǫ, ν can be determined by substituing recursively from the above equations. If such a root exists, we have a stable B. We thus have a VAR representation of this economy, albeit in terms of the unobservable anticipated government spending and technology shocks νg,t and νa,t . Expectations of future government spending and output, in addition to the endogenous state variables, reveal these anticipated shocks. Surveys or institutional forecasts often elicit these expectations. Using this information, I can thus express the fundamental VAR representation in terms of observables. As a first step, I express the anticipated government spending shock in terms of current and expected future output as well as expected future government spending: νg,t = Et [gt+1 ] − (γEt [yt+1 ] + κyt ) = Et [gt+1 ] − (γyr rt + yy yt + ya νa,t + yg νg,t + κyt ) 1 (Et [gt+1 ] − (γyr rt + (κ + γyy )yt + γya νa,t )) = 1 − γyg For output shocks I combine the expectation of the policy function Et [yt+1 ] = yx xt with the previous expression for νg,t : 1 (Et [yt+1 ] − (yr rt + yy yt + yg νg,t )) ya 1 (Et [yt+1 ] − (yr rt + yy yt + yg (Et [gt+1 ] − (γEt [yt+1 ] + κyt )))) = ya 1 ((1 + γyg )Et [yt+1 ] − (yr rt + (yy + yg κ)yt + yg (Et [gt+1 ]))) = ya

νa,t =

47

The VAR observables can    1 0 rt  0 1  yt      0  gt   0   yy +yg κ yr  νa,t   − − ya    ya  νg,t   0 −κ    νa,t−1  =  0    0 νg,t−1   0 0    r    0 0  t−1   y   t−1 0 0 gt−1 0 0 |

thus be written as: 0 0 1 0 0 0 0 0 0 0

0 0 0 1+yg γ ya

−1 0 0 0 0 0

0 0 0 y − yag 1 0 0 0 0 0

0 0 0 0 0 1 0 1+yg γ ya

−1 0

{z

0 0 0 0 0 0 1 y − yag 1 0

0 0 0 0 0 0 0 − yyar 0 0

0 0 0 0 0 0 0 −

yy +yg κ ya

−κ 0

 0  rt  0 yt     0    gt    0 E [y ]  t t+1    0 Et [gt+1 ]  (A.15)  Et−1 [yt ] 0      E [g ] 0  t−1 t    rt−1  0    yt−1  0 gt−1 1 } | {z } ˜t ≡X

˜ ≡B

˜ = 1−yg /(1−γ) . ˜ is invertible if yg 6= 1 because |B| Note that B 1−γ ya ˜ −1 B is stable, then the mapping (A.15) allows to write the law of motion of the economy as a If B ˜ t as: VAR(2) in terms of observables X   −1 ˜ ˜ −1 A ˜ ˜ ˜ Xt = B B B Xt−1 + B 0 ˜ −1 exists, then the eigenvalues of B ˜ are the same as those of B ˜ −1 B B. ˜ 48 Thus, if However, if B the economy has a unique, stable equilibrium as posited in (A.14), then the economy has a VAR(2) representation in terms of observables.

A.4

Narrative policy rule identification

To show that the lower Cholesky factorization proposed in Mertens and Ravn (2013) identifies Taylortype policy rules when ordered first, I start by deriving the representation of the identification problem as the simultaneous equation system (3.8). Recall the definition of forecast errors vt in terms of structural shocks ǫt : 

α α12 vt = Aǫt ≡ 11 α21 α22 Note that: 

α11 α12 α21 α22

−1

=







  −1 ǫ1,t α11 α12 ⇔ ǫ2,t α21 α22



ǫ vt = 1,t ǫ2,t



−1 −1 −1 (α11 − α12 α−1 −α−1 22 α21 ) 11 α12 (α22 − α21 α11 α12 ) −1 −1 −1 −1 −1 −α22 α21 (α11 − α12 α22 α21 ) (α22 − α21 α11 α12 )

−1 −1 −1 (α11 − α12 α−1 −(α11 − α12 α−1 22 α21 ) 22 α21 ) α12 α22 = −1 −1 −1 −(α22 − α21 α−1 (α22 − α21 α−1 11 α12 ) α21 α11 11 α12 )

(A.16)





Note that −1 −1 −1 −1 −1 −1 −1 (α11 − α12 α−1 = α−1 = α−1 22 α21 ) 11 ((α11 − α12 α22 α21 )α11 ) 11 (I − α12 α22 α21 α11 )

48 ˜ −1 ||B − Iλi ||B| ˜ = |B ˜ −1 B B ˜ − Iλi | so that λi is an Let λi be an eigenvalue of B. Then 0 = |B − Iλi |, or 0 = |B ˜ −1 B B. ˜ eigenvalue of B

48

and define: −1 S1 ≡ (I − α12 α−1 22 α21 α11 )α11

−1 S2 ≡ (I − α21 α−1 11 α12 α22 )α22

(A.17)

so that −1 (α11 − α12 α−1 = S1−1 22 α21 )

−1 (α22 − α21 α−1 = S2−1 11 α12 )

Using these equalities gives the first equality in what follows, whereas the second equality is straightforward algebra: 

α11 α12 α21 α22

−1



 S1−1 −S1−1 α−1 α12 11 vt = vt −S2−1 α−1 S2−1 22 α21  −1     S 0 I −α12 α−1 22 v = ǫ1,t = 1 t ǫ2,t 0 S2−1 −α21 α−1 I 11

and equivalently: 

    I −η S1 0 ǫ1,t vt = −κ I 0 S2 ǫ2,t

(A.18)

−1 defining η ≡ α12 α−1 22 and κ ≡ α21 α11 . Equation (3.8) follows immediately.   Lemma 4 (Mertens and Ravn (2013)). Let Σ = AA′ and Γ = G 0 A′ , where G is an mz × mz invertible matrix and A is of full rank. Then α[1] is identified up to a factorization of S1 S1′ with S1 defined in (A.17).

Proof. Since A is of full rank, it is invertible and (A.18) holds for any such A. Given η, κ, (A.18) implies (3.9), which I reproduce here for convenience:     α11 (I − ηκ)−1 [1] chol(S1 S1′ ). (3.9) α = = (I − κη)−1 κ α21 If Σ and Γ pin down η, κ uniquely, α[1] is uniquely identified except for a factorization of S1 S1′ .  To show that Σ and Γ pin down η, κ uniquely, consider κ first. Since Γ = G 0 A and G is an mz × mz invertible matrix, it follows that Assumption 1 holds. It then follows from (3.3) that 2 −1 κ = α21 α−1 11 = Γ Γ1 .   Σ11 Σ12 , where Σ11 is mz × mz , Σ12 is To compute η, more algebra is needed. Partition Σ = Σ′12 Σ22 mz × (m − mz ) and Σ 22 is (m − mz ) × (m − mz ). Define α22 α′22 = Σ22 − κα11 α′11 κ′ = Σ22 − κ(Σ11 − α12 α′12 )κ′ , using (A.2) twice. Using the upper left element of (A.5), it follows that α12 α′12 = (Σ′12 − κΣ11 )′ (ZZ ′ )−1 (Σ′12 − κΣ11 ) with ′



ZZ = κΣ11 κ −

(Σ′12 κ′





+ κΣ12 ) + Σ22 = κ −Im−mz Σ

49



κ′ −Im−mz



.

The coefficient matrix of interest, η, is then defined as: ′ ′ −1 η ≡ α12 α−1 = (Σ′12 − κα11 α′11 )′ (α22 α′22 )−1 22 = α12 α22 (α22 α22 )

= (Σ′12 − κΣ′11 + κα12 α′12 )′ (α22 α′22 )−1 . Thus, η and κ are uniquely identified given Σ, Γ. The above derivations link S1 to A−1 . I now compute S1 for a class of models. Proposition 1. Assume Σ = AA′ = A∗ (A∗ )′ and order the policy variables such that the mp = mz or mp = mz − 1 observable Taylor rules are ordered first and Γ = [G, 0]A∗ . Then α[1] defined in (3.9) satisfies α[1] = A∗ [Imz , 0(m−mz )×(m−mz ) ]′ up to a normalization of signs on the diagonal if (a) mz instruments jointly identify shocks to mp = mz observable Taylor rules w.r.t. the economy (2.1), or (b) mz instruments jointly identify shocks to mp = mz − 1 observable Taylor rules w.r.t. the economy (2.1) and ψp,mz = 0, p = 1, . . . , mp . Proof. Given Lemma 4, α[1] is identified uniquely if S1 is identified uniquely. In what follows, I establish that under the ordering in the proposition, S1 , as defined in (A.17) for arbitrary full rank A, is unique up to a normalization. It then follows that α[1] is identified uniquely and, hence, equal to A∗ [Imz , 0(m−mz )×(m−mz ) ]′ . To proceed, stack the mp policy rules:       ψ1,i 0 . . . 0 λ1 σ11 0 ... 0 m  0 ψ2,i . . .  λ2   σ21 X 0  σ22 . . . 0        p Ytp = y + X +  ..     .. .. .. .. ..  ǫt i,t t−1 .. ..  .     . . ... . . . . .  i=mp +1



m X

i=mp +1

0

0

. . . ψmp ,i

λnp

σnp ,1 σnp ,2 . . . σnp ,np

  Di yi,t + ΛXt−1 + D0 0 ǫt ,

   m m X X   =  D0 0 ǫt + Di 1A′i  ǫt +  Di 1Bi′ Xt−1 + Λ Xt−1 , 

i=mp +1

i=mp +1

where m − mp ≤ n ¯ ≡ maxp np . Define ei as the selection vector of zeros except for a one at its ith position and denote the ith row of matrix A by Ai = (e′i A)’ and similarly for Bi . Without loss of generality, order the policy instruments first, before the m − mp = n ¯ nonpolicy ∗ variables. Then A in the DSGE model observation equation (2.1a) can be written as: P  ∗ ′ [D0 , 0] + m i=mp Di 1(Ai )   (A∗mp +1 )′   ,  . .   . (A∗m )′

where D0 is a full-rank lower diagonal matrix and the Dj matrices are mp × mp matrices. To find (A∗ )−1 , proceed by Gauss-Jordan elimination to rewrite the system A∗ X = Im , with

50

E

solution X = (A∗ )−1 , as [A∗ |Im ]. Define E as a conformable matrix such that [A∗ |Im ] → [B|C] = [EA∗ |EIm ]. Then:  P    ∗ ′ I D0 0 + m mp 0 0 . . . 0 i=mp +1 Di 1(Ai )  0′ 1 0 . . . 0  (A∗mp +1 )′   [(A∗ )|Im ] =  .. .. .. .. . . ..   . .  . . . . ′ ′ ∗ 0 0 0 ... 1 (Am )  Pm    D0 0 + i=mp +2 Di 1(A∗i )′ Imp −Dmp +1 1 0 . . . 0  0′ 1 0 ... 0  (A∗mp +1 )′   E1  ′ ∗ ′ (Amp +2 ) 0 0 1 ... 0  →   ..  .. . . .. .. ..  . .  . . . .  

   →   E2

D0 0



(A∗m )′ Pm

0′

+ i=mp +3 Di 1(A∗i )′ (A∗mp +1 )′ (A∗mp +2 )′

   D0 0  (A∗m +1 )′ p  Em−mp  ∗ ′ →  (Amp +2 )  ..  .

0

0 ... 1

−Dmp +1 1 −Dmp +2 1 1 0 0 1 .. .. .. . . . ′ ′ 0 0 0 (Am ∗)  Imp −Dmp +1 1 −Dmp +2 1 . . . −Dm 1  0′ 1 0 ... 0  ′  0 0 1 ... 0   .. .. .. .. ..  . . . . . Imp 0′ 0′ .. .

(Am ∗)′ 0′ 0 0 ... 1    −1 −1 −1 −D0 Dmp +1 1 −D0 Dmp +2 1 Imp 0 D0 ′  (A∗m +1 )′ 0 1 0 p  ′ ′ ED  (A∗ 0 0 1 → mp +2 )  . . . .. .. .. ..  . ′ ′ 0 0 0 (Am ∗)

 ... 0 ... 0   E3 ... 0   → ... .  .. . ..  ... 1

. . . −D0−1 Dm 1 ... 0 ... 0 .. .. . . ... 1

Thus, ((A∗ )−1 )1:mp ,1:mp = (ED Em−mp . . . E1 Im )1:mp ,1:mp . Now consider cases (a) and (b):

      

(a) mz = mp . From (A.17), S1 is the upper left corner of (A∗ )−1 : S1 ≡ ((A∗ )−1 )1:mp ,1:mp = D0−1 and S1 is a (lower) diagonal matrix because D0 is (lower) diagonal. (b) mz = mp + 1, ψp,mp +1 = 0, p = 1, . . . , mp . The second condition implies that Dmp +1 = 0mp ×mp . It follows that S1 defined in (A.17) is given by:     D0−1 Dmp +1 1 D0−1 0 ∗ −1 S1 ≡ ((A ) )1:mp +1,1:mp +1 = = smp +1,1:mp smp +1,mp +1 smp +1,1:mp smp +1,mp +1 Thus, S1 is lower triangular. In both cases, S1 is lower triangular. Since the lower Cholesky decomposition is unique, a Cholesky

51

decomposition of S1 S1′ recovers S1 if we normalize signs of the diagonal of S1 to be positive. Given identification of S1 , the identification of α[1] follows from Lemma 4.

A.5

Dummy variables prior

Note that the dummy variables prior is no longer conjugate. Hence, my prior can be generated from two different distributions: The coefficients are generated from a N (β0 , V¯0−1 ) distribution, whereas the observations that generate the prior for the covariance matrix are generated from a N (0, V −1 ) distribution. Specify: ¯ −1 ), β ∼ N (β¯0 , N 0

¯0 ≡ X ′ ¯ −1 ⊗ I)XSU R,0 N SU R,0 (V0

Note that this is not equal to ¯0 (V −1 )), β|V −1 ∼ N (β¯0 , N

′ −1 ¯0 (V −1 ) ≡ XSU N ⊗ I)XSU R,0 , R,0 (V

unless V −1 is known and equal to V¯0 . The prior for V −1 is Wishart independent of β. V −1 ∼ Wm+mz (V¯0 T0 , T0 ) Note that because the prior for β is independent of V −1 , the prior is conditionally conjugate with the likelihood function. Otherwise, the presence of |N0 (V −1 )| terms would undo the conjugacy. The prior is therefore: ¯0 (θ)|+1/2 e− 12 (β−β¯0 (θ))′ N¯0 (θ)(β−β¯0 (θ)) π(β, V −1 |θ) = (2π)−n/2 |N 1 −1 ¯ × 2−T0 (m+mz )/2 |V¯0 (θ)T0 |−T0 /2 Γm (T0 /2)−1 |V −1 |(T0 −m−mz −1)/2 e− 2 tr(V V0 (θ)T0 )

¯0 (θ)|+1/2 e− 12 (β−β¯0 (θ))′ N¯0 (θ)(β−β¯0 (θ)) = (2π)−n/2 |N 1

× 2−T0 (m+mz )/2 |S0 (θ)|−T0 /2 Γm (T0 /2)−1 |V −1 |(T0 −m−mz −1)/2 e− 2 tr(V

−1 S

0 (θ))

The joint density is given by: p(Y, Z, β, V −1 , θ) = p(Y, Z|β, V −1 )p(β, V −1 |θ)p(θ), = p(Y, Z|β, V −1 )p(β|θ)p(V −1 |θ)p(θ), −1

−T /2

(A.19a)

−1 T /2 − 12 (vec([Y,Z]−XSU R β)′ (V −1 ⊗IT )(vec([Y,Z]−XSU R β)

|V | e 1/2 ′ −1 ¯ p(β|θ) = (2π)−nβ /2 λB X0,SU ⊗ Im(mp+k) )X0,SU R R (θ)(V0 (θ)

p(Y, Z|β, V

) = (2π)

e−

λB 2

(X0,SU R (β¯0 (θ)−β))′ (V¯0−1 ⊗Im(mp+k) )(X0,SU R (β¯0 (θ)−β))

where λB ≡

,

T0B

m(mp + k) ¯0 (θ)|1/2 e− 12 (β¯0 (θ)−β)′ N¯0 (θ)(β¯0 (θ)−β) , = (2π)−nβ /2 |N p(V −1 |θ) =

− 12 tr(V −1 T0V V¯0 (θ))

e V

2T0

(m+mz )/2 Γ

(A.19b)

m+mz



T0V 2



T0V

|T0V V¯0 (θ)| (T0V

|V |

(A.19c)

/2

(A.19d)

−m−mz −1)/2

p(θ) = 1{DSGE model has a unique & stable solution|θ} ×

nθ Y

n=1

52

pn (θ (n) ),

(A.19e)

where θ (n) denotes the nth component of the vector θ and pn (θ (n) ) is a univariate density.

A.6

Results on the marginal data density in T0V

A.6.1

Analytic results

Del Negro et al. (2007) show that, in an AR(1) model with known variance, the marginal likelihood is strictly increasing, decreasing, or has an interior maximum in T0V = T0B in their DSGE-VAR framework with a conjugate prior. I am interested in the case of T0V 6= T0B and when the prior is not conjugate. Thus, I analyze the case of increasing the degrees of freedom only of the Wishart prior, abstracting from unknown model dynamics (i.e., β = 0) so that T0B becomes irrelevant. The marginal likelihood of an iid sample of length T with yt ∈ Rm is given by: p(y|T0V

)≡ =

Z Z

T Y

∞ 0

!

f (ys |V ) π(V |T0V )dV −1

s=1 ∞

V T tr(V −1 V ˆ )− T0 tr(V −1 V0 ) 2

(2π)−mT /2 |V |−T /2 e− 2

V

2−mT0

/2

V

|V0 T0V |T0

/2

Γm (T0V /2)−1 |V |−(T0 −m−1)/2 dV −1

0 V

= π −mT /2 Z



|V0 T0V |T0 |V0 T0V

e− 2 tr(V 1

/2

Γm ((T + T0V )/2)

(T +T0V )/2

+ T Vˆ | −1

Γm (T0V /2)

×

(T Vˆ +T0V V0 )) 2−m(T +T0V )/2 |V T V + T Vˆ |(T +T0V )/2 Γ ((T + T V )/2)−1 |V |−(T +T0 −2)/2 dV −1 0 0 m 0

0

= π −mT /2

V

|V0 T0V |T0

/2

Γm ((T + T0V )/2)

V |V0 T0V + T Vˆ |(T +T0 )/2 Γm (T0V /2) V

= π −mT /2 |Vˆ |−T /2

|Vˆ −1 V0 |T0 |Vˆ −1 V0 +

/2

V Im TTV |(T +T0 )/2 0

(T0V )−mT /2 Γm ((T + T0V )/2) Γm (T0V /2)

(A.20)

P Q 1 defining Vˆ ≡ T1 Ts=1 ys ys′ and using Γm (T /2) = π m(m−1)/2 m j=1 Γ( 2 (T + 1− j)). It is straightforward to show via the first order condition that the (log) data density is maximized by a DSGE model prior centered at V0 = Vˆ : the data rewards model fit. To gain intuition, consider the scalar case m = 1. Abstracting from terms constant in T0V , the density can then be simplified to:   T +T0V Γ V V 2 T + T0 V0 T V0 T  V ln(T0V ) + ln ln p(y|T0V ) = κ(V, T ) − ln(T0V ) + 0 ln( ) − T ˆ ˆ 2 2 2 V V ln Γ 0 2

The slope of the log data density in T0V is given by: !   V    T0V VVˆ0 T + T0V T0 V0 1 T 1 1 1 d ln p(y|T0V ) ψ ψ 1 − − , (A.21) + + = ln V V V 2 2 2 2 2 2 dT0 T0V Vˆ0 + T Vˆ T0V Vˆ0 + T

where ψ is the digamma function, the derivative of the log Gamma function. Part (a) of the following Lemma establishes that for Vˆ0 in an open neighborhood around unity, the slope of the log data density V is strictly positive (at T that are multiples of 2). Hence, when the DSGE model V0 fits the data well, an infinite prior weight on the DSGE model maximizes the fit. Parts (b)and (c) establish the counterpart that for a sufficiently bad fit so that VVˆ0 is far enough from unity, the slope of the log data density is negative in T0V . Thus, the optimal prior weight diverges. Lemma 5. Let T = 2n, n ∈ N+ and T0V > 0. (a) For

V0 V

in an open neighborhood around unity,

d dT0V

53

ln p(y|T0V ) > 0.

V0 V

(b) There exists a number v ∈ (0, 1) such that for

d dT0V


V0 V

(c) For T > 2, there exists a number v¯ > 1 such that for

ln p(y|T0V ) < 0.

> v¯

d dT0V

ln p(y|T0V ) < 0.

Proof. Consider the three cases in the lemma separately: (a) Let V0 = Vˆ . Note that under the assumption on T , the recurrence relation of the digamma function implies that

ψ



T + T0V 2







T0V 2



T 2

+

−1 X

1

T0V s=0 2

.

+s

The slope (A.21) can therefore be written as: 1 ln 2



T0V T0V + T



T

−1

2 1 X + 2

T0V s=0 2

1

.

+s

Note that for x > 0: 1 ln 2



x 2+x



+

1 > 0. x

This inequality follows from a basic logarithm inequality: − log   1 x 1 ln 2 2+x + x > 0.

The result on

d dT0V



x 2+x



= log 1 +

2 x



< x2 . Thus,

ln p(y|T0V ) follows by induction for V0 = Vˆ . Let T = 2 ⇔ n = 1. Then the

above inequality for x = T0V implies the condition for n = 1 ⇔ T = 2. Now assume that the condition holds for arbitrary n ∈ N+ . Notice that   d 2n + T0 d 1 1 2 V V ln p(y|T0 ) − ln p(y|T0 ) = ln + , V V V 2 2 2n + T0V T =2(n+1) T =2n dT0 dT0 2(n + 1) + T0 d ln p(y|T0V ) which is larger than zero by the above inequality. In addition, by assumption, > 0. dT0V T =2n > 0. It follows that dTdV ln p(y|T0V ) 0

T =2(n+1)

Since the assumption is true for n = 1, the desired result for and any n ∈ N+ by induction.

d dT0V

ln p(y|T0V ) follows for V0 = Vˆ

Last, because p(y|T0V ) and its derivatives are continuous in V0 , the inequality holds for V0 sufficiently close to Vˆ . (b) Fix T, T0V . Note that limV0 /Vˆ ց0

d ln p(y|T0V ) = −∞. Since the limit is −∞, there exists a dT0V d ln p(y|T0V ) < 0 holds. Since, by (a), the inequality is not dT0V

number v such that for Vˆ0 < v V ˆ satisfied at V0 = V , it follows that v < 1.

     V T +T0V T +T0V T0 1 (c) Note that limV0 /Vˆ →∞ dTdV ln p(y|T0V ) = − 2TTV + 12 ψ − . Note also that ψ − ψ 2 2 2 2 0 0  V T0 ψ 2 ≤ T2 T20 given the recurrence relation used in (a) and given that the sum in the recurrance 54

relation has at most T2 increments. These increments are smaller or equal to T20 . When T > 2, the equality is strict. Thus, limV0 /Vˆ →∞ dTdV ln p(y|T0V ) < 0 for T > 2. By the definition of the limit, 0 there exists some v¯ such that the inequality holds for all V0 > v¯Vˆ and T > 2. By (a), v¯ > 1.

A.6.2

Numerical example

The logic behind the previous analytic results for the scalar case applies more widely: If the prior is sufficiently close to the data, increasing the prior precision increases the model fit. Here, I provide a numerical benchmark for the benchmark VAR specification. ¯y, B ¯ z and V¯0 Specifically, I abstract from uncertain DSGE (hyper-)parameters and fix the prior B 0 0 matrices so that the prior fit is perfect: I choose the prior to equal the posterior given the actual data. I then vary the prior precision T0B and T0V on a grid. As expected, the marginal likelihood is strictly increasing in both T0V and T0B and peaks at the limit point of T0B = T0V → ∞. Varying weights: T0B 6= T0V = λV0 × T

0

0

-50

-50 log Bayes factor

log Bayes factor

Equal weights: T0B = T0V = λ0 × T

-100

-150

-200

-100

=5 λB 0 =2 λB 0 =1 λB 0

-150

-200

-250

=inf λB 0

=0.5 λB 0 =0.2 λB 0

-250 0.2

0.5 =λ B λV 0 0

1.0

2.0

5.0

inf

0.2

0.5

1.0

2.0

5.0

inf

(scaled to λ/(1+λ)) λV 0

(scaled to λ/(1+λ))

In this numerical example, the prior is chosen to equal the posterior for the baseline narrative DSGE-VAR. Thus the prior fits the data as well as possible. The figures show that increasing the weights T0B , T0V strictly increases the model fit, which is measured via the marginal likelihood. To give some context, the number of prior observations is expressed as T0 = λ0 × T (i.e., relative to the empirical sample size). The marginal likelihood is strictly increasing in both the dimension of “dynamics” via the number of dummy observations on the coefficient matrix and the dimension of “identification” via the number of dummy observations on the covariance matrix.

Figure A.1: Narrative DSGE-VAR marginal likelihood with fixed hyperparameters when prior is set to equal the posterior

A.7

Likelihood computation

I compute the marginal data density by applying the Chib (1995) method to the inner integral over the SUR-VAR parameters and then applying the Geweke (1999) estimator to integrate over the DSGE model hyperparameters. Likelihood given DSGE parameters. π(y, z|θ) =

The basic insight from Chib (1995) is that:

p(y, z|V −1 , B)π(V −1 , B|θ) p(y, z|V −1 , B)π(V −1 , B|θ) , = π(ˆ z , V −1 , B|y, z, θ) π(B∗ |y, z, θ)π(V∗−1 |B∗ , y, Z, θ) 55

(A.22)

for any V −1 , B. For numerical purposes, however, it is advisable to evaluate (A.22) at a high density point. In what follows I denote this point by (ˆ z∗ , B∗ , V∗−1 ). I choose B∗ as the posterior mean. I first compute: π(B∗ |y, z, θ) = M

−1

M X

π(B∗ |y, z, (V −1 )(m) , zˆ(m) , θ),

m=1

using draws {(V −1 )(m) , zˆ(m) } from the original Gibbs sampler. The second component is computed as: π(V∗−1 |y, z, θ) = M −1

M X

π(V∗−1 |y, z, B∗ , zˆ(m) , θ),

m=1

where (V −1 )(m) , zˆ(m) are draws from a simpler new run of the Gibbs sampler that conditions on B∗ . I draw a third sequence of zˆ(m) conditional on both To compute the likelihood p(y, z|B∗ , V∗−1 ), P M −1 −1 −1 B∗ , V∗ and I compute p(y, z|B∗ , V∗ ) = M ˆ(m) |B∗ , V∗−1 ). m=1 p(y, z, z

Likelihood over DSGE parameters. Geweke (1999) shows that to find the integrating constant of a Kernel k(ψ) we may use that p(˜ y ) is the integrating constant of the posterior kernel k(ψ) = (ψ) . Then: p(ψ|˜ y )p(˜ y ). Let g(ψ) ≡ fk(ψ) E[g(ψ)] =

Z

Ψ

f (ψ) p(ψ|˜ y )dψ = p(˜ y )−1 k(ψ)

Z

Ψ

f (ψ) k(ψ)dψ = p(˜ y )−1 k(ψ)

Z

f (ψ)dψ = p(˜ y )−1

Ψ

for any density f (ψ). Geweke (1999) proposes to use a truncated normal density function with the posterior mean and covariance of ψ. Denote this truncated density by fα (ψ) and its estimate based on the sample posterior distribution with sample size M by fα,M (ψ). Then: −1

p(˜ y)

= E[gα (ψ)] ≈ E[gα,M (ψ)] ≈ M

−1

M X fα,M (ψm ) , k(ψm ) m=1

where ψm are draws from the posterior. Here, ψ = (θ, B, V −1 ) – or strictly (θ, B, vech(V −1 )). This vector is high dimensional, especially because of the presence of B. It would therefore be helpful to reduce the dimensionality of the parameter vector, which I do using the Chib (1995) algorithm previously described. k(θ, B, V −1 ) = p(y, z|B, V −1 )p(B, V −1 |θ)π(θ) Z Z Z Z ⇒ k(θ) ≡ k(θ, B, V −1 )dV −1 dB = π(θ) p(y, z|B, V −1 )p(B, V −1 |θ)dV −1 dB Z Z ≡ π(θ) k(B, V −1 |y, z, θ)dV −1 dB Z Z = π(θ)p(y, z, θ) p(B, V −1 |y, z, θ)dV −1 dB ⇔ k(θ) = p(y, z, θ)π(θ)

Now proceed with this reduced parameter vector as before Z Z f (θ) f (θ) p(θ|y, z)dθ = p(θ|y, z)dθ E[g(θ)] = Θ π(θ)p(y, z, θ) Θ π(θ)p(y, z, θ) 56

−1

= p(y, z)

= p(y, z)−1

Z



f (θ) π(θ)p(y, z, θ)dθ π(θ)p(y, z, θ) f (θ)dθ = p(y, z)−1

Θ

In practice, I approximate p(y, z|θ) with the Chib (1995) estimator: M X fα (θ(m) ) ˆ g (θ)] = 1 ≈ pˆ(y, z)−1 E[ˆ M m=1 π(θ(m) )ˆ p(y, z, θ(m) )

where pˆ(y, z, θ(m) ) is the Chib estimator of the (conditional) marginal likelihood. The approximation R relies on Θ f (θ) p(y,z,θ) pˆ(y,z,θ) dθ being small. In the case without instruments and with a fully conjugate prior, I verify this numerically by comparing the estimated marginal data density with its analytical counterpart. For the SUR case, I verify that with a modest number of posterior draws the numerical error lies within ±0.1 of the truth computed by a very large number of draws.

A.8

DSGE model equations

A.8.1

Households

The law of motion for capital: x ¯ x ¯ p + ¯p (ˆ xt + qˆt+s ) kˆtp = (1 − ¯p )kˆt−1 k k

(A.23)

Household wage setting: ¯ t [w βγE ˆt+1 ] w ˆt−1 + ¯ ¯ 1 + βγ 1 + βγ ¯ w γ)(1 − ζw )  cˆt − (h/γ)ˆ ct−1 (1 − βζ dτtn dτtc  ¯w A + + ν n ˆ − w ˆ + + t t ¯ w 1 − h/γ 1 − τ¯n 1 + τ¯c (1 + βγ)ζ ¯ w ¯ 1 + βµι ιw βγ ǫˆλ,w t − π ˆ + π ˆ E [ˆ π ] + t t−1 t t+1 ¯ ¯ ¯ ¯ 1 + βγ 1 + βγ 1 + βγ 1 + βγ

w ˆt =

(A.24)

Household consumption Euler equation: c Et [ξˆt+1 − ξˆt ] + Et [dτt+1 − dτtc ] =

    1 1 1 − τ¯n w¯ h ¯n h = Et (σ − 1) nt+1 − n ˆ t ] − σ cˆt+1 − 1 + ct + cˆt+1 , ¯ w 1 + τ c c¯ [ˆ 1 − h/γ γ γ 1+λ

(A.25)

Other FOC (before rescaling of qˆtb ): ˆ t + Et [ˆ Et [ξˆt+1 − ξˆt ] = −ˆ qtb − R πt+1 ], 1 ˆ t = −ˆ ˆ t − Et [πt+1 ]) + × Q qtb − (R k k r¯ (1 − τ ) + δτ k + 1 − δ  k × (¯ r k (1 − τ k ) + δτ k )ˆ qtk − (¯ r k − δ)dτt+1 +  k k k ˆ + r¯ (1 − τ )Et (ˆ rt+1 ) + (1 − δ)Et (Qt+1 ) ,

57

(A.26)

(A.27) (A.28)

  1 1 x ¯ ˆ [Qt + qˆt ] , x ˆt = ˆt−1 + βγEt (ˆ xt+1 ) + 2 ′′ ¯ x γ S (γ) 1 + βγ a′ (1) 1 − ψu k u ˆt = ′′ rˆtk ≡ rˆ . a (1) ψu t A.8.2

(A.29) (A.30)

Production side and price setting

The linearized aggregate production function is:   y¯ + Φ a g yˆt = ǫˆt + ζ kˆt−1 + α(1 − ζ)kˆt + (1 − α)(1 − ζ)ˆ nt , y¯

(A.31)

where Φ are fixed costs. Fixed costs, in steady state, equal the profits made by intermediate producers. The capital-labor ratio: kˆt = n ˆt + w ˆt − rˆtk . (A.32) Price setting: π ˆt =

¯ 1 − ζp ¯ 1 − ζp βγ βγ ιp π ˆt−1 + ˆt+1 . A¯p (mc c t + ǫˆλ,p t )+ ¯ ¯ ¯ Et π 1 + ιp βγ 1 + ιp βγ ζp 1 + ιp βγ

(A.33)

Marginal costs with a cost-channel:

A.8.3

ˆ t ). mc c t = αˆ rtk + (1 − α)(wˆt + R

Market clearing

(A.34)

Goods market clearing requires: yˆt = A.8.4

x ¯ x ¯g g r¯k k¯ c¯ cˆt + x ˆt + x ˆt + gˆt + u ˆt . y¯ y¯ y¯ y¯

(A.35)

Observation equations

For the estimation under full information, I need to specify observation equations. The observation equations are given by (3.2) as well as the following seven observation equations from Smets and Wouters (2007) and three additional equations (A.37) on fiscal variables: ∆ ln gtobs = gt − gt+1 + (γg − 1),

(A.36a)

∆ ln xobs = xt − xt+1 + (γx − 1), t

(A.36b)

∆ ln wtobs ∆ ln cobs t obs π ˆt n ˆ obs t obs ˆ Rt

= wt − wt+1 + (γw − 1),

(A.36c)

= ct − ct+1 + (γ − 1),

(A.36d)

=π ˆt + π ¯,

(A.36e)

=n ˆt + n ¯, ˆ t + (β −1 − 1), =R

(A.36f) (A.36g)

By allowing for different trends in the non-stationary observables, I treat the data symmetrically in the VAR and the DSGE model.

58

I use the deviation of debt to GDP and revenue to GDP, detrended prior to the estimation, as observables: ¯b (ˆb − yˆ) + ¯bobs y¯   ¯ n c¯ dτtn n w¯ +w ˆt + n ˆ t − yˆt + rev = τ¯ ¯ n,obs c¯ y¯ τ¯n  k  ¯ dτ r¯k k ˆp kk k r − δ) rˆ + kt−1 − yˆt + rev + k = τ¯ (¯ ¯ k,obs y¯ τ¯k r¯ − δ t

bobs = t revtn,obs revtk,obs

59

(A.37a) (A.37b) (A.37c)

B

Sampling properties of VAR estimator

The estimated credible sets are wide – wider than they would be if uncertainty about Γ were ignored. ˆ = 1 (Y −Xy By )′ (Z −Xz Bz ) Rather than drawing Γ as part of V from the posterior, one can compute Γ T conditional on the estimated coefficients. Figure B.2 compares the posterior uncertainty over the output response to different shocks for both sampling schemes. Output to G shock

Output to Tax

Output to FFR shock

Y to one sd shock in tax

Y to one sd shock in G

Y to one sd shock in FFR

1.2

1 1

1

0.6

−1

0.4

%

−1 %

%

0

0

0.8

−2

−2

−3

−3

0.2 0 −0.2 0

−4 5

10 quarters

15

20

0

−4 5

10 quarters

15

20

0

5

10 quarters

15

20

Comparing the estimated credible sets in gray to credible sets that abstract from sampling error in the estimated covariance produce much tighter estimates when instruments used to identify shocks have only few non-zero observations as is the case for spending and tax shocks. Note: Shown are the pointwise median and 68% and 90% posterior credible ˆ = 1 (Y − Xy By )′ (Z − Xz Bz ) is used sets of the SUR-sampling scheme (black and gray) and when the point estimate Γ T

for Γ (red). Results based on lower Cholesky factorization of S1 S1′ .

Figure B.2: Effects of policy shocks on output: comparing credible sets for IRFs Given the different results for the posterior uncertainty, which sampling scheme should be used? One criterion to judge the proposed prior and Bayesian inference scheme is to investigate its frequentist properties in a Monte Carlo study. I simulate NM C = 200 datasets based on the actual point estimates for the reduced-form VAR as well as the instrument-based inference about a structural impulse response vector. I initialize the VAR at the zero vector and drop the first 500 observations and keep the last T = 236 observations as the basis for estimating the reduced-form VAR. Different scenarios for instrument availability are considered for the structural inference: The fraction of zero observations for the instrument varies from about 5% to 90% of the observations. The data generating process is given by the OLS point estimate of a VAR in the core variables and its covariance as specified in (3.1). Only one instrument is used: the Romer and Romer (2004) monetary policy shocks. I choose their shocks for the analysis because they are available for about half (47%) of the sample. As in the data, I modify (3.1) by setting randomly chosen observations to zero. The section of observations set to zero varies from 5% to 90%. For each dataset m, I compute the pointwise posterior credible set using my Bayesian procedure, with and without conditioning on the observed covariance between instruments and VAR forecast errors. Similarly, I use the wild bootstrap proposed in Mertens and Ravn (2013) to conduct frequentist inference. 49 Each procedure yields an estimate of the true IRF {Ih,m }H h=0 for each point OLS estimate j H ˆ and a pointwise credible set for horizons up to H: {C(α)h,m }h=0 = {[c(α)jh,m , c(α)jh,m ]}H h=0 . The superscript j indexes the different methods. 
1 − α is the nominal size of the credible set: A fraction α of draws from the posterior or the bootstrapped distribution for Ih,m should lie under c(α)jh,m or

above c(α)jh,m . To assess the actual level of the credible sets of method j, I compute for each (h, m) whether the ˆ j }. The actual coverage truth at horizon h, Ih,m lies inside the pointwise credible set: 1{Ih,m ∈ C(α) h,m probability is then estimated as: 49

Appendix B describes the algorithm.

60

α ˆ jh

=

1 NM C

N MC X

ˆ j } 1{Ih,m ∈ C(α) h,m

m=1

Figure B.3 plots the deviation of the actual coverage probability for model j at horizon h from the nominal coverage probability: α ˆ jh − α for α = 0.68. The procedures that ignore “first stage” uncertainty about the covariance between the forecast errors and the external instruments understate the size of the credible sets substantially at short horizons (“Bayes – certain”, BaC and “Bootstrap – certain”, BoC), while the Bayesian procedure that allows for uncertainty about the covariance matrix errs on the conservative side (“Bayes–uncertain”, BaU). For the latter scheme, the actual level is typically only zero to five percentage points above the nominal level when half of the instruments are nonzero. 50 However, when only one in 20 observations for the instrument is non-zero, the actual level exceeds the nominal level by around ten percentage points. Output 47% non-zero

[Figure B.3 panels: actual minus nominal coverage for output (top row) and inflation (bottom row) responses to the FFR, for 5%, 47%, and 90% non-zero instrument observations; series: Bayes-uncertain, Bayes-certain, Bootstrap-uncertain, Bootstrap-certain.]

With enough observations on narrative instruments, the proposed estimator (shown in solid blue) has good classical coverage probabilities. With few observations, the estimator appears too conservative, whereas estimators that neglect uncertainty in the covariance matrix produce misleadingly tight confidence bands.

Figure B.3: Monte Carlo analysis of actual minus nominal coverage probability $\hat\alpha_h^j - \alpha$

With more than one instrument at a time, the coverage probability of the responses to the different shocks depends on the specific rotation of the shocks. Overall, the Monte Carlo study suggests that the proposed Bayesian procedure properly accounts for the uncertain covariance between instruments and forecast errors in the context of the present application, but may be conservative from a frequentist point of view both when there is little variation in the instruments and over longer forecast horizons. This motivates using available prior information to improve on the estimates.

[50] The actual level depends slightly on the variable under consideration.


Frequentist inference. Following Mertens and Ravn (2013), the bootstrap procedure I consider is characterized as follows:

1. For $t = 1, \ldots, T$, draw $\{e_1^b, \ldots, e_T^b\}$, where $e_t^b \sim$ iid with $\Pr\{e_t^b = 1\} = \Pr\{e_t^b = -1\} = 0.5$.

2. Construct the artificial data $Y_t^b$. In a VAR of lag length $p$, build $Y_t^b$ as follows:
   • For $t = 1, \ldots, p$ set $Y_t^b = Y_t$.
   • For $t = p+1, \ldots, T$ construct recursively $Y_t^b = \sum_{j=1}^{p} \hat B_j Y_{t-j}^b + e_t^b \hat u_t$.

3. Construct the artificial data for the narrative instrument: for $t = 1, \ldots, T$, set $z_t^b = e_t^b z_t$.
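The three steps can be sketched as follows. Here `B_hat`, `u_hat`, `z`, and `Y` are illustrative stand-ins for the estimated VAR coefficients, residuals, instrument, and data, and the toy VAR(1) at the bottom exists only to make the sketch runnable.

```python
# Minimal sketch of the Mertens and Ravn (2013) wild bootstrap: flip the
# signs of VAR residuals and the instrument jointly with Rademacher draws.
import numpy as np

rng = np.random.default_rng(1)

def wild_bootstrap_draw(Y, B_hat, u_hat, z, p, rng):
    """One artificial dataset (Y_b, z_b) for a VAR(p) with one instrument."""
    T, n = Y.shape
    e = rng.choice([-1.0, 1.0], size=T)  # Rademacher weights e_t^b
    Y_b = Y.copy()                       # first p observations kept as-is
    for t in range(p, T):
        # Y_b[t] = sum_j B_hat[j] @ Y_b[t-1-j] + e[t] * u_hat[t]
        Y_b[t] = sum(B_hat[j] @ Y_b[t - 1 - j] for j in range(p)) + e[t] * u_hat[t]
    z_b = e * z                          # same sign flips for the proxy
    return Y_b, z_b

# Toy bivariate VAR(1) data, purely for illustration
T, n, p = 50, 2, 1
B_hat = [np.array([[0.5, 0.1], [0.0, 0.4]])]
u_hat = rng.normal(size=(T, n))
Y = rng.normal(size=(T, n))
z = rng.normal(size=T)
Y_b, z_b = wild_bootstrap_draw(Y, B_hat, u_hat, z, p, rng)
```

Flipping residuals and the instrument with the *same* Rademacher draw preserves their sample covariance up to sign, which is what makes the scheme valid for proxy-identified VARs.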

C Data and additional results

C.1 Data construction

NIPA and Flow of Funds variables. I follow Smets and Wouters (2007) in constructing the variables of the baseline model, except for allocating durable consumption goods to investment rather than consumption expenditure. Specifically:

$y_t = \dfrac{(\text{nominal GDP: NIPA Table 1.1.5Q, Line 1})_t}{(\text{Population above 16: FRED CNP16OV})_t \times (\text{GDP deflator: NIPA Table 1.1.9Q, Line 1})_t}$

$c_t = \dfrac{(\text{nominal PCE on nondurables and services: NIPA Table 1.1.5Q, Lines 5+6})_t}{(\text{Population above 16: FRED CNP16OV})_t \times (\text{GDP deflator: NIPA Table 1.1.9Q, Line 1})_t}$

$i_t = \dfrac{(\text{Durables PCE and fixed investment: NIPA Table 1.1.5Q, Lines 4+8})_t}{(\text{Population above 16: FRED CNP16OV})_t \times (\text{GDP deflator: NIPA Table 1.1.9Q, Line 1})_t}$

$\pi_t = \Delta \ln(\text{GDP deflator: NIPA Table 1.1.9Q, Line 1})_t$

$r_t = \begin{cases} \frac{1}{4}(\text{Effective Federal Funds Rate: FRED FEDFUNDS})_t & t \geq 1954\text{:Q3} \\ \frac{1}{4}(\text{3-Month Treasury Bill: FRED TB3MS})_t & \text{else} \end{cases}$

$n_t = \dfrac{(\text{Nonfarm business hours worked: BLS PRS85006033})_t}{(\text{Population above 16: FRED CNP16OV})_t}$

$w_t = \dfrac{(\text{Nonfarm business hourly compensation: BLS PRS85006103})_t}{(\text{GDP deflator: NIPA Table 1.1.9Q, Line 1})_t}$

$k_t = (1 - 0.015)k_{t-1} + \dfrac{fief_t}{(\text{Population above 16: FRED CNP16OV})_t}$, where

$fief_t = \omega \dfrac{(\text{nominal fixed investment: NIPA Table 1.1.5Q, Line 8})_t}{(\text{Implicit price deflator fixed investment: NIPA Table 1.1.9Q, Line 8})_t} + (1 - \omega) \dfrac{(\text{nominal durable goods: NIPA Table 1.1.5Q, Line 4})_t}{(\text{Implicit price deflator durable goods: NIPA Table 1.1.9Q, Line 4})_t}$

and $\omega$ is the average nominal share of fixed investment in the sum of fixed investment and durables. When using the alternative definition of hours worked from Francis and Ramey (2009), I compute:

$n_t^{FR} = \dfrac{(\text{Total hours worked: Francis and Ramey (2009)})_t}{(\text{Population above 16: FRED CNP16OV})_t}$
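A minimal sketch of these transformations, assuming the raw series have already been loaded as arrays (the NIPA/FRED names above are data sources, not an API; everything below is illustrative):

```python
# Illustrative sketch of the real per-capita transformation and the capital
# accumulation identity k_t = (1 - 0.015) k_{t-1} + fief_t (per capita);
# inputs are toy arrays, not actual NIPA/FRED data.
import numpy as np

def real_per_capita(nominal, population, deflator):
    """Deflate a nominal series and express it per person above 16."""
    return nominal / (population * deflator)

def capital_series(fief, k0, delta=0.015):
    """Iterate k_t = (1 - delta) k_{t-1} + fief_t forward from k0."""
    k = np.empty(len(fief))
    prev = k0
    for t, x in enumerate(fief):
        prev = (1 - delta) * prev + x
        k[t] = prev
    return k

fief = np.array([1.0, 1.0, 1.0])  # toy per-capita real investment flow
k = capital_series(fief, k0=10.0)
```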

Fiscal data are computed following Leeper et al. (2010), except for adding state and local governments (superscript "s&l") to the federal government account (superscript "f"), similar to Fernandez-Villaverde et al. (2015):

$\tau_t^c = \dfrac{(\text{production \& imports taxes: Table 3.2, Line 4})_t^f + (\text{Sales taxes})_t^{s\&l}}{((\text{Durables PCE})_t + c_t) \times (\text{GDP deflator})_t - (\text{production \& imports taxes})_t^f - (\text{Sales taxes})_t^{s\&l}}$

$\tau_t^p = \dfrac{(\text{Personal current taxes})_t}{\frac{1}{2}(\text{Proprietors' income})_t + (\text{wage income})_t + (\text{wage supplements})_t + (\text{capital income})_t}$

$\tau_t^n = \dfrac{\tau_t^p \left(\frac{1}{2}(\text{Proprietors' income})_t + (\text{wage income})_t + (\text{wage supplements})_t\right) + (\text{wage taxes})_t^f}{(\text{wage income})_t + (\text{wage supplements})_t + (\text{wage taxes})_t^f + \frac{1}{2}(\text{Proprietors' income})_t}$

$\tau_t^k = \dfrac{\tau_t^p (\text{capital income})_t + (\text{corporate taxes})_t^f + (\text{corporate taxes})_t^{s\&l}}{(\text{Capital income})_t + (\text{Property taxes})_t^{s\&l}}$

where the following NIPA sources were used:
• (Federal) production & imports taxes: Table 3.2Q, Line 4
• (State and local) sales taxes: Table 3.3Q, Line 7
• (Federal) personal current taxes: Table 3.2Q, Line 3
• (State and local) personal current taxes: Table 3.3Q, Line 3
• (Federal) taxes on corporate income minus profits of Federal Reserve banks: Table 3.2Q, Line 7 − Line 8
• (State and local) taxes on corporate income: Table 3.3Q, Line 10
• (Federal) wage tax (employer contributions for government social insurance): Table 1.12Q, Line 8
• Proprietors' income: Table 1.12Q, Line 9
• Wage income (wages and salaries): Table 1.12Q, Line 3
• Wage supplements (employer contributions for employee pension and insurance): Table 1.12Q, Line 7
• Capital income = sum of rental income of persons with CCAdj (Line 12), corporate profits (Line 13), and net interest and miscellaneous payments (Line 18, all Table 1.12Q)

Note that the tax base for consumption taxes includes consumer durables, but to be consistent with the tax base in the model, the tax revenue is computed with the narrower tax base excluding consumer durables:

$(\text{rev})_t^c = \tau_t^c \times \left(c_t \times (\text{Population above 16})_t \times (\text{GDP deflator})_t - (\text{Taxes on production and imports})_t^f - (\text{Sales taxes})_t^{s\&l}\right)$

$(\text{rev})_t^n = \tau_t^n \times \left((\text{wage income})_t + (\text{wage supplements})_t + (\text{wage taxes})_t^f + \tfrac{1}{2}(\text{Proprietors' income})_t\right)$

$(\text{rev})_t^k = \tau_t^k \times \left((\text{Capital income})_t + (\text{Property taxes})_t^{s\&l}\right)$

I construct government debt as the cumulative net borrowing of the consolidated NIPA government sector and adjust the level of debt to match the value of consolidated government FoF debt at par value in 1950:Q1. A minor complication arises because federal net purchases of nonproduced assets (NIPA Table 3.2Q, Line 43) are missing prior to 1959:Q3. Since these purchases typically amount to less than 1% of federal government expenditures, with a minimum of -1.1%, a maximum of 0.76%, and a median of 0.4% from 1959:Q3 to 1969:Q3, two alternative treatments of the missing data lead to virtually unchanged implications for government debt. First, I impute the data by imposing that the ratio of net purchases of nonproduced assets to the remaining federal expenditure is the same for all quarters from 1959:Q3 to 1969:Q4. Second, I treat the missing data as zero. In 2012 the FoF data on long-term municipal debt was revised up. The revision covers all quarters since 2004 but not before, implying a jump in the debt time series.[51] I splice together a new smooth series from the data before and after 2004 by imposing that the growth of municipal debt from 2003:Q4 to 2004:Q1 was the same before and after the revision. This shifts up the municipal and consolidated debt levels prior to 2004. The revision in 2004 amounts to $840bn, or 6.8% of GDP.

[51] www.bondbuyer.com/issues/121 84/holders-municipal-debt-1039214-1.html, "Data Show Changes in Muni Buying Patterns" by Robert Slavin, 05/01/2012 (retrieved 01/24/2014).
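The splicing rule can be sketched as follows; the series values and the seam growth rate here are illustrative assumptions, not the actual FoF data.

```python
# Hedged sketch of splicing a pre-revision segment onto a revised segment:
# scale the old segment so that growth across the seam equals the imposed
# (pre-revision) growth rate, keeping the old segment's growth rates intact.
import numpy as np

def splice(pre, post, growth):
    """Rescale `pre` so pre[-1] * growth == post[0], then concatenate."""
    factor = post[0] / (growth * pre[-1])
    return np.concatenate([pre * factor, post])

pre = np.array([90.0, 95.0, 100.0])  # toy old-basis levels up to the seam
post = np.array([210.0, 215.0])      # toy revised levels from the seam on
growth = 1.02                        # assumed growth across the seam
spliced = splice(pre, post, growth)
```

Multiplying the whole pre-revision segment by one factor shifts its level without changing any of its quarter-on-quarter growth rates, which is the property the splicing in the text relies on.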


Measured expectations and shock proxies. To control for fiscal foresight, I compile two series from the Greenbook on four-quarter-ahead federal purchases of goods and services and revenue growth. To match the Greenbook data to quarters, I use the Greenbook before, but closest to, the middle of the second month of each quarter. This broadly matches the timing of the SPF that underlies the short-run data in Ramey (2011). It also allows me to use already digitized data on price deflators from the Real Time Data Center website at the Federal Reserve Bank of Philadelphia. Missing data is unproblematic for the defense spending forecast errors, but would be more challenging to handle in a VAR. From 1966:Q1 to 1973:Q2, some observations on three- and four-quarter-ahead forecasts of government purchases and revenue are missing. In these cases, I impute them based on current and up to two-quarter-ahead revenue and government spending. For revenue forecasts, I additionally use Greenbook real GDP growth forecasts. I treat the imputed data as the actual data. The above data are combined with data from Mertens and Ravn (2013) on narrative tax shock measures and new data on defense spending and monetary policy shocks constructed in the spirit of the data provided by Ramey (2011) on short-term defense spending shocks and the monetary policy shock proxy in Romer and Romer (2004). For updating the instruments, I also use Greenbook data to update the shock series from Romer and Romer (2004). After their sample ends in 1996, I use the change in the Federal Funds Target Rate (DFEDTAR in the FRED database) to compute the desired change in the FFR. As in Romer and Romer (2004), I then construct the shock measure as the residual from a regression of the change in the target at an FOMC meeting on the prevailing level of the funds rate, unemployment, plus levels and changes of current and future real GDP growth and inflation.
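The residual regression can be sketched as follows; the regressor matrix is an illustrative placeholder for the Romer and Romer (2004) information set (funds rate level, unemployment, and forecast levels and changes of real GDP growth and inflation), not the actual data.

```python
# Sketch of a Romer and Romer (2004)-style shock construction: regress the
# target-rate change at each meeting on the information set, keep residuals.
import numpy as np

rng = np.random.default_rng(2)

def policy_shocks(d_target, X):
    """OLS residuals of the target change on X (with a constant)."""
    Z = np.column_stack([np.ones(len(d_target)), X])
    beta, *_ = np.linalg.lstsq(Z, d_target, rcond=None)
    return d_target - Z @ beta

meetings = 80
X = rng.normal(size=(meetings, 4))  # placeholder regressors, e.g. funds
                                    # rate, unemployment, forecasts
d_target = X @ np.array([0.3, -0.2, 0.4, 0.5]) + rng.normal(scale=0.1, size=meetings)
shocks = policy_shocks(d_target, X)  # the proxy series
```

By construction the residual is orthogonal to the regressors, so the resulting proxy contains only the part of the target change not explained by the Fed's own information set.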
I construct inflation as the difference between the Greenbook forecasts for nominal and real GDP growth. The right panels in Figure C.4 compare my updated series with the Romer and Romer (2004) series. The correlation is 0.93 over the entire sample period with observed shocks. Ramey (2011) provides one-quarter-ahead forecast errors from the Survey of Professional Forecasters (SPF) for defense spending. This series runs from 1967 to 1982. The Greenbook, in contrast, provides forecasts for defense spending on a quarterly basis since 1967. I construct the defense spending forecast error as the forecast error in the implied real defense spending growth: $E_t^{GB}[g_{Def,t+1}^n - \pi_{t+1}] - (g_{Def,t}^n - \pi_t)$. The left panels in Figure C.4 compare my updated series with the SPF series. The correlation is 0.84 over the entire sample period with observed shocks.
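The forecast-error definition above can be computed directly; the arrays below are illustrative, not actual Greenbook data.

```python
# Sketch of the defense-spending forecast error defined in the text:
# E_t[g_{t+1} - pi_{t+1}] - (g_t - pi_t), with toy inputs.
import numpy as np

def real_growth_forecast_error(g_fore, pi_fore, g_t, pi_t):
    """Forecast of next-quarter real growth minus current real growth."""
    return (g_fore - pi_fore) - (g_t - pi_t)

g_fore = np.array([3.0, 2.5])   # toy E_t[g_{Def,t+1}] (nominal growth, %)
pi_fore = np.array([2.0, 2.0])  # toy E_t[pi_{t+1}]
g_t = np.array([4.0, 1.5])      # toy current nominal growth
pi_t = np.array([2.5, 2.0])     # toy current inflation
fe = real_growth_forecast_error(g_fore, pi_fore, g_t, pi_t)  # → [-0.5, 1.0]
```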


[Figure C.4 panels: time series comparisons and scatter plots of the government spending proxy (defense spending forecast error, SPF vs. Greenbook, correlation = 0.84) and the monetary policy proxy (Romer & Romer (2004) vs. update, correlation = 0.93).]

Figure C.4: Comparing shock proxies in the literature with their updated counterparts


C.2 Approximation quality of VAR representation of DSGE model

[Figure C.5 panels: responses of G, the tax rate, debt, output, the FFR, inflation, and investment to the G, tax, and FFR shocks; DSGE-VAR vs. pure DSGE.]

The IRFs implied by the posterior over θ largely coincide regardless of whether I use the VAR(4) approximation or the state-space representation of the DSGE model. Note: Shown are the pointwise median and 68% and 90% posterior credible sets. Results are based on the lower Cholesky factorization of $S_1 S_1'$.

Figure C.5: Responses of output, investment, and inflation: Quality of VAR approximation to DSGE model ($T_0^V = T_0^B \nearrow \infty$)


[Figure C.6 panels: historical G, tax, and FFR shocks (in standard deviations, 1950–2007), DSGE-VAR vs. pure DSGE; G shock correlation = 1.00 (1.00, 1.00), tax shock correlation = 0.99 (0.96, 1.00), FFR shock correlation = 1.00 (0.98, 1.00).]

The historical shocks implied by the posterior over θ largely coincide regardless of whether I use the VAR(4) approximation or the state-space representation of the DSGE model. Note: Shown are the pointwise median and 68% and 90% posterior credible sets of shocks, along with the median, 5th, and 95th percentiles of the shock correlations. Results are based on the lower Cholesky factorization of $S_1 S_1'$.

Figure C.6: Historical policy shocks: Quality of VAR approximation to DSGE model ($T_0^V = T_0^B \nearrow \infty$)


[Figure C.7 panels: (a) G shock, (b) FFR shock, and (c) separate estimate with surprise tax shock; responses of G, expected G, the tax rate, output, expected output, the FFR, inflation, debt, and capital; DSGE-VAR vs. pure DSGE.]

Figure C.7: Responses in expectations-augmented DSGE-VAR: Quality of VAR approximation


C.3 Additional results

[Figure C.8 plot: marginal data density (roughly -2480 to -2410) against $T_0^V/T \in \{0.2, 0.5, 1\}$ for three specifications: no cost channel, with cost channel, and Taylor rule with output gap.]

Figure C.8: Robustness of marginal likelihood for varying DSGE model weights: Model specification

[Figure C.9 panels: responses of the debt-to-output ratio and of output to the G, tax, and FFR shocks; DSGE-VAR vs. pure DSGE.]

Note: Shown are the pointwise median and 68% and 90% posterior credible sets. Results are based on the lower Cholesky factorization of $S_1 S_1'$.

Figure C.9: Response of the debt-to-output ratio to the identified policy shocks


[Figure C.10 panels: private output multipliers to the G shock and to the tax shock over 20 quarters; DSGE-VAR vs. pure DSGE.]

Note: Shown are the pointwise median and 68% and 90% posterior credible sets. Results are based on the lower Cholesky factorization of $S_1 S_1'$.

Figure C.10: Private output multipliers: Best-fitting model ($T_0^V = \frac{1}{5}T$, $T_0^B = 4T$).

[Figure C.11 panels: responses of the FFR, inflation, and private output to the FFR shock under (a) IV from the literature, (b) updated IV, (c) post-1966 sample, and (d) post-1966 sample with observed expectations; posterior median with 68% and 90% posterior credible sets.]

Figure C.11: Effects of the information set and sample period: response of output and inflation to Federal funds rate.


[Figure C.12 panels: responses of the FFR, private output, and inflation to the FFR shock with additional controls: (a) $E_t[\hat y_{t+4}]$, $E_t[\hat g_{t+4}]$; (b) $E_t[\hat y_{t+4}]$, $E_t[\hat g_{t+4}]$, $E_t[\hat\pi_{t+4}]$; (c) same Minnesota prior; (d) bond spreads, oil price; (e) same Minnesota prior; posterior median with 68% and 90% posterior credible sets.]

Figure C.12: Effects of additional controls on the responses of output and inflation to the Federal funds rate: Post-1966 sample.

[Figure C.13 panels: estimated historical tax shocks (in standard deviations), DSGE-VAR vs. pure DSGE; subsample correlations: 1949:Q2–1969:Q4 = 0.60, 1970:Q1–1989:Q4 = 0.73, 1990:Q1–2007:Q4 = 0.58; full sample = 0.51 (0.23, 0.76).]

Estimated historical shocks. Note: Shown are the pointwise median and 68% and 90% posterior credible sets of shocks, along with the median, 5th, and 95th percentiles of the shock correlations for the full sample. Results are based on the lower Cholesky factorization of $S_1 S_1'$.

Figure C.13: Historical tax shocks: Best-fitting model ($T_0^V = \frac{1}{5}T$, $T_0^B = 4T$).


Weak prior: $T_0^V = T_0^B = \frac{1}{5}T$. Best-fitting model: $T_0^V = \frac{1}{5}T$, $T_0^B = 4T$. Stronger prior: $T_0^V = T$, $T_0^B = 4T$. Entries are Mean (SD).

Parameter              Prior          Weak prior     Best-fitting   Stronger prior
st.dev.-TFP            1.000 (2.000)  1.036 (0.209)  1.193 (0.199)  1.341 (0.204)
AR(1)-TFP              0.500 (0.200)  0.343 (0.167)  0.320 (0.135)  0.269 (0.118)
st.dev.-Transf.        1.000 (2.000)  0.726 (0.109)  1.189 (0.316)  1.603 (0.176)
smoothing-Transf.      0.500 (0.200)  0.860 (0.186)  0.417 (0.198)  0.292 (0.114)
st.dev.-G              1.000 (2.000)  0.819 (0.115)  0.966 (0.164)  1.196 (0.080)
AR(1)-G                0.500 (0.200)  0.894 (0.072)  0.577 (0.075)  0.991 (0.002)
st.dev.-Tax            0.500 (0.100)  0.339 (0.030)  0.332 (0.029)  0.323 (0.020)
AR(1)-Tax              0.500 (0.200)  0.500 (0.174)  0.160 (0.100)  0.137 (0.077)
st.dev.-qs             1.000 (2.000)  0.555 (0.118)  0.776 (0.137)  0.782 (0.118)
AR(1)-qs               0.500 (0.200)  0.792 (0.125)  0.425 (0.133)  0.476 (0.113)
st.dev.-FFR            1.000 (2.000)  0.194 (0.022)  0.168 (0.021)  0.196 (0.013)
AR(1)-FFR              0.500 (0.200)  0.337 (0.156)  0.238 (0.096)  0.189 (0.082)
st.dev.-Infl.          1.000 (2.000)  0.162 (0.034)  0.171 (0.027)  0.186 (0.026)
AR(1)-Infl.            0.500 (0.200)  0.344 (0.164)  0.212 (0.104)  0.216 (0.099)
Adj. cost              4.000 (1.500)  4.735 (1.305)  4.922 (1.308)  6.301 (1.165)
Util. cost             0.500 (0.150)  0.405 (0.133)  0.523 (0.154)  0.435 (0.127)
Fixed cost             1.250 (0.125)  1.352 (0.101)  1.352 (0.106)  1.423 (0.111)
Habit                  0.700 (0.100)  0.735 (0.058)  0.593 (0.074)  0.635 (0.066)
Labor supply ela.      2.000 (0.750)  2.125 (0.715)  3.280 (1.045)  3.405 (1.191)
Calvo prices           0.500 (0.100)  0.700 (0.066)  0.732 (0.051)  0.815 (0.051)
Index. prices          0.500 (0.150)  0.433 (0.163)  0.244 (0.097)  0.236 (0.092)
Calvo wages            0.500 (0.100)  0.684 (0.087)  0.655 (0.084)  0.682 (0.078)
Index. wages           0.500 (0.150)  0.498 (0.148)  0.504 (0.144)  0.521 (0.159)
Taylor-Infl.           1.500 (0.250)  1.606 (0.174)  1.602 (0.215)  1.283 (0.229)
Taylor-GDP             0.125 (0.050)  0.124 (0.046)  0.126 (0.048)  0.164 (0.052)
smoothing-FFR          0.750 (0.100)  0.856 (0.034)  0.815 (0.035)  0.825 (0.029)
G-to-GDP               0.500 (0.500)  0.398 (0.425)  0.439 (0.570)  0.345 (0.336)
G-to-Debt              1.000 (0.500)  0.779 (0.404)  0.756 (0.402)  0.383 (0.191)
smoothing-G            0.500 (0.200)  0.952 (0.046)  0.998 (0.001)  0.723 (0.064)
Tax-to-GDP             1.000 (0.500)  0.649 (0.472)  0.235 (0.551)  0.407 (0.455)
Tax-to-Debt            1.000 (0.500)  0.402 (0.231)  0.226 (0.215)  0.389 (0.359)
smoothing-Tax          0.500 (0.200)  0.969 (0.032)  0.853 (0.140)  0.939 (0.057)
Transf.-to-GDP         0.000 (0.500)  0.354 (0.501)  0.571 (0.350)  0.204 (0.270)
Transf.-to-Debt        1.000 (0.500)  1.161 (0.505)  0.942 (0.604)  1.208 (0.278)
Rel. st. dev. IV-G     1.000 (0.500)  1.953 (0.364)  1.755 (0.354)  1.449 (0.133)
Rel. st. dev. IV-Tax   1.000 (0.500)  0.628 (0.094)  0.674 (0.095)  0.703 (0.062)
Rel. st. dev. IV-FFR   1.000 (0.500)  1.730 (0.341)  2.304 (0.439)  2.118 (0.198)
Loading IV-G           1.000 (0.500)  0.981 (0.277)  0.774 (0.192)  0.611 (0.108)
Loading IV-Tax         1.000 (0.500)  0.429 (0.077)  0.383 (0.068)  0.291 (0.041)
Loading IV-FFR         1.000 (0.500)  1.753 (0.380)  1.335 (0.382)  1.364 (0.210)

Note: Shown are the prior and posterior mean and standard deviation for the DSGE model parameters. Parameters that determine the shock processes are shown at the top, policy rule estimates in the middle, and the observation equations for instruments at the bottom. Overall, the posteriors differ significantly from the priors, consistent with identification of the parameters.

Table C.1: DSGE-VAR model parameter estimates with varying priors


Weak prior: $T_0^V = T_0^B = \frac{1}{4}T$. Best-fitting model: $T_0^V = \frac{1}{4}T$, $T_0^B = 7.5T$. Entries are Mean (SD).

Parameter              Prior          Weak prior      Best-fitting
st.dev.-TFP            1.000 (2.000)  0.322 (0.067)   0.565 (0.069)
AR(1)-TFP              0.500 (0.200)  0.740 (0.046)   0.828 (0.031)
st.dev.-Transf.        1.000 (2.000)  0.782 (0.100)   0.681 (0.078)
smoothing-Transf.      0.500 (0.200)  0.949 (0.024)   0.710 (0.073)
st.dev.-G              1.000 (2.000)  0.802 (0.143)   0.713 (0.098)
AR(1)-G                0.500 (0.200)  0.779 (0.088)   0.846 (0.046)
st.dev.-f-G            1.000 (2.000)  2.228 (0.443)   0.721 (0.101)
st.dev.-f-TFP          1.000 (2.000)  0.832 (0.167)   0.555 (0.068)
st.dev.-Tax            0.500 (0.100)  0.354 (0.034)   0.357 (0.033)
AR(1)-Tax              0.500 (0.200)  0.402 (0.171)   0.182 (0.101)
st.dev.-qs             1.000 (2.000)  1.570 (0.360)   1.020 (0.181)
AR(1)-qs               0.500 (0.200)  0.861 (0.050)   0.512 (0.102)
st.dev.-FFR            1.000 (2.000)  0.184 (0.023)   0.153 (0.017)
AR(1)-FFR              0.500 (0.200)  0.203 (0.102)   0.183 (0.100)
st.dev.-Infl.          1.000 (2.000)  0.182 (0.026)   0.186 (0.024)
AR(1)-Infl.            0.500 (0.200)  0.152 (0.078)   0.170 (0.078)
Adj. cost              4.000 (1.500)  0.766 (0.190)   0.595 (0.148)
Util. cost             0.500 (0.150)  0.958 (0.006)   0.933 (0.010)
Fixed cost             1.250 (0.125)  1.472 (0.100)   1.658 (0.103)
Habit                  0.700 (0.100)  0.477 (0.071)   0.461 (0.062)
Labor supply ela.      2.000 (0.750)  1.490 (0.446)   2.707 (1.602)
Calvo prices           0.500 (0.100)  0.408 (0.082)   0.422 (0.052)
Index. prices          0.500 (0.150)  0.826 (0.073)   0.340 (0.134)
Calvo wages            0.500 (0.100)  0.884 (0.044)   0.757 (0.134)
Index. wages           0.500 (0.150)  0.566 (0.148)   0.574 (0.122)
Taylor-Infl.           1.500 (0.250)  1.508 (0.192)   1.631 (0.278)
Taylor-GDP             0.125 (0.050)  0.103 (0.034)   -0.040 (0.019)
smoothing-FFR          0.750 (0.100)  0.828 (0.033)   0.724 (0.047)
G-to-GDP               0.500 (0.500)  0.972 (0.266)   0.534 (0.132)
G-to-Debt              1.000 (0.500)  0.760 (0.189)   0.245 (0.072)
smoothing-G            0.500 (0.200)  0.188 (0.079)   0.123 (0.062)
cpsi-f-G-GDP           0.500 (0.500)  0.272 (0.480)   -0.157 (0.476)
cpsi-f-G-Debt          1.000 (0.500)  0.470 (0.243)   0.426 (0.243)
Tax-to-GDP             1.000 (0.500)  0.897 (0.372)   0.441 (0.179)
Tax-to-Debt            1.000 (0.500)  0.316 (0.248)   0.057 (0.032)
smoothing-Tax          0.500 (0.200)  0.869 (0.105)   0.557 (0.139)
Transf.-to-GDP         0.000 (0.500)  0.498 (0.303)   1.386 (0.179)
Transf.-to-Debt        1.000 (0.500)  1.088 (0.389)   0.230 (0.129)
Rel. st. dev. IV-G     1.000 (0.500)  1.838 (0.473)   2.212 (0.326)
Rel. st. dev. IV-FFR   1.000 (0.500)  1.770 (0.492)   3.009 (0.415)
Loading IV-G           1.000 (0.500)  0.995 (0.271)   0.890 (0.242)
Loading IV-FFR         1.000 (0.500)  1.946 (0.531)   0.889 (0.218)

Note: Shown are the prior and posterior mean and standard deviation for the DSGE model parameters. Parameters that determine the shock processes are shown at the top, policy rule estimates in the middle, and the observation equations for instruments at the bottom. Overall, the posteriors differ significantly from the priors, consistent with identification of the parameters.

Table C.2: DSGE-VAR model parameter estimates with varying priors in model with news shocks


[Figure C.14 panels: estimated historical FFR shocks (in standard deviations), DSGE-VAR vs. pure DSGE; subsample correlations: 1949:Q2–1969:Q4 = 0.50, 1970:Q1–1989:Q4 = 0.90, 1990:Q1–2007:Q4 = 0.75; full sample = 0.77 (0.66, 0.85).]

Estimated historical shocks. Note: Shown are the pointwise median and 68% and 90% posterior credible sets of shocks, along with the median, 5th, and 95th percentiles of the shock correlations for the full sample. Results are based on the lower Cholesky factorization of $S_1 S_1'$.

Figure C.14: Historical FFR shocks: Best-fitting model ($T_0^V = \frac{1}{5}T$, $T_0^B = 4T$).


[Figure C.15 panels: responses of G, private output, and the output multiplier to a G shock; columns: baseline model (flat prior; best-fitting model $T_0^B = 4T$, $T_0^V = \frac{1}{5}T$) and model with news and observed expectations (flat prior; best-fitting model $T_0^B = 7.5T$, $T_0^V = \frac{1}{4}T$); posterior median with 68% and 90% posterior credible sets.]

Figure C.15: Comparison of VAR and DSGE-VAR estimated with and without expectations: Response to a government spending shock.


[Figure C.16 panels: responses of the FFR, private output, and inflation to an FFR shock; columns: baseline model (flat prior; best-fitting model $T_0^B = 4T$, $T_0^V = \frac{1}{5}T$) and model with news and observed expectations (flat prior; best-fitting model $T_0^B = 7.5T$, $T_0^V = \frac{1}{4}T$); posterior median with 68% and 90% posterior credible sets.]

Figure C.16: Comparison of VAR and DSGE-VAR estimated with and without expectations: Response to a monetary policy shock.


C.4 Gibbs sampler

To calibrate the Gibbs sampler, I examine the autocorrelation functions and Brooks and Gelman (1998)-type convergence statistics of all model parameters within Markov chains. See Figures C.18 and C.19 for the flat-prior VAR and the DSGE-VAR, respectively. If the distributions differ visibly for different parts of the sample, I increase the number of draws. Similarly, I compute the autocorrelation of the maximum eigenvalue of the stacked VAR(1) representation of (2.2), as well as of the Frobenius norm of V and the log-likelihood. Figure C.17 shows the corresponding plots. I discard the first 50,000 draws and keep every 20th draw (every 10th draw with the flat prior), for a total retained sample of 5,000 for the DSGE-VAR and 2,000 for the flat-prior VAR. This produces results consistent with convergence of the sampler (see Figures C.18 and C.19). The resulting samples are also reasonably efficient: the autocorrelations of the subsamples in Figure C.17 are reasonably small, particularly with low prior weights on the DSGE model.

[Figure C.17 panels: autocorrelation functions (Pearson and Spearman) of the maximum eigenvalue of the companion form of B and of the maximum eigenvalue of V (= ||V||), for the flat prior, the weak prior ($T^V = T^B = \frac{1}{5}T$), and the best-fitting model ($T^V = \frac{1}{5}T$, $T^B = 4T$).]

Note: Autocorrelations are reported based on both the Pearson and the Spearman correlation measures. Asymptotic classical 90% credible sets for the Pearson coefficient, computed under the assumption of zero correlation, are included around the horizontal axis. The autocorrelations are based on the thinned sample after keeping every 20th draw with the informative prior and every 10th draw with the flat prior. The resulting sample is reasonably efficient also with a larger prior weight on the DSGE model.

Figure C.17: Gibbs-Sampler of baseline model: Autocorrelation functions of univariate summary statistics by DSGE prior weight
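The two summary statistics tracked in Figure C.17 can be computed as in the following sketch (a minimal Python illustration; the function names and the generic inputs are mine, not from the paper):

```python
import numpy as np

def companion_max_eig(B):
    """Maximum absolute eigenvalue of the companion (stacked VAR(1)) form.

    B: (n, n*p) array stacking the lag coefficient matrices [B_1 ... B_p].
    """
    n, ncols = B.shape
    p = ncols // n
    C = np.zeros((n * p, n * p))
    C[:n, :] = B                          # first block row: lag coefficients
    C[n:, :-n] = np.eye(n * (p - 1))      # identity blocks shift the lags
    return np.abs(np.linalg.eigvals(C)).max()

def thinned_autocorr(x, max_lag=10):
    """Sample autocorrelations (lags 0..max_lag) of a scalar statistic
    evaluated at each retained (thinned) MCMC draw."""
    x = np.asarray(x, float)
    x = x - x.mean()
    denom = (x ** 2).sum()
    return np.array([1.0] + [(x[:-k] * x[k:]).sum() / denom
                             for k in range(1, max_lag + 1)])
```

In use, one would evaluate `companion_max_eig(B_draw)` and `np.linalg.norm(V_draw, 'fro')` at each retained draw and then inspect `thinned_autocorr` of the resulting series, as in the figure.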


[Figure C.18 appears here: cumulative t-stats for the elements of B (top panel) and of V (bottom panel), plotted against the number of retained draws (200-2,000).]

Shown are the (within-chain) means of the parameter estimates as the Markov chain grows. To standardize the plots, each running mean is centered and scaled by the corresponding mean and standard deviation over the first half of the chain: for element i of θ, the plot shows

$$
\frac{t^{-1}\sum_{s=1}^{t}\theta_s(i) \;-\; \lfloor T/2\rfloor^{-1}\sum_{s=1}^{\lfloor T/2\rfloor}\theta_s(i)}{\left[\lfloor T/2\rfloor^{-1}\sum_{s=1}^{\lfloor T/2\rfloor}\left(\theta_s(i) - \lfloor T/2\rfloor^{-1}\sum_{u=1}^{\lfloor T/2\rfloor}\theta_u(i)\right)^2\right]^{1/2}}
$$

as a function of t. Brooks and Gelman (1998) argue that these means should have converged for a satisfactory posterior simulation. The results above indicate that convergence is very good for both the elements of the VAR coefficient matrix B and the covariance matrix V.

Figure C.18: Brooks and Gelman (1998) type convergence diagnostic for the flat-prior narrative VAR
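The standardized running-mean statistic defined in the note above can be sketched as follows (a minimal Python illustration of the displayed formula; the array layout and names are my assumptions):

```python
import numpy as np

def cumulative_tstats(theta):
    """Standardized running means of MCMC draws, in the spirit of
    Brooks and Gelman (1998).

    theta: (T, k) array of T draws of a k-dimensional parameter vector.
    Each running mean is centered and scaled by the mean and (population)
    standard deviation computed over the first half of the chain.
    """
    T = theta.shape[0]
    half = theta[: T // 2]
    mu = half.mean(axis=0)                          # first-half mean
    sd = np.sqrt(((half - mu) ** 2).mean(axis=0))   # first-half std. dev.
    running = np.cumsum(theta, axis=0) / np.arange(1, T + 1)[:, None]
    return (running - mu) / sd
```

If the chain has converged, each column of the output should settle near zero as t grows, which is what the flat lines in the figure indicate.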


Shown are the (within-chain) means of the parameter estimates as the Markov chain grows. To standardize the plots, each running mean is centered and scaled by the corresponding mean and standard deviation over the first half of the chain: for element i of θ, the plot shows

$$
\frac{t^{-1}\sum_{s=1}^{t}\theta_s(i) \;-\; \lfloor T/2\rfloor^{-1}\sum_{s=1}^{\lfloor T/2\rfloor}\theta_s(i)}{\left[\lfloor T/2\rfloor^{-1}\sum_{s=1}^{\lfloor T/2\rfloor}\left(\theta_s(i) - \lfloor T/2\rfloor^{-1}\sum_{u=1}^{\lfloor T/2\rfloor}\theta_u(i)\right)^2\right]^{1/2}}
$$

as a function of t. Brooks and Gelman (1998) argue that these means should have converged for a satisfactory posterior simulation. The results above indicate that convergence is best for the elements of the VAR coefficient matrix B and almost as good for the elements of the covariance matrix V. Some structural parameter draws appear to settle down only after about 4,000 draws.

Figure C.19: Brooks and Gelman (1998) type convergence diagnostic for the DSGE-VAR with the best-fitting model (T_0^B = 4T, T_0^V = (1/5)T)

