Quasi-Bayesian Model Selection∗ Atsushi Inoue



Vanderbilt University

Mototsugu Shintani‡ The University of Tokyo

July 2017

Abstract In this paper we establish the consistency of the model selection criterion based on the quasi-marginal likelihood obtained from Laplace-type estimators. We consider cases in which parameters are strongly identified, weakly identified and partially identified. Our Monte Carlo results confirm our consistency results. Our proposed procedure is applied to select among New Keynesian macroeconomic models using US data.



We thank Frank Schorfheide and two anonymous referees for constructive comments and suggestions. We

thank Matias Cattaneo, Larry Christiano, Yasufumi Gemma, Kengo Kato, Lutz Kilian, Takushi Kurozumi, Jae-Young Kim, Vadim Marmer, for helpful discussions and Mathias Trabandt for providing the data and code. We also thank the seminar and conference participants for helpful comments at the Bank of Canada, Gakushuin University, Hitotsubashi University, Kyoto University, Texas A&M University, University of Tokyo, University of Michigan, Vanderbilt University, 2014 Asian Meeting of the Econometric Society and the FRB Philadelphia/NBER Workshop on Methods and Applications for DSGE Models. Shintani gratefully acknowledges the financial support of Grant-in-aid for Scientific Research. † Department of Economics, Vanderbilt University, 2301 Vanderbilt Place, Nashville, TN 37235. Email: [email protected]. ‡ RCAST, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8904, Japan. [email protected].

Email:

1

Introduction

Thanks to the development of fast computers and accessible software packages, Bayesian methods are now commonly used in the estimation of macroeconomic models. Bayesian estimators get around numerically intractable and ill-shaped likelihood functions, to which maximum likelihood estimators tend to succumb, by incorporating economically meaningful prior information. In a recent paper, Christiano, Trabandt and Walentin (2011) propose a new method of estimating a standard macroeconomic model based on the criterion function of the impulse response function (IRF) matching estimator of Christiano, Eichenbaum and Evans (2005) combined with prior density. Instead of relying on a correctly specified likelihood function, they define an approximate likelihood function and proceed with a random walk Metropolis-Hastings algorithm. Chernozhukov and Hong (2003) establish that such an approach has a frequentist justification in a more general framework and call it a Laplace-type estimator (LTE) or quasi-Bayesian estimator.1 The quasi-Bayesian approach does not require the complete specification of likelihood functions and may be robust to potential misspecification. Other applications of LTEs to estimate macroeconomic models include Christiano, Eichenbaum and Trabandt (2013), Kormilitsina and Nekipelov (2013) and Gemma, Kurozumi and Shintani (2017). When two or more competing models are available, it is of great interest to select one model for policy analysis. When competing models are estimated by Bayesian methods, the models are often compared by their marginal likelihood. Likewise, it is quite intuitive to compare models estimated by LTE using the “marginal likelihood” obtained from the LTE criterion function. In fact, Christiano, Eichenbaum and Trabandt (2013, Table 4) report the marginal likelihoods from LTE when they compare the performance of their macroeconomic model of wage bargaining with that of a standard labor search model. In this paper, we prove that such practice is asymptotically valid in that a model with a larger value of its marginal likelihood is either correct or a better approximation to true impulse responses with probability approaching one as the sample size goes to infinity. 1

The term “quasi-Bayesian” also refers to the procedure that involves data-dependent prior or multiple

priors in the Bayesian literature.

2

We consider the consistency of model selection based on the marginal likelihood in three cases: (i) parameters are all strongly identified; (ii) some parameters are weakly identified; and (iii) some model parameters are partially identified. While case (i) is standard in the model selection literature (e.g., Phillips, 1996; Sin and White, 1996), cases (ii) and (iii) are also empirically relevant because some parameters may not be strongly identified in macroeconomic models (see Canova and Sala, 2009). We consider the case of weak identification using a device that is similar to Stock and Wright (2000) and Guerron-Quintana, Inoue and Kilian (2013). We also consider the case in which parameters are set identified as in Chernozhukov, Hong and Tamer (2007) and Moon and Schorfheide (2012). Our approach allows for model misspecification and is similar in spirit to the Bayesian model selection procedure considered by Schorfheide (2000). Instead of using the marginal likelihoods (or the standard posterior odds ratio) directly, Schorfheide (2000) introduces the VAR model as a reference model in the computation of the loss function so that he can compare the performance of possibly misspecified dynamic stochastic general equilibrium (DSGE) models in the Bayesian framework. The related DSGE-VAR approach of Del Negro and Schorfheide (2004, 2009) also allows DSGE models to be misspecified, which results in a small weight on the DSGE model obtained by maximizing the marginal likelihood of the DSGE-VAR model. An advantage of our approach is that we can directly compare the (quasi-) marginal likelihoods even if all the competing DSGE models are misspecified.2 The econometric literature on comparing DSGE models include Corradi and Swanson (2007), Dridi, Guay and Renault (2007) and Hnatkovska, Marmer and Tang (2012) who propose hypothesis testing procedures to evaluate the relative performance of possibly misspecified DSGE models. We propose a model selection procedure as in Fernandez-Villaverde and Rubio-Ramirez (2004), Hong and Preston (2012) and Kim (2014). In the likelihood framework, Fernandez-Villaverde and Rubio-Ramirez (2004) and Hong and Preston (2012) consider asymptotic properties of the Bayes factor and posterior odds ratio for model comparison, respectively. In the LTE framework, Kim 2

As established in White (1982), desired asymptotic results can be often obtained even if the likelihood

function is misspecified. The quasi-Bayesian approach is also closely related to the limited-information likelihood principle used by Zellner (1998) and Kim (2002) among others.

3

(2014) shows the consistency of the quasi-marginal likelihood criterion for nested model comparison, to which Hong and Preston (2012, p.365) also allude. In a recent paper, Shin (2014) proposes a Bayesian generalized method of moments (GMM) and develops a novel method for computing the marginal likelihood. In this paper, we make general contributions in three ways. First, we show that the naive quasi-marginal likelihood model selection criterion may be inconsistent when models are not nested. This is why the existing literature, such as Kim (2014), focuses on the nested case. Second, we develop a new modified quasi-marginal likelihood model selection criterion which remains consistent when nonnested models are considered. Third, we consider cases in which some parameters are either weakly or partially identified. The weakly and partially identified cases are relevant for the estimation of DSGE models but have not been considered in the existing literature. The outline of this paper is as follows. We begin our analysis by providing a simple illustrative example of model selection in Section 2. Asymptotic justifications for the quasi-marginal likelihood model selection criterion are established in Section 3. Computational issues of the quasi-marginal likelihood are discussed in Section 4. A small set of Monte Carlo experiments is provided in Section 5. Some discussions regarding the practical implementation of our procedure are provided in Section 6. Empirical applications of our procedure to evaluate New Keynesian macroeconomic models using US data are provided in Section 7. The concluding remarks are made in Section 8. All proofs are relegated to the appendix. Throughout the paper, all the asymptotic statements are made for the case in which the sample size tends to infinity or T → ∞.

2

An Illustrative Example

There are several issues that may arise in model selection. For example, one may compare a correctly specified model and a misspecified model. Or one may compare two correctly specified models where one is more parsimonious than the other. Parameters may be all strongly identified, or some parameters may be weakly or partially identified. To motivate our proposed quasi-marginal likelihood, we illustrate these issues in a simple Monte Carlo setup and show that comparing values of estimation criterion

4

functions alone does not necessarily select the preferred model. Consider a simplified version of the model in Canova and Sala (2009): yt = Et (yt+1 ) − σ(Rt − Et (πt+1 )) + u1t ,

(1)

πt = Et (πt+1 ) + κyt + u2t ,

(2)

Rt = Et (πt+1 ) + u3t ,

(3)

where yt ,πt and Rt are output gap, inflation rate, nominal interest rate, respectively, and u1t ,u2t and u3t are independent iid standard normal random variables, which respectively represents a shock to the output Euler equation (1), New Keynesian Phillips curve (NKPC) (2), and monetary policy function (3). Et (·) = E(·|It ) is the conditional expectation operator conditional of It , the information set at time t, σ is the parameter of elasticity of intertemporal substitution, and κ is the slope of the NKPC. Because a solution is



yt





1 0

−σ



u1t

         πt  =  κ 1 −σκ   u2t     Rt 0 0 1 u3t we have covariance restrictions:

   , 

(4)

3



1 + σ2

κ + κσ 2

−σ



    Cov([yt πt Rt ]0 ) =  κ + σ 2 κ 1 + κ2 + σ 2 κ2 −σκ  .   −σ −σκ 1

(5)

To illustrate the issues of misspecification and identification, we consider six cases in each of which two models, model A and model B, are compared. In the first two cases, case 1 and case 2, suppose that we use f (σ, κ) = [1 + σ 2 , κ + σ 2 κ, −σ, 1 + κ2 + σ 2 κ2 , −σκ]0 ,

(6)

and the corresponding five elements in the covariance matrix of the three observed variables, where we set σ = 1 and κ = 0.5. In these two cases, the parameters are globally and locally identified. In case 1, the two parameters are estimated in model A, while σ is estimated and the value of κ is set to a wrong parameter value, 1, in 3

While there is no unique solution to this model, we simply use a solution from Canova and Sala (2009).

This fact does not cause any problem in our minimum distance estimation exercise based on (5).

5

model B. In other words, model A is correctly specified and model B is incorrectly specified. In case 2, only one parameter (σ) is estimated and the value of κ is set to the true parameter value in model A, while the two parameters are estimated in model B. Although the two models are both correctly specified in this design, model A is more parsimonious than model B. In the next two cases, case 3 and case 4, we use f (σ, κ) = [κ + σ 2 κ, 1 + κ2 + σ 2 κ2 , −σκ]0

(7)

and the corresponding three elements of the covariance matrix are used. As κ approaches zero, the strength of identification of σ becomes weaker. We set σ = 1 and κ = 0.5. Cases 3 and 4 correspond to cases 1 and 2. In case 3, model B is incorrectly specified in that κ is set to 1. In case 4, the two models are both correctly specified and model A is more parsimonious than model B. In the last two cases, cases 5 and 6, the parameters are partially identified in that we estimate α and ζ and the restrictions depend on them only through κ = (1 − α)(1 − 0.99α)ζ/α. We use the five restrictions used in cases 1 and 2, (6), and we set σ = 1, α = 0.5 and ζ = 1 so that κ ≈ 0.5 as in case 1. In case 5, two parameters, α and ζ, are estimated in model A, while the value of σ is set to the correct value, 1, in model A and is set to an incorrect value, 0.5, in model B. In case 6, only α and ζ are estimated while the value of σ is set to the true value in model A, whereas the three parameters are all estimated in model B. Note that in each of the six designs, model A is always preferred to model B because model A is correctly specified in cases 1, 3 and 5 and is more parsimonious in cases 2, 4 and 6. Table 1 summarizes the six cases as well as the parameter values used in the Monte Carlo simulation experiments. Suppose that we employ a classical minimum distance (CMD) estimator and choose the model with a smaller estimation criterion function. Table 2 shows the frequencies of selecting the right model (model A) when one selects a model with a smaller value of the minimized estimation criterion function. The number of Monte Carlo replications is 1,000 and the sample sizes are 50, 100 and 200. The column labeled “Diagonal” indicates the selection probabilities when the diagonal weighting matrix whose diagonal elements are the reciprocals of the bootstrap variances of the sample analogs of the

6

restrictions. The column labeled “Optimal” indicates those when the weighting matrix is the inverse of the bootstrap covariance matrix of the sample analog of the restrictions. This table shows that although this intuitive procedure tends to select the correctly specified models over the incorrectly specified models in designs 1, 3 and 5, it is likely to select an overparameterized model if the two models have equal explanatory power as in population as in cases 2, 4 and 6. Our proposed quasi-marginal likelihood model selection criteria overcomes this issue as formally shown in the next section.

3

Asymptotic Theory

3.1

Quasi-Marginal Likelihood for Extremum Estimators

First, we propose quasi-marginal likelihood model selection criteria in the general framework. Following Chernozhukov and Hong (2003), define the quasi-posterior by R

e−T qˆA,T (α) πA (α) , −T qˆA,T (α) π (α)dα A Ae

(8)

where πA (α) is the prior probability density function and qˆA,T (α) is the estimation criterion function that would be minimized in the conventional frequentist estimation method. By treating (8) as the posterior, their LTE (e.g., quasi-posterior mean, median and mode) is obtained via Markov Chain Monte Carlo (MCMC) and is particularly useful when the criterion function qˆA,T (α) is not numerically tractable or when extremum estimates are not reasonable. We propose quasi-marginal likelihoods for selecting a model and establish the consistency of the model selection based on QMLs. Define the quasi-marginal likelihood for model A by Z mA = e−T qˆA,T (α) πA (α)dα.

(9)

A

Similarly, define the quasi-marginal likelihood for model B by Z mB = e−T qˆB,T (β) πB (β)dβ.

(10)

B

Let qA (α) and qB (β) denote the population analog of qˆA,T (α) and qˆB,T (β), and define the (possibly pseudo) true parameter values of α and β by α0 = argminα∈A qA (α) and β0 = argminβ∈B qB (β), respectively.

7

We say that the quasi-marginal likelihood model selection criterion is consistent if the following property holds: mA > mB with probability approaching one if qA (α0 ) < qB (β0 ) or if qA (α0 ) = qB (β0 ) and pA < pB . For example, if two models are nested and some parameter values are fixed at their true value in one of the models, qA (α0 ) = qB (β0 ) holds and the model with the fixed parameter value is preferable because it reduces parameter estimation uncertainty. This definition is common in the literature on selection of parametric models (see Leeb and P¨otscher, 2009; Nishii, 1988; and Inoue and Kilian, 2006 to name a few), and the model selection criteria, such as those in Nishii (1988), Sin and White (1996) and Hong and Preston (2012), are designed to be consistent in this sense.

4

Hnatkovska, Marmer and Tang (2012) develop a Vuong-type quasi-likelihood ratio test for comparing macroeconomic models estimated by CMD estimators, i.e., qA (α) = (γ − f (α))0 WA (γ − f (α)) where γ is a vector of impulse responses obtained from a structural VAR model and f (α) is a vector of impulse responses obtained from a DSGE model given structural parameter value α. They consider cases of (i) nested, (ii) strictly non-nested and (iii) overlapping models. Let F = {γ ∈
= {γ ∈
definition, we say that models A and B are nested if F ⊂ G or G ⊂ F, strictly nonnested if F ∩ G = ∅ and overlapping if they are neither nested or strictly nonnested. When the models are equally (in)correctly specified in terms of matching IRFs, i.e., qA (α0 ) = qB (β0 ), Hnatkovska et al. (2012) show that qˆA,T (α ˆ T ) − qˆB,T (βˆT ) = Op (T −1 ) 1

if the models are nested and that qˆA,T (ˆ αT ) − qˆB,T (βˆT ) = Op (T − 2 ) if they are strictly nonnested or overlapping under some primitive assumptions.5 When two models that are not nested have equal fit, i.e., qA (α) = qB (β), one may still prefer a more parsimonious model based on Occam’s razor or if a selected model is to be used for forecasting (Inoue and Kilian, 2006). For that purpose we propose the following modified quasi-marginal likelihood: m ˜ A = mA e(T −



m ˜ B = mB e(T − 4

T )ˆ qA,T (α ˆT )



T )ˆ qB,T (βˆT )

,

(11)

.

(12)

One could also call a model selection criterion consistent if mA > mB with probability approaching one

if qA (α0 ) < qB (β0 ). Our model selection criterion is also consistent in this sense. 5 Technically, even in the overlapping case, if f (α0 ) = g(β0 ) we have qˆA,T (α ˆ T ) − qˆB,T (βˆT ) = Op (T −1 ).

8

The modified quasi-marginal likelihood effectively replaces e−T qˆA,T (αˆ T ) in the Laplace √

approximation by e−

T qˆA,T (α ˆT ) ,

and remains consistent for both nested and nonnested

models. We consider two cases. Some of the parameters may be weakly identified in the first case while some parameters may be partially identified in the second case. Assumption 1 (a) A is compact in 0, qbA (α) and qA (α) are twice continuously differentiable in αs ∈ int(As ), supα∈A |b qA,T (α) − qA (α)| = op (1), supα∈A k∇αs qbA,T (α) − ∇αs qA (α)k = op (1) and supα∈A k∇2αs qbA,T (α) − ∇2αs qA (α)k = op (1) [If pBs > 0, qbB (β) and qB (β) qB,T (β) − qB (β)| = are twice continuously differentiable in βs ∈ int(Bs ), supβ∈B |b op (1), supβ∈B k∇βs qbB,T (β) − ∇βs qB (β)k = op (1) and supβ∈B k∇2βs qbB,T (β) − ∇2βs qB (β)k = op (1)]. (c) If pAs > 0, πAs (αs ) is continuous at αs,0 and πAs (αs,0 ) > 0. [If pBs > 0, πBs (βs ) is continuous at βs,0 and πBs (βs,0 ) > 0]. Assumption 1(b) requires uniform convergence of qˆA,T (·), ∇ˆ qA,T (·) and ∇2 qA,T (·) to qA (·), ∇qA (·) and ∇2 qA (·), respectively, which holds under more primitive assumptions, such as the compactness of the parameter spaces, pointwise convergence and stochastic equicontinuity (see Theorem 1 of Andrews, 1992). It is well-known that some parameters of DSGE models may not be strongly identified. See Canova and Sala (2009), for example. It is therefore important to investigate asymptotic properties of our model selection procedure in case some parameters are not strongly identified. To allow for some weakly identified parameters, we impose the following assumptions: Assumption 2 (Weak Identification) (a) qA (α) = qAs (αs ) + T −1 qAw (α) if pAs > 0 and qA (αw ) = T −1 qAw (αw ) if pAs = 0 where qAw (·) is Op (1) uniformly in α ∈ A. [qB (β) = qBs (βs ) + T −1 qBw (β) if pBs > 0 and qB (βw ) = T −1 qBw (βw ) if pBs = 0 where qBw (·) is Op (1) uniformly in β ∈ B].

9

(b) If pAs > 0, then there exists αs,0 ∈ int(As ) such that for every  > 0 inf

αs ∈As :kαs −αs,0 k≥

qAs (αs ) > qAs (αs,0 )

[If pBs > 0, then there exists βs,0 ∈ int(Bs ) such that for every  > 0 inf

βs ∈Bs :kβs −βs,0 k≥

qBs (βs ) > qBs (βs,0 )].

(c) If pAs > 0, the Hessian ∇2αs qAs (αs,0 ) is positive definite [ If pBs > 0, the Hessian ∇2βs qBs (βs,0 ) is positive definite]. Remarks. 1. Assumptions 1 and 2 are high-level assumptions, and sufficient and lower-level assumptions for GMM and CMD estimators are provided in the next subsection. 2. Typical prior densities are continuous in macroeconomic applications, and Assumption 1(c) is likely to be satisfied. 0 ]0 . 3. Assumption 2(a) postulates that αw is weakly identified while αs where α = [αs0 αw

Note that we allow for cases in which the parameters are all strongly identified as well as cases in which they are all weakly identified. When there is a strongly identified parameter, Assumption 2(b) requires that its true parameter value αs,0 uniquely minimizes the population estimation criterion function, and Assumption 2(c) requires that the second-order sufficient condition for minimization is satisfied. Theorem 1 (Weak Identification). Suppose that Assumptions 1 and 2 hold. (a) If qAs (αs,0 ) < qBs (βs,0 ), then mA > mB and m ˜A > m ˜ B with probability approaching one. (b) (Nested Case) If qAs (αs,0 ) = qBs (βs,0 ), pAs < pBs and qbA,T (b αT ) − qbB,T (βbT ) = Op (T −1 ), then mA > mB and m ˜A > m ˜ B with probability approaching one. (c) (Nonnested and Overlapping Cases) If qAs (αs,0 ) = qBs (βs,0 ), pAs < pBs and qbA,T (b αT ) − qbB,T (βbT ) = Op (T −1/2 ), then m ˜A > m ˜ B with probability approaching one. Remarks. 1. Theorem 1(a) shows that the proposed marginal likelihood model selection criterion selects the model with a smaller population estimation criterion function with probability approaching one. Theorem 1(b) implies that, if the minimized population criterion

10

functions take the same value, our model selection criterion will select the model with a fewer number of strongly identified parameters. In the special case where Model A is correctly specified and is a restricted version of Model B, our criterion will select Model A provided the restriction is imposed on a strongly identified parameter. This is because the (quasi-)marginal likelihood has a built-in penalty term for parameters that are not necessary for reducing the population criterion function as can be seen in the Laplace approximation of the marginal likelihood. 2. This consistency result applies whether or not the models are correctly specified or misspecified. If one model is correctly specified in that its minimized population criterion function is zero, while the other model is misspecified in that its minimized population criterion function is positive, our model selection criterion will select the correctly specified model with probability approaching one. Arguably, it may still make sense to minimize the criterion function even when two models are misspecified. Our model selection criterion will select the better approximating model with probability approaching one. 3. When the models are not nested and qA (αs,0 ) = qB (βs,0 ), Theorem 1(c) shows that the marginal likelihood does not necessarily select a more parsimonious model even asymptotically, This is consistent with Hong and Preston’s (2012) result on BIC. Although this may not be a major concern when the models are nonnested,‘ the modified quasi-marginal likelihood selects a parsimonious model even when the nonnested models satisfy qA (αs,0 ) = qB (βs,0 ). The modified quasi-marginal likelihood is less powerful if qA (αs,0 ) < qB (βs,0 ), however. We will investigate this trade-off in Monte Carlo experiments. Next we consider cases in which some parameters may be partially identified. We say that the parameters are partially identified if A0 = {α0 ∈ A : qA (α0 ) = min qA (α)} α∈A

consists of more than one points (see Chernozhukov, Hong and Tamer, 2007). Moon and Schorfheide (2012) lists macroeconometric examples in which this type of identification arises. Similarly, we define B0 = {β0 ∈ B : qB (β0 ) = min qB (β)}. β∈B

11

In addition to Assumption 1, we impose the following assumptions. Assumption 3 (Partial Identification) (a) There exists A0 ⊂ A such that, for every α0 ∈ A0 and  > 0 inf

α∈(Ac0 )−

qA (α) > qA (α0 ),

where (Ac0 )− = {α ∈ A : d(α, A0 ) ≥ } and d(α, A0 ) = inf a∈A0 kα−ak [ There exists B0 ⊂ B such that, for every β0 ∈ B0 and  > 0 inf

α∈(Ac0 )−

qA (α) > qA (α0 ),

where (Bc0 )− = {β ∈ B : d(β, B0 ) ≥ } and d(β, B0 ) = inf b∈B0 kβ − bk.] 0 , α0 ]0 ) is positive definite for some α (b) If pAs > 0, the Hessian ∇2αs qA ([αs,0 p,0 ∈ p,0 0 , β 0 ]0 ) is positive definite for some Ap,0 . [ If pBs > 0, the Hessian ∇2βs qB ([βs,0 p,0

βp,0 ∈ Bp,0 ]. (c)

R

πA (αp |αs,0 )dαp > 0 where πA (αp |αs ) is the prior density of αp conditional R on αs . [ Bp,0 πB (βp |βs,0 )dβp > 0 where πB (βp |αs ) is the prior density of βp conAp,0

ditional on βs ]. Remarks. Assumptions 3(a), (b) and (c) are generalization of Assumptions 2(b), 2(c) and 1(c), respectively, to sets. Theorem 2 (Partial Identification). (a) Suppose that Assumptions 1 and 3(a) hold. If minα∈A qA (α) < minβ∈B qB (β), then mA > mB and m ˜A > m ˜ B with probability approaching one. (b) (Nested Case) Suppose that Assumptions 1 and 3 hold. If qAs (αs,0 ) = qBs (βs,0 ), pAs < pBs and qbA,T (b αT ) − qbB,T (βbT ) = Op (T −1 ), then mA > mB and m ˜A > m ˜B with probability approaching one.

12

(c) (Nonnested and Overlapping Cases) Suppose that Assumption 3 holds. If qAs (αs,0 ) = qBs (βs,0 ), pAs < pBs and qbA,T (b αT ) − qbB,T (βbT ) = Op (T −1/2 ), then m ˜A > m ˜ B with probability approaching one. Remarks Theorem 3(a) shows that even in the presence of partially identified parameters, our criteria select a model with a smaller value of the population estimation objective function. This result occurs because it is the value of the objective function, not the parameter value, that matters to model selection.

3.2

Quasi-Marginal Likelihood for CMD Estimators

Since the extremum estimators include some class of important estimators popularly used in practice, it should be useful to describe a set of assumptions specific to each of the estimators. We first consider the CMD estimator, which has been used to estimate the structural parameters in DSGE models by matching the predicted impulse response function and the estimated impulse response function from the VAR models in the empirical macroeconomics. Suppose that we compare two DSGE models, models A and B. Models A and B are parameterized by structural parameter vectors, α ∈ A and β ∈ B, where A ⊂
1 cT (ˆ (ˆ γT − f (α))0 W γT − f (α)) 2

qˆB,T (β) =

1 cT (ˆ (ˆ γT − g(β))0 W γT − g(β)) 2

with respect to α ∈ A and β ∈ B , respectively, for models A and B, where γˆT is a k × 1 vector of structural impulse responses obtained from an estimated VAR model, cT is a k × k positive semidefinite weighting matrix.6 It should be noted that and W 6

J´ orda and Kozicki (2011) develop a projection minimum distance estimator that is based on restrictions

13

the condition for identifying structural IRFs must be satisfied in DSGE models. For example, if short-run restrictions are used to identify structural IRFs, the restrictions must be satisfied in the DSGE model. Otherwise IRF matching does not yield a consistent estimator and model selection based on IRF matching does not make sense. Let qA (α) = qB (β) =

1 (γ0 − f (α))0 W (γ0 − f (α)), 2 1 (γ0 − g(β))0 W (γ0 − g(β)), 2

(13) (14)

where γ0 is a vector of population structural impulse responses and W is a positive definite matrix. While our model selection depends on the choice of weighting matrices, if one is to cT needs to be set to the inverse of the calculate standard errors from MCMC draws, W asymptotic covariance matrix of γˆT , which eliminates the arbitrariness of the choice of the weighting matrix. When the optimal weighting matrix is not used, the formula in Chernozhukov and Hong (2002) and Kormilitsina and Nekipelov (2014) should be used to calculate standard errors. For CMD estimators, we make the following assumptions: Assumption 4. (a)



d

T (b γT − γ) → N (0k×1 , Σ) where Σ is positive definite.

(b) A is compact in 0, f (α) = fs (αs ) + T −1/2 fw (α), and there is a unique αs,0 ∈ int(As ) such that αs,0 = argminαs ∈As fs (αs )0 W fs (αs ). If pAs = 0, f (α) = T −1/2 fw (αw ) of the form of h(γ, α) = 0. While we could consider a quasi-Bayesian estimator based on such restrictions, we focus on the special case in which h(γ, α) = γ − f (α).

14

[If pBs > 0, g(β) = gs (βs ) + T −1/2 gw (β), and there is a unique βs,0 ∈ int(Bs ) such that βs,0 = argminβs ∈Bs gs (βs )0 W gs (βs ). If pBs = 0, g(β) = T −1/2 gw (βw ). (f) If pAs > 0, Fs (αs,0 )0 W Fs (αs,0 ) − [(γ − fs (αs,0 ))0 W ⊗ IpAs,0 ]

∂vec(Fs (αs,0 )0 ) 0 ∂αs,0

is positive definite [ If pBs > 0, Gs (βs,0 )0 W Gs (βs,0 ) − [(γ − gs (βs,0 ))0 W ⊗ IpBs ]

∂vec(Gs (βs,0 )0 ) ∂βs0

is positive definite]. (g) There is A0 = {α0 } × Ap,0 ⊂ A such that, for any α ∈ A0 , (γ − f (α))0 W (γ − f (α)) = minα∈A (γ − f (˜ α))0 W (γ − f (˜ α)) < (γ − f (¯ α))0 W (γ − f (¯ α)) for any ˜ α ¯ ∈ A ∩ Ac0 [There is B0 = {β0 } × Bp,0 ⊂ B such that, for any β ∈ B0 , ˜ 0 W (γ −g(β)) ˜ < (γ −g(β)) ¯ 0 W (γ −g(β)) ¯ (γ −g(β))0 W (γ −g(β)) = minβ∈B (γ −g(β)) ˜ for any β¯ ∈ B ∩ Bc0 ]. (h) If pAs > 0, there is αp,0 ∈ Ap,0 such that Fs (α)0 W Fs (α) − [(γ − f (α))0 W ⊗ IpAs ]

∂vec(Fs (α)0 ) , ∂αs0

0 α0 ]0 [ If p is positive definite at α = [αs,0 Bs > 0, there is βp,0 ∈ Bp,0 such that p,0

Gs (β)0 W Gs (β) − [(γ − g(β))0 W ⊗ IpBs ]

∂vec(Gs (β)0 ) , ∂βs0

0 β 0 ]0 ]. is positive definite at β = [βs,0 p,0

Remarks 1. The root T consistency and asymptotic normality of structural VAR impulse responses follow from stationary data and restrictions that point-identify structural impulse responses. 2. Assumption 4(e) follows Guerron-Quintana, Inoue and Kilian’s (2013) definition of weak identification in the minimum distance framework. The model selection based on the quasi-marginal likelihood computed from quasiBayesian CMD Estimators is justified by the following proposition.

15

Proposition 1. (a) Under Assumptions 4(a)–(f), Assumptions 1 and 2 hold. (b) Under Assumptions 4(a)–(d), (g) and (h), Assumptions 1 and 3 hold.

3.3

Quasi-Marginal Likelihood for GMM Estimators

Another important class of the estimator we consider is the GMM estimator. For GMM estimators, the criterion functions of models A and B are respectively given by 1 cA,T fT (α), fT (α)0 W 2 1 cB,T gT (β), qˆB,T (β) = gT (β)0 W 2 P P cA,T and W cB,T where fT (α) = (1/T ) Tt=1 f (xt , α), gT (β) = (1/T ) Tt=1 g(xt , β) and W qˆA,T (α) =

are k × k positive semidefinite weighting matrices. It should be noted that the condition for identifying structural IRFs must be satisfied in DSGE models. For example, if short-run restrictions are used to identify structural IRFs, the restrictions must be satisfied in the DSGE model. Otherwise IRF matching does not yield a consistent estimator and model selection based on IRF matching does not make sense. Let qA (α) = qB (β) =

1 E[f (xt , α)]0 WA E[f (xt , α)], 2 1 E[g(xt , β)]0 WB E[g(xt , β)], 2

(15) (16)

where WA and WB are positive definite matrices. For the GMM estimation, we impose the following assumptions: Assumption 5. P = Op (T −1/2 ), supα∈A kT −1 Tt=1 {(∂/∂α0 )f (xt , α)− P E[(∂/∂α0 )f (xt , α)]}k = op (1), and supα∈A kT −1 Tt=1 {(∂/∂α0 )vec((∂/∂α0 )f (xt , α)− P E[(∂/∂α0 )vec((∂/∂α0 )f (xt , α))]}k = op (1) [supβ∈B kT −1 Tt=1 {g(xt , β)−E[g(xt , β)]}k = P Op (T −1/2 ), supβ∈B kT −1 Tt=1 {(∂/∂β 0 )g(xt , β) − E[(∂/∂β 0 )g(xt , β)]}k = op (1),

(a) supα∈A kT −1

PT

t=1 {f (xt , α)−E[f (xt , α)]}k

16

and supβ∈B kT −1

PT

t=1 {(∂/∂β

0 )vec((∂/∂β 0 )g(x

t , β)−E[(∂/∂β

0 )vec((∂/∂β 0 )g(x

t , β))]}k

=

op (1)]. (b) A is compact in 0, E[f (xt , α)] = fs (αs ) + T −1/2 fw (α), and there is a unique αs,0 ∈ int(As ) such that αs,0 = argminαs ∈As fs (αs )0 WA fs (αs ). If pAs = 0, E[f (xt , α)] = T −1/2 fw (αw ) [If pBs > 0, E[g(xt , β)] = gs (βs ) + T −1/2 gw (β), and there is a unique βs,0 ∈ int(Bs ) such that βs,0 ∈ argminβs ∈Bs gs (βs )0 WB gs (βs ). If pBs = 0, E[g(xt , β)] = T −1/2 gw (βw )]. (f) If pAs > 0, Fs (αs,0 )0 WA Fs (αs,0 ) + [E(fs (xt , αs,0 ))0 WA ⊗ IpAs ]

∂vec(Fs (αs,0 )0 ) ∂αs0

is positive definite [ If pBs > 0, Gs (βs,0 )0 WB Gs (βs,0 ) + [(γ − gs (βs,0 ))0 WB ⊗ IpBs ]

∂vec(Gs (βs,0 )0 ) ∂βs0

is positive definite]. (g) There is A0 = {α0 }×Ap,0 ⊂ A such that, for any α ∈ A0 , E[f (xt , α)]0 WA E[f (xt , α)] = minα∈A E[f (xt , α ˜ )]0 WA E[f (xt , α ˜ )] < E[f (xt , α ¯ )]0 WA E[f (xt , α ¯ )] for any α ¯ ∈ A∩ ˜ Ac0 [There is B0 = {β0 }×Bp,0 ⊂ B such that, for any β ∈ B0 , E[g(xt , β)]0 WB E[g(xt , β)] = ˜ 0 WB E[g(xt , β)] ˜ < E[g(xt , β)] ¯ 0 WB E[g(xt , β)] ¯ for any β¯ ∈ B ∩ minβ∈B E[g(xt , β)] ˜ Bc0 ]. (h) If pAs > 0, there is αp,0 ∈ Ap,0 such that Fs (α)0 WA Fs (α) + [E(f (xt , α))0 WA ⊗ IpAs ]

17

∂vec(Fs (α)0 ) ∂αs0

0 α0 ]0 [ If p is positive definite at α = [αs,0 Bs > 0, there is βp,0 ∈ Bp,0 such that p,0

Gs (β)0 WB Gs (β) + [E(g(xt , β))0 WB ⊗ IpBs ]

∂vec(Gs (β)0 ) ∂βs0

0 β 0 ]0 . is positive definite at β = [βs,0 p,0

The model selection based on the quasi-marginal likelihood computed from quasiBayesian GMM Estimators is justified by the following proposition. Proposition 2. (a) Under Assumptions 5(a)–(f), Assumptions 1 and 2 hold. (b) Under Assumptions 5(a)–(d), (g) and (h), Assumptions 1 and 3 hold.

4

Computational Issues

In this section, we describe three methods for computing the quasi-marginal likelihood (hereafter QML): Laplace approximations, modified harmonic estimators of Geweke (1998) and Sims, Waggoner, and Zha (2008) and the estimator of Chib and Jeliazkov (2001). While these are standard methods for computing marginal likelihood in Bayesian analyses, we present these methods for practitioners who are interested in using the Laplace-type estimator. For the Laplace approximation, we evaluate the QML by e

−T qA (α) ˆ



T 2π

 k−pA 2

1 cT | 21 |∇2 qˆA,T (ˆ πA (α ˆ )|W α)|− 2 ,

(17)

at the quasi-posterior mode, α ˆ (here subscript T is omitted for notational simplicity). In our Monte Carlo experiment, we use 20 randomly chosen starting values for a numerical optimization routine to obtain the posterior mode. The weighting matrix of cT , can be either the consistent estimator of optimal CMD and GMM estimators, W weighting matrix or other matrices such as a diagonal matrix. In CMD estimation of IRF matching, it is quite common to use the diagonal weighting matrix whose diagonal elements are the reciprocal of diagonal of the bootstrap variances of impulse responses, because estimators based on the optimal weighting matrix can be often economically

18

implausible. We use 1,000 bootstrap replications to obtain the bootstrap covariance matrix of IRF estimators in the Monte Carlo experiments. For the modified harmonic mean estimators and the Chib-Jeliazkov estimator, we use the random walk Metropolis-Hastings algorithm as suggested in An and Schorfheide ˆ −1 ) where α(0) = α (2007). The proposal distribution is N (α(j−1) , cH ˆ , c = 0.3 for ˆ is the Hessian of the log-quasi-posterior evaluated at the j = 1, c = 1 for j > 1 and H quasi-posterior mode.7 The draw α from N (α(j−1) , c[∇2 qˆA,T (ˆ α)]−1 ) is accepted with probability min 1,

e−T qˆA,T (α) πA (α) e−T qˆA,T (α

(j−1) )

πA (α(j−1) )

! .

(18)

In each Monte Carlo iteration, we use the last 50% of 100,000 [50,000] draws in subsection 5.1 [resp. subsection 5.2] and to estimate the marginal likelihood. To satisfy the generalized information equality of Chernozhukov and Hong (2003), which is necessary for the validity of the MCMC method for the Laplace-type estimator, we c. need to use the inverse of the bootstrap covariance matrix of IRF estimators as W In section 5, however, we also consider a diagonal weighting matrix to investigate the sensitivity to the choice of weighting matrices. In the modified harmonic mean method, the QML is computed as the reciprocal of   w(α) , (19) E exp(−T qˆT (α)) πA (α) which is evaluated using MCMC draws, given an weighting function w(α). We consider two alternative choice of weighting functions which have been proposed in the literature. The first choice is suggested by Geweke (1999), who sets w(α) to be the truncated normal density w(α) =

˜ )0 V˜α−1 (α − α ˜ ) < χ2pA ,τ } exp[−(α − α ˜ )0 V˜α−1 (α − α ˜ )/2] 1{(α − α , τ (2π)pA /2 |V˜α |1/2

where α ˜ is the quasi-posterior mean, V˜α is the quasi-posterior covariance matrix, 1{·} is an indicator function, χ2pA ,τ is the 100τ th percentile of the chi-square distribution with pA degrees of freedom, and τ ∈ (0, 1) is a constant. The second choice is the one proposed by Sims, Waggoner, and Zha (2008). They point out that Geweke’s (1999) method may not work well when the posterior distribution is non-elliptical, and suggest 7

When the parameters are partially identified, we set c = 0.001 to increase the acceptance rate.

19

an weighting function given by w(α) =

Γ(pA /2) p 2π A /2 |Vˆα |1/2

f (r) 1{−T qˆ(α) + ln πA (α) > L1−q } , rpA −1 τ¯

where Vˆα is the second moment matrix centered around the quasi-posterior mode α ˜, f (r) = [vrv−1 /(cv90 /0.9 − cv1 )] 1{c1 < r < c90 /(0.9)1/v }, v = ln(1/9)/ ln(c10 /c90 ), r = [(α − α ˆ )0 Vˆα−1 (α − α ˆ )]1/2 , cj is the jth percentile of the distance r, L1−q is the 100(1 − q) percentile of the log quasi-posterior distribution, q ∈ (0, 1) is a constant, and τ is the quasi-posterior mean of 1{−T qˆ(α) + ln πA (α) > L1−q }1{c1 < r < c90 /(0.9)1/v }. Following Herbst and Schorfheide (2015), we consider τ = 0.5 and 0.9 in the estimator of Geweke (1999) and q = 0.5 and 0.9 in the estimator of Sims, Waggoner, and Zha (2008). For the estimator of Chib and Jeliazkov (2001), the log of the QML is evaluated by ln πA (˜ α) − T qˆA,T (˜ α) − ln pˆA (˜ α) where pˆA (α) =

PJ

(j) ˜ )φ (j) ˜ (α ) j=1 r(α , α α,c ˜ 2Σ , P (k) ) (1/K) K r(˜ α , α k=1

(1/J)

(20)

(21)

˜ and r(˜ φα,c α, c2 Σ) α, α(k) ) is the acceptance probability of moving ˜ (·) is the pdf of N (˜ ˜ 2Σ α ˜ to α(k) in the Metropolis Hasting algorithm. The numerator of (21) is evaluated using the last 50% of MCMC draws and the denominator is evaluated using α(k) from ˜ In our Monte Carlo experiment, K is set to 50,000 [25,000] in section 5.1 N (˜ α, c2 Σ). ˜ is either set to the one used in the proposal [ resp. section 5.2] so that K = J. c2 Σ density or estimated from the posterior draws. The modified QML (11) requires minimizing the estimation criterion function, which may defeat the purpose of using the quasi-Bayesian approach.

Instead we

approximate it by averaging the value of the log of the quasi-posterior density over quasi-posterior draws: E[log(πA (α)) − T qbA,T (α)],

(22)

where the expectation is with respect to the quasi-posterior draws. This is computationally tractable because it can be calculated from MCMC draws. Because the quasiposterior distribution will concentrate around α b asymptotically, and the log prior is

20

O(1) and does not affect the divergence rate of the modified QML, the resulting modified marginal likelihood remain consistent as analyzed in the previous section. Our Monte Carlo results show that this approximation works well.

5

Monte Carlo Experiments

We investigate the small-sample properties of the quasi-marginal likelihood (QML) using the static model in section 2 and using a small-scale DSGE model.

5.1

The Static Model

Using the DGP in section 2, we estimate the QML of each model by four methods: the Laplace approximation, the modified harmonic mean estimator of Geweke (1998), Sims, Waggoner and Zha’s (2008) estimator, and the estimator of Chib and Jeliazkov (2001). For each of the four estimators, we consider the diagonal weighting matrix and the optimal weighting matrix and our proposed modified version of the marginal likelihood. In addition, we also consider Hong and Preston’s (2012) model selection criterion. As in Table 1, the number of Monte Carlo replications is set to 1,000, the number of randomly chosen initial values for numerical optimization is set to 20, and the sample sizes are 50, 100 and 200. The flat prior is used for all the parameters, and the number of MCMC draws is set to 100,000. Tables 3 and 4 report the actual frequencies of selecting model A. In design 1, our proposed criteria tend to select the correctly specified model (model A) over the incorrectly specified model (model B) regardless of the methods used to estimate the QML. The probabilities of selecting model A based on the modified QML are smaller than those based on the unmodified QML. This is because the modification halves the divergence rate of the QML in these cases. In design 2, the frequencies of choosing the more parsimonious model (model A) based on the QML are not as high as those in design 1 because the consistency depends on logarithmic rates as opposed to linear rates in terms of the log QML. The modified QML performs better than the unmodified QML in this design. Because the models are both correctly specified, the less parsimonious model always has a smaller estimation criterion value, the leading term in Laplace approximations, in finite samples. Our modification halves such effects and makes the

21

logarithmic divergence rate of the second term more effective. The optimal weighting matrix performs better than the diagonal weighting matrix in designs 1 and 2. Using more impulse responses enhances the performance of the QML model selection in the first two designs. The probability of selecting model A also tends to approach one as the sample size increases, even when identification is weak (designs 3 and 4) and when some parameters are partially identified (designs 5 and 6). In the last four designs, the diagonal weighting matrix tends to perform better than the optimal weighting matrix, and using a fewer impulse responses improves the performance of the QML model selection. This is probably because identification is not strong in these designs. The proposed modified QML outperforms the Hong and Preston (2012) criterion in the first three designs and in the fourth design for sufficiently large sample sizes. The Hong and Preston (2012) criterion cannot be computed in the last two designs in which the parameters are partially identified. Finally we investigate the accuracy of the QML estimates. The first five panels of Table 5 report the means and standard deviations of 100 marginal likelihood estimates for model A given a realization of data as well as the value of marginal likelihood obtained from numerical integration.8 The results are based on the optimal weighting matrix. Those based on the diagonal weighting matrix are similar and thus are omitted. In the first four designs, the differences among the means in each row are very small and are very close to the values based on numerical integration, if any. In the last design, although the differences among the estimates based on MCMC are not significant, they are different from the numerically integrated value. Because the algorithms are all based on the asymptotically singular covariance matrix, the estimates may not be accurate. The last panel of the table reports the marginal likelihood of model B in design 5 and shows that the marginal likelihood estimates are biased in the same direction. Even though individual marginal likelihood estimates are not accurately estimated, the sign of the differences are accurate and that explains why the right model is chosen with probability one in Tables 4. 8

Model A in design 6 is identical to model A in design 5 and thus the results for design 6 are omitted.

22

5.2

The Small-Scale DSGE Model

Next, we consider the small-scale DSGE model considered in Guerron-Quintana, Inoue and Kilian (2016) that consists of πt = κyt + βE(πt+1 |It−1 ),

(23)

Rt = ρr Rt−1 + (1 − ρr )φπ πt + (1 − ρr )φy yt + ξt , 1 yt = E(yt+1 |It−1 ) − (E(Rt |It−1 ) − E(πt+1 |It−1 ) − zt ) , τ

(24) (25)

where yt , πt and Rt denote the output gap, inflation rate, and nominal interest rate, respectively, and It denotes the information set at time t. The technology and monetary policy shocks follow zt = ρz zt−1 + σ z εzt ,

(26)

ξt = σ r εrt ,

(27)

where εzt and εrt are independent iid standard normal random variables. Note that the timing of the information is nonstandard, e.g., E(πt+1 |It−1 ) instead of E(πt+1 |It ) in the NKPC. The idea behind these information restrictions is to capture that the economy reacts slowly to a monetary policy shock while it reacts contemporaneously to technology shocks. Specifically, inflation does not contemporaneously reacts to monetary policy shocks but it does to technology shocks in this model. We impose such (recursive) short-run restrictions to identify VAR-IRFs. In the data generating process, we set κ = 0.025, τ = 1, β = 0.99, φπ = 1.5, φy = 0.125, ρr = 0.75, ρz = 0.90, σ z = 0.30, σ r = 0.20 as in Guerron-Quintana et al. (2016). We consider four cases. In cases 1 and 3, κ, τ and ρr are estimated in model A, and κ and ρr are estimated in model B with τ = 3. The other parameters are set to the true parameter values. In cases 2 and 4, τ and ρr are estimated in model A with κ set to its true parameter value and κ, τ and ρr are estimated in model B. In other words, model B is misspecified in cases 1 and 3. and model A is more parsimonious than model B in cases 2 and 4. We use a bivariate VAR(p) model of inflation and the nominal interest rate to estimate structural impulse responses. To identify structural impulse responses, we use the short-run restriction that inflation does not respond to the monetary policy

23

shock contemporaneously, which is satisfied in the above model. In cases 1 and 2, all the structural impulse responses up to horizon H are used in LTE. In cases 3 and 4, only the structural impulse responses to the technology shock (up to horizon H) are used. We use the AIC to select the VAR lag order where p is selected from {H, H + 1, ..., [5(T / ln(T ))0.25 ]} where [x] is the integer part of x. We set the lower bound on p to H. When p is smaller than H, the asymptotic distribution of VAR-IRFs is singular and our theoretical results do not hold. See Guerron-Quintana et al. (2016) on inference on VAR-IRFs in such cases. We consider T = 50, 100, 200 and H = 2, 4, 8. The number of Monte Carlo simulations is set to 1,000, the number of random-walk Metropolis-Hasting draws is 50,000, the number of bootstrap draws for computing the weighting matrix is 1,000. Tables 6, 7, 8 and 9 report the probabilities of selecting the right model in cases 1, 2, 3 and 4, respectively. As in Table 2, Tables 7 and 9 show that the method of selecting a model based on the value of the estimation criterion function performs poorly when two models are both correctly specified and one model is more parsimonious than the other The QML model selection tends to perform better with a fewer impulse responses. The optimal weighting matrix tends to perform better than the diagonal weighting matrix when one model is correctly specified and the other model is misspecified as in Tables 6, 8 and 9. Especially for small samples, using the IRFs to the technology shock only helps select the right model especially for the QML. These tables show that the different methods do not produce a substantial or systematic difference in the performance of the QML model selection method. To shed light on the accuracy of QML estimates further, we report the means and standard deviations of 100 QML estimates from a realization of data in Tables 10 and 11. As in Table 5, the differences across the methods tend to be small.

6

Discussions

Although our method is applicable to more general problems, one main motivation for our proposed quasi-marginal likelihood (QML) model selection is IRF matching. IRF matching is a limited-information approach and is an alternative to full-information likelihood based approaches, such as MLE and Bayesian approaches. There is a tradeoff

24

between efficiency and robustness. That is, estimators based on the full-information likelihood are more efficient when the likelihood function is correctly specified, while IRF matching estimators may be more robust to potential misspecification that does not affect the IRF, such as misspecification of distributional forms. In addition to this usual tradeoff, IRF matching has very particular features when it is used to estimate DSGE models. In this section, we discuss these features and provide guidance to practitioners. (a) Bayesian and frequentist inferential frameworks for IRF matching: Like any other extremum estimators, it is sometimes numerically challenging to optimize the IRF matching estimation criterion function especially when it is highly nonlinear. The use of priors not only alleviates such numerical challenges but also gives a limited-information Bayesian interpretation to IRF matching. One can interpret quasiBayesian IRF matching in a limited-information Bayesian inferential framework (Kim, 2002; Zellner, 1998). Because the QML is a functions of data, which model is selected is also a function of the data. Because the quasi-posterior distribution is conditional on the data and the result of model selection is a function of the conditioning set, the posterior distribution remains the same. Therefore, there is no issue of post model selection inference in the limited-information Bayesian inferential framework. See Dawid, 1994, in the full-information Bayesian inferential framework. It is often useful to have a frequentist interpretation of Bayesian estimators, especially if the practitioner is not strictly a Bayesian. Chernozhukov and Hong (2002) provides consistency and asymptotic normality of LTE that includes IRF matching estimators as a special case. Our paper shows the consistency of the QML model selection criterion in a frequentist inferential framework. Unlike in the limited-information Bayesian inferential framework, when we interpret inference based on the model selected by our model selection criterion in a frequentist inferential framework, it is likely to suffer from the problem with post model selection inference (Leeb and P¨otscher, 2005, 2009). This is a typical problem with model selection criteria including those that are consistent. (b) Identification of structural IRFs: To implement IRF matching, VAR-IRFs and DSGE-IRFs must be identical. The number of structural shocks in a DSGE model

25

must be at least as large as the number of observed variables to avoid the problem of stochastic singularity. Consider two cases. In the first case, the number of structural shocks in a DSGE model and the number of observed variables are the same. Fernandez-Villaverde et al. (2007) find a sufficient condition for the two sets of structural IRF to match. One of their conditions is that matrix D in the measurement equation is square and is nonsingular. Even when this condition is not satisfied (e.g., no measurement error), there are cases in which the two structural IRFs coincide. For example, consider xt+1 = Axt + But+1 ,

(28)

yt+1 = Cxt ,

(29)

where xt is a n × 1 vector of state variables, yt and ut are k × 1 vectors of observed variables and economic shocks, respectively, and ut is Gaussian white noise vector with zero mean and covariance matrix Ik . Provided the eigenvalues of A are all less than unity in modulus, yt has an M A(∞) representation: yt = C(I − AL)−1 But = CBut + CABut−1 + CA2 But−2 + · · ·

(30)

and it is invertible. Thus, the structural IRFs, CB, CAB, CA2 B, ..., can be obtained from a VAR(∞) process together with the short-run restriction that the impact matrix is given by CB. In practice, the VAR(∞) process is approximated by a finite-order VAR(p) model where p is obtained by AIC, for example. If different identification conditions are imposed, VAR-IRFs and DSGE-IRFs will be different and thus IRF matching will not work. In the IRF matching literature it is quite common to build a DSGE model in such a way that CB is lower triangular so that the recursive identification condition can be used to identify structural IRFs from a VAR model. See the second DGP in section 5. Many DSGE models do not satisfy typical conditions for identifying structural impulse responses. There are at least two approaches. First, one can match IRFs without matching the impact period matrix. Let Aj denote the jth step ahead structural impulse response matrix implied by a DSGE model with A0 being the impact matrix. Let Bj denote the jth step ahead reduced-form impulse response matrix obtained from a

26

VAR model. Then Σ = A0 A00 , B j A0 = Aj ,

(31) (32)

Then these constraints can be written as g(γ, θ) = 0,

(33)

where γ is a vector that consists of the elements of Bj ’s and the distinct elements of Σ and θ is a vector of DSGE parameters. Technically, this is not IRF matching because the impact period impulse responses are not matched. In the second approach, one can match moments (e.g., Andreasen et al., 2016; Kormilitsina and Nekipelov, 2016). The number of structural shocks in a DSGE model must be at least as large as the number of observed variables to avoid the problem of stochastic singularity. Consider two cases. In the first case, the number of structural shocks in a DSGE model and the number of observed variables are the same. Fernandez-Villaverde et al. (2007) find a sufficient condition for the two sets of structural IRF to match. One of their conditions is that matrix D in the measurement equation is square and is nonsingular. Even when this condition is not satisfied (e.g., no measurement error), there are cases in which the two structural IRFs coincide. For example, consider xt+1 = Axt + But+1 ,

(34)

yt+1 = Cxt ,

(35)

where xt is a n × 1 vector of state variables, yt and ut are k × 1 vectors of observed variables and economic shocks, respectively, and ut is Gaussian white noise vector with zero mean and covariance matrix Ik . Provided the eigenvalues of A are all less than unity in modulus, yt has an M A(∞) representation: yt = C(I − AL)−1 But = CBut + CABut−1 + CA2 But−2 + · · ·

(36)

and it is invertible. Thus, the structural IRFs, CB, CAB, CA2 B, ..., can be obtained from a VAR(∞) process together with the short-run restriction that the impact matrix is given by CB. In practice, the VAR(∞) process is approximated by a finite-order VAR(p) model where p is obtained by AIC, for example. If different identification

27

conditions are imposed, VAR-IRFs and DSGE-IRFs will be different and thus IRF matching will not work. In the IRF matching literature it is quite common to build a DSGE model in such a way that CB is lower triangular so that the recursive identification condition can be used to identify structural IRFs from a VAR model. See the second DGP in section 5. In the second case, there are more structural shocks than observed variables. In this case, the true DSGE-IRFs cannot be recovered by VAR-IRFs. VAR-IRFs would identify IRFs of a DSGE model that is obtained from the true DSGE model by removing some structural shocks. (c) The dimensions of VAR and IRF: The Monte Carlo experiments in Hall et al. (2012) show that the performance of IRF matching estimators deteriorates as the number of IRFs increases. Guerron-Quintana, Inoue and Kilian (2016) show that, when the number of impulse responses is greater than the number of VAR parameters, IRF matching estimators have nonstandard asymptotic distributions because the delta method fails. We conjecture that asymptotic properties of the QML model selection criterion may be affected because the bootstrap covariance matrix estimator is asymptotically singular. (d) The choice of weighting matrices: The optimal weighting matrix and diagonal weighting matrices are common choices for the weighting matrix. There are two arguments for the optimal weighting matrix. First, when the optimal weighting matrix is used, the IRF matching estimation criterion function can be interpreted as an approximate (log-)likelihood function where the IRF estimate is viewed as “an observation” because the optimal weighting matrix is the inverse of the bootstrap covariance matrix of that observation. Thus it is natural to interpret estimation results from the limited-information Bayesian inferential framework when the optimal weighting matrix is used. Second, when the optimal weighting matrix is used, the generalized information matrix equality of Chernozhukov and Hong (2003) is satisfied. Standard errors can be obtained from standard deviations of MCMC draws in the frequentist inferential framework. When the optimal weighting matrix is not used, one needs to use the sandwich formula in Chernozhukov and Hong (2003) or bootstrap the entire MCMC algorithm to obtain correct standard errors. The main argument for diagonal

28

weighting matrices is based on computational tractability of the resulting estimation criterion function. Even if a proposal density is poorly chosen because of the numerical behavior of the estimation criterion function based on the optimal weighting matrix, MCMC draws should still converge to the quasi posterior distribution although it may require a larger number of draws. In that sense, the effect of the choice of weighting matrices is arguably smaller for quasi Bayes estimators than for frequentists estimators, such as CMD and GMM. As shown in Hall and Inoue (2003) in the GMM framework, the pseudo true parameter depends on the choice of a weighting matrix. Although that may affects the finite-sample performance of the QML model selection criterion, the Monte Carlo results in section 5 show that the performance of our model selection criterion does not appear to depend on the choice of the weighting matrix. (e) The use of modified QMLs: There may be some cases the modified QMLs are recommended over the (unmodified) QMLs in model selection because of the possible inconsistency. One possibility is the case of point mass mixture priors which include a mass at a point mixed with a continuous distribution. For example, suppose two alternative models of nominal exchange rates, St , as Model A (IMA(1) model) α = θ:

∆St = εt + θεt−1 Model B (AR(2) model) β = (φ1 , φ2 ):

St = φ1 St−1 + φ2 St−2 + εt The two models are non-nested but are equivalent under the random walk specification, namely, if θ = 0 in model A and (φ1 , φ2 ) = (1, 0) in model B. Because the random walk model is known to be supported by many previous empirical studies as a preferred model for the nominal exchange rate, it make sense to employ point mass priors at θ = 0 in model A and (φ1 , φ2 ) = (1, 0) in model B. If the true model is the random walk model, the (unmodified) QML will select model B with positive probability. The modified QML will select model A over model B with probability approaching

29

one because the former is more parsimonious, however. (f ) Guidance to practitioners: To summarize, we suggest the practitioner to • Build a DSGE model in which – The number of structural shocks equals the number of observed variables; – The condition for identifying VAR-IRFs is satisfied in DSGE-IRFs. • Select the order of VAR models by information criteria, such as AIC, as done in section 5. because the true VAR representation is likely to be of infinite order. • Choose the maximum horizon so that the number of impulse response does not exceed the number of VAR parameters. • Use the optimal weighting matrix if standard errors are to be computed from MCMC draws.

7

Empirical Application

7.1

New Keynesian Phillips Curve: GMM Estimation

In this section, we apply our procedure to choose between the alternative specification of structural Phillips curves under nonzero trend inflation when the models are estimated using Bayesian GMM. Let πt = pt − pt−1 , where pt is the aggregate price index in log, and ulct be a unit labor cost. In Gal´ı and Gertler (1999), the hybrid New Keynesian Phillips Curve (hereafter NKPC) is derived from a Calvo (1983) type staggered price model with firms change prices with probability 1 − ξp . Within those firms that change prices, 1 − ω fraction of firms set prices optimally but the remaining ω fraction are rule-of-thumb (ROT) price setters who set their prices equal to the average price set in the most recent round of price adjustments with a correction based on the lagged inflation rate. When trend inflation is zero (π = 0), firms keep prices unchanged with probability ξp . In the case of nonzero trend inflation (π > 0), however, they are assumed to set prices using indexation to trend inflation with probability ξp (see Yun, 1996). Under these conditions, the NKPC of Gal´ı and Gertler (1999) can be derived as ˆ t π ˆt = γb,A π ˆt−1 + γf,A Et π ˆt+1 + κA ulc

30

(37)

ˆ t = ulct − ulc where π ˆt = πt − π is the log-deviation of πt from the trend inflation π, ulc is the log-deviation of ulct from its steady-state ulc and the coefficients are given by

γb,A =

ω , ξp + ω[1 − ξp (1 − β)]

γf,A =

βξp , ξp + ω[1 − ξp (1 − β)]

κA =

(1 − ξp )(1 − βξp )(1 − ω) , ξp + ω[1 − ξp (1 − β)]

where β = 0.99 is a discounted factor. In Smets and Wouters (2003, 2007), a partial indexation specification is used instead of the ROT specification of Gal´ı and Gertler (1999). In their specification, firms set prices at an optimal level with probability 1 − ξp . For the remaining ξp fraction of the firms, prices are determined as a weighted sum of lagged inflation and trend inflation (or steady state inflation) with an weight ιp on the lagged inflation. Under these conditions, the hybrid NKPC of Smets and Wouters (2003, 2007) can be derived as ˆ t. π ˆt = γb,B π ˆt−1 + γf,B Et π ˆt+1 + κB ulc

(38)

and the coefficients are given by γb,B =

ιp 1 + ιp β

γf,B , =

β , 1 + ιp β

κB =

(1 − ξp )(1 − βξp ) , ξp (1 + ιp β)

where ιp ∈ [0, 1] is the degree of partial indexation to lagged inflation. Note that when ω = 0 in the ROT specification and ιp = 0 in the partial indexation specification, both hybrid NKPCs become the baseline NKPC with only forward looking firms (γb,A = γb,B = 0 and γf,A = γf,B = β). In previous empirical literature, the classical GMM has often been employed to estimate the hybrid NKPC. In our estimation, we utilize the orthogonality condition of expectation error to past information, as well as the definition of π ˆt = πt − π, and estimate structural parameters α = [ξp , ω, π]0 for (37) and β = [ξp , ιp , π]0 for (38)

31

using quasi-Bayesian GMM estimator. In particular, for (37), the objective function cA,T fT (α) where fT (α) = (1/T ) PT f (xt , α), f (xt , α) = is qˆA,T (α) = (1/2)fT (α)0 W t=1 [z0t ut , π ˆ t ]0 , ˆ t ut = π ˆt − γb,A π ˆt−1 − γf,A π ˆt+1 − κA ulc and zt is a vector of instruments. The objective function for (38) can be similarly ˆ is computed from HAC estimator with Bartlett defined. Optimal weighting matrix W kernel and Andrews (1991) automatic bandwidth. For the estimation, we use US quarterly data of the inflation rate based on the GDP implicit price deflator for πt and the labor income share in the non-farm business sector for ulct . As for the choice of instruments zt , we follow Gal´ı and Gertler (1999): four lags of inflation, labor income share, long-short interest rate spread, output gap, wage inflation, and commodity price inflation. For the sample periods, we consider the Great Inflation period (from 1966:Q1 to 1982:Q3) and Post-Great Inflation period (from 1982:Q4 to 2016:Q4). The list of the structural parameters in our analysis, quasi-Bayesian estimates and prior distributions are reported in Table 12. In the table, those of reduced form parameters, including the slope of Phillips curve (κA and κB ) are also reported. The two specifications yield qualitatively similar results: First, the prior and posterior means tend to differ which may suggest that the parameters are strongly identified in these models. Second, the trend inflation rate became substantially lower after the Great Inflation period, as expected. Third, the slope of the Phillips curve (κA and κB ) got flattened in the post Great Inflation period compared to the Great Inflation period mainly due to the increased degree of price stickiness (ξp ). Fourth, the other parameters did not change much between these periods. Figure 1 plots the posterior distributions of structural and reduced form parameters. They are also similar between these two models. The figures show not only that the posterior mean of the slope of the Phillips curve became much smaller but also the posterior of the slope of the Phillips curve became more spread after the Great Inflation period among other things. Given the similarities of the parameter estimates, one may wonder if the two specifications have more or less the same goodness of fit to the data. Table 13 reports the QMLs for the two specifications and shows otherwise. The results suggest that the ROT

32

specification of Gal´ı and Gentler (1999) outperforms the partial index specification of Smets and Wouters (2003, 2007) for both sample periods we consider.

7.2

The Medium-Scale DSGE Model: IRF Matching Es-

timation In this section, we apply our procedure to a more empirically relevant medium-scale DSGE model originally developed by Christiano, Eichenbaum and Evans (2005) (hereafter CEE). In particular, we consider a modified version of the CEE model, which has been estimated by Smets and Wouters (2007), Altig, Christiano, Eichenbaum and Linde (2011), Christiano, Trabandt and Walentin (2011), and Christiano, Eichenbaum and Trabandt (2013), among others. The model is one of the most commonly used macroeconomic models among practitioners that incorporates the investment adjustment cost, habit formation in consumption, sticky prices, sticky wages, and the inflation-targeting monetary policy. In practice, this class of the model has been estimated using various methods. For example, CEE and Altig, Christiano, Eichenbaum and Linde (2011) employ the classical impulse response matching estimator, while Smets and Wouters (2007) estimate the model using a standard Bayesian method. As a third approach, Christiano, Trabandt and Walentin (2011), and Christiano, Eichenbaum and Trabandt (2013) employ the quasi-Bayesian impulse response matching function estimator, or the Laplace-type estimator. For the purpose of evaluating the relative importance of various frictions in the model estimated by the standard Bayesian method, Smets and Wouters (2007) utilize the marginal likelihood. Their question is whether all the frictions introduced in the canonical DSGE model are really necessary in order to describe the dynamics of observed aggregate data. To answer this question, they compare marginal likelihoods of estimated models when each of the frictions was drastically reduced one at time. Among the sources of nominal frictions, they claim that both price and wage stickiness are equally important while indexation is relatively unimportant in both goods and labor markets. Regarding the real frictions, they claim that the investment adjustment costs are most important. They also find that, in the presence of wage stickiness, the introduction of variable capacity utilization is less important.

33

Here, we conduct a similar exercise using QMLs obtained in the quasi-Bayesian IRF matching estimation. The data and estimated impulse response functions are identical to the ones used in Christiano, Trabandt and Walentin (2011).9 They estimate a VAR(2) model of 14 variables using the US quarterly data from 1951Q1 to 2008Q4. Then, a combination of short-run and long-run restrictions is used to identify the responses to three types of shocks in the economy: (i) a monetary policy shock, (ii) a neutral technology shock and (iii) an investment-specific technology shock. All the specifications of shock processes we employ here are same as those used in Christiano, Trabandt and Walentin (2011). In respect to the monetary policy shock, the interest rate Rt is assumed to follow the process given by ln(Rt /R) = ρR ln(Rt−1 /R) + (1 − ρR ) [rπ ln(πt+1 /π) + ry ln(gdpt /gdp)] + εR,t 2 ). The neutral technology Z in log where gdpt is scaled real GDP and εR,t ∼ iid(0, σR t

is assumed to be I(1) with its growth generated from an iid process gZ,t = γZ + εZ,t where gZ,t = ln(Zt /Zt−1 ) and εZ,t ∼ iid(0, σZ2 ). The investment-specific technology Ψt in log is also assumed to be I(1) but its growth is generated from an AR(1) process given by gΨ,t = (1 − ρΨ )γΨ + ρΨ gΨ,t−1 + εΨ,t 2 ). The structural parameters are estiwhere gΨ,t = ln(Ψt /Ψt−1 ) and εΨ,t ∼ iid(0, σΨ

mated by matching the first 15 responses of selected 9 variables to 3 shocks, less 8 zero contemporaneous responses to the monetary policy shock (so that the total number of responses to match is 397). Since our purpose is to evaluate the relative contribution of various frictions, we estimate some additional parameters, such as the wage stickiness parameter ξw , wage indexation parameter ιw and price indexation parameter ιp , which are fixed in the analysis of Christiano, Trabandt and Walentin (2011).10 The list of estimated structural parameters in our analysis, quasi-Bayesian estimates and the prior distribution, are reported in Table 14. This estimated model serves as the baseline model when we compare with other models using QMLs. 9 10

See the data appendix of their paper for the detailed explanation. In our analysis, both price markup and wage markup parameters are fixed at 1.2.

34

The estimated impulse response functions from the VAR model and theoretical impulse response functions implied by the baseline model are provided in Figures 2, 3 and 4. The thin solid line in figures shows the estimated impulse response functions along with the shadow area which represents the 95% error bands obtained by the bootstrap. This part of the results is exactly the same as Figures 10 to 12 of Christiano, Trabandt and Walentin (2011) because we are using the same estimated impulse response functions as a target to match. The thick line shows the theoretical impulse response functions of the baseline model obtained by matching the VAR-based impulse response functions (thin line). Our results of the model-implied impulse response functions slightly differ from theirs reflecting minor changes in the model. However, dynamics predicted by our model and theirs are broadly consistent with estimated impulse response functions. Most notably, the model can replicate (i) delayed and gradual response of inflation to a monetary policy shock (Figure 2); (ii) drop in inflation in response to a neutral technology shock (Figure 3); and (iii) gradual response of the price of investment to an investment-specific technology shock (Figure 4). We follow Smets and Wouters (2007) and divide the sources of frictions of the baseline model into two groups. First, nominal frictions are sticky prices, sticky wages, price indexation and wage indexation. Second, real frictions are investment adjustment costs, habit formation, and capital utilization. We estimate additional submodels, which reduces the degree of each of the seven frictions. The computed QMLs for 8 models, including the baseline model, are reported in Table 15. Both QMLs based on the Laplace approximation and the modified harmonic mean estimator are reported. For the reference, also included in the table are the original marginal likelihoods obtained by Smets and Wouters (2007) based on the different estimation method applied to the different data set. The first column shows the quasi-posterior mean of relevant structural parameters of the baseline model along with its QML. Let us first examine the relative role of nominal frictions. The second and third columns show the results when the degree of nominal price and wage stickiness (ξp and ξw ) is set at 0.10, respectively. Consistent with Smets and Wouters (2007), the results show the importance of both types of nominal frictions. Unlike the result obtained by Smets and Wouters (2007), however, the QML deteriorates more by restricting the degree of wage stickiness. The fourth and fifth columns show the results when the

35

price and wage indexation parameters (ιp and ιw ) are set at 0.01, respectively. Again, consistent with Smets and Wouters (2007), neither price nor wage indexation plays a very important role in terms of improving the value of QMLs. The value of QML is similar to that of baseline model even when the price indexation parameter is restricted to a very low value. In fact, when the wage indexation parameter is set at a small value, QML seems to improve over the baseline model. Thus, we can conclude that Calvotype frictions in price and wage settings are empirically more important than the price and wage indexation to past inflation. Let us now turn to the role of real frictions. The remaining three columns show the results when each of investment adjustment cost parameter (S 00 ), consumption habit parameter (b) and capital utilization cost parameter (σa ) is set at some small values. The results show that restricting habit formation in consumption significantly reduces the QML compared to other two real frictions, suggesting the relatively important role of the consumption habit. Our result on the role of capital utilization costs is also somewhat similar to the one obtained by Smets and Wouters (2007) in the sense that it has a relatively minor role in increasing the fit of the model. Overall, our results seem to support the empirical evidence obtained by Smets and Wouters (2007), despite the fact that our analysis is based on a very different model selection criterion.

8

Concluding Remarks

In this paper we established the consistency of the model selection criterion based on the quasi-marginal likelihood (QML) obtained from Laplace-type estimators. We considered cases in which parameters are strongly identified and are weakly identified. Our Monte Carlo results confirmed our consistency results. Our proposed procedure was also applied to select an appropriate specification in New Keynesian macroeconomic models using US data. Our proposed model selection criterion is useful when one selects a model, estimates the structural parameters of the selected model and interpret them. While Bayesian model averaging will select the correct model asymptotically, weights are nonzero in finite samples. It is not clear how to interpret structural parameters of different DSGE models that are estimated simultaneously. Bayesian model averaging may be more

36

useful for forecasting. Application of Bayesian model averaging to IRF matching is beyond the scope of this paper and is left for future research.

37

Appendix Lemma 1. Suppose that Assumptions 1 and 2 hold. Define a profile estimator of αs by cT (b α bs,T (αw ) = argminαs ∈As (b γT − f (αs , αw ))0 W γT − f (αs , αw ))

(39)

for each αw ∈ Aw . Then αs (αw ) − αs,0 k sup kb

1

= Op (T − 2 ), (40)

αw ∈Aw 1

qA,T (b αs,T (αw ), αw ) − qbA,T (b αs,T (αw,0 ), αw,0 )| = Op (T − 2 ), (41) sup |b

αw ∈Aw

 αs,T (αw ), αw ) − ∇2αs qbA,T (b αs,T (αw,0 ), αw,0 ) sup vech ∇2αs qbA,T (b

1

= Op (T − 2 ). (42)

αw ∈Aw

Proof of Lemma 1. The pointwise convergence of α bs (αw ) to αs , p

α bs,T (αw ) → αs,0

(43)

for each αw ∈ Aw , follows from Assumption 1(a), (b) and 2(b). α bs,T (αw ) satisfies the first order conditions: ∇αs qA (b αs,T (αw ), αw ) = 0pAs ×1 .

(44)

Let Jbs and Jbw denote the Jacobian matrices of the left hand side of (44) with respect to αs and αw , respectively. It follows from the pointwise convergence, Assumptions 1(b), 2(a) and 2(c), Jbs is nonsingular with probability approaching one. Thus, applying the implicit function theorem to (44) yields 1 ∂α bs,T (αw ) = −Jbs−1 Jbw = Op (T − 2 ), 0 ∂αw

(45)

where Op (T −1/2 ) is uniform in αw ∈ Aw which follows from Assumptions 1(a)(b) and 2(a). It follows from the mean value theorem and (45) that 1

0 0 α bs,T (αw )−α bs,T (αw ) = Op (T − 2 kαw − αw k).

(46)

Given the pointwise convergence, the compactness of Aw and stochastic equicontinuity (46), we can strengthen the pointwise convergence to uniform convergence 38

(40) by Theorem 1 of Andrews (1992). Then (41) and (42) follow from (40), Assumptions 1(b) and 2(a). Proof of Theorem 1. Let B (b αs,T ) = {αs ∈ As : kαs − α bs,T k < }, where  > 0. Write the marginal likelihood as the sum of two integrals: Z Z −T qbA,T (α) πA (α)e dα + πA (α)e−T qbA,T (α) dα. mA = (As \B (b αs,T ))×Aw

B (b αs,T )×Aw

(47) It follows from Taylor’s theorem, the first order condition for α bs,T (αw ), (40) and (42) that αs,T (αw ), αw )0 (αs − α bs,T (αw )) qbA,T (α) = qbA,T (b αs,T (αw ), αw ) + ∇αs qbA,T (b 1 + (αs − α bs,T (αw ))0 ∇2αs qbA,T (¯ αs,T (α), αw )(αs − α bs,T (αw )) 2 1 = qbA,T (b αs,T (αw ), αw ) + (αs − α bs,T (αw ))0 ∇2αs qbA,T (¯ αs,T (αw ), αw )(αs − α bs,T (αw )), 2 1 1 (48) bs,T )0 ∇2αs qbA,T (b = qbA,T (b αT ) + (αs − α αT )(αs − α bs,T ) + Op (T − 2 ), 2 uniformly in αw ∈ Aw , where α ¯ s,T (αw ) is a point between αs and α bs,T (αw ). Using (48), Lemma 1 and Assumption 1(c) the first integral on the right hand side of (47) can be written as: Z

T

0

2

πA (α)e−T qbA,T (bαs,T (αw ),αw )− 2 (αs −bαs,T (αw )) ∇αs qbA,T (α¯ s,T (α),αw )(αs −bαs,T (αw )) dα(1 + op (1))

B (b αs,T )×Aw

= e

−T qbA,T (b αT )

Z

T

0

2

πA (α)e− 2 (αs −bαs,T ) ∇αs qbA,T (bαT )(αs −bαs,T ) dα(1 + op (1))

B (b αs,T )×Aw

= e−T qbA,T (bαT )

Z

T

0

T

2

2

πA (b αs,T , αw )e− 2 (αs −bαs,T ) ∇αs qbA,T (bαT )(αs −bαs,T ) dα(1 + op (e− 2 λT  ))

B (b αs,T )×Aw −T qbA,T (b αT )

"Z

#

= πAs (b αs,T )e

e

− T2 (αs −b αs,T )0 ∇2αs qbA,T (b αT )(αs −b αs,T )

dαs

B (b αs,T )

Z × Aw

 πAw |As (αw |b αs,T )dαw (1 + op (1))

αT )e−T qbA,T (bαs,T ) = πAs (b



2π T

 pAs 2

2 − 1 ∇α qbA,T (b 2 (1 + op (1)), α ) T s

(49) T

2

where λT is a sequence of strictly positive bounded constants and op (e− 2 λT  ) is uniform on B (b αT ) and πAw |As (αw |αs ) is the prior of αw conditional on αs . Thus, 39

the first integral on the right hand side of (47) can be approximated by πAs (b αs,T )e−T qbA,T (bαs,T )



2π T

 pA2 s

2 − 1 ∇α qbA,T (b αT ) 2 . s

(50)

By letting  → 0, it follows from Assumptions 1(b) and 2(b) that the second integral on the right hand side of (47) can be bounded as Z Z −T qbA,T (α) πA (α)e dα ≤ πA (α)dα × e−T inf α∈(As \B (αbT ))×Aw qbA,T (α) (As \B (b αT ))×Aw (As \B (b αT ))×Aw  −T (b qA,T (b αT )+η) = O e , (51) for some η > 0. Combining (50) and (51), we obtain mA = πAs (b αs,T )e−T qbA,T (bαT )



2π T

 pA2 s

2 − 1 ∇α qbA,T (b 2 (1 + op (1)). α ) T s

(52)

When there is no strongly identified parameter (pAs = 0), it follows from Assumption 2(a) that mA = Op (1). Similarly,   π (βb )e−T qbB,T (βbT ) Bs s,T mB =  Op (1)

2π T

(53)

− 1  pB2 s 2 2 b q b ( β ) ∇ βs B,T T (1 + op (1)) if pBs > 0, if pBs = 0. (54)

Theorem 1 follows from (52)–(54) and Assumptions 1(b)(c) and 2(c). We will use the following lemma in the proof of Theorem 2: Lemma 2. Suppose that Assumptions 1 and 3 hold. Define a profile estimator of αs by cT (b α bs,T (αp ) = argminαs ∈As (b γT − f (αs , αp ))0 W γT − f (αs , αp )),

(55)

for each αp ∈ Ap . Then sup kb αs (αp ) − αs,0 k = op (1), (56) αp ∈Ap,0

sup |b qA,T (b αs,T (αp ), αp ) − qbA,T (b αs,T (αp,0 ), αp,0 )| = op (1), (57)

αp ∈Ap,0

 1 ). sup vech ∇2αs qbA,T (b αs,T (αp ), αp ) − ∇2αs qbA,T (b αs,T (αp,0 ), αp,0 ) = Op (T − 2(58)

αp ∈Ap,0

40

Proof of Lemma 2. The pointwise convergence of α bs (αp ) follows from the usual arguments. Note that α bs,T (αp ) satisfies the first order conditions: cT (b Fαs (b αs,T (αp ), αp )0 W γT − f (b αs,T (αp ), αp )) = 0pAs ×1 ,

(59)

for every αp ∈ Ap,0 . Let Jbs and Jbp denote the Jacobian matrices of the left hand side of (59) with respect to αs and αp , respectively. By Assumptions 1(b) and 3(c), Jbs is nonsingular with probability approaching one. Thus, it follows from the implicit function theorem that ∂α bs,T (αp ) = −Jbs−1 Jbp = Op (1). ∂αp0

(60)

It follows from the mean value theorem and (60) that α bs,T (αp0 ) − α bs,T (αp ) = Op (kαp0 − αp k),

(61) p

where αp , αp0 ∈ Ap,0 . Given the pointwise convergence α bs,T (αp ) → αs,0 for each αp ∈ Ap,0 , the compactness of Ap and stochastic equicontinuity (61), we can strengthen the pointwise convergence to uniform convergence (56) by Theorem 1 of Andrews (1992). It follows from Assumption 1(b), Lemma 2(a) and the definition of Ap,0 that |b qA,T (b αs,T (αp ), αp ) − qbA,T (b αs,T (αp,0 ), αp,0 )| ≤ |b qA,T (b αs,T (αp ), αp ) − qA (b αs,T (αp ), αp )| +|b qA,T (b αs,T (αp,0 ), αp,0 ) − qA (b αs,T (αp,0 ), αp,0 )| +|qA (b αs,T (αp ), αp ) − qA (b αs,T (αp,0 ), αp,0 )| = op (1) uniformly in αp ∈ Ap,0 , from which (57) follows. (58) follows from similar arguments. Proof of Theorem 2(a). 41

We can write Z Z πA (α) exp (−T qbA,T (α)) dα = A

πA (α) exp (−T qbA,T (α)) dα

A0

Z + (Ac0 )−ε

πA (α) exp (−T qbA,T (α)) dα

Z + A\(A0 ∪(Ac0 )−ε )

πA (α) exp (−T qbA,T (α)) dα

= I1 + I2 + I3 , say. It follows from Assumption 1(b) that Z I1 = πA (α) exp (−T qA (α)) dα(1 + op (1)) A0 Z = πA (α)dα exp (−T qA (α0 )) (1 + op (1)),

(62)

(63)

A0

for any α0 ∈ A0 . It follows from Assumptions 1(b) and 3(a) that I2 = op (exp (−T qA (α0 ))) .

(64)

By letting ε → 0, the term I3 can be made arbitrarily small. Combining (62)– (64), we can approximate the QML for model A by Z mA = πA (α)dα × exp (−T qA (α0 )) (1 + op (1))

(65)

A0

for any α0 ∈ A0 . Similarly, the QML for model B can be approximated by Z mB = πB (β)dβ × exp (−T qB (β0 )) (1 + op (1)), (66) B0

for any β0 ∈ B0 . Theorem 2(a) follows from (65) and (66). Proof of Theorem 2(b). Define B (b αs,T ) = {αs ∈ As : kαs − α bs,T k < }.

42

The QML of model A can be written as Z πA (α) exp (−T qbA,T (α)) dα A Z = πA (α) exp (−T qbA,T (α)) dα B (b αs,T )×Ap,0 Z + πA (α) exp (−T qbA,T (α)) dα ((B (b αs,T )×Ap,0 )c )−ε Z + πA (α) exp (−T qbA,T (α)) dα A\((B (b αs,T )×A0 )∪(B (b αs,T )×A0 )c ))−ε )

= I1 + I2 + I3 , say.

(67)

It follows from Assumption 1(b) and Lemmas 2(a)(b) that Z

πA (α)e−T qbA,T (α) dα

I1 = B (b αs,T )×Ap,0

Z

0

0

0

πA (α)e−T qbA,T ([αs αp,0 ] ) dα(1 + )

= B (b αs,T )×Ap,0

Z

T

0

2

πA (α)e−T qbA,T (bαs,T (αp,0 ),αp,0 ) e− 2 (αs −bαs,T (αp,0 )) ∇αs qbA,T (bαs,T (αp,0 ),αp,0 )(αs −bαs,T (αp,0 )) dα

= B (b αs,T )×Ap,0

×(1 + op ()), = e

−T qbA,T ([b αT (αp,0

−T qbA,T (b αT )



= e

)0

2π T

α0p,0 ]0 )

 pAs 2



2π T

 pAs 2

2

1 0 0 0 − 2

∇α qbA,T ([b α (α ) α ] ) p,0 T p,0 s

2

− 1 0

∇α qbA,T ([b αT (αp,0 )0 αp,0 ]0 ) 2 s

Z πA (αs,0 , αp )dαp (1 + op ()) Ap,0

Z πA (αs,0 , αp )dαp (1 + op ()). Ap,0

Thus, the QML can be approximated by mA = e

−T qbA,T (b αT )



2π T

 pAs 2

2 1 0 0 0 − 2 ∇α qbA,T ([b α (α ) α ] ) p,0 s,T p,0 s

Z πA (αs,0 , αp )dαp (1+op (1)), Ap,0

(69) as  → 0. The rest of the proof is analogous to that of Theorem 1. Proof of Proposition 1(a). Because cT (b ∇αs qbA,T (α) = 2Fs (αs )0 W γT − f (α)),

(70)

∇αs qA (α) = 2Fs (αs )0 W (γ − f (α)),

(71)

it follows from the compactness of A, the twice continuous differentiability of f and

43

(68)

p cT → W W that

h i c ∇αs qbA,T (α) − ∇αs qA (α) = −2Fs (αs ) WT (b γT − f (α)) − W (γ − f (α)) h i cT − W )(γ − f (α)) + W cT (b = −2Fs (αs )0 (W γT − γ) 0

= op (1),

(72)

uniformly in αs ∈ As as required in Assumption 1(b). Similarly, because 0

cT ⊗ Ip ] ∂vec(Fs (αs ) )(73) cT Fs (αs ) − 2[(b γT − fs (αs ))0 W , ∇2αs qbA,T (α) = 2Fs (αs )0 W As ∂αs0 ∂vec(Fs (αs )0 ) , (74) ∇2αs qA (α) = 2Fs (αs )0 W Fs (αs ) − 2[(γ − fs (αs ))0 W ⊗ IpAs ] ∂αs0 it follows that cT − W )Fs (αs ) ∇αs qbA (α) − ∇αs qA (α) = 2Fs (αs )0 (W

0

cT − W ) ⊗ Ip ] ∂vec(Fs (αs ) ) , −2[(γ − fs (αs ))0 (W As ∂αs0 0 cT ⊗ Ip ] ∂vec(Fs (αs ) ) , −2[(b γT − γ)0 W As ∂αs0 = op (1), (75) uniformly in αs ∈ As as in Assumption 1(b). Because Fs (αs,0 ) has rank pAs and W is positive definite, 0

cT ⊗ Ip ] ∂vec(Fs (αs,0 ) ) , ∇2αs qA (α0 ) = 2Fs (αs,0 )0 W Fs (αs,0 ) − 2[(γ − fs (αs,0 ))0 W As 0 ∂αs,0 = 2Fs (αs,0 )0 W Fs (αs,0 )

(76)

is nonsingular as required in Assumption 2(c). Because 1 1 cT (b qbA,T (α) = (b γT − fs (αs ) − T − 2 fw (α))0 W γT − fs (αs ) − T − 2 fw (α)) 1 cT (b cT fw (α) = (b γT − fs (αs )0 W γT − fs (αs ) + 2T − 2 (b γT − fs (αs ))0 W

cT fw (α), +T −1 fw (α)0 W

(77)

qA (α) can be written as qAs (αs ) + T −1/2 qAw (α) where qAs (αs ) = (γ − fs (αs ))0 W (γ − fs (αs )) and qAw (α) is defined as 1

T 2 (b γT − fs (αs ))0 W fw (α) + fw (α)0 W fw (α) ⇒ qAw (α)

44

(78)

and is Op (1) uniformly in α ∈ A as in Assumption 2(a). Because W is positive definite, the uniqueness of αs imply Assumption 2(b). Proof of Proposition 1(b): Because cT (b γT − f (α)), ∇αs qbA,T (α) = 2Fs (α)0 W

(79)

∇αs qA (α) = 2Fs (α)0 W (γT − f (α)),

(80)

it follows from the compactness of A, the twice continuous differentiability of f and p cT → W W that

i h cT (b γT − f (α)) − W (b γT − f (α)) ∇αs qbA,T (α) − ∇αs qA (α) = −2Fs (α)0 W h i cT − W )(γ − f (α)) + W cT (b γT − γ) = −2Fs (α)0 (W = op (1),

(81)

uniformly in α ∈ A as required in Assumption 1(b). Similarly, because ∂vec(Fs (α)0 ) , (82) ∂αs0 ∂vec(Fs (α)0 ) , (83) ∇2αs qA (α) = 2Fs (α)0 W Fs (α) − 2[(γ − f (α))0 W ⊗ IpAs ] ∂αs0

cT ⊗ Ip ] cT Fs (α) − 2[(b γT − f (α))0 W ∇2αs qbA,T (α) = 2Fs (α)0 W As

it follows that cT − W )Fs (α) ∇2αs qbA (α) − ∇2αs qA (α) = 2Fs (α)0 (W

∂vec(Fs (α)0 ) , ∂αs0 0 cT ⊗ Ip ] ∂vec(Fs (α) ) , −2[(b γT − γ)0 W (84) As ∂αs0 = op (1), (85) cT − W ) ⊗ Ip ] −2[(γ − f (αs ))0 (W As

uniformly in αs ∈ As as in Assumption 1(b). Assumption 4(h) implies that ∇2αs qA (α0 ) is positive definite as required by Assumption 3(b). Because W is positive definite, the uniqueness of αs imply Assumption 3(b). Proof of Proposition 2(a). Because T 1 X ∂f (xt , α) ∇αs qbA,T (α) = 2 T ∂α0

"

#0

t=1

∇αs qA (α) = 2Fs (αs )0 WA (fs (αs )),

45

T X cA,T 1 W f (xt , α), T

(86)

t=1

(87)

and because (1/T ) fs (αs ) = op (1) and

PT

0 t=1 ∂f (xt , α)/∂α − Fs (αs ) p cA,T → W WA , it follows from

= op (1) and (1/T )

PT

t=1 f (xt , α)



the compactness of A and the three

times continuous differentiability of f that ∇αs qbA,T (α) − ∇αs qA (α) = op (1),

(88)

uniformly in αs ∈ As as required in Assumption 1(b). Similarly, because #0 " # T T X X 1 ∂f (x , α) 1 ∂f (x , α) t t cA,T ∇2αs qbA,T (α) = 2 W T ∂α0 T ∂α0 t=1 t=1 h P "  # 0 T  1X ∂vec T1 Tt=1 cA,T ⊗ Ip  −2 f (xt , α) W As  T ∂αs0 "

∂f (xt ,α) ∂α0

i

(89),

t=1

∇2αs qA (α) = 2Fs (αs )0 WA Fs (αs ) − 2[fs (αs )0 WA ⊗ IpAs ]

∂vec(Fs (αs )0 ) , (90) ∂αs0

T

∂vec

1 X ∂f (xt , α) T ∂α0 t=1 i h P ∂f (xt ,α) T 1 T

∂α0

t=1

∂αs0

= Fs (αs ) + op (1),

=

(91)

∂vec(Fs (αs )0 ) + op (1), ∂αs0

(92) (93)

it follows that ∇αs qbA (α) − ∇αs qA (α) = op (1),

(94)

∇2αs qbA (α) − ∇2αs qA (α) = op (1),

(95)

uniformly in αs ∈ As as in Assumption 1(b). Because ∇2αs qA (α0 ) = 2Fs (αs,0 )0 WA Fs (αs,0 ) − 2[fs (αs,0 )0 WA ⊗ IpAs ]

∂vec(Fs (αs,0 )0 ) , 0 ∂αs,0

(96)

Assumption 5(h) implies Assumption 2(c). Because #0 " # T T X 1X 1 cA,T qbA,T (α) = f (xt , α) W f (xt , α) T T t=1 t=1 h i0 h i 1 1 1 1 = fs (αs ) + T − 2 fw (α) + Op (T − 2 ) WA fs (αs ) + T − 2 fw (α) + Op (T − 2 ) "

= fs (αs )0 WA fs (αs ) + Op (T −1 )

(97)

uniformly in α ∈ A, qA (α) can be written as qAs (αs ) + T −1/2 qAw (α) where qAs (αs ) = (γ − fs (αs ))0 WA (γ − fs (αs )) and qAw (α) = Op (1) uniformly in α ∈ A as in Assumption

46

2(a). Because WA is positive definite, the uniqueness of αs imply Assumption 2(b). Proof of Proposition 2(b): Because T 1 X ∂f (xt , α) ∇αs qbA,T (α) = 2 T ∂α0

"

t=1

#0

T 1X c WA,T f (xt , α), T

(98)

t=1

∇αs qA (α) = 2Fs (αs )0 WA (fs (αs )),

(99)

it follows from the compactness of A, the three times continuous differentiability p cA,T → of f and W WA that

∇αs qbA,T (α) − ∇αs qA (α) = op (1),

(100)

uniformly in α ∈ A as required in Assumption 1(b). Similarly, it follows that ∇2αs qbA (α) − ∇2αs qA (α) = op (1),

(101)

uniformly in αs ∈ As as in Assumption 1(b). Assumption 5(h) implies that ∇2αs qA (α0 ) is positive definite as required by Assumption 3(b). Because W is positive definite, the uniqueness of αs imply Assumption 3(b).

47

References Altig, David E., Lawrence J. Christiano, Martin Eichenbaum and Jesper Lind´e (2011), “Firm-Specific Capital, Nominal Rigidities and the Business Cycle,” Review of Economic Dynamics, 14 (2), 225–247. An, Sungbae, and Frank Schorfheide (2007), “Bayesian Analysis of DSGE Models,” Econometric Reviews, 26, 113–172. Andreasen, M.M., J. Fern´ andez-Villaverde and J. Rubio-Ram´ırez (2016), “The Pruned State-Space System for Non-Linear DSGE Models: Theory and Empirical Applications,” Working Paper. Andrews, Donald W.K., (1991), “Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation,” Econometrica, 59 (3), 817–858. Andrews, Donald W.K., (1992), “Generic Uniform Convergence,” Econometric Theory, 8, 241–257. Andrews, Donald W.K., (1999), “Consistent Moment Selection Procedures for Generalized Method of Moments Estimation,” Econometrica, 67, 543–564. Calvo, Guillermo A. (1983), “Staggered Prices in a Utility-Maximizing Framework,” Journal of Monetary Economics, 12 (3), 383–398. Canova, Fabio, and Luca Sala (2009), “Back to Square One: Identification Issues in DSGE Models,” Journal of Monetary Economics, 56, 431–449. Chernozhukov, Victor, and Han Hong (2003), “An MCMC Approach to Classical Estimation,” Journal of Econometrics, 293–346. Chernozhukov, Victor, Han Hong and Elie Tamer (2007), “Estimation and Confidence Regions for Parameter Sets in Econometric Models,” Econometrica, 75, 1243–1284. Chib, Siddhartha, and Ivan Jeliazkov (2001), “Marginal Likelihood from the Metropolis-Hastings Output,” Journal of the American Statistical Association, 96, 270–281. Christiano, Lawrence J., Martin S. Eichenbaum and Charles Evans (2005), “Nominal Rigidities and the Dynamic Effects of a Shock to Monetary Policy,” Journal of Political Economy, 113, 1–45.

48

Christiano, Lawrence J., Martin S. Eichenbaum and Mathias Trabandt (2013), “Unemployment and Business Cycles,” unpublished manuscript, Northwestern University and Board of Governors of the Federal Reserve System. Christiano, Lawrence J., Mathias Trabandt and Karl Walentin (2011), “DSGE Models for Monetary Policy Analysis,” in Benjamin M. Friedman and Michael Woodford, editors: Handbook of Monetary Economics, Volume 3A, The Netherlands: North-Holland. Corradi, Valentina, and Norman R. Swanson (2007), “Evaluation of Dynamic Stochastic General Equilibrium Models Based on Distributional Comparison of Simulated and Historic Data,” Journal of Econometrics, 136 (2), 699–723. Dawid, A.P. (1994), “Selection Paradoxes of Bayesian Inference,” in Multivariate Analysis and its Applications, Volume 24, eds., T.W. Anderson, K. A.-T. A. Fang and L. Olkin, IMS: Philadelphia, PA. Del Negro, Marco, and Frank Schorfheide (2004), “Priors from General Equilibrium Models for VARs,” International Economic Review, 45(2), 643–73. Del Negro, Marco, and Frank Schorfheide (2009), “Monetary Policy Analysis with Potentially Misspecified Models,” American Economic Review, 99 (4), 1415–1450. Dridi, Ramdan, Alain Guay and Eric Renault (2007), “Indirect Inference and Calibration of Dynamic Stochastic General Equilibrium Models,” Journal of Econometrics, 136 (2), 397–430. Fernandez-Villaverde, Jesus, and Juan Francisco Rubio-Ramirez (2004), “Comparing Dynamic Equilibrium Models to Data: A Bayesian Approach,” Journal of Econometrics, 123, 153–187. Gal´ı, Jordi, and Mark Gertler (1999), “Inflation Dynamics: A Structural Econometric Analysis,” Journal of Monetary Economics, 44 (2), 195–222. Gemma, Yasufumi, Takushi Kurozumi and Mototsugu Shintani (2017), “Trend Inflation and Evolving Inflation Dynamics: A Bayesian GMM Analysis of the Generalized New Keynesian Phillips Curve,” unpublished manuscript, Bank of Japan and University of Tokyo.

49

Geweke, John, (1998), “Using Simulation Methods for Bayesian Econometric Models: Inference, Development and Communication,” Staff Report 249, Federal Reserve Bank of Minneapolis. Guerron-Quintana, Pablo, Atsushi Inoue and Lutz Kilian (2013), “Frequentist Inference in Weakly Identified DSGE Models,” Quantitative Economics 4, 197229. Guerron-Quintana, Pablo, Atsushi Inoue and Lutz Kilian (2016), “Impulse Response Matching Estimators for DSGE Models,” accepted for publication in Journal of Econometrics. Hall, A.R., and A. Inoue (2003), “The Large Sample Behavior of the GMM Estimator in Misspecified Models,” Journal of Econometrics, 114, 361–394. Hall, A.R., A. Inoue, J.M. Nason and B. Rossi (2012), “Information Criteria for Impulse Response Function Matching Estimation of DSGE Models,” Journal of Econometrics, 170, 499–518. Herbst, Edward P., and Frank Schorfheide (2015), Bayesian Estimation of DSGE Models. Princeton, NJ: Princeton University Press. Hnatkovska, Victoria, Vadim Marmer and Yao Tang (2012), “Comparison of Misspecified Calibrated Models: The Minimum Distance Approach,” Journal of Econometrics, 169, 131-138. Hong, Han and Bruce Preston (2012), “Bayesian Averaging, Prediction and Nonnested Model Selection,” Journal of Econometrics, 167, 358–369. Inoue, Atsushi, and Lutz Kilian (2006), “On the Selection of Forecasting Models,” Journal of Econometrics, 130, 273–306. ` J` orda, Oscar, and Sharon Kozicki (2011), “Estimation and Inference by the Method of Projection Minimum Distance,” International Economic Review, 52, 461–487. Kim, Jae-Young (2002), “Limited Information Likelihood and Bayesian Analysis,” Journal of Econometrics, 107 , 175–193. Kim, Jae-Young (2014), “An Alternative Quasi Likelihood Approach, Bayesian

50

Analysis and Data-based Inference for Model Specification, ” Journal of Econometrics, 178 , 132–145. Kormilitsina, Anna, and Denis Nekipelov (2016), “Consistent Variance of the Laplace Type Estimators: Application to DSGE Models,” International Economic Reviews, 57, 603–622. Leeb, H., and B.M. P¨ otscher (2005), “Model Selection and Inference: Facts and Fiction,” Econometric Theory, 21, 21–59. Leeb, Hannes, and Benedikt M. P¨otscher (2009), “Model Selection,” in T.G. Andersen, R.A. Davis, J.-P. Kreiss and T. Mikosch eds., Handbook of Financial Time Series, Springer-Verlag. Moon, Hyungsik Roger, and Frank Schorfheide (2012), “Bayesian and Frequentist Inference in Partially Identified Models,” Econometrica, 80, 755–782. Newey, Whitney K., and Daniel L. McFadden (1994), “Large Sample Estimation and Hypothesis Testing,” in Robert F. Engle and Daniel L. McFadden eds., Handbook of Econometrics, Volume IV, 2111–2245. Nishii, R. (1988), “Maximum Likelihood Principle and Model Selection when the True Model is Unspecified,” Journal of Multivariate Analysis, 27, 392–403. Phillips, Peter C.B. (1996), “Econometric Model Determination,” Econometrica, 64, 763–812. Schorfheide, Frank, (2000), “Loss Function-Based Evaluation of DSGE Models,” Journal of Applied Econometrics, 15 (6), 645–670. Shin, Minchul (2014), “Bayesian GMM,” unpublished manuscript, University of Pennsylvania. Sims, Christopher A., Daniel F. Waggoner and Tao Zha (2008), “Methods for Inference in Large Multiple-Equation Markov-Switching Models,” Journal of Econometrics, 146, 255–274. Sin, Chor-Yiu, and Halbert White (1996), “Information Criteria for Selecting Possibly Misspecified Parametric Models,” Journal of Econometrics, 71, 207–225. Smets, Frank, and Rafael Wouters (2003), “An Estimated Dynamic Stochastic

51

General Equilibrium Model of the Euro Area,” Journal of the European Economic Association, 1 (5), 1123–1175. Smets, Frank, and Rafael Wouters (2007), “Shocks and Frictions in US Business Cycles: A Bayesian DSGE Approach,” American Economic Review, 97, 586–606. Stock, James H., and Jonathan H. Wright (2000), “GMM with Weak Identification,” Econometrica, 68, 1055–1096. Vuong, Quang H., (1989), “Likelihood Ratio Tests for Model Selection and NonNested Hypothesis,” Econometrica, 57, 307–333. White, Halbert (1982), “Maximum Likelihood Estimation of Misspecified Models,” Econometrica, 50(1), 1–25. Yun, Tack (1996), “Nominal Price Rigidity, Money Supply Endogeneity, and Business Cycles,” Journal of Monetary Economics, 37 (2-3), 345–370. Zellner, Arnold (1998), “Past and Recent Results on Maximal Data Information Priors,” Journal of Statistical Research, 32 (1), 1–22.

52

Table 1: Simulation design

True Model

Model A

Model B

Identification

Design

σ

κ

σ

κ

σ

κ

Strong

1

1

0.5

estimated

estimated

estimated

fixed at 1

2

1

0.5

estimated

fixed at 0.5

estimated

estimated

3

1

0.5

estimated

fixed at 0.5

estimated

fixed at 1

4

1

0.5

estimated

fixed at 0.5

estimated

estimated

5

1

α = 0.5

fixed at 1

estimated

fixed at 0.5

estimated

Weak

Partial

ζ=1 6

1

α = 0.5

(α, ζ) fixed at 1

ζ=1

estimated

(α, ζ) estimated

(α, ζ)

estimated (α, ζ)

Notes. In the cases of strong and partial identification (design 1,2,5 and 6), f (σ, κ) = [1 + σ 2 , κ + σ 2 κ, −σ, 1 + κ2 + σ 2 κ2 , −σκ]0 , and the corresponding elements of the covariance matrix are used. In the cases of weak identification (designs 3 and 4), f (σ, κ) = [κ + σ 2 κ, 1 + κ2 + σ 2 κ2 , −σκ]0 and the corresponding elements of the covariance matrix are used instead. In the cases of partial identification (designs 5 and 6), κ = (1 − α)(1 − 0.99α)ζ/α. Model A is correctly specified while Model B is misspecified in designs 1, 3 and 5. Models A and B are both correctly specified and Model A is more parsimonious than Model B in designs 2, 4 and 6.

53

Table 2: Frequencies of selecting model A when the estimation criterion function alone is used Design

T

Diagonal

Optimal

1

50

1.000

1.000

100

1.000

1.000

200

1.000

1.000

50

0.000

0.003

100

0.000

0.000

200

0.000

0.000

50

0.989

0.984

100

0.994

0.988

200

0.996

0.988

50

0.021

0.036

100

0.009

0.017

200

0.009

0.014

50

0.803

0.756

100

0.883

0.851

200

0.927

0.914

50

0.097

0.107

100

0.094

0.115

200

0.077

0.121

2

3

4

5

6

Notes: See table 1 for descriptions of the designs. T denotes the sample size. “Optimal” refers to cases in which the weighting matrix is set to the inverse of the bootstrap covariance matrix of impulse responses. “Diagonal” refers to cases in which the weighting matrix is diagonal and their diagonal elements are the reciprocals of the bootstrap variances of impulse responses.

54

Table 3: Frequencies of selecting model A based on the quasi-marginal likelihood Marginal Likelihood Based

Modified ML Based

on the RWMH Algorithm Weight T

Matrix

Laplace HP

Approx.

Geweke τ = 0.5

τ = 0.9

SWZ q = 0.5

on the RWMH Algorithm CJ

q = 0.9

Geweke

SWZ

CJ

τ = 0.5

τ = 0.9

q = 0.5

q = 0.9

Design 1 50

Diag

0.994

0.994

0.994

0.994

0.994

0.994

0.840

0.840

0.838

0.838

0.841

0.997

0.998

0.998

0.998

0.998

0.998

0.908

0.910

0.907

0.908

0.909

1.000

1.000

1.000

1.000

1.000

1.000

0.855

0.859

0.855

0.860

0.858

Opt

1.000

1.000

1.000

1.000

1.000

1.000

0.948

0.946

0.944

0.945

0.946

Diag

1.000

1.000

1.000

1.000

1.000

1.000

0.946

0.950

0.947

0.950

0.948

1.000

1.000

1.000

1.000

1.000

1.000

0.986

0.985

0.986

0.985

0.986

0.575 Opt 100

Diag 0.672

200

0.764 Opt

Design 2 50

Diag

0.839

0.835

0.836

0.835

0.839

0.835

0.996

0.996

0.996

0.996

0.996

Opt

0.837

0.833

0.833

0.833

0.833

0.831

0.992

0.992

0.992

0.992

0.992

Diag

0.925

0.922

0.924

0.921

0.925

0.923

1.000

1.000

1.000

1.000

1.000

Opt

0.911

0.909

0.905

0.909

0.906

0.906

1.000

1.000

1.000

1.000

1.000

Diag

0.943

0.943

0.943

0.944

0.943

0.943

1.000

1.000

1.000

1.000

1.000

0.938

0.938

0.938

0.939

0.938

0.937

1.000

1.000

1.000

1.000

1.000

0.865 100

0.918 200

0.951 Opt

Design 3 50

Diag

0.956

0.968

0.968

0.966

0.965

0.965

0.983

0.980

0.981

0.981

0.980

Opt

0.967

0.973

0.974

0.974

0.972

0.971

0.972

0.968

0.969

0.971

0.969

Diag

0.979

0.980

0.978

0.980

0.979

0.980

0.964

0.959

0.962

0.962

0.963

Opt

0.993

0.989

0.989

0.989

0.989

0.989

0.970

0.970

0.972

0.969

0.970

Diag

0.994

0.993

0.993

0.992

0.992

0.992

0.937

0.937

0.937

0.937

0.936

0.996

0.995

0.995

0.994

0.994

0.994

0.972

0.972

0.972

0.972

0.973

0.582 100

0.744 200

0.829 Opt

Notes: “HP”, “Geweke”, “SWZ” and “CJ” refer to Hong and Preston (2012), Geweke’s (1998) modified harmonic estimator, Sims, Waggoner and Zha’s (2008) estimator and the estimator of Chib and Jeliazkov (2001), respectively. τ and q are the truncation parameters for Geweke’s (1998) and Sims et al.’s (2008) methods. See table 1 for descriptions of the designs. “Diag” refers to cases in which the weighting matrix is diagonal and their diagonal elements are the reciprocals of the bootstrap variances of impulse responses. “Opt” refers to cases in which the weighting matrix is set to the inverse of the bootstrap covariance matrix of impulse responses. The numbers in the table are the actual probabilities of selecting Model A over Model B over 1,000 Monte Carlo iterations.

55

Table 4: Frequencies of selecting model A based on the quasi-marginal likelihood

Weight T

Matrix

Laplace HP

Approx.

Marginal Likelihood Based

Modified ML Based

on the RWMH Algorithm

on the RWMH Algorithm

Geweke τ = 0.5

τ = 0.9

SWZ q = 0.5

CJ

q = 0.9

Geweke

SWZ

CJ

τ = 0.5

τ = 0.9

q = 0.5

q = 0.9

Design 4 50

Diag

0.789

0.751

0.715

0.796

0.812

0.807

0.734

0.677

0.794

0.831

0.820

0.755

0.705

0.676

0.727

0.733

0.727

0.813

0.765

0.840

0.857

0.850

0.880

0.873

0.861

0.887

0.889

0.888

0.948

0.923

0.977

0.988

0.987

0.829

0.802

0.800

0.806

0.812

0.810

0.976

0.967

0.986

0.993

0.989

0.959

0.962

0.962

0.963

0.963

0.962

0.996

0.995

0.998

0.999

1.000

0.909

0.901

0.903

0.903

0.905

0.905

0.999

0.999

1.000

1.000

1.000

0.929 Opt 100

Diag 0.948 Opt

200

Diag 0.974 Opt

Design 5 50

0.776

0.762

0.761

0.789

0.790

0.773

0.594

0.589

0.587

0.589

0.589

0.639

0.657

0.656

0.661

0.661

0.658

0.465

0.471

0.454

0.458

0.466

0.871

0.890

0.889

0.906

0.905

0.895

0.654

0.655

0.661

0.664

0.657

Opt

0.785

0.780

0.775

0.801

0.797

0.782

0.468

0.462

0.461

0.453

0.460

Diag

0.924

0.969

0.970

0.972

0.972

0.961

0.707

0.712

0.715

0.721

0.711

0.892

0.932

0.931

0.942

0.941

0.924

0.537

0.539

0.548

0.541

0.545

Diag NA Opt

100

Diag NA

200

NA Opt

Design 6 50

0.558

0.582

0.586

0.574

0.574

0.580

0.663

0.674

0.672

0.669

0.675

0.556

0.560

0.566

0.567

0.561

0.561

0.589

0.588

0.584

0.584

0.583

0.601

0.634

0.641

0.636

0.640

0.641

0.718

0.726

0.724

0.730

0.724

Opt

0.616

0.591

0.595

0.589

0.594

0.597

0.631

0.633

0.631

0.632

0.634

Diag

0.669

0.663

0.664

0.663

0.667

0.663

0.726

0.730

0.728

0.733

0.729

0.687

0.681

0.686

0.690

0.688

0.687

0.687

0.691

0.694

0.694

0.687

Diag NA Opt

100

Diag NA

200

NA Opt

Notes: See the notes for Table 3.

56

Table 5: The mean and standard deviation of the quasi-marginal likelihood estimates Geweke τ = 0.5

SWZ τ = 0.9

q = 0.5

T

mean

SD

mean

SD

mean

50

-3.221

[0.016]

-3.219

[0.007]

100

-5.221

[0.017]

-5.221

200

-3.829

[0.017]

-3.830

CJ q = 0.9

SD

mean

SD

mean

SD

-3.220

[0.021]

-3.220

[0.010]

-3.219

[0.011]

[0.007]

-5.221

[0.019]

-5.222

[0.011]

-5.219

[0.011]

[0.005]

-3.830

[0.023]

-3.831

[0.009]

-3.830

[0.011]

Case 1

Case 2 50

-1.655

[0.016]

-1.653

[0.005]

-1.653

[0.019]

-1.652

[0.007]

-1.653

[0.007]

100

-3.986

[0.014]

-3.987

[0.005]

-3.988

[0.018]

-3.987

[0.008]

-3.986

[0.007]

200

-1.654

[0.017]

-1.653

[0.004]

-1.656

[0.021]

-1.653

[0.007]

-1.653

[0.006]

Case 3 50

-1.996

[0.019]

-1.993

[0.029]

-1.999

[0.020]

-1.996

[0.012]

-1.996

[0.013]

100

-2.705

[0.016]

-2.704

[0.006]

-2.706

[0.021]

-2.704

[0.009]

-2.703

[0.012]

200

-2.582

[0.017]

-2.583

[0.006]

-2.583

[0.021]

-2.584

[0.009]

-2.582

[0.012]

50

-1.078

[0.016]

-1.080

[0.005]

-1.079

[0.020]

-1.080

[0.007]

-1.079

[0.007]

100

-1.522

[0.016]

-1.521

[0.005]

-1.522

[0.019]

-1.521

[0.006]

-1.521

[0.007]

200

-1.194

[0.016]

-1.191

[0.005]

-1.197

[0.017]

-1.190

[0.007]

-1.192

[0.006]

Case 4

Case 5 50

-6.330

[0.437]

-6.230

[0.394]

-6.091

[0.383]

-6.112

[0.368]

-6.178

[0.393]

100

-8.253

[0.409]

-8.155

[0.374]

-8.020

[0.370]

-8.044

[0.358]

-8.102

[0.373]

200

-8.405

[1.114]

-8.348

[1.118]

-8.137

[1.070]

-8.187

[1.085]

-8.243

[1.087]

50

-6.330

[0.437]

-6.230

[0.394]

-6.091

[0.383]

-6.112

[0.368]

-6.178

[0.393]

100

-8.253

[0.409]

-8.155

[0.374]

-8.020

[0.370]

-8.044

[0.358]

-8.102

[0.373]

200

-8.405

[1.114]

-8.348

[1.118]

-8.137

[1.070]

-8.187

[1.085]

-8.243

[1.087]

Case 6

Notes: The means and standard deviations in each row are calculated from 100 quasi-marginal likelihood estimates given a realization of data. See the notes for Table 3.

57

58

8

200

0.936

0.922

Opt

0.894

Opt

Diag

0.972

0.672

Opt

Diag

0.990

0.884

Opt

Diag

0.970

0.894

Opt

Diag

0.982

0.730

Opt

Diag

0.992

0.534

Opt

Diag

0.854

0.826

Opt

Diag

0.968

0.744

Opt

Diag

0.988

Diag

Matrix

qbT

0.974

0.024

0.990

0.194

0.988

0.706

0.900

0.046

0.972

0.196

0.956

0.526

0.214

0.046

0.708

0.110

0.810

0.266

Approx.

Laplace

0.982

0.624

0.992

0.866

0.990

0.998

0.912

0.534

0.976

0.762

0.964

0.928

0.272

0.334

0.772

0.506

0.884

0.754

τ = 0.5

0.980

0.642

0.992

0.878

0.990

0.996

0.908

0.524

0.976

0.772

0.964

0.924

0.268

0.334

0.774

0.512

0.886

0.756

τ = 0.9

Geweke

0.978

0.808

0.992

0.958

0.986

1.000

0.910

0.642

0.976

0.830

0.968

0.934

0.296

0.384

0.778

0.572

0.896

0.784

q = 0.5

0.976

0.776

0.992

0.948

0.988

1.000

0.910

0.630

0.974

0.818

0.966

0.934

0.282

0.376

0.776

0.556

0.892

0.770

q = 0.9

SWZ

0.892

0.700

0.896

0.908

0.930

0.956

0.866

0.614

0.938

0.792

0.928

0.896

0.404

0.402

0.784

0.538

0.858

0.764

CJ

0.962

0.528

0.984

0.620

0.930

0.732

0.760

0.422

0.934

0.552

0.898

0.678

0.120

0.246

0.522

0.336

0.636

0.408

τ = 0.5

0.962

0.570

0.984

0.654

0.932

0.752

0.758

0.418

0.930

0.548

0.896

0.682

0.116

0.256

0.502

0.336

0.628

0.398

τ = 0.9

Geweke

0.964

0.712

0.986

0.840

0.934

0.944

0.836

0.510

0.958

0.666

0.944

0.792

0.202

0.314

0.614

0.400

0.738

0.498

q = 0.5

0.964

0.702

0.986

0.834

0.936

0.928

0.816

0.496

0.954

0.640

0.934

0.760

0.186

0.298

0.586

0.362

0.716

0.444

q = 0.9

SWZ

on the RWMH Algorithm

on the RWMH Algorithm

0.880

0.656

0.890

0.766

0.892

0.856

0.760

0.492

0.912

0.618

0.890

0.716

0.288

0.340

0.588

0.366

0.690

0.450

CJ

1,000 Monte Carlo iterations.

for Geweke’s (1998) and Sims et al.’s (2008) methods. The numbers in the table are the actual probabilities of selecting Model A over Model B over

estimator, Sims, Waggoner and Zha’s (2008) estimator and the estimator of Chib and Jeliazkov (2001), respectively. τ and q are the tuning parameters

criterion function is smaller. “Laplace Approx.”, “Geweke”, “SWZ” and “CJ” refer to Laplace approximations, Geweke’s (1998) modified harmonic

matrix is set to the inverse of the bootstrap covariance matrix of impulse responses. qbT refers to the method that chooses the model whose estimation

diagonal and their diagonal elements are the reciprocals of the bootstrap variances of impulse responses. “Opt” refers to cases in which the weighting

Notes: T denotes the sample size and H denotes the maximum horizon for impulse responses. “Diag” refers to cases in which the weighting matrix is

4

4

100

200

2

100

2

8

50

200

4

50

8

2

50

100

H

T

Weight

Modified ML Based

Marginal Likelihood

Table 6: Model A is correctly specified and Model B is misspecified (all impulse responses are used)

59

2

4

8

2

4

8

2

4

8

50

50

50

100

100

100

200

200

200

0.998 0.494

0.036

Opt

0.768

0.210

0.002

Opt

0.998

0.788

0.988

0.464

0.994

0.752

0.990

0.836

0.998

0.556

0.984

0.666

0.996

0.852

0.996

Approx.

Laplace

Diag

0.034

0.004

Opt

Diag

0.000

0.062

Opt

Diag

0.056

0.016

Opt

Diag

0.010

0.006

Opt

Diag

0.000

0.394

Opt

Diag

0.080

0.068

Opt

Diag

0.008

0.032

Opt

Diag

0.000

Diag

Matrix

qbT

Notes: See the notes for Table 6.

H

T

Weight

0.362

0.730

0.588

0.814

0.638

0.830

0.314

0.672

0.592

0.726

0.680

0.752

0.544

0.548

0.490

0.576

0.640

0.662

τ = 0.5

0.360

0.728

0.580

0.820

0.634

0.842

0.310

0.668

0.582

0.736

0.680

0.774

0.542

0.546

0.506

0.582

0.640

0.686

τ = 0.9

0.348

0.660

0.582

0.766

0.646

0.790

0.292

0.596

0.578

0.658

0.666

0.712

0.456

0.452

0.484

0.522

0.632

0.594

q = 0.5

0.346

0.678

0.576

0.794

0.638

0.820

0.302

0.626

0.580

0.698

0.664

0.752

0.486

0.482

0.484

0.548

0.632

0.652

q = 0.9

SWZ

0.346

0.690

0.588

0.796

0.640

0.828

0.316

0.634

0.568

0.710

0.670

0.740

0.530

0.524

0.504

0.574

0.638

0.666

CJ

0.988

0.978

0.996

0.994

0.998

1.000

0.950

0.958

0.990

0.988

0.990

0.998

0.788

0.886

0.916

0.956

0.976

0.988

τ = 0.5

0.988

0.978

0.996

0.994

0.998

1.000

0.950

0.964

0.990

0.988

0.990

0.996

0.794

0.882

0.912

0.968

0.980

0.990

τ = 0.9

Geweke

0.988

0.926

0.994

0.982

0.996

0.996

0.946

0.890

0.990

0.966

0.988

0.998

0.650

0.768

0.866

0.896

0.966

0.974

q = 0.5

0.988

0.938

0.996

0.986

0.996

0.994

0.948

0.904

0.990

0.968

0.988

0.998

0.662

0.778

0.872

0.914

0.964

0.980

q = 0.9

SWZ

on the RWMH Algorithm

on the RWMH Algorithm Geweke

Modified ML Based

Marginal Likelihood

Table 7: Model A is more parsimonious than Model B (all impulse responses are used)

0.982

0.926

0.992

0.976

0.986

0.996

0.950

0.886

0.980

0.968

0.982

0.998

0.680

0.772

0.896

0.904

0.952

0.978

CJ

60

2

4

8

2

4

8

2

4

8

50

50

50

100

100

100

200

200

200

0.804 1.000

0.982

Opt

0.998

0.992

0.960

Opt

0.920

0.996

0.956

0.982

0.576

1.000

0.834

1.000

0.908

0.592

0.266

0.924

0.602

0.974

0.814

Approx.

Laplace

Diag

1.000

0.964

Opt

Diag

1.000

0.920

Opt

Diag

0.962

0.962

Opt

Diag

0.990

0.968

Opt

Diag

0.994

0.758

Opt

Diag

0.832

0.932

Opt

Diag

0.940

0.978

Opt

Diag

0.984

Diag

Matrix

qbT

Notes: See the notes for Table 6.

H

T

Weight

0.998

0.958

0.992

0.992

0.974

0.998

0.986

0.824

0.998

0.946

0.988

0.992

0.648

0.476

0.946

0.782

0.982

0.946

τ = 0.5

1.000

0.958

1.000

0.992

0.998

0.998

0.986

0.826

1.000

0.948

1.000

0.992

0.650

0.482

0.944

0.782

0.982

0.946

τ = 0.9

0.988

0.968

0.948

0.992

0.836

0.998

0.988

0.854

0.994

0.960

0.928

0.994

0.680

0.516

0.946

0.816

0.972

0.958

q = 0.5

0.994

0.966

0.984

0.992

0.938

0.998

0.990

0.838

0.998

0.958

0.978

0.994

0.672

0.512

0.948

0.798

0.982

0.952

q = 0.9

SWZ

0.928

0.936

0.926

0.960

0.942

0.974

0.924

0.826

0.946

0.946

0.948

0.968

0.710

0.530

0.898

0.804

0.962

0.928

CJ

0.996

0.876

0.990

0.966

0.972

0.928

0.958

0.716

0.998

0.864

0.988

0.902

0.466

0.386

0.842

0.628

0.926

0.784

τ = 0.5

0.998

0.884

0.998

0.962

0.996

0.924

0.962

0.730

1.000

0.862

1.000

0.906

0.466

0.384

0.838

0.632

0.930

0.782

τ = 0.9

Geweke

0.988

0.932

0.946

0.972

0.836

0.948

0.974

0.770

0.994

0.896

0.928

0.940

0.566

0.422

0.898

0.676

0.924

0.828

q = 0.5

0.992

0.930

0.982

0.968

0.938

0.940

0.978

0.762

0.998

0.888

0.978

0.922

0.544

0.418

0.892

0.650

0.934

0.806

q = 0.9

SWZ

on the RWMH Algorithm

on the RWMH Algorithm Geweke

Modified ML Based

Marginal Likelihood

0.926

0.892

0.926

0.942

0.942

0.916

0.914

0.742

0.946

0.878

0.950

0.902

0.584

0.428

0.834

0.654

0.914

0.784

CJ

Table 8: Model A is correctly specified and Model B is misspecified (only impulse responses to the technology shock are used)

61

2

4

8

2

4

8

2

4

8

50

50

50

100

100

100

200

200

200

1.000 0.994

0.004

Opt

1.000

0.006

0.002

Opt

1.000

0.998

1.000

0.988

1.000

0.988

1.000

0.998

1.000

0.866

0.982

0.984

0.996

0.984

0.998

Approx.

Laplace

Diag

0.000

0.000

Opt

Diag

0.000

0.018

Opt

Diag

0.048

0.000

Opt

Diag

0.006

0.002

Opt

Diag

0.000

0.100

Opt

Diag

0.130

0.008

Opt

Diag

0.040

0.004

Opt

Diag

0.016

Diag

Matrix

qbT

Notes: See the notes for Table 6.

H

T

Weight

0.912

0.802

0.900

0.866

0.938

0.912

0.880

0.688

0.888

0.828

0.922

0.866

0.780

0.576

0.846

0.706

0.836

0.754

τ = 0.5

0.924

0.802

0.916

0.852

0.946

0.898

0.878

0.716

0.882

0.820

0.914

0.868

0.776

0.582

0.846

0.706

0.848

0.758

τ = 0.9

0.918

0.768

0.898

0.854

0.930

0.890

0.860

0.666

0.874

0.788

0.912

0.814

0.734

0.524

0.826

0.644

0.818

0.706

q = 0.5

0.920

0.768

0.908

0.856

0.946

0.894

0.888

0.696

0.884

0.810

0.916

0.854

0.752

0.564

0.842

0.684

0.834

0.736

q = 0.9

SWZ

0.916

0.792

0.904

0.876

0.938

0.898

0.870

0.704

0.884

0.818

0.922

0.862

0.732

0.568

0.836

0.682

0.834

0.726

CJ

1.000

1.000

1.000

1.000

1.000

1.000

0.998

0.984

0.998

0.990

1.000

1.000

0.948

0.874

0.990

0.980

0.994

0.998

τ = 0.5

1.000

1.000

1.000

1.000

1.000

1.000

0.998

0.986

0.998

0.992

1.000

1.000

0.960

0.862

0.990

0.978

0.994

0.998

τ = 0.9

Geweke

1.000

0.996

1.000

0.996

1.000

1.000

0.998

0.964

0.998

0.988

1.000

1.000

0.924

0.788

0.986

0.966

0.992

0.996

q = 0.5

1.000

0.996

1.000

0.998

1.000

1.000

0.998

0.964

0.998

0.988

1.000

1.000

0.932

0.792

0.986

0.968

0.992

0.996

q = 0.9

SWZ

on the RWMH Algorithm

on the RWMH Algorithm Geweke

Modified ML Based

Marginal Likelihood

1.000

0.998

1.000

0.994

1.000

1.000

0.988

0.976

1.000

0.986

1.000

1.000

0.900

0.812

0.982

0.962

0.988

0.998

CJ

Table 9: Model A is more parsimonious than Model B (only impulse responses to the technology shock are used)

Table 10: The mean and standard deviation of the quasi-marginal likelihood estimates Geweke τ = 0.5

SWZ τ = 0.9

q = 0.5

T

H

mean

SD

mean

SD

50

2

-50.209

[ 0.532]

-50.212

[ 0.518]

4

-35.448

[ 0.472]

-35.420

8

-123.333

[ 5.378]

2

-43.147

[ 0.586]

4

-31.198

8

-53.739

2

mean

CJ q = 0.9

SD

mean

SD

mean

SD

-49.630

[ 0.466]

-49.976

[ 0.494]

-49.964

[ 0.473]

[ 0.453]

-34.957

[ 0.436]

-35.269

[ 0.428]

-35.277

[ 0.429]

-124.276

[ 5.204]

-119.427

[ 5.515]

-123.716

[ 5.109]

-121.716

[ 4.897]

-43.105

[ 0.588]

-42.623

[ 0.513]

-42.957

[ 0.565]

-42.957

[ 0.508]

[ 0.504]

-31.226

[ 0.511]

-30.676

[ 0.406]

-31.091

[ 0.457]

-31.005

[ 0.410]

[ 0.350]

-53.742

[ 0.320]

-53.272

[ 0.298]

-53.622

[ 0.279]

-53.597

[ 0.298]

-31.851

[ 0.532]

-31.843

[ 0.519]

-31.389

[ 0.473]

-31.722

[ 0.489]

-31.708

[ 0.471]

4

-29.194

[ 0.643]

-29.200

[ 0.703]

-28.663

[ 0.450]

-29.058

[ 0.628]

-28.986

[ 0.459]

8

-56.144

[ 0.437]

-56.177

[ 0.455]

-55.639

[ 0.323]

-56.041

[ 0.396]

-55.954

[ 0.323]

2

-32.385

[ 0.216]

-32.424

[ 0.203]

-32.037

[ 0.165]

-32.354

[ 0.152]

-32.319

[ 0.150]

4

-63.123

[ 0.615]

-62.935

[ 0.562]

-62.691

[ 0.578]

-62.836

[ 0.543]

-62.876

[ 0.564]

8

-132.146

[ 1.019]

-132.178

[ 1.181]

-134.631

[ 0.510]

-134.604

[ 0.464]

-133.290

[ 1.383]

2

-83.467

[ 0.722]

-83.478

[ 0.743]

-82.953

[ 0.573]

-83.315

[ 0.700]

-83.263

[ 0.619]

4

-30.451

[ 0.561]

-30.478

[ 0.556]

-29.998

[ 0.393]

-30.360

[ 0.501]

-30.286

[ 0.436]

8

-55.581

[ 0.308]

-55.661

[ 0.275]

-55.197

[ 0.227]

-55.567

[ 0.218]

-55.489

[ 0.222]

Case 1

100

200

Case 2 50

100

200

2

-26.058

[ 0.508]

-26.063

[ 0.495]

-25.621

[ 0.425]

-25.933

[ 0.464]

-25.904

[ 0.442]

4

-27.859

[ 0.494]

-27.890

[ 0.487]

-27.451

[ 0.352]

-27.787

[ 0.434]

-27.725

[ 0.378]

8

-38.355

[ 0.523]

-38.300

[ 0.506]

-37.895

[ 0.414]

-38.151

[ 0.471]

-38.174

[ 0.440]

2

-19.206

[ 0.491]

-19.173

[ 0.496]

-18.644

[ 0.441]

-18.943

[ 0.452]

-18.983

[ 0.442]

4

-21.635

[ 0.586]

-21.656

[ 0.617]

-21.047

[ 0.501]

-21.451

[ 0.552]

-21.383

[ 0.500]

8

-25.976

[ 0.892]

-25.974

[ 0.928]

-25.551

[ 1.523]

-25.779

[ 1.334]

-25.644

[ 0.900]

2

-24.029

[ 0.775]

-23.970

[ 0.680]

-23.494

[ 0.875]

-23.784

[ 0.675]

-23.860

[ 0.720]

4

-22.572

[ 0.509]

-22.538

[ 0.487]

-22.056

[ 0.451]

-22.361

[ 0.443]

-22.377

[ 0.448]

8

-24.897

[ 0.525]

-24.853

[ 0.508]

-24.369

[ 0.466]

-24.669

[ 0.477]

-24.690

[ 0.463]

2

-21.409

[ 0.498]

-21.401

[ 0.487]

-20.926

[ 0.465]

-21.196

[ 0.444]

-21.235

[ 0.470]

4

-23.663

[ 0.450]

-23.628

[ 0.451]

-23.145

[ 0.414]

-23.432

[ 0.411]

-23.454

[ 0.399]

Case 3 50

100

200

-28.269 [ 0.538] -28.241 [ 0.518] -27.744 [ 0.491] -28.036 [ 0.489] -28.059 [ 0.493] 8 Notes: The means and standard deviations (SD) in each row are calculated from 100 quasi-marginal likelihood estimates given a realization of data. See the notes to table 6.

62

Table 11: The mean and standard deviation of the quasi-marginal likelihood estimates Geweke τ = 0.5

SWZ τ = 0.9

q = 0.5

T

H

mean

SD

mean

SD

50

2

-17.798

[ 0.619]

-17.776

[ 0.577]

4

-26.497

[ 0.663]

-26.541

8

-26.345

[ 0.759]

2

-16.447

[ 0.587]

4

-17.181

8

-23.821

2

mean

CJ q = 0.9

SD

mean

SD

mean

SD

-17.393

[ 0.541]

-17.608

[ 0.550]

-17.635

[ 0.560]

[ 0.668]

-25.987

[ 0.495]

-26.379

[ 0.629]

-26.290

[ 0.564]

-26.409

[ 0.848]

-25.828

[ 0.616]

-26.215

[ 0.779]

-26.133

[ 0.662]

-16.438

[ 0.530]

-15.994

[ 0.507]

-16.276

[ 0.499]

-16.265

[ 0.519]

[ 0.579]

-17.168

[ 0.565]

-16.700

[ 0.506]

-17.004

[ 0.530]

-16.986

[ 0.508]

[ 0.519]

-23.810

[ 0.495]

-23.392

[ 0.428]

-23.655

[ 0.460]

-23.645

[ 0.454]

-16.835

[ 0.589]

-16.826

[ 0.572]

-16.414

[ 0.524]

-16.670

[ 0.553]

-16.673

[ 0.541]

4

-24.165

[ 0.490]

-24.135

[ 0.497]

-23.663

[ 0.422]

-23.975

[ 0.484]

-23.963

[ 0.438]

8

-22.412

[ 0.668]

-22.371

[ 0.648]

-21.946

[ 0.519]

-22.206

[ 0.604]

-22.226

[ 0.578]

2

-9.559

[ 0.565]

-9.433

[ 0.571]

-8.845

[ 0.547]

-9.118

[ 0.541]

-9.187

[ 0.530]

4

-33.037

[ 0.842]

-32.988

[ 0.834]

-32.372

[ 0.607]

-32.741

[ 0.744]

-32.781

[ 0.746]

8

-37.735

[ 0.664]

-37.657

[ 0.653]

-37.065

[ 0.584]

-37.338

[ 0.592]

-37.394

[ 0.597]

2

-19.411

[ 0.593]

-19.288

[ 0.594]

-18.758

[ 0.549]

-18.985

[ 0.562]

-19.085

[ 0.550]

4

-13.629

[ 0.636]

-13.480

[ 0.595]

-12.955

[ 0.613]

-13.186

[ 0.573]

-13.301

[ 0.601]

8

-19.983

[ 0.909]

-19.895

[ 0.912]

-19.305

[ 0.774]

-19.625

[ 0.873]

-19.688

[ 0.854]

Case 4

100

200

Case 5 50

100

200

2

-13.299

[ 0.599]

-13.138

[ 0.569]

-12.834

[ 0.563]

-12.986

[ 0.557]

-13.124

[ 0.562]

4

-14.206

[ 0.652]

-14.077

[ 0.621]

-13.686

[ 0.612]

-13.894

[ 0.600]

-14.001

[ 0.604]

8

-20.174

[ 0.663]

-20.013

[ 0.626]

-19.691

[ 0.597]

-19.844

[ 0.596]

-19.980

[ 0.613]

2

-8.566

[ 0.686]

-8.424

[ 0.649]

-7.819

[ 0.521]

-8.128

[ 0.601]

-8.188

[ 0.543]

4

-12.905

[ 0.691]

-12.807

[ 0.677]

-12.243

[ 0.657]

-12.490

[ 0.644]

-12.569

[ 0.635]

8

-47.165

[ 0.693]

-47.095

[ 0.722]

-46.521

[ 0.633]

-46.785

[ 0.656]

-46.846

[ 0.645]

2

-14.806

[ 0.960]

-14.656

[ 0.948]

-14.320

[ 1.205]

-14.408

[ 1.090]

-14.539

[ 0.937]

4

-12.974

[ 0.577]

-12.735

[ 0.575]

-12.247

[ 0.524]

-12.455

[ 0.552]

-12.573

[ 0.527]

8

-17.987

[ 0.634]

-17.785

[ 0.609]

-17.227

[ 0.546]

-17.483

[ 0.558]

-17.588

[ 0.549]

2

-19.049

[ 0.668]

-18.841

[ 0.640]

-18.357

[ 0.582]

-18.565

[ 0.611]

-18.695

[ 0.615]

4

-14.306

[ 0.697]

-14.101

[ 0.667]

-13.600

[ 0.597]

-13.839

[ 0.613]

-13.939

[ 0.635]

Case 6 50

100

200

-22.451 [ 0.598] -22.326 [ 0.618] -21.641 [ 0.522] -21.960 [ 0.504] -22.030 [ 0.488] 8 Notes: The means and standard deviations (SD) in each row are calculated from 100 quasi-marginal likelihood estimates given a realization of data. See the notes to table 6.

63

Table 12: Prior and posteriors of parameters of hybrid NKPCs

Prior

Quasi-posterior Great Inflation

Parameter

Dist

Mean

Std

Post Great Inflation

Mean

[5%, 95%]

Mean

[5%, 95%]

(a) ROT specification (Gal´ı and Gertler, 1999) Trend inflation

π

Norm

3.50

1.50

5.80

[5.11, 6.49]

2.32

[1.94, 2.70]

Price stickiness

ξp

Beta

0.50

0.10

0.69

[0.61, 0.77]

0.87

[0.82, 0.90]

ROT fraction

ω

Beta

0.50

0.10

0.49

[0.45, 0.65]

0.51

[0.37, 0.67]

Backward-looking

γb,A

-

0.50

-

0.44

[0.38, 0.50]

0.37

[0.29, 0.44]

Forward-looking

γf,A

-

0.50

-

0.55

[0.49, 0.61]

0.62

[0.55, 0.70]

Slope of NKPC

κA

-

0.14

-

0.034

[0.017, 0.054]

0.006

[0.003, 0.009]

(b) Partial indexation specification (Smets and Wouters ,2003, 2007) Trend inflation

π

Norm

3.50

1.50

5.77

[5.08, 6.45]

2.34

[1.96, 2.71]

Price stickiness

ξp

Beta

0.50

0.10

0.79

[0.74, 0.84]

0.91

[0.88, 0.93]

Price indexation

ιp

Beta

0.50

0.10

0.62

[0.51, 0.73]

0.47

[0.33, 0.61]

Backward-looking

γb,B

-

0.50

-

0.38

[0.34, 0.42]

0.31

[0.25, 0.38]

Forward-looking

γf,B

-

0.50

-

0.61

[0.57, 0.65]

0.67

[0.61, 0.74]

Slope of NKPC

κB

-

0.40

-

0.035

[0.019, 0.053]

0.006

[0.003, 0.010]

Note: Quasi-posterior distribution is evaluated using the random walk Metropolis-Hastings algorithm.

64

Table 13: Quasi-marginal likelihood of hybrid NKPCs

Laplace approx.

Geweke

SWZ

CJ

τ = 0.5 τ = 0.9 q = 0.5 q = 0.9

Great Inflation Period (a) ROT specification

-11.4

-11.1

-11.1

-8.24

-9.01

-13.0

(b) Partial indexation specification

-15.6

-15.3

-15.3

-11.8

-13.0

-17.2

(a) ROT specification

-22.3

-20.9

-20.9

-17.6

-18.9

-23.9

(b) Partial indexation specification

-25.5

-24.2

-24.2

-20.8

-22.1

-27.1

Post Great Inflation Period

Note: “Laplace Approx.”, “Geweke”, “SWZ” and “CJ” refer to Laplace approximations, Geweke’s (1998) modified harmonic estimator, Sims, Waggoner and Zha’s (2008) estimator and the estimator of Chib and Jeliazkov (2001), respectively.

65

Table 14: Prior and posteriors of parameters of the baseline CEE model

Prior Parameter

Quasi-posterior

Dist.

Mean

Std

Mean

[5%, 95%]

Price-setting rule Price stickiness

ξp

Beta

0.50

0.15

0.66

[0.60, 0.72]

Price indexation

ιp

Beta

0.50

0.15

0.49

[0.32, 0.72]

Wage stickiness

ξw

Beta

0.50

0.15

0.85

[0.83, 0.87]

Wage indexation

ιw

Beta

0.50

0.15

0.30

[0.11, 0.46]

Interest smoothing

ρR

Beta

0.70

0.15

0.89

[0.88, 0.91]

Inflation coefficient



Gamma

1.70

0.15

1.51

[1.37, 1.65]

GDP coefficient

ry

Gamma

0.10

0.05

0.15

[0.10, 0.19]

Consumption habit

b

Beta

0.50

0.15

0.75

[0.73, 0.78]

Inverse labor supply elast

φ

Gamma

1.00

0.50

0.14

[0.04, 0.25]

Capital share

α

Beta

0.25

0.05

0.25

[0.22, 0.28]

Cap util adjustment cost

σa

Gamma

0.50

0.30

0.32

[0.23, 0.46]

Investment adjustment cost S 00

Gamma

8.00

2.00

10.4

[8.30, 12.9]

Monetary policy rule

Preference and technology

Shocks Autocorr invest tech

ρΨ

Beta

0.75

0.15

0.55

[0.42, 0.61]

Std dev neutral tech shock

σZ

InvGamma

0.20

0.10

0.23

[0.21, 0.26]

Std dev invest tech shock

σΨ

InvGamma

0.20

0.10

0.17

[0.15, 0.20]

Std dev monetary shock

σR

InvGamma

0.40

0.20

0.48

[0.43, 0.54]

Note: Quasi-posterior distribution is evaluated using the random walk MetropolisHastings algorithm.

66

Table 15: Empirical importance of the nominal and real frictions

Base

Nominal frictions

Real frictions

ξp =0.1 ξw =0.1 ιp =0.01 ιw =0.01

S 00 =2 b=0.1 σa =0.1

Quasi-marginal likelihood Laplace

370

341

146

369

373

327

279

366

Geweke

366

340

143

368

371

326

276

364

Quasi-posterior mean ξp

0.66

0.10

0.95

0.68

0.67

0.74

0.68

0.66

ιp

0.49

0.53

0.69

0.01

0.51

0.48

0.52

0.52

ξw

0.85

0.88

0.10

0.85

0.87

0.80

0.86

0.85

ιw

0.30

0.32

0.53

0.34

0.01

0.43

0.37

0.29

S 00

10.4

10.3

2.74

9.37

9.23

2.00

8.07

9.81

b

0.75

0.74

0.53

0.76

0.75

0.69

0.10

0.75

σa

0.32

0.44

0.62

0.35

0.32

0.39

0.26

0.10

SW

-923

-975

-973

-918

-927

-1084

-959

-949

Note: QMLs based on the Laplace approximation (Laplace approx.) and the modified harmonic mean estimator of Geweke. SW denotes marginal likelihood estimates from Smets and Wouters (2007).

67

Figure 1: Posterior distribution of hybrid NKPCs (a) ROT specification ξp

π 2

20

1.5

15

1

10

0.5

5

0

0

2

4

6

0

8

0

0.2

0.4

0.6

0.8

1

0.06

0.08

0.1

κA

ω 8

300

6

200

4 100

2 0

0

0.2

0.4

0.6

0.8

0

1

0

0.02

0.04

γb,A

γf,A

15

15

10

10

5

5

0

0

0.2

0.4

0.6

Great Inflation: Post-Great Inflation

0 0.2

0.8

0.4

0.6

0.8

1

(b) Partial indexation specification ξp

π 2

30

1.5

20

1 10

0.5 0

0

2

4

6

0

8

0

0.2

0.4

ιp 300

4

200

2

100

0

0.2

0.4

0.6

0.8

0

1

0

0.02

γb,B 20

15

15

10

10

5

5 0.2

0.3

1

0.04

0.06

0.08

0.1

γf,B

20

0 0.1

0.8

κB

6

0

0.6

0.4

0 0.5

0.5

68

Great Inflation: Post-Great Inflation

0.6

0.7

0.8

0.9

Figure 2: Impulse responses to a monetary shock Policy Shock Medium−Sized Model: Impulse Responses to a policy Monetary VAR Mean

VAR 95% Real GDP

Estimated DSGE model

Inflation

0.4

0.2

0.2

0.1

Federal funds rate 0.2 0

0 −0.2 0

5

10

−0.2

0

−0.4

−0.1

−0.6

0

Real consumption

5

10

0

Real investment

5

10

Capacity utilization 1

1

0.2

0.5

0.1

0.5

0 0 −0.1 0

5

10

0

Rel. price of investment

5

10

0.15

0.2

0.1

0.1

0.05

0

0

−0.1 5

10

0

Hours worked per capita 0.3

0.2

0

0

−0.5

0

5

10

Real wage 0.05 0 −0.05 −0.1

5

10

69

−0.15 0

5

10

3: Impulse responses to a neutral Medium−SizedFigure Model: Impulse Responses to atechnology Neutral shock Technology Shock VAR Mean

VAR 95% Real GDP

Estimated DSGE model

Inflation

Federal funds rate

0 0.6 −0.4

0.2

−0.2 −0.6

0 0

0

−0.2

0.4

5

10

−0.8 0

Real consumption

5

10

Real investment

1 0.4

5

10

Capacity utilization 0.5

1.5

0.6

−0.4 0

0

0.5 −0.5

0

0.2

−0.5 0

5

10

0

Rel. price of investment 0 −0.1 −0.2 −0.3 0

5

10

5

10

0

Hours worked per capita 0.4

0.3

0.3

0.2

0.2

0.1

0.1

0

0 5

10

70

10

Real wage

0.4

0

5

0

5

10

4: Impulse Impulse responses to an investment specific technology shock Medium−Sized Figure Model: Responses to an Investment Specific Technology Shock VAR Mean

VAR 95% Real GDP

Estimated DSGE model

Inflation

Federal funds rate 0.4

0.6 0

0.2

0.4 −0.2

0

0.2 −0.4

0 0

5

10

0

Real consumption

−0.2 5

10

0

Real investment

5

10

Capacity utilization

0.6 1

1 0.4

0.5

0.5

0

0.2

0

−0.5 0

5

10

−1 0

Rel. price of investment

5

10

Hours worked per capita 0.4

5

10

Real wage 0.2

−0.2

0.1

0.2

0

−0.4

−0.1

0 −0.6 0

0

−0.2 5

10

0

5

10

71

0

5

10

Quasi-Bayesian Model Selection

We also thank the seminar and conference participants for helpful comments at ... tification in a more general framework and call it a Laplace-type estimator ...... DSGE model in such a way that CB is lower triangular so that the recursive identifi-.

625KB Sizes 4 Downloads 320 Views

Recommend Documents

Kin Selection, Multi-Level Selection, and Model Selection
In particular, it can appear to vindicate the kinds of fallacious inferences ..... comparison between GKST and WKST can be seen as a statistical inference problem ...

Quantum Model Selection
Feb 14, 2011 - Quantum Model Selection. Examples. 1. Non-equiliblium states in QFT ωθ = ∫. B ρβ,µ dνθ(β, µ), where β > 0 is the inverse temparature, µ ∈ R another parameter such as the chemical potential. 2. Reducible representations i

ACTIVE MODEL SELECTION FOR GRAPH ... - Semantic Scholar
Experimental results on four real-world datasets are provided to demonstrate the ... data mining, one often faces a lack of sufficient labeled data, since labeling often requires ..... This work is supported by the project (60675009) of the National.

Quasi-Bayesian Model Selection
the FRB Philadelphia/NBER Workshop on Methods and Applications for DSGE Models. Shintani gratefully acknowledges the financial support of Grant-in-aid for Scientific Research. †Department of Economics, Vanderbilt University, 2301 Vanderbilt Place,

Model Selection for Support Vector Machines
New functionals for parameter (model) selection of Support Vector Ma- chines are introduced ... tionals, one can both predict the best choice of parameters of the model and the relative quality of ..... Computer Science, Vol. 1327. [6] V. Vapnik.

A Theory of Model Selection in Reinforcement Learning - Deep Blue
seminar course is my favorite ever, for introducing me into statistical learning the- ory and ..... 6.7.2 Connections to online learning and bandit literature . . . . 127 ...... be to obtain computational savings (at the expense of acting suboptimall

MUX: Algorithm Selection for Software Model Checkers - Microsoft
model checking and bounded or symbolic model checking of soft- ware. ... bines static analysis and testing, whereas Corral performs bounded goal-directed ...

Anatomically Informed Bayesian Model Selection for fMRI Group Data ...
A new approach for fMRI group data analysis is introduced .... j )∈R×R+ p(Y |ηj,σ2 j. )π(ηj,σ2 j. )d(ηj,σ2 j. ) is the marginal likelihood in the model where region j ...

Model Selection Criterion for Instrumental Variable ...
Graduate School of Economics, 2-1 Rokkodai-cho, Nada-ku, Kobe, .... P(h)ˆµ(h) can be interpreted as the best approximation of P(h)y in terms of the sample L2 norm ... Hence, there is a usual trade-off between the bias and the ..... to (4.8) depends

MUX: Algorithm Selection for Software Model Checkers - Microsoft
mation, and have been hugely successful in practice (e.g., [45, 6]). Permission to ..... training the machine learning algorithms and the validation data V S is used for .... plus validation) and the remaining 2920 pairs were used in the online.

Mutual selection model for weighted networks
Oct 28, 2005 - in understanding network systems. Traffic amount ... transport infrastructure is fundamental for a full description of these .... work topology and the microdynamics. Due to the .... administrative organization of these systems, which

Inference complexity as a model-selection criterion for ...
I n Pacific Rim International. Conference on ArtificialIntelligence, pages 399 -4 1 0, 1 998 . [1 4] I rina R ish, M ark Brodie, Haiqin Wang, and ( heng M a. I ntelligent prob- ing: a cost-efficient approach to fault diagnosis in computer networks. S

Dynamic Model Selection for Hierarchical Deep ... - Research at Google
Figure 2: An illustration of the equivalence between single layers ... assignments as Bernoulli random variables and draw a dif- ..... lowed by 50% Dropout.

A Theory of Model Selection in Reinforcement Learning
4.1 Comparison of off-policy evaluation methods on Mountain Car . . . . . 72 ..... The base of log is e in this thesis unless specified otherwise. To verify,. γH Rmax.

Bootstrap model selection for possibly dependent ...
Abstract This paper proposes the use of the bootstrap in penalized model selection ... tion for dependent heterogeneous data sets using bootstrap penalties.

ACTIVE MODEL SELECTION FOR GRAPH-BASED ...
In many practical applications of pattern classification and data mining, one often faces a lack of sufficient labeled data, since labeling often requires expensive ...

Selection Sort
for i = 0 to n - 2 min = i for j = i + 1 to n - 1 if array[j] < array[min] min = j; if min != i swap array[min] and array[i]. Page 10. What's the best case runtime of selection ...

Optimal Monetary Policy and Model Selection in a Real ...
new Keynesian business cycle framework. Our choice of .... and that in our test it provides superior power in detecting the mis-specification. Table 1 reports the ...

Evolution of Norms in a Multi-Level Selection Model of ...
help to a bad individual leads to a good reputation, whereas refusing help to a good individual or helping a bad one leads to a bad reputation. ... Complex Systems. PACS (2006): 87.23.n, 87.23.Ge, 87.23.Kg, 87.10.+e, 89.75.Fb. 1. Introduction. Natura

Evolution of norms in a multi-level selection model of ...
context of indirect reciprocity, whereas at the higher level of selection conflict .... tribes also engage in pairwise conflicts (higher level of selection, level 2 in .... with respect to the equatorial plane (not taking the inner layer into account

A Novel Model of Working Set Selection for SMO ...
the inner product of two vectors, so K is even not positive semi-definite. Then Kii + Kjj − 2Kij < 0 may occur . For this reason, Chen et al. [8], propose a new working set selection named WSS 3. WSS 3. Step 1: Select ats ≡. { ats if ats > 0 τ o