
OXFORD UNIVERSITY PRESS LTD JOURNAL 00 (0000), 1–31 doi:10.1093/OUP Journal/XXX000

Identifying Speculative Bubbles using an Infinite Hidden Markov Model

Shuping Shi (a), Yong Song (b)

(a) Department of Economics, Macquarie University; Email: [email protected].
(b) Department of Economics, the University of Melbourne and Rimini Centre for Economic Analysis, Italy; Email: [email protected].

Received July 2014



Acknowledgement We thank the co-editor George Tauchen, an anonymous associate editor, and two anonymous referees for their helpful comments and suggestions. We are grateful to seminar participants at Carleton University, Canada, the University of Technology, Sydney, the University of Sydney, and the University of Melbourne as well as participants in the 2013 Econometric Society Australasian Meeting for their comments and suggestions.


ABSTRACT

This paper proposes an infinite hidden Markov model (iHMM) to detect, date stamp, and estimate speculative bubbles. Three features make this new approach attractive to practitioners. First, the iHMM is capable of capturing the complex nonlinear dynamics of bubble behaviors because it allows for an infinite number of regimes. Second, implementing this procedure is straightforward because bubbles are detected, dated, and estimated simultaneously in a coherent Bayesian framework. Third, because the iHMM assumes hierarchical structures, it is parsimonious and superior in out-of-sample forecasts. This model and extensions of this model are applied to the NASDAQ stock market. The in-sample posterior analysis and out-of-sample predictions find evidence of explosive dynamics during the dot-com bubble period. A model comparison shows that the iHMM is strongly supported by the data compared with finite hidden Markov models.

Keywords: Speculative bubbles, Infinite hidden Markov model, Dirichlet process

JEL classification: C11, C15

1 Introduction

Speculative bubbles, which are generally recognized as the seeds of economic and financial instability, have been the focus of considerable attention over the past several decades. The literature on the econometric detection of bubbles is overviewed in a recent survey by Gurkaynak (2008). Speculative bubbles are defined as periods characterized by asset pricing that deviates from market fundamentals. However, market fundamentals are generally difficult to pin down with certainty or consistency. The standard present-value model, which has often been used to calculate market fundamentals, has been criticized for its over-simplification (for instance, Driffill and Sola (1998) and Cochrane (2011)). Therefore, rather than relying on a definition of a bubble, one strand of research identifies and date stamps bubbles based on their characteristics, i.e., they are explosive (Diba and Grossman, 1988) and subject to periodic collapse (Blanchard, 1979). Such research includes the Markov-switching Augmented Dickey-Fuller (MSADF) test developed by Hall et al. (1999), the recursive right-tailed unit root test by Phillips et al. (2011), Phillips et al. (2013), and Phillips et al. (forthcoming), and the CUSUM test by Homm and Breitung (2012). The idea underpinning this strand of research is to search for explosive bubble behavior within the sample period that is consistent with the characteristics of a periodically collapsing bubble. As evidenced by the volume of empirical applications, these methods have become popular because they can date the turning points of bubbles. Nevertheless, these studies are not designed for estimating bubble dynamics and thus cannot be used to forecast bubble survival probability.

This paper proposes using an infinite hidden Markov model (iHMM) to unify the identification, dating and estimation of bubble dynamics within a coherent framework. The iHMM's specification embeds both the explosive and nonlinear characteristics of bubbles. The dynamic within each regime is assumed to be in the form of an ADF equation, as it is in the MSADF test and the recursive right-tailed unit root test. Explosiveness is measured by a temperature coefficient (defined below).

The iHMM reconciles the existing bubble modeling framework by allowing for a potentially infinite number of regimes. Regime-switching is a salient feature of bubble dynamics and has been widely accepted in the literature; see, for example, the two-regime (normal and bubble regimes) MSADF test and the three-regime (deterministic expansion, fast expansion and collapse) bubble test of Brooks and Katsaris (2005). However, the specification for the number of regimes in these bubble tests is subjective and likely to be inaccurate. In this paper, the number of regimes is treated as an unknown parameter and estimated simultaneously with other model parameters, which is consistent with Song (forthcoming). This strategy provides the model with sufficient flexibility to capture the complex nonlinear dynamics of bubbles.

The iHMM is estimated using the Markov Chain Monte Carlo (MCMC) method. We impose hierarchical structures on both the transition matrix and the parameters that characterize each regime, following Song (forthcoming). The hierarchical structures allow us to draw inferences for the priors of those parameters, which provides a systematic way to explore prior sensitivity. Moreover, these structures are able to exploit more information from the data and are shown to improve forecasting performance (Pesaran et al. (2006), among many others). In practice, they may also bring computational benefits. Because regime switching in a Gibbs sampler may be saddled with misleading priors, the hierarchical structures allow us to learn reasonable priors from the data and to facilitate the mixing of the Markov chain.

The identification and dating algorithms are built on the posterior distribution of the temperature coefficient. Unlike the MSADF test, the algorithm based on the iHMM is much less computationally demanding, and the Bayesian methodology allows inferences to be drawn from a small sample size. Importantly, the iHMM can provide us with information regarding the magnitude of the bubble's explosiveness as well as the overall dynamic of the asset price by means of the posterior distribution of the temperature coefficient.

In addition, we consider four non-trivial extensions of the iHMM. The first extension estimates the hyper parameters of the transition matrix, instead of treating them as constant, as a robustness check. In the second extension, we allow the regime persistence parameter in the transition matrix to be different for the bubble and normal regimes. The third extension proposes a mixture hierarchical prior for the temperature parameter. The last extension considers GARCH volatility dynamics in each regime.

Models are compared using the marginal likelihood (Kass and Raftery, 1995), the predictive likelihood (Geweke and Amisano, 2010) and the long-run predictive likelihood (Song, forthcoming). These criteria are based on predictions and act as Ockham's razor by automatically punishing over-parameterization.

We apply the iHMM model, along with its four extensions, three finite K-regime Markov switching (MS-K) models (with K = 2, 3, 4), and a Bayesian model average approach (over MS-K with K = 2, 3, 4) to the NASDAQ stock market. We find that the iHMM model fits the data best. The ex-post identification of the iHMM reveals the existence of three bubble episodes, namely the 1983 stock market boom, the dot-com bubble and the recovery phase of the subprime mortgage crisis in 2009. Real-time forecasting provides a warning signal regarding the existence of a bubble for the periods before Black Monday in 1987 and before the dot-com bubble crash in 1999.

The remainder of this paper is organized as follows. Section 2 introduces the infinite hidden Markov model and various extensions. Section 3 sketches the estimation method, the bubble dating algorithm and the model comparison criteria. The model is applied in Section 4. Section 5 concludes the paper. The detailed MCMC algorithms for posterior sampling are collected in the appendices to this study.

2 Infinite Hidden Markov Model

The basic iHMM is expressed as

\[
\Pr(s_t = i \mid s_{t-1} = j, S_{1,t-2}, P, Y_{1,t-1}) = \Pr(s_t = i \mid s_{t-1} = j, P) = \pi_{ji}, \tag{1}
\]
\[
\Delta y_t \mid s_t = i, \Theta, Y_{1,t-1} \sim f(\Delta y_t \mid \theta_i, Y_{1,t-1}), \tag{2}
\]

where yt is the data of interest at time t and ∆yt = yt − yt−1. Y1,t is defined as (y1, · · · , yt), which is an empty set if t < 1. The regime indicator at time t is denoted by st and its path up to time t is S1,t = (s1, · · · , st). The parameter characterizing regime i is θi. The collection of all θi s is Θ = (θ1, θ2, · · · ). The infinite dimensional transition matrix P has πji on its jth row and ith column for i, j = 1, 2, · · · . By construction, we have $\pi_{ji} \ge 0$ and $\sum_{i=1}^{\infty} \pi_{ji} = 1$ for any positive integers i and j.

We assume that ∆yt in each regime is a Gaussian autoregressive process of finite order q:

\[
f(\Delta y_t \mid s_t = i, \Theta, Y_{1,t-1}) \sim N\Big(\phi_{i,0} + \beta_i y_{t-1} + \sum_{k=1}^{q} \phi_{i,k} \Delta y_{t-k},\ \sigma_i^2\Big), \tag{3}
\]

where θi = (ϕi , σi ) by construction, with ϕi = (ϕi,0 , βi , ϕi,1 , · · · , ϕi,q )′ . The mean function is specified as the ADF equation. If (ϕi,1 , · · · , ϕi,q ) satisfy the stationarity restriction of an AR(q) process, βi = 0 implies a random walk process in regime i. Under the same condition, the data dynamic is locally explosive if βi > 0 and stationary if βi < 0. We draw an inference regarding the existence of a bubble based on the value of βi . Specifically, we define β˜t = βst and call it the temperature parameter. A positive temperature β˜t > 0 means that the economy is overheated, whereas a negative sign indicates that the economy is cool. A caveat here is that the temperature evaluates whether there is an explosive dynamic, but this evaluation does not necessarily mean that the economy is in danger. The iHMM serves as a thermometer rather than a cure. For example, just as a healthy person may have a fever after a flu shot, the iHMM may indicate that the economy is overheated after curative measures have been taken to prevent it from overheating.
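The role of the sign of βi can be illustrated with a small simulation. The sketch below is only an illustration under simplifying assumptions that are not in the paper: a single regime, no lagged differences (q = 0), and arbitrary values for the intercept, the volatility, and the starting point. The function name `simulate_adf_regime` is hypothetical.

```python
import numpy as np

def simulate_adf_regime(beta, phi0=0.0, sigma=0.01, y0=1.0, n=200, seed=0):
    """Simulate y_t from dy_t = phi0 + beta * y_{t-1} + eps_t (assumes q = 0)."""
    rng = np.random.default_rng(seed)
    y = np.empty(n + 1)
    y[0] = y0
    for t in range(1, n + 1):
        dy = phi0 + beta * y[t - 1] + sigma * rng.standard_normal()
        y[t] = y[t - 1] + dy
    return y

# Positive temperature: the level grows roughly like (1 + beta)^t.
explosive = simulate_adf_regime(beta=0.02)
# Negative temperature: the level reverts towards its stationary mean.
stationary = simulate_adf_regime(beta=-0.10)

print(abs(explosive[-1]), abs(stationary[-1]))
```

With these illustrative values the path with β > 0 drifts far from its starting level, whereas the path with β < 0 stays close to zero, which is the distinction the temperature parameter is meant to capture.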


The Bayesian approach is applied in the estimation to address the infinite dimensionality. Both priors for the parameter Θ and the transition matrix P are modeled as hierarchical and introduced next.

2.1 Prior of Θ

For each regime, we assume θi has the regular normal-gamma distribution NG(ϕ, H, χ/2, ν/2) (e.g., see Geweke (2009)), which is conjugate to linear models:

\[
\sigma_i^{-2} \sim G\left(\frac{\chi}{2}, \frac{\nu}{2}\right) \quad \text{and} \quad \phi_i \mid \sigma_i \sim N\left(\phi, \sigma_i^2 H^{-1}\right). \tag{4}
\]

The inverse of the variance $\sigma_i^{-2}$ is drawn from a gamma distribution with degree of freedom parameter ν/2 and scale χ/2. Conditional on σi, the vector of regression coefficients, ϕi, has a multivariate normal distribution with a mean ϕ and a covariance matrix $\sigma_i^2 H^{-1}$.

Define λ = (ϕ, H, χ, ν) as the collection of hyper parameters in the normal gamma distribution. A common practice for a finite hidden Markov model is to assume λ to be constant (see, for example, Teh et al. (2006) and Fox et al. (2011)). A hierarchical prior, however, allows us to learn λ using the information across regimes. Therefore, the hierarchical prior is able to exploit more information from the data. Specifically, as in Pesaran et al. (2006), we assume the following:

\[
H \sim W(A, a); \quad \phi \mid H \sim N(m, \tau H^{-1}); \quad \chi \sim G(d/2, c/2); \quad \nu \sim \mathrm{Exp}(\rho_\nu). \tag{5}
\]

The positive definite matrix H is drawn from a Wishart distribution with a degree of freedom a and mean Aa. Conditional on H, ϕ has a multivariate normal distribution with mean m and covariance matrix τ H −1 . The prior of χ is a gamma distribution with scale parameter d/2 and degree of freedom c/2. The prior of ν is an exponential distribution with parameter ρν . The hierarchical structure can facilitate the mixing of the Markov chain by shrinking λ to a reasonable region using the data information such that a new regime can be easily drawn in the MCMC when a structural change is implied by the data.

2.2 Prior of P

The infinite-dimensional transition matrix P consists of an infinite number of infinite-dimensional row vectors πj, where j = 1, 2, · · · . Each πj = (πj1, πj2, · · · ) represents a probability measure for the natural numbers. By definition, $\pi_{ji} \ge 0$ for each j and i, and $\sum_{i=1}^{\infty} \pi_{ji} = 1$ for each j. The prior of P is set as

\[
\pi_0 \sim \mathrm{SBP}(\gamma), \tag{6}
\]
\[
\pi_j \mid \pi_0 \sim \mathrm{DP}\big(c, (1-\rho)\pi_0 + \rho\delta_j\big), \tag{7}
\]

where ρ ∈ (0, 1).


The infinite dimensional vector π0 represents a random probability measure for the natural numbers and is drawn from a stick breaking process (SBP). This vector serves as a hierarchical parameter of all πj s. Conditional on π0, each πj is drawn from a Dirichlet process (DP) with concentration parameter c and shape parameter (1 − ρ)π0 + ρδj, where δj is a degenerate probability measure with mass 1 at integer j. The aforementioned constraints on π0 and πj s are automatically satisfied by this prior. The formal definitions of the DP and SBP are found in Appendices A.1 and A.2, respectively.

From (7), the shape parameter, expressed as (1 − ρ)π0 + ρδj, is an infinite discrete distribution and represents the mean of πj by the definition of DP. The shape parameter is a convex combination of the hierarchical distribution π0 and the degenerate distribution δj. The hierarchical distribution π0 creates a common shape for each πj, and ρ reflects the prior belief of regime persistence. By construction, conditional on π0 and ρ, the mean of the transition matrix P is a convex combination of two infinite-dimensional matrices:

\[
E(P \mid \pi_0, \rho) = (1-\rho) \cdot
\begin{pmatrix}
\pi_{01} & \pi_{02} & \pi_{03} & \cdots \\
\pi_{01} & \pi_{02} & \pi_{03} & \cdots \\
\pi_{01} & \pi_{02} & \pi_{03} & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix}
+ \rho \cdot
\begin{pmatrix}
1 & 0 & 0 & \cdots \\
0 & 1 & 0 & \cdots \\
0 & 0 & 1 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix}.
\]

These matrices show that the self-transition probability is larger as ρ nears unity. Following Fox et al. (2011), we refer to ρ as the sticky coefficient, and we introduce it into the iHMM for two reasons. First, empirical evidence shows that regime persistence is a salient feature of many macroeconomic and financial variables, and the sticky coefficient brings this feature into the prior. Second, a finite hidden Markov model usually has a small number of regimes, which guarantees that each regime can have a reasonable amount of data. However, the iHMM may assign each data point to one distinct regime. This phenomenon is called state saturation; it is uninformative and also harmful to forecasting. The sticky coefficient can avoid such a problem by shrinking the over-dispersed clustering.

In summary, the iHMM is composed of (1) and (2), in which (2) takes the form of (3) for bubble detection and estimation. The hierarchical prior for Θ is specified as in (4) and (5), whereas (6)-(7) comprise the hierarchical prior for P.
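To make the convex-combination structure of the prior mean concrete, the following sketch builds a finite truncation of it. It rests on assumptions that are not taken from the paper: the stick-breaking weights use the standard construction with Beta(1, γ) fractions (Appendix A.2 is not reproduced here), the infinite vector is truncated at K = 10 and renormalised, and the function names are hypothetical.

```python
import numpy as np

def stick_breaking(gamma, K, rng):
    """Truncated stick-breaking weights with v_k ~ Beta(1, gamma) (assumed form)."""
    v = rng.beta(1.0, gamma, size=K)
    # Length of stick remaining before each break: prod_{l<k} (1 - v_l).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    w = v * remaining
    return w / w.sum()  # renormalise so the truncated weights sum to one

def mean_transition_matrix(pi0, rho):
    """(1 - rho) * [pi0 in every row] + rho * identity, truncated to len(pi0)."""
    K = len(pi0)
    return (1.0 - rho) * np.tile(pi0, (K, 1)) + rho * np.eye(K)

rng = np.random.default_rng(1)
pi0 = stick_breaking(gamma=1.0, K=10, rng=rng)
P_mean = mean_transition_matrix(pi0, rho=0.5)

print(np.round(np.diag(P_mean), 3))  # each self-transition mean is at least rho
```

Every row of `P_mean` sums to one, and each diagonal entry equals ρ plus (1 − ρ) times the corresponding element of π0, so raising ρ moves prior mass towards staying in the current regime.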

2.3 Extensions

In this section, we propose four non-trivial extensions to the iHMM. The original iHMM is referred to as the basic iHMM or simply the basic model.

iHMM:hyper

We extend the basic model by imposing another layer of hierarchy on the hyper parameters related to the transition probability (i.e., γ, c, ρ). Specifically, we assume $\gamma \sim G(\chi_\gamma, \nu_\gamma)$ and $c \sim G(\chi_c, \nu_c)$, where $\chi_\gamma$ and $\chi_c$ are the scale parameters and $\nu_\gamma$ and $\nu_c$ are the degrees of freedom. The sticky coefficient ρ has a flat beta distribution prior, i.e., ρ ∼ B(1, 1). This extension serves as a robustness check for the prior specification of the transition probabilities and is referred to as iHMM:hyper.

iHMM:endogenous

We allow the sticky coefficient to differ across bubble and normal regimes. Specifically, the sticky coefficient is ρ1 in bubble regimes (i.e., βi > 0) and ρ2 in normal regimes (i.e., βi < 0). The prior of the transition probability πj becomes

\[
\pi_j \mid g_j = k, \rho_k, \pi_0 \sim \mathrm{DP}\big(c, (1-\rho_k)\pi_0 + \rho_k\delta_j\big),
\]

where gj takes the value of one if βj > 0 and two otherwise. This extension enables endogenous switching by linking the transition probability to the temperature parameter and is thus referred to as iHMM:endogenous.

iHMM:mixture

We generalize the basic model by imposing a two-component mixture hierarchical prior on the temperature parameter βi because βi s may be drawn from a positive-value distribution for bubble regimes and from a negative-value distribution for normal regimes. Structurally, we draw the parameters that characterize each regime θi from a mixture distribution of

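\[
\theta_i \sim \omega\, NG\left(\phi_1, H_1, \frac{\chi_1}{2}, \frac{\nu_1}{2}\right) + (1-\omega)\, NG\left(\phi_2, H_2, \frac{\chi_2}{2}, \frac{\nu_2}{2}\right).
\]

The weight ω is assumed to be 0.5 in the application for simplicity. The prior for each set of the hierarchical parameters (ϕl, Hl, χl, νl) follows (5) as

\[
H_l \sim W(A, a), \quad \phi_l \mid H_l \sim N(m_l, \tau H_l^{-1}), \quad \chi_l \sim G(d/2, c/2), \quad \nu_l \sim \mathrm{Exp}(\rho_\nu),
\]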

for l = 1 or 2. One can see that all hyper parameters of these two priors are the same except for ml , which is the prior mean of ϕl . We assume that only the prior mean of the temperature parameter βi (the second element of vector ml ) varies across these two components. Furthermore, we impose a positive value for l = 1 and negative for l = 2. Note that the prior identification is stochastic such that a bubble regime can still be drawn from the second component. This extension is labeled as iHMM:mixture, and its purpose is to accommodate potential multimodality in the predictive density.


iHMM:GARCH

The final extension is referred to as iHMM:GARCH, in which we consider GARCH volatility dynamics within each regime. In this extension, the conditional data density is specified as

\[
f(\Delta y_t \mid s_t = i, \Theta, Y_{1,t-1}) \sim N\Big(\phi_{i,0} + \beta_i y_{t-1} + \sum_{k=1}^{q} \phi_{i,k} \Delta y_{t-k},\ h_{t,i}\Big), \tag{8}
\]
\[
h_{t,i} = a_{i,0} + a_{i,1} \varepsilon_{t-1,i}^2 + a_{i,2} h_{t-1,i},
\]

where $\varepsilon_{t,i} = \Delta y_t - \phi_{i,0} - \beta_i y_{t-1} - \sum_{k=1}^{q} \phi_{i,k} \Delta y_{t-k}$ is the regime specific residual. This specification is similar to that of Haas et al. (2004).

The prior for the mean function coefficient ϕi is set as $N(\phi, H^{-1})$. The prior for (ϕ, H) is the same as the basic model, i.e., H ∼ W(A, a) and $\phi \mid H \sim N(m, \tau H^{-1})$. The prior for the volatility function coefficient ai = (ai,0, ai,1, ai,2) is assumed to be independent truncated normal $a_{i,j} \sim N(a_j, h_j^{-1})$ with ai,j > 0 and ai,1 + ai,2 < 1 for i = 1, 2, · · · and j = 0, 1, 2. The mean value aj equals the corresponding value after fitting the data to a univariate ARMA-GARCH(1,1) model. The variance $h_j^{-1}$ is set to be twice as large as the estimated variance from the maximum likelihood estimator. Because the prior for the volatility parameters is informative and the domain of ai is compact, we do not use a hierarchical structure on it.
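The variance recursion of this extension can be traced with a few lines of code. This is a minimal sketch for a single regime: the function name `garch_variance` is hypothetical, the residuals are made-up numbers, and initialising the recursion at the unconditional variance a0 / (1 − a1 − a2) is an assumption, since the text does not state a starting value. The coefficients are the prior means reported in Section 4.1.

```python
import numpy as np

def garch_variance(resid, a0, a1, a2):
    """h_t = a0 + a1 * resid_{t-1}^2 + a2 * h_{t-1} for one regime."""
    if not (a0 > 0 and a1 > 0 and a2 > 0 and a1 + a2 < 1):
        raise ValueError("need a0, a1, a2 > 0 and a1 + a2 < 1")
    resid = np.asarray(resid, dtype=float)
    h = np.empty(len(resid))
    h[0] = a0 / (1.0 - a1 - a2)  # assumed start: unconditional variance
    for t in range(1, len(resid)):
        h[t] = a0 + a1 * resid[t - 1] ** 2 + a2 * h[t - 1]
    return h

resid_example = np.array([0.05, -0.02, 0.01, 0.0])  # illustrative residuals
h_example = garch_variance(resid_example, a0=0.0008, a1=0.1965, a2=0.7123)
print(h_example)
```

The restrictions checked at the top of the function are the same truncation conditions imposed on the prior of ai.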

3 Estimation, Dating Algorithm, and Model Comparison

3.1 Estimation

The posterior sampling is based on MCMC methods, and we apply the block sampler of Fox et al. (2011) to improve efficiency (Ishwaran and James, 2001). The detailed algorithms are collected in Appendix B. The objective is to draw random samples from the posterior distribution. After discarding the first block of the sample to remove dependence on the initial values, the remaining N draws from the sample, $\{S^{(i)}, \Theta^{(i)}, P^{(i)}, \lambda^{(i)}\}_{i=1}^{N}$, where λ represents the collection of hyper parameters for the transition probability and the hierarchical prior parameters, are used for inferences as if they were drawn from their posterior distributions. Simulation consistent posterior statistics are computed as sample averages. For example, the posterior mean of ϕ is calculated using $\frac{1}{N}\sum_{i=1}^{N} \phi^{(i)}$. To avoid the label-switching problem in the mixture models (e.g., Stephens (2000)), we use the label-invariant statistics suggested by Geweke (2007) such that the posterior sampling algorithm can safely ignore this problem.

3.2 Dating Algorithm of Bubbles

The presence of explosive bubble behavior during period t is indicated by the temperature parameter $\tilde{\beta}_t$ (recall that $\tilde{\beta}_t = \beta_{s_t}$). Heuristically, $\tilde{\beta}_t$ should be positive when observation t is a realization of bubble behavior and negative when it is stationary. Instead of a point estimate, the Bayesian approach provides the posterior distribution $p(\tilde{\beta}_t \mid Y)$, where Y represents the entire sample, such that a decision can be made regarding the existence of a bubble based on the specific loss function. Bubble detection consists of two steps: the first step is to use the model to obtain the posterior distribution of the parameters; the second step is to make decisions, which may differ across loss functions. The second step is not the focus of this paper, but an illustration of the identification procedure is necessary. Here, we consider two intuitive posterior statistics, derived from two simple loss functions, to help identify bubbles: the first is the posterior probability $P(\tilde{\beta}_t > 0 \mid Y)$, and the second is the posterior mean, $E(\tilde{\beta}_t \mid Y)$.

For the first statistic, we claim that a bubble exists at time t when the posterior probability of a positive temperature is above 0.5, and there is no bubble otherwise. More concisely,

\[
\text{a bubble exists in period } t \text{ if } P(\tilde{\beta}_t > 0 \mid Y) > 0.5.
\]

This criterion is derived from an absolute value loss function. The cutoff value can be different from 0.5 if the loss function is asymmetric.

The second statistic is based on a quadratic loss function. We claim that a bubble exists at time t when the posterior mean of the temperature is above zero, and no bubble exists otherwise. More concisely,

\[
\text{a bubble exists in period } t \text{ if } E(\tilde{\beta}_t \mid Y) > 0.
\]

This statistic also shows the magnitude of explosiveness. The higher the value, the faster the bubble expands.
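Both dating rules reduce to simple summaries of the MCMC output. The sketch below assumes the draws of the temperature parameter are stored as an array with one row per retained draw and one column per period; the function name `date_bubbles` and the toy numbers are hypothetical and serve only to show the mechanics.

```python
import numpy as np

def date_bubbles(beta_tilde_draws, prob_cutoff=0.5):
    """Apply both dating rules to draws of shape (N draws, T periods)."""
    draws = np.asarray(beta_tilde_draws, dtype=float)
    prob_positive = (draws > 0).mean(axis=0)  # estimate of P(beta_t > 0 | Y)
    post_mean = draws.mean(axis=0)            # estimate of E(beta_t | Y)
    return {
        "prob_positive": prob_positive,
        "post_mean": post_mean,
        "bubble_by_prob": prob_positive > prob_cutoff,
        "bubble_by_mean": post_mean > 0,
    }

# Four toy draws for three periods (made-up values).
toy_draws = np.array([
    [-0.02,  0.01,  0.05],
    [-0.01,  0.02, -0.01],
    [-0.03,  0.01, -0.01],
    [-0.02, -0.01, -0.01],
])
out = date_bubbles(toy_draws)
print(out["bubble_by_prob"], out["bubble_by_mean"])
```

In the toy data the third period is flagged by the posterior mean but not by the posterior probability, because a single large positive draw moves the mean while leaving most of the mass below zero. This is the sense in which the two loss functions can lead to different decisions.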

3.3 Model Comparison

We consider three different model comparison criteria, i.e., the log marginal likelihood (ML, Kass and Raftery (1995)), the log predictive likelihood (PL, Geweke and Amisano (2010)), and the log long-run predictive likelihood (Song, forthcoming). The marginal likelihood of model Mi can be written as the product of one-period predictive likelihoods such that $p(Y_{1,T} \mid M_i) = \prod_{\tau=1}^{T} p(y_\tau \mid Y_{1,\tau-1}, M_i)$. The one-period predictive likelihood of model Mi is calculated by

\[
\hat{p}(y_t \mid Y_{1,t-1}, M_i) = \frac{1}{N} \sum_{l=1}^{N} f(y_t \mid \Upsilon^{(l)}, Y_{1,t-1}, M_i), \tag{9}
\]

where $\Upsilon^{(l)}$ is one draw of the parameters from the MCMC conditional on the past data, $Y_{1,t-1}$. After obtaining the one-period predictive likelihood, $\hat{p}(y_t \mid Y_{1,t-1}, M_i)$, the data is updated by adding one more observation, yt, and the model is re-estimated to compute the next period predictive likelihood. This process is repeated until the last predictive likelihood, $\hat{p}(y_T \mid Y_{1,T-1}, M_i)$, is obtained.
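The Monte Carlo average in (9) is usually evaluated on the log scale to avoid numerical underflow. The following sketch assumes that each MCMC draw implies a Gaussian one-step-ahead density summarised by a mean and a standard deviation; these inputs, and the function name `log_predictive_likelihood`, are illustrative assumptions rather than the authors' implementation.

```python
import math
import numpy as np

def log_predictive_likelihood(y_next, means, sigmas):
    """log[(1/N) * sum_l N(y_next | mean_l, sigma_l^2)] via log-sum-exp."""
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    log_dens = (-0.5 * np.log(2.0 * np.pi * sigmas ** 2)
                - 0.5 * ((y_next - means) / sigmas) ** 2)
    m = log_dens.max()  # subtract the maximum before exponentiating
    return m + math.log(np.mean(np.exp(log_dens - m)))

# Two hypothetical draws that imply the same standard normal predictive density.
log_pl_example = log_predictive_likelihood(0.0, means=[0.0, 0.0], sigmas=[1.0, 1.0])
print(log_pl_example)
```

Summing such terms over t, with the model re-estimated on the expanding sample at each step, gives the log marginal likelihood or, when started after a training sample, the log predictive likelihood.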


Suppose that one is interested in comparing the performance of models Mi and Mj. Kass and Raftery (1995) suggest examining the log marginal likelihood difference between the candidate models, i.e., $\log p(Y_{1,T} \mid M_i) - \log p(Y_{1,T} \mid M_j)$, denoted by log(BFij) (BF stands for Bayes Factor). We conclude that the data favors model Mi if log(BFij) is positive and Mj otherwise. The rules suggested by Kass and Raftery (1995) are as follows: not worth more than a bare mention if 0 ≤ log(BFij) < 1; positive if 1 ≤ log(BFij) < 3; strong if 3 ≤ log(BFij) < 5; very strong if log(BFij) ≥ 5.

The predictive likelihood is calculated as $p(Y_{t+1,T} \mid Y_{1,t}, M_i) = \prod_{\tau=t+1}^{T} p(y_\tau \mid Y_{1,\tau-1}, M_i)$. Y1,t can be viewed as a training sample for the priors. The predictive likelihood is equivalent to the marginal likelihood when t = 0. The log long-run predictive likelihood measures the long-run performance of candidate models and is defined as the summation of the log h-period-ahead predictive likelihoods, i.e., $\sum_{\tau=t}^{T-h} \log p(y_{\tau+h} \mid Y_{1,\tau}, M_i)$.
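The evidence categories can be written as a small helper. In this sketch the function name `compare_models` is hypothetical, and applying the thresholds to the absolute value of the log Bayes factor (so that the label also describes evidence in favor of Mj) is a convenience that is not spelled out in the text. The example inputs are log marginal likelihoods reported in Table 1.

```python
def compare_models(log_ml_i, log_ml_j):
    """Return (log BF_ij, favoured model, evidence label) using the cutoffs 1, 3, 5."""
    log_bf = log_ml_i - log_ml_j
    favoured = "i" if log_bf > 0 else "j"
    size = abs(log_bf)
    if size < 1:
        strength = "not worth more than a bare mention"
    elif size < 3:
        strength = "positive"
    elif size < 5:
        strength = "strong"
    else:
        strength = "very strong"
    return log_bf, favoured, strength

# Basic iHMM (478.3) against MS-2 (468.6).
ihmm_vs_ms2 = compare_models(478.3, 468.6)
# iHMM:hyper (470.3) against iHMM:endogenous (469.5).
hyper_vs_endog = compare_models(470.3, 469.5)
print(ihmm_vs_ms2[2], "|", hyper_vs_endog[2])
```

The first comparison gives a log Bayes factor of about 9.7, which falls in the "very strong" category, whereas the second differs by about 0.8 and is not worth more than a bare mention.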

4 The NASDAQ Stock Market

In this section, we estimate the dynamics of the log price-dividend ratio of the NASDAQ stock market and make an inference regarding bubble existence. According to the present value model, the log price and the log dividend are cointegrated when there is no bubble in the market (see, for example, Campbell and Shiller (1989)). Therefore, the log price-dividend ratio, which equals the difference between the log price and the log dividend, can be either an I(0) or an I(1) process, depending on the value of the cointegration vector. Specifically, it is a stationary process if the cointegration vector is [1, −1] and a unit root process otherwise. Conversely, when there are bubbles in the market, given that the log price is explosive and the log dividend is an I(1) process, the log price-dividend ratio should follow an explosive process. Therefore, the presence of locally explosive behavior in the log price-dividend ratio implies the existence of bubbles in the NASDAQ stock market.

For the estimation of the log price-dividend ratio dynamics, we consider the iHMM (including its four extensions), three finite MS-K models (with K = 2, 3 and 4), and a Bayesian model averaging approach over MS-K with K = 2, 3, 4. For all models, the lag order q is set to four. The NASDAQ stock price index and dividend yield are obtained monthly from DataStream International and sampled from February 1973 to January 2013. Figure 1 depicts the trajectories of the log price-dividend ratio of the stock index (solid line).

4.1 Prior Specifications

The prior coefficients of the basic iHMM are set as follows: ρ = 0.5, c = 1, γ = 1, m = 0, τ = 1, a = q + 10, A = I/a, c = 10, d = 1/Var(∆yt), ρν = c + 2. Here the first c is the concentration parameter of the DP in (7), and the second c, which also enters ρν, is the hyper parameter of the prior of χ in (5). The specifications of a and A imply that the prior mean of H is an identity matrix and ensure the existence of the mean of $H^{-1}$. The prior means of χ and ν are c/d = cVar(∆yt) and c + 2, respectively. If the values χ and ν equal their respective means, the prior mean of the conditional volatility in each regime (i.e., $\sigma_i^2$) is the unconditional variance Var(∆yt).

For iHMM:hyper, we set the prior distribution of both c and γ to G(1, 1) and the prior of the sticky coefficient ρ to an uninformative beta distribution B(1, 1). The priors of ρ1 and ρ2 in iHMM:endogenous are set as independent flat beta distributions B(1, 1). For iHMM:mixture, we assume m1 = (0, 0.1, 0, · · · , 0) and m2 = (0, −0.1, 0, · · · , 0). The mean and variance of the volatility parameter vector in iHMM:GARCH are set to (0.0008, 0.1965, 0.7123) and (0.004, 0.093, 0.35), respectively. Note that we do not use a hierarchical structure on the volatility parameters because their domain is compact due to the stationarity restriction. The prior specifications of other model parameters are identical to the basic iHMM.

For the finite MS-K models, the prior for each row of the transition matrix P is simply set as the independent Dirichlet distribution with the same density across the simplex, namely, Dir(1, · · · , 1). The hierarchical prior for Θ is the same as in the basic iHMM. For the Bayesian model averaging approach, we impose equal probabilities on each of the three finite MS models, and the prior settings of the finite MS models are identical to those of the individual models.

4.2 Model Comparison

Table 1 reports the log marginal likelihoods, the log predictive likelihoods, and the long-run log predictive likelihoods (including two-month, three-month, six-month, one-year and two-year ahead forecasts) of the aforementioned models for the log price-dividend ratio of the NASDAQ index. As evidenced by the table, all three criteria suggest that the basic iHMM performs best. The dominance of the basic iHMM over iHMM:mixture and iHMM:GARCH indicates that multimodality may not be a feature of the data dynamic and that regime specific volatility is sufficient to capture heteroskedasticity. Note that the long-run density forecasts of the iHMM:hyper model are very close to those of the basic iHMM, compared with other models, because iHMM:hyper can be viewed as a model to check the robustness of the basic iHMM on the hyper parameters. Given the model comparison results, we conduct our subsequent analysis on only the basic iHMM.

4.3 Posterior Analysis and Real-time Forecasts

Table 2 tabulates the posterior probabilities of the number of regimes implied by the basic iHMM for the data series. As the table shows, the probabilities of having three, four and five regimes (i.e., K = 3, 4, 5) are 34%, 36% and 18%, respectively. The posterior mean of the number of regimes is 4.2, which is larger than what has been used in the previous literature (i.e., two or three). Figure 1 illustrates the posterior probabilities of $\tilde{\beta}_t > 0$ (top panel) and the posterior means of $\tilde{\beta}_t$ (bottom panel). Recall that a bubble exists at period t when the posterior mean of $\tilde{\beta}_t$ is greater than zero or the posterior probability of $\tilde{\beta}_t > 0$ is greater than 0.5. The graph shows three episodes of explosive bubble dynamics. The first and third episodes are relatively short, running from March 1983 to June 1983 and from May 2009 to August 2009. The second episode is the famous dot-com bubble episode, which originated in November 1998 and concluded in March 2000 (with a break in between).

Figure 1: The posterior probabilities (top panel) and the posterior means (bottom panel) of $\tilde{\beta}_t$ for the log NASDAQ price-dividend ratio sampled from February 1973 to January 2013. Top panel: left axis $p(\tilde{\beta}_t > 0 \mid Y)$, right axis log NASDAQ P/D index. Bottom panel: left axis $E(\tilde{\beta}_t \mid Y)$, right axis log NASDAQ P/D index.

Next, we conduct a real-time forecast using the basic iHMM. Specifically, we estimate the model recursively by adding one observation at a time to an expanding sample and calculating the one-period-ahead forecast of the posterior probability $P(\tilde{\beta}_t > 0 \mid Y_{1,t-1})$ and the posterior mean $E(\tilde{\beta}_t \mid Y_{1,t-1})$. Figure 2 plots the one-period-ahead iHMM real-time forecast of $P(\tilde{\beta}_t > 0 \mid Y_{1,t-1})$ (top panel) and $E(\tilde{\beta}_t \mid Y_{1,t-1})$ (bottom panel). We disregard the first 100 observations (before November 1981) of the forecasting result to account for prior sensitivity. Assessing periods after 1981M11, there are dramatic and persistent increases in both the posterior probability and the posterior mean in 1986 and 1999, which are associated with the bubble episodes that lead to Black Monday in 1987 and the dot-com bubble episode, respectively.

Figure 2: The one-period-ahead iHMM real-time forecast of $P(\tilde{\beta}_t > 0 \mid Y_{1,t-1})$ (top panel) and $E(\tilde{\beta}_t \mid Y_{1,t-1})$ (bottom panel) of the log NASDAQ price-dividend ratio. Top panel: left axis $p(\tilde{\beta}_t > 0 \mid Y_{1,t-1})$, right axis log NASDAQ P/D index. Bottom panel: left axis $E(\tilde{\beta}_t \mid Y_{1,t-1})$, right axis log NASDAQ P/D index.

4.4 Prior Sensitivity Check

For a prior sensitivity check, we consider perturbing one value of the prior at a time while holding the others constant. We let ρ = {0.9, 0}, c = {10, 0.1} for the DP concentration parameter, γ = {0.1, 10}, τ = {0.1, 10}, a = {q + 20, q + 5}, and c = {20, 5} for the hyper parameter of χ and ν. Each perturbation is associated with a tighter (the first value) or a flatter (the second value) prior than the original. The last prior sensitivity check combines all of the tight (flat) priors together.

Table 3 shows the log marginal and predictive likelihoods of the basic iHMM with different prior settings. Although there is a certain amount of variation in the log marginal likelihoods, the log predictive likelihoods show that these priors produce similar results. The only outlier is the DP concentration parameter c = 0.1, which produces a log predictive likelihood of 384.6, which is 5.2 less than the original setting. This result shows that the data is strongly against the flat prior belief that rows of the transition matrix are weakly related and is consistent with our motivation for using the hierarchical structure. For the other prior settings, the difference does not exceed 5.0, which shows that they do not strongly disagree with the data.
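The comparison against the original setting can be reproduced directly from the log predictive likelihoods in Table 3. The short sketch below uses those reported values; the dictionary keys are informal labels chosen here, and the 5.0 cutoff is the one used in the discussion above.

```python
baseline_log_pl = 389.8  # original prior setting

# Log predictive likelihoods under each perturbed prior (values from Table 3).
log_pl = {
    "rho=0.90": 385.6, "rho=0": 386.4,
    "DP c=10": 387.8, "DP c=0.1": 384.6,
    "gamma=0.1": 385.6, "gamma=10": 389.9,
    "tau=10": 388.0, "tau=0.1": 389.6,
    "a=q+20": 388.0, "a=q+5": 387.0,
    "chi,nu c=20": 385.6, "chi,nu c=5": 388.0,
    "all tight": 385.8, "all flat": 386.4,
}

# Shortfall relative to the original setting, rounded to the table's precision.
shortfall = {name: round(baseline_log_pl - value, 1) for name, value in log_pl.items()}
flagged = [name for name, gap in shortfall.items() if gap >= 5.0]
print(flagged, shortfall["DP c=0.1"])
```

Only the flat DP concentration setting is flagged, with a shortfall of 5.2, and the next largest gap is 4.2.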

5 Conclusions

This paper proposes a new infinite hidden Markov model to integrate the detection, date-stamping, and estimation of bubble behaviors in a coherent Bayesian framework. This model reconciles the existing bubble modeling framework and can be utilized to conduct real-time forecasting of bubble probability. Two parallel hierarchical structures provide a parsimonious methodology for robust prior elicitation and improve out-of-sample forecasts. This model and its extensions are applied to the log price-dividend ratio of the NASDAQ Composite Index from February 1973 to January 2013. The in-sample posterior analysis and out-of-sample prediction find evidence for explosive dynamics during the dot-com bubble period. Model comparisons show that the iHMM is strongly supported by the data compared with the finite hidden Markov models.

Table 1: Model comparison: log marginal likelihoods (log ML), log predictive likelihoods (log PL), and log long-run predictive likelihoods (log long-run PL) with different forecasting horizons.

                                      log long-run PL
Model             log ML   log PL   2M      3M      6M      1Y      2Y
iHMM              478.3    389.8    383.4   382.0   379.1   365.4   357.5
iHMM:hyper        470.3    388.0    383.0   380.7   379.0   364.8   356.4
iHMM:endogenous   469.5    388.3    382.5   380.3   377.7   362.5   353.0
iHMM:mix          473.0    385.1    379.6   379.1   377.2   363.8   355.1
iHMM:GARCH        462.5    382.3    377.6   376.7   373.7   357.7   347.9
MS-2              468.6    384.4    381.9   381.5   377.5   362.8   356.3
MS-3              468.6    384.8    381.3   380.1   376.4   362.9   355.2
MS-4              468.6    384.7    380.5   380.5   375.7   363.6   354.8
BMA               468.6    384.6    381.5   380.8   376.6   362.9   355.5

Table 2: The posterior distribution of regime numbers of the basic iHMM

Estimated number of regimes   K=2    K=3     K=4     K=5     K=6    K>6
Posterior probability         5.4%   34.0%   36.3%   18.1%   4.6%   1.6%

Table 3: Prior sensitivity check for the basic iHMM

Prior setting      Log ML   Log PL
original           478.3    389.8
ρ = 0.90           467.1    385.6
ρ = 0              469.8    386.4
c = 10             473.0    387.8
c = 0.1            466.1    384.6
γ = 0.1            468.4    385.6
γ = 10             473.7    389.9
ϕ : τ = 10         465.1    388.0
ϕ : τ = 0.1        478.2    389.6
H : a = q + 20     472.1    388.0
H : a = q + 5      469.9    387.0
χ, ν : c = 20      468.9    385.6
χ, ν : c = 5       473.3    388.0
All: tight         460.0    385.8
All: flat          473.3    386.4

Figure Legends

Figure 1: The posterior probabilities (top panel) and the posterior means (bottom panel) of $\tilde{\beta}_t$ for the log NASDAQ price-dividend ratio sampled from February 1973 to January 2013.

Figure 2: The one-period-ahead iHMM real-time forecast of $\Pr(\tilde{\beta}_t > 0 \mid Y_{1,t-1})$ (top panel) and $E(\tilde{\beta}_t \mid Y_{1,t-1})$ (bottom panel) of the log NASDAQ price-dividend ratio.

Notes

1 See, for example, Phillips et al. (2011), Das et al. (2011), Homm and Breitung (2012), Gutierrez (2013), Bohl et al. (2013), Etienne et al. (2014).

2 McCulloch and Tsay (1993), Koop and Potter (2007), Giordani and Kohn (2008) and Maheu and Song (forthcoming) also consider a specification with an unknown number of regimes. However, they do not allow the regimes to recur over time.

3 Psaradakis et al. (2001) show that the MSADF test is computationally burdensome due to the need to bootstrap critical values.

4 We have also considered the case where ρ is different for βi > 0, βi = 0, and βi < 0. In our application, this specification is strongly rejected by the data, namely the posterior probabilities of βi = 0 are always zero; it is thus not reported here.

5 The mean function is zero in Haas et al. (2004). We do not pursue Gray's (1996) approach due to problems with path dependence.

6 Geweke and Amisano (2010) argue that this method is more robust to prior elicitation than the marginal likelihood.

7 Several papers have studied the evidence for bubbles in this market (among others, Johansen and Sornette (2000), Pástor and Veronesi (2006), and Phillips et al. (2011)). Although the sample periods and methodologies used in these papers are different, most studies have found evidence of bubble existence.

8 We have tried replacing 0.1 with the standard deviation ratio of ∆yt and yt (i.e. sd(∆yt)/sd(yt)) for normalization purposes. The conclusion does not change.

9 This finding is consistent with the argument of Lamoureux and Lastrapes (1990): the GARCH effect may diminish after controlling for regime change.

References

Blanchard, O. J. (1979). Speculative bubbles, crashes and rational expectations. Economics Letters 3, 387–389.
Bohl, M. T., P. Kaufmann, and P. M. Stephan (2013). From hero to zero: Evidence of performance reversal and speculative bubbles in German renewable energy stocks. Energy Economics.
Brooks, C. and A. Katsaris (2005). A three-regime model of speculative behaviour: Modelling the evolution of the S&P 500 composite index. The Economic Journal 115, 767–797.
Chib, S. (1996). Calculating posterior distributions and modal estimates in Markov mixture models. Journal of Econometrics 75 (1), 79–97.
Cochrane, J. H. (2011). Presidential address: Discount rates. The Journal of Finance 66 (4), 1047–1108.
Das, S., R. Gupta, and P. T. Kanda (2011). International articles: Bubbles in South African house prices and their impact on consumption. Journal of Real Estate Literature 19 (1), 69–91.
Diba, B. T. and H. I. Grossman (1988). Explosive rational bubbles in stock prices? The American Economic Review 78, 520–530.
Driffill, J. and M. Sola (1998). Intrinsic bubbles and regime-switching. Journal of Monetary Economics 42, 357–373.
Dufays, A. (2012). Infinite-state Markov-switching for dynamic volatility and correlation models. Working paper No. 2012043.
Etienne, X. L., S. H. Irwin, and P. Garcia (2014). Bubbles in food commodity markets: Four decades of evidence. Journal of International Money and Finance 42, 129–155.
Ferguson, T. (1973). A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 209–230.
Fox, E., E. Sudderth, M. Jordan, and A. Willsky (2011). A Sticky HDP-HMM with Application to Speaker Diarization. Annals of Applied Statistics 5 (2A), 1020–1056.
Geweke, J. (2007). Interpretation and inference in mixture models: Simple MCMC works. Computational Statistics & Data Analysis 51 (7), 3529–3550.
Geweke, J. (2009). Complete and Incomplete Econometric Models. Princeton University Press.
Geweke, J. and G. Amisano (2010). Comparing and evaluating Bayesian predictive distributions of asset returns. International Journal of Forecasting.
Giordani, P. and R. Kohn (2008). Efficient Bayesian inference for multiple change-point and mixture innovation models. Journal of Business and Economic Statistics 26 (1), 66–77.
Gray, S. F. (1996). Modeling the conditional distribution of interest rates as a regime-switching process. Journal of Financial Economics 42 (1), 27–62.
Gurkaynak, R. S. (2008). Econometric tests of asset price bubbles: Taking stock. Journal of Economic Surveys 22 (1), 166–186.
Gutierrez, L. (2013). Speculative bubbles in agricultural commodity markets. European Review of Agricultural Economics 40 (2), 217–238.
Haario, H., E. Saksman, and J. Tamminen (2001). An adaptive Metropolis algorithm. Bernoulli, 223–242.
Haas, M., S. Mittnik, and M. Paolella (2004). A new approach to Markov-switching GARCH models. Journal of Financial Econometrics 2 (4), 493–530.
Hall, S., Z. Psaradakis, and M. Sola (1999). Detecting periodically collapsing bubbles: A Markov-switching unit root test. Journal of Applied Econometrics 14, 143–154.
Homm, U. and J. Breitung (2012). Testing for speculative bubbles in stock markets: a comparison of alternative methods. Journal of Financial Econometrics 10 (1), 198–231.
Ishwaran, H. and L. James (2001). Gibbs Sampling Methods for Stick-Breaking Priors. Journal of the American Statistical Association 96 (453).
Ishwaran, H. and M. Zarepour (2000). Markov chain Monte Carlo in approximate Dirichlet and beta two-parameter process hierarchical models. Biometrika 87 (2), 371.
Ishwaran, H. and M. Zarepour (2002). Dirichlet prior sieves in finite normal mixtures. Statistica Sinica 12 (3), 941–963.
Johansen, A. and D. Sornette (2000). The NASDAQ crash of April 2000: Yet another example of log-periodicity in a speculative bubble ending in a crash. The European Physical Journal B-Condensed Matter and Complex Systems 17 (2), 319–328.
Kass, R. and A. Raftery (1995). Bayes factors. Journal of the American Statistical Association 90 (430).
Koop, G. and S. Potter (2007). Estimation and forecasting in models with multiple breaks. Review of Economic Studies 74 (3), 763.
Lamoureux, C. G. and W. D. Lastrapes (1990). Persistence in variance, structural change, and the GARCH model. Journal of Business & Economic Statistics 8 (2), 225–234.
Maheu, J. and Y. Song (forthcoming). A new structural break model with application to Canadian inflation forecasting. International Journal of Forecasting.
McCulloch, R. and R. Tsay (1993). Bayesian inference and prediction for mean and variance shifts in autoregressive time series. Journal of the American Statistical Association 88 (423), 968–978.
Müller, P. (1991). A generic approach to posterior integration and Gibbs sampling. Rapport technique, 91–09.
Pástor, L. and P. Veronesi (2006). Was there a NASDAQ bubble in the late 1990s? Journal of Financial Economics 81 (1), 61–100.
Pesaran, M., D. Pettenuzzo, and A. Timmermann (2006). Forecasting time series subject to multiple structural breaks. Review of Economic Studies 73 (4), 1057–1084.
Phillips, P., S. Shi, and J. Yu (2013). Testing for multiple bubbles: Historical episodes of exuberance and collapse in the S&P 500. Working Paper.
Phillips, P., S. Shi, and J. Yu (forthcoming). Testing for multiple bubbles: Limit theory of real time detectors. International Economic Review.
Phillips, P., Y. Wu, and J. Yu (2011). Explosive behavior in the 1990s Nasdaq: When did exuberance escalate asset values? International Economic Review 52, 201–226.
Psaradakis, Z., M. Sola, and F. Spagnolo (2001). A simple procedure for detecting periodically collapsing rational bubbles. Economics Letters 24 (72), 317–323.
Roberts, G., A. Gelman, and W. Gilks (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms. The Annals of Applied Probability 7 (1), 110–120.
Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statistica Sinica 4, 639–650.
Song, Y. (forthcoming). Modeling regime switching and structural breaks with an infinite dimension Markov switching model. Journal of Applied Econometrics.
Stephens, M. (2000). Dealing with label switching in mixture models. Journal of the Royal Statistical Society. Series B, Statistical Methodology 62 (4), 795–809.
Teh, Y., M. Jordan, M. Beal, and D. Blei (2006). Hierarchical Dirichlet processes. Journal of the American Statistical Association 101 (476), 1566–1581.

A Dirichlet Process and Stick Breaking Process

A.1 Dirichlet Process

Before introducing the Dirichlet process, we define the Dirichlet distribution.

Definition A.1 The Dirichlet distribution is denoted by $\mathrm{Dir}(\alpha)$, where $\alpha$ is a $K$-dimensional vector of positive values. Each sample $x$ from $\mathrm{Dir}(\alpha)$ is a $K$-dimensional vector with $x_i \in (0, 1)$ and $\sum_{i=1}^{K} x_i = 1$. The probability density function is

$$p(x \mid \alpha) = \frac{\Gamma\left(\sum_{i=1}^{K} \alpha_i\right)}{\prod_{i=1}^{K} \Gamma(\alpha_i)} \prod_{i=1}^{K} x_i^{\alpha_i - 1}.$$

A special case is the Beta distribution, where $K = 2$.

Define $\alpha_0 = \sum_{i=1}^{K} \alpha_i$ and $X_i$ as the $i$th element of the random vector $X$ from a Dirichlet distribution $\mathrm{Dir}(\alpha)$. The random variable $X_i$ has mean $\frac{\alpha_i}{\alpha_0}$ and variance $\frac{\alpha_i(\alpha_0 - \alpha_i)}{\alpha_0^2(\alpha_0 + 1)}$. We can further decompose $\alpha$ into two parts: a shape parameter, $G_0 = \left(\frac{\alpha_1}{\alpha_0}, \cdots, \frac{\alpha_K}{\alpha_0}\right)$, and a concentration parameter, $\alpha_0$. The shape parameter $G_0$ represents the center of the random vector $X$, and the concentration parameter $\alpha_0$ controls how close $X$ is to $G_0$.

The Dirichlet distribution is conjugate to the multinomial distribution in the following sense: if

$$X \sim \mathrm{Dir}(\alpha), \qquad \beta = (n_1, \ldots, n_K) \mid X \sim \mathrm{Mult}(X),$$

where $n_i$ is the number of occurrences of $i$ in a sample of $n = \sum_{i=1}^{K} n_i$ points from the discrete distribution on $\{1, \cdots, K\}$ defined by $X$, then

$$X \mid \beta = (n_1, \ldots, n_K) \sim \mathrm{Dir}(\alpha + \beta).$$

This relationship is used in Bayesian statistics to estimate the hidden parameters, $X$, given a collection of $n$ samples. Intuitively, if the prior is represented as $\mathrm{Dir}(\alpha)$, then $\mathrm{Dir}(\alpha + \beta)$ is the posterior following a sequence of observations with histogram $\beta$.

The Dirichlet process was introduced by Ferguson (1973) as an extension of the Dirichlet distribution from finite dimensions to infinite dimensions. It is a distribution over distributions and has two parameters: the shape parameter $G_0$ is a distribution over a sample space $\Omega$, and the concentration parameter $\alpha_0$ is a positive scalar. These parameters have similar interpretations to their counterparts in the Dirichlet distribution. The formal definition is as follows:

Definition A.2 The Dirichlet process over a set $\Omega$ is a stochastic process whose sample path is a probability distribution over $\Omega$. For a random distribution $F$ distributed according to a Dirichlet process $\mathrm{DP}(\alpha_0, G_0)$, given any finite measurable partition $A_1, A_2, \cdots, A_K$ of the sample space $\Omega$, the random vector $(F(A_1), \cdots, F(A_K))$ is distributed as a Dirichlet distribution with parameters $(\alpha_0 G_0(A_1), \cdots, \alpha_0 G_0(A_K))$.

Using the results from the Dirichlet distribution, for any measurable set $A$, the random variable $F(A)$ has mean $G_0(A)$ and variance $\frac{G_0(A)(1 - G_0(A))}{\alpha_0 + 1}$. The mean implies that the shape parameter $G_0$ represents the center of a random distribution $F$ drawn from a Dirichlet process, $\mathrm{DP}(\alpha_0, G_0)$. Define $a_i \sim F$ as an observation drawn from the distribution $F$. Because, by definition, $P(a_i \in A \mid F) = F(A)$, we can derive

$$P(a_i \in A \mid G_0) = E(P(a_i \in A \mid F) \mid G_0) = E(F(A) \mid G_0) = G_0(A).$$

Thus, the shape parameter $G_0$ is also the marginal distribution of an observation, $a_i$. The variance implies that the concentration parameter $\alpha_0$ controls how close the random distribution $F$ is to the shape parameter $G_0$. The larger $\alpha_0$ is, the more likely $F$ is close to $G_0$, and vice versa.

Suppose that there are $n$ observations, $a = (a_1, \cdots, a_n)$, drawn from the distribution $F$. Use $\sum_{i=1}^{n} \delta_{a_i}(A_j)$ to represent the number of $a_i$ in set $A_j$, where $A_1, \cdots, A_K$ is a measurable partition of the sample space $\Omega$ and $\delta_{a_i}(A_j)$ is the Dirac measure,

$$\delta_{a_i}(A_j) = \begin{cases} 1 & \text{if } a_i \in A_j, \\ 0 & \text{if } a_i \notin A_j. \end{cases}$$

Conditional on $(F(A_1), \cdots, F(A_K))$, the vector $\left(\sum_{i=1}^{n} \delta_{a_i}(A_1), \cdots, \sum_{i=1}^{n} \delta_{a_i}(A_K)\right)$ has a multinomial distribution. By the conjugacy of the Dirichlet distribution to the multinomial distribution, the posterior distribution of $(F(A_1), \cdots, F(A_K))$ is still a Dirichlet distribution:

$$(F(A_1), \cdots, F(A_K)) \mid a \sim \mathrm{Dir}\left(\alpha_0 G_0(A_1) + \sum_{i=1}^{n} \delta_{a_i}(A_1), \cdots, \alpha_0 G_0(A_K) + \sum_{i=1}^{n} \delta_{a_i}(A_K)\right).$$

Because this result is valid for any finite measurable partition, the posterior of $F$ is still a Dirichlet process by definition, with new parameters $\alpha_0^*$ and $G_0^*$, where

$$\alpha_0^* = \alpha_0 + n, \qquad G_0^* = \frac{\alpha_0}{\alpha_0 + n} G_0 + \frac{n}{\alpha_0 + n} \cdot \frac{\sum_{i=1}^{n} \delta_{a_i}}{n}.$$

The posterior shape parameter, $G_0^*$, is a mixture of the prior shape parameter and the empirical distribution implied by the observations. As $n \to \infty$, the shape parameter of the posterior converges to the empirical distribution. Because the concentration parameter $\alpha_0^* \to \infty$, the posterior of $F$ converges to the empirical distribution with probability one. Ferguson (1973) showed that a random distribution drawn from a Dirichlet process is almost surely discrete, although the shape parameter $G_0$ can be continuous.
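The posterior update on a finite partition can be illustrated numerically. The following is a minimal sketch, not part of the original estimation code; the function name `dp_posterior_on_partition` and the example numbers are hypothetical and chosen only for illustration. It computes the posterior Dirichlet parameters $\alpha_0 G_0(A_j) + \sum_i \delta_{a_i}(A_j)$, the posterior concentration $\alpha_0^* = \alpha_0 + n$, and the posterior shape $G_0^*(A_j)$.

```python
import numpy as np


def dp_posterior_on_partition(alpha0, g0_cells, counts):
    """Posterior of (F(A_1), ..., F(A_K)) under a DP(alpha0, G0) prior.

    alpha0   : prior concentration parameter (positive scalar)
    g0_cells : G0(A_1), ..., G0(A_K), the prior shape mass of each cell
    counts   : number of observations a_i that fall in each cell A_j
    """
    g0_cells = np.asarray(g0_cells, dtype=float)
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    dir_params = alpha0 * g0_cells + counts   # posterior Dirichlet parameters
    alpha0_star = alpha0 + n                  # posterior concentration
    g0_star = dir_params / alpha0_star        # posterior shape on the partition
    return dir_params, alpha0_star, g0_star


# Hypothetical example: three cells, 15 observations.
dir_params, alpha0_star, g0_star = dp_posterior_on_partition(
    alpha0=2.0, g0_cells=[0.5, 0.3, 0.2], counts=[10, 0, 5]
)
```

In this example the posterior shape is a weighted average of the prior shape (weight $\alpha_0/(\alpha_0+n) = 2/17$) and the empirical frequencies (weight $15/17$), which matches the mixture form of $G_0^*$.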

A.2 Stick breaking process

For a random distribution, $F \sim \mathrm{DP}(\alpha_0, G_0)$, because $F$ is almost surely discrete, it can be represented by two parts: distinct values $\theta_i$ and their corresponding probabilities $p_i$, where $i = 1, 2, \cdots$. Sethuraman (1994) found the stick breaking representation of the Dirichlet process by writing $F \equiv (\theta, p)$, where $\theta \equiv (\theta_1, \theta_2, \cdots)'$ and $p \equiv (p_1, p_2, \cdots)'$ with $p_i > 0$ and $\sum_{i=1}^{\infty} p_i = 1$. The distribution $F \sim \mathrm{DP}(\alpha_0, G_0)$ can be generated by

$$V_i \overset{iid}{\sim} \mathrm{Beta}(1, \alpha_0), \qquad (10)$$

$$p_i = V_i \prod_{j=1}^{i-1} (1 - V_j), \qquad (11)$$

$$\theta_i \overset{iid}{\sim} G_0, \qquad (12)$$

where $i = 1, 2, \cdots$. In this representation, $p$ and $\theta$ are generated independently. The process generating $p$, (10) and (11), is called the stick breaking process and is denoted by $p \sim \mathrm{SBP}(\alpha_0)$. The name comes from the way the $p_i$ are generated. For each $i$, the remaining probability, $1 - \sum_{j=1}^{i-1} p_j$, is sliced by a proportion of $V_i$ and given to $p_i$, which is equivalent to breaking a stick an infinite number of times.
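Equations (10) to (12) can be simulated directly once the infinite sequence is truncated. The sketch below is illustrative only: the truncation level `n_atoms`, the function name `stick_breaking`, and the standard normal choice for $G_0$ are assumptions made for the example and are not taken from the paper.

```python
import numpy as np


def stick_breaking(alpha0, base_sampler, n_atoms, rng):
    """Truncated stick-breaking draw of (theta, p) for F ~ DP(alpha0, G0).

    base_sampler(rng, n) must return n iid draws from the shape parameter G0.
    Truncating at n_atoms leaves a small unassigned mass prod_j (1 - V_j).
    """
    v = rng.beta(1.0, alpha0, size=n_atoms)                  # equation (10)
    # remaining[i] = prod_{j < i} (1 - V_j), with the empty product equal to 1
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    p = v * remaining                                        # equation (11)
    theta = base_sampler(rng, n_atoms)                       # equation (12)
    return theta, p


rng = np.random.default_rng(0)
theta, p = stick_breaking(
    alpha0=2.0,
    base_sampler=lambda r, n: r.normal(size=n),  # assumed G0 = N(0, 1)
    n_atoms=200,
    rng=rng,
)
```

With $\alpha_0 = 2$ and 200 atoms, the weights sum to almost exactly one, so the truncated draw is a close stand-in for the infinite representation; a larger $\alpha_0$ spreads the mass over more atoms and needs a higher truncation level.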

B Markov Chain Monte Carlo

B.1 Block sampler

The infinite number of regimes makes the method of Chib (1996) impossible to implement. Fox et al. (2011) suggest approximating the iHMM using a finite Markov-switching model in which the number of regimes, denoted by $L$, is assumed to be large. The consistency of this approximation is established in Ishwaran and Zarepour (2000) and Ishwaran and Zarepour (2002). In practice, if $L$ is large enough, the finite MS model is equivalent to the iHMM. In our application, the regime number $L$ is chosen such that it is always greater than the number of regimes in MCMC. Structurally, we replace equations (6) and (7) with

$$\pi_0 \sim \mathrm{Dir}\left(\frac{\gamma}{L}, \cdots, \frac{\gamma}{L}\right), \qquad (13)$$

$$\pi_j \mid \pi_0 \sim \mathrm{Dir}\left((1 - \rho)c\pi_{01}, \ldots, (1 - \rho)c\pi_{0j} + \rho c, \cdots, (1 - \rho)c\pi_{0L}\right), \qquad (14)$$

where Dir represents the Dirichlet distribution and $j = 1, 2, \cdots, L$. For the iHMM:endogenous, equation (7) is replaced by

$$\pi_j \mid g_j = k, \pi_0 \sim \mathrm{Dir}\left((1 - \rho_k)c\pi_{01}, \ldots, (1 - \rho_k)c\pi_{0j} + \rho_k c, \cdots, (1 - \rho_k)c\pi_{0L}\right).$$

Note that the only approximation is (13). Equation (14) is not an approximation because a DP is equivalent to a Dirichlet distribution if its shape parameter only has support in a finite set. The rest of the model specification remains the same.
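To make the finite approximation concrete, the following sketch draws $\pi_0$ from (13) and each row $\pi_j$ from (14). It is a minimal illustration under assumed hyperparameter values; the function name `sample_sticky_transition_matrix` and the numerical floor on the Dirichlet parameters are not from the paper and are added only so that the example runs safely.

```python
import numpy as np


def sample_sticky_transition_matrix(L, gamma, c, rho, rng):
    """Draw pi_0 from (13) and the L rows pi_j | pi_0 from (14)."""
    pi0 = rng.dirichlet(np.full(L, gamma / L))          # equation (13)
    P = np.empty((L, L))
    for j in range(L):
        conc = (1.0 - rho) * c * pi0                    # common part shared by all rows
        conc[j] += rho * c                              # extra (sticky) mass on staying in regime j
        # Assumption: floor tiny values so every Dirichlet parameter stays strictly positive.
        conc = np.maximum(conc, np.finfo(float).tiny)
        P[j] = rng.dirichlet(conc)                      # equation (14)
    return pi0, P


rng = np.random.default_rng(0)
# Hypothetical values: L = 5 regimes, gamma = 5, c = 10, rho = 0.9.
pi0, P = sample_sticky_transition_matrix(L=5, gamma=5.0, c=10.0, rho=0.9, rng=rng)
```

Each row shares the common shape $\pi_0$, which is what links the rows of the transition matrix hierarchically, while $\rho$ shifts prior mass toward self-transitions.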

B.2 Basic iHMM

To sample from the posterior distribution, the MCMC method partitions the parameter space into four parts: $(S, I)$, $(\Theta, P, \pi_0)$, $(\phi, H, \chi)$ and $\nu$, where $S$ and $I$ are the collections of regime indicators $s_t$ and binary auxiliary variables $I_t$, respectively. Each part is sampled conditional on the other parts and the data $Y = (y_1, \cdots, y_T)$. The sampling algorithms are as follows:

1. Sample $(S, I)$

• $S$ is sampled using the forward filter and backward sampler of Chib (1996). From (13) and (14), the filtered distribution of $\pi_j$ conditional on $S_{1,t}$ and $\pi_0$ is a Dirichlet distribution:

$$\pi_j \mid S_{1,t}, \pi_0 \sim \mathrm{Dir}\left(c(1 - \rho)\pi_{01} + n_{j1}^{(t)}, \cdots, c(1 - \rho)\pi_{0j} + c\rho + n_{jj}^{(t)}, \cdots, c(1 - \rho)\pi_{0L} + n_{jL}^{(t)}\right),$$

where $n_{ji}^{(t)}$ is the cardinality of $\{\tau \mid s_\tau = i, s_{\tau-1} = j, \tau \le t\}$. Integrating out $\pi_j$, the conditional distribution of $s_{t+1}$, given $S_{1,t}$ and $\pi_0$, is

$$p(s_{t+1} = i \mid s_t = j, S_{1,t}, \pi_0) \propto c(1 - \rho)\pi_{0i} + c\rho\delta_j(i) + n_{ji}^{(t)}.$$

• $I$ is introduced to facilitate the sampling of $\pi_0$. Construct a Bernoulli random variable $I_t$ which takes the value of one if $s_t$ is drawn from $\pi_0$ and zero otherwise. The probability mass function of $I_{t+1}$ is

$$p(I_{t+1} \mid s_t = j, S_{1,t}, \pi_0) \propto \begin{cases} c\rho + \sum_{i=1}^{L} n_{ji}^{(t)} & \text{if } I_{t+1} = 0, \\ c(1 - \rho) & \text{if } I_{t+1} = 1. \end{cases}$$

Assume the conditional distribution of $s_{t+1}$ to be

$$p(s_{t+1} = i \mid I_{t+1} = 0, s_t = j, S_{1,t}, \pi_0) \propto n_{ji}^{(t)} + c\rho\delta_j(i); \qquad p(s_{t+1} = i \mid I_{t+1} = 1, s_t = j, S_{1,t}, \pi_0) \propto \pi_{0i}.$$

This construction preserves the conditional distribution of $s_{t+1}$, given $S_{1,t}$ and $\pi_0$. To sample $I \mid S$, use the Bernoulli distribution:

$$I_{t+1} \mid s_{t+1} = i, s_t = j, \pi_0 \sim \mathrm{Ber}\left(\frac{c(1 - \rho)\pi_{0i}}{n_{ji}^{(t)} + c\rho\delta_j(i) + c(1 - \rho)\pi_{0i}}\right).$$

2. Sample $(\Theta, P, \pi_0)$

• After sampling $I$ and $S$, write $m_i = \sum_{s_t = i} I_t$. By construction, the conditional posterior of $\pi_0$ given $S$ and $I$ only depends on $I$ and is a Dirichlet distribution:

$$\pi_0 \mid S, I \sim \mathrm{Dir}\left(\frac{\gamma}{L} + m_1, \cdots, \frac{\gamma}{L} + m_L\right).$$

This approach to sampling $\pi_0$ is simpler than the method of Fox et al. (2011) because we use one fewer auxiliary variable.

• Conditional on $\pi_0$ and $S$, we sample $\pi_j$ from a Dirichlet distribution

$$\mathrm{Dir}\left(c(1 - \rho)\pi_{01} + n_{j1}, \cdots, c(1 - \rho)\pi_{0j} + c\rho + n_{jj}, \cdots, c(1 - \rho)\pi_{0L} + n_{jL}\right),$$

where $n_{ji}$ is the cardinality of $\{\tau \mid s_\tau = i, s_{\tau-1} = j\}$.

• By conjugacy, the posterior conditional distribution of $(\phi_i, \sigma_i^{-2})$ is $\mathrm{NG}\left(\bar{\phi}_i, \bar{H}_i, \frac{\bar{\chi}_i}{2}, \frac{\bar{\nu}_i}{2}\right)$ with $\bar{\phi}_i = \bar{H}_i^{-1}(H\phi + X_i' \Delta Y_i)$, $\bar{H}_i = H + X_i' X_i$, $\bar{\chi}_i = \chi + \Delta Y_i' \Delta Y_i + \phi' H \phi - \bar{\phi}_i' \bar{H}_i \bar{\phi}_i$ and $\bar{\nu}_i = \nu + n_i$. The vector $\Delta Y_i$ is the collection of $\Delta y_t$ in regime $i$. $X_i$ and $n_i$ are the collection of $x_t = (1, y_{t-1}, \Delta y_{t-1}, \cdots, \Delta y_{t-q})$ and the number of observations in regime $i$, respectively.

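3. Sample $(\phi, H, \chi)$

Suppose $K$ is the number of active regimes (i.e., at least one data point is associated with the regime).

• The conditional posterior of $(\phi, H)$ is $\mathrm{NW}(\bar{m}, \bar{\tau}, \bar{A}, \bar{a})$, where

$$\bar{m} = \bar{\tau}\left(\frac{1}{\tau} m + \sum_{i=1}^{K} \sigma_i^{-2} \phi_i\right), \qquad \bar{\tau} = \left(\tau^{-1} + \sum_{i=1}^{K} \sigma_i^{-2}\right)^{-1},$$

$$\bar{A} = \left(A^{-1} + \sum_{i=1}^{K} \sigma_i^{-2} \phi_i \phi_i' + \tau^{-1} m m' - \bar{\tau}^{-1} \bar{m} \bar{m}'\right)^{-1}, \qquad \bar{a} = a + K.$$

• The conditional posterior of $\chi$ is $\mathrm{G}(\bar{d}/2, \bar{c}/2)$ with $\bar{d} = d + \sum_{i=1}^{K} \sigma_i^{-2}$ and $\bar{c} = c + K\nu$.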

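4. Sample $\nu$

The conditional posterior of $\nu$ is proportional to

$$\left(\frac{(\chi/2)^{\nu/2}}{\Gamma(\nu/2)}\right)^{K} \left(\prod_{i=1}^{K} \sigma_i^{-2}\right)^{\nu/2} \exp\left\{-\frac{\nu}{\rho_\nu}\right\}.$$

A Metropolis-Hastings method is applied to sample $\nu$. Draw a new $\nu$ from a proposal distribution $\mathrm{G}\left(\frac{\zeta_\nu}{\nu'}, \zeta_\nu\right)$ and accept with probability

$$\min\left\{1, \frac{p(\nu \mid \chi, \{\sigma_i\}_{i=1}^{K})\, f_G\left(\nu'; \frac{\zeta_\nu}{\nu}, \zeta_\nu\right)}{p(\nu' \mid \chi, \{\sigma_i\}_{i=1}^{K})\, f_G\left(\nu; \frac{\zeta_\nu}{\nu'}, \zeta_\nu\right)}\right\},$$

where $\nu'$ is the value from the previous sweep. The parameter $\zeta_\nu$ is fine-tuned to produce a reasonable acceptance rate close to 0.5, as suggested by Roberts et al. (1997) and Müller (1991).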
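The auxiliary-variable update of $\pi_0$ in steps 1 and 2 can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the authors' code: regimes are labelled $0, \ldots, L-1$, the state path is taken as given, the treatment of the initial state $s_1$ is omitted because it is not specified above, and the function name `sample_pi0_given_states` is hypothetical.

```python
import numpy as np


def sample_pi0_given_states(states, pi0, gamma, c, rho, L, rng):
    """One draw of the auxiliary variables I and of pi_0 | S, I.

    states : sequence of regime labels s_1, ..., s_T in {0, ..., L-1}
    pi0    : current value of pi_0 (all entries assumed strictly positive)
    """
    counts = np.zeros((L, L))   # counts[j, i] = n_{ji}^{(t)}, transitions j -> i seen so far
    m = np.zeros(L)             # m[i] = number of times regime i was drawn from pi_0
    for t in range(len(states) - 1):
        j, i = states[t], states[t + 1]
        from_pi0 = c * (1.0 - rho) * pi0[i]
        from_row = counts[j, i] + c * rho * (i == j)
        prob = from_pi0 / (from_pi0 + from_row)   # Bernoulli probability of I_{t+1} = 1
        m[i] += rng.random() < prob
        counts[j, i] += 1                         # update n_{ji} after using n_{ji}^{(t)}
    new_pi0 = rng.dirichlet(gamma / L + m)        # pi_0 | S, I ~ Dir(gamma/L + m_1, ...)
    return new_pi0, m, counts


rng = np.random.default_rng(1)
# Hypothetical state path of length 50 over L = 4 regimes.
states = rng.integers(0, 4, size=50)
new_pi0, m, counts = sample_pi0_given_states(
    states, pi0=np.full(4, 0.25), gamma=1.0, c=10.0, rho=0.9, L=4, rng=rng
)
```

Because the transition counts are accumulated sequentially, early transitions are more likely to be attributed to $\pi_0$ than later ones into the same regime, which mirrors the dependence of the Bernoulli probability on $n_{ji}^{(t)}$.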

B.3 iHMM:hyper

The MCMC algorithm for drawing $S$, $\Theta$ and $\phi, H, \chi, \nu$ is the same as in the basic iHMM. The sampling procedure for the hyperparameters $(\gamma, c, \rho)$ and the transition probabilities $\pi_0, P$ relies on three binary auxiliary variables. The binary variables are denoted by $(I_t, I_t', I_t'')$ and defined as

$$I_t \mid s_{t-1} = i, S_{1,t-2} \sim \mathrm{Ber}\left(\frac{c}{c + n_{i\cdot}^{(t-1)}}\right),$$

$$I_t' \mid I_t = 1 \sim \mathrm{Ber}(1 - \rho),$$

$$I_t'' \mid I_t = 1, I_t' = 1 \sim \mathrm{Ber}\left(\frac{\gamma}{\gamma + m^{(t-1)}}\right),$$

where $n_{i\cdot}^{(t)} = \sum_j n_{ij}^{(t)}$, $n_{ij}^{(t)}$ is the cardinality of $\{\tau : s_{\tau-1} = i, s_\tau = j, 1 < \tau \le t\}$, $m^{(t)} = \sum_j m_j^{(t)}$ and $m_j^{(t)} = \sum_{\tau=1}^{t} I_\tau (1 - I_\tau') \mathbb{1}(s_\tau = j)$.

Note that $I_t$ here is different from that in the basic iHMM. The first variable, $I_t$, takes the value of one (zero) if $s_t$ is a new (old) draw from the distribution $(1 - \rho)\pi_0 + \rho\delta_j$ (the shape parameter). The second auxiliary variable, $I_t' \mid I_t = 1$, equals one if $s_t$ is related to the common shape $\pi_0$ and zero if it is related to the sticky coefficient $\rho$. The last binary variable, $I_t'' \mid I_t = 1, I_t' = 1$, takes the value of one (zero) if $s_t$ is a new (old) draw from $\pi_0$.

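1. Sample $(I_t, I_t', I_t'')$

Let $I_1 = I_1' = 0$ and $I_1'' = 1$ and start sampling from $t = 2$. Suppose $s_{t-1} = i$ and $s_t = j$. We draw $(I_t, I_t', I_t'')$, with probabilities proportional to the stated weights, from

$$\begin{cases} (0, 0, 0) & \text{w.p. } n_{ij}^{(t-1)}, \\ (1, 1, 0) & \text{w.p. } c(1 - \rho)\frac{m_j^{(t-1)}}{\gamma + m^{(t-1)}}, \\ (1, 1, 1) & \text{w.p. } c(1 - \rho)\frac{\gamma/L}{\gamma + m^{(t-1)}} \end{cases}$$

if $i \ne j$ and from

$$\begin{cases} (0, 0, 0) & \text{w.p. } n_{ii}^{(t-1)}, \\ (1, 0, 0) & \text{w.p. } c\rho, \\ (1, 1, 0) & \text{w.p. } c(1 - \rho)\frac{m_i^{(t-1)}}{\gamma + m^{(t-1)}}, \\ (1, 1, 1) & \text{w.p. } c(1 - \rho)\frac{\gamma/L}{\gamma + m^{(t-1)}} \end{cases}$$

otherwise.

2. Sample $(\pi_0, P)$

• Draw $\pi_0$ from $\mathrm{Dir}\left(m_1^{(T)} + \gamma/L, \cdots, m_L^{(T)} + \gamma/L\right)$.

• Draw $\pi_j$ from $\mathrm{Dir}\left(c(1 - \rho)\pi_{01} + n_{j1}^{(T)}, \cdots, c(1 - \rho)\pi_{0j} + c\rho + n_{jj}^{(T)}, \cdots, c(1 - \rho)\pi_{0L} + n_{jL}^{(T)}\right)$.

3. Sample $c$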

Let $m_i = \sum_{t=1}^{T-1} I_{t+1} \mathbb{1}(s_t = i)$ and $n_i = \sum_{t=1}^{T-1} \mathbb{1}(s_t = i)$. Fox et al. (2011) show the conditional density of $c$ is given by

$$p(c \mid \cdot) \propto p(c)\, c^{\sum_{n_i > 0} m_i} \prod_{n_i > 0} \frac{\Gamma(c)}{\Gamma(c + n_i)}$$

$$\propto p(c)\, c^{\sum_{n_i > 0} m_i} \prod_{n_i > 0} \frac{\Gamma(c)\Gamma(n_i)}{\Gamma(c + n_i)}$$

$$\propto p(c)\, c^{\sum_{n_i > 0} m_i} \prod_{n_i > 0} \int x_i^{c-1} (1 - x_i)^{n_i - 1}\, dx_i.$$

We simulate an auxiliary variable $x_i$ from the beta distribution $\mathrm{B}(c, n_i)$ for all $i$ such that $n_i > 0$. The conditional density of $c$ is then proportional to $p(c)\, c^{\sum_{n_i > 0} m_i}\, e^{c \sum_{n_i > 0} \log x_i}$. Because we assume that the prior of $c$ is a Gamma distribution, the posterior is also a Gamma distribution $\mathrm{G}\left(d_c - \sum_{n_i > 0} \log x_i,\; c_c + \sum_{n_i > 0} m_i\right)$.

4. Sample $\rho$

Let $m = \sum_{t=1}^{T} I_t (1 - I_t')$ and $n = \sum_{t=1}^{T} I_t I_t'$. The sticky coefficient $\rho$ is drawn from $\mathrm{B}(a_\rho + m, b_\rho + n)$.

5. Sample $\gamma$

Define $u = \sum_{t=1}^{T} I_t I_t' I_t''$. The conditional density of $\gamma$ is proportional to $p(\gamma)\gamma^{u} \int x^{\gamma - 1}(1 - x)^{n - 1}\, dx$. We draw an auxiliary variable $x$ from $\mathrm{B}(\gamma, n)$ and then draw $\gamma$ from $\mathrm{G}(d_\gamma - \log x, c_\gamma + u)$.
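The updates for $\rho$ and $\gamma$ in steps 4 and 5 reduce to a few lines once the auxiliary variables are available. The sketch below is a minimal illustration with two labelled assumptions: $\mathrm{G}(\cdot, \cdot)$ is read as (rate, shape), which is consistent with the posterior of $\chi$ in the basic iHMM, and the $n$ in step 5 is taken to be the $n = \sum_t I_t I_t'$ defined in step 4. The function names and the simulated indicator arrays are hypothetical.

```python
import numpy as np


def sample_rho(I, I1, a_rho, b_rho, rng):
    """Step 4: rho ~ Beta(a_rho + m, b_rho + n)."""
    m = np.sum(I * (1 - I1))   # draws attributed to the sticky component
    n = np.sum(I * I1)         # draws attributed to the common shape pi_0
    return rng.beta(a_rho + m, b_rho + n)


def sample_gamma(gamma, I, I1, I2, d_gamma, c_gamma, rng):
    """Step 5: auxiliary x ~ Beta(gamma, n), then gamma ~ G(d_gamma - log x, c_gamma + u).

    Assumption: G(rate, shape) parameterization; requires n > 0.
    """
    n = np.sum(I * I1)
    u = np.sum(I * I1 * I2)
    x = rng.beta(gamma, n)
    rate = d_gamma - np.log(x)             # log x < 0, so the rate stays positive
    return rng.gamma(shape=c_gamma + u, scale=1.0 / rate)


rng = np.random.default_rng(2)
# Hypothetical auxiliary indicators for T = 200 periods (I' and I'' only matter when I = 1).
I = rng.integers(0, 2, size=200)
I1 = rng.integers(0, 2, size=200)
I2 = rng.integers(0, 2, size=200)
rho_draw = sample_rho(I, I1, a_rho=1.0, b_rho=1.0, rng=rng)
gamma_draw = sample_gamma(1.0, I, I1, I2, d_gamma=1.0, c_gamma=1.0, rng=rng)
```

The update for $c$ in step 3 follows the same pattern, with one auxiliary beta draw $x_i \sim \mathrm{B}(c, n_i)$ per regime with $n_i > 0$.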

B.4 iHMM:endogenous

The MCMC algorithm is the same as in the basic iHMM except for the regime parameter $\phi_i$, the bubble indicator $g_i$ and the sticky coefficient $\rho_k$ (with $k = 1, 2$).

1. Sample $(\phi_i, g_i)$

As in the basic iHMM, we first draw $\phi_i$ from the normal distribution $\mathrm{N}(\phi, \sigma_i^2 H^{-1})$ and calculate $g_i = 1 + \mathbb{1}(\beta_i < 0)$. If the sign of $\beta_i$ is the same as that in the previous sweep of the MCMC, we keep the new value. Otherwise, compute the values $p(Y_{1,T} \mid \phi_i, S)\, p(\pi_i \mid g_i, \pi_0)$ and $p(Y_{1,T} \mid \phi_i', S)\, p(\pi_i \mid g_i', \pi_0)$, where $\phi_i'$ and $g_i'$ are the values from the previous MCMC sweep. Accept the new values of $\phi_i$ and $g_i$ with probability

$$\min\left\{1, \frac{p(Y_{1,T} \mid \phi_i, S)\, p(\pi_i \mid g_i, \pi_0)\, p(\phi_i)\, p(g_i)\, f_N(\phi_i' \mid \phi, \sigma_i^2 H^{-1})}{p(Y_{1,T} \mid \phi_i', S)\, p(\pi_i \mid g_i', \pi_0)\, p(\phi_i')\, p(g_i')\, f_N(\phi_i \mid \phi, \sigma_i^2 H^{-1})}\right\},$$

where $f_N$ represents the normal density.

2. Sample $(I_t, I_t', I_t'')$

The definitions are the same as those for iHMM:hyper. The sampling procedure for these auxiliary variables is similar to that of iHMM:hyper (with minor modifications). Specifically, let $I_1 = I_1' = 0$ and $I_1'' = 1$ and begin sampling from $t = 2$. Suppose $g_i = k$, $s_{t-1} = i$ and $s_t = j$. We draw $(I_t, I_t', I_t'')$, with probabilities proportional to the stated weights, from

$$\begin{cases} (0, 0, 0) & \text{w.p. } n_{ij}^{(t-1)}, \\ (1, 1, 0) & \text{w.p. } c(1 - \rho_k)\frac{m_j^{(t-1)}}{\gamma + m^{(t-1)}}, \\ (1, 1, 1) & \text{w.p. } c(1 - \rho_k)\frac{\gamma/L}{\gamma + m^{(t-1)}} \end{cases}$$

if $i \ne j$ and from

$$\begin{cases} (0, 0, 0) & \text{w.p. } n_{ii}^{(t-1)}, \\ (1, 0, 0) & \text{w.p. } c\rho_k, \\ (1, 1, 0) & \text{w.p. } c(1 - \rho_k)\frac{m_i^{(t-1)}}{\gamma + m^{(t-1)}}, \\ (1, 1, 1) & \text{w.p. } c(1 - \rho_k)\frac{\gamma/L}{\gamma + m^{(t-1)}} \end{cases}$$

otherwise.

3. Sample $\rho_k$

Let $m_k = \sum_{t=2}^{T} I_t (1 - I_t') \mathbb{1}(g_{s_t} = k)$ and $n_k = \sum_{t=2}^{T} I_t I_t' \mathbb{1}(g_{s_t} = k)$. Draw $\rho_k$ from $\mathrm{B}(a_\rho + m_k, b_\rho + n_k)$ for $k = 1$ and $2$.

B.5 iHMM:mixture

The estimation of iHMM:mixture requires constructing a component indicator. The component indicator $z_i$ takes a value of one if $\theta_i$ is drawn from the positive value component and two if it is drawn from the negative value component. Because we use the conjugate prior, we can integrate out the nuisance parameters to obtain a Student's t density function for $p(Y_{1,T} \mid S, z_i = k)$. The conditional odds ratio of $z_i$ is given by

$$\frac{p(z_i = 1 \mid S, Y_{1,T})}{p(z_i = 2 \mid S, Y_{1,T})} = \frac{\omega}{1 - \omega} \cdot \frac{f_t(Y_i^* \mid z_i = 1)}{f_t(Y_i^* \mid z_i = 2)} = \frac{\omega}{1 - \omega} \cdot \frac{|H_1|^{1/2}\, |\bar{H}_1|^{-1/2} \left(\frac{\chi_1}{2}\right)^{\nu_1/2} \left(\frac{\bar{\chi}_1}{2}\right)^{-\bar{\nu}_1/2} \Gamma\left(\frac{\bar{\nu}_1}{2}\right) \Big/ \Gamma\left(\frac{\nu_1}{2}\right)}{|H_2|^{1/2}\, |\bar{H}_2|^{-1/2} \left(\frac{\chi_2}{2}\right)^{\nu_2/2} \left(\frac{\bar{\chi}_2}{2}\right)^{-\bar{\nu}_2/2} \Gamma\left(\frac{\bar{\nu}_2}{2}\right) \Big/ \Gamma\left(\frac{\nu_2}{2}\right)},$$

where $\bar{\chi}_k = \chi_k + Y_i^{*\prime} Y_i^* + \phi_k' H_k \phi_k - \bar{\phi}_k' \bar{H}_k \bar{\phi}_k$, $\bar{\nu}_k = \nu_k + n_i$, $\bar{\phi}_k = \bar{H}_k^{-1}(H_k \phi_k + X_i^{*\prime} Y_i^*)$ and $\bar{H}_k = H_k + X_i^{*\prime} X_i^*$. $Y_i^*$ is the collection of $\Delta y$ in regime $i$, $X_i^*$ is the matrix of corresponding regressors and $n_i$ is the number of observations in that regime.

1. Sample $z_i$

The sampling of $z_i$ is straightforward given the odds ratio described above, denoted by $p_{odd}$. Namely, $z_i$ takes the value of 1 with probability $\frac{p_{odd}}{1 + p_{odd}}$ and 2 with probability $\frac{1}{1 + p_{odd}}$.

2. Sample $\theta_i \mid z_i = k$

The sampling algorithm for $\theta_i \mid z_i = k$ is similar to that of the basic iHMM, as it simply uses the prior parameters in the corresponding component.

The sampling procedures for the other parameters are exactly the same as those in the basic iHMM.

B.6 iHMM:GARCH

Drawing parameters from iHMM:GARCH is very challenging because of the dependence structure in both the mean and volatility functions. For simplicity, Haas et al. (2004) impose an independence assumption. Dufays (2012) considers an infinite hidden Markov switching GARCH model (without mean function) and suggests applying the adaptive Metropolis algorithm with random block sampling techniques (Haario et al., 2001). To estimate iHMM:GARCH, we use a simple random walk Metropolis-Hastings algorithm to draw θi and draw all other parameters in the same manner as in the basic iHMM. Because we calibrate the prior of the GARCH parameters, the MCMC suffers less from the initial value problem caused by slow mixing.
