Infinite-State Markov-switching for Dynamic Volatility Models : Web Appendix Arnaud Dufays1

Centre de Recherche en Economie et Statistique
March 19, 2014

1 Comparison of the two MS-GARCH approximations

Section 3 of the paper (see Estimation by Bayesian inference and model comparison) details two algorithms for inferring the parameters of an MS-GARCH model. The two algorithms differ in the approximation of the MS-GARCH model they rely on. A more accurate approximation leads to a higher acceptance rate as well as a lower autocorrelation between posterior draws. In order to differentiate the two algorithms, we carry out a Monte Carlo study based on simulated data of 1,000 observations and analyze the mixing properties. For each simulation we compute the autocorrelation time [1] as well as the time required for sampling one effective posterior draw [2]. We consider four different MS-GARCH models exhibiting a break in ω, with one switch or multiple switches (CP and MS in Table 1). The data generating processes differ in their persistence (α + β), which is assumed to be equal across regimes. The Monte Carlo study consists of eight hundred different simulated series (one hundred per DGP per algorithm). The main interest of this study lies in the ability to sample the state vector. The MCMC simulations therefore fix the GARCH parameters at their MLE and only draw the state vector given these parameters. Table 1 displays the average difference of the maximum autocorrelation times and the average difference of the effective times for the two algorithms (Kl-MH for the model of Klaassen (2002) minus HMP-MH for the model of Haas, Mittnik, and Paolella (2004)).

[1] The autocorrelation time is computed by batch means (see Geyer (1992)) and is defined as $1 + 2\sum_{i=1}^{\infty} \rho_i$, where $\rho_i$ is the autocorrelation coefficient of order $i$ between the posterior draws of a state variable.
[2] Following the formula: autocorrelation time × elapsed time for N posterior draws, divided by N.

Break in ω: ω1 = 0.1 → ω2 = 0.7

     Persistence:    0.7             0.8             0.9             0.95

90 % CI of the difference between elapsed times for one effective draw
  CP               [-0.23;0.02]    [-0.54;0.02]    [-0.75;0.01]    [-1.10;0.01]
  MS               [-0.85;0.15]    [-1.05;0.01]    [-1.19;0.05]    [-1.35;0.29]

90 % CI of the difference between autocorrelation times
  CP               [-20.91;1.06]   [-49.03;1.65]   [-68.34;0.75]   [-90.24;0.11]
  MS               [-79.05;6.55]   [-96.63;-1.10]  [-111.96;-1.08] [-109.71;10.15]

Table 1: The differences are computed as Kl-MH minus HMP-MH, where Kl-MH denotes 'Klaassen Metropolis-Hastings' and HMP-MH stands for 'Haas, Mittnik and Paolella Metropolis-Hastings'. A negative value provides evidence in favor of the Kl-MH algorithm.

Although almost all confidence intervals include zero, the distributions are skewed to the left whatever the persistence and the number of breaks. The left limit also moves further away from zero as the persistence and/or the number of switches grow, while no systematic pattern emerges from the table for the right limit. These two observations lead us to believe that the Klaassen model provides a better approximation to the MS-GARCH model than the specification of Haas, Mittnik, and Paolella (2004). Further evidence is given in the simulation exercise of the paper (see subsection 4.3, Comparison with other algorithms). The result is not surprising, since the former model keeps track of the preceding variances when a switch in the state occurs. Despite these comments, the HMP model could still be preferred in the IHMM context: the beam sampler combined with the Klaassen model somewhat complicates the sampling of the state vector, a computational difficulty that does not arise with the HMP model.
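As an illustration of the two mixing diagnostics used above, the following sketch computes an autocorrelation time and the elapsed time per effective draw from a chain of posterior draws. It is a minimal stand-alone estimator that truncates the autocorrelation sum at the first non-positive coefficient, rather than the batch-means implementation of Geyer (1992) used in the study; the function names are ours.

```python
import numpy as np

def autocorrelation_time(draws, max_lag=None):
    """Estimate the autocorrelation time 1 + 2*sum_i rho_i of an MCMC chain.

    The infinite sum is truncated at the first non-positive autocorrelation,
    a common practical rule (illustrative stand-in for batch means)."""
    x = np.asarray(draws, dtype=float)
    n = len(x)
    x = x - x.mean()
    if max_lag is None:
        max_lag = n // 2
    acov = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(max_lag)])
    rho = acov / acov[0]
    tau = 1.0
    for k in range(1, max_lag):
        if rho[k] <= 0.0:
            break
        tau += 2.0 * rho[k]
    return tau

def effective_time_per_draw(draws, elapsed_seconds):
    """Time needed for one effective posterior draw (footnote 2):
    autocorrelation time * elapsed time for N draws, divided by N."""
    n = len(draws)
    return autocorrelation_time(draws) * elapsed_seconds / n

# An i.i.d. chain should have an autocorrelation time close to 1,
# while a persistent AR(1) chain mixes far more slowly.
rng = np.random.default_rng(0)
iid = rng.standard_normal(5000)
print(autocorrelation_time(iid))
```

A chain of state draws from either MH algorithm can be passed to these functions directly; the Kl-MH versus HMP-MH comparison in Table 1 is the difference of such statistics across simulated series.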

2 Estimation of the spline-GARCH parameters by SMC sampler

The SMC sampler discretely approximates an artificial sequence of distributions $\{\pi_n\}_{n=1}^{p}$ by sequential importance sampling. Let $x = \{\alpha, \beta, \omega_0, \ldots, \omega_{k+1}\}$ denote the set containing all the spline-GARCH parameters. The artificial sequence of distributions is obtained by introducing an increasing function $\phi : \mathbb{Z} \rightarrow [0,1]$ with $\phi(1) = 0$ and $\phi(p) = 1$ such that

$$\pi_i(x|Y_T) \propto f(Y_T|x)^{\phi(i)} f(x)$$


where $f(x)$ denotes the prior density evaluated at $x$. Note that when $n = p$, the distribution $\pi_p$ coincides with the posterior distribution of interest. On the contrary, when $n = 1$, $\pi_1$ is equal to the prior distribution (if the latter is proper). In the paper of Del Moral, Doucet, and Jasra (2006), an SMC algorithm is provided for sequentially approximating each of the distributions in the artificial sequence. As the function $\phi$ is user-defined, one can choose a function that increases smoothly, such that, at a specific iteration of the SMC, the approximation of the previous targeted distribution remains close to the current one. The algorithm operates as follows. First, sample $N$ draws $\{x_1^i\}_{i=1}^{N}$ from the prior distribution and attach uniform weights $\{W_1^i = \frac{1}{N}\}_{i=1}^{N}$ to them. Then, for $n = 2, \ldots, p$, apply steps 1-3:

1. Correction step: $\forall i \in [1, N]$, re-weight each particle with respect to the $n$-th posterior distribution,
$$\tilde{w}_n^i = f(Y_T|x_{n-1}^i)^{\phi(n)-\phi(n-1)},$$
then normalize the particle weights:
$$W_n^i = \frac{W_{n-1}^i \tilde{w}_n^i}{\sum_{j=1}^{N} W_{n-1}^j \tilde{w}_n^j}.$$

2. Re-sampling step: compute the Effective Sample Size (ESS) as
$$ESS = \Big[\sum_{i=1}^{N} (W_n^i)^2\Big]^{-1}.$$
If $ESS < \frac{3N}{4}$, re-sample the particles and reset the weights uniformly.

3. Mutation step: run $J$ steps of an MCMC kernel with invariant distribution $\pi_n(x_n|Y_T)$ for each particle in the system.

At the end of the procedure, an estimate of the marginal likelihood is given by
$$\prod_{n=2}^{p} \sum_{i=1}^{N} W_{n-1}^i \tilde{w}_n^i.$$

The SMC sampler contains many user-defined parameters. As the MCMC kernel, we use a random block strategy as in Chib and Ramamurthy (2010), combined with a Metropolis update. The covariance matrix of the (normal) proposal distribution is directly derived from the particles. The number of MCMC moves $J$ is set to 10.

Finally, the function φ is adapted on the fly as proposed by Jasra, Stephens, Doucet, and Tsagaris (2011).
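To make steps 1-3 concrete, here is a minimal sketch of the sampler on a toy problem: tempering from a $N(0, 10^2)$ prior on a scalar mean $\mu$ toward its posterior given Gaussian data. The linear tempering schedule, the plain random-walk Metropolis kernel, and all variable names are illustrative simplifications of the setup described above, which instead uses a random-block kernel and adapts $\phi$ on the fly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setting (illustrative stand-in for the spline-GARCH likelihood):
# y_t ~ N(mu, 1) with a N(0, 10^2) prior on mu.
y = rng.normal(2.0, 1.0, size=200)

def log_lik(mu):
    # log f(Y_T | mu) up to an additive constant, vectorised over particles
    return -0.5 * np.sum((y[None, :] - mu[:, None]) ** 2, axis=1)

N, p = 1000, 50
phi = np.linspace(0.0, 1.0, p)          # fixed tempering schedule (adapted on the fly in the paper)
x = rng.normal(0.0, 10.0, size=N)       # draws from the prior, phi(1) = 0
W = np.full(N, 1.0 / N)
log_Z = 0.0                             # log marginal-likelihood estimate

for n in range(1, p):
    # 1. Correction: incremental weights f(Y_T|x)^(phi_n - phi_{n-1})
    log_w = (phi[n] - phi[n - 1]) * log_lik(x)
    log_Z += np.log(np.sum(W * np.exp(log_w - log_w.max()))) + log_w.max()
    W = W * np.exp(log_w - log_w.max())
    W /= W.sum()
    # 2. Re-sampling when ESS drops below 3N/4
    if 1.0 / np.sum(W ** 2) < 0.75 * N:
        idx = rng.choice(N, size=N, p=W)
        x, W = x[idx], np.full(N, 1.0 / N)
    # 3. Mutation: J = 10 random-walk Metropolis steps targeting pi_n,
    #    with the proposal scale derived from the particles
    scale = max(np.std(x), 1e-3)
    for _ in range(10):
        prop = x + rng.normal(0.0, scale, size=N)
        log_alpha = (phi[n] * (log_lik(prop) - log_lik(x))
                     + (x ** 2 - prop ** 2) / (2 * 10.0 ** 2))  # prior ratio
        accept = np.log(rng.uniform(size=N)) < log_alpha
        x = np.where(accept, prop, x)

post_mean = np.sum(W * x)
print(post_mean)   # close to the conjugate posterior mean of mu
```

The weighted particles at the final step approximate $\pi_p$, i.e. the posterior, and `log_Z` accumulates the marginal-likelihood product given at the end of the algorithm.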

3 The sticky infinite hidden Markov model

The sticky infinite hidden Markov model is based on Dirichlet processes and hierarchical Dirichlet processes. Section 2 of the paper (see Model definition) defines the Dirichlet process and its stick-breaking representation. We go further here by detailing some properties of the Dirichlet process and by reviewing the concepts of the hierarchical Dirichlet process and the sticky infinite hidden Markov model.

3.1 More on the Dirichlet process

We are interested in the posterior distribution of $G$ given $n$ i.i.d. draws $\{\theta_1, \ldots, \theta_n\}$ from $G$ itself (since $G$ is a distribution over $\Theta$). Using the fundamental relation of the Dirichlet process (see equation (1) in the paper), the Bayes theorem and the conjugacy of the multinomial distribution with the Dirichlet distribution, it can be shown that

$$G|\theta_1, \ldots, \theta_n \sim DP\Big(\eta + n,\; \frac{\eta}{\eta+n} G_0 + \frac{n}{\eta+n} \frac{\sum_{i=1}^{n} \delta_{\theta_i}}{n}\Big) \qquad (1)$$

where $\delta_{\theta}$ denotes the probability measure concentrated at $\theta$. Considering the partition $\{\theta_1, \ldots, \theta_n, \Theta \setminus \{\theta_1, \ldots, \theta_n\}\}$ and using equation (1) in the paper together with (1), we directly have the relation:

$$G(\theta_1), \ldots, G(\theta_n), G(\Theta \setminus \{\theta_1, \ldots, \theta_n\})|\theta_1, \ldots, \theta_n \sim Dir(1, \ldots, 1, \eta) \qquad (2)$$

The above equation highlights the discrete nature of the distribution $G$, since the probability of observing an already drawn $\theta$ is greater than zero. Moreover, it also emphasizes that the expected probability of drawing a new element $\theta \neq \theta_i$, $\forall i \in [1, n]$, is equal to $\frac{\eta}{\eta+n}$.

To provide more intuition on the Dirichlet process, the predictive distribution of $\theta_{n+1}|\theta_1, \ldots, \theta_n$ can be derived as follows:


$$f(\theta_{n+1}|\theta_1, \ldots, \theta_n) = E(G(\theta_{n+1})|\theta_1, \ldots, \theta_n)$$
$$\theta_{n+1}|\theta_1, \ldots, \theta_n \sim \frac{1}{\eta+n}\Big(\eta G_0 + \sum_{i=1}^{n} \delta_{\theta_i}\Big) \qquad \text{from eq. (2)}$$

The Pólya urn metaphor (see Blackwell and MacQueen (1973)) helps in interpreting the last equality. Consider that each possible element $\theta \in \Theta$ is associated with a ball of a specific color and that all these balls are gathered in an urn. The predictive scheme iterates as follows. At the beginning, we randomly pick a ball from the urn (i.e. a draw from $G_0$) and identify its color. The ball is afterwards dropped back into the urn. Before proceeding to the next step, we put a new ball of the corresponding color in an initially empty second urn. At iteration $n$, we randomly choose a color from the initial urn (i.e. a draw from $G_0$) with probability $\frac{\eta}{\eta+n-1}$, or from the second urn otherwise. Since the second urn only contains balls with already observed colors, the probability of getting a new color (i.e. a new element of $\Theta$) keeps decreasing as the scheme evolves.

From equation (2), we can also derive the expected number of elements that have been sampled from $G_0$ (denoted, hereafter, by $m$) given the number of draws $n$. At iteration $n+1$, the probability of sampling $\theta_{n+1}$ from $G_0$ is equal to $\frac{\eta}{\eta+n}$, which gives

$$E(m|n) = \eta \sum_{i=1}^{n} \frac{1}{\eta+i-1}, \qquad \lim_{n\to\infty} E(m|n) = \lim_{n\to\infty} \eta(\log n + C),$$

where $C$ is the Euler-Mascheroni constant, approximately equal to 0.577. The expected number of distinct elements (which will denote different regimes in the GARCH model) is thus by far smaller than the number of observations. Notice that the concentration parameter $\eta$ has a direct impact on the number of distinct elements.
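The urn scheme and the growth rate of $E(m|n)$ can be checked numerically. The sketch below (with $\eta = 1$ for simplicity; function and variable names are ours) simulates the predictive scheme and compares the average number of distinct draws with both the exact sum and the $\eta(\log n + C)$ approximation.

```python
import math
import random

def polya_urn_distinct(n, eta, seed=0):
    """Simulate the Polya-urn predictive scheme and return the number of
    distinct values among n draws: a new draw from G0 occurs with
    probability eta/(eta + i) when i draws have been made, otherwise a
    previously drawn value is picked uniformly at random."""
    rng = random.Random(seed)
    draws = []
    distinct = 0
    for i in range(n):
        if rng.random() < eta / (eta + i):
            distinct += 1            # a new "color", i.e. a fresh draw from G0
            draws.append(distinct)   # label new atoms 1, 2, 3, ...
        else:
            draws.append(rng.choice(draws))
    return distinct

eta, n, reps = 1.0, 10_000, 100
avg = sum(polya_urn_distinct(n, eta, seed=s) for s in range(reps)) / reps
exact = eta * sum(1.0 / (eta + i - 1) for i in range(1, n + 1))
approx = eta * (math.log(n) + 0.5772156649)   # eta * (log n + C)
print(avg, round(exact, 2), round(approx, 2))
```

With 10,000 draws, only about ten distinct elements appear on average, illustrating why the number of regimes grows so slowly relative to the sample size.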

3.2 The hierarchical Dirichlet process

The infinite hidden Markov model assumes an infinite number of states. It models a doubly stochastic Markov chain in which a sequence of multinomial states $\{s_1, \ldots, s_T\}$ is linked via a state transition matrix and, given this unobserved dynamic and a set of parameters, each element $y_t$ in a sequence of observations $\{y_1, \ldots, y_T\}$ is drawn from a parametric distribution. The structure should therefore ensure that, whatever the followed path, reaching a specific state always refers to the same model parameter: for instance, state one should always be related to the same parameter $\Theta_1$. The hierarchical Dirichlet process has been designed for this purpose. The hyper-parameters of the hierarchical Dirichlet process (HDP, Teh, Jordan, Beal, and Blei (2006)) consist of the base distribution $G_0$ and the concentration parameters $\eta \in \mathbb{R}^+$ and $\lambda \in \mathbb{R}^+$. The HDP is defined as follows:

$$G|\eta, G_0 \sim DP(\eta, G_0) \quad \text{and} \quad G_j|\lambda, G \sim DP(\lambda, G) \quad \forall j = 1, \ldots, n$$

So $G_j|G \perp G_i$ if $i \neq j$. As $G$ is a random probability measure over $\Theta$ (the support of the base distribution $G_0$), the hierarchical process defines a set of random probability measures $G_j$, one for each group, over $\Theta$. The stick-breaking representation of an HDP can be formulated as follows:

$$G = \sum_{k=1}^{\infty} \pi_k \delta_{\Theta_k} \quad \text{and} \quad G_j = \sum_{k=1}^{\infty} p_{jk} \delta_{\Theta_k} \quad \forall j = 1, \ldots, n,$$

where $\Theta_k \sim G_0$ and $\pi = \{\pi_k\}_{k=1}^{\infty} \sim \text{Stick}(\eta)$ are mutually independent, $\delta_{\Theta_k}$ is the probability measure concentrated at $\Theta_k$, and $\{p_{jk}\}_{k=1}^{\infty}|\lambda, \pi \sim DP(\lambda, \pi)$ (as shown in Teh, Jordan, Beal, and Blei (2006)). Notice that, by definition of the DP, each $G_j$ ($\forall j \in \{1, \ldots, n\}$) has the same support, which is the support of $G$. This property of the HDP is essential to develop an infinite hidden Markov model.

The hidden Markov-switching model is driven by two stochastic processes. On the one hand, a Markov chain determines a discrete state vector $\{s_1, \ldots, s_T\}$; on the other hand, the observations follow a specific distribution conditional on the state vector and the parameters of each regime ($y_t|s_t, \{\Theta_k\}_{k=1}^{\infty} \sim F(\Theta_{s_t})$). The hierarchical Dirichlet process can build this kind of structure with an infinite number of states (and of Dirichlet processes); it is summarized in Table 2.


1. Dirichlet process: $G = \sum_{k=1}^{\infty} \pi_k \delta_{\Theta_k} \sim DP(\eta, G_0)$
   $\pi \sim \text{Stick}(\eta)$  (stick-breaking representation of the Dirichlet process)
   $\Theta_k \sim G_0$  ($\Theta_k$: parameters of the model related to state $k$)

2. Hierarchical Dirichlet processes: $G_j|G = \sum_{k=1}^{\infty} p_{jk} \delta_{\Theta_k} \sim DP(\lambda, G)$
   $p_j = \{p_{jk}\}_{k=1}^{\infty} \sim DP(\lambda, \pi)$  (each row of the transition matrix is driven by a DP)

3. Markov-switching model:
   $s_t|s_{t-1}, \{p_j\}_{j=1}^{\infty} \sim p_{s_{t-1}}$  (first-order Markovian with transition matrix $\{p_j\}_{j=1}^{\infty}$)
   $y_t|s_t, \{\Theta_k\}_{k=1}^{\infty} \sim F(\Theta_{s_t})$  (each state shares the same support, that of $G_0$)

Table 2: Infinite hidden Markov model (IHMM)
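The generative structure of Table 2 can be sketched with a finite truncation of $K$ atoms: $\pi$ comes from stick-breaking, and over a finite partition each row $p_j \sim DP(\lambda, \pi)$ reduces to a $\text{Dirichlet}(\lambda\pi)$ draw. The function names and the truncation level are illustrative choices, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def stick_breaking(eta, K):
    """Truncated Stick(eta) weights: pi_k = v_k * prod_{l<k} (1 - v_l),
    with v_k ~ Beta(1, eta); the truncation remainder is renormalised."""
    v = rng.beta(1.0, eta, size=K)
    pi = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return pi / pi.sum()

def ihmm_rows(eta, lam, K, J):
    """J rows of the IHMM transition matrix: p_j ~ DP(lam, pi), which over
    a finite partition of K atoms is a Dirichlet(lam * pi) draw."""
    pi = stick_breaking(eta, K)
    P = rng.dirichlet(lam * pi, size=J)
    return pi, P

pi, P = ihmm_rows(eta=3.0, lam=5.0, K=50, J=4)
# Every row puts mass on the SAME atoms Theta_k, with E[p_jk | lam, pi] = pi_k:
# this is exactly the support-sharing property the HDP is built for.
print(np.round(pi[:5], 3))
print(np.round(P[:, :5], 3))
```

A state path $\{s_1, \ldots, s_T\}$ can then be simulated by drawing $s_t$ from the row `P[s_{t-1}]`, and $y_t$ from $F(\Theta_{s_t})$ for any chosen emission family $F$.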

3.3 The sticky parameter

Persistence of regimes is a well-known stylized fact of time series. However, the IHMM transition matrix does not exhibit any persistence (i.e. $E[p_{jk}|\lambda, \pi] = \pi_k$ $\forall j$, see Table 2). The IHMM transition actually does not distinguish between a self-transition and a transition to another state, an unrealistic feature for time series. Fox, Sudderth, Jordan, and Willsky (2011) have developed an IHMM framework which avoids assigning high posterior probability to paths with rapid switching. They call it 'the sticky HDP-HMM' or 'the sticky IHMM'. They specify a new parameter $\kappa$ for the self-transition bias and set a separate prior on this parameter. Their specification is as follows:

$$\pi|\eta \sim \text{Stick}(\eta), \qquad p_j|\lambda, \pi, \kappa \sim DP\Big(\lambda + \kappa,\; \frac{\lambda\pi + \kappa\delta_j}{\lambda + \kappa}\Big) \quad \forall j = 1, \ldots, n$$

An amount $\kappa > 0$ is added to the $j$-th component of the (infinite) vector $\lambda\pi$. The new parameter implies a higher probability of staying in the same state in the next period than under the original model (i.e. $E[p_{kk}|\lambda, \kappa, \pi] = \frac{\lambda\pi_k + \kappa}{\lambda + \kappa}$). Note that if $\kappa = 0$, we recover the former specification (i.e. the original IHMM of Table 2).
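A small numerical check of the self-transition bias (a sketch with a finite truncation; in this illustration $\pi$ is replaced by an arbitrary Dirichlet draw, and all names are ours): adding $\kappa$ to the $j$-th component of $\lambda\pi$ pushes the expected diagonal of the transition matrix toward one, in line with $E[p_{kk}|\lambda, \kappa, \pi] = (\lambda\pi_k + \kappa)/(\lambda + \kappa)$.

```python
import numpy as np

rng = np.random.default_rng(4)

def sticky_rows(pi, lam, kappa):
    """One transition row per state j: under a finite truncation,
    p_j ~ DP(lam + kappa, (lam*pi + kappa*delta_j)/(lam + kappa))
    reduces to a Dirichlet(lam*pi + kappa*e_j) draw."""
    K = len(pi)
    rows = np.empty((K, K))
    for j in range(K):
        alpha = lam * pi.copy()
        alpha[j] += kappa          # self-transition bias on component j
        rows[j] = rng.dirichlet(alpha)
    return rows

K = 20
pi = rng.dirichlet(np.ones(K))     # stand-in base weights (Stick(eta) in the paper)
diag_means = {}
for kappa in (0.0, 5.0, 50.0):
    P = sticky_rows(pi, lam=5.0, kappa=kappa)
    diag_means[kappa] = float(np.diag(P).mean())
    print(kappa, round(diag_means[kappa], 3))
# kappa = 0 recovers the plain IHMM rows; larger kappa makes regimes sticky.
```

The printed average self-transition probabilities increase with $\kappa$, mirroring how the sticky prior favors persistent regimes without changing the shared support of the rows.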

References

Blackwell, D., and J. MacQueen (1973): "Ferguson distributions via Pólya urn schemes," Annals of Statistics, 1, 353-355.

Chib, S., and S. Ramamurthy (2010): "Tailored randomized block MCMC methods with application to DSGE models," Journal of Econometrics, 155(1), 19-38.

Del Moral, P., A. Doucet, and A. Jasra (2006): "Sequential Monte Carlo samplers," Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68, 411-436.

Fox, E., E. Sudderth, M. Jordan, and A. Willsky (2011): "A sticky HDP-HMM with application to speaker diarization," Annals of Applied Statistics, 5(2A), 1020-1056.

Geyer, C. J. (1992): "Practical Markov Chain Monte Carlo," Statistical Science, 7(4), 473-511.

Haas, M., S. Mittnik, and M. Paolella (2004): "A New Approach to Markov-Switching GARCH Models," Journal of Financial Econometrics, 2, 493-530.

Jasra, A., D. A. Stephens, A. Doucet, and T. Tsagaris (2011): "Inference for Lévy-Driven Stochastic Volatility Models via Adaptive Sequential Monte Carlo," Scandinavian Journal of Statistics, 38, 1-22.

Klaassen, F. (2002): "Improving GARCH volatility forecasts with regime-switching GARCH," Empirical Economics, 27(2), 363-394.

Teh, Y., M. Jordan, M. Beal, and D. M. Blei (2006): "Hierarchical Dirichlet Processes," Journal of the American Statistical Association, 101, 1566-1581.

