Limits of performance for the model reduction problem of Hidden Markov Models

Georgios Kotsalis and Jeff S. Shamma

Abstract— We introduce the system theoretic notions of a Hankel operator and a Hankel norm for hidden Markov models. We show how the related Hankel singular values provide lower bounds on the norm of the difference between a hidden Markov model of order n and any lower order approximant of order n̂ < n.

I. INTRODUCTION

Hidden Markov Models (HMMs) are one of the most basic and widespread modeling tools for discrete-time stochastic processes taking values on a finite alphabet. A comprehensive review paper is [3]. Applications of HMMs are met across the spectrum of engineering and science, including bio-informatics, econometrics, speech recognition and telecommunications; see for instance [5], [12], [9], [11], [8]. Very often the cardinality of the state space of the underlying Markov chain renders the use of a given HMM for statistical inference or decision making purposes infeasible, motivating the investigation of algorithms that compress the state space without incurring much loss of information. In [15] it was suggested that the concept of approximate lumpability can be used in the context of model reduction of HMMs. Further work on aggregation based model reduction of HMMs can be found in [13], [2]. In contrast to aggregation based methods, in [6] the authors develop a balanced truncation based model reduction algorithm for HMMs that is characterized by an a priori computable upper bound on the approximation error and does not suffer from certain limitations of aggregation based model reduction, as explained in [7]. The system theoretic view of HMMs established in [6] is continued in the current work, where the concepts of a Hankel operator and a Hankel norm of HMMs are introduced. The related Hankel singular values provide lower bounds on the norm of the difference between an HMM of order n and any lower order approximant of order n̂ < n. While the upper and lower bounds on the approximation error are expressed with respect to different norms, together they provide the basis for rigorous model reduction of HMMs with a priori quantifiable certificates of fidelity for the low dimensional approximant.

Georgios Kotsalis is with the School of Electrical and Computer Engineering, Georgia Institute of Technology, [email protected]. Jeff S. Shamma is with the School of Electrical and Computer Engineering, Georgia Institute of Technology, [email protected], and with King Abdullah University of Science and Technology (KAUST), [email protected]. Research was supported by AFOSR/MURI Projects FA9550-09-1-0538 and FA9550-10-1-0573.

The paper is organized as follows. The next section contains preliminary notions, including the statistical description of a HMM and the class of linear systems evolving on homogeneous trees, which will provide the basis for the construction of the input-output Hankel operator of HMMs. A stability concept for the latter class of systems is developed in Section III. Section IV contains the derivation of the lower bound with respect to the Hankel norm. We conclude with an example and a summary with considerations for future work.

II. PRELIMINARIES

A. Notation

The set of integers is denoted by Z, the set of positive integers by Z_+, and the set of real numbers by R. For n ∈ Z_+, R^n denotes the Euclidean n-space. The transpose of a column vector x ∈ R^n is x^T. For x ∈ R^n, |x|² = x^T x denotes the square of the Euclidean norm. For A ∈ R^{n×n}, ‖A‖_2 = sup_{|x|=1} |Ax|. The identity matrix in R^{n×n} is written as I_n. Let V, U be Banach spaces. The Banach space of all bounded linear operators from V into U is denoted by B(V, U) and the associated induced norm is denoted by ‖·‖. When V = U we write B(V) instead of B(V, V). For a Hilbert space V the inner product is denoted by ⟨·, ·⟩. The adjoint of the operator L ∈ B(V) is denoted by L*. Let S_n = {A ∈ R^{n×n} | A = A^T}. For P ∈ S_n, P ≻ 0 (P ≽ 0) indicates that it is a positive (semi-)definite matrix. Let S_n^+ = {A ∈ S_n | A ≽ 0} and S_n^{++} = {A ∈ S_n | A ≻ 0}. For P ∈ S_n^+ the notation |x|²_P stands for x^T P x, and P^{1/2} is the positive semi-definite square root of P.

B. Alphabet, strings, language

Let k ∈ Z_+ and consider the strictly ordered, finite set A = {a_1, …, a_k}. The set A will be referred to as the alphabet and its elements as letters. We denote by A* the set of all finite sequences of elements of A, including the empty sequence, denoted by ∅. The set A* is called the language. The finite sequences of letters are called words or strings. Let ∧ denote the concatenation operation, i.e. ∧ : A* × A* → A*. Words are read from right to left, i.e. in the word w = w_r ∧ … ∧ w_1, where w_1, …, w_r ∈ A, the letter w_1 precedes w_2, and so on. The length of the word w is denoted by |w|. The set A* equipped with the aforementioned concatenation operation forms a semi-group. The empty word ∅ is the identity element. The set of words of length r is A^(r) = {w ∈ A* | |w| = r}. By convention A^(0) = {∅}. In this notation the language can be expressed as

A* = ∪_{r=0}^∞ A^(r).
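The string conventions above can be made concrete in a short sketch (assuming Python; the helper names are hypothetical, not from the paper): words are tuples with the rightmost letter occurring first, ∧ is tuple concatenation, and A^(r) collects the words of length r.

```python
from itertools import product

A = ('a1', 'a2')                      # alphabet with k = 2 letters

def concat(w, v):
    """Concatenation w ^ v; letters of v precede those of w in time.
    The empty word is the empty tuple ()."""
    return w + v

def words_of_length(r):
    """A^(r): all words of length r; A^(0) = {()} by convention."""
    return [tuple(w) for w in product(A, repeat=r)]

# |A^(r)| = k^r, and () is the identity element of the semi-group.
assert len(words_of_length(3)) == 2 ** 3
w = ('a2', 'a1')
assert concat(w, ()) == w and concat((), w) == w
```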

One can think of the elements of A* as nodes of an infinite acyclic graph, i.e. a homogeneous tree, rooted at ∅. Each node of the graph is labelled by a word in the language. Each level of the graph consists of words of the same length. Let A*_→ correspond to a first, right to left, lexical ordering of A* and A*_← correspond to a last, left to right, lexical ordering of A* − {∅}. The ordered sets A*_→, A*_← are used to distinguish between the evolution of the system starting at the present towards the future and starting at the distant past towards the present. To make this distinction even more pronounced we will append the symbol ∅ to all words in A*_→ except ∅ as a suffix and to all words in A*_← as a prefix. For instance, when A = {0, 1} one has A*_→ = {∅, 0∅, 1∅, 00∅, 10∅, 01∅, 11∅, …} and A*_← = {∅0, ∅1, ∅00, ∅01, ∅10, ∅11, …}.

C. Statistical description of a HMM

Hidden Markov Models can be defined in many equivalent ways. The basic definitions and notation introduced in the context of realization theory of HMMs will be used; one can find them, in slightly varying language, in [10], [1], [14]. Let {Y(t)} be a discrete-time, stationary stochastic process over some fixed probability space {Ω, M, P}, taking values on some alphabet A = {a_1, …, a_k}. The strict future of the process after time t is denoted by Y_t^+ = {…, Y(t+2), Y(t+1)} and Y_t^- = {Y(t), Y(t−1), …} denotes its past and present. For v = v_k … v_1 ∈ A* the notation {Y_t^+ ≡ v} stands for the event {ω ∈ Ω | Y(t+k) = v_k, …, Y(t+1) = v_1}; by convention {Y_t^+ ≡ ∅} = Ω.

Definition 2.1: The probability function of the process {Y(t)} is a map p : A* → R_+ where p[v] = Pr[Y_t^+ ≡ v], ∀v ∈ A*, ∀t ∈ Z.

Note that since the process is stationary, the value of p[v] in the above definition does not depend on t. It can be readily verified that the probability function satisfies the properties:

p[∅] = 1,
p[v] ∈ [0, 1], ∀v ∈ A*,
p[v] = Σ_{u∈A^(r)} p[vu], ∀v ∈ A*, r ∈ Z_+.

Definition 2.2: Let {Y(t)}, {Ỹ(t)} be discrete-time, stationary stochastic processes over the same alphabet A. The two stochastic processes are equivalent if

Pr[Y_t^+ ≡ v] = Pr[Ỹ_t^+ ≡ v], ∀t ∈ Z, ∀v ∈ A*.    (1)

According to the definition above, the two stochastic processes must only coincide in their probability laws in order to be equivalent. They do not have to be defined on the same underlying probability space {Ω, M, P}. In the context of this work, when referring to a stationary stochastic process over the alphabet A, one is thinking of an equivalence class of processes in the sense of (1). No explicit distinction between the members of the equivalence class is made; the concept of strong realization is not used, it is only the statistical description that matters.

Definition 2.3: A discrete-time, stationary process {Y(t)} over the alphabet A has a realization as a stationary HMM of size n ∈ Z_+ if there exists a pair of discrete-time, stationary stochastic processes {X(t)}, {Ỹ(t)} taking values on the finite sets X = {1, …, n} and A respectively, such that {Y(t)} and {Ỹ(t)} are equivalent, the joint process {X(t), Ỹ(t)} is a Markov process and ∀σ ∈ X*, ∀v ∈ A*, the following "splitting property" holds

Pr[X_t^+ ≡ σ, Ỹ_t^+ ≡ v | X_t^-, Ỹ_t^-] = Pr[X_t^+ ≡ σ, Ỹ_t^+ ≡ v | X(t)].

The above definition insures that {X(t)} is by itself a Markov chain of order n, meaning Pr[X_t^+ ≡ σ | X_t^-] = Pr[X_t^+ ≡ σ | X(t)]. It also insures that {Ỹ(t)} is a probabilistic function of the Markov chain {X(t−1)} in the sense that Pr[Ỹ_t^+ ≡ v | X_t^-, Ỹ_t^-] = Pr[Ỹ_t^+ ≡ v | X(t)]. Consider the map M : A → R_+^{n×n} where

M[v]_{ij} = Pr[X(t+1) = i, Ỹ(t+1) = v | X(t) = j], i, j ∈ X, v ∈ A, t ∈ Z.

The state transition matrix of the underlying Markov process {X(t)} is given by

Π = Σ_{v∈A} M[v].

Let π ∈ R_+^n be such that Π π = π, 1_n^T π = 1. The vector π corresponds to an invariant distribution of {X(t)}, which is unique if the Markov process has a single ergodic class. Since the processes {Y(t)} and {Ỹ(t)} are equivalent, one has p[v] = Pr[Y_t^+ ≡ v] = Pr[Ỹ_t^+ ≡ v], ∀t ∈ Z, ∀v ∈ A*.

Lemma 2.1: Let r ∈ Z_+, v = v_r v_{r−1} … v_1 ∈ A^(r). The probability of that particular string can be computed recursively according to

p[v] = 1_n^T M[v_r] M[v_{r−1}] … M[v_1] π.

Proof: See for instance [1], [14].

The preceding lemma shows that if a given stationary process {Y(t)} over the alphabet A has a realization as a stationary HMM of size n, then its probability function is completely encoded by the ordered triple H = (M, π, 1_n^T). Accordingly, in the following discussion, referring to a HMM of size n will be in terms of the ordered triple H = (M, π, 1_n^T) that contains its statistical parameters. The set of all HMMs of size n over the alphabet A is denoted by H_{n,A}.
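Lemma 2.1 yields a direct recursive computation of p[v]. A minimal sketch, assuming NumPy; the matrices M[a] below are hypothetical, chosen only so that M[0] + M[1] is a (column-)stochastic matrix Π, and the sketch also checks the consistency property p[v] = Σ_u p[vu].

```python
import numpy as np

# Hypothetical 2-state HMM over the alphabet A = {0, 1}.
M = {0: np.array([[0.4, 0.1],
                  [0.2, 0.1]]),
     1: np.array([[0.1, 0.3],
                  [0.3, 0.5]])}
Pi = M[0] + M[1]                       # column-stochastic transition matrix

# Invariant distribution: Pi pi = pi, 1^T pi = 1.
w, V = np.linalg.eig(Pi)
pi = np.real(V[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

def p(word, M, pi):
    """p[v] = 1_n^T M[v_r] ... M[v_1] pi; word = (v_r, ..., v_1),
    read right to left, so v_1 is applied first."""
    x = pi
    for letter in reversed(word):
        x = M[letter] @ x
    return x.sum()                     # left-multiply by 1_n^T

# Consistency: p[v] = sum over one-letter extensions u of p[v ^ u].
v = (1, 0)
assert abs(p(v, M, pi) - sum(p(v + (u,), M, pi) for u in (0, 1))) < 1e-12
```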

D. Linear systems on homogeneous trees

We will consider linear systems with evolution on A*. This class of systems was introduced in [4], where various system theoretic properties were developed, including the notion of a Hankel operator. For the purposes of this work it suffices to focus on a particular subclass of the structured noncommutative multidimensional systems of [4], namely the noncommutative Fornasini-Marchesini systems. We will simply refer to them as linear systems with evolution on A*. In the following, fix n, m, p ∈ Z_+ and consider the signal spaces X = {x | x : A* → R^n}, U = {u | u : A* → R^m}, Y = {y | y : A* → R^p}. A linear system Σ of order n evolving on A* is defined by

x(i ∧ w) = A(i) x(w) + B(i) u(w),
y(w) = C x(w),    (2)

for w ∈ A*, i ∈ A. The state, input and output signals are x ∈ X, u ∈ U, y ∈ Y respectively. The parameters of the state space representation have appropriate dimensions, A : A → R^{n×n}, B : A → R^{n×m}, C ∈ R^{p×n}. If an initial condition x(∅) = x_∅ ∈ R^n is specified, then the output y ∈ Y is uniquely determined by the input u ∈ U. The state space system Σ in (2) will sometimes be denoted solely by its parameters, Σ = (A, B, C). We will write Σ_→ for the state space description (2) when w ∈ A*_→. When w ∈ A*_←, the associated state space description, denoted by Σ_←, is given by:

x(w) = Σ_{i∈A} [A(i) x(w ∧ i) + B(i) u(w ∧ i)],
y(w) = C x(w),    (3)

for w ∈ A*_←, i ∈ A. The set of all linear systems on A* of order n is denoted by S_{n,A}. One can establish an immersion of H_A = ∪_{n=2}^∞ H_{n,A} into S_A = ∪_{n=2}^∞ S_{n,A} in the obvious way. Let f : H_A → S_A, where f(H) = Σ = (M, π, 1_n^T). This rather innocuous step provides leverage, since various system theoretic properties associated with linear systems on homogeneous trees can be transferred to HMMs. Furthermore, let D : S_A × S_A → R_+ be a metric on the set of linear systems evolving on A*; then one obtains automatically a metric on HMMs by restricting the domain onto f(H_A) × f(H_A). On the grounds of this observation, the subsequent discussion in regards to linear systems evolving on A* pertains to HMMs over the alphabet A as well.

III. STABILITY AND SIGNAL SPACES

For A ∈ R^{n×n}, the spectrum of A is defined as

σ(A) = {λ ∈ C | N(λ I_n − A) ≠ {0}},

and the spectral radius of A is

r_σ[A] = max{|λ| ∈ R_+ | λ ∈ σ(A)}.

It follows from the definition that r_σ[A] ≤ ‖A‖, and if A ∈ S_n, then r_σ[A] = ‖A‖_2. Let T_n = B[S_n]. For F ∈ T_n,

‖F‖ = sup_{‖X‖=1} ‖F[X]‖.

The linear map F ∈ T_n is called monotonic if for all X ∈ S_n, X ≽ 0 ⇒ F[X] ≽ 0.

Lemma 3.1: Consider F ∈ T_n, monotonic. For i ∈ Z_+, F^i is monotonic and for any X ∈ S_n^+, F^i(X) ≽ 0.

Proof: The proof is immediate by induction.

Lemma 3.2: Consider F ∈ T_n, monotonic. The following statements are equivalent:
(i) r_σ[F] < 1,
(ii) F^i[X] → 0, for all X ∈ S_n^+,
(iii) F^i(I_n) → 0.

Proof: (i) ⇒ (ii). This follows directly by considering the Jordan decomposition of the finite dimensional linear map F. (ii) ⇒ (iii). This is immediate given that I_n ∈ S_n^+. (iii) ⇒ (i). Let i ∈ Z_+; one has:

r_σ[F]^i = r_σ[F^i] ≤ ‖F^i‖_2 = sup_{‖X‖_2=1} r_σ[F^i[X]].

For any X ∈ S_n with ‖X‖_2 = 1 it holds that I_n − X ≽ 0. Given that F^i is linear and monotonic, it follows that F^i[I_n] ≽ F^i[X] and therefore ‖F^i[I_n]‖_2 ≥ ‖F^i[X]‖_2. Since ‖I_n‖_2 = 1, one obtains that ‖F^i[I_n]‖_2 = sup_{‖X‖_2=1} r_σ[F^i[X]], and therefore r_σ[F]^i ≤ ‖F^i[I_n]‖_2, and the result follows.

Lemma 3.3: Let F ∈ T_n, monotonic, B ∈ S_n^+, and consider the equation X = F[X] + B.
(i) If r_σ[F] < 1, then there exists a solution X ≽ 0.
(ii) If there exists a solution X ≽ 0 and B ≻ 0, then r_σ[F] < 1, and in fact X ≻ 0.

Proof: For k ∈ Z_+ let X_k = Σ_{i=0}^k F^i[B]. Gelfand's formula states ‖F^i‖^{1/i} → r_σ[F]. This allows one to conclude that there exist λ ∈ (r_σ[F], 1) and N ∈ Z_+ such that if i > N, then ‖F^i‖_2 ≤ λ^i. Since

‖X_k‖_2 ≤ Σ_{i=0}^k ‖F^i[B]‖_2 ≤ Σ_{i=0}^k ‖F^i‖_2 ‖B‖_2,

one can infer that the sequence of partial sums converges, say X_k → X. Furthermore X − F[X] = B and X ≽ B ≽ 0. For the second part, any solution X satisfies the relations

X = Σ_{i=0}^{k−1} F^i[B] + F^k[X] ≽ Σ_{i=0}^{k−1} F^i[B] ≽ B ≻ 0.

The sequence of partial sums X_k = Σ_{i=0}^k F^i[B] therefore converges, and furthermore F^i[B] → 0. By the previous lemma it follows that r_σ[F] < 1.

In the following we discuss a stability notion for a system evolving on an acyclic graph. We will consider the uncontrolled evolution of the state space recursion

x(i ∧ w) = A(i) x(w), w ∈ A*, i ∈ A.

System Σ is said to be asymptotically stable if for u(w) = 0, ∀w ∈ A*, and any x(∅) ∈ R^n,

lim_{k→∞} Σ_{w∈A^(k)} |x(w)|² = 0.

Consider T ∈ T_n, where T[X] = Σ_{i∈A} A(i)^T X A(i). For k ∈ Z_+, T^k is linear and monotonic.

Lemma 3.4: Let R ∈ S_n and consider the uncontrolled evolution of the state space recursion. One has

Σ_{w∈A^(k)} x^T(w) R x(w) = x(∅)^T T^k[R] x(∅).

Proof: The statement follows by induction.

Lemma 3.5: A given system Σ is stable if and only if r_σ[T] < 1.

Proof: The proof of the statement follows a series of equivalences. Consider the uncontrolled evolution of Σ:

∀x(∅), Σ_{w∈A^(k)} |x(w)|² → 0 ⇔ T^k[I_n] → 0 ⇔ r_σ[T] < 1.
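Lemma 3.5 reduces stability to the spectral radius of the finite dimensional map T. A minimal sketch, assuming NumPy and hypothetical matrices A(i) (not from the paper): since vec(A^T X A) = (A^T ⊗ A^T) vec(X), the map T can be represented as an n² × n² matrix, and the result is cross-checked against criterion (iii) of Lemma 3.2.

```python
import numpy as np

# Hypothetical system matrices A(i) for a two-letter alphabet (n = 2).
A = [np.array([[0.5, 0.1],
               [0.0, 0.3]]),
     np.array([[0.2, 0.0],
               [0.1, 0.4]])]

# T[X] = sum_i A(i)^T X A(i) acts linearly on S_n; using
# vec(A^T X A) = (A^T kron A^T) vec(X), represent it as a 4 x 4 matrix.
T_mat = sum(np.kron(Ai.T, Ai.T) for Ai in A)
r_sigma = max(abs(np.linalg.eigvals(T_mat)))
stable = r_sigma < 1       # Lemma 3.5: asymptotically stable iff r_sigma[T] < 1

# Equivalent check via Lemma 3.2 (iii): T^k[I_n] -> 0.
X = np.eye(2)
for _ in range(200):
    X = sum(Ai.T @ X @ Ai for Ai in A)
assert stable == (np.linalg.norm(X) < 1e-6)
```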

For p ∈ Z_+ we define

l_{2,p}^→ = {f : A*_→ → R^p | Σ_{w∈A*_→} |f(w)|² < ∞},
l_{2,p}^← = {f : A*_← → R^p | Σ_{w∈A*_←} |f(w)|² < ∞},

and set l_{2,p} = l_{2,p}^→ ∪ l_{2,p}^←. For x, y ∈ l_{2,p}^→, their inner product is given by

⟨x, y⟩ = Σ_{w∈A*_→} x^T(w) y(w),

and similarly for square summable signals on A*_←.

IV. A LOWER BOUND TO THE APPROXIMATION ERROR WITH RESPECT TO THE HANKEL NORM

A. Controllability, observability

First we introduce an observability and a controllability concept for a stable system Σ. Consider the uncontrolled evolution of Σ. Let

Ψ_o : R^n → l_{2,p}^→

denote the observability operator, where Ψ_o(x) = y, with y(w), w ∈ A*_→, being the output of system Σ under the initial condition x(∅) = x and u(w) = 0, ∀w ∈ A*_→. It is notationally convenient to extend the map A homomorphically from A to A*, i.e. for k ∈ Z_+,

A(w_k ∧ … ∧ w_1) = A(w_k) … A(w_1), and A(∅) = I_n.

Given this notation, and taking the lexicographic ordering of A*_→ in mind, one can write the relationship Ψ_o(x) = y in matrix form,

(y(∅); y(1∅); …; y(w); …) = (C; CA(1); …; CA(w); …) x(∅) = O x(∅).

The adjoint of the observability operator Ψ_o* : l_{2,p}^→ → R^n is defined through the relationship

⟨Ψ_o*(z), x⟩ = ⟨z, Ψ_o(x)⟩,

and as such Ψ_o*(z) = Σ_{w∈A*} A(w)^T C^T z(w). We refer to Y_o = Ψ_o* Ψ_o as the observability gramian of the system Σ, and the system Σ will be called observable if Y_o ≻ 0. The observability gramian satisfies the relationship Y_o = T[Y_o] + C^T C, and furthermore Y_o = Σ_{i=0}^∞ T^i[C^T C]. Let

Ψ_c : l_{2,m}^← → R^n

denote the controllability operator, where u ∈ l_{2,m}^← is applied as an input to Σ_← and produces

x = Ψ_c(u) = Σ_{i∈A} Σ_{w∈A*} A(w) B(i) u(w ∧ i).

Taking the lexicographic ordering of A*_← into account, one can write

x = C (u(∅1); …; u(∅r); u(∅11); …; u(w ∧ i); …),

where

C = [B(1), …, B(r), A(1)B(1), …, A(w)B(i), …].

The adjoint of the controllability operator Ψ_c* : R^n → l_{2,m}^← is defined through the relationship

⟨Ψ_c*(z), u⟩ = ⟨z, Ψ_c(u)⟩,

and as such Ψ_c*(z) = (…, u(w ∧ i), …) with

u(w ∧ i) = B(i)^T A(w)^T z, w ∈ A*, i ∈ A.

We refer to X_c = Ψ_c Ψ_c* as the controllability gramian of the system Σ, and the system Σ will be called controllable if X_c ≻ 0. The controllability gramian satisfies the relationship X_c = L[X_c] + Σ_{i∈A} B(i) B(i)^T, and furthermore X_c = Σ_{k=0}^∞ Σ_{i∈A} L^k[B(i) B(i)^T], where L = T*.
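The two gramian fixed point relations above can be verified numerically by truncating the convergent series. A minimal sketch, assuming NumPy, with hypothetical system matrices (not from the paper):

```python
import numpy as np

# Hypothetical stable system on a two-letter alphabet (n = 2, p = m = 1).
A = [np.array([[0.5, 0.1], [0.0, 0.3]]),
     np.array([[0.2, 0.0], [0.1, 0.4]])]
B = [np.array([[1.0], [0.0]]),
     np.array([[0.0], [1.0]])]
C = np.array([[1.0, 1.0]])

def T(X):            # T[X] = sum_i A(i)^T X A(i)
    return sum(Ai.T @ X @ Ai for Ai in A)

def L(X):            # adjoint map L = T*, L[X] = sum_i A(i) X A(i)^T
    return sum(Ai @ X @ Ai.T for Ai in A)

# Fixed points: Yo = T[Yo] + C^T C and Xc = L[Xc] + sum_i B(i) B(i)^T,
# computed here by iterating the (convergent, since r_sigma[T] < 1) series.
Q = C.T @ C
R = sum(Bi @ Bi.T for Bi in B)
Yo = np.zeros((2, 2))
Xc = np.zeros((2, 2))
for _ in range(300):
    Yo = T(Yo) + Q
    Xc = L(Xc) + R

# Residuals of the fixed point equations vanish for a stable system.
assert np.linalg.norm(Yo - (T(Yo) + Q)) < 1e-8
assert np.linalg.norm(Xc - (L(Xc) + R)) < 1e-8
```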

B. Hankel Operator

Consider a stable system Σ that is both controllable and observable. The Hankel singular values of Σ are the square roots of the eigenvalues of X_c Y_o, i.e. the i-th Hankel singular value is

σ_i = √(λ_i(X_c Y_o)).

It will be assumed that the σ_i's are ordered such that σ_1 ≥ σ_2 ≥ … ≥ σ_n. Since the eigenvalues of X_c Y_o are non-negative, the Hankel singular values are well defined. They are also invariant under coordinate transformations of the particular realization of the system Σ. Let G_Σ : l_{2,m} → l_{2,p} be the operator that maps input sequences to output sequences, y = G_Σ u, where y, u are constrained by the state space descriptions Σ_←, Σ_→. Let Π_{l_{2,p}^→} denote the projection of a signal in l_{2,p} onto l_{2,p}^→. The Hankel operator associated with Σ, denoted by Γ_Σ, is defined as

Γ_Σ : l_{2,m}^← → l_{2,p}^→,

where Γ_Σ(u) = Π_{l_{2,p}^→} G_Σ u. It follows from the definition that if Σ_1 and Σ_2 are two systems, then Γ_{Σ_1+Σ_2} = Γ_{Σ_1} + Γ_{Σ_2}. The induced norm of the Hankel operator is defined as

‖Γ_Σ‖ = sup_{u∈l_{2,m}^←, u≠0} ‖Γ_Σ u‖_2 / ‖u‖_2.

This quantity will be referred to as the Hankel norm of Σ and denoted by ‖Σ‖_H. The Hankel operator admits a decomposition in terms of the observability and controllability operators. Let y = Γ_Σ u; for w ∈ A*_→ one has

y(w) = C A(w) Σ_{i∈A} Σ_{v∈A*} A(v) B(i) u(v ∧ i),

i.e. Γ_Σ = Ψ_o Ψ_c. Expressing the above relation in matrix form gives

(y(∅); y(1∅); …; y(w); …) = O C (u(∅1); …; u(∅r); u(∅11); …; u(v ∧ i); …).

Theorem 4.1: The Hankel singular values of Σ are the square roots of the non-zero eigenvalues of Γ_Σ* Γ_Σ.

Proof: From the decomposition of Γ_Σ one has Γ_Σ* Γ_Σ = Ψ_c* Ψ_o* Ψ_o Ψ_c. Let x_i be a right eigenvector of X_c Y_o associated with σ_i, i.e. X_c Y_o x_i = σ_i² x_i. Define the vector y_i as

y_i = (1/σ_i) Y_o x_i.

It follows that x_i, y_i satisfy X_c y_i = σ_i x_i, Y_o x_i = σ_i y_i. Furthermore, ⟨y_j, x_i⟩ = 0 if i ≠ j. The x_i's can be normalized such that ⟨y_i, x_i⟩ = 1/σ_i. Now define the functions

χ_j = Ψ_o x_j = (… C A(w) x_j …)^T, w ∈ A*_→,
ξ_j = Ψ_c* y_j = (… B(i)^T A(w)^T y_j …)^T, w ∈ A*_←, i ∈ A.

From the definition, the collections {χ_j} and {ξ_j} form orthonormal sets respectively. The functions χ_j, ξ_j are called the Schmidt pair associated with σ_j. By direct substitution one has Γ_Σ ξ_i = σ_i χ_i, Γ_Σ* χ_i = σ_i ξ_i, which implies that

Γ_Σ* Γ_Σ ξ_i = σ_i² ξ_i,  Γ_Σ Γ_Σ* χ_i = σ_i² χ_i.

This shows that every eigenvalue of X_c Y_o is an eigenvalue of Γ_Σ* Γ_Σ and vice versa. As a consequence of this result we obtain a singular value decomposition of Γ_Σ and compute its norm.

Lemma 4.1: The Hankel operator of system Σ can be written as

Γ_Σ u = Σ_{i=1}^n σ_i ⟨ξ_i, u⟩ χ_i.

Proof: Let {ξ_{i+n}, i ≥ 1} be a set of signals in l_{2,m}^← such that {ξ_i, i ≥ 1} forms an orthonormal basis for l_{2,m}^←. The dimension of R(Γ_Σ* Γ_Σ) is equal to n. From this it follows that

R(Γ_Σ* Γ_Σ) = Span{ξ_i, i = 1, …, n}.

Furthermore, for any u ∈ l_{2,m}^←,

⟨u, Γ_Σ* Γ_Σ ξ_{i+n}⟩ = ⟨Γ_Σ* Γ_Σ u, ξ_{i+n}⟩ = 0, ∀i ≥ 1.

The last equality holds because {ξ_{i+n}, i ≥ 1} is orthogonal to R(Γ_Σ* Γ_Σ). As such, Γ_Σ ξ_{i+n} = 0 for all i ≥ 1, since ‖Γ_Σ ξ_{i+n}‖_2² = ⟨Γ_Σ ξ_{i+n}, Γ_Σ ξ_{i+n}⟩ = ⟨Γ_Σ* Γ_Σ ξ_{i+n}, ξ_{i+n}⟩ = 0, ∀i ≥ 1. Given any u ∈ l_{2,m}^←, u can be written as

u = Σ_{i=1}^∞ ⟨ξ_i, u⟩ ξ_i.

Then it follows that

Γ_Σ u = Γ_Σ Σ_{i=1}^∞ ⟨ξ_i, u⟩ ξ_i = Σ_{i=1}^n ⟨ξ_i, u⟩ Γ_Σ ξ_i = Σ_{i=1}^n σ_i ⟨ξ_i, u⟩ χ_i.

The above decomposition shows that despite the Hankel operator being a map between two infinite dimensional spaces, it is a finite dimensional one, since the dimension of its range cannot exceed n. In fact the dimension equals n when the realization is both controllable and observable. From the above calculation it also follows that ‖Γ_Σ‖ = ‖Σ‖_H = σ_1.

Theorem 4.2: Let Σ be a stable system of order n that is controllable and observable. Let Σ̂_k be any controllable and observable system of order k < n. Then:

‖Σ − Σ̂_k‖_H ≥ σ_{k+1}.

Proof: The rank of Γ_{Σ̂_k} is k. Consider the subspace

M = {f | f = Σ_{i=1}^{k+1} α_i ξ_i, α_i ∈ C}.

The dimension of this subspace is k + 1. This implies that there is an element f ∈ M such that Γ_{Σ̂_k} f = 0. Normalize f such that

f = Σ_{i=1}^{k+1} α_i ξ_i,  Σ_{i=1}^{k+1} |α_i|² = 1.

It follows that

‖Σ − Σ̂_k‖_H ≥ ‖(Γ_Σ − Γ_{Σ̂_k}) f‖_2 ≥ ‖Γ_Σ f‖_2 = ‖Σ_{i=1}^{k+1} α_i σ_i χ_i‖_2 = (Σ_{i=1}^{k+1} |α_i|² σ_i²)^{1/2} ≥ σ_{k+1}.

V. EXAMPLE

We will consider an example of a HMM where X = {x_1, x_2, x_3}, Y = {y_1, y_2}. The underlying Markov chain {X(t)} has statistical parameters

Π = [0.50 0.25 0.25; 0.25 0.50 0.25; 0.25 0.25 0.50],  π = (1/3) (1; 1; 1).

The output process {Y(t)} is a deterministic function of the state {X(t)}, i.e. Y(t) = f(X(t)), with f(x_1) = f(x_2) = y_1 and f(x_3) = y_2. Under this condition one has

M[y_1] = [0.50 0.25 0.25; 0.25 0.50 0.25; 0 0 0],
M[y_2] = [0 0 0; 0 0 0; 0.25 0.25 0.50].

For this example one has

Y_o = [2.60 2.60 2.25; 2.60 2.60 2.25; 2.25 2.25 2.20],
X_c = [4/5 4/5 2/9; 4/5 4/5 2/9; 2/9 2/9 64/90].

The Hankel singular values are σ_1 ≈ 3.43, σ_2 ≈ 0.34, σ_3 = 0. It should come as no surprise that the smallest Hankel singular value is 0: there exists a 2-state HMM that is statistically equivalent to the original model; in fact it can be obtained by aggregation based model reduction, by forming a cluster out of the states x_1, x_2. As such, in this example our proposed bound is tight.

VI. SUMMARY

We provide an immersion of HMMs into the class of linear systems on homogeneous trees, which allows one to transfer various system theoretic properties from the latter class to the former, including input-output properties such as the notion of a Hankel operator and Hankel norm. This allowed us to make use of the related Hankel singular values to provide lower bounds on the norm of the difference between a HMM of order n and any lower order approximant of order n̂ < n. Future work will focus on relating the Hankel norm presented in this paper to the l2 induced gain norm presented in [6] in regards to the balanced truncation algorithm for HMMs, as well as on the optimal Hankel norm model reduction problem for this class of systems.
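As a numerical check of the example in Section V, the reported gramians (Y_o entries as printed, rounded to two decimals) reproduce the stated Hankel singular values; by Theorem 4.2, any 1-state approximant then incurs a Hankel norm error of at least σ_2 ≈ 0.34. A minimal sketch, assuming NumPy:

```python
import numpy as np

# Gramians reported in the example of Section V.
Yo = np.array([[2.60, 2.60, 2.25],
               [2.60, 2.60, 2.25],
               [2.25, 2.25, 2.20]])
Xc = np.array([[4/5, 4/5, 2/9],
               [4/5, 4/5, 2/9],
               [2/9, 2/9, 64/90]])

# Hankel singular values sigma_i = sqrt(lambda_i(Xc Yo)), sorted decreasingly;
# abs() guards against tiny negative rounding noise at the zero eigenvalue.
sigma = np.sort(np.sqrt(np.abs(np.linalg.eigvals(Xc @ Yo))))[::-1]
print(sigma)   # approximately [3.43, 0.34, 0.00]
```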

REFERENCES

[1] B. D. O. Anderson, "The realization problem for hidden Markov models," Math. Control Signals Syst., vol. 12, no. 1, pp. 80–122, Apr. 1999.
[2] K. Deng, G. Mehta, and S. P. Meyn, "Aggregation based model reduction of a hidden Markov model," in Proc. IEEE Conf. Decision Control, Atlanta, GA, USA, Dec. 2010.
[3] Y. Ephraim and N. Merhav, "Hidden Markov processes," IEEE Trans. Inf. Theory, vol. 48, no. 6, pp. 1518–1569, June 2002.
[4] J. A. Ball, G. Groenewald, and T. Malakorn, "Structured noncommutative multidimensional linear systems," SIAM J. Control Optim., vol. 44, pp. 1474–1528, 2005.
[5] T. Koski, Hidden Markov Models for Bioinformatics. Kluwer Academic Publishers, 2001.
[6] G. Kotsalis, A. Megretski, and M. Dahleh, "Balanced truncation for a class of stochastic jump linear systems and model reduction for hidden Markov models," IEEE Trans. Autom. Control, vol. 53, pp. 2543–2558, 2008.
[7] G. Kotsalis and J. S. Shamma, "A counterexample to aggregation based model reduction of hidden Markov models," in Proc. 50th IEEE Conf. Decision Control and European Control Conf. (CDC-ECC), Orlando, FL, USA, Dec. 2011, pp. 6558–6563.
[8] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series. Chapman and Hall, 1997.
[9] O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models. Springer, 2005.
[10] G. Picci, "On the internal structure of finite state stochastic processes," in Recent Developments in Variable Structure Systems, ser. Lecture Notes in Economics and Mathematical Systems. Springer, 1978, vol. 162, pp. 288–304.
[11] R. J. Elliott, L. Aggoun, and J. B. Moore, Hidden Markov Models: Estimation and Control. Springer, 1995.
[12] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, no. 2, pp. 257–286, Feb. 1989.
[13] M. Vidyasagar, "Reduced-order modeling of Markov and hidden Markov processes via aggregation," in Proc. IEEE Conf. Decision Control, Atlanta, GA, USA, Dec. 2010, pp. 1810–1815.
[14] M. Vidyasagar, "The complete realization problem for hidden Markov models: a survey and some new results," Math. Control Signals Syst., vol. 23, no. 1, pp. 1–65, 2011.
[15] L. B. White, R. Mahony, and G. D. Brushe, "Lumpable hidden Markov models - model reduction and reduced complexity filtering," IEEE Trans. Autom. Control, vol. 45, no. 12, pp. 2297–2306, Dec. 2000.
