
Metrics and Topology for Nonlinear and Hybrid Systems

Mihály Petreczky and René Vidal
Center for Imaging Science, Johns Hopkins University, Baltimore MD 21218, USA
{mihaly,rvidal}@cis.jhu.edu

Abstract. This paper presents an approach to defining distances between nonlinear and hybrid dynamical systems based on formal power series theory. The main idea is that the input-output behavior of a wide range of dynamical systems can be encoded by rational formal power series. Hence, a natural distance between dynamical systems is the distance between the formal power series encoding their input-output behavior. The paper proposes several computable distances for rational formal power series and discusses the application of such distances to various classes of nonlinear and hybrid systems. In particular, the paper presents a detailed discussion of distances for stochastic jump-linear systems.

1 Introduction

In this paper, we present several possible definitions of a computable distance for rational formal power series and their representations. The main motivation for studying distances between rational formal power series is that the input-output behavior of various classes of dynamical systems can be encoded by rational formal power series. Examples include linear and bilinear systems, switched linear and bilinear systems, finite state hidden Markov models, and linear and bilinear hybrid systems [1–5]. Therefore, one can define a distance between two dynamical systems as the distance between the formal power series that encode the input-output behavior of the systems. By construction, the proposed distances will be invariant under any transformation that preserves the input-output behavior of the systems. In addition, restricting attention to rational formal power series will enable us to compute the distances by using the rational representations of the formal power series. In general, such rational representations can easily be computed from the dynamical system. Another advantage of the proposed approach is that it connects well with identification and realization theory, because several identification methods are based on realization theory and hence on finding an appropriate rational representation of a family of formal power series.

Endowing the space of dynamical systems with a topology and a metric is not only an interesting theoretical exercise, but also has several interesting applications. A classical application is in system identification, more precisely, in finding a continuous parameterization and suitable canonical forms of dynamical systems [6–9]. Another important application comes from the field of computer vision, where one is interested in automatically recognizing different types of motions in a video sequence. That is, given a sequence of images depicting moving objects and people at different time instances, we would like to determine automatically the object class, the person identity, and the type of motion we see in the video sequence. For instance, we would like to determine whether the video sequence depicts a running person or a galloping deer. One of the traditional mathematical tools for recognition and classification is machine learning. However, many of the classical machine learning techniques require a metric on the observation space. Since our observations are video sequences depicting multiple motions, it is rather natural to model such videos as the output of one or more dynamical systems, where each dynamical system describes a particular motion. Our observations are then outputs of dynamical systems, or, after an identification procedure, dynamical systems themselves. Therefore, in order to apply machine learning algorithms for recognizing motions in video sequences, one needs to define a suitable metric and topology on the space of dynamical systems. The study of topological and metric properties of dynamical systems from this point of view is a relatively recent development, see [10–13].

The outline of the paper is as follows. Section 2 presents background material on the theory of rational formal power series. Section 3 defines several possible distances for rational formal power series and their rational representations. Section 4 discusses the relationship between formal power series and various classes of dynamical systems. In particular, it describes this relationship in detail for stochastic jump-linear systems and defines a distance for them. Section 5 discusses the relationship between the results of the current paper and earlier results in the literature, as well as issues concerning the practical computability of the defined distances.

2 Rational Power Series

This section presents several results on formal power series that will be used throughout the rest of the paper. The material in Subsections 2.1 and 2.2 can be found in [1]. The results in Subsection 2.3 are, to the best of our knowledge, new. For more details on the classical theory of rational formal power series, the reader is referred to [14, 3, 15].

2.1 Definition and Basic Theory

Let $X$ be a finite set. We will refer to $X$ as the alphabet. The elements of $X$ will be called letters, and every finite sequence of letters will be called a word or string over $X$. Denote by $X^*$ the set of all finite words over $X$. An element $w \in X^*$ of length $|w| = k \ge 0$ is a finite sequence $w = w_1 w_2 \cdots w_k$ with $w_1, \ldots, w_k \in X$. The empty word is denoted by $\epsilon$ and its length is zero, i.e. $|\epsilon| = 0$. The concatenation of two words $v = v_1 \cdots v_k$ and $w = w_1 \cdots w_m \in X^*$ is the word $vw = v_1 \cdots v_k w_1 \cdots w_m$. For any two sets $J$ and $A$, an indexed subset of $A$ with the index set $J$ is simply a map $Z : J \to A$, denoted by $Z = \{a_j \in A \mid j \in J\}$, where $a_j = Z(j)$ for all $j \in J$. Notice that we do not require the elements $a_j$ to be all different. A formal power series $S$ with coefficients in $\mathbb{R}^p$ is a map $S : X^* \to \mathbb{R}^p$. We will call the values $S(w) \in \mathbb{R}^p$, $w \in X^*$, the coefficients of $S$. We denote by $\mathbb{R}^p\langle\langle X^*\rangle\rangle$ the set of all formal power series with coefficients in $\mathbb{R}^p$. Consider the indexed set of formal power series $\Psi = \{S_j \in \mathbb{R}^p\langle\langle X^*\rangle\rangle \mid j \in J\}$ with an arbitrary (not necessarily finite) index set $J$. We will call such an indexed set of formal power series a family of formal power series. A family of formal power series $\Psi$ is called rational if there exist an integer $n \in \mathbb{N}$, a matrix $C \in \mathbb{R}^{p\times n}$, a collection of matrices $A_\sigma \in \mathbb{R}^{n\times n}$, where $\sigma$ runs through all elements of $X$, and an indexed set $B = \{B_j \in \mathbb{R}^n \mid j \in J\}$ of vectors in $\mathbb{R}^n$, such that for each index $j \in J$ and for all sequences $\sigma_1, \ldots, \sigma_k \in X$, $k \ge 0$,

$$S_j(\sigma_1 \sigma_2 \cdots \sigma_k) = C A_{\sigma_k} A_{\sigma_{k-1}} \cdots A_{\sigma_1} B_j. \qquad (1)$$

The 4-tuple $R = (\mathbb{R}^n, \{A_\sigma\}_{\sigma\in X}, B, C)$ is called a representation of $\Psi$, and the number $n = \dim R$ is called the dimension of the representation $R$. If $S \in \mathbb{R}^p\langle\langle X^*\rangle\rangle$ is a single power series, then $S$ will be called rational if the singleton set $\{S\}$ is rational, and by a representation of $S$ we will mean a representation of $\{S\}$. A representation $R_{\min}$ of $\Psi$ is called minimal if all representations $R$ of $\Psi$ satisfy $\dim R_{\min} \le \dim R$. Two representations of $\Psi$, $R = (\mathbb{R}^n, \{A_\sigma\}_{\sigma\in X}, B, C)$ and $\tilde R = (\mathbb{R}^n, \{\tilde A_\sigma\}_{\sigma\in X}, \tilde B, \tilde C)$, are called isomorphic if there exists a nonsingular matrix $T \in \mathbb{R}^{n\times n}$ such that $T A_\sigma = \tilde A_\sigma T$ for all $\sigma \in X$, $T B_j = \tilde B_j$ for all $j \in J$, and $C = \tilde C T$. Let $R = (\mathbb{R}^n, \{A_\sigma\}_{\sigma\in X}, B, C)$ be a representation of $\Psi$. In the sequel, we will use the short-hand notation $A_w = A_{w_k} A_{w_{k-1}} \cdots A_{w_1}$ for $w = w_1 \cdots w_k \in X^*$, and $A_\epsilon$ will be identified with the identity map. The representation $R$ is called observable if $O_R = \{0\}$ and reachable if $\dim R = \dim W_R$, where $W_R$ and $O_R$ are the following subspaces of $\mathbb{R}^n$:

$$W_R = \operatorname{Span}\{A_w B_j \mid w \in X^*, |w| \le n-1, j \in J\} \quad\text{and}\quad O_R = \bigcap_{w \in X^*, |w| \le n-1} \ker C A_w. \qquad (2)$$
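The following is a minimal NumPy sketch of (1) and (2), written for illustration and not taken from the paper. It assumes single-character letters, a dict `A` mapping each letter $\sigma$ to $A_\sigma$, and a list `B` of the vectors $B_j$; the function names and the example matrices are hypothetical. Applying the letters of $w$ from left to right produces exactly the product $C A_{\sigma_k} \cdots A_{\sigma_1} B_j$, and the rank tests stack the vectors $A_w B_j$ and the rows of $C A_w$ over all words of length at most $n-1$.

```python
import numpy as np


def coefficient(A, C, b, word):
    """Return S_j(w) = C A_{w_k} ... A_{w_1} B_j, as in (1), for w = w_1 ... w_k."""
    x = b
    for letter in word:          # w_1 acts first, so it ends up rightmost in the product
        x = A[letter] @ x
    return C @ x


def reachable_and_observable(A, B, C):
    """Rank tests for (2): dim W_R == n and O_R == {0}, using all words with |w| <= n - 1."""
    n = C.shape[1]
    reach, obs = list(B), [C]    # A_w B_j and C A_w for the empty word
    layer_r, layer_o = list(B), [C]
    for _ in range(n - 1):       # extend every word of the current length by one letter
        layer_r = [A[s] @ v for v in layer_r for s in A]
        layer_o = [M @ A[s] for M in layer_o for s in A]
        reach += layer_r
        obs += layer_o
    W = np.column_stack(reach)   # column space is W_R
    O = np.vstack(obs)           # null space is O_R
    return bool(np.linalg.matrix_rank(W) == n), bool(np.linalg.matrix_rank(O) == n)


# Example: a 2-dimensional representation over the alphabet X = {a, b} with p = 1.
A = {"a": np.array([[0.5, 1.0], [0.0, 0.2]]), "b": np.array([[0.1, 0.0], [0.3, 0.4]])}
C = np.array([[1.0, 0.0]])
B = [np.array([0.0, 1.0])]
print(coefficient(A, C, B[0], "ab"))        # C A_b A_a B_1
print(reachable_and_observable(A, B, C))    # minimal iff both are True (Theorem 1)
```

Enumerating all words up to length $n-1$ costs on the order of $d^{n-1}$ matrix-vector products, so this sketch is only meant for small examples; keeping only a basis of the span at each step would avoid the blow-up.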

Observability and reachability of representations can be checked numerically. One can formulate an algorithm for transforming any representation into a minimal representation of the same family of formal power series (see [1] and the references therein for details). Let $\Psi = \{S_j \in \mathbb{R}^p\langle\langle X^*\rangle\rangle \mid j \in J\}$ be a family of formal power series, and define the Hankel matrix $H_\Psi$ of $\Psi$ as the matrix $H_\Psi \in \mathbb{R}^{(X^*\times I)\times(X^*\times J)}$, where $I = \{1, 2, \ldots, p\}$ and

$$(H_\Psi)_{(u,i),(v,j)} = (S_j(vu))_i.$$

That is, the rows of $H_\Psi$ are indexed by pairs $(u, i)$, where $u$ is a word over $X$ and $i$ is an integer in the range $1, \ldots, p$. The columns of $H_\Psi$ are indexed by pairs $(v, j)$, where $v$ is a word over $X$ and $j$ is an element of the index set $J$. The element of $H_\Psi$ whose row index is $(u, i)$ and whose column index is $(v, j)$ is simply the $i$th entry of the vector $S_j(vu) \in \mathbb{R}^p$. The following result on realization of formal power series can be found in [3, 15, 1].

Theorem 1 (Realization of formal power series) Let $\Psi = \{S_j \in \mathbb{R}^p\langle\langle X^*\rangle\rangle \mid j \in J\}$ be a set of formal power series indexed by $J$. Then the following holds.
(i) $\Psi$ is rational $\iff$ $\operatorname{rank} H_\Psi < +\infty$.
(ii) $R$ is a minimal representation of $\Psi$ $\iff$ $R$ is reachable and observable $\iff$ $\dim R = \operatorname{rank} H_\Psi$.
(iii) All minimal representations of $\Psi$ are isomorphic.
(iv) If the rank of the Hankel matrix $H_\Psi$ is finite, i.e. $n = \operatorname{rank} H_\Psi < +\infty$, then one can construct a representation $R = (\mathbb{R}^n, \{A_\sigma\}_{\sigma\in X}, B, C)$ of $\Psi$ using the columns of $H_\Psi$ (see [1] for details).

2.2 Realization Algorithm

In this subsection, we present an algorithm for computing a minimal representation of a family of formal power series $\Psi$ from finite data, more precisely, from a finite upper-left block of the infinite Hankel matrix $H_\Psi$. The theorem guaranteeing the correctness of the algorithm, Theorem 2, will also enable us to define a distance between families of rational formal power series. Let $\Psi = \{S_j \in \mathbb{R}^p\langle\langle X^*\rangle\rangle \mid j \in J\}$ be a family of formal power series indexed by a finite set $J$. Let $H_{\Psi,N,M} \in \mathbb{R}^{I_M \times J_N}$ be the finite upper-left block of the infinite Hankel matrix $H_\Psi$ obtained by taking all columns of $H_\Psi$ indexed by words over $X$ of length at most $N$, and all rows of $H_\Psi$ indexed by words of length at most $M$. More specifically, $H_{\Psi,N,M}$ is the matrix whose rows are indexed by elements of the set $I_M = \{(u, i) \mid u \in X^*, |u| \le M, i = 1, \ldots, p\}$, whose columns are indexed by elements of the set $J_N = \{(v, j) \mid j \in J, v \in X^*, |v| \le N\}$, and whose entries are defined by $(H_{\Psi,N,M})_{(u,i),(v,j)} = (S_j(vu))_i$. The following algorithm computes a representation of $\Psi$ from $H_{\Psi,N+1,N}$.

Algorithm 1 [16]

$(\mathbb{R}^r, \{A_\sigma\}_{\sigma\in X}, B, C)$ = ComputeRepresentation($H_{\Psi,N+1,N}$)

1: Let $r = \operatorname{rank} H_{\Psi,N,N}$ and choose $j_1, \ldots, j_r \in J$, $i_1, \ldots, i_r \in \{1, \ldots, p\}$ and $v_1, \ldots, v_r, u_1, \ldots, u_r \in X^*$ such that for all $l = 1, \ldots, r$, $|v_l| \le N$ and $|u_l| \le N$, and the minor $T = \big((S_{j_k}(v_k u_l))_{i_l}\big)_{l,k=1,\ldots,r} \in \mathbb{R}^{r\times r}$ of $H_{\Psi,N,N}$ is of rank $r$.
2: For each symbol $\sigma \in X$, let $A_\sigma \in \mathbb{R}^{r\times r}$ be such that $T A_\sigma = Z_\sigma$, where $Z_\sigma = \big((S_{j_k}(v_k \sigma u_l))_{i_l}\big)_{l,k=1,\ldots,r}$. Let $B = \{B_j \mid j \in J\}$, where for each index $j \in J$ the vector $B_j \in \mathbb{R}^r$ is given by
$$B_j = T^{-1}\big((S_j(u_1))_{i_1}, (S_j(u_2))_{i_2}, \ldots, (S_j(u_r))_{i_r}\big)^T.$$
Let $C \in \mathbb{R}^{p\times r}$ be given by $C = \begin{bmatrix} C_1 & \cdots & C_r \end{bmatrix}$, where $C_l = S_{j_l}(v_l)$ for $l = 1, \ldots, r$.

Theorem 2 ([1, 3, 16]) If $\operatorname{rank} H_{\Psi,N,N} = \operatorname{rank} H_\Psi$, then the representation $\tilde R_N$ of $\Psi$ returned by ComputeRepresentation is minimal. Furthermore, if $\operatorname{rank} H_\Psi \le N$, or, equivalently, there exists a representation $R$ of $\Psi$ such that $\dim R \le N$, then $\operatorname{rank} H_\Psi = \operatorname{rank} H_{\Psi,N,N}$, hence $\tilde R_N$ is a minimal representation of $\Psi$.
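To illustrate Theorem 1 and Theorem 2 numerically, the sketch below (ours, assuming NumPy and single-character letters; `words`, `series_from_representation` and `hankel_block` are hypothetical helpers) builds the finite block $H_{\Psi,N,N}$ from a representation and prints its rank for $N = 0, 1, 2, 3$. For the 2-dimensional reachable and observable example, the rank equals $\operatorname{rank} H_\Psi = 2$ for every $N \ge 2$, as Theorem 2 guarantees; here it already reaches 2 at $N = 1$.

```python
import itertools

import numpy as np


def words(alphabet, max_len):
    """All words over `alphabet` (single-character letters) with length <= max_len."""
    return ["".join(w) for k in range(max_len + 1) for w in itertools.product(alphabet, repeat=k)]


def series_from_representation(A, B, C):
    """Return {j: S_j} with S_j(w) = C A_w B_j, as in (1)."""
    def make(b):
        def S(word):
            x = b
            for letter in word:
                x = A[letter] @ x
            return C @ x
        return S
    return {j: make(b) for j, b in B.items()}


def hankel_block(series, alphabet, p, N, M):
    """H_{Psi,N,M}: rows (u, i) with |u| <= M, columns (v, j) with |v| <= N, entries (S_j(vu))_i."""
    rows = [(u, i) for u in words(alphabet, M) for i in range(p)]
    cols = [(v, j) for v in words(alphabet, N) for j in series]
    return np.array([[series[j](v + u)[i] for (v, j) in cols] for (u, i) in rows])


# A 2-dimensional example with B_1 = (1, 1)^T, so that S_1(epsilon) = 1.
A = {"a": np.array([[0.5, 1.0], [0.0, 0.2]]), "b": np.array([[0.1, 0.0], [0.3, 0.4]])}
B = {1: np.array([1.0, 1.0])}
C = np.array([[1.0, 0.0]])
series = series_from_representation(A, B, C)
print([int(np.linalg.matrix_rank(hankel_block(series, "ab", 1, N, N))) for N in range(4)])
```

Padding the same series with an extra, unreachable state gives a 3-dimensional representation whose Hankel ranks are unchanged, which is the practical content of Theorem 1(ii): the rank, not the dimension of the representation at hand, is what determines minimality.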

From a computational point of view, algorithm ComputeRepresentation may not be the best way to compute a representation of $\Psi$. However, we have chosen to present it because it makes theoretical reasoning easier. The algorithm is essentially a reformulation of the construction presented in [16]. An alternative algorithm, which uses a factorization of the finite Hankel matrix $H_{\Psi,N,N+1}$, can be found in [1].
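Algorithm 1 can be sketched in a few lines of NumPy. This is our illustration, not the implementation from [16]: the names are hypothetical, `series` is assumed to be a dict mapping each index $j$ to a function returning $S_j(w) \in \mathbb{R}^p$, and the nonsingular minor $T$ of step 1 is found by a simple greedy rank test (rows first, then columns), which is only one of many possible choices. Step 2 solves $T A_\sigma = Z_\sigma$, and the only values queried are those on words of length at most $2N+1$, i.e. the entries of $H_{\Psi,N+1,N}$.

```python
import itertools

import numpy as np


def words(alphabet, max_len):
    """All words over `alphabet` (single-character letters) with length <= max_len."""
    return ["".join(w) for k in range(max_len + 1) for w in itertools.product(alphabet, repeat=k)]


def evaluate(A, B, C, j, word):
    """C A_w B_j, as in (1)."""
    x = B[j]
    for letter in word:
        x = A[letter] @ x
    return C @ x


def compute_representation(series, alphabet, p, N):
    """Sketch of Algorithm 1; only words of length <= 2N + 1 are queried."""
    rows = [(u, i) for u in words(alphabet, N) for i in range(p)]
    cols = [(v, j) for v in words(alphabet, N) for j in series]
    H = np.array([[series[j](v + u)[i] for (v, j) in cols] for (u, i) in rows])  # H_{Psi,N,N}
    # Step 1: greedily keep rows that raise the rank, then columns within those rows,
    # which yields an r x r nonsingular minor T with r = rank H_{Psi,N,N}.
    ri, ci = [], []
    for k in range(len(rows)):
        if np.linalg.matrix_rank(H[ri + [k], :]) > len(ri):
            ri.append(k)
    for k in range(len(cols)):
        if np.linalg.matrix_rank(H[np.ix_(ri, ci + [k])]) > len(ci):
            ci.append(k)
    U = [rows[k] for k in ri]        # the pairs (u_l, i_l)
    V = [cols[k] for k in ci]        # the pairs (v_k, j_k)
    T = H[np.ix_(ri, ci)]
    # Step 2: solve T A_sigma = Z_sigma with Z_sigma[l, k] = (S_{j_k}(v_k sigma u_l))_{i_l}.
    A = {s: np.linalg.solve(T, np.array([[series[j](v + s + u)[i] for (v, j) in V] for (u, i) in U]))
         for s in alphabet}
    B = {j: np.linalg.solve(T, np.array([series[j](u)[i] for (u, i) in U])) for j in series}
    C = np.column_stack([series[j](v) for (v, j) in V])   # C_l = S_{j_l}(v_l)
    return A, B, C


# Recover a representation from values of a known series on words of length <= 5 (N = 2).
A0 = {"a": np.array([[0.5, 1.0], [0.0, 0.2]]), "b": np.array([[0.1, 0.0], [0.3, 0.4]])}
B0 = {1: np.array([1.0, 1.0]), 2: np.array([0.0, 1.0])}
C0 = np.array([[1.0, 0.0]])
series = {j: (lambda w, j=j: evaluate(A0, B0, C0, j, w)) for j in B0}
A, B, C = compute_representation(series, "ab", 1, 2)
print(C.shape)
```

Because Theorem 2 applies whenever $\operatorname{rank} H_\Psi \le N$, the recovered triple should reproduce $S_j(w)$ on every word, not only on those of length at most $2N+1$; the matrices themselves generally differ from `A0`, `B0`, `C0` by a change of basis, consistent with Theorem 1(iii). The greedy selection recomputes ranks repeatedly and is chosen for readability; a pivoted QR or SVD of $H_{\Psi,N,N}$ would be the more economical choice.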

2.3 A Notion of Stability for Formal Power Series

Since our ultimate goal is to compare formal power series, we might want to restrict our attention to formal power series that are stable in some sense. In this subsection, we consider the notion of square summability for formal power series, and translate the requirement of square summability into algebraic properties of their representations.

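Consider a formal power series $S \in \mathbb{R}^p\langle\langle X^*\rangle\rangle$, and denote by $\|\cdot\|_2$ the Euclidean norm in $\mathbb{R}^p$. Consider the following sequence:

$$L_n = \sum_{k=0}^{n} \sum_{\sigma_1 \in X} \cdots \sum_{\sigma_k \in X} \|S(\sigma_1 \sigma_2 \cdots \sigma_k)\|_2^2. \qquad (3)$$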

The series $S$ will be called square summable if the limit $\lim_{n\to+\infty} L_n$ exists and is finite. We will call the family $\Psi = \{S_j \in \mathbb{R}^p\langle\langle X^*\rangle\rangle \mid j \in J\}$ square summable if, for each $j \in J$, the formal power series $S_j$ is square summable. We now characterize square summability of a family of formal power series in terms of the stability of its representations. Let $R = (\mathbb{R}^n, \{A_\sigma\}_{\sigma\in X}, B, C)$ be an arbitrary representation of $\Psi = \{S_j \in \mathbb{R}^p\langle\langle X^*\rangle\rangle \mid j \in J\}$. Assume that $X = \{\sigma_1, \ldots, \sigma_d\}$, where $d$ is the number of elements of $X$, and consider the matrix

$$\tilde A_R = \begin{bmatrix} A_{\sigma_1}\otimes A_{\sigma_1} & A_{\sigma_2}\otimes A_{\sigma_2} & \cdots & A_{\sigma_d}\otimes A_{\sigma_d} \\ A_{\sigma_1}\otimes A_{\sigma_1} & A_{\sigma_2}\otimes A_{\sigma_2} & \cdots & A_{\sigma_d}\otimes A_{\sigma_d} \\ \vdots & \vdots & & \vdots \\ A_{\sigma_1}\otimes A_{\sigma_1} & A_{\sigma_2}\otimes A_{\sigma_2} & \cdots & A_{\sigma_d}\otimes A_{\sigma_d} \end{bmatrix} \in \mathbb{R}^{n^2 d \times n^2 d}, \qquad (4)$$

where $\otimes$ denotes the Kronecker product. We will call $R$ stable if the matrix $\tilde A_R$ is stable, i.e. if all its eigenvalues $\lambda$ lie inside the unit disk ($|\lambda| < 1$). We have the following.

Theorem 3 A rational family of formal power series is square summable if and only if all its minimal representations are stable.

Notice the analogy with the case of linear systems, where the minimal realization of a stable transfer matrix is also stable.
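The sketch below (assuming NumPy; all names are hypothetical) builds the matrix (4) and evaluates the partial sums (3) for a single series. It relies on an identity we add for illustration: $G_k = \sum_{|w|=k} A_w B B^T A_w^T$ satisfies $G_{k+1} = \sum_{\sigma} A_\sigma G_k A_\sigma^T$, so $L_n$ needs no enumeration of words, and when $\sum_\sigma A_\sigma \otimes A_\sigma$ has spectral radius below one the limit solves a single linear system. Because all block rows of $\tilde A_R$ coincide, its nonzero eigenvalues are those of $\sum_\sigma A_\sigma \otimes A_\sigma$, which is why the two spectral radii agree.

```python
import numpy as np


def stability_matrix(As):
    """The matrix (4): d identical block rows [A_1 (x) A_1, ..., A_d (x) A_d]."""
    row = np.hstack([np.kron(A, A) for A in As])
    return np.vstack([row] * len(As))


def spectral_radius(M):
    return float(np.max(np.abs(np.linalg.eigvals(M))))


def partial_sums(As, b, C, n):
    """L_0, ..., L_n from (3), using G_k = sum_{|w| = k} A_w b b^T A_w^T."""
    G, total, out = np.outer(b, b), 0.0, []
    for _ in range(n + 1):
        total += float(np.trace(C @ G @ C.T))   # sum of ||S(w)||_2^2 over |w| = k
        out.append(total)
        G = sum(A @ G @ A.T for A in As)        # G_{k+1} = sum_sigma A_sigma G_k A_sigma^T
    return out


def square_sum(As, b, C):
    """lim L_n, assuming rho(sum_sigma A_sigma (x) A_sigma) < 1 so the series converges."""
    n = len(b)
    K = sum(np.kron(A, A) for A in As)
    G = np.linalg.solve(np.eye(n * n) - K, np.outer(b, b).ravel()).reshape(n, n)
    return float(np.trace(C @ G @ C.T))


As = [np.array([[0.5, 0.2], [0.0, 0.3]]), np.array([[0.2, 0.0], [0.1, 0.4]])]
b, C = np.array([1.0, 0.5]), np.array([[1.0, -1.0]])
print(spectral_radius(stability_matrix(As)), partial_sums(As, b, C, 60)[-1], square_sum(As, b, C))
```

Scaling the example matrices by 2 makes $\tilde A_R$ unstable, and the partial sums then grow without bound, in line with Theorem 3.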

3 Distances for Rational Power Series

The goal of this section is to present a notion of distance for families of rational formal power series or, equivalently, a distance between their minimal representations. The choice of a distance is by no means unique; in fact, we will suggest several different distances. The common feature of all these distances is that they can be computed either from a minimal representation of the family, or from a sufficiently large but finite set of values of the formal power series constituting the family. Throughout this section, we will fix the space of coefficients $\mathbb{R}^p$ and the alphabet $X$. Also, we will fix a finite index set $J$ and consider the space of all rational families of formal power series indexed by $J$, i.e.

$$P_J = \{\Psi = \{S_j \in \mathbb{R}^p\langle\langle X^*\rangle\rangle \mid j \in J\} \mid \Psi \text{ is rational}\}.$$

Define the subset $P_{J,N} = \{\Psi \in P_J \mid \operatorname{rank} H_\Psi \le N\}$ of all rational families of formal power series whose Hankel matrix is of rank at most $N$. Then it is easy to see that $P_{J,N} \subseteq P_{J,K}$ for all $N \le K$, and $P_J = \bigcup_{N=0}^{+\infty} P_{J,N}$.

3.1 Distances Based on Truncation

We will first consider distances based on truncation, that is, distances that compare finitely many values of formal power series. Fix a natural number $N$, and denote by $m = \operatorname{card}(J)$ the cardinality of $J$. Since $J$ is finite, without loss of generality we can assume that $J = \{1, \ldots, m\}$. Assume that the alphabet $X$ is of the form $X = \{z_1, \ldots, z_d\}$, where $d$ is the number of elements of $X$. For each $N \ge 1$, let $F_{N,J} : \mathbb{R}^{mpN} \times \mathbb{R}^{mpN} \to \mathbb{R}$ be a distance on $\mathbb{R}^{mpN}$, and denote by $F = \{F_{N,J} \mid N \in \mathbb{N}\}$ the family of distances. We now define a pseudo-metric on $P_J$ using the family of distances $F$. The main idea is the following. If $S \in \mathbb{R}^p\langle\langle X^*\rangle\rangle$ is a formal power series, then it can be viewed as a map $S : X^* \to \mathbb{R}^p$ on words over $X$. There are $M(N) = \sum_{j=0}^{2N+1} d^j$ words of length at most $2N+1$ over the alphabet $X$, if $X$ has $d$ elements. Hence, we can view the restriction of the map $S$ to the set of all words of length at most $2N+1$ as a vector in $\mathbb{R}^{M(N)p}$. We can then define the distance $d_{F,N,J}(\Psi_1, \Psi_2)$ between two families of formal power series indexed by $J$, $\Psi_1$ and $\Psi_2$, as the distance $F_{M(N),J}(\phi_1, \phi_2)$ between the vectors $\phi_1$ and $\phi_2$ in $\mathbb{R}^{mpM(N)}$ representing the restrictions of the elements of $\Psi_1$ and $\Psi_2$, respectively, to the set of words of length at most $2N+1$. More formally:

1. Define an enumeration of all the words over the alphabet $X$ as the bijective map $\psi : X^* \to \mathbb{N}$ defined as follows. For the empty word $\epsilon$, let $\psi(\epsilon) = 0$, and for each letter $z_i$, $i = 1, \ldots, d$, let $\psi(z_i) = i$. Then, for each word of the form $w = v z_j$, $j = 1, \ldots, d$, $v \in X^*$, define $\psi(w)$ recursively as $\psi(w) = d \cdot \psi(v) + j$.
2. Denote by $X^{\le 2N+1} = \{w \in X^* \mid |w| \le 2N+1\}$ the set of all words over $X$ of length at most $2N+1$. Notice that the restriction of $\psi$ to $X^{\le 2N+1}$ is a bijection onto $[0, M(N)-1]$.
3. For each $S \in \mathbb{R}^p\langle\langle X^*\rangle\rangle$, define $\pi_N(S)$ as the vector $(Z_0^T, Z_1^T, \ldots, Z_{M(N)-1}^T)^T \in \mathbb{R}^{pM(N)}$, where $Z_i = S(\psi^{-1}(i)) \in \mathbb{R}^p$. Since the integer $i$ runs through all values in $[0, M(N)-1]$, $\psi^{-1}(i)$ runs through all words of length at most $2N+1$. Hence, $\pi_N(S)$ is just the vector of all values $S(w)$ with $|w| \le 2N+1$.
4. For each rational family of formal power series $\Psi = \{S_j \in \mathbb{R}^p\langle\langle X^*\rangle\rangle \mid j \in J\}$, define the vector $\pi_{J,N}(\Psi) = (\pi_N(S_1)^T, \ldots, \pi_N(S_m)^T)^T \in \mathbb{R}^{mpM(N)}$. That is, $\pi_{J,N}(\Psi)$ is obtained by stacking the vectors $\pi_N(S_1), \ldots, \pi_N(S_m)$ representing the values of $S_1, \ldots, S_m$ on words of length at most $2N+1$.
5. For each $\Psi_1, \Psi_2 \in P_J$, define the function $d_{F,N,J} : P_J \times P_J \to \mathbb{R}$ by

$$d_{F,N,J}(\Psi_1, \Psi_2) = F_{M(N),J}\big(\pi_{J,N}(\Psi_1), \pi_{J,N}(\Psi_2)\big). \qquad (5)$$
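Steps 1 to 5 can be sketched as follows. The text leaves the family $F$ open; purely as an assumption, the code takes $F_{M(N),J}$ to be the Euclidean distance. Series are modelled as Python functions from words (strings of single-character letters) to NumPy arrays in $\mathbb{R}^p$, a family $\Psi$ as a list $[S_1, \ldots, S_m]$, and all names (`psi`, `pi_JN`, `d_trunc`, `scalar_series`) are hypothetical.

```python
import itertools

import numpy as np


def psi(word, alphabet):
    """The enumeration psi: psi(epsilon) = 0 and psi(v z_j) = d * psi(v) + j, with z_j = alphabet[j - 1]."""
    d, value = len(alphabet), 0
    for letter in word:
        value = d * value + alphabet.index(letter) + 1
    return value


def pi_JN(family, alphabet, N):
    """pi_{J,N}(Psi): the values of S_1, ..., S_m on all words with |w| <= 2N + 1, in psi order."""
    ws = ["".join(w) for k in range(2 * N + 2) for w in itertools.product(alphabet, repeat=k)]
    ws.sort(key=lambda w: psi(w, alphabet))
    return np.concatenate([np.concatenate([S(w) for w in ws]) for S in family])


def d_trunc(family1, family2, alphabet, N):
    """d_{F,N,J} from (5), with F taken to be the Euclidean distance (our choice)."""
    return float(np.linalg.norm(pi_JN(family1, alphabet, N) - pi_JN(family2, alphabet, N)))


def scalar_series(weights):
    """A 1-dimensional rational series: S(w) is the product of weights[letter] over the letters of w."""
    return lambda w: np.array([np.prod([weights[c] for c in w])])


S1, S2 = scalar_series({"a": 0.5, "b": 0.1}), scalar_series({"a": 0.5, "b": 0.2})
print(d_trunc([S1], [S2], "ab", 1))
```

With $d = 2$ and $N = 1$, the map $\psi$ sends the $M(1) = 1 + 2 + 4 + 8 = 15$ words of length at most 3 onto $0, \ldots, 14$, so each series contributes 15 coordinates to $\pi_{J,N}(\Psi)$. Any other distance on $\mathbb{R}^{mpM(N)}$ could be substituted for the Euclidean norm without changing the rest of the construction.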

We then have the following result.

Lemma 1 (Properties of $d_{F,N,J}$) $d_{F,N,J}$ is a pseudo-distance on $P_J$. That is, for each $\Psi_1, \Psi_2, \Psi_3 \in P_J$,

$$d_{F,N,J}(\Psi_1, \Psi_2) = d_{F,N,J}(\Psi_2, \Psi_1) \ge 0, \quad d_{F,N,J}(\Psi_1, \Psi_2) \le d_{F,N,J}(\Psi_1, \Psi_3) + d_{F,N,J}(\Psi_2, \Psi_3), \quad\text{and}\quad \Psi_1 = \Psi_2 \implies d_{F,N,J}(\Psi_1, \Psi_2) = 0. \qquad (6)$$

The following theorem formulates an important property of $d_{F,N,J}$; it relies on the partial realization result of Theorem 2.

Theorem 4 (Distance for Rational Formal Power Series) The restriction of $d_{F,N,J}$ to $P_{J,N}$ is a distance. That is, in addition to the properties listed in Lemma 1, the following holds:

$$\forall\, \Psi_1, \Psi_2 \in P_{J,N}: \quad \Psi_1 = \Psi_2 \iff d_{F,N,J}(\Psi_1, \Psi_2) = 0. \qquad (7)$$

Proof. If $\Psi_1$ and $\Psi_2$ belong to $P_{J,N}$, then by Theorem 2, $\operatorname{rank} H_{\Psi_i,N,N} = \operatorname{rank} H_{\Psi_i}$ holds for $i = 1, 2$. Since $F_{M(N),J}$ is a distance, $d_{F,N,J}(\Psi_1, \Psi_2) = 0$ if and only if $\pi_{J,N}(\Psi_1) = \pi_{J,N}(\Psi_2)$, i.e. the values of the elements of $\Psi_1$ and $\Psi_2$ coincide for all words of length at most $2N+1$. Every entry $(S_j(vu))_i$ of $H_{\Psi,N+1,N}$ involves a word $vu$ of length at most $2N+1$; hence, $H_{\Psi_1,N+1,N} = H_{\Psi_2,N+1,N}$. Therefore, by Theorem 2, the representation $\tilde R_N$ produced by Algorithm 1 with the input $H_{\Psi_1,N+1,N}$ is a minimal representation of both $\Psi_1$ and $\Psi_2$, which implies that $\Psi_1 = \Psi_2$.

3.2 The Hilbert Space of Square Summable Families of Formal Power Series

In what follows, we will define a scalar product on the space of square summable rational families of formal power series. With this scalar product, the space of square summable rational families becomes a Hilbert space, and the corresponding distance takes all values of the formal power series into account. Consider the set $P_{s,J} = \{\Psi \in P_J \mid \Psi \text{ is square summable}\}$ of square summable rational families of formal power series, and assume that $J$ is finite. It is clear that $P_{s,J}$ is a vector space if we define addition and multiplication by a scalar as follows. Let $\Psi_1 = \{S_j \in \mathbb{R}^p\langle\langle X^*\rangle\rangle \mid j \in J\}$ and $\Psi_2 = \{T_j \in \mathbb{R}^p\langle\langle X^*\rangle\rangle \mid j \in J\}$ be two square summable rational families of formal power series. Then, for each $\alpha, \beta \in \mathbb{R}$, let $\alpha\Psi_1 + \beta\Psi_2 = \{\alpha S_j + \beta T_j \in \mathbb{R}^p\langle\langle X^*\rangle\rangle \mid j \in J\}$, where $\alpha S_j + \beta T_j$ is just the usual point-wise linear combination of formal power series ([14]), i.e. for all $w \in X^*$, $(\alpha S_j + \beta T_j)(w) = \alpha S_j(w) + \beta T_j(w)$. Now, for $\Psi_1$ and $\Psi_2$ defined as before, define the bilinear map $\langle\cdot,\cdot\rangle_J : P_{s,J} \times P_{s,J} \to \mathbb{R}$ as

$$\langle\Psi_1, \Psi_2\rangle_J = \sum_{j\in J} \sum_{w\in X^*} S_j(w)^T T_j(w). \qquad (8)$$

Since $J$ is finite and $S_j$, $T_j$ are both square summable, the infinite sum in (8) is well defined and finite by the Cauchy–Schwarz inequality. The following lemma characterizes some properties of $\langle\cdot,\cdot\rangle_J$.

Lemma 2 The map $\langle\cdot,\cdot\rangle_J$ is a scalar product, and the space $P_{s,J}$ with the scalar product $\langle\cdot,\cdot\rangle_J$ is a Hilbert space.

As a consequence, we can view $P_{s,J}$ as a normed space with the norm $\|\cdot\|_J$ induced by $\langle\cdot,\cdot\rangle_J$, i.e. for $\Psi = \{S_j \mid j \in J\}$,

$$\|\Psi\|_J = \sqrt{\sum_{j\in J} \sum_{w\in X^*} \|S_j(w)\|_2^2} = \sqrt{\langle\Psi, \Psi\rangle_J}. \qquad (9)$$

The following theorem gives a formula for computing $\langle\Psi_1, \Psi_2\rangle_J$ for all $\Psi_1, \Psi_2 \in P_{s,J}$, provided that representations of $\Psi_1$ and $\Psi_2$ are available.

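Theorem 5 For $i = 1, 2$, assume that $R_i = (\mathbb{R}^{n_i}, \{A_{i,\sigma}\}_{\sigma\in X}, B_i, C_i)$ is a stable representation of $\Psi_i \in P_{s,J}$ and that $B_i = \{B_{i,j} \in \mathbb{R}^{n_i} \mid j \in J\}$. Then there exists a unique solution $P \in \mathbb{R}^{n_1\times n_2}$ to the Sylvester equation

$$P = \sum_{\sigma\in X} A_{1,\sigma}^T P A_{2,\sigma} + C_1^T C_2, \qquad (10)$$

and the scalar product $\langle\Psi_1, \Psi_2\rangle_J$ can be written explicitly as

$$\langle\Psi_1, \Psi_2\rangle_J = \sum_{j\in J} B_{1,j}^T P B_{2,j}. \qquad (11)$$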

Notice from Theorem 3 that if $R_1$ and $R_2$ are minimal representations of $\Psi_1$ and $\Psi_2$ in $P_{s,J}$, respectively, then the condition of Theorem 5 holds. Hence, we can use any minimal representations to compute $\langle\Psi_1, \Psi_2\rangle_J$. From this, we may compute the distance between $\Psi_1$ and $\Psi_2$ as

$$\|\Psi_1 - \Psi_2\|_J^2 = \langle\Psi_1, \Psi_1\rangle_J - 2\langle\Psi_1, \Psi_2\rangle_J + \langle\Psi_2, \Psi_2\rangle_J.$$
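Theorem 5 turns the infinite sum (8) into one linear solve. Below is a minimal NumPy sketch, assuming both representations are stable and share the alphabet and the index set; representations are tuples `(A, B, C)` with dict-valued `A` (letters to matrices) and `B` (indices to vectors), and all names are hypothetical. Equation (10) is solved by row-major vectorization, using the standard identity $\operatorname{vec}(A_{1,\sigma}^T P A_{2,\sigma}) = (A_{1,\sigma}^T \otimes A_{2,\sigma}^T)\operatorname{vec}(P)$; the brute-force truncation of (8) is included only as a cross-check.

```python
import itertools

import numpy as np


def scalar_product(rep1, rep2):
    """<Psi_1, Psi_2>_J via (10)-(11); rep = (A, B, C), A: letter -> A_sigma, B: j -> B_j."""
    (A1, B1, C1), (A2, B2, C2) = rep1, rep2
    n1, n2 = C1.shape[1], C2.shape[1]
    # Row-major vectorization: vec(A1^T P A2) = (A1^T kron A2^T) vec(P).
    K = sum(np.kron(A1[s].T, A2[s].T) for s in A1)
    P = np.linalg.solve(np.eye(n1 * n2) - K, (C1.T @ C2).ravel()).reshape(n1, n2)
    return float(sum(B1[j] @ P @ B2[j] for j in B1))


def hilbert_distance(rep1, rep2):
    """||Psi_1 - Psi_2||_J from the expansion of the squared norm; tiny negative round-off is clipped."""
    sq = scalar_product(rep1, rep1) - 2 * scalar_product(rep1, rep2) + scalar_product(rep2, rep2)
    return float(np.sqrt(max(sq, 0.0)))


def truncated_scalar_product(rep1, rep2, max_len):
    """Brute-force check: the sum (8) restricted to words with |w| <= max_len."""
    (A1, B1, C1), (A2, B2, C2) = rep1, rep2
    total = 0.0
    for k in range(max_len + 1):
        for w in itertools.product(list(A1), repeat=k):
            for j in B1:
                x1, x2 = B1[j], B2[j]
                for s in w:
                    x1, x2 = A1[s] @ x1, A2[s] @ x2
                total += float((C1 @ x1) @ (C2 @ x2))
    return total


A1 = {"a": np.array([[0.3, 0.1], [0.0, 0.2]]), "b": np.array([[0.1, 0.0], [0.2, 0.3]])}
rep1 = (A1, {1: np.array([1.0, 0.0]), 2: np.array([0.0, 1.0])}, np.array([[1.0, 1.0]]))
rep2 = ({"a": np.array([[0.25]]), "b": np.array([[0.2]])}, {1: np.array([1.0]), 2: np.array([0.5])},
        np.array([[1.0]]))
print(scalar_product(rep1, rep2), truncated_scalar_product(rep1, rep2, 10), hilbert_distance(rep1, rep2))
```

Applying a change of basis $T$ to one representation ($A_\sigma \mapsto T A_\sigma T^{-1}$, $B_j \mapsto T B_j$, $C \mapsto C T^{-1}$) leaves `hilbert_distance` at zero up to round-off, since the distance depends only on the coefficients of the series.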

4 Rational Power Series and Input-Output Behavior of Dynamical Systems

The main motivation for introducing the framework of rational formal power series is that it provides a common algebraic framework for realization theory and system identification of a wide variety of input-output systems. The classes of systems whose behaviors can be described in terms of rational formal power series include linear systems [17, 9], bilinear systems [15, 3, 2, 18], multidimensional systems [4], finite state hidden Markov models [5], continuous-time linear and bilinear switched systems [1] and continuous-time linear and bilinear hybrid systems [1]. Hence, if we pick two dynamical systems $\Sigma_1$ and $\Sigma_2$ from any of the classes mentioned above, we can compare them as follows. We construct the families of formal power series $\Psi_1$ and $\Psi_2$ corresponding to the input-output behaviors of $\Sigma_1$ and $\Sigma_2$, respectively. Then, we compare $\Psi_1$ and $\Psi_2$ using one of the distances defined in Section 3. Note that the choice of the families $\Psi_i$, $i = 1, 2$, is unique if $\Sigma_i$ belongs to one of the classes of systems listed above. Alternatively, we can construct the rational representations $R_{\Sigma_1}$ and $R_{\Sigma_2}$ of the families $\Psi_1$ and $\Psi_2$, respectively. In general, the representations $R_{\Sigma_1}$ and $R_{\Sigma_2}$ can be easily computed from the parameters of $\Sigma_1$ and $\Sigma_2$. Then, we can use $R_{\Sigma_1}$ and $R_{\Sigma_2}$ to compute the distance between $\Psi_1$ and $\Psi_2$. This approach is particularly appealing if $\Psi_1$ and $\Psi_2$ are square summable and one wants to use the norm (9). Notice that even if $\Psi_1$ and $\Psi_2$ are square summable, $R_{\Sigma_1}$ or $R_{\Sigma_2}$ might fail to be stable. In this case, we have to minimize $R_{\Sigma_1}$ and $R_{\Sigma_2}$ first, and use the resulting stable minimal representations (see Theorem 3 and Theorem 5) to compute the distance. Algorithms for minimizing representations can be found in [1]. Notice also that, in general, there need not be any connection between square summability of $\Psi_i$, $i = 1, 2$, and stability of the dynamical systems $\Sigma_i$, $i = 1, 2$.
In what follows, we will demonstrate the use of rational formal power series for stochastic discrete-time jump-linear systems. This class of hybrid systems has a wide variety of applications including computer vision. To the best of our knowledge, the relationship between stochastic jump-linear systems and formal power series presented here is new, though there are some similarities between our approach and that in [18].

4.1 Stochastic Jump-Linear Systems

The terminology and notation used in this section are based on the conventions adopted in the literature, see [19, 9]. A stochastic jump-linear system [19] is a discrete-time stochastic system described by the equations

$$\Sigma: \begin{cases} x(k+1) = A_{\theta(k)} x(k) + B_{\theta(k)} v(k) \\ y(k) = C_{\theta(k)} x(k) \quad\text{and}\quad o(k) = \lambda(\theta(k)). \end{cases} \qquad (12)$$

Here, $x$, $\theta$, $y$, $o$ and $v$ are stochastic processes of the following form. The process $x$ is called the continuous state process and takes values in the continuous state space $\mathbb{R}^n$. The process $\theta$ is called the discrete state process and takes values in the set of discrete states $Q = \{1, 2, \ldots, d\}$. The process $y$ is the continuous output process and takes values in the set of continuous outputs $\mathbb{R}^p$. The process $o$ is the discrete output process and takes values in the set of discrete outputs $O = \{1, 2, \ldots, l\}$. Finally, the process $v$ is the continuous noise and takes values in $\mathbb{R}^m$. The matrices $A_q, B_q, C_q$, $q \in Q$, are of the form $A_q \in \mathbb{R}^{n\times n}$, $B_q \in \mathbb{R}^{n\times m}$ and $C_q \in \mathbb{R}^{p\times n}$. The map $\lambda : Q \to O$ is called the readout map; it assigns a discrete output to each discrete state.

We will assume that $E[v(k)v(l)^T] = \delta_{k,l} I$ and $E[v(k)] = 0$ for all $k, l \in \mathbb{N}$, that is, $v$ is a zero-mean process and $v(k)$, $k \in \mathbb{N}$, are uncorrelated. Furthermore, we will assume that for each $k, l \in \mathbb{N}$, $x(0)$, $v(k)$, $\theta(l)$ are mutually independent random variables. We will also assume that the state transitions of the Markov process $\theta$ are governed by the transition probabilities $p_{q_1,q_2}$, $q_1, q_2 \in Q$, where $p_{q_1,q_2}$ is the probability that $\theta$ changes its value from $q_2$ to $q_1$, i.e. $p_{q_1,q_2} = \operatorname{Prob}(\theta(k+1) = q_1 \mid \theta(k) = q_2)$. In addition, we will assume that the initial probability distribution of $\theta$ is given by the vector $\pi = (\pi_1, \ldots, \pi_d)^T \in \mathbb{R}^d$, where $\pi_q = \operatorname{Prob}(\theta(0) = q)$ denotes the probability that the process $\theta$ is in state $q$ at time 0.

The evolution of system (12) is as follows. At each time instant $k$, the continuous state $x$ and the continuous output $y$ change according to the discrete-time stochastic linear system $(A_{\theta(k)}, B_{\theta(k)}, C_{\theta(k)})$. The discrete state process $\theta$, together with the discrete output process $o$, forms a finite state hidden Markov model [5]. In the next subsection, we study the concept of realization for stochastic jump-linear systems. To that end, we will assume that the stochastic processes $x$ and $y$ are wide-sense stationary and zero-mean, which is guaranteed under the following assumptions.

Assumption 1 The Markov process $\theta$ is stationary and ergodic; hence, for all $q \in Q$, $\sum_{s\in Q} p_{q,s} \pi_s = \pi_q$.

Assumption 2 There exist $n\times n$ matrices $P_q$, $q \in Q$, such that for each $q \in Q$

$$P_q = \sum_{s\in Q} p_{q,s}\left(A_s P_s A_s^T + \pi_s B_s B_s^T\right), \qquad (13)$$

$E[x(0)] = 0$, and $E[x(0)x(0)^T \chi(\theta(0) = q)] = P_q$, where $\chi$ denotes the indicator function, i.e. $\chi(A) = 1$ if the event $A$ is true, and $\chi(A) = 0$ otherwise.

These assumptions are not particularly strong. For instance, under suitable conditions [19], there is a unique collection of positive semi-definite matrices $P_q$ such that (13) holds.
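Assumptions 1 and 2 are linear in $\pi$ and in $(P_1, \ldots, P_d)$, so both can be solved directly. The sketch below (NumPy; names and example data hypothetical) stores the transition matrix as `p[q1, q2]` $= p_{q_1,q_2}$, so its columns sum to one, uses 0-based state indices, and assumes $I - L$ is nonsingular, where $L$ is the block matrix with blocks $p_{q,s}\, A_s \otimes A_s$; for the example matrices this holds because every $A_s$ has small norm.

```python
import numpy as np


def stationary_distribution(p):
    """Assumption 1: solve sum_s p[q, s] pi[s] = pi[q] with sum_q pi[q] = 1.
    Convention from the text: p[q1, q2] = Prob(theta(k+1) = q1 | theta(k) = q2)."""
    d = p.shape[0]
    M = np.vstack([p - np.eye(d), np.ones((1, d))])
    rhs = np.concatenate([np.zeros(d), [1.0]])
    return np.linalg.lstsq(M, rhs, rcond=None)[0]


def state_covariances(p, pi, A, B):
    """Solve (13) for P_1, ..., P_d as one linear system in the row-major vectors vec(P_q),
    using vec(A_s P_s A_s^T) = (A_s kron A_s) vec(P_s); assumes I - L is nonsingular."""
    d, n = len(A), A[0].shape[0]
    L = np.block([[p[q, s] * np.kron(A[s], A[s]) for s in range(d)] for q in range(d)])
    rhs = np.concatenate([sum(p[q, s] * pi[s] * (B[s] @ B[s].T) for s in range(d)).ravel()
                          for q in range(d)])
    vec_P = np.linalg.solve(np.eye(d * n * n) - L, rhs)
    return [vec_P[q * n * n:(q + 1) * n * n].reshape(n, n) for q in range(d)]


p = np.array([[0.9, 0.3], [0.1, 0.7]])          # columns sum to one
A = [np.array([[0.5, 0.1], [0.0, 0.4]]), np.array([[0.3, 0.0], [0.2, 0.6]])]
B = [np.array([[1.0], [0.0]]), np.array([[0.5], [1.0]])]
pi = stationary_distribution(p)
P = state_covariances(p, pi, A, B)
print(pi, P[0], P[1], sep="\n")
```

For this chain, Assumption 1 gives $\pi = (0.75, 0.25)^T$, and the resulting $P_q$ are positive semi-definite, as expected from the remark following Assumption 2.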

4.2 Realization of Stochastic Jump-Linear Systems

Recall the notion of weak realization for linear stochastic systems [9]. In this subsection, we will formulate a similar concept for stochastic jump-linear systems. Consider a stationary process $\tilde o$ taking values in the finite output space $O$, and a wide-sense stationary, zero-mean stochastic process $\tilde y$ taking values in the continuous output space $\mathbb{R}^p$. Let $O^+$ be the set of all nonempty words over $O$, i.e. $O^+ = O^* \setminus \{\epsilon\}$. For all $o_0, o_1, \ldots, o_k \in O$, $k \ge 0$, define the maps $P_{\tilde o} : O^+ \to \mathbb{R}$ and $C_{\tilde o,\tilde y} : O^+ \to \mathbb{R}^{p\times p}$ as

$$P_{\tilde o}(o_0 o_1 \cdots o_k) = \operatorname{Prob}(\tilde o(i) = o_i,\ i = 0, \ldots, k), \qquad C_{\tilde o,\tilde y}(o_0 o_1 \cdots o_k) = E\big[\tilde y(k)\tilde y(0)^T \chi(\tilde o(i) = o_i,\ i = 0, \ldots, k)\big]. \qquad (14)$$

Notice that the map $P_{\tilde o}$ gives the probability distribution of the stochastic process $\tilde o$, while the map $C_{\tilde o,\tilde y}$ gives the covariance of $\tilde y(k)$ and $\tilde y(0)$, provided that the process $\tilde o$ takes the values $o_0, \ldots, o_k$ at the first $k+1$ time instants. That is, $C_{\tilde o,\tilde y}$ collects information on the second-order moments of $\tilde y$.¹

Consider now a jump-linear system $\Sigma$ of the form (12) and recall the definition of the processes $y$ and $o$. If Assumption 1 and Assumption 2 hold, then $y$ is wide-sense stationary and zero-mean, and $o$ is stationary. Hence, $C_{o,y}$ and $P_o$ are well defined. In fact, they depend only on the matrices $A_q, B_q, C_q$, $q \in Q$, the discrete state-transition probabilities $p_{q_1,q_2}$, $q_1, q_2 \in Q$, the probability distribution $\pi$ of the initial discrete state, and the readout map $\lambda$. To emphasize that $C_{o,y}$ and $P_o$ depend only on the parameters of $\Sigma$, we will denote $C_{o,y}$ by $C_\Sigma$ and $P_o$ by $P_\Sigma$. These maps are important because they contain information about the probability distribution of the output processes generated by $\Sigma$. In fact, the following is true.

Proposition 1 If $x(0)$ and $v$ are Gaussian, $Q = O$, and $\lambda = \mathrm{id}$, i.e. the discrete state is fully observed, then the map $C_\Sigma$ uniquely determines the distribution of $y$.

The assumption that $Q = O$ is critical here. Intuitively, the more information about the discrete state is preserved by the discrete output, i.e. the closer $o$ is to $\theta$, the better the estimate of the probability distribution of $y$ provided by $C_\Sigma$.

With the notation above, we are now ready to define a notion of weak realization for stochastic jump-linear systems. Let $\tilde y$ be a wide-sense stationary, zero-mean, $\mathbb{R}^p$-valued process, and let $\tilde o$ be a stationary $O$-valued process. A stochastic jump-linear system $\Sigma$ is said to be a weak stochastic realization of $(\tilde y, \tilde o)$ if $P_{\tilde o} = P_\Sigma$ and $C_{\tilde o,\tilde y} = C_\Sigma$. Clearly, the fact that $\Sigma$ is a weak realization of the processes $(\tilde y, \tilde o)$ imposes some constraints on the probability distribution of the processes $\tilde o$ and $\tilde y$.
We will now show that $(\tilde y, \tilde o)$ has a weak realization by a stochastic jump-linear system only if certain families of formal power series are rational. We will construct two families of formal power series, $\Psi_{\tilde o,\tilde y}$ and $S_{\tilde o}$, based on the maps $C_{\tilde o,\tilde y}$ and $P_{\tilde o}$, respectively, as follows. Let $X = O = \{1, 2, \ldots, l\}$ and $J_{\tilde o,\tilde y} = \{1, \ldots, p\} \times O$ be, respectively, the alphabet and the index set over which the formal power series will be defined. For each integer $i = 1, \ldots, p$, letter $o \in O$, and word $w \in O^*$, let $C_{\tilde o,\tilde y,(i,o)}(w) \in \mathbb{R}^{lp}$ be the $i$th column of the matrix

$$\begin{bmatrix} C_{\tilde o,\tilde y}^T(ow1), & C_{\tilde o,\tilde y}^T(ow2), & \cdots, & C_{\tilde o,\tilde y}^T(owl) \end{bmatrix}^T \in \mathbb{R}^{lp\times p}. \qquad (15)$$

¹ Notice the similarity between $C_{\tilde o,\tilde y}$ and the generalized covariances in [18].

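We define the family of formal power series $\Psi_{\tilde o,\tilde y}$ associated with $C_{\tilde o,\tilde y}$ as

$$\Psi_{\tilde o,\tilde y} = \{C_{\tilde o,\tilde y,(i,o)} \in \mathbb{R}^{pl}\langle\langle O^*\rangle\rangle \mid (i,o) \in J_{\tilde o,\tilde y}\}. \qquad (16)$$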

The construction of $S_{\tilde o}$ is much simpler. We can simply identify the map $P_{\tilde o}$ with a formal power series $S_{\tilde o} \in \mathbb{R}\langle\langle O^*\rangle\rangle$ by defining $S_{\tilde o}(\epsilon) = 1$ for the empty word and $S_{\tilde o}(w) = P_{\tilde o}(w)$ for all $w \in O^+$. By abuse of notation, we will denote $S_{\tilde o}$ simply by $P_{\tilde o}$. We will call $(\Psi_{\tilde o,\tilde y}, P_{\tilde o})$ the pair of formal power series associated with $(\tilde y, \tilde o)$.

The next step is to define a pair of representations $(R_{\Sigma,C}, R_{\Sigma,D})$ associated with a jump-linear system $\Sigma$ of the form (12). We define the representation $R_{\Sigma,C}$ as follows. For each $o \in O$ and $q, q_1, q_2 \in Q$, let $C_q^o = C_q \chi(\lambda(q) = o) \in \mathbb{R}^{p\times n}$, $A^o_{q_1,q_2} = p_{q_1,q_2} \chi(\lambda(q_2) = o) A_{q_2} \in \mathbb{R}^{n\times n}$, and $B_q^o = P_q (C_q^o)^T \in \mathbb{R}^{n\times p}$, where $P_q \in \mathbb{R}^{n\times n}$ is the matrix defined in (13). Using this notation, define the matrices $\tilde A^o \in \mathbb{R}^{nd\times nd}$, $\tilde C \in \mathbb{R}^{lp\times nd}$ and $\tilde B^o \in \mathbb{R}^{nd\times p}$, $o \in O$, as

$$\tilde A^o = \begin{bmatrix} A^o_{1,1} & A^o_{1,2} & \cdots & A^o_{1,d} \\ A^o_{2,1} & A^o_{2,2} & \cdots & A^o_{2,d} \\ \vdots & \vdots & & \vdots \\ A^o_{d,1} & A^o_{d,2} & \cdots & A^o_{d,d} \end{bmatrix}, \quad \tilde C = \begin{bmatrix} C^1_1 & C^1_2 & \cdots & C^1_d \\ C^2_1 & C^2_2 & \cdots & C^2_d \\ \vdots & \vdots & & \vdots \\ C^l_1 & C^l_2 & \cdots & C^l_d \end{bmatrix}, \quad\text{and}\quad \tilde B^o = \begin{bmatrix} B^o_1 \\ B^o_2 \\ \vdots \\ B^o_d \end{bmatrix}.$$

Then, for each $o \in O$ and $i = 1, \ldots, p$, let $\tilde B_{(i,o)} \in \mathbb{R}^{nd}$ be the $i$th column of $\tilde B^o$, and define the set $\tilde B = \{\tilde B_{(i,o)} \in \mathbb{R}^{nd} \mid (i,o) \in J_{o,y}\}$ indexed by $J_{o,y} = \{1, \ldots, p\} \times O$. We define the representation $R_{\Sigma,C}$ as

$$R_{\Sigma,C} = (\mathbb{R}^{nd}, \{\tilde A^o\}_{o\in O}, \tilde B, \tilde C). \qquad (17)$$

As for the representation $R_{\Sigma,D}$, we define it as

$$R_{\Sigma,D} = (\mathbb{R}^d, \{M_o\}_{o\in O}, \{\pi\}, e), \qquad (18)$$

where $e = (1, 1, \ldots, 1) \in \mathbb{R}^{1\times d}$, and for each $o \in O$ and $q_1, q_2 \in Q$, the $(q_1, q_2)$ entry of the matrix $M_o \in \mathbb{R}^{d\times d}$ is defined from the transition probabilities of the process $\theta$ as $p_{q_1,q_2} \chi(\lambda(q_2) = o)$. Notice the similarity between the definition of $R_{\Sigma,D}$ and the definition of a quasi-realization for the finite state hidden Markov model formed by $(\theta, o)$ given in [5]. With these definitions, we have the following result.

Theorem 6 (Weak Realization) A jump-linear system $\Sigma$ of the form (12) is a weak realization of $(\tilde y, \tilde o)$ if and only if $R_{\Sigma,C}$ is a representation of $\Psi_{\tilde o,\tilde y}$ and $R_{\Sigma,D}$ is a representation of $P_{\tilde o}$. Hence, $(\tilde y, \tilde o)$ admits a weak stochastic realization by a stochastic jump-linear system only if $\Psi_{\tilde o,\tilde y}$ is a rational family of formal power series and $P_{\tilde o}$ is a rational formal power series.

An important implication of the theorem above is the following. If we know that the processes $(\tilde y, \tilde o)$ admit a weak stochastic realization by a stochastic jump-linear system, then we can find representations of $\Psi_{\tilde o,\tilde y}$ and $P_{\tilde o}$ from finite data. More precisely, if $\operatorname{rank} H_{\Psi_{\tilde o,\tilde y}} \le N$ and $\operatorname{rank} H_{P_{\tilde o}} \le N$, then from the values $C_{\tilde o,\tilde y}(o_0 \cdots o_k)$, $P_{\tilde o}(o_0 \cdots o_k)$,

k ≤ 2N + 1, o0 , . . . , ok ∈ O, we can construct the Hankel matrices HΨoe,ey ,N +1,N and HPoe ,N +1,N and compute a representation Reo,ey of Ψeo,ey and Reo of Peo respectively. e) has a weak stochastic realization by a jump-linear Note that if we know that (e y, o system Σ with a continuous state-space of dimension n and a discrete state-space of cardinality d, then we can take N ≥ nd > 0. Finally, recall that the problem of estimating Ceo,ey (o0 · · · ok ) and Peo (o0 · · · ok ) is a classical statistical problem. In particular, e and o e are ergodic, then these quantities can easily be estimated from a long enough if y sequence of measurements. 4.3 Distances between Stochastic Jump-Linear Systems Imagine we would like to compare the probability distributions of the output processes e1 ), and (e e2 ) of two stochastic jump-linear systems Σ1 and Σ2 , respectively. (e o1 , y o2 , y We can do that by using one of the distances defined in Section 3 to compare their associated pairs of families of formal power series: Ψeo1 ,ey1 with Ψeo2 ,ey2 and Peo1 with Peo2 . When Σ1 and Σ2 are known, we can construct the representations RΣi ,C and RΣi ,D , i = 1, 2. Then, we can use RΣi ,C , i = 1, 2 to compute the distance between Ψeo1 ,ey1 and Ψeo2 ,ey2 . Likewise, we can use RΣi ,D , i = 1, 2 to compute the distance between Peo1 and Peo2 . The advantage of using distances on formal power series is even more apparent if Σ1 and Σ2 are unknown, because the identification of stochastic jump-linear systems is poorly developed.2 Instead, one could use the estimates of finitely many values of Ceoi ,eyi and Peoi , i = 1, 2 to compute the minimal representations RC,i of Ψeoi ,eyi , i = 1, 2 and RD,i of Peoi , i = 1, 2, and use the computed representations to compare the behavior of the two systems. 
The procedure for computing such representations from their Hankel matrices is known [1, 15, 16] and it is likely to be computationally less costly than identifying the original jump-linear systems.
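The step from finitely many values to a representation can be sketched with a standard SVD-based Hankel factorization (in the spirit of Ho-Kalman) for a single scalar series over a finite alphabet. This is a swapped-in textbook technique, not necessarily the algorithm of Theorem 2 or of [1, 15, 16]. It assumes the convention $S(w) = c\,A_{w_1} \cdots A_{w_k}\,b$, that the callable `S` can be evaluated on all words of length at most $2N + 1$, and that $N$ is large enough for the truncated Hankel matrix to reach the full rank of $H_S$; the names `words`, `evaluate` and `hankel_realization` are hypothetical.

```python
import itertools

import numpy as np


def words(alphabet, max_len):
    """All words over `alphabet` of length 0, 1, ..., max_len, as tuples (empty word first)."""
    out = []
    for k in range(max_len + 1):
        out.extend(itertools.product(alphabet, repeat=k))
    return out


def evaluate(c, A, b, w):
    """S(w) = c A_{w_1} ... A_{w_k} b for a scalar representation (c, {A_x}, b)."""
    vec = b
    for x in reversed(w):
        vec = A[x] @ vec
    return float(c @ vec)


def hankel_realization(S, alphabet, N, tol=1e-9):
    """Recover a representation (c, A, b) of a scalar rational series from its values.

    S is queried on words of length <= 2N + 1. Assumes the truncated Hankel
    matrix below already has the full rank of H_S.
    """
    W = words(alphabet, N)
    # Hankel block H[u, v] = S(uv) with |u|, |v| <= N.
    H = np.array([[S(u + v) for v in W] for u in W])
    U, s, Vt = np.linalg.svd(H)
    r = int(np.sum(s > tol * s[0])) if s[0] > 0 else 0  # numerical rank
    sq = np.sqrt(s[:r])
    O = U[:, :r] * sq            # row u equals c A_u up to a change of basis
    R = sq[:, None] * Vt[:r, :]  # column v equals A_v b up to the same change of basis
    O_pinv, R_pinv = np.linalg.pinv(O), np.linalg.pinv(R)
    # Shifted block H_x[u, v] = S(u x v) = (c A_u) A_x (A_v b), hence A_x = O^+ H_x R^+.
    A = {}
    for x in alphabet:
        Hx = np.array([[S(u + (x,) + v) for v in W] for u in W])
        A[x] = O_pinv @ Hx @ R_pinv
    c = O[0, :]  # W[0] is the empty word
    b = R[:, 0]
    return c, A, b
```

The recovered $(c, \{A_x\}, b)$ agrees with the true representation up to an invertible change of basis, so it reproduces $S$ on all words. If `S` wraps empirical estimates of, say, $P_{\tilde{o}}(w)$, the threshold `tol` becomes a modelling choice, which is where the noise issues raised in Section 5 enter.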

² Even in the linear case, the full identification procedure for linear stochastic systems is computationally costly.

5 Discussion and Conclusion

In this paper, several definitions of distances for rational formal power series and rational representations were presented. It was argued that the results can be used to define metrics and topologies on the space of a wide variety of dynamical systems. The key argument is that for many classes of dynamical systems there is a correspondence between the input-output behaviors of the systems and rational formal power series. In particular, this is the case for a number of hybrid systems and some nonlinear systems. To the best of our knowledge, the problem of defining distances between hybrid systems has not been addressed so far. In the case of nonlinear systems, there are some results on the topological and geometric structure of the space of bilinear systems; see, for example, [16], where the algebraic variety structure of that space was described. In contrast, there is a fair amount of literature on distances between linear systems and on the topological and geometric structure of the space of linear systems.

Note that the input-output maps and output processes of linear systems correspond to families of formal power series over the one-letter alphabet $X = \{z\}$. Because of this correspondence, any distance on rational families of formal power series yields a distance between linear systems. Spaces of equivalence classes of minimal linear systems have been studied before, in both the stochastic and the deterministic settings. In fact, it was shown in [7] that, for each $N$, the set $M_N^{m,p}$ of all equivalence classes of minimal linear systems of dimension $N$ with $m$ inputs and $p$ outputs forms both an analytic manifold and an algebraic variety, and admits a natural topology. Here, two minimal linear systems belong to the same equivalence class if they are algebraically similar. Denote by $M_N^{m,p,a}$ the set of equivalence classes of stable minimal linear systems. Then it was shown in [8, 6] that $M_N^{m,p,a}$ and $M_N^{m,p}$ are diffeomorphic as analytic manifolds, and that the topology of $M_N^{m,p,a}$ as an analytic manifold can be obtained from the metric induced by the $H_2$ norm. It is easy to see that the distance induced by the $H_2$ norm is a particular case of the distance induced by the norm (9), if we identify equivalence classes of minimal linear systems with equivalence classes of minimal rational representations (two minimal representations are equivalent if they are isomorphic). More recent papers on distances between stochastic linear systems can be found in [20, 21, 12]. In particular, [12] introduces the trace distance between linear systems and gives a formula to compute it. Surprisingly, the distance induced by the norm (9) is closely related to the trace distance.

In practical situations, the families of formal power series are likely to encode the external behaviors of some dynamical systems. In such cases, the available information is either a rational representation of each family of formal power series, or a finite collection of values of the formal power series.
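When representations are available, an $\ell_2$-type distance can be computed without enumerating words. For two scalar series with representations $(c_i, \{A^{(i)}_x\}, b_i)$ and the convention $S_i(w) = c_i A^{(i)}_{w_1} \cdots A^{(i)}_{w_k} b_i$, the matrix $X = \sum_w (A^{(1)}_w b_1)(A^{(2)}_w b_2)^T$ satisfies $X = b_1 b_2^T + \sum_{x \in X} A^{(1)}_x X (A^{(2)}_x)^T$, and $\langle S_1, S_2 \rangle = \sum_w S_1(w) S_2(w) = c_1 X c_2^T$. For the one-letter alphabet and $S_1 = S_2$ this is the discrete Lyapunov equation for the reachability Gramian, and $c X c^T = \sum_k (c A^k b)^2$ is the familiar $H_2$-type sum. The sketch below assumes this plain, unweighted $\ell_2$ inner product; it does not reproduce the weights $J$ or the exact norm (9) of Section 3, and it uses $\rho(\sum_x A^{(2)}_x \otimes A^{(1)}_x) < 1$ as a stand-in convergence test. The function names are hypothetical.

```python
import numpy as np


def l2_inner(rep1, rep2):
    """<S1, S2> = sum over all words w of S1(w) S2(w) for scalar rational series.

    Each rep is (c, A, b) with S(w) = c A_{w_1} ... A_{w_k} b and A a dict
    letter -> square matrix; both reps must use the same alphabet.
    """
    c1, A1, b1 = rep1
    c2, A2, b2 = rep2
    if set(A1) != set(A2):
        raise ValueError("representations must share the same alphabet")
    n1, n2 = len(b1), len(b2)
    # Column-major vectorization: vec(A1 X A2^T) = (A2 kron A1) vec(X).
    K = sum(np.kron(A2[x], A1[x]) for x in A1)
    if np.max(np.abs(np.linalg.eigvals(K))) >= 1.0:
        raise ValueError("spectral radius >= 1: convergence of the sum is not guaranteed")
    # Solve X = b1 b2^T + sum_x A1[x] X A2[x]^T, i.e. X = sum_w (A1_w b1)(A2_w b2)^T.
    rhs = np.outer(b1, b2).flatten(order="F")
    X = np.linalg.solve(np.eye(n1 * n2) - K, rhs).reshape((n1, n2), order="F")
    return float(c1 @ X @ c2)


def l2_distance(rep1, rep2):
    """Distance ||S1 - S2|| induced by the l2 inner product above."""
    d2 = l2_inner(rep1, rep1) - 2.0 * l2_inner(rep1, rep2) + l2_inner(rep2, rep2)
    return float(np.sqrt(max(d2, 0.0)))  # clip tiny negative round-off
```

For example, with the one-letter series $S_1(z^k) = 0.5^k$ and $S_2(z^k) = 0.2^k$, `l2_inner` returns $1/(1 - 0.25)$ for $\langle S_1, S_1 \rangle$ and $1/(1 - 0.1)$ for $\langle S_1, S_2 \rangle$, matching the geometric sums.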
If we are given a rational representation $R_\Psi$ of each family $\Psi$, then any of the distances described in Section 3 can easily be computed. Notice that if $\Psi$ is square summable, then we can minimize $R_\Psi$, and the resulting minimal representation $R_{m,\Psi}$ will be stable, due to Theorem 3. Hence, we can compute the distance induced by $\langle \cdot, \cdot \rangle_J$ by solving the corresponding Sylvester equation from Theorem 5. Thus, apart from computational complexity, there are few restrictions on using any of the distances, and one may choose different distances depending on the particular application and the computational cost. Note that the computational complexity of the distances we presented remains an open issue. However, the cost of computing distances of type $d_{F,N,J}$ between formal power series is exponential in $N$ if the underlying alphabet $X$ has more than one element, since there are $\sum_{k=0}^{N} |X|^k$ words of length at most $N$.

If only a finite collection of values is available, then choosing the right distance is more complex. Assume that we know the values of the elements of the families for all words of length at most $N$. Then there are two cases to be considered. If $N \ge 2M + 1$, where $M = \operatorname{rank} H_\Psi$ for all families $\Psi$ involved, then we may apply the algorithm described in Theorem 2 to compute a minimal representation $R_\Psi$ of $\Psi$. If $\Psi$ is a square-summable family, then the resulting representation $R_\Psi$ is stable. Even in this ideal case, where $N$ is large enough, several issues require attention. First of all, computing $R_\Psi$ might be computationally expensive. If we want to compute one of the distances $d_{F,M,J}$, $M \le N$, then we might do better by using the data directly, rather than computing the representations first and then computing the distance from the representations. However, computing the representations $R_\Psi$ might be a good idea if we want to use the distances induced by the norm (9). Moreover, if the data are noisy, we do not know whether our algorithm will still produce a stable representation, which is a prerequisite for the existence of the solution of the Sylvester equation. In the second case, if we cannot ensure that $N$ is large enough, then the algorithm from Theorem 2 might fail to produce a representation of $\Psi$.

Acknowledgements. This work was supported by grants NSF EHS-05-09101, NSF CAREER IIS-04-47739, and ONR N00014-05-1083.

References

1. Petreczky, M.: Realization Theory of Hybrid Systems. PhD thesis, Vrije Universiteit, Amsterdam (2006). Available online: http://www.cis.jhu.edu/~mihaly
2. Isidori, A.: Nonlinear Control Systems. Springer-Verlag (1989)
3. Sontag, E.D.: Realization theory of discrete-time nonlinear systems: Part I – the bounded case. IEEE Transactions on Circuits and Systems 26(4) (1979)
4. Ball, J.A., Groenewald, G., Malakorn, T.: Structured noncommutative multidimensional linear systems. In: Proceedings of the Sixteenth International Symposium on Mathematical Theory of Networks and Systems (2004)
5. Anderson, B.D.O.: The realization problem for hidden Markov models. Mathematics of Control, Signals, and Systems 12 (1999) 80–120
6. Peeters, R.: System Identification Based on Riemannian Geometry: Theory and Algorithms. PhD thesis, Free University, Amsterdam (1994)
7. Hazewinkel, M.: Moduli and canonical forms for linear dynamical systems II: The topological case. Mathematical Systems Theory 10 (1977) 363–385
8. Hanzon, B.: On the differentiable manifold of fixed order stable linear systems. Systems and Control Letters 13 (1989) 345–352
9. Caines, P.: Linear Stochastic Systems. John Wiley and Sons, New York (1988)
10. Bissacco, A., Chiuso, A., Ma, Y., Soatto, S.: Recognition of human gaits. In: IEEE Conference on Computer Vision and Pattern Recognition, Volume 2 (2001) 52–58
11. Doretto, G., Chiuso, A., Wu, Y., Soatto, S.: Dynamic textures. International Journal of Computer Vision 51(2) (2003) 91–109
12. Vishwanathan, S., Smola, A., Vidal, R.: Binet-Cauchy kernels on dynamical systems and its application to the analysis of dynamic scenes. International Journal of Computer Vision (2006)
13. Bissacco, A., Chiuso, A., Soatto, S.: Classification and recognition of dynamical models. Technical Report CSD-TR 060020, UCLA (2006)
14. Berstel, J., Reutenauer, C.: Rational series and their languages. EATCS Monographs on Theoretical Computer Science, Springer-Verlag (1984)
15. Sontag, E.D.: Polynomial Response Maps. Volume 13 of Lecture Notes in Control and Information Sciences. Springer-Verlag (1979)
16. Sontag, E.D.: A remark on bilinear systems and moduli spaces of instantons. Systems and Control Letters 9(5) (1987) 361–367
17. Callier, F.M., Desoer, C.A.: Linear System Theory. Springer-Verlag (1991)
18. Desai, U.: Realization of bilinear stochastic systems. IEEE Transactions on Automatic Control 31(2) (1986)
19. Costa, O., Fragoso, M., Marques, R.: Discrete-Time Markov Jump Linear Systems. Springer-Verlag, London (2005)
20. Martin, A.: A metric for ARMA processes. IEEE Transactions on Signal Processing 48(4) (2000) 1164–1170
21. De Cock, K., De Moor, B.: Subspace angles and distances between ARMA models. Systems and Control Letters 46(4) (2002) 265–270