Seattle, WA March 5, 2010 joint work with Marc Deisenroth and Carl Edward Rasmussen

Turner (Engineering, Cambridge)

State-Space Inference and Learning with Gaussian Processes


Outline

Motivation for dynamical systems
Expectation Maximization (EM)
Gaussian Processes (GP)
Inference
Learning
Results


Motivation

[Figure: block diagram with the blocks system, measurement device (sensor), filter, and controller. Signal labels: position, velocity; g(position, noise); p(position, velocity); throttle.]

Estimating (latent) states from noisy measurements.


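Setup

[Figure: graphical model. Latent states $x_{t-1}$, $x_t$, $x_{t+1}$ are linked by $f$; each state generates a measurement through $g$ (the figure labels the measurements $z_{t-1}$, $z_t$, $z_{t+1}$).]

\[
x_t = f(x_{t-1}) + w, \qquad w \sim \mathcal{N}(0, Q)
\]
\[
y_t = g(x_t) + v, \qquad v \sim \mathcal{N}(0, R)
\]

$x$: latent state, $y$: measurement
learning: find $f$ and $g$ using $y_{1:T}$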
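To make the generative model concrete, the following is a minimal sketch that draws a scalar trajectory from it. The function name `simulate` and the particular choices of `f`, `g`, `Q`, and `R` are assumptions made for illustration; they are not taken from the talk.

```python
import math
import random

def simulate(f, g, x0, T, Q, R, seed=0):
    """Draw x_1..x_T and y_1..y_T from x_t = f(x_{t-1}) + w, y_t = g(x_t) + v (scalar case)."""
    rng = random.Random(seed)
    xs, ys = [], []
    x = x0
    for _ in range(T):
        x = f(x) + rng.gauss(0.0, math.sqrt(Q))  # system noise w ~ N(0, Q)
        y = g(x) + rng.gauss(0.0, math.sqrt(R))  # measurement noise v ~ N(0, R)
        xs.append(x)
        ys.append(y)
    return xs, ys

# Hypothetical transition and measurement functions, for illustration only.
f = lambda x: 0.9 * x + math.sin(x)
g = lambda x: 0.5 * x

xs, ys = simulate(f, g, x0=0.0, T=50, Q=0.1, R=0.2)
```

In the learning problem only `ys` would be available; `xs`, `f`, and `g` are what has to be inferred.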


The Goal

Learn the nonlinear dynamical system (NLDS) in a nonparametric and probabilistic fashion with the EM algorithm. This requires inference (filtering and smoothing) and prediction in the NLDS, which we do using moment matching.
filtering: find the distribution $p(x_t \mid y_{1:t})$
smoothing: find the distribution $p(x_t \mid y_{1:T})$
prediction: find the distribution $p(y_{t+1} \mid y_{1:t})$

Gaussian process inference and learning (GPIL) algorithm


Expectation Maximization

EM iterates between two steps, the E-step and the M-step.
E-step (or inference step): find the posterior distribution $p(X \mid Y, \Theta)$.
M-step: maximize the expected log-likelihood $Q = \mathbb{E}_X[\log p(X, Y \mid \Theta)]$ with respect to $\Theta$.


Pictorial introduction to Gaussian process regression

[Figure: $f(x)$ against $x$, with $x$ from −5 to 5 and $f(x)$ from −4 to 4.]
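Since the plots themselves are not reproduced here, a small numerical sketch of GP regression may help. It assumes a squared-exponential kernel with fixed hyperparameters and made-up training data; the function names `se_kernel` and `gp_posterior` are illustrative and not from the talk.

```python
import numpy as np

def se_kernel(a, b, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-2):
    """Posterior mean and variance of f at x_test, given noisy targets y_train."""
    K = se_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    Ks = se_kernel(x_test, x_train)
    L = np.linalg.cholesky(K)                       # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(se_kernel(x_test, x_test)) - np.sum(v ** 2, axis=0)
    return mean, var

# Made-up training data, for illustration only.
x_train = np.array([-4.0, -2.0, 0.0, 1.5, 3.0])
y_train = np.sin(x_train)
x_test = np.linspace(-5.0, 5.0, 101)
mean, var = gp_posterior(x_train, y_train, x_test)
```

The posterior variance shrinks near the training inputs and grows back towards the prior variance away from them, which is the model uncertainty that the GP represents.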


Existing Methods for nonlinear systems

Extended Kalman Filter (EKF) [Maybeck, 1979].
Unscented Kalman Filter (UKF) [Julier and Uhlmann, 1997].
Assumed Density Filter (ADF) [Boyen and Koller, 1998, Opper, 1998].
Radial Basis Functions (RBF) [Ghahramani and Roweis, 1999].
Neural networks [Honkela and Valpola, 2005].
Other GP approaches: GPDM and GPBF [Wang et al., 2008, Ko and Fox, 2009b].
GPs for filtering in the context of the UKF, the EKF [Ko and Fox, 2009a], and the ADF [Deisenroth et al., 2009].


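The GP-ADF

[Figure: two chains sharing $f(\cdot)$ and $g(\cdot)$. Training: states $x_{\tau-1}$, $x_\tau$, $x_{\tau+1}$ with measurements $y_{\tau-1}$, $y_\tau$, $y_{\tau+1}$. Test: states $x_{t-1}$, $x_t$, $x_{t+1}$ with measurements $y_{t-1}$, $y_t$, $y_{t+1}$.]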


Advantages of GPIL

Model $f$ and $g$ with GPs: $f \sim \mathcal{GP}_f$, $g \sim \mathcal{GP}_g$.
GPs account for three uncertainties:
  system noise
  measurement noise
  model uncertainty
Integrates out the latent states (not MAP), unlike [Wang et al., 2008, Ko and Fox, 2009b].
Tractable algorithm for approximate inference (smoothing) in GP state-space models.
Learning without ground-truth observations $x_i$ of the latent states.

[Figure: $f(x)$ against $x$, with $x$ from −5 to 5 and $f(x)$ from −4 to 4.]


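E-Step: Forward sweep

[Figure: the time update followed by the measurement update.]

1) Time update: predict the next hidden state, going from $p(x_{t-1} \mid z_{1:t-1})$ to $p(x_t \mid z_{1:t-1})$ through $f$.
2) Predict the measurement: compute $p(z_t \mid z_{1:t-1})$ from $p(x_t \mid z_{1:t-1})$ through $g$.
3) Measurement update: measure $z_t$ and compute the hidden state posterior $p(x_t \mid z_{1:t})$.

The backward sweep is also analytic.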
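Once the joint distribution of the hidden state and the measurement has been approximated by a Gaussian, the hidden state posterior in step 3 follows from standard Gaussian conditioning. The sketch below shows only that conditioning step; the function name `measurement_update` and its argument names are assumptions, and the moments passed in are made-up numbers rather than GP predictions.

```python
import numpy as np

def measurement_update(m_x, P_x, m_z, S_z, C_xz, z):
    """Condition a joint Gaussian over (x_t, z_t) on the observed z_t.

    m_x, P_x : mean and covariance of p(x_t | z_{1:t-1})
    m_z, S_z : mean and covariance of p(z_t | z_{1:t-1})
    C_xz     : cross-covariance between x_t and z_t
    """
    gain = np.linalg.solve(S_z, C_xz.T).T   # C_xz S_z^{-1} (S_z is symmetric)
    m_post = m_x + gain @ (z - m_z)
    P_post = P_x - gain @ C_xz.T
    return m_post, P_post

# Made-up 1-D example: z = x + v with x ~ N(0, 1) and v ~ N(0, 1).
m_post, P_post = measurement_update(
    m_x=np.array([0.0]), P_x=np.array([[1.0]]),
    m_z=np.array([0.0]), S_z=np.array([[2.0]]),
    C_xz=np.array([[1.0]]), z=np.array([2.0]),
)
```

For this linear example the result coincides with the Kalman filter update: the posterior mean moves halfway towards the measurement and the variance halves.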


Predictions Using Moment Matching

[Figure: $x_{t+1}$ against $(x_t, u_t)$.]
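The idea of moment matching can be illustrated by sampling: push a Gaussian input through a nonlinear map and keep only the mean and variance of the output, which define the Gaussian approximation. This Monte Carlo version is a stand-in chosen for brevity and is not necessarily how the talk computes the moments; the function name `moment_match` and the map `math.sin` are assumptions.

```python
import math
import random
import statistics

def moment_match(h, mean, var, n_samples=20000, seed=0):
    """Approximate the mean and variance of h(x) for x ~ N(mean, var) by sampling."""
    rng = random.Random(seed)
    outputs = [h(rng.gauss(mean, math.sqrt(var))) for _ in range(n_samples)]
    return statistics.fmean(outputs), statistics.pvariance(outputs)

# Hypothetical nonlinear map standing in for the transition function.
out_mean, out_var = moment_match(math.sin, mean=0.5, var=0.25)
```

The output distribution is generally not Gaussian; moment matching replaces it with the Gaussian that has the same first two moments.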

M-Step

[Figure: graphical model. Latent states $x_{t-1}$, $x_t$, $x_{t+1}$ are linked by $f$; measurements $z_{t-1}$, $z_t$, $z_{t+1}$ are generated through $g$.]


Pseudo-training data

[Figure: labelled points $\alpha_1, \dots, \alpha_7$ and $\beta_1, \dots, \beta_7$; both axes run from −2 to 2.]


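Why We Need Pseudo-training Data

[Figure: graphical model with latent states $x_{t-1}$, $x_t$, $x_{t+1}$ and measurements $y_{t-1}$, $y_t$, $y_{t+1}$, together with the pseudo-training sets $\alpha, \beta$ (top) and $\xi, \upsilon$ (bottom).]

$\mathcal{GP}_f$ and $\mathcal{GP}_g$ are not full GPs, but rather sparse GPs.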


Why We Need Pseudo-training Data

$x_t \to x_{t+1}$ given $\alpha$ and $\beta$ is a GP prediction.
$x_{t-1}$ is the (uncertain) test input.
$\alpha$ and $\beta$ are a standard GP training set.
Markovian property: $x_{t+1} \perp x_{t-1} \mid x_t, \alpha, \beta$.
Without a pseudo-training set, $x_{t+1} \perp x_{t-1} \mid x_t, f$ conditions on the $\infty$-dimensional object $f$, which is intractable.


The Auxiliary Function

We decompose $Q$ into

\[
Q = \mathbb{E}_X[\log p(X, Y \mid \Theta)]
  = \mathbb{E}_X[\log p(x_1 \mid \Theta)]
  + \mathbb{E}_X\Bigg[ \sum_{t=2}^{T} \underbrace{\log p(x_t \mid x_{t-1}, \Theta)}_{\text{Transition}}
  + \sum_{t=1}^{T} \underbrace{\log p(y_t \mid x_t, \Theta)}_{\text{Measurement}} \Bigg]
\]

using the factorization properties of the model.


The Transition Contribution

\[
\mathbb{E}_X[\log p(x_t \mid x_{t-1}, \Theta)]
= -\frac{1}{2} \sum_{i=1}^{M} \Bigg(
  \underbrace{\mathbb{E}_X\!\left[ \frac{(x_{ti} - \mu_i(x_{t-1}))^2}{\sigma_i^2(x_{t-1})} \right]}_{\text{Data Fit Term}}
  + \underbrace{\mathbb{E}_X\!\left[ \log \sigma_i^2(x_{t-1}) \right]}_{\text{Complexity Term}}
\Bigg)
\]

We approximate the data fit

\[
\mathbb{E}_X\!\left[ \frac{(x_{ti} - \mu_i(x_{t-1}))^2}{\sigma_i^2(x_{t-1})} \right]
\approx \frac{\mathbb{E}_X\!\left[ (x_{ti} - \mu_i(x_{t-1}))^2 \right]}{\mathbb{E}_X\!\left[ \sigma_i^2(x_{t-1}) \right]}
\]

and lower bound the EM lower bound with

\[
\mathbb{E}_X\!\left[ \log \sigma_i^2(x_{t-1}) \right] \le \log \mathbb{E}_X\!\left[ \sigma_i^2(x_{t-1}) \right].
\]
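The inequality on the complexity term is Jensen's inequality applied to the concave logarithm. Because the term enters with a negative sign, replacing it by the larger quantity can only decrease the objective, which is why the result is still a lower bound. A quick numerical check, using made-up positive values in place of the predictive variances $\sigma_i^2(x_{t-1})$ (the variable names are illustrative):

```python
import math
import random

rng = random.Random(0)

# Made-up positive samples standing in for sigma_i^2(x_{t-1}) under the posterior.
sigma2 = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(10000)]

mean_of_log = sum(math.log(s) for s in sigma2) / len(sigma2)  # E[log sigma^2]
log_of_mean = math.log(sum(sigma2) / len(sigma2))             # log E[sigma^2]

# Jensen: E[log s] <= log E[s], hence -E[log s] >= -log E[s].
gap = log_of_mean - mean_of_log
```

The gap is zero only when the variance does not depend on $x_{t-1}$, so the bound is tight when the posterior over $x_{t-1}$ is concentrated.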


Synthetic Data

[Figure: $f(x)$ against $x$ for $x$ from −3 to 3; legend: ground truth, posterior mean, pseudo targets.]

Snow Data

[Figure: snowfall in log-cm against $x$; legend: posterior mean, pseudo targets.]

Quantitative Results

Method    NLL synth.        RMSE synth.   NLL real          RMSE real
TIM       2.21 ± 0.0091     2.18          1.47 ± 0.0257     1.01
Kalman    2.07 ± 0.0103     1.91          1.29 ± 0.0273     0.783
ARGP      1.01 ± 0.0170     0.663         1.25 ± 0.0298     0.793
NDFA      2.20 ± 0.00515    2.18          14.6 ± 0.374      1.06
GPDM      3330 ± 386        2.13          N/A               N/A
GPIL ?    0.917 ± 0.0185    0.654         0.684 ± 0.0357    0.769
UKF       4.55 ± 0.133      2.19          1.84 ± 0.0623     0.938
EKF       1.23 ± 0.0306     0.665         1.46 ± 0.0542     0.905
GP-UKF    6.15 ± 0.649      2.06          3.03 ± 0.357      0.884


Conclusions

GPs give a flexible distribution over nonlinear dynamical systems.
Filtering and smoothing are based on moment matching.
The dynamical system is learned even without ground-truth latent states.


References

Boyen, X. and Koller, D. (1998). Tractable inference for complex stochastic processes. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI 1998), pages 33–42, San Francisco, CA, USA. Morgan Kaufmann.

Deisenroth, M. P., Huber, M. F., and Hanebeck, U. D. (2009). Analytic moment-based Gaussian process filtering. In Bottou, L. and Littman, M. L., editors, Proceedings of the 26th International Conference on Machine Learning, pages 225–232, Montreal, Canada. Omnipress.

Ghahramani, Z. and Roweis, S. (1999). Learning nonlinear dynamical systems using an EM algorithm. In Advances in Neural Information Processing Systems 11, pages 599–605.

Honkela, A. and Valpola, H. (2005). Unsupervised variational Bayesian learning of nonlinear models. In Saul, L. K., Weiss, Y., and Bottou, L., editors, Advances in Neural Information Processing Systems 17, pages 593–600. MIT Press, Cambridge, MA.

Julier, S. J. and Uhlmann, J. K. (1997). A new extension of the Kalman filter to nonlinear systems. In Proceedings of AeroSense: 11th Symposium on Aerospace/Defense Sensing, Simulation and Controls, pages 182–193, Orlando, FL, USA.

Ko, J. and Fox, D. (2009a). GP-BayesFilters: Bayesian filtering using Gaussian process prediction and observation models. Autonomous Robots, 27(1):75–90.

Ko, J. and Fox, D. (2009b). Learning GP-BayesFilters via Gaussian Process Latent Variable Models. In Proceedings of Robotics: Science and Systems, Seattle, USA.

Maybeck, P. S. (1979). Stochastic Models, Estimation, and Control, volume 141 of Mathematics in Science and Engineering. Academic Press, Inc.

Opper, M. (1998).
