Efficient Variational Inference for Gaussian Process Regression Networks

Trung Nguyen and Edwin Bonilla
Australian National University (ANU), National ICT Australia (NICTA)

Presented by: Simon O'Callaghan


Motivation

• Multi-output regression
• Complex correlations

[Figure from Wilson et al.]

Outline

• Gaussian process regression networks
• Variational inference for GPRNs
• Experiments
• Summary


Preliminary: Gaussian Processes

• A Gaussian process (GP) is specified by its mean and covariance functions:

$f(x) \sim \mathcal{GP}(m(x), k(x, x'))$
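To make the definition concrete, here is a minimal sketch (not from the slides; the squared-exponential kernel and all hyperparameter values are illustrative choices) of drawing functions from a zero-mean GP prior:

```python
# A minimal sketch, assuming a squared-exponential kernel and 1-D inputs:
# sample functions from a zero-mean GP prior.
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance k(x, x') for 1-D inputs."""
    sqdist = (X1[:, None] - X2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale ** 2)

X = np.linspace(0, 5, 100)            # input locations x_1, ..., x_N
K = rbf_kernel(X, X)                  # covariance matrix K[n, m] = k(x_n, x_m)
K += 1e-8 * np.eye(len(X))            # jitter for numerical stability
f_samples = np.random.multivariate_normal(np.zeros(len(X)), K, size=3)
```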

Gaussian process regression networks

• Motivation: prediction of outputs with complex correlations
• Generative perspective

Inference for GPRNs (1)

• Notation: $X = \{x_n\}_{n=1}^N$, $\mathcal{D} = \{y_n = y(x_n)\}_{n=1}^N$, $\theta = \{\theta_f, \theta_w, \sigma_y, \sigma_f = 0\}$

• Bayesian formulation:

• prior: $p(\mathbf{f}, \mathbf{w} \mid \theta_f, \theta_w) = \prod_{j} \mathcal{N}(\mathbf{f}_j; \mathbf{0}, \mathbf{K}_f) \prod_{i,j} \mathcal{N}(\mathbf{w}_{ij}; \mathbf{0}, \mathbf{K}_w)$

• likelihood: $p(\mathcal{D} \mid \mathbf{f}, \mathbf{w}, \sigma_y) = \prod_{n} \mathcal{N}(\mathbf{y}_n; \mathbf{W}(x_n)\mathbf{f}(x_n), \sigma_y^2 \mathbf{I})$

• Each latent and weight function is an independent GP
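To illustrate the generative model, here is a short sketch (hyperparameter values assumed; reuses the illustrative rbf_kernel from the GP snippet above) that samples the latent functions, the weight functions, and the outputs:

```python
# A sketch of the GPRN generative process under assumed hyperparameters:
# Q latent GPs f_j, P x Q weight GPs w_ij, outputs y(x_n) = W(x_n) f(x_n) + noise.
import numpy as np

N, P, Q, sigma_y = 50, 3, 2, 0.1
X = np.linspace(0, 5, N)
Kf = rbf_kernel(X, X, lengthscale=1.0) + 1e-8 * np.eye(N)   # covariance from theta_f
Kw = rbf_kernel(X, X, lengthscale=2.0) + 1e-8 * np.eye(N)   # covariance from theta_w

f = np.random.multivariate_normal(np.zeros(N), Kf, size=Q)        # shape (Q, N)
W = np.random.multivariate_normal(np.zeros(N), Kw, size=(P, Q))   # shape (P, Q, N)

# For every input x_n: Y[:, n] = W(x_n) @ f(x_n) + Gaussian observation noise
Y = np.einsum('pqn,qn->pn', W, f) + sigma_y * np.random.randn(P, N)
```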

Inference for GPRNs (2)

• Posterior:

$p(\mathbf{f}, \mathbf{w} \mid \mathcal{D}, \theta) = \dfrac{p(\mathcal{D} \mid \mathbf{f}, \mathbf{w}) \, p(\mathbf{f}, \mathbf{w})}{\int p(\mathcal{D} \mid \mathbf{f}, \mathbf{w}) \, p(\mathbf{f}, \mathbf{w}) \, d\mathbf{f} \, d\mathbf{w}}$

• Intractable, so approximate inference is needed
• Bayesian inference for f and w, maximum likelihood for hyperparameters
• Variational message passing was used in the original paper

Inference for GPRNs (3)

• Variational inference: find the closest tractable approximation of the posterior (in KL divergence)

[Figure from Bishop 2006]

• Optimization: minimizing the KL divergence is equivalent to maximizing the evidence lower bound (ELBO):

$\mathcal{L}(q) = \underbrace{\mathbb{E}_q[\log p(\mathcal{D} \mid \mathbf{f}, \mathbf{w})] + \mathbb{E}_q[\log p(\mathbf{f}, \mathbf{w})]}_{\text{expected log joint}} + \underbrace{H_q[q(\mathbf{f}, \mathbf{w})]}_{\text{entropy}}$
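The equivalence follows from the standard decomposition of the log marginal likelihood (as in Bishop 2006); since $\log p(\mathcal{D})$ is constant with respect to $q$, maximizing $\mathcal{L}(q)$ minimizes the KL term:

$\log p(\mathcal{D}) = \underbrace{\mathbb{E}_q[\log p(\mathcal{D}, \mathbf{f}, \mathbf{w})] + H_q[q(\mathbf{f}, \mathbf{w})]}_{\mathcal{L}(q)} + \mathrm{KL}\big(q(\mathbf{f}, \mathbf{w}) \,\|\, p(\mathbf{f}, \mathbf{w} \mid \mathcal{D})\big)$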

Inference for GPRNs (4)

• Mean-field approximation:

$q(\mathbf{f}, \mathbf{w}) = \prod_{j=1}^{Q} \underbrace{\mathcal{N}(\mathbf{f}_j; \boldsymbol{\mu}_{f_j}, \boldsymbol{\Sigma}_{f_j})}_{q(\mathbf{f}_j)} \prod_{i=1}^{P} \underbrace{\mathcal{N}(\mathbf{w}_{ij}; \boldsymbol{\mu}_{w_{ij}}, \boldsymbol{\Sigma}_{w_{ij}})}_{q(\mathbf{w}_{ij})}$

where $\mathbf{f}_j = [f_j(x_1), \ldots, f_j(x_N)]^T$ and $\mathbf{w}_{ij} = [w_{ij}(x_1), \ldots, w_{ij}(x_N)]^T$

• $O(N^2)$ variational parameters for the covariance matrix of each factor

Inference for GPRNs (5)

• Mean-field results
• Exact ELBO and analytical solutions for the variational parameters:

$\boldsymbol{\Sigma}_{f_j} = \Big( \mathbf{K}_f^{-1} + \frac{1}{\sigma_y^2} \sum_{i=1}^{P} \mathrm{diag}\big(\boldsymbol{\mu}_{w_{ij}} \circ \boldsymbol{\mu}_{w_{ij}} + \mathrm{Var}(\mathbf{w}_{ij})\big) \Big)^{-1}$

$\boldsymbol{\Sigma}_{w_{ij}} = \Big( \mathbf{K}_w^{-1} + \frac{1}{\sigma_y^2} \mathrm{diag}\big(\boldsymbol{\mu}_{f_j} \circ \boldsymbol{\mu}_{f_j} + \mathrm{Var}(\mathbf{f}_j)\big) \Big)^{-1}$

• Only $O(N)$ parameters needed for each factor
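A sketch of these covariance solutions (shapes and variable names assumed for illustration; this is not the authors' code). Because the data-dependent term is diagonal, each covariance is determined by the $N$ diagonal entries, hence $O(N)$ parameters per factor:

```python
# A sketch of the analytical mean-field covariance solutions, assuming
# precomputed kernel matrices and per-point means/variances.
import numpy as np

def Sigma_f(Kf, mu_w, var_w, sigma_y):
    """Sigma_{f_j}: mu_w, var_w have shape (P, N) -- the means and variances
    of w_ij(x_n) for this latent index j across all outputs i."""
    d = (mu_w ** 2 + var_w).sum(axis=0) / sigma_y ** 2        # (N,) diagonal
    return np.linalg.inv(np.linalg.inv(Kf) + np.diag(d))

def Sigma_w(Kw, mu_f, var_f, sigma_y):
    """Sigma_{w_ij}: mu_f, var_f have shape (N,) -- the mean and variance
    of f_j(x_n) for the latent function paired with this weight."""
    d = (mu_f ** 2 + var_f) / sigma_y ** 2                    # (N,) diagonal
    return np.linalg.inv(np.linalg.inv(Kw) + np.diag(d))
```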

Inference for GPRNs (6)

• Nonparametric variational (NPV) approximation:

$q(\mathbf{f}, \mathbf{w}) = \frac{1}{K} \sum_{k=1}^{K} \overbrace{\prod_{j=1}^{Q} \underbrace{\mathcal{N}(\mathbf{f}_j; \boldsymbol{\mu}_{f_j}^{(k)}, \sigma_k^2 \mathbf{I})}_{q(\mathbf{f}_j^{(k)})} \prod_{i=1}^{P} \underbrace{\mathcal{N}(\mathbf{w}_{ij}; \boldsymbol{\mu}_{w_{ij}}^{(k)}, \sigma_k^2 \mathbf{I})}_{q(\mathbf{w}_{ij}^{(k)})}}^{q^{(k)}(\mathbf{f}, \mathbf{w})}$

• Each component is an isotropic Gaussian
• Only $O(KN)$ variational parameters (K < 5)

Inference for GPRNs (7)

• NPV results
• Analytical lower bound for the ELBO
• The previous method used a second-order approximation

$\mathcal{L}(q) \geq \frac{1}{K} \sum_{k=1}^{K} \underbrace{\left( \mathbb{E}_{q^{(k)}}[\log p(\mathcal{D} \mid \mathbf{f}, \mathbf{w})] + \mathbb{E}_{q^{(k)}}[\log p(\mathbf{f}, \mathbf{w})] \right)}_{\text{analytically tractable as in MF}} \underbrace{- \frac{1}{K} \sum_{k=1}^{K} \log \frac{1}{K} \sum_{j=1}^{K} \mathcal{N}\big(\boldsymbol{\mu}^{(k)}; \boldsymbol{\mu}^{(j)}, (\sigma_k^2 + \sigma_j^2)\mathbf{I}\big)}_{\text{lower bound on } H_q[q(\mathbf{f}, \mathbf{w})]}$
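A sketch of the analytic entropy lower bound (the flattened parameterization and function name are assumptions for illustration): each mixture component $k$ is an isotropic Gaussian $\mathcal{N}(\boldsymbol{\mu}^{(k)}, \sigma_k^2 \mathbf{I})$ over the stacked $(\mathbf{f}, \mathbf{w})$ vector of dimension $D$:

```python
# A sketch of the NPV entropy lower bound, assuming component means are
# stacked into a (K, D) array and variances into a (K,) array.
import numpy as np
from scipy.special import logsumexp

def entropy_lower_bound(mu, s2):
    """Returns -(1/K) sum_k log( (1/K) sum_j N(mu_k; mu_j, (s2_k + s2_j) I) )."""
    K, D = mu.shape
    sq = ((mu[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)   # (K, K) pairwise distances
    var = s2[:, None] + s2[None, :]                              # (K, K) summed variances
    log_pdf = -0.5 * (D * np.log(2 * np.pi * var) + sq / var)    # log N(mu_k; mu_j, var * I)
    return -np.mean(logsumexp(log_pdf, axis=1) - np.log(K))
```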

Inference for GPRNs (8): Summary

• Two families of distributions for variational inference
• $O(N)$ variational parameters (cf. $O(N^2)$ for the standard variational Gaussian)
• Approximations of relatively complex posteriors
• Closed-form ELBO, which allows model selection and learning of hyperparameters

Experiments (1)

• Datasets:
• Jura: prediction of heavy metal concentrations (Cd, Ni, Zn)
• Concrete: prediction of concrete qualities (slump, flow, compressive strength)

[Figure: multi-output setup for both datasets; "?" marks the held-out outputs to be predicted]

Experiments (2)

[Figure: bar charts comparing IGP, MF, NPV1, NPV2, and NPV3. Left: mean absolute error (MAE) on Jura. Right: standardized mean squared error (SMSE) on Concrete, per output (Slump, Flow, Compressive Strength).]

Summary

• GPRNs for input-dependent (adaptive) correlations
• Two tractable and statistically efficient families of variational distributions
• Future work:
• Simplify the GPRN model for less intensive inference without losing its flexibility
• Extend/apply GPRNs to other multi-task problems, e.g., classification, preferences
• Scalability issues

Questions?

• Thank you!

