Bayesian Optimization for Likelihood-Free Inference

Michael Gutmann
University of Edinburgh
https://sites.google.com/site/michaelgutmann

14th September 2016

Reference

For further information:
- M.U. Gutmann and J. Corander. Bayesian optimization for likelihood-free inference of simulator-based statistical models. Journal of Machine Learning Research, 17(125):1–47, 2016.
- J. Lintusaari, M.U. Gutmann, R. Dutta, S. Kaski, and J. Corander. Fundamentals and recent developments in approximate Bayesian computation. Systematic Biology, in press, 2016.


Overall goal

- Inference: given data y^o, learn about the properties of its source.
- Enables decision making, predictions, ...

[Figure: a data source with unknown properties produces an observation y^o in data space; inference reasons back from the observation to the unknown properties.]


Approach

- Set up a model M(θ) with potential properties θ (hypotheses).
- See which θ are in line with the observed data y^o.

[Figure: as before, now with a model M(θ) alongside the data source; inference compares the model output with the observation y^o.]


The likelihood function L(θ)

- Measures the agreement between θ and the observed data y^o.
- Probability to generate data like y^o if hypothesis θ holds.

[Figure: the model M(θ) generates data y | θ; "data like y^o" are those falling within an ε-neighborhood of the observation y^o.]


Performing statistical inference

- If L(θ) is known, inference is straightforward.
- Maximum likelihood estimation: θ̂ = argmax_θ L(θ)
- Bayesian inference: p(θ | y^o) ∝ p(θ) × L(θ)
  (posterior ∝ prior × likelihood)
  Allows us to learn from data by updating probabilities; a toy worked example follows below.
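As a toy worked example of the update posterior ∝ prior × likelihood, evaluated on a grid — not from the slides; it assumes a Gaussian model with known unit variance:

```python
import numpy as np

theta = np.linspace(-5.0, 5.0, 1001)        # grid of hypotheses
prior = np.exp(-0.5 * theta**2)             # unnormalized N(0, 1) prior

y_obs = 1.3                                 # a single observed data point
likelihood = np.exp(-0.5 * (y_obs - theta)**2)  # L(theta) for y | theta ~ N(theta, 1)

posterior = prior * likelihood              # Bayes' theorem, up to a constant
posterior /= np.trapz(posterior, theta)     # normalize on the grid

print("posterior mean:", np.trapz(theta * posterior, theta))  # ≈ y_obs / 2
```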


Likelihood-free inference

Statistical inference for models where
1. the likelihood function is too costly to compute,
2. sampling – simulating data – from the model is possible.


Importance of likelihood-free inference

One reason: such generative / simulator-based models occur widely, e.g. in
- Astrophysics: simulating the formation of galaxies, stars, or planets
- Evolutionary biology: simulating the evolution of life
- Neuroscience: simulating neural circuits
- Computer vision: simulating natural scenes
- Health science: simulating the spread of an infectious disease
- ...

[Figure: simulated neural activity in rat somatosensory cortex; from https://bbp.epfl.ch/nmc-portal]

Flavors of likelihood-free inference

- There are several flavors of likelihood-free inference; in the Bayesian setting, e.g.
  - approximate Bayesian computation (ABC)
  - synthetic likelihood (Wood, 2010)
- General idea: identify the values of the parameters of interest θ for which simulated data resemble the observed data.
- Simulated data resemble the observed data if some distance measure d ≥ 0 is small.
- Here: focus on ABC; see the JMLR paper for synthetic likelihood.


Meta ABC algorithm

- Let y^o be the observed data. Iterate many times:
  1. Sample θ from a proposal distribution q(θ).
  2. Sample y | θ according to the model.
  3. Compute the distance d(y, y^o) between simulated and observed data.
  4. Retain θ if d(y, y^o) ≤ ε.
- Different choices for q(θ) give different algorithms.
- Produces samples from the (approximate) posterior when ε is small.

A minimal code sketch of this scheme is given below.
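For concreteness, a sketch of the meta algorithm with the prior as proposal distribution (rejection ABC). The toy simulator, the uniform prior, and the sample-mean distance are illustrative assumptions, not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulator(theta, size=50):
    # Hypothetical toy simulator: data are Normal(theta, 1) samples.
    return rng.normal(theta, 1.0, size=size)

def distance(y, y_obs):
    # Distance between summary statistics (here: sample means).
    return abs(y.mean() - y_obs.mean())

y_obs = simulator(1.5)                      # pretend this is the observation
eps, n_iter = 0.1, 100_000

accepted = []
for _ in range(n_iter):
    theta = rng.uniform(-5, 5)              # proposal q(theta) = prior
    y = simulator(theta)                    # sample y | theta from the model
    if distance(y, y_obs) <= eps:           # retain theta if d(y, y_obs) <= eps
        accepted.append(theta)

print(f"acceptance rate: {len(accepted) / n_iter:.3%}")
```

Note how a small ε drives the acceptance rate down; this is exactly the computational bottleneck discussed later in the talk.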


Implicit likelihood approximation

Likelihood: probability to generate data like y^o if hypothesis θ holds.

[Figure: the model M(θ) generates data sets y_θ^(1), ..., y_θ^(6) in data space; those falling within an ε-ball around y^o count as hits, and L(θ) ≈ proportion of such outcomes.]

L(θ) ≈ (1/N) ∑_{i=1}^{N} 1[ d(y_θ^{(i)}, y^o) ≤ ε ]

A code version of this approximation follows below.
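In code, the implicit approximation is one line on top of the simulator. This continues the toy sketch above (simulator, distance, and y_obs are the hypothetical ones defined there):

```python
def approx_likelihood(theta, y_obs, eps=0.1, n_sim=300):
    # L(theta) ≈ (1/N) * sum of 1[ d(y_theta^(i), y_obs) <= eps ]
    hits = sum(distance(simulator(theta), y_obs) <= eps for _ in range(n_sim))
    return hits / n_sim

# e.g. approx_likelihood(1.5, y_obs) should exceed approx_likelihood(4.0, y_obs)
```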

Example: Bacterial infections in child care centers

- Likelihood intractable for cross-sectional data.
- But generating data from the model is possible.
- Parameters of interest:
  - rate of infections within a center
  - rate of infections from outside
  - competition between the strains

[Figure: simulated colonization data — strain presence per individual over time in several child care centers. (Numminen et al, 2013)]

Example: Bacterial infections in child care centers

- Data: Streptococcus pneumoniae colonization for 29 centers.
- Inference with population Monte Carlo ABC.
- Reveals strong competition between different bacterial strains.
- Expensive:
  - 4.5 days on a cluster with 200 cores
  - more than one million simulated data sets

[Figure: prior and posterior density of the competition parameter, with the axis running from "strong" to "weak" competition; the posterior concentrates on strong competition.]

Why is the ABC algorithm so expensive?

1. It rejects most samples when ε is small.
2. It does not make assumptions about the shape of L(θ).
3. It does not use all the information available.
4. It aims at equal accuracy for all parameters.

L(θ) ≈ (1/N) ∑_{i=1}^{N} 1[ d(y_θ^{(i)}, y^o) ≤ ε ]

[Figure: realized distances as a function of the competition parameter, showing their variability, the average distance, the threshold ε, and the resulting (rescaled) approximate likelihood function; N = 300 simulations per parameter value.]

Proposed solution (Gutmann and Corander, 2016)

1. It rejects most samples when ε is small.
   ⇒ Don't reject samples – learn from them.
2. It does not make assumptions about the shape of L(θ).
   ⇒ Model the distances; assume the average distance is smooth.
3. It does not use all the information available.
   ⇒ Use Bayes' theorem to update the model.
4. It aims at equal accuracy for all parameters.
   ⇒ Prioritize parameter regions with small distances.

An equivalent strategy applies to inference with the synthetic likelihood.


Modeling (points 1 & 2)

(i)

I

Data are tuples (θi , di ), where di = d(yθ , y o )

I

Model the conditional distribution of d given θ ˆ Estimated model yields approximation L(θ) for any choice of 

I

b (d ≤  | θ) ˆ L(θ) ∝ Pr

I

b is probability under the estimated model. Pr Here: Use (log) Gaussian process as model (with squared exponential covariance function)

I Approach not restricted to Gaussian processes.
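A sketch of this modeling step, assuming scikit-learn's GP regressor on log distances with a squared exponential (RBF) kernel plus a noise term; under the Gaussian predictive distribution, P̂r(d ≤ ε | θ) is available in closed form via the normal CDF. Function names and kernel settings are illustrative choices, not the paper's exact configuration:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_distance_model(thetas, dists):
    # Regress log d on theta; squared exponential kernel plus observation noise.
    gp = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(0.1),
                                  normalize_y=True)
    gp.fit(np.asarray(thetas).reshape(-1, 1), np.log(dists))
    return gp

def model_likelihood(gp, theta_grid, eps):
    # L_hat(theta) ∝ Pr_hat(d <= eps | theta)
    #             = Phi((log eps - mu(theta)) / sigma(theta)) under the GP.
    mu, sigma = gp.predict(theta_grid.reshape(-1, 1), return_std=True)
    return norm.cdf((np.log(eps) - mu) / sigma)
```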


Data acquisition (points 3 & 4) I

Samples of θ could be obtained by sampling from the prior or some adaptively constructed proposal distribution

I

Give priority to regions in the parameter space where distance d tends to be small.

I

Use Bayesian optimization to find such regions

I

Here: Use lower confidence bound acquisition function

(e.g. Cox

and John, 1992; Srinivas et al, 2012)

s ηt2 vt (θ) At (θ) = µt (θ) − |{z} | {z } | {z } post mean

(1)

weight post var

t: number of samples acquired so far I Approach not restricted to this acquisition function.
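One acquisition step with this lower confidence bound criterion, continuing the GP sketch above; the weight schedule for η_t² is an assumed choice in the spirit of Srinivas et al. (2012), not the paper's exact setting:

```python
def acquire_next_theta(gp, theta_grid, t, eta2=None):
    # A_t(theta) = mu_t(theta) - sqrt(eta_t^2 * v_t(theta)); minimize it,
    # since small distances are good.
    if eta2 is None:
        eta2 = 2.0 * np.log(t**2 + 1.0)   # assumed exploration schedule
    mu, sigma = gp.predict(theta_grid.reshape(-1, 1), return_std=True)
    return theta_grid[np.argmin(mu - np.sqrt(eta2) * sigma)]
```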


Bayesian optimization for likelihood-free inference

[Figure: GP model of the distance as a function of the competition parameter, based on 2, 3, and 4 data points (posterior mean with 5%–95% percentile bands), together with the acquisition function whose minimum marks the next parameter to try. The loop alternates between updating the model via Bayes' theorem and acquiring data by trading off exploration against exploitation.]

A compact code sketch of this loop follows.
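Putting the sketches together, a minimal version of the loop depicted above (toy simulator, one-dimensional θ, grid-based acquisition; all assumptions as in the earlier blocks):

```python
theta_grid = np.linspace(-5.0, 5.0, 501)

# Initialize with a few parameters drawn from the (assumed uniform) prior.
thetas = list(rng.uniform(-5.0, 5.0, size=3))
dists = [distance(simulator(th), y_obs) for th in thetas]

for t in range(1, 50):
    gp = fit_distance_model(thetas, dists)              # update model (Bayes' theorem)
    theta_next = acquire_next_theta(gp, theta_grid, t)  # explore vs exploit
    thetas.append(theta_next)
    dists.append(distance(simulator(theta_next), y_obs))

L_hat = model_likelihood(gp, theta_grid, eps=0.1)       # approximate likelihood on the grid
```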

Example: Bacterial infections in child care centers

- Comparison of the proposed approach with a standard population Monte Carlo ABC approach.
- Roughly equal results using 1000 times fewer simulations:
  4.5 days with 200 cores → 90 minutes with seven cores.

[Figure: posterior mean (solid lines) and credibility intervals (shaded areas or dashed lines) of the competition parameter versus computational cost (log10), for the developed fast method and the standard method. (Gutmann and Corander, 2016)]

Example: Bacterial infections in child care centers

- Comparison of the proposed approach with a standard population Monte Carlo ABC approach.
- Roughly equal results using 1000 times fewer simulations.

[Figure: posterior means (solid lines) and credibility intervals (shaded areas or dashed lines) of the internal and external infection parameters versus computational cost (log10), for the developed fast method and the standard method.]

Further benefits

- The proposed method makes the inference more efficient.
  - Allowed us to perform a far more comprehensive data analysis than with the standard approach (Numminen et al, 2016).
- Enables inference for models which were out of reach until now.
  - A model of evolution where simulating a single data set took us 12–24 hours (Marttinen et al, 2015).
- Enables easier assessment of parameter identifiability for complex models.
  - A model of the transmission dynamics of tuberculosis (Lintusaari et al, 2016).


Open questions

- Model: How to best model the distance between simulated and observed data?
- Acquisition function: Can we find strategies which are optimal for parameter inference?
- Efficient high-dimensional inference: Can we use the approach to infer the joint distribution of 1000 variables?

See the JMLR paper for a discussion.


Summary

- Topic: inference for models where the likelihood is intractable but sampling is possible.
- Inference principle: find parameter values for which the distance between simulated and observed data is small.
- Problem considered: computational cost.
- Proposed approach: combine statistical modeling of the distance with decision making under uncertainty (Bayesian optimization).
- Outcome: the approach increases the efficiency of the inference by several orders of magnitude.


References

- M.U. Gutmann and J. Corander. Bayesian optimization for likelihood-free inference of simulator-based statistical models. Journal of Machine Learning Research, 17(125):1–47, 2016.
- J. Lintusaari, M.U. Gutmann, R. Dutta, S. Kaski, and J. Corander. Fundamentals and recent developments in approximate Bayesian computation. Systematic Biology, in press, 2016.
- E. Numminen, M.U. Gutmann, M. Shubin, et al. The impact of host metapopulation structure on the population genetics of colonizing bacteria. Journal of Theoretical Biology, 396:53–62, 2016.
- J. Lintusaari, M.U. Gutmann, S. Kaski, and J. Corander. On the identifiability of transmission dynamic models for infectious diseases. Genetics, 202(3):911–918, 2016.
- P. Marttinen, N.J. Croucher, M.U. Gutmann, J. Corander, and W.P. Hanage. Recombination produces coherent bacterial species clusters in both core and accessory genomes. Microbial Genomics, 1(5), 2015.
- E. Numminen et al. Estimating the transmission dynamics of Streptococcus pneumoniae from strain prevalence data. Biometrics, 69(3):748–757, 2013.
- N. Srinivas, A. Krause, S.M. Kakade, and M. Seeger. Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 58(5):3250–3265, 2012.
- S.N. Wood. Statistical inference for noisy nonlinear ecological dynamic systems. Nature, 466:1102–1104, 2010.
- D. Cox and S. John. A statistical method for global optimization. Proc. IEEE Conference on Systems, Man and Cybernetics, 2:1241–1246, 1992.
