14th September 2016

Reference

For further information:

- M.U. Gutmann and J. Corander. Bayesian optimization for likelihood-free inference of simulator-based statistical models. Journal of Machine Learning Research, 17(125): 1–47, 2016.
- J. Lintusaari, M.U. Gutmann, R. Dutta, S. Kaski, and J. Corander. Fundamentals and Recent Developments in Approximate Bayesian Computation. Systematic Biology, in press, 2016.

Michael Gutmann

BOLFI

2 / 23

Overall goal

- Inference: Given data y^o, learn about properties of its source
- Enables decision making, predictions, ...

[Figure: data space with the observation y^o; data source with unknown properties; inference]


Approach

- Set up a model with potential properties θ (hypotheses)
- See which θ are in line with the observed data y^o

[Figure: data space with the observation y^o; data source with unknown properties, represented by the model M(θ); inference]


The likelihood function L(θ)

- Measures agreement between θ and the observed data y^o
- Probability to generate data like y^o if hypothesis θ holds

[Figure: data generation y|θ from the model M(θ); data space with the observation y^o and a neighborhood of size ε; data source with unknown properties]


Performing statistical inference

- If L(θ) is known, inference is straightforward
- Maximum likelihood estimation:
  $$\hat{\theta} = \arg\max_{\theta} L(\theta)$$
- Bayesian inference:
  $$p(\theta \mid y^o) \propto p(\theta) \times L(\theta)$$
  posterior ∝ prior × likelihood
  Allows us to learn from data by updating probabilities
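When the likelihood is available in closed form, both estimators on this slide can be computed directly. The following is a minimal sketch under assumptions that are not part of the talk: a binomial model with hypothetical data (7 successes in 10 trials), a uniform prior, and a simple grid over θ.

```python
from math import comb

# Hypothetical data (not from the talk): k successes in n Bernoulli trials.
n, k = 10, 7

def likelihood(theta):
    """Binomial likelihood L(theta) of observing k successes in n trials."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

# Grid of candidate parameter values in [0, 1].
grid = [i / 100 for i in range(101)]

# Maximum likelihood estimate: the grid point that maximizes L(theta).
theta_mle = max(grid, key=likelihood)

# Bayesian inference with a uniform prior (an assumption):
# posterior is proportional to prior times likelihood, then normalized.
prior = [1.0 for _ in grid]
unnormalized = [p * likelihood(t) for p, t in zip(prior, grid)]
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]  # probability mass per grid point

posterior_mean = sum(t * p for t, p in zip(grid, posterior))
print(theta_mle, round(posterior_mean, 3))
```

The rest of the talk concerns the case where `likelihood` cannot be written down like this and only a simulator is available.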


Likelihood-free inference

Statistical inference for models where
1. the likelihood function is too costly to compute
2. sampling (simulating data) from the model is possible


Importance of likelihood-free inference

One reason: Such generative / simulator-based models occur widely

- Astrophysics: Simulating the formation of galaxies, stars, or planets
- Evolutionary biology: Simulating the evolution of life
- Neuroscience: Simulating neural circuits
- Computer vision: Simulating natural scenes
- Health science: Simulating the spread of an infectious disease
- ...

[Figure: Simulated neural activity in rat somatosensory cortex (figure from https://bbp.epfl.ch/nmc-portal)]


Flavors of likelihood-free inference

- There are several flavors of likelihood-free inference. In the Bayesian setting, e.g.
  - Approximate Bayesian computation (ABC)
  - Synthetic likelihood (Wood, 2010)
- General idea: Identify the values of the parameters of interest θ for which simulated data resemble the observed data
- Simulated data resemble the observed data if some distance measure d ≥ 0 is small.

Here: Focus on ABC; see the JMLR paper for synthetic likelihood


Meta ABC algorithm

- Let y^o be the observed data.
- Iterate many times:
  1. Sample θ from a proposal distribution q(θ)
  2. Sample y|θ according to the model
  3. Compute distance d(y, y^o) between simulated and observed data
  4. Retain θ if d(y, y^o) ≤ ε
- Different choices for q(θ) give different algorithms
- Produces samples from the (approximate) posterior when ε is small
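The four steps of the meta algorithm can be sketched as plain rejection ABC. Everything specific here is an assumption for illustration and not taken from the talk: the toy simulator (draws from a Normal(θ, 1)), the distance (difference of sample means), and the choice of the prior Uniform(-5, 5) as proposal q(θ).

```python
import random
import statistics

random.seed(0)

def simulate(theta, n=20):
    """Toy simulator (an assumption): n draws from Normal(theta, 1)."""
    return [random.gauss(theta, 1.0) for _ in range(n)]

def distance(y, y_obs):
    """Distance between data sets: absolute difference of sample means."""
    return abs(statistics.fmean(y) - statistics.fmean(y_obs))

def rejection_abc(y_obs, epsilon, n_iter):
    """Meta ABC algorithm with the prior Uniform(-5, 5) as proposal q."""
    retained = []
    for _ in range(n_iter):
        theta = random.uniform(-5.0, 5.0)  # 1. sample theta from q(theta)
        y = simulate(theta)                # 2. sample y | theta from the model
        d = distance(y, y_obs)             # 3. compute d(y, y_obs)
        if d <= epsilon:                   # 4. retain theta if d <= epsilon
            retained.append(theta)
    return retained

n_iter = 20000
y_obs = simulate(2.0)                      # stand-in for observed data
samples = rejection_abc(y_obs, epsilon=0.2, n_iter=n_iter)
acceptance_rate = len(samples) / n_iter
print(len(samples), statistics.fmean(samples), acceptance_rate)
```

Because the proposal equals the prior in this sketch, the retained values are draws from the approximate posterior; other proposals (as in population Monte Carlo ABC) would additionally need importance weights.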


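Implicit likelihood approximation

Likelihood: Probability to generate data like y^o if hypothesis θ holds

[Figure: the model M(θ) generates outcomes y_θ^(1), ..., y_θ^(6) in data space; outcomes within distance ε of y^o are shown in green]

Likelihood L(θ) ≈ proportion of green outcomes

$$L(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left(d(y_\theta^{(i)}, y^o) \le \varepsilon\right)$$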
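The approximation above is a Monte Carlo average of an indicator, which takes only a few lines to compute. This is a sketch with a hypothetical one-dimensional simulator (a single draw from Normal(θ, 1)) and the absolute difference as distance; neither is taken from the talk.

```python
import random

def simulate(theta, rng):
    """Toy simulator (an assumption): one draw from Normal(theta, 1)."""
    return rng.gauss(theta, 1.0)

def approx_likelihood(theta, y_obs, epsilon, n_sim, rng):
    """Fraction of n_sim simulated outcomes with d(y, y_obs) <= epsilon."""
    hits = sum(abs(simulate(theta, rng) - y_obs) <= epsilon for _ in range(n_sim))
    return hits / n_sim

rng = random.Random(1)
y_obs = 0.5
for theta in (-2.0, 0.5, 3.0):
    print(theta, approx_likelihood(theta, y_obs, epsilon=0.1, n_sim=5000, rng=rng))
```

Values of θ that often generate data close to y^o receive a larger estimate. Note that every evaluation of the approximate likelihood costs N simulator runs.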


Example: Bacterial infections in child care centers

- Likelihood intractable for cross-sectional data
- But generating data from the model is possible

Parameters of interest:
- rate of infections within a center
- rate of infections from outside
- competition between the strains

[Figure: colonization data over time; each panel shows strain versus individual (Numminen et al, 2013)]

Example: Bacterial infections in child care centers

- Data: Streptococcus pneumoniae colonization for 29 centers
- Inference with Population Monte Carlo ABC
- Reveals strong competition between different bacterial strains

Expensive:
- 4.5 days on a cluster with 200 cores
- More than one million simulated data sets

[Figure: prior and posterior probability density functions of the competition parameter (horizontal axis from 0 to 1), with regions labeled "strong" and "weak" competition]

Why is the ABC algorithm so expensive?

1. It rejects most samples when ε is small
2. It does not make assumptions about the shape of L(θ)
3. It does not use all information available
4. It aims at equal accuracy for all parameters

$$L(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left(d(y_\theta^{(i)}, y^o) \le \varepsilon\right)$$

[Figure: approximate likelihood function (rescaled) for the competition parameter, N = 300, shown together with the distances, their average and variability, and the threshold ε; horizontal axis: competition parameter]
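The first point can be made concrete by counting how many simulations survive as the threshold shrinks. This is a small sketch with a hypothetical setup (Uniform(-5, 5) proposal, one draw from Normal(θ, 1) as simulator, absolute difference as distance), none of which comes from the talk.

```python
import random

rng = random.Random(0)
y_obs = 0.0
n_sim = 10000

# Distances for theta drawn from a Uniform(-5, 5) proposal and a toy
# Normal(theta, 1) simulator (both are assumptions for illustration).
distances = [abs(rng.gauss(rng.uniform(-5.0, 5.0), 1.0) - y_obs) for _ in range(n_sim)]

# Fraction of simulations that would be retained for each threshold.
acceptance = {}
for epsilon in (1.0, 0.1, 0.01):
    acceptance[epsilon] = sum(d <= epsilon for d in distances) / n_sim
    print(f"epsilon = {epsilon}: acceptance rate = {acceptance[epsilon]:.4f}")
```

In this toy setting the acceptance rate falls roughly in proportion to the threshold, so a tenfold tighter approximation discards about ten times more simulator runs.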

Proposed solution (Gutmann and Corander, 2016)

1. It rejects most samples when ε is small
   ⇒ Don't reject samples; learn from them
2. It does not make assumptions about the shape of L(θ)
   ⇒ Model the distances, assume average distance is smooth
3. It does not use all information available
   ⇒ Use Bayes' theorem to update the model
4. It aims at equal accuracy for all parameters
   ⇒ Prioritize parameter regions with small distances

An equivalent strategy applies to inference with synthetic likelihood.


Modeling (points 1 & 2)

- Data are tuples (θ_i, d_i), where d_i = d(y_θ^(i), y^o)
- Model the conditional distribution of d given θ
- Estimated model yields approximation L̂(θ) for any choice of ε:
  $$\hat{L}(\theta) \propto \widehat{\Pr}(d \le \varepsilon \mid \theta)$$
  where $\widehat{\Pr}$ is probability under the estimated model.
- Here: Use (log) Gaussian process as model (with squared exponential covariance function)
- Approach not restricted to Gaussian processes.
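A minimal sketch of this modeling step, under several assumptions that are not from the talk: the distance itself (not its logarithm) is modeled, the kernel hyperparameters are fixed by hand instead of being estimated, and the training data come from a hypothetical quadratic distance with Gaussian noise. Under a Gaussian predictive distribution, the probability that d ≤ ε is a normal CDF.

```python
import math
import numpy as np

LENGTH_SCALE = 0.5   # hypothetical hyperparameters, fixed by hand
SIGNAL_VAR = 4.0
NOISE_VAR = 0.01

def sq_exp_kernel(a, b):
    """Squared exponential covariance between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return SIGNAL_VAR * np.exp(-0.5 * (diff / LENGTH_SCALE) ** 2)

def gp_predict(theta_train, d_train, theta_test):
    """Posterior mean and variance of the average distance at theta_test."""
    offset = d_train.mean()  # constant prior mean, set to the data average
    K = sq_exp_kernel(theta_train, theta_train) + NOISE_VAR * np.eye(len(theta_train))
    K_star = sq_exp_kernel(theta_test, theta_train)
    mean = offset + K_star @ np.linalg.solve(K, d_train - offset)
    var = SIGNAL_VAR - np.sum(K_star * np.linalg.solve(K, K_star.T).T, axis=1)
    return mean, np.maximum(var, 0.0)

def approx_likelihood(mean, var, epsilon):
    """Pr(d <= epsilon | theta) under the Gaussian predictive distribution."""
    z = (epsilon - mean) / np.sqrt(var + NOISE_VAR)
    return np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])

# Hypothetical training tuples (theta_i, d_i): noisy quadratic distance.
rng = np.random.default_rng(0)
theta_train = np.linspace(-2.0, 2.0, 15)
d_train = (theta_train - 0.5) ** 2 + 0.1 * rng.standard_normal(15)

theta_grid = np.linspace(-2.0, 2.0, 81)
mean, var = gp_predict(theta_train, d_train, theta_grid)
lik = approx_likelihood(mean, var, epsilon=0.2)
theta_peak = theta_grid[np.argmax(lik)]
print(theta_peak)
```

Once the model is fitted, changing ε only requires re-evaluating `approx_likelihood`; no new simulations are needed.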


Data acquisition (points 3 & 4)

- Samples of θ could be obtained by sampling from the prior or some adaptively constructed proposal distribution
- Give priority to regions in the parameter space where distance d tends to be small.
- Use Bayesian optimization to find such regions
- Here: Use lower confidence bound acquisition function (e.g. Cox and John, 1992; Srinivas et al, 2012)
  $$A_t(\theta) = \underbrace{\mu_t(\theta)}_{\text{post mean}} - \sqrt{\underbrace{\eta_t^2}_{\text{weight}} \, \underbrace{v_t(\theta)}_{\text{post var}}} \qquad (1)$$
  t: number of samples acquired so far
- Approach not restricted to this acquisition function.
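The lower confidence bound is simple to evaluate once the posterior mean and variance are available. The sketch below uses made-up posterior summaries on a five-point grid and two hand-picked weights; the talk does not specify these numbers, and the schedule for the weight is not reproduced here.

```python
import math

def lower_confidence_bound(post_mean, post_var, eta_sq):
    """A_t(theta) = mu_t(theta) - sqrt(eta_t^2 * v_t(theta)) on a grid."""
    return [m - math.sqrt(eta_sq * v) for m, v in zip(post_mean, post_var)]

# Hypothetical posterior summaries on a grid of five parameter values.
theta_grid = [0.00, 0.05, 0.10, 0.15, 0.20]
post_mean = [3.0, 1.5, 1.0, 1.2, 2.5]      # predicted average distance
post_var = [0.01, 0.04, 0.01, 1.00, 0.25]  # uncertainty of the prediction

for eta_sq in (0.0, 4.0):
    acq = lower_confidence_bound(post_mean, post_var, eta_sq)
    next_theta = theta_grid[acq.index(min(acq))]  # minimizer of the acquisition
    print(f"eta^2 = {eta_sq}: next parameter to try = {next_theta}")
```

With these numbers, a weight of zero selects θ = 0.10, the point with the smallest predicted distance (pure exploitation), while a weight of 4 selects θ = 0.15, where the prediction is slightly worse but far more uncertain (exploration).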


Bayesian optimization for likelihood-free inference

[Figure: models of the distance as a function of the competition parameter, based on 2, 3, and 4 data points (posterior mean and quantiles from 5% to 95%), together with the acquisition function and the next parameter to try. Annotations: exploration vs exploitation; data and model are combined via Bayes' theorem]
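The loop illustrated by this slide (fit the model, minimize the acquisition function, simulate at the chosen parameter, update) can be sketched end to end. This is an illustration under stated assumptions, not the implementation from the JMLR paper: the simulator is replaced by a hypothetical noisy quadratic distance with its minimum at θ = 0.1, the Gaussian process hyperparameters and the weight η² = 4 are fixed by hand, and the acquisition function is minimized on a grid.

```python
import numpy as np

LENGTH_SCALE, SIGNAL_VAR, NOISE_VAR = 0.05, 1.0, 0.01  # hand-picked values
rng = np.random.default_rng(0)

def simulate_distance(theta):
    """Toy stand-in (an assumption) for simulating data and computing d."""
    return 100.0 * (theta - 0.1) ** 2 + 0.05 * rng.standard_normal()

def kernel(a, b):
    """Squared exponential covariance function for 1-D inputs."""
    return SIGNAL_VAR * np.exp(-0.5 * ((a[:, None] - b[None, :]) / LENGTH_SCALE) ** 2)

def gp_predict(theta, d, grid):
    """GP posterior mean and variance of the distance on the grid."""
    offset = d.mean()
    K = kernel(theta, theta) + NOISE_VAR * np.eye(len(theta))
    K_star = kernel(grid, theta)
    mean = offset + K_star @ np.linalg.solve(K, d - offset)
    var = SIGNAL_VAR - np.sum(K_star * np.linalg.solve(K, K_star.T).T, axis=1)
    return mean, np.maximum(var, 0.0)

grid = np.linspace(0.0, 0.2, 201)
theta = np.array([0.0, 0.2])                      # two initial data points
d = np.array([simulate_distance(t) for t in theta])

for _ in range(18):
    mean, var = gp_predict(theta, d, grid)        # update the model with all data
    acq = mean - np.sqrt(4.0 * var)               # lower confidence bound
    theta_next = grid[np.argmin(acq)]             # next parameter to try
    theta = np.append(theta, theta_next)
    d = np.append(d, simulate_distance(theta_next))

mean, var = gp_predict(theta, d, grid)
theta_best = grid[np.argmin(mean)]                # smallest predicted distance
print(len(theta), theta_best)
```

No simulation is discarded: every (θ, d) pair, including those with large distances, refines the model that decides where to simulate next.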


Example: Bacterial infections in child care centers

- Comparison of the proposed approach with a standard population Monte Carlo ABC approach.
- Roughly equal results using 1000 times fewer simulations.

4.5 days with 200 cores → 90 minutes with seven cores

[Figure: competition parameter versus computational cost (log10) for the developed fast method and the standard method. Posterior means: solid lines; credibility intervals: shaded areas or dashed lines. (Gutmann and Corander, 2016)]
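As a rough worked comparison of the two timings, assuming (this is not stated in the talk) that all cores were busy for the whole run, the compute budgets in core-hours are:

```latex
\begin{align*}
\text{standard method:} \quad & 4.5 \times 24\,\text{h} \times 200 = 21\,600 \text{ core-hours} \\
\text{proposed method:} \quad & 1.5\,\text{h} \times 7 = 10.5 \text{ core-hours} \\
\text{ratio:} \quad & 21\,600 / 10.5 \approx 2\,057
\end{align*}
```

Under that assumption, the wall-clock figures correspond to a reduction of about three orders of magnitude, the same order as the reported reduction in the number of simulations.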


Example: Bacterial infections in child care centers

- Comparison of the proposed approach with a standard population Monte Carlo ABC approach.
- Roughly equal results using 1000 times fewer simulations.

[Figure: internal infection parameter and external infection parameter versus computational cost (log10) for the developed fast method and the standard method. Posterior means are shown as solid lines, credibility intervals as shaded areas or dashed lines]

Further benefits

- The proposed method makes the inference more efficient.
  - Allowed us to perform far more comprehensive data analysis than with standard approach (Numminen et al, 2016)
- Enables inference for models which were out of reach till now
  - model of evolution where simulating a single data set took us 12-24 hours (Marttinen et al, 2015)
- Enables easier assessment of parameter identifiability for complex models
  - model about transmission dynamics of tuberculosis (Lintusaari et al, 2016)


Open questions

- Model: How to best model the distance between simulated and observed data?
- Acquisition function: Can we find strategies which are optimal for parameter inference?
- Efficient high-dimensional inference: Can we use the approach to infer the joint distribution of 1000 variables?

See the JMLR paper for a discussion.


Summary

- Topic: Inference for models where the likelihood is intractable but sampling is possible
- Inference principle: Find parameter values for which the distance between simulated and observed data is small
- Problem considered: Computational cost
- Proposed approach: Combine statistical modeling of the distance with decision making under uncertainty (Bayesian optimization)
- Outcome: Approach increases the efficiency of the inference by several orders of magnitude


References

- M.U. Gutmann and J. Corander. Bayesian optimization for likelihood-free inference of simulator-based statistical models. Journal of Machine Learning Research, 17(125): 1–47, 2016.
- J. Lintusaari, M.U. Gutmann, R. Dutta, S. Kaski, and J. Corander. Fundamentals and Recent Developments in Approximate Bayesian Computation. Systematic Biology, in press, 2016.
- E. Numminen, M.U. Gutmann, M. Shubin, et al. The impact of host metapopulation structure on the population genetics of colonizing bacteria. Journal of Theoretical Biology, 396: 53–62, 2016.
- J. Lintusaari, M.U. Gutmann, S. Kaski, and J. Corander. On the identifiability of transmission dynamic models for infectious disease. Genetics, 202(3): 911–918, 2016.
- P. Marttinen, N.J. Croucher, M.U. Gutmann, J. Corander, and W.P. Hanage. Recombination produces coherent bacterial species clusters in both core and accessory genomes. Microbial Genomics, 1(5), 2015.
- Numminen et al. Estimating the Transmission Dynamics of Streptococcus pneumoniae from Strain Prevalence Data. Biometrics 9, 2013.
- N. Srinivas, A. Krause, S.M. Kakade, and M. Seeger. Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 58(5): 3250–3265, 2012.
- S.N. Wood. Statistical inference for noisy nonlinear ecological dynamic systems. Nature, 466: 1102–1104, 2010.
- D. Cox and S. John. A statistical method for global optimization. Proc. IEEE Conference on Systems, Man and Cybernetics, 2: 1241–1246, 1992.