Asynchronous Parallel Bayesian Optimisation via Thompson Sampling
Kirthevasan Kandasamy, Akshay Krishnamurthy, Jeff Schneider, Barnabás Póczos
AutoML Workshop, ICML, Sydney, Aug 2017

Summary:

Parallelised Thompson Sampling for BO

• Setting: Bayesian optimisation with parallel evaluations.
• A direct application of Thompson sampling in synchronous and asynchronous parallel settings does essentially as well as if the evaluations were made in sequence.
• When evaluation time is factored in, the asynchronous version outperforms the synchronous and sequential versions.
• The proposed methods are conceptually and computationally much simpler than existing methods for parallel BO.

A straightforward application of TS to parallel settings, in two variants: synchronous (synTS, below) and asynchronous (asyTS, given further down).

Synchronous: synTS
Input: prior GP(0, κ). D_1 ← ∅, GP_1 ← GP(0, κ).
for j = 1, 2, . . .
  1. Wait for all M workers to finish.
  2. D_j ← D_{j−1} ∪ {(x_m, y_m)}_{m=1}^{M}, where (x_m, y_m) is worker m's query/observation.
  3. Compute the posterior GP_j with D_j.
  4. Draw M samples g_m ∼ GP_j, m = 1, . . . , M.
  5. Deploy worker m at argmax_x g_m(x), for each m.
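As a rough illustration of the synchronous rule (a sketch, not the authors' implementation), the snippet below runs a few synTS rounds with scikit-learn's GP on a discretised one-dimensional domain; the objective, kernel, grid size, and number of workers are placeholder choices.

```python
# Minimal sketch of synchronous parallel Thompson sampling (synTS)
# with M workers, using scikit-learn's GP and a discretised candidate grid.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def objective(x):                       # placeholder expensive black-box f
    return float(np.sin(5 * x[0]) * (1 - x[0]))

rng = np.random.default_rng(0)
M = 4                                   # number of parallel workers
grid = np.linspace(0.0, 1.0, 500).reshape(-1, 1)   # X = [0, 1], discretised

# D_j: data gathered so far (here: a couple of seed points)
X_obs = rng.uniform(0, 1, size=(2, 1))
y_obs = np.array([objective(x) for x in X_obs])

for j in range(3):                      # a few synchronous rounds
    # 3. posterior GP given D_j
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-4)
    gp.fit(X_obs, y_obs)
    # 4. draw M independent posterior samples g_1, ..., g_M on the grid
    g = gp.sample_y(grid, n_samples=M, random_state=j)    # shape (500, M)
    # 5. deploy worker m at the argmax of its own sample
    batch = grid[np.argmax(g, axis=0)]                     # shape (M, 1)
    # 1.-2. wait for all workers, then fold their observations into D_{j+1}
    y_batch = np.array([objective(x) for x in batch])
    X_obs = np.vstack([X_obs, batch])
    y_obs = np.concatenate([y_obs, y_batch])

print("best value found:", y_obs.max())
```

Each round draws M independent posterior samples and sends each worker to the maximiser of its own sample; the randomness in the samples is what makes the points within a batch differ from one another.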

Gaussian Process (Bayesian) Optimisation
Expensive black-box function. Examples:
- Hyper-parameter tuning
- ML estimation in astrophysics
- Optimal policy in autonomous driving

Main Theoretical Results:
Let f ∼ GP(0, κ). Then for seqTS, synTS, and asyTS, after n evaluations,
  E[SR(n)] ≲ √( log(n) · Ψ_n / n ),
where SR(n) = f(x⋆) − max_{t=1,...,n} f(x_t) is the simple regret to be minimised and Ψ_n denotes the maximum information gain of the kernel κ.

Let the time taken for an evaluation be random. Then the expected numbers of evaluations completed within a time budget by seqTS, synTS, and asyTS satisfy n_seq < n_syn < n_asy.

Therefore, asyTS achieves asymptotically better regret with time, E[SR′(T)] (defined below), than synTS and seqTS.

Bayesian Optimisation via Thompson Sampling
Model f ∼ GP(0, κ). At time t:
1. Sample g from the posterior GP.
2. Choose x_t = argmax_{x∈X} g(x).
3. y_t ← evaluate f at x_t.

N ← number of evaluations completed within time T; in parallel settings this counts evaluations by all M workers. SR′(T) is practically more relevant than SR(n) and leads to new results in parallel BO.

n_seq, n_syn, n_asy for different random completion-time models:

Distribution | pdf p(x)                              | seqTS                  | synTS                        | asyTS
Unif(a, b)   | 1/(b − a) for x ∈ (a, b)              | n_seq = 2T/(a + b)     | n_syn = MT(M + 1)/(a + bM)   | n_asy = M·n_seq
HN(ζ²)       | (√2/(ζ√π)) e^{−x²/(2ζ²)} for x > 0    | n_seq = T√π/(ζ√2)      | n_syn ≍ M·n_seq/√log(M)      | n_asy = M·n_seq
Exp(λ)       | λ e^{−λx} for x > 0                   | n_seq = λT             | n_syn ≍ M·n_seq/log(M)       | n_asy = M·n_seq

The difference between the synchronous and asynchronous counts grows with M and is most pronounced for heavy-tailed distributions.
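The ordering n_seq < n_syn < n_asy in the table can be checked with a quick Monte Carlo simulation. The sketch below is only an illustration (the time budget T, worker count M, and half-normal scale ζ are arbitrary choices), counting completed evaluations under half-normal completion times.

```python
# Monte Carlo check of n_seq < n_syn < n_asy for half-normal completion times.
import numpy as np

rng = np.random.default_rng(1)
T, M, zeta = 100.0, 10, 1.0          # time budget, workers, HN scale (arbitrary)

def draw_time():
    return abs(rng.normal(0.0, zeta))            # half-normal HN(zeta^2) completion time

def completed_sequential():
    t, n = 0.0, 0
    while True:
        t += draw_time()                         # one evaluation after another
        if t > T:
            return n
        n += 1

def completed_synchronous():
    t, n = 0.0, 0
    while True:
        t += max(draw_time() for _ in range(M))  # a round ends when the slowest worker ends
        if t > T:
            return n
        n += M

def completed_asynchronous():
    # each worker evaluates back-to-back, independently of the others
    return sum(completed_sequential() for _ in range(M))

trials = 200
print("n_seq ~", np.mean([completed_sequential() for _ in range(trials)]))
print("n_syn ~", np.mean([completed_synchronous() for _ in range(trials)]))
print("n_asy ~", np.mean([completed_asynchronous() for _ in range(trials)]))
```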

This work: parallel evaluations with M workers.
Several methods exist for this setting, but they either
▶ cannot handle asynchronicity,
▶ do not come with theoretical guarantees, or
▶ are conceptually/computationally complex.

Experiments
[Plots: simple regret with time, SR′(T), against time units T. Problems: Hartmann18 (d = 18, M = 25, half-normal completion times); Park2 (d = 4, M = 10, half-normal). Methods compared: synRAND, synBUCB, synUCBPE, synTS, asyRAND, asyUCB, asyEI, asyHUCB, asyHTS, asyTS.]

Simple regret on a time budget: after time T,
  SR′(T) = f(x⋆) − max_{j≤N} f(x_j)   if N ≥ 1,
  SR′(T) = max_{x∈X} |f(x⋆) − f(x)|   otherwise.
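Read concretely, SR′(T) only looks at evaluations that finished by time T and falls back to a worst-case constant when none did. A minimal helper illustrating this (a sketch; in practice f(x⋆) and the gap max_x |f(x⋆) − f(x)| are unknown and would have to be supplied or estimated):

```python
# Sketch: simple regret on a time budget, SR'(T), from a run log.
def simple_regret_time(log, f_star, T, worst_case_gap):
    """log: list of (completion_time, f_value) pairs from all workers.
    f_star: (estimated) optimal value f(x*).
    worst_case_gap: stands in for max_x |f(x*) - f(x)| when nothing finished."""
    finished = [val for (t, val) in log if t <= T]   # the N evaluations done by time T
    if not finished:                                  # N = 0: fall back to the worst case
        return worst_case_gap
    return f_star - max(finished)

# Hypothetical example: three evaluations finish by T = 10, best value 0.7, f* = 1.0.
print(simple_regret_time([(2.0, 0.3), (6.5, 0.7), (9.0, 0.5), (12.0, 0.9)],
                         f_star=1.0, T=10.0, worst_case_gap=1.0))   # -> 0.3
```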

[Plot: SR′(T) against time units T on CurrinExp-14 (d = 14, M = 35, Pareto(k = 3) completion times).]

(Setting: f : X ≡ [0, 1]^d → R is a noisy, expensive, black-box function; x⋆ = argmax_{x∈X} f(x) is a maximiser.)

Asynchronous: asyTS
Input: prior GP(0, κ). D_1 ← ∅, GP_1 ← GP(0, κ).
for j = 1, 2, . . .
  1. Wait for any one worker to finish.
  2. D_j ← D_{j−1} ∪ {(x′, y′)}, where (x′, y′) is that worker's just-finished query/observation.
  3. Compute the posterior GP_j with D_j.
  4. Draw a single sample g ∼ GP_j.
  5. Re-deploy the freed worker at argmax_x g(x).
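A minimal sketch of the asynchronous loop (again an illustration, not the authors' implementation), using scikit-learn's GP on a discretised domain and a thread pool from concurrent.futures; the objective, kernel, grid, and evaluation budget are placeholder choices. The key difference from synTS is that a worker is re-deployed as soon as it finishes, using whatever data is available at that moment.

```python
# Sketch of asyTS: re-deploy a worker as soon as it finishes.
import numpy as np
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def objective(x):                                  # placeholder expensive black-box f
    return float(np.sin(5 * x[0]) * (1 - x[0]))

def evaluate(x):                                   # worker job: return the query with its value
    return x, objective(x)

rng = np.random.default_rng(0)
M, budget = 4, 20                                  # workers and total number of evaluations
grid = np.linspace(0.0, 1.0, 500).reshape(-1, 1)   # X = [0, 1], discretised
X_obs = [x for x in rng.uniform(0, 1, size=(2, 1))]
y_obs = [objective(x) for x in X_obs]

def next_point(seed):
    # Posterior GP on whatever data is available now, one Thompson sample, maximise it.
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-4)
    gp.fit(np.array(X_obs), np.array(y_obs))
    g = gp.sample_y(grid, n_samples=1, random_state=seed).ravel()
    return grid[int(np.argmax(g))]

with ThreadPoolExecutor(max_workers=M) as pool:
    pending = {pool.submit(evaluate, next_point(s)) for s in range(M)}
    launched = M
    while pending:
        done, pending = wait(pending, return_when=FIRST_COMPLETED)  # 1. wait for a worker
        for fut in done:
            x, y = fut.result()                                     # 2. its query/observation
            X_obs.append(x)
            y_obs.append(y)
            if launched < budget:                                   # 3.-5. refit, sample, re-deploy
                pending.add(pool.submit(evaluate, next_point(launched)))
                launched += 1

print("best value found:", max(y_obs))
```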


See the paper for more synthetic and real experiments.

Selected References:
• Russo, D., et al. (2014). Learning to Optimize via Posterior Sampling.
• Srinivas, N., et al. (2010). Gaussian Process Optimization in the Bandit Setting …
