Adaptive Consensus ADMM for Distributed Optimization

Zheng Xu¹, Gavin Taylor², Hao Li¹, Mario Figueiredo³, Xiaoming Yuan⁴, and Tom Goldstein¹

Abstract

Ø Study the alternating direction method of multipliers (ADMM) for distributed model-fitting problems.
Ø Boost ADMM performance by using different fine-tuned algorithm parameters on each worker node.
Ø Automatically tune the parameters without user oversight by assuming Barzilai-Borwein-style gradients.
Ø Present an O(1/k) convergence rate for adaptive ADMM methods with node-specific parameters.

Consensus problem

Ø Objective:

  \min_{u_i, v} \sum_{i=1}^N f_i(u_i) + g(v), \quad \text{subject to } u_i = v, \; \forall i = 1, \dots, N.

  Each of the N local nodes stores u_i and f_i(u_i); a central server stores v and g(v).

• Example (elastic net regression): f_i(u_i) = \tfrac{1}{2}\|D_i u_i - c_i\|^2, \quad g(v) = \rho_1 |v| + \tfrac{\rho_2}{2}\|v\|^2.
• Others: sparse logistic regression, support vector machines (SVMs), semidefinite programming (SDP).

ADMM and diagonal penalty

Ø Constrained problem in general form:

  \min_{u,v} f(u) + g(v), \quad \text{subject to } Au + Bv = b.

  The consensus problem takes this form with u = (u_1; \dots; u_N), A = I_{dN}, B = -(I_d; \dots; I_d), b = 0, and f(u) = \sum_{i=1}^N f_i(u_i).

Ø Saddle-point problem via the augmented Lagrangian:

  \max_\lambda \min_{u,v} f(u) + g(v) + \langle \lambda, b - Au - Bv \rangle + \tfrac{1}{2}\|b - Au - Bv\|_T^2,

  where T = \mathrm{diag}(\tau_1 I_d, \dots, \tau_N I_d) is a diagonal penalty matrix and \|x\|_T^2 = x^T T x.

Ø Alternating direction method of multipliers (ADMM):

  u^{k+1} = \arg\min_u f(u) - \langle \lambda^k, Au \rangle + \tfrac{1}{2}\|b - Au - Bv^k\|_T^2
  v^{k+1} = \arg\min_v g(v) - \langle \lambda^k, Bv \rangle + \tfrac{1}{2}\|b - Au^{k+1} - Bv\|_T^2
  \lambda^{k+1} = \lambda^k + T(b - Au^{k+1} - Bv^{k+1})

Ø Consensus ADMM (node-specific penalties \tau_i^k):

  u_i^{k+1} = \arg\min_{u_i} f_i(u_i) + \langle \lambda_i^k, v^k - u_i \rangle + \tfrac{\tau_i^k}{2}\|v^k - u_i\|^2
  v^{k+1} = \arg\min_v g(v) + \sum_{i=1}^N \Big( \langle \lambda_i^k, v - u_i^{k+1} \rangle + \tfrac{\tau_i^k}{2}\|v - u_i^{k+1}\|^2 \Big)
  \lambda_i^{k+1} = \lambda_i^k + \tau_i^k (v^{k+1} - u_i^{k+1}), \quad \forall i = 1, \dots, N
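The consensus ADMM updates above have closed forms for quadratic local losses. Below is a minimal numpy sketch (not the authors' implementation) for distributed least squares, taking g(v) = 0 and fixed per-node penalties τ_i = 1 for simplicity; all names and problem sizes are illustrative assumptions.

```python
import numpy as np

# Node i holds f_i(u_i) = 0.5 * ||D_i u_i - c_i||^2; g(v) = 0 for simplicity.
rng = np.random.default_rng(0)
N, d, n_i = 3, 4, 10                       # nodes, features, samples per node
D = [rng.standard_normal((n_i, d)) for _ in range(N)]
c = [rng.standard_normal(n_i) for _ in range(N)]

tau = np.ones(N)                           # node-specific penalty parameters
u = [np.zeros(d) for _ in range(N)]
lam = [np.zeros(d) for _ in range(N)]
v = np.zeros(d)

for k in range(500):
    # u_i-step: closed-form minimizer of the local quadratic subproblem
    for i in range(N):
        H = D[i].T @ D[i] + tau[i] * np.eye(d)
        u[i] = np.linalg.solve(H, D[i].T @ c[i] + lam[i] + tau[i] * v)
    # v-step (g = 0): penalty-weighted average of local variables and duals
    v = sum(tau[i] * u[i] - lam[i] for i in range(N)) / tau.sum()
    # dual updates
    for i in range(N):
        lam[i] = lam[i] + tau[i] * (v - u[i])

# Compare against the centralized least-squares solution
D_all, c_all = np.vstack(D), np.concatenate(c)
v_star = np.linalg.lstsq(D_all, c_all, rcond=None)[0]
err = np.linalg.norm(v - v_star)           # consensus error, should be tiny
```

With an elastic-net g(v) as in the example above, the v-step would instead be a soft-thresholding operation; the structure of the iteration is unchanged.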

Background: spectral stepsize

Ø Gradient descent: x^{k+1} = x^k - \tau_k \nabla F(x^k).
Ø Spectral (Barzilai-Borwein) stepsize: \tau_k = 1/\alpha, where \alpha is found by solving

  \nabla F(x^k) - \nabla F(x^{k-1}) = \alpha (x^k - x^{k-1})

  in the least-squares sense, assuming the linear model \nabla F(x) = \alpha x + a.

Adaptive Consensus ADMM

Ø Dual interpretation: ADMM is equivalent to Douglas-Rachford splitting applied to the dual problem

  \min_\lambda \underbrace{f^*(A^T \lambda) - \langle \lambda, b \rangle}_{\hat f(\lambda)} + \underbrace{g^*(B^T \lambda)}_{\hat g(\lambda)}.

Ø Linear assumption: the dual subgradients are modeled as linear,

  \partial \hat f(\hat\lambda) = M_\alpha \hat\lambda + a, \qquad \partial \hat g(\lambda) = M_\beta \lambda + b,

  where M_\alpha, M_\beta are diagonal matrices.

Ø Adaptive rule: split the equations for T, M_\alpha, M_\beta into per-node blocks, and apply the spectral penalty proposition (Xu et al., AISTATS 2017) to each block:

  \tau_i^k = 1/\sqrt{\alpha_i^k \beta_i^k}.

Ø Curvature estimation and safeguarding the linear assumption: the curvatures \alpha_i^k, \beta_i^k are estimated from differences of the iterates between steps k and k_0, and the estimates are used only when the correlations \alpha_{cor,i}^k and \beta_{cor,i}^k indicate the linear model is reliable.

Ø Safeguarding convergence:

  \tau_i^{k+1} = \max\Big\{ \min\big\{ \hat\tau_i^k, \; (1 + C_{cg}/k^2)\,\tau_i^k \big\}, \; \frac{\tau_i^k}{1 + C_{cg}/k^2} \Big\}.
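The Barzilai-Borwein estimation underlying the adaptive rule can be sketched in a few lines: fit ∇F(x) ≈ αx + a from successive iterates, then step with τ_k = 1/α. A minimal sketch on a random strongly convex quadratic (the test problem and all names are illustrative assumptions, not from the poster):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
A = rng.standard_normal((d, d))
Q = A.T @ A + np.eye(d)                   # SPD Hessian -> strongly convex F
b = rng.standard_normal(d)
grad = lambda x: Q @ x - b                # grad F for F(x) = 0.5 x'Qx - b'x

x_prev = np.zeros(d)
x = 0.1 * rng.standard_normal(d)
for k in range(100):
    s = x - x_prev                        # iterate difference
    if s @ s == 0.0:                      # already stationary in floats
        break
    y = grad(x) - grad(x_prev)            # gradient difference
    alpha = (s @ y) / (s @ s)             # least-squares fit of grad F ~ alpha*x + a
    tau = 1.0 / alpha                     # spectral (BB) stepsize
    x_prev, x = x, x - tau * grad(x)

x_star = np.linalg.solve(Q, b)            # exact minimizer for comparison
```

The scalar fit `(s @ y) / (s @ s)` is exactly the least-squares solution of y = αs; the adaptive method applies the same idea blockwise to the dual curvatures α_i, β_i.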

O(1/k) convergence with adaptivity

Ø Bounded adaptivity: assume \sum_{k=1}^\infty (\eta^k)^2 < \infty, where

  \eta^k = \max_{i} \max\{ \tau_i^k/\tau_i^{k-1} - 1, \; \tau_i^{k-1}/\tau_i^k - 1 \}.

Ø The norm of the residuals converges to zero.
Ø The worst-case ergodic O(1/k) convergence rate holds in the variational inequality sense.

Experiments

Ø Classification/regression problems: elastic net (EN) regression on MNIST, sparse logistic regression on CIFAR10, SVM on MNIST, and SDP on Ham-9-5-6.

Iterations (and runtime in seconds); 128 cores are used; absence of convergence after n iterations is indicated as n+.

Dataset                  CADMM                RB-ADMM            AADMM               CRB-ADMM             ACADMM (proposed)
                         (Boyd et al., 2011)  (He et al., 2000)  (Xu et al., 2017a)  (Song et al., 2016)
EN regression, MNIST     100+(1.49e4)         88(1.29e3)         87(1.27e4)          40(5.99e3)           14(2.18e3)
Sparse logreg, CIFAR10   310(700)             152(402)           149(368)            310(727)             44(118)
SVM, MNIST               1000+(930)           172(287)           285(340)            73(127)              41(88.0)
SDP, Ham-9-5-6           100+(2.01e3)         100+(2.14e3)       100+(2.14e3)        35(860)              30(703)
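The safeguarding rule caps each penalty's per-iteration change by a factor 1 + C_cg/k², which forces the bounded-adaptivity condition above to hold automatically: η^k ≤ C_cg/k², so Σ_k (η^k)² converges. A small sketch (function and variable names hypothetical; C_cg = 10 is chosen for illustration only):

```python
import numpy as np

def safeguard(tau_hat, tau_prev, k, C_cg=10.0):
    """Clip the spectral estimate so tau changes by at most 1 + C_cg/k^2."""
    bound = 1.0 + C_cg / k**2
    return max(min(tau_hat, bound * tau_prev), tau_prev / bound)

# Feed wildly varying spectral estimates and track eta_k per iteration.
rng = np.random.default_rng(2)
tau = 1.0
etas = []
for k in range(1, 10001):
    tau_hat = float(np.exp(rng.standard_normal()))   # noisy curvature estimate
    tau_new = safeguard(tau_hat, tau, k)
    etas.append(max(tau_new / tau, tau / tau_new) - 1.0)
    tau = tau_new

partial = np.cumsum(np.square(etas))     # partial sums of (eta_k)^2 flatten out
```

Even though the raw estimates jump by large factors, the clipped ratios satisfy η^k ≤ 10/k², so the partial sums of (η^k)² stay bounded, which is exactly what the convergence theory requires.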

[Figure: relative residual convergence curves on ENRegression-Synthetic2 for CADMM, RB-ADMM, AADMM, CRB-ADMM, and ACADMM, and scaling with the number of samples and the number of cores.]

Ø Acceleration by distributed computing.
Ø Robust to the initial penalty parameter, #cores, and #data.

[Figure: iteration counts on ENRegression-Synthetic2 as the initial penalty parameter is varied.]

Ø Fast convergence on synthetic and benchmark datasets; more results in the paper.

[Figure: relative residual vs. iterations and vs. runtime (seconds) on SVM-Synthetic2 for CADMM, RB-ADMM, AADMM, CRB-ADMM, and ACADMM.]

