Supplementary Material: Adaptive Consensus ADMM for Distributed Optimization

Zheng Xu 1  Gavin Taylor 2  Hao Li 1  Mário A. T. Figueiredo 3  Xiaoming Yuan 4  Tom Goldstein 1

1 University of Maryland, College Park  2 United States Naval Academy, Annapolis  3 Universidade de Lisboa, Portugal  4 Hong Kong Baptist University, Hong Kong. Correspondence to: Zheng Xu.

Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017. Copyright 2017 by the author(s).

This is the supplementary material for Adaptive Consensus ADMM (ACADMM) (Xu et al., 2017c). We provide detailed proofs and experimental settings, along with additional results. Our proofs generalize the variational inequality approach of (He et al., 2000; He & Yuan, 2012; 2015; Xu et al., 2017b).

1. Proofs of lemmas

1.1. Proof of Lemma 1 (17)

Proof. Using the updated dual variable $\lambda^{k+1}$ in (10), VI (15) can be rewritten as
\[
\forall v, \quad g(v) - g(v^{k+1}) - (Bv - Bv^{k+1})^T \lambda^{k+1} \ge 0. \tag{S1}
\]
Similarly, in the previous iteration,
\[
\forall v, \quad g(v) - g(v^{k}) - (Bv - Bv^{k})^T \lambda^{k} \ge 0. \tag{S2}
\]
Let $v = v^k$ in (S1) and $v = v^{k+1}$ in (S2), and sum the two inequalities to conclude
\[
(Bv^{k+1} - Bv^k)^T (\lambda^{k+1} - \lambda^k) \ge 0. \tag{S3}
\]

1.2. Proof of Lemma 1 (18)

Proof. VI (16) can be rewritten as
\[
\phi(y) - \phi(y^{k+1}) + (z - z^{k+1})^T \big( F(z^{k+1}) + \Omega(\Delta z_k^+, T^k) \big) \ge 0, \tag{S4}
\]
where $\Omega(\Delta z_k^+, T^k) = \big( -A^T T^k B \Delta v_k^+ ;\; 0 ;\; (T^k)^{-1} \Delta\lambda_k^+ \big)$. Let $y = y^*$, $z = z^*$ in VI (S4), and $y = y^{k+1}$, $z = z^{k+1}$ in VI (13), and sum the two inequalities to get
\[
(\Delta z_{k+1}^*)^T \Omega(\Delta z_k^+, T^k) \ge (\Delta z_{k+1}^*)^T \big( F(z^*) - F(z^{k+1}) \big). \tag{S5}
\]
Since $F(z)$ is monotone, the right-hand side is nonnegative. Substituting $\Omega(\Delta z_k^+, T^k)$ into (S5) gives
\[
-(A \Delta u_{k+1}^*)^T T^k (B \Delta v_k^+) + (\Delta\lambda_{k+1}^*)^T (T^k)^{-1} \Delta\lambda_k^+ \ge 0. \tag{S6}
\]
Using the feasibility of the optimal solution ($Au^* + Bv^* = b$) and the dual update formula (10), we have
\[
T^k A \Delta u_{k+1}^* = \Delta\lambda_k^+ - T^k B \Delta v_{k+1}^*. \tag{S7}
\]
Substituting this into (S6) yields
\[
(B \Delta v_{k+1}^*)^T T^k B \Delta v_k^+ + (\Delta\lambda_{k+1}^*)^T (T^k)^{-1} \Delta\lambda_k^+ \ge (B \Delta v_k^+)^T \Delta\lambda_k^+. \tag{S8}
\]
The proof of (18) is concluded by applying (17) to (S8).

1.3. Proof of Lemma 1 (19)

Proof.
\begin{align}
\|\Delta z_k^*\|_{H^k}^2 &= \|z^* - z^k\|_{H^k}^2 \tag{S9}\\
&= \|z^* - z^{k+1} + z^{k+1} - z^k\|_{H^k}^2 \tag{S10}\\
&= \|\Delta z_{k+1}^* + \Delta z_k^+\|_{H^k}^2 \tag{S11}\\
&= \|\Delta z_{k+1}^*\|_{H^k}^2 + \|\Delta z_k^+\|_{H^k}^2 + 2 (\Delta z_{k+1}^*)^T H^k \Delta z_k^+ \tag{S12}\\
&\ge \|\Delta z_{k+1}^*\|_{H^k}^2 + \|\Delta z_k^+\|_{H^k}^2. \tag{S13}
\end{align}
Eq. (18) is used for the inequality in (S13), and Eq. (19) is derived by rearranging $\|\Delta z_k^*\|_{H^k}^2 \ge \|\Delta z_{k+1}^*\|_{H^k}^2 + \|\Delta z_k^+\|_{H^k}^2$.

1.4. Proof of Lemma 2

Proof. Applying the observation
\[
(a - b)^T H (c - d) = \tfrac{1}{2}\big( \|a - d\|_H^2 - \|a - c\|_H^2 \big) + \tfrac{1}{2}\big( \|b - c\|_H^2 - \|b - d\|_H^2 \big), \tag{S14}
\]
we have
\begin{align}
(\tilde z^{k+1} - z)^T H^k \Delta z_k^+ &= (\tilde z^{k+1} - z)^T H^k (z^{k+1} - z^k) \tag{S15}\\
&= \tfrac{1}{2}\big( \|\tilde z^{k+1} - z^k\|_{H^k}^2 - \|\tilde z^{k+1} - z^{k+1}\|_{H^k}^2 \big) + \tfrac{1}{2}\big( \|z^{k+1} - z\|_{H^k}^2 - \|z^k - z\|_{H^k}^2 \big). \tag{S16}
\end{align}
We now consider
\begin{align}
\|\tilde z^{k+1} - z^{k+1}\|_{H^k}^2 &= \|\tilde z^{k+1} - z^k + z^k - z^{k+1}\|_{H^k}^2 \tag{S17}\\
&= \|\tilde z^{k+1} - z^k\|_{H^k}^2 + \|\Delta z_k^+\|_{H^k}^2 - 2 (\tilde z^{k+1} - z^k)^T H^k \Delta z_k^+, \tag{S18}
\end{align}
and get
\begin{align}
&\|\tilde z^{k+1} - z^k\|_{H^k}^2 - \|\tilde z^{k+1} - z^{k+1}\|_{H^k}^2 \tag{S19}\\
&\quad = 2 (\tilde z^{k+1} - z^k)^T H^k \Delta z_k^+ - \|\Delta z_k^+\|_{H^k}^2. \tag{S20}
\end{align}
We then substitute $\Delta z_k^+$ with $M^k (\tilde z^{k+1} - z^k)$ using (12), and get
\begin{align}
&\|\tilde z^{k+1} - z^k\|_{H^k}^2 - \|\tilde z^{k+1} - z^{k+1}\|_{H^k}^2 \tag{S21}\\
&\quad = (\tilde z^{k+1} - z^k)^T (2I - M^k)^T H^k M^k (\tilde z^{k+1} - z^k) \tag{S22}\\
&\quad = \|\hat\lambda^{k+1} - \lambda^k\|_{(T^k)^{-1}}^2 \ge 0. \tag{S23}
\end{align}
Combining (S16) and (S23), we conclude
\[
(\tilde z^{k+1} - z)^T H^k \Delta z_k^+ \ge \tfrac{1}{2}\big( \|z^{k+1} - z\|_{H^k}^2 - \|z^k - z\|_{H^k}^2 \big). \tag{S24}
\]
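As a quick sanity check, the identity (S14) can be verified numerically for arbitrary vectors and any symmetric positive semidefinite $H$. The sketch below assumes numpy; the dimension and random seed are arbitrary choices and are not part of the original proof.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 6
a, b, c, d = (rng.standard_normal(p) for _ in range(4))
M = rng.standard_normal((p, p))
H = M.T @ M  # random symmetric positive semidefinite matrix


def sq(x):
    """Squared H-norm ||x||_H^2."""
    return float(x @ H @ x)


lhs = float((a - b) @ H @ (c - d))
rhs = 0.5 * (sq(a - d) - sq(a - c)) + 0.5 * (sq(b - c) - sq(b - d))
assert np.isclose(lhs, rhs)  # identity (S14) holds for these random inputs
```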

1.5. Proof of Lemma 3

Proof. Assumption 1 implies (22), which means that the diagonal matrices $T^k$ and $T^{k-1}$ satisfy
\begin{align}
T^k &\le (1 + (\eta^k)^2)\, T^{k-1} \tag{S25}\\
(T^k)^{-1} &\le (1 + (\eta^k)^2)\, (T^{k-1})^{-1}. \tag{S26}
\end{align}
Then we have
\begin{align}
\|z - z^0\|_{H^k}^2 &= \|B(v - v^0)\|_{T^k}^2 + \|\lambda - \lambda^0\|_{(T^k)^{-1}}^2 \tag{S27}\\
&\le (1 + (\eta^k)^2)\big( \|B(v - v^0)\|_{T^{k-1}}^2 + \|\lambda - \lambda^0\|_{(T^{k-1})^{-1}}^2 \big) \tag{S28}\\
&\le (1 + (\eta^k)^2)\, \|z - z^0\|_{H^{k-1}}^2. \tag{S29}
\end{align}
The inequalities (S25) and (S26) are used to get from (S27) to (S28).

1.6. Proof of Lemma 4

Proof. From (27) we know
\[
\|\Delta z_k^+\|_{H^k}^2 + \|\Delta z_{k+1}^*\|_{H^k}^2 \le (1 + (\eta^k)^2)\, \|\Delta z_k^*\|_{H^{k-1}}^2. \tag{S30}
\]
Hence
\begin{align}
\|\Delta z_{k+1}^*\|_{H^k}^2 &\le (1 + (\eta^k)^2)\, \|\Delta z_k^*\|_{H^{k-1}}^2 \tag{S31}\\
&\le \prod_{t=1}^{k} (1 + (\eta^t)^2)\, \|\Delta z_1^*\|_{H^0}^2 \tag{S32}\\
&\le \prod_{t=1}^{\infty} (1 + (\eta^t)^2)\, \|\Delta z_1^*\|_{H^0}^2 \tag{S33}\\
&= C_\eta^\Pi\, \|\Delta z_1^*\|_{H^0}^2 < \infty. \tag{S34}
\end{align}
Letting $z^0 = z^*$ in Lemma 3, we have
\begin{align}
\|z - z^*\|_{H^k}^2 &\le (1 + (\eta^k)^2)\, \|z - z^*\|_{H^{k-1}}^2 \tag{S35}\\
&\le \prod_{t=1}^{k} (1 + (\eta^t)^2)\, \|z - z^*\|_{H^0}^2 \tag{S36}\\
&\le \prod_{t=1}^{\infty} (1 + (\eta^t)^2)\, \|z - z^*\|_{H^0}^2 \tag{S37}\\
&= C_\eta^\Pi\, \|z - z^*\|_{H^0}^2 < \infty. \tag{S38}
\end{align}
Letting $z^0 = z^k$ in Lemma 3, we have
\[
\|z - z^k\|_{H^k}^2 \le (1 + (\eta^k)^2)\, \|z - z^k\|_{H^{k-1}}^2. \tag{S39}
\]
Then we have
\begin{align}
&\sum_{k=1}^{l} \big( \|z - z^k\|_{H^k}^2 - \|z - z^k\|_{H^{k-1}}^2 \big) \tag{S40}\\
&\quad \le \sum_{k=1}^{l} (\eta^k)^2\, \|z - z^k\|_{H^{k-1}}^2 \tag{S41}\\
&\quad = \sum_{k=1}^{l} (\eta^k)^2\, \|z - z^* + z^* - z^k\|_{H^{k-1}}^2 \tag{S42}\\
&\quad \le \sum_{k=1}^{l} 2 (\eta^k)^2 \big( \|z - z^*\|_{H^{k-1}}^2 + \|\Delta z_k^*\|_{H^{k-1}}^2 \big) \tag{S43}\\
&\quad \le \sum_{k=1}^{l} 2 (\eta^k)^2 \big( C_\eta^\Pi \|z - z^*\|_{H^0}^2 + C_\eta^\Pi \|\Delta z_1^*\|_{H^0}^2 \big) \tag{S44}\\
&\quad \le \sum_{k=1}^{\infty} 2 (\eta^k)^2 \big( C_\eta^\Pi \|z - z^*\|_{H^0}^2 + C_\eta^\Pi \|\Delta z_1^*\|_{H^0}^2 \big) \tag{S45}\\
&\quad = 2 C_\eta^\Sigma \big( C_\eta^\Pi \|z - z^*\|_{H^0}^2 + C_\eta^\Pi \|\Delta z_1^*\|_{H^0}^2 \big) \tag{S46}\\
&\quad = 2 C_\eta^\Sigma C_\eta^\Pi \big( \|z - z^*\|_{H^0}^2 + \|\Delta z_1^*\|_{H^0}^2 \big) < \infty. \tag{S47}
\end{align}

1.7. Proof of equivalence of generalized ADMM and DRS in Section 5.1

Proof. The optimality condition for ADMM step (8) is
\[
0 \in \partial f(u^{k+1}) - A^T \underbrace{\big( \lambda^k + T^k (b - A u^{k+1} - B v^k) \big)}_{\hat\lambda^{k+1}}, \tag{S48}
\]

which is equivalent to $A^T \hat\lambda^{k+1} \in \partial f(u^{k+1})$. By exploiting properties of the Fenchel conjugate (Rockafellar, 1970), we get $u^{k+1} \in \partial f^*(A^T \hat\lambda^{k+1})$. A similar argument using the optimality condition for (9) leads to $v^{k+1} \in \partial g^*(B^T \lambda^{k+1})$. Recalling the definition of $\hat f, \hat g$ in (42), we arrive at
\[
A u^{k+1} - b \in \partial \hat f(\hat\lambda^{k+1}) \quad \text{and} \quad B v^{k+1} \in \partial \hat g(\lambda^{k+1}). \tag{S49}
\]
We can then use simple algebra to verify that $\hat\lambda^{k+1}, \lambda^{k+1}$ in (10) and $\partial \hat f(\hat\lambda^{k+1}), \partial \hat g(\lambda^{k+1})$ in (S49) satisfy the generalized DRS steps (43, 44).
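To spell out that algebra, here is a sketch under the assumption that the generalized DRS steps (43, 44) are the two inclusions of Proposition 1 below with the scalar stepsize $\tau^k$ replaced by the diagonal matrix $T^k$ (the exact statement is in the main paper). From the definition of $\hat\lambda^{k+1}$ in (S48) and the dual update (10),
\[
(T^k)^{-1} (\hat\lambda^{k+1} - \lambda^k) = b - A u^{k+1} - B v^k = -(A u^{k+1} - b) - B v^k \in -\partial \hat f(\hat\lambda^{k+1}) - \partial \hat g(\lambda^k),
\]
\[
(T^k)^{-1} (\lambda^{k+1} - \lambda^k) = b - A u^{k+1} - B v^{k+1} = -(A u^{k+1} - b) - B v^{k+1} \in -\partial \hat f(\hat\lambda^{k+1}) - \partial \hat g(\lambda^{k+1}),
\]
where the inclusions use (S49), together with its counterpart at iteration $k$ for $B v^k \in \partial \hat g(\lambda^k)$. Rearranging gives exactly the two DRS inclusions.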

1.8. Proposition for proof in Section 5.2

Proposition 1 (Spectral DRS (Xu et al., 2017a)). Suppose the Douglas-Rachford splitting steps are used,
\begin{align}
0 &\in (\hat\lambda^{k+1} - \lambda^k)/\tau^k + \partial \hat f(\hat\lambda^{k+1}) + \partial \hat g(\lambda^k) \tag{S50}\\
0 &\in (\lambda^{k+1} - \lambda^k)/\tau^k + \partial \hat f(\hat\lambda^{k+1}) + \partial \hat g(\lambda^{k+1}), \tag{S51}
\end{align}
and assume the subgradients are locally linear,
\[
\partial \hat f(\hat\lambda) = \alpha \hat\lambda + \Psi \quad \text{and} \quad \partial \hat g(\lambda) = \beta \lambda + \Phi, \tag{S52}
\]
where $\alpha, \beta \in \mathbb{R}$ and $\Psi, \Phi \subset \mathbb{R}^p$. Then the minimal residual of $\hat f(\lambda^{k+1}) + \hat g(\lambda^{k+1})$ is obtained by setting $\tau^k = 1/\sqrt{\alpha \beta}$.
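To make the role of Proposition 1 concrete, the sketch below computes the suggested stepsize from scalar curvature estimates. It is illustrative only: the function names, the secant-style estimator, and the toy inputs are assumptions, not the estimator used by ACADMM.

```python
import numpy as np


def curvature_estimate(d_lam, d_grad, eps=1e-12):
    """Hypothetical secant-style estimate of a scalar a such that d_grad ~ a * d_lam."""
    return float(d_lam @ d_grad) / max(float(d_lam @ d_lam), eps)


def spectral_stepsize(alpha, beta, eps=1e-12):
    """Stepsize of Proposition 1: tau = 1 / sqrt(alpha * beta)."""
    return 1.0 / np.sqrt(max(alpha, eps) * max(beta, eps))


# Toy usage with made-up dual/gradient differences (not the paper's estimator):
rng = np.random.default_rng(0)
d_lam = rng.standard_normal(5)
alpha = curvature_estimate(d_lam, 2.0 * d_lam)  # pretend the curvature of f-hat is 2
beta = curvature_estimate(d_lam, 0.5 * d_lam)   # pretend the curvature of g-hat is 0.5
print(spectral_stepsize(alpha, beta))           # 1/sqrt(2 * 0.5) = 1.0
```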

2. More experimental results

We provide more experimental results demonstrating the robustness of ACADMM in Fig. 1, Fig. 2, and Fig. 3.

[Figure 1: ACADMM is robust to the correlation threshold hyperparameter $C_{cor}$. Plot of iterations vs. correlation threshold (0 to 1) for ENReg, Logreg, and SVM on Synthetic1 and Synthetic2.]

[Figure 2: ACADMM is robust to the convergence threshold $C_{cg}$. Plot of iterations vs. convergence constant parameter ($10^2$ to $10^{10}$) for ENReg, Logreg, and SVM on Synthetic1 and Synthetic2.]

[Figure 3: ACADMM is robust to the regularizer parameter $\rho$ in the EN regression problem. Plot of iterations vs. regularizer ($10^{-2}$ to $10^2$) for CADMM, RB-ADMM, AADMM, CRB-ADMM, and ACADMM on ENRegression-Synthetic1.]

3. Synthetic problems in experiments

We provide the details of the synthetic data used in our experiments.

3.1. Sampling data matrices from Gaussian(s)

For Synthetic1, on each compute node $i$ we create a data matrix $D_i \in \mathbb{R}^{n_i \times d}$ with $n_i$ samples and $d$ features using a standard normal distribution. For Synthetic2, we again build Gaussian feature matrices $\{D_i\}$, but draw them from 10 different Gaussians: we randomly select Gaussian parameters $\mu_1, \ldots, \mu_{10} \in \mathbb{R}$ and $\sigma_1, \ldots, \sigma_{10} \in \mathbb{R}$, randomly choose an index $j_i$ on each node, and introduce heterogeneity across nodes by computing
\[
D_i \leftarrow D_i\, \sigma_{j_i} + \mu_{j_i}. \tag{S53}
\]
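A minimal numpy sketch of this sampling step follows. The number of nodes, the matrix sizes, and the distributions used to draw $\mu_j$ and $\sigma_j$ are illustrative assumptions; the text above only says they are selected randomly.

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, n_i, d = 4, 100, 20  # illustrative sizes, not the paper's settings

# Synthetic1: standard normal features on every node.
D_synth1 = [rng.standard_normal((n_i, d)) for _ in range(num_nodes)]

# Synthetic2: shift/scale each node's features by one of 10 random Gaussian parameter pairs.
mu = rng.standard_normal(10)             # assumed distribution for the means
sigma = np.abs(rng.standard_normal(10))  # assumed distribution for the (positive) scales
D_synth2 = []
for _ in range(num_nodes):
    D_i = rng.standard_normal((n_i, d))
    j_i = rng.integers(10)               # index of the Gaussian chosen for this node
    D_synth2.append(D_i * sigma[j_i] + mu[j_i])  # Eq. (S53)
```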

3.2. Correlation for Elastic Net regression

Following the standard method used to test elastic net regression in (Zou & Hastie, 2005), we introduce correlations into the datasets. We start by building a random Gaussian dataset $D_i$ on each node and set the number of active features to $0.6d$. We then randomly select three Gaussian vectors $v_{i,1}, v_{i,2}, v_{i,3} \in \mathbb{R}^{n_i}$ and compute
\begin{align}
&\forall j \in \{1, 2, \ldots, 0.2d\}, & D_i[:, j] &\leftarrow D_i[:, j] + v_{i,1}, \tag{S54}\\
&\forall j \in \{0.2d + 1, 0.2d + 2, \ldots, 0.4d\}, & D_i[:, j] &\leftarrow D_i[:, j] + v_{i,2}, \tag{S55}\\
&\forall j \in \{0.4d + 1, 0.4d + 2, \ldots, 0.6d\}, & D_i[:, j] &\leftarrow D_i[:, j] + v_{i,3}, \tag{S56}
\end{align}
where $D_i[:, j]$ denotes the $j$th column of $D_i$.
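A small numpy sketch of (S54)-(S56) follows; the sizes are illustrative, and we assume $0.2d$ is an integer so that the three column blocks are well defined.

```python
import numpy as np

rng = np.random.default_rng(0)
n_i, d = 100, 20  # illustrative sizes; 0.2 * d must be an integer here
D_i = rng.standard_normal((n_i, d))

# Three shared Gaussian vectors, one per block of 0.2*d active columns (Eqs. S54-S56).
v = rng.standard_normal((3, n_i))
block = int(0.2 * d)
for b in range(3):
    # add v_{i, b+1} to every column in the b-th block of active features
    D_i[:, b * block:(b + 1) * block] += v[b][:, None]
```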

3.3. Regression measurement

We use a ground-truth vector $x \in \mathbb{R}^d$, whose first $0.6d$ entries (the active features) are 1 and the rest are 0, and generate measurements for the regression problem as
\[
D_i x = c_i, \tag{S57}
\]
where $D_i$ is random Gaussian.
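For concreteness, a numpy sketch of (S57) with the same illustrative sizes; it is noise-free, since the text does not mention measurement noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n_i, d = 100, 20
D_i = rng.standard_normal((n_i, d))

x_true = np.zeros(d)
x_true[: int(0.6 * d)] = 1.0  # first 0.6*d entries active (= 1), the rest 0
c_i = D_i @ x_true            # regression measurements, Eq. (S57)
```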

3.4. Classification labels

For classification problems, we add a constant $d_{const}$ to the active features on half of the feature vectors stored on each node; that is, we compute $D_i[0.5 n_i : n_i,\, 1 : 0.6d] \leftarrow D_i[0.5 n_i : n_i,\, 1 : 0.6d] + d_{const}$. We then create a ground-truth label vector $c_i \in \mathbb{R}^{n_i}$, which contains 1 for the perturbed feature vectors and $-1$ for the rest.
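A numpy sketch of this labeling step follows; the value of $d_{const}$ and the matrix sizes are placeholders, since the constants used in the experiments are not restated here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_i, d = 100, 20
d_const = 2.0  # placeholder value; the paper's constant is not specified here
D_i = rng.standard_normal((n_i, d))

half, active = n_i // 2, int(0.6 * d)
D_i[half:, :active] += d_const  # shift active features for the second half of the samples
c_i = np.concatenate([-np.ones(half), np.ones(n_i - half)])  # +1 for shifted rows, -1 otherwise
```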

References

He, Bingsheng and Yuan, Xiaoming. On the O(1/n) convergence rate of the Douglas-Rachford alternating direction method. SIAM Journal on Numerical Analysis, 50(2):700-709, 2012.

He, Bingsheng and Yuan, Xiaoming. On non-ergodic convergence rate of Douglas-Rachford alternating direction method of multipliers. Numerische Mathematik, 130:567-577, 2015.

He, Bingsheng, Yang, Hai, and Wang, Shengli. Alternating direction method with self-adaptive penalty parameters for monotone variational inequalities. Journal of Optimization Theory and Applications, 106(2):337-356, 2000.

Rockafellar, R. T. Convex Analysis. Princeton University Press, 1970.

Xu, Zheng, Figueiredo, Mário A. T., and Goldstein, Tom. Adaptive ADMM with spectral penalty parameter selection. AISTATS, 2017a.

Xu, Zheng, Figueiredo, Mário A. T., Yuan, Xiaoming, Studer, Christoph, and Goldstein, Tom. Adaptive relaxed ADMM: Convergence theory and practical implementation. CVPR, 2017b.

Xu, Zheng, Taylor, Gavin, Li, Hao, Figueiredo, Mário A. T., Yuan, Xiaoming, and Goldstein, Tom. Adaptive consensus ADMM for distributed optimization. ICML, 2017c.

Zou, Hui and Hastie, Trevor. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301-320, 2005.
