An Empirical Study of ADMM for Nonconvex Problems


Zheng Xu¹, Soham De¹, Mário A. T. Figueiredo², Christoph Studer³, Tom Goldstein¹

¹ Department of Computer Science, University of Maryland, College Park, MD
² Instituto de Telecomunicações, Instituto Superior Técnico, Universidade de Lisboa, Portugal
³ Department of Electrical and Computer Engineering, Cornell University, Ithaca, NY

Abstract

The alternating direction method of multipliers (ADMM) is a common optimization tool for solving constrained and non-differentiable problems. We provide an empirical study of the practical performance of ADMM on several nonconvex applications, including $\ell_0$ regularized linear regression, $\ell_0$ regularized image denoising, phase retrieval, and eigenvector computation. Our experiments suggest that ADMM performs well on a broad class of nonconvex problems. Moreover, recently proposed adaptive ADMM methods, which automatically tune penalty parameters as the method runs, can improve algorithm efficiency and solution quality compared to ADMM with a non-tuned penalty.

1 Introduction

The alternating direction method of multipliers (ADMM) has been applied to solve a wide range of constrained convex and nonconvex optimization problems. ADMM decomposes complex optimization problems into sequences of simpler subproblems that are often solvable in closed form. Furthermore, these subproblems are often amenable to large-scale distributed computing environments [14, 23]. ADMM solves the problem

$$\min_{u \in \mathbb{R}^n,\, v \in \mathbb{R}^m} H(u) + G(v) \quad \text{subject to} \quad Au + Bv = b, \tag{1}$$

where $H : \mathbb{R}^n \to \bar{\mathbb{R}}$, $G : \mathbb{R}^m \to \bar{\mathbb{R}}$, $A \in \mathbb{R}^{p \times n}$, $B \in \mathbb{R}^{p \times m}$, and $b \in \mathbb{R}^p$, by the following steps:

\begin{align}
u_{k+1} &= \arg\min_u \; H(u) + \langle \lambda_k, -Au \rangle + \frac{\tau_k}{2} \| b - Au - Bv_k \|_2^2 \tag{2} \\
v_{k+1} &= \arg\min_v \; G(v) + \langle \lambda_k, -Bv \rangle + \frac{\tau_k}{2} \| b - Au_{k+1} - Bv \|_2^2 \tag{3} \\
\lambda_{k+1} &= \lambda_k + \tau_k (b - Au_{k+1} - Bv_{k+1}), \tag{4}
\end{align}

where $\lambda \in \mathbb{R}^p$ is a vector of dual variables (Lagrange multipliers), and $\tau_k$ is a scalar penalty parameter. The convergence of the algorithm can be monitored using primal and dual "residuals," both of which approach zero as the iterates become more accurate, and which are defined as

$$r_k = b - Au_k - Bv_k \quad \text{and} \quad d_k = \tau_k A^T B (v_k - v_{k-1}), \tag{5}$$

respectively [2]. The iteration is generally stopped when

$$\|r_k\|_2 \le \mathrm{tol}\, \max\{\|Au_k\|_2, \|Bv_k\|_2, \|b\|_2\} \quad \text{and} \quad \|d_k\|_2 \le \mathrm{tol}\, \|A^T \lambda_k\|_2, \tag{6}$$

where tol > 0 is the stopping tolerance.

ADMM was introduced by Glowinski and Marroco [11] and Gabay and Mercier [10], and convergence has been proved under mild conditions for convex problems [9, 7, 15]. The practical performance of ADMM on convex problems has been extensively studied; see [2, 13, 28] and references therein. For nonconvex problems, the convergence of ADMM under certain assumptions is studied in [24, 20, 17, 25]. The current weakest assumptions are given in [25], which requires a number of strict conditions on the objective, including a Lipschitz differentiable objective term. In practice, ADMM has been applied to various nonconvex problems, including nonnegative matrix factorization [27], $\ell_p$-norm regularization ($0 < p < 1$) [1, 5], tensor factorization [21, 29], phase retrieval [26], manifold optimization [19, 18], random fields [22], and deep neural networks [23].

The penalty parameter $\tau_k$ is the only free choice in ADMM, and plays an important role in the practical performance of the method. Adaptive methods have been proposed to automatically tune this parameter as the algorithm runs. The residual balancing method [16] automatically increases or decreases the penalty so that the primal and dual residuals have approximately similar magnitudes. The more recent AADMM method [28] uses a spectral (Barzilai-Borwein) rule for tuning the penalty parameter. These methods achieve impressive practical performance for convex problems and are guaranteed to converge under moderate conditions (such as when adaptivity is stopped after a finite number of iterations).

In this manuscript, we study the practical performance of ADMM on several nonconvex applications, including $\ell_0$ regularized linear regression, $\ell_0$ regularized image denoising, phase retrieval, and eigenvector computation. Whether or not current theory guarantees convergence for these applications, ADMM is a popular choice for solving these nonconvex problems.

The following questions are addressed using these model problems: (i) does ADMM converge in practice, (ii) does the update order of H(u) and G(v) matter, (iii) is the local optimal solution good, (iv) does the penalty parameter $\tau_k$ matter, and (v) is an adaptive penalty choice effective?

ZX, SD, and TG were supported by US NSF grant CCF-1535902 and by US ONR grant N00014-15-1-2676. CS was supported in part by Xilinx Inc., and by the US NSF under grants ECCS-1408006 and CCF-1535897.
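As an illustration of the monitoring quantities in (5), the stopping test in (6), and the idea behind residual balancing [16], the following minimal NumPy sketch may help. The function names are hypothetical, and the balancing constants `mu=10` and `eta=2` are assumptions (commonly used defaults, not values stated in this paper); the spectral rule of [28] is not shown.

```python
import numpy as np

def residuals(A, B, b, u, v, v_prev, tau):
    """Primal and dual residuals as defined in (5)."""
    r = b - A @ u - B @ v
    d = tau * A.T @ (B @ (v - v_prev))
    return r, d

def converged(A, B, b, u, v, lam, r, d, tol=1e-3):
    """Relative stopping test as in (6)."""
    primal_scale = max(np.linalg.norm(A @ u),
                       np.linalg.norm(B @ v),
                       np.linalg.norm(b))
    primal_ok = np.linalg.norm(r) <= tol * primal_scale
    dual_ok = np.linalg.norm(d) <= tol * np.linalg.norm(A.T @ lam)
    return bool(primal_ok and dual_ok)

def balance_penalty(tau, r, d, mu=10.0, eta=2.0):
    """Residual balancing: grow tau when the primal residual dominates,
    shrink it when the dual residual dominates (mu, eta are assumed values)."""
    rn, dn = np.linalg.norm(r), np.linalg.norm(d)
    if rn > mu * dn:
        return tau * eta
    if dn > mu * rn:
        return tau / eta
    return tau
```

In an ADMM loop, `residuals` would be evaluated after each update of (2) to (4), and `balance_penalty` would supply the penalty for the next iteration.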

2 Nonconvex applications

$\ell_0$ regularized linear regression. Sparse linear regression can be achieved using the nonconvex, $\ell_0$ regularized problem

$$\min_x \frac{1}{2} \|Dx - c\|_2^2 + \rho \|x\|_0, \tag{7}$$

where $D \in \mathbb{R}^{n \times m}$ is the data matrix, $c$ is a measurement vector, and $x$ is the vector of regression coefficients. ADMM is applied to solve problem (7) using the equivalent formulation

$$\min_{u,v} \frac{1}{2} \|Du - c\|_2^2 + \rho \|v\|_0 \quad \text{subject to} \quad u - v = 0. \tag{8}$$

$\ell_0$ regularized image denoising. The $\ell_0$ regularizer [6] can be substituted for the $\ell_1$ regularizer when computing total variation for image denoising. This results in the formulation [4]

$$\min_x \frac{1}{2} \|x - c\|_2^2 + \rho \|\nabla x\|_0, \tag{9}$$

where $c$ represents a given noisy image, $\nabla$ is the linear discrete gradient operator, and $\|\cdot\|_2$ and $\|\cdot\|_0$ denote the $\ell_2$ and $\ell_0$ norms. We solve the equivalent problem

$$\min_{u,v} \frac{1}{2} \|u - c\|_2^2 + \rho \|v\|_0 \quad \text{subject to} \quad \nabla u - v = 0. \tag{10}$$

The resulting ADMM subproblems can be solved in closed form using fast Fourier transforms [12].

Phase retrieval. Ptychographic phase retrieval [30, 26] solves the problem

$$\min_x \frac{1}{2} \|\mathrm{abs}(Dx) - c\|_2^2, \tag{11}$$

where $x \in \mathbb{C}^n$, $D \in \mathbb{C}^{m \times n}$, and $\mathrm{abs}(\cdot)$ denotes the elementwise magnitude of a complex vector. ADMM is applied to the equivalent problem

$$\min_{u,v} \frac{1}{2} \|\mathrm{abs}(u) - c\|_2^2 \quad \text{subject to} \quad u - Dv = 0. \tag{12}$$

Eigenvector problem. The eigenvector problem is a fundamental problem in numerical linear algebra. The leading eigenvector of a matrix $D$ is found by solving

$$\max_x \|Dx\|_2^2 \quad \text{subject to} \quad \|x\|_2 = 1. \tag{13}$$

ADMM is applied to the equivalent problem

$$\min_{u,v} -\|Du\|_2^2 + \iota_{\{z : \|z\|_2 = 1\}}(v) \quad \text{subject to} \quad u - v = 0, \tag{14}$$

where $\iota_S$ is the characteristic function defined by $\iota_S(v) = 0$ if $v \in S$, and $\iota_S(v) = \infty$ otherwise.
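For reference, each split formulation above is an instance of the template (1) with $b = 0$. The identification below follows from comparing (8), (10), (12), and (14) term by term with (1). For phase retrieval the variables are complex, so (1) is read over complex vector spaces, which is an extension the template as written does not state.

$$
\begin{array}{lllll}
\text{Problem} & H(u) & G(v) & A & B \\
\hline
(8) & \tfrac{1}{2}\|Du - c\|_2^2 & \rho\|v\|_0 & I & -I \\
(10) & \tfrac{1}{2}\|u - c\|_2^2 & \rho\|v\|_0 & \nabla & -I \\
(12) & \tfrac{1}{2}\|\mathrm{abs}(u) - c\|_2^2 & 0 & I & -D \\
(14) & -\|Du\|_2^2 & \iota_{\{z : \|z\|_2 = 1\}}(v) & I & -I
\end{array}
$$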

3 Experiments & Observations

Experimental setting. We implemented "vanilla ADMM" (ADMM with constant penalty) and fast ADMM with Nesterov acceleration and restart [13]. We also implemented two methods for automatically selecting penalty parameters: residual balancing [16] and the spectral adaptive method [28].

For $\ell_0$ regularized linear regression, the synthetic problem in [31, 13, 28] and realistic problems in [8, 31, 28] are investigated with $\rho = 1$. For $\ell_0$ regularized image denoising, a one-dimensional synthetic problem was created by the process described in [31], and is shown in Fig. 3. For the total-variation experiments, the "Barbara", "Cameraman", and "Lena" images are investigated, where Gaussian noise with zero mean and standard deviation 20 was added to each image (Fig. 4). We use $\rho = 1$ for the synthetic problem and $\rho = 500$ for the image problems. For phase retrieval, a synthetic problem is constructed with a random matrix $D \in \mathbb{C}^{15000 \times 500}$, $x \in \mathbb{C}^{500}$, $e \in \mathbb{C}^{15000}$, and $c = \mathrm{abs}(Dx + e)$. The three images in Fig. 4 are also used; each image is measured with 21 octanary pattern filters as described in [3]. For the eigenvector problem, a random matrix $D \in \mathbb{R}^{20 \times 20}$ is used.

[Figure 1 plots omitted. Panels: L0 LinReg, Eigenvector, L0 ImgRes, Phase Retrieval. Horizontal axis: initial penalty parameter. Vertical axis: iterations (top row), objective or PSNR (bottom row). Curves: Vanilla ADMM, Residual balance, Adaptive ADMM.]

Figure 1: Sensitivity to the (initial) penalty parameter $\tau_0$ for the $\ell_0$ regularized linear regression, eigenvector computation, "cameraman" denoising, and phase retrieval. (top) Number of iterations needed as a function of initial penalty parameter. (bottom) The objective/PSNR of the minima found for each nonconvex problem.

Does ADMM converge in practice? The convergence of vanilla ADMM is quite sensitive to the choice of penalty parameter. For vanilla ADMM, the iterates may oscillate, and if convergence occurs it may be very slow when the penalty parameter is not properly tuned. The residual balancing method converges more often than vanilla ADMM, and the spectral adaptive ADMM converges the most often. However, none of these methods uniformly beats all others, and it appears that vanilla ADMM with a highly tuned stepsize can sometimes outperform adaptive variants.

Does the update order of H(u) and G(v) matter? In Fig. 1, ADMM is performed by first minimizing with respect to the smooth objective term, and then the nonsmooth term. We repeat the experiments with the update order swapped, and report the results in Fig. 2 of the appendix. When updating the nonsmooth term first, the convergence of ADMM for the phase retrieval problem becomes less reliable. However, for some problems (like image denoising), convergence happened a bit faster than with the original update order. Although the behavior of ADMM changes, there is no predictable difference between the two update orderings.


Is the local optimal solution good? The bottom row of Fig. 1 presents the objective/PSNR achieved by the ADMM variants when varying the (initial) penalty parameter. In general, the quality of the solution depends strongly on the penalty parameter chosen. There does not appear to be a predictable relationship between the best penalty for convergence speed and the best penalty for solution quality.

Does the adaptive penalty work? In Table 1, we see that adaptivity not only speeds up convergence, but for most problem instances it also results in better minimizers. This behavior is not uniform across all experiments though, and for some problems a slightly lower objective value can be achieved using a finely tuned constant stepsize.

Table 1: Iterations (with runtime in seconds) and objective (or PSNR) for the various algorithms and applications described in the text. Absence of convergence after n iterations is indicated as n+.

| Application | Dataset | #samples × #features¹ | Quantity | Vanilla ADMM | Residual balance [16] | Adaptive ADMM [28] |
|---|---|---|---|---|---|---|
| $\ell_0$ regularized linear regression | Synthetic | 50 × 40 | iterations (sec) | 2000+ (.621) | 2000+ (.604) | 39 (.018) |
| | | | objective/PSNR | 1.71e4 | 1.71e4 | 15.2 |
| | Boston | 506 × 13 | iterations (sec) | 2000+ (.598) | 2000+ (.570) | 1039 (.342) |
| | | | objective/PSNR | 1.50e5 | 1.50e5 | 1.34e5 |
| | Diabetes | 768 × 8 | iterations (sec) | 2000+ (.751) | 2000+ (.708) | 28 (.014) |
| | | | objective/PSNR | 384 | 648 | 285 |
| | Leukemia | 38 × 7129 | iterations (sec) | 2000+ (15.3) | 78 (.578) | 63 (.477) |
| | | | objective/PSNR | 19.0 | 19.0 | 19.0 |
| | Prostate | 97 × 8 | iterations (sec) | 2000+ (.413) | 2000+ (.466) | 29 (.013) |
| | | | objective/PSNR | 1.14e3 | 380 | 324 |
| | Servo | 130 × 4 | iterations (sec) | 2000+ (.426) | 2000+ (.471) | 45 (.014) |
| | | | objective/PSNR | 267 | 267 | 198 |
| $\ell_0$ regularized image restoration | Synthetic1D | 100 × 1 | iterations (sec) | 2000+ (.701) | 1171 (.409) | 866 (.319) |
| | | | objective/PSNR | 40.6 | 45.4 | 45.4 |
| | Barbara | 512 × 512 | iterations (sec) | 200+ (35.5) | 200+ (35.1) | 18 (3.33) |
| | | | objective/PSNR | 24.7 | 24.7 | 24.7 |
| | Cameraman | 256 × 256 | iterations (sec) | 200+ (5.75) | 200+ (5.60) | 6 (.190) |
| | | | objective/PSNR | 25.9 | 25.9 | 27.8 |
| | Lena | 512 × 512 | iterations (sec) | 200+ (35.5) | 200+ (35.8) | 11 (1.98) |
| | | | objective/PSNR | 25.9 | 25.9 | 27.9 |
| Phase retrieval | Synthetic | 15000 × 500 | iterations (sec) | 200+ (19.4) | 94 (9.01) | 46 (4.45) |
| | Barbara | 512 × 512 × 21 | iterations (sec) | 59 (91.1) | 59 (89.6) | 50 (88.1) |
| | | | objective/PSNR | 81.5 | 81.5 | 81.5 |
| | Cameraman | 256 × 256 × 21 | iterations (sec) | 59 (29.6) | 55 (19.4) | 48 (20.8) |
| | | | objective/PSNR | 75.7 | 75.7 | 75.7 |
| | Lena | 512 × 512 × 21 | iterations (sec) | 59 (90.1) | 57 (87.4) | 52 (92.0) |
| | | | objective/PSNR | 81.4 | 81.5 | 81.5 |

¹ width × height for image restoration; width × height × filters for phase retrieval.

4 Conclusion

We provide a detailed discussion of the performance of ADMM on several nonconvex applications, including $\ell_0$ regularized linear regression, $\ell_0$ regularized image denoising, phase retrieval, and eigenvector computation. In practice, ADMM usually converges for these applications, and the penalty parameter choice has a significant effect on both convergence speed and solution quality. Adaptive penalty methods such as AADMM [28] automatically select the penalty parameter and perform optimization with little user oversight. For most problems, adaptive stepsize methods result in faster convergence or better minimizers than vanilla ADMM with a constant non-tuned penalty parameter. However, for some difficult nonconvex problems, the best results can still be obtained by fine-tuning the penalty parameter.


References

[1] S. Bouaziz, A. Tagliasacchi, and M. Pauly. Sparse iterative closest point. In Computer Graphics Forum, volume 32, pages 113–123. Wiley Online Library, 2013.
[2] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. and Trends in Mach. Learning, 3:1–122, 2011.
[3] E. J. Candes, X. Li, and M. Soltanolkotabi. Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Transactions on Information Theory, 61(4):1985–2007, 2015.
[4] R. Chartrand. Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Processing Letters, 14(10):707–710, 2007.
[5] R. Chartrand and B. Wohlberg. A nonconvex ADMM algorithm for group sparsity with sparse groups. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 6009–6013. IEEE, 2013.
[6] B. Dong and Y. Zhang. An efficient algorithm for $\ell_0$ minimization in wavelet frame based image restoration. Journal of Scientific Computing, 54(2-3):350–368, 2013.
[7] J. Eckstein and D. Bertsekas. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55(1-3):293–318, 1992.
[8] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. The Annals of Statistics, 32(2):407–499, 2004.
[9] D. Gabay. Applications of the method of multipliers to variational inequalities. Studies in Mathematics and its Applications, 15:299–331, 1983.
[10] D. Gabay and B. Mercier. A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Computers & Mathematics with Applications, 2(1):17–40, 1976.
[11] R. Glowinski and A. Marroco. Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité d’une classe de problèmes de Dirichlet non linéaires. ESAIM: Modélisation Mathématique et Analyse Numérique, 9:41–76, 1975.
[12] T. Goldstein and S. Osher. The split Bregman method for L1-regularized problems. SIAM Journal on Imaging Sciences, 2(2):323–343, 2009.
[13] T. Goldstein, B. O’Donoghue, S. Setzer, and R. Baraniuk. Fast alternating direction optimization methods. SIAM Journal on Imaging Sciences, 7(3):1588–1623, 2014.
[14] T. Goldstein, G. Taylor, K. Barabin, and K. Sayre. Unwrapping ADMM: efficient distributed computing via transpose reduction. In AISTATS, 2016.
[15] B. He and X. Yuan. On non-ergodic convergence rate of Douglas-Rachford alternating direction method of multipliers. Numerische Mathematik, 130:567–577, 2015.
[16] B. He, H. Yang, and S. Wang. Alternating direction method with self-adaptive penalty parameters for monotone variational inequalities. Jour. Optim. Theory and Appl., 106(2):337–356, 2000.
[17] M. Hong, Z.-Q. Luo, and M. Razaviyayn. Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM Journal on Optimization, 26(1):337–364, 2016.
[18] A. Kovnatsky, K. Glashoff, and M. M. Bronstein. MADMM: a generic algorithm for non-smooth optimization on manifolds. arXiv preprint arXiv:1505.07676, 2015.
[19] R. Lai and S. Osher. A splitting method for orthogonality constrained problems. Journal of Scientific Computing, 58(2):431–449, 2014.
[20] G. Li and T. K. Pong. Global convergence of splitting methods for nonconvex composite optimization. SIAM Journal on Optimization, 25(4):2434–2460, 2015.
[21] A. P. Liavas and N. D. Sidiropoulos. Parallel algorithms for constrained tensor factorization via alternating direction method of multipliers. IEEE Transactions on Signal Processing, 63(20):5450–5463, 2015.
[22] O. Miksik, V. Vineet, P. Pérez, P. H. Torr, and F. Cesson Sévigné. Distributed non-convex ADMM-inference in large-scale random fields. In British Machine Vision Conference, BMVC, 2014.
[23] G. Taylor, R. Burmeister, Z. Xu, B. Singh, A. Patel, and T. Goldstein. Training neural networks without gradients: A scalable ADMM approach. arXiv preprint arXiv:1605.02026, 2016.
[24] F. Wang, Z. Xu, and H.-K. Xu. Convergence of Bregman alternating direction method with multipliers for nonconvex composite problems. arXiv preprint arXiv:1410.8625, 2014.
[25] Y. Wang, W. Yin, and J. Zeng. Global convergence of ADMM in nonconvex nonsmooth optimization. arXiv preprint arXiv:1511.06324, 2015.
[26] Z. Wen, C. Yang, X. Liu, and S. Marchesini. Alternating direction methods for classical and ptychographic phase retrieval. Inverse Problems, 28(11):115010, 2012.
[27] Y. Xu, W. Yin, Z. Wen, and Y. Zhang. An alternating direction algorithm for matrix completion with nonnegative factors. Frontiers of Mathematics in China, 7(2):365–384, 2012.
[28] Z. Xu, M. A. Figueiredo, and T. Goldstein. Adaptive ADMM with spectral penalty parameter selection. arXiv preprint arXiv:1605.07246, 2016.
[29] Z. Xu, F. Huang, L. Raschid, and T. Goldstein. Non-negative factorization of the occurrence tensor from financial contracts. NIPS tensor workshop, 2016.
[30] C. Yang, J. Qian, A. Schirotzek, F. Maia, and S. Marchesini. Iterative algorithms for ptychographic phase retrieval. arXiv preprint arXiv:1105.5628, 2011.
[31] H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301–320, 2005.

5 Appendix: more experimental results

[Figure 2 plots omitted. Panels: L0 LinReg, Eigenvector, L0 ImgRes, Phase Retrieval. Horizontal axis: initial penalty parameter. Vertical axis: iterations (top row), objective or PSNR (bottom row). Curves: Vanilla ADMM, Residual balance, Adaptive ADMM.]

Figure 2: Convergence results when the nonsmooth objective term is updated first, and the smooth term is updated second. Sensitivity to the (initial) penalty parameter $\tau_0$ is shown for the synthetic problem of $\ell_0$ regularized linear regression, eigenvector computation, the "cameraman" denoising problem, and phase retrieval. The top row shows the convergence speed in iterations. The bottom row shows the objective/PSNR achieved by the final iterates.

6 Appendix: implementation details

6.1 $\ell_0$ regularized linear regression

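$\ell_0$ regularized linear regression is the nonconvex problem

$$\min_x \frac{1}{2} \|Dx - c\|_2^2 + \rho \|x\|_0, \tag{15}$$

where $D \in \mathbb{R}^{n \times m}$ is the data matrix, $c$ is the measurement vector, and $x$ is the vector of regression coefficients. ADMM is applied to solve problem (15) by solving the equivalent problem

$$\min_{u,v} \frac{1}{2} \|Du - c\|_2^2 + \rho \|v\|_0 \quad \text{subject to} \quad u - v = 0. \tag{16}$$

The proximal operator of the $\ell_0$ norm is the hard-thresholding operator,

$$\mathrm{hard}(z, t) = \arg\min_x \|x\|_0 + \frac{1}{2t} \|x - z\|_2^2 = z \odot I_{\{z : |z| > \sqrt{2t}\}}(z), \tag{17}$$

where $\odot$ represents element-wise multiplication, and $I_S$ is the indicator function of the set $S$: $I_S(v) = 1$ if $v \in S$, and $I_S(v) = 0$ otherwise. Then the steps of ADMM can be written

\begin{align}
u_{k+1} &= \arg\min_u \frac{1}{2} \|Du - c\|_2^2 + \frac{\tau}{2} \|0 - u + v_k + \lambda_k/\tau\|_2^2 \tag{18} \\
&= \begin{cases} (D^T D + \tau I_m)^{-1} (\tau v_k + \lambda_k + D^T c) & \text{if } n \ge m \\ (I_m - D^T (\tau I_n + D D^T)^{-1} D)(v_k + \lambda_k/\tau + D^T c/\tau) & \text{if } n < m \end{cases} \tag{19} \\
v_{k+1} &= \arg\min_v \rho \|v\|_0 + \frac{\tau}{2} \|0 - u_{k+1} + v + \lambda_k/\tau\|_2^2 = \mathrm{hard}(u_{k+1} - \lambda_k/\tau, \rho/\tau) \tag{20} \\
\lambda_{k+1} &= \lambda_k + \tau (0 - u_{k+1} + v_{k+1}). \tag{21}
\end{align}

The identity subscripts in (19) follow from $D \in \mathbb{R}^{n \times m}$: $D^T D$ is $m \times m$ and $D D^T$ is $n \times n$, so each case inverts the smaller of the two matrices.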
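The updates (18) to (21) can be sketched in a few lines of NumPy. This is a minimal illustration with a constant penalty and a fixed iteration count, not the authors' implementation; the function names, the zero initialization, and the default parameter values are assumptions. Both cases of the u-update are written for $D \in \mathbb{R}^{n \times m}$, so the iterates have length m.

```python
import numpy as np

def hard(z, t):
    """Hard-thresholding (17): keep entries with |z_i| > sqrt(2 t), zero the rest."""
    return z * (np.abs(z) > np.sqrt(2.0 * t))

def admm_l0_regression(D, c, rho=1.0, tau=1.0, iters=200):
    """Vanilla ADMM for (16); D has shape (n, m), iterates have length m."""
    n, m = D.shape
    v = np.zeros(m)      # assumed initialization
    lam = np.zeros(m)    # dual variable
    Dtc = D.T @ c
    if n >= m:
        M = D.T @ D + tau * np.eye(m)   # m x m system
    else:
        K = tau * np.eye(n) + D @ D.T   # n x n system (Woodbury form)
    for _ in range(iters):
        if n >= m:
            u = np.linalg.solve(M, tau * v + lam + Dtc)
        else:
            w = v + lam / tau + Dtc / tau
            u = w - D.T @ np.linalg.solve(K, D @ w)
        v = hard(u - lam / tau, rho / tau)   # update (20)
        lam = lam + tau * (v - u)            # update (21)
    return u, v, lam
```

For a toy check, `admm_l0_regression(np.eye(2), np.array([3.0, 0.1]))` drives `v` toward `[3, 0]`: the large coefficient is kept and the small one is thresholded to zero. In practice one would replace the fixed iteration count with the stopping test (6).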

6.2 $\ell_0$ regularized image denoising

The $\ell_0$ regularizer [6] is an alternative to the $\ell_1$ regularizer when computing total variation [12, 13]. $\ell_0$ regularized image denoising solves the nonconvex problem

$$\min_x \frac{1}{2} \|x - c\|_2^2 + \rho \|\nabla x\|_0, \tag{22}$$

where $c$ represents a given noisy image, $\nabla$ is the linear gradient operator, and $\|\cdot\|_2$ and $\|\cdot\|_0$ denote the $\ell_2$ and $\ell_0$ norms of vectors. The steps of ADMM for this problem are

\begin{align}
u_{k+1} &= \arg\min_u \frac{1}{2} \|u - c\|_2^2 + \frac{\tau}{2} \|v_k + \lambda_k/\tau - \nabla u\|_2^2 \tag{23} \\
&= (I + \tau \nabla^T \nabla)^{-1} (c + \tau \nabla^T (v_k + \lambda_k/\tau)) \tag{24} \\
v_{k+1} &= \arg\min_v \rho \|v\|_0 + \frac{\tau}{2} \|0 - \nabla u_{k+1} + v + \lambda_k/\tau\|_2^2 = \mathrm{hard}(\nabla u_{k+1} - \lambda_k/\tau, \rho/\tau) \tag{25} \\
\lambda_{k+1} &= \lambda_k + \tau (0 - \nabla u_{k+1} + v_{k+1}), \tag{26}
\end{align}

where the linear systems can be solved using fast Fourier transforms.
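To make the FFT remark concrete, here is a hedged one-dimensional sketch of (23) to (26). It assumes periodic boundary conditions and a forward-difference gradient, neither of which is specified in the text; under those assumptions $\nabla^T \nabla$ is circulant with eigenvalues $2 - 2\cos(2\pi k/N)$, so (24) reduces to a pointwise division in the Fourier domain. All function names are hypothetical.

```python
import numpy as np

def grad(u):
    """Periodic forward difference: (grad u)_i = u_{i+1} - u_i (assumed discretization)."""
    return np.roll(u, -1) - u

def grad_T(w):
    """Adjoint of grad under periodic boundaries."""
    return np.roll(w, 1) - w

def hard(z, t):
    """Hard-thresholding as in (17)."""
    return z * (np.abs(z) > np.sqrt(2.0 * t))

def solve_u(rhs, tau):
    """Solve (I + tau * grad^T grad) u = rhs with FFTs, as in (24)."""
    n = rhs.size
    eig = 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(n) / n)
    return np.real(np.fft.ifft(np.fft.fft(rhs) / (1.0 + tau * eig)))

def admm_l0_tv_1d(c, rho=1.0, tau=1.0, iters=100):
    """Vanilla ADMM for a 1D version of (22) with a constant penalty."""
    v = np.zeros_like(c, dtype=float)
    lam = np.zeros_like(c, dtype=float)
    for _ in range(iters):
        u = solve_u(c + tau * grad_T(v + lam / tau), tau)   # (24)
        v = hard(grad(u) - lam / tau, rho / tau)            # (25)
        lam = lam + tau * (v - grad(u))                     # (26)
    return u, v, lam
```

The same structure carries over to 2D images with `np.fft.fft2` and a two-component gradient, at the cost of more bookkeeping.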

6.3 Phase retrieval

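Ptychographic phase retrieval [30, 26] solves the problem

$$\min_x \frac{1}{2} \|\mathrm{abs}(Dx) - c\|_2^2, \tag{27}$$

where $x \in \mathbb{C}^n$, $D \in \mathbb{C}^{m \times n}$, and $\mathrm{abs}(\cdot)$ denotes the elementwise magnitude of a complex-valued vector. ADMM is applied to the equivalent problem

$$\min_{u,v} \frac{1}{2} \|\mathrm{abs}(u) - c\|_2^2 \quad \text{subject to} \quad u - Dv = 0. \tag{28}$$

Define the projection operator of a complex-valued vector as

$$\mathrm{absProj}(z, c, t) = \arg\min_x \frac{1}{2} \|\mathrm{abs}(x) - c\|_2^2 + \frac{t}{2} \|x - z\|_2^2 = \left( \frac{t}{1+t}\, \mathrm{abs}(z) + \frac{1}{1+t}\, c \right) \odot \mathrm{sign}(z), \tag{29}$$

where $\mathrm{sign}(\cdot)$ denotes the elementwise phase of a complex-valued vector. In the following ADMM steps, notice that the dual variable $\lambda \in \mathbb{C}^m$ is complex, and the penalty parameter $\tau \in \mathbb{R}$ is a real non-negative scalar:

\begin{align}
u_{k+1} &= \arg\min_u \frac{1}{2} \|\mathrm{abs}(u) - c\|_2^2 + \frac{\tau}{2} \|Dv_k + \lambda_k/\tau - u\|_2^2 = \mathrm{absProj}(Dv_k + \lambda_k/\tau, c, \tau) \tag{30} \\
v_{k+1} &= \arg\min_v 0 + \frac{\tau}{2} \|0 - u_{k+1} + Dv + \lambda_k/\tau\|_2^2 = D^{-1} (u_{k+1} - \lambda_k/\tau) \tag{31} \\
\lambda_{k+1} &= \lambda_k + \tau (0 - u_{k+1} + Dv_{k+1}). \tag{32}
\end{align}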
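A hedged NumPy sketch of (29) to (32) follows. Two choices here are assumptions rather than statements from the text: since $D \in \mathbb{C}^{m \times n}$ is generally not square, $D^{-1}$ in (31) is interpreted as the Moore-Penrose pseudoinverse (the least-squares solution of the v-subproblem), and the phase of a zero entry is taken to be 1. Function names are hypothetical.

```python
import numpy as np

def abs_proj(z, c, t):
    """absProj from (29): blend |z| with c, keep the phase of z."""
    z = np.asarray(z, dtype=complex)
    mag = np.abs(z)
    phase = np.ones_like(z)          # assumed convention: sign(0) = 1
    nz = mag > 0
    phase[nz] = z[nz] / mag[nz]
    return (t / (1.0 + t) * mag + c / (1.0 + t)) * phase

def admm_phase_retrieval(D, c, v0, tau=1.0, iters=100):
    """Vanilla ADMM for (28) with a constant penalty tau."""
    D_pinv = np.linalg.pinv(D)       # stands in for D^{-1} in (31) (assumption)
    v = np.asarray(v0, dtype=complex)
    lam = np.zeros(D.shape[0], dtype=complex)
    for _ in range(iters):
        u = abs_proj(D @ v + lam / tau, c, tau)   # (30)
        v = D_pinv @ (u - lam / tau)              # (31)
        lam = lam + tau * (D @ v - u)             # (32)
    return u, v, lam
```

For large structured operators such as the filtered Fourier measurements used in the experiments, an explicit pseudoinverse would be replaced by a fast operator-based solve; the dense `pinv` is only for illustration.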

6.4 Eigenvector problem

The eigenvector problem is a fundamental problem in numerical linear algebra. The leading eigenvector of a matrix can be recovered by solving the Rayleigh quotient maximization problem

$$\max_x \|Dx\|_2^2 \quad \text{subject to} \quad \|x\|_2 = 1. \tag{33}$$

ADMM is applied to the equivalent problem

$$\min_{u,v} -\|Du\|_2^2 + \iota_{\{z : \|z\|_2 = 1\}}(v) \quad \text{subject to} \quad u - v = 0, \tag{34}$$

where $\iota_S$ is the characteristic function of the set $S$: $\iota_S(v) = 0$ if $v \in S$, and $\iota_S(v) = \infty$ otherwise. The ADMM steps are

\begin{align}
u_{k+1} &= \arg\min_u -\|Du\|_2^2 + \frac{\tau}{2} \|0 - u + v_k + \lambda_k/\tau\|_2^2 = (\tau I - 2 D^T D)^{-1} (\tau v_k + \lambda_k) \tag{35} \\
v_{k+1} &= \arg\min_v \iota_{\{z : \|z\|_2 = 1\}}(v) + \frac{\tau}{2} \|0 - u_{k+1} + v + \lambda_k/\tau\|_2^2 = \frac{u_{k+1} - \lambda_k/\tau}{\|u_{k+1} - \lambda_k/\tau\|_2} \tag{36} \\
\lambda_{k+1} &= \lambda_k + \tau (0 - u_{k+1} + v_{k+1}). \tag{37}
\end{align}
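The steps (35) to (37) can be sketched as below. This is an illustrative implementation under assumptions: the function name, the zero dual initialization, and the fixed iteration count are not from the text. Note that the closed form in (35) is the minimizer of the u-subproblem only when $\tau I - 2 D^T D$ is positive definite, that is, when $\tau$ exceeds twice the largest eigenvalue of $D^T D$; for smaller $\tau$ it is merely a stationary point. The maximizer of (33) is the leading eigenvector of $D^T D$.

```python
import numpy as np

def admm_eigvec(D, v0, tau, iters=500):
    """Vanilla ADMM for (34) with a constant penalty tau."""
    n = D.shape[1]
    M = tau * np.eye(n) - 2.0 * D.T @ D     # system matrix of (35)
    v = v0 / np.linalg.norm(v0)             # start on the unit sphere
    lam = np.zeros(n)                       # assumed initialization
    for _ in range(iters):
        u = np.linalg.solve(M, tau * v + lam)   # (35)
        z = u - lam / tau
        v = z / np.linalg.norm(z)               # (36): project onto the sphere
        lam = lam + tau * (v - u)               # (37)
    return u, v, lam
```

At a fixed point, $u = v$ and $\lambda = -2 D^T D v$, so $v$ is an eigenvector of $D^T D$; which eigenvector is reached depends on the initialization and on $\tau$, consistent with the sensitivity reported in Fig. 1.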

7 Appendix: synthetic and realistic datasets

We provide the detailed construction of the synthetic dataset for our linear regression experiments. The same synthetic dataset has been used in [31, 13, 28]. Based on three random normal vectors $\nu_a, \nu_b, \nu_c \in \mathbb{R}^{50}$, the data matrix $D = [d_1 \ldots d_{40}] \in \mathbb{R}^{50 \times 40}$ is defined as

$$d_i = \begin{cases} \nu_a + e_i, & i = 1, \ldots, 5, \\ \nu_b + e_i, & i = 6, \ldots, 10, \\ \nu_c + e_i, & i = 11, \ldots, 15, \\ \nu_i \in N(0, 1), & i = 16, \ldots, 40, \end{cases} \tag{38}$$

where $e_i$ are random normal vectors from $N(0, 1)$. The problem is to recover the vector

$$x^*_i = \begin{cases} 3, & i = 1, \ldots, 15, \\ 0, & \text{otherwise} \end{cases} \tag{39}$$

from noisy measurements of the form $c = Dx^* + \hat{e}$, with $\hat{e} \in N(0, 0.1)$.
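The construction in (38) and (39) can be sketched with NumPy as follows. The function name and seed handling are hypothetical, and the second argument of $N(0, 0.1)$ is treated here as a standard deviation, which is an assumption since the text does not say whether it denotes variance or standard deviation.

```python
import numpy as np

def make_synthetic(seed=0, noise_scale=0.1):
    """Build (D, c, x_star) following (38) and (39)."""
    rng = np.random.default_rng(seed)
    n, m = 50, 40
    nu = rng.standard_normal((3, n))     # nu_a, nu_b, nu_c
    # Start with i.i.d. N(0, 1) columns: these are e_i for columns 1..15
    # and the plain random columns nu_i for columns 16..40.
    D = rng.standard_normal((n, m))
    for g in range(3):
        # Add the shared vector to each group of five correlated columns.
        D[:, 5 * g:5 * (g + 1)] += nu[g][:, None]
    x_star = np.zeros(m)
    x_star[:15] = 3.0
    # noise_scale is interpreted as a standard deviation (assumption).
    c = D @ x_star + noise_scale * rng.standard_normal(n)
    return D, c, x_star
```

The returned `D` and `c` have the 50 × 40 shape listed for the synthetic regression problem in Table 1.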

[Figure 3 plot omitted. Curves: Groundtruth, Noisy, Recovered.]

Figure 3: The synthetic one-dimensional signal for $\ell_0$ regularized image denoising. The groundtruth signal, noisy signal (PSNR = 37.8), and signal recovered by AADMM (PSNR = 45.4) are shown.

[Figure 4 images omitted. Columns: Groundtruth, Noisy, Recovered.]

Figure 4: The groundtruth image (left), noisy image (middle), and image recovered by AADMM (right) for $\ell_0$ regularized image denoising. The PSNRs of the noisy/recovered images are 21.9/24.7 for "Barbara", 22.4/27.8 for "Cameraman", and 21.9/27.9 for "Lena".
