Conditional Gradient with Enhancement and Thresholding for Atomic Norm Constrained Minimization

Nikhil Rao, Parikshit Shah, Stephen Wright

University of Wisconsin - Madison

(Large) data can be modeled as made up of a few "simple" components: wavelet coefficients, unit-rank matrices, sparse overlapping sets, paths/cliques, and many more.

Atoms and the atomic norm

We assume the variables can be represented as a combination of a small number of atoms: $x \in \mathbb{R}^p$, with atoms $a \in \mathcal{A}$ drawn from the atomic set $\mathcal{A}$, so that

$$x = \sum_{i=1}^{k} c_i a_i, \qquad c_i \ge 0 \;\; \forall i.$$

The atomic norm is

$$\|x\|_{\mathcal{A}} = \inf\Big\{ \sum_{a \in \mathcal{A}} c_a \;:\; x = \sum_{a \in \mathcal{A}} c_a\, a,\;\; c_a \ge 0 \Big\}.$$

The atomic norm is the 'L1' analog for general structurally constrained signals:

$a_i = \pm e_i \;\Rightarrow\; \|x\|_{\mathcal{A}} = \|x\|_1$

$a_i = uv^T \;\Rightarrow\; \|x\|_{\mathcal{A}} = \|x\|_*$

$a_i = u_G/\|u_G\| \;\Rightarrow\; \|x\|_{\mathcal{A}} = \sum_{G \in \mathcal{G}} \|x_G\|_2$

The atoms can be edges/cliques in a graph, individual Fourier/wavelet components, ...

Chandrasekaran, V. et al. "The convex geometry of linear inverse problems." Foundations of Computational Mathematics 12.6 (2012)
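The definition is easiest to see when the atomic set is finite, in which case the infimum is a linear program. The sketch below is not from the slides; it assumes SciPy's `linprog` and a finite atom matrix, and numerically checks the first special case above (atoms $\pm e_i$ give the $\ell_1$ norm).

```python
# Minimal sketch, assuming a finite atom set stored as the columns of `atoms`:
#   ||x||_A = min sum_a c_a   s.t.  sum_a c_a * a = x,  c_a >= 0.
import numpy as np
from scipy.optimize import linprog

def atomic_norm_finite(x, atoms):
    """atoms: (p, n_atoms) matrix whose columns are the atoms a in A."""
    p, n = atoms.shape
    res = linprog(c=np.ones(n),          # minimize sum_a c_a
                  A_eq=atoms, b_eq=x,    # subject to sum_a c_a * a = x
                  bounds=(0, None),      # c_a >= 0
                  method="highs")
    return res.fun if res.success else np.inf

# Sanity check of the first special case: atoms {±e_i} recover the l1 norm.
p = 5
E = np.eye(p)
atoms = np.hstack([E, -E])
x = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
print(atomic_norm_finite(x, atoms), np.abs(x).sum())   # both ≈ 3.5
```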


We would like to have a method that lets us solve

$$\min_x \; f(x) = \tfrac{1}{2}\,\|y - \Phi x\|^2 \quad \text{subject to} \quad \|x\|_{\mathcal{A}} \le \tau,$$

where $y$ is the observed data and $\Phi$ is the measurement operator.

Motivation: Frank-Wolfe / Conditional Gradient

(Frank and Wolfe '56, Clarkson '08, Jaggi '13, Jaggi and Sulovsky '10, Bach et al. '12, Harchaoui et al. '12, Dudik et al. '12, Tewari et al. '11, ...)

Solve a (relatively) simple optimization at each iteration:

atom selection: $a_t = \arg\min_{a \in \mathcal{A}} \langle \nabla f_t, a \rangle$

line search: $\hat{\gamma} = \arg\min_{\gamma \in [0,1]} f\big((1-\gamma)\,x + \gamma\,\tau\, a_t\big)$

update: $x \leftarrow (1-\hat{\gamma})\,x + \hat{\gamma}\,\tau\, a_t$
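As a concrete illustration, here is a minimal sketch of these three steps for the $\ell_1$ case (atoms $\pm e_i$) and the quadratic loss $\tfrac12\|y-\Phi x\|^2$: the linear oracle just picks the largest-magnitude gradient coordinate, and the line search has a closed form. `Phi`, `y`, `tau` are assumed inputs; this is a sketch, not the authors' implementation.

```python
import numpy as np

def frank_wolfe_l1(Phi, y, tau, n_iters=200):
    """Vanilla Frank-Wolfe for min 0.5*||y - Phi x||^2  s.t.  ||x||_1 <= tau."""
    m, p = Phi.shape
    x = np.zeros(p)
    for _ in range(n_iters):
        r = Phi @ x - y                           # negative residual
        grad = Phi.T @ r                          # gradient of the quadratic loss
        i = int(np.argmax(np.abs(grad)))
        a = np.zeros(p)
        a[i] = -np.sign(grad[i])                  # a_t = argmin over {±e_i} of <grad, a>
        d = tau * a - x                           # Frank-Wolfe direction toward tau * a_t
        Pd = Phi @ d
        denom = float(Pd @ Pd)
        gamma = 1.0 if denom == 0.0 else float(np.clip(-(r @ Pd) / denom, 0.0, 1.0))
        x = x + gamma * d                         # x = (1 - gamma)*x + gamma*tau*a_t
    return x
```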

[Figure: an $\ell_1$ recovery example, overlaying the true signal and the CG (Frank-Wolfe) recovery; $p = 2048$, $m = 512$ Gaussian measurements, AWGN 0.01.]

The greedy update steps might choose suboptimal atoms to represent the solution, and/or lead to less parsimonious solutions, and/or miss some components.

Our goal is to develop a greedy scheme that retains the computational advantages of FW, and also incorporates a "self-correcting" mechanism to purge suboptimal atoms.

CoGEnT: Conditional Gradient with Enhancement and Truncation

Solve

$$\min_x \; f(x) = \tfrac{1}{2}\,\|y - \Phi x\|^2 \quad \text{subject to} \quad \|x\|_{\mathcal{A}} \le \tau.$$

Let $r_t = y - \Phi x_{t-1}$ and $w_t = y - \tau\,\Phi a_t$.

FORWARD STEP (conditional gradient; very efficient in most cases):

$$a_t = \arg\min_{a \in \mathcal{A}} \langle \nabla f(x_{t-1}), a \rangle, \qquad \tilde{A}_t = [A_{t-1} \;\; a_t].$$

Line-search parameter for the L2 loss:

$$\gamma_t = \frac{\langle r_t,\, r_t - w_t \rangle}{\|r_t - w_t\|^2}, \qquad \tilde{c}_t = [(1-\gamma_t)\, c_{t-1} \;\; \gamma_t].$$

OPTIONALLY (Enhancement):

$$\tilde{c}_t = \arg\min_{c \ge 0,\; \|c\|_1 \le \tau} \tfrac{1}{2}\,\|y - \Phi \tilde{A}_t c\|^2,$$

with $c = [(1-\gamma_t)\, c_{t-1} \;\; \gamma_t]$ as a warm start; this requires only an $O(k \log k)$ projection onto the simplex, and the warm start makes enhancement efficient.

Then set $\tilde{x}_t = \tilde{A}_t \tilde{c}_t$.
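The enhancement step is cheap because projecting onto the feasible set $\{c \ge 0,\ \|c\|_1 \le \tau\}$ costs only a sort. Below is a sketch of that standard $O(k \log k)$ sort-and-threshold projection; it is a generic routine (not code from the paper) that would sit inside a projected-gradient solve warm-started as above.

```python
import numpy as np

def project_simplex(v, tau):
    """Euclidean projection of v onto {c : c >= 0, sum(c) <= tau}."""
    w = np.maximum(v, 0.0)
    if w.sum() <= tau:
        return w                          # already feasible
    u = np.sort(w)[::-1]                  # sort descending: O(k log k)
    css = np.cumsum(u)
    # largest index rho with u_rho > (cumsum_rho - tau) / rho
    rho = np.nonzero(u - (css - tau) / (np.arange(len(u)) + 1) > 0)[0][-1]
    theta = (css[rho] - tau) / (rho + 1.0)
    return np.maximum(w - theta, 0.0)     # soft-threshold so the sum equals tau
```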

CoGEnT: Conditional Gradient with Enhancement and Truncation

BACKWARD STEP (Truncation):

$$a_{\mathrm{bad}} = \arg\min_{a \in \tilde{A}} \Big\{ f(\tilde{x}_t) - c_a \langle \nabla f(\tilde{x}_t), a \rangle + \tfrac{1}{2}\, c_a^2\, \|\Phi a\|^2 \Big\}$$

(the scalar $\|\Phi a\|^2$ is computed and stored when the atom is selected).

Set $\bar{A} = \tilde{A} \setminus a_{\mathrm{bad}}$, find the corresponding $\bar{c}_t$, and let $\bar{x} = \bar{A}\,\bar{c}_t$.

If $f(\bar{x}) \le \eta f(x_{t-1}) + (1-\eta) f(\tilde{x}_t)$, update $x_t = \bar{x}$, $A_t = \bar{A}$, $c_t = \bar{c}_t$; otherwise keep $x_t = \tilde{x}_t$, $A_t = \tilde{A}_t$, $c_t = \tilde{c}_t$.

[Figure: the truncated iterate $\bar{x}$ is accepted when $f(\bar{x})$ falls below the threshold $\eta f(x_{t-1}) + (1-\eta) f(\tilde{x}_t)$, which lies between $f(\tilde{x}_t)$ and $f(x_{t-1})$.]

Multiple atoms can be removed at a single iteration. (Zhang '08 and Jain et al. '11 use similar backward steps for OMP.)

Zhang, T. "Adaptive forward-backward greedy algorithm for learning sparse representations." IEEE Trans. Info Theory 57.7 (2011)
Jain, P., Tewari, A. and Dhillon, I. "Orthogonal matching pursuit with replacement." NIPS 2011


Revisiting the $\ell_1$ example ($p = 2048$, $m = 512$ Gaussian measurements, AWGN 0.01):

[Figure: the true signal overlaid with the CG recovery, and the true signal overlaid with the CoGEnT recovery.]

A Comparison with Frank-Wolfe Away Steps

Away steps (Guelat and Marcotte '86): at each iteration, choose an atom as follows:

$$a_{\mathrm{fwd}} = \arg\min_{a \in \mathcal{A}} \langle \nabla f_t, a \rangle, \qquad d_t = a_{\mathrm{fwd}} - x_t,$$
$$a_{\mathrm{away}} = \arg\max_{a \in \mathcal{A}} \langle \nabla f_t, a \rangle, \qquad d_a = x_t - a_{\mathrm{away}}.$$

If $\langle d_t, \nabla f_t \rangle > \langle d_a, \nabla f_t \rangle$, set $a_t = a_{\mathrm{away}}$; else $a_t = a_{\mathrm{fwd}}$. This gives better solutions than vanilla FW, but does not always make solutions more sparse.

Truncation steps: at each iteration, choose an atom $a_{\mathrm{fwd}} = \arg\min_{a \in \mathcal{A}} \langle \nabla f_t, a \rangle$, and choose an atom to remove based on the quadratic form

$$a_{\mathrm{bad}} = \arg\min_{a \in \tilde{A}} \Big\{ -c_a \langle \nabla f(\tilde{x}_t), a \rangle + \tfrac{1}{2}\, c_a^2\, \|\Phi a\|^2 \Big\}.$$

This explicitly deletes atoms, and tries to represent the solution using the remaining atoms.

Guelat, J. and Marcotte, P. "Some comments on Wolfe's 'away step'." Mathematical Programming 35.1 (1986): 110-119.
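For comparison, here is a sketch of the away-step direction choice. In practice the away atom is searched over the atoms currently carrying weight in the representation of $x$ (so the step stays feasible); `lmo` and `active_atoms` are assumed inputs, and the atoms are assumed already scaled by the constraint radius $\tau$. This is an illustrative sketch, not the reference implementation.

```python
import numpy as np

def choose_direction(grad, x, active_atoms, lmo):
    """Pick between the Frank-Wolfe direction and an away direction.
    lmo(grad) returns argmin_{a in A} <grad, a>; active_atoms is a list of the
    atoms (vectors) currently in the representation of x."""
    a_fwd = lmo(grad)
    d_fwd = a_fwd - x
    scores = [float(grad @ a) for a in active_atoms]
    a_away = active_atoms[int(np.argmax(scores))]   # atom most aligned with the gradient
    d_away = x - a_away
    # move along whichever direction has the more negative derivative <grad, d>
    if grad @ d_fwd <= grad @ d_away:
        return d_fwd, "forward"
    return d_away, "away"
```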

On a compressed sensing example with $p = 3000$ and $m = 700$ noisy measurements:

Away steps: true sparsity = 100, estimated sparsity = 344, L2 error = 0.0011, L1 error = 0.0020.
Truncation (CoGEnT): true sparsity = 100, estimated sparsity = 154, L2 error = 0.0006, L1 error = 0.0009.

Convergence

Noise-free measurements: $y = \Phi x^{\star}$. Suppose there exists $x^{\#}$ with $\|x^{\#}\|_{\mathcal{A}} < \tau$ and $y = \Phi x^{\#}$. Then CoGEnT converges at a linear rate:

$$f(x_T) \le f(x_0)\, \exp(-CT).$$

Noisy measurements: $y = \Phi x^{\star} + n$. Let $x^{\#}$ be the optimal solution. Then CoGEnT converges at a sub-linear rate:

$$f(x_T) - f(x^{\#}) \le \frac{C}{T}.$$

The proofs closely follow classical proofs for convergence of CG methods (Beck and Teboulle '04, Tewari et al. '11).

Tewari, A. et al. "Greedy algorithms for structurally constrained high dimensional problems." NIPS 2011
Beck, A. and Teboulle, M. "A conditional gradient method with linear rate of convergence for solving convex linear systems." Mathematical Methods of Operations Research 59.2 (2004): 235-247.

Results

$\ell_1$ recovery: length-2048 signal, 512 Gaussian measurements, 95% sparse, AWGN 0.01.

[Figure: objective value vs. number of iterations (CoGEnT vs. CG) and objective value vs. time (CoGEnT, CoGEnT−E, CG).]

Latent group lasso: proximal point methods require the replication of variables. Overlapping groups of size 50; runtimes for CoGEnT vs. SpaRSA.

# groups   true dimension   replicated dimension   CoGEnT    SpaRSA
100        2030             5000                   14.9      22.2
1000       20030            50000                  210.9     461.6
1200       24030            60000                  358.64    778.2
1500       30030            75000                  574.9     1376.6
2000       40030            100000                 852.02    2977
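The forward step for the latent group lasso only needs the linear-minimization oracle for the group atoms $a_G = u_G/\|u_G\|_2$ (the third special case above), which operates in the original dimension: pick the group whose gradient block has the largest $\ell_2$ norm. A sketch, with `groups` an assumed list of index arrays (overlapping groups allowed); not the code used for the table above.

```python
import numpy as np

def group_lmo(grad, groups, p):
    """argmin over atoms a_G = u_G / ||u_G||_2 of <grad, a>."""
    norms = [np.linalg.norm(grad[G]) for G in groups]
    G = groups[int(np.argmax(norms))]            # most correlated group
    a = np.zeros(p)
    a[G] = -grad[G] / np.linalg.norm(grad[G])    # unit-norm atom supported on G
    return a
```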

Matrix completion:

[Figure: the leading singular values of the true matrix and of the solutions recovered by conditional gradient and by CoGEnT.]

More results

CS off the grid (Tang et al. '12): atoms are complex sinusoids. We employ an adaptive gridding procedure to refine our iterates, and the backward step allows for the removal of suboptimal atoms.

[Figure: true vs. recovered frequency components for CoGEnT and for CG.]
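A sketch of what such an approximate atom selection could look like for complex-sinusoid atoms: score candidate frequencies by their correlation with the gradient on a coarse grid, then refine locally. The grid sizes and refinement schedule are illustrative assumptions, not the values used on the slides; the sign/phase of the chosen atom is absorbed into its coefficient afterwards.

```python
import numpy as np

def sinusoid_atom(f, n):
    """Unit-norm complex sinusoid atom at frequency f in [0, 1)."""
    return np.exp(2j * np.pi * f * np.arange(n)) / np.sqrt(n)

def select_frequency(grad, n, coarse=512, refine_rounds=3):
    """Approximate atom selection: maximize |<a(f), grad>| over frequencies f."""
    score = lambda f: np.abs(np.vdot(sinusoid_atom(f, n), grad))
    freqs = np.linspace(0.0, 1.0, coarse, endpoint=False)
    f_best = max(freqs, key=score)                 # best frequency on the coarse grid
    width = 1.0 / coarse
    for _ in range(refine_rounds):                 # adaptive gridding: zoom in locally
        local = np.linspace(f_best - width, f_best + width, 21)
        f_best = max(local, key=score)
        width /= 10.0
    f_best = f_best % 1.0
    return f_best, sinusoid_atom(f_best, n)
```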

Sparse Overlapping Sets lasso (NR et al., NIPS '13): check whether a group is "active" and select an index from the "most correlated" group. The convergence results hold even for such approximate atom selections.

[Figure: true signal vs. CoGEnT recovery for the SOS lasso example.]

Conclusions

CoGEnT allows us to solve very general high-dimensional inference problems. The backward step yields sparser solutions than standard CG, while retaining convergence rates identical to CG.

Extensions: online settings and demixing applications (preliminary results seem promising).

Thank You
