Toward Faster Nonnegative Matrix Factorization: A New Algorithm and Comparisons
Jingu Kim and Haesun Park
Georgia Tech
2008.12.16
2008 Eighth IEEE International Conference on Data Mining (ICDM’08), Pisa, Italy

Jingu Kim and Haesun Park (Georgia Tech)


Outline

GOAL: present a new algorithm for NMF and provide experimental evidence of its computational efficiency

1. Introduction
2. Algorithms for NMF
3. Block principal pivoting algorithm
4. Comparison results
5. Summary

Introduction

Nonnegative Matrix Factorization [Paatero and Tapper, 1994, Lee and Seung, 1999]

● Given a matrix $A \in \mathbb{R}^{m \times n}$ with nonnegative elements and a desired rank $k$, find $W \in \mathbb{R}^{m \times k}$ and $H \in \mathbb{R}^{k \times n}$ such that

$$A \approx WH$$

where W and H have nonnegative elements only.
● Nonnegativity constraints are often physically meaningful and provide natural interpretation: additive linear combinations of nonnegative parts. Successful applications include:
✦ Pixels in digital image [Lee and Seung, 1999, Li et al., 2001]
✦ Term-document matrix for text analysis [Xu et al., 2003, Pauca et al., 2004]
✦ Bioinformatics - microarray data analysis [Brunet et al., 2004, H. Kim and Park, 2007]
✦ Speech and audio processing [Behnke, 2003, Smaragdis and Brown, 2003]
✦ ··· and many more. See references in [Devarajan, 2008]

NMF Formulation

● Formulation: how to assert $A \approx WH$
✦ Minimize the Frobenius norm

$$\min_{W,H} \|A - WH\|_F^2 \quad \text{s.t. } W \ge 0,\ H \ge 0$$

✦ Alternative formulation that minimizes KL-divergence

$$\min_{W,H} D(A \| WH) \quad \text{s.t. } W \ge 0,\ H \ge 0, \qquad \text{where } D(A \| B) = \sum_{ij} \left( A_{ij} \log \frac{A_{ij}}{B_{ij}} - A_{ij} + B_{ij} \right)$$

● Better Approximation vs. Better Representation/Interpretation
✦ SVD: Better Approximation → $\min \|A - WH\|_F^2$
✦ NMF: Better Representation/Interpretation → $\min \|A - WH\|_F^2$ where $W \ge 0$ and $H \ge 0$
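The two objectives on this slide can be evaluated as in the following minimal NumPy sketch. The function names and the small `eps` guard against `log(0)` are assumptions of this sketch, not part of the slides.

```python
import numpy as np

def frobenius_objective(A, W, H):
    """Squared Frobenius distance ||A - WH||_F^2."""
    R = A - W @ H
    return float(np.sum(R * R))

def kl_objective(A, W, H, eps=1e-12):
    """Generalized KL divergence D(A || WH).

    eps is a hypothetical guard that keeps the logarithm finite when
    an entry of A or WH is exactly zero.
    """
    B = W @ H
    return float(np.sum(A * np.log((A + eps) / (B + eps)) - A + B))
```

Both functions return 0 when `A` equals `W @ H` exactly, which is a quick sanity check for a candidate factorization.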

Algorithms for NMF and preparation

Algorithms for NMF

● Given a matrix $A \in \mathbb{R}^{m \times n}$ with nonnegative elements and a desired rank $k$,

$$\min_{W,H} \|A - WH\|_F^2 \quad \text{s.t. } W \ge 0 \text{ and } H \ge 0.$$

✦ Non-convex optimization
✦ W and H are not unique (think of $\hat{W} = WD$, $\hat{H} = D^{-1}H$).
● Algorithms developed
✦ Multiplicative update rules [Lee and Seung, 2001]
✦ Alternating Least Squares (ALS) [Berry et al., 2007]
✦ Alternating Nonnegative Least Squares (ANLS) [Paatero and Tapper, 1994]
■ Several algorithms using this framework: [Lin, 2007, Kim et al., 2007, H. Kim and Park, 2008]
✦ Other algorithms and variants: [Li et al., 2001, Hoyer, 2004, Pauca et al., 2004, Gao and Church, 2005, Chu and Lin, 2008] ···

Previous algorithms and drawbacks

● Multiplicative Updating Rules: [Lee and Seung, 2001]

$$H_{qj} \leftarrow H_{qj} \frac{(W^T A)_{qj}}{((W^T W) H)_{qj}} \quad \text{and} \quad W_{iq} \leftarrow W_{iq} \frac{(A H^T)_{iq}}{(W (H H^T))_{iq}}$$

✦ Under this updating, the distance $\|A - WH\|_F^2$ is monotonically non-increasing.
✦ Simple implementation, but the monotonic decrease does not by itself imply convergence to a stationary point [Gonzalez and Zhang, 2005].
● Alternating Least Squares (ALS) [Berry et al., 2007]
✦ Fix H and solve for W in $\min_W \|H^T W^T - A^T\|_F^2$, and set all negative elements in W to 0.
✦ Fix W and solve for H in $\min_H \|WH - A\|_F^2$, and set all negative elements in H to 0.
✦ No claim is made for the convergence to a stationary point.
● → Alternating Nonnegative Least Squares (ANLS)
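The two baseline updates above can be sketched in NumPy as follows. This is an illustrative sketch, not the code used in the experiments: the function names are hypothetical, and `eps` is an assumed guard against division by zero.

```python
import numpy as np

def multiplicative_update(A, W, H, eps=1e-12):
    """One sweep of the Lee-Seung multiplicative rules (H first, then W)."""
    H = H * (W.T @ A) / (W.T @ W @ H + eps)
    W = W * (A @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

def als_update(A, W, H):
    """One ALS sweep: unconstrained least squares, then clip negatives to 0."""
    H = np.maximum(np.linalg.lstsq(W, A, rcond=None)[0], 0.0)
    W = np.maximum(np.linalg.lstsq(H.T, A.T, rcond=None)[0].T, 0.0)
    return W, H
```

The clipping in `als_update` is what distinguishes ALS from ANLS: the clipped result is generally not the solution of the nonnegativity-constrained subproblem.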

Alternating Nonnegative Least Squares [Paatero and Tapper, 1994]

1. Initialize W (or H) with non-negative values.
2. Iterate the following ANLS until convergence:
(a) Fixing W, solve $\min_{H \ge 0} \|WH - A\|_F^2$
(b) Fixing H, solve $\min_{W \ge 0} \|H^T W^T - A^T\|_F^2$
3. The columns of W are normalized to unit $L_2$-norm

● Block coordinate descent method in bound-constrained optimization
● Convergence analysis
✦ No matter how many blocks, if the subproblems have unique solutions, then any limit point of the sequence is a stationary point [Bertsekas, 1999]
✦ For two block problems, any limit point of the sequence is a stationary point [Grippo and Sciandrone, 2000]
✦ It is important to find an optimal solution of 2-(a),(b) at each iteration!
● It remains to provide the algorithm for solving subproblems in 2-(a),(b). How to design a fast algorithm for this?
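The ANLS outer loop (steps 1 to 3) might be sketched as below. The inner solver `nnls_stand_in` is a hypothetical placeholder (plain projected gradient with a fixed step), used only so the sketch is self-contained. It is not the block principal pivoting solver these slides propose, and it only approximates the optimal subproblem solution that the convergence theory asks for. The fixed iteration counts stand in for a real convergence test.

```python
import numpy as np

def nnls_stand_in(C, B, X, iters=200):
    """Approximate min_{X>=0} ||CX - B||_F^2 by projected gradient (placeholder)."""
    CtC, CtB = C.T @ C, C.T @ B
    step = 1.0 / max(np.linalg.norm(CtC, 2), 1e-12)  # 1 / Lipschitz constant
    for _ in range(iters):
        X = np.maximum(X - step * (CtC @ X - CtB), 0.0)
    return X

def nmf_anls(A, k, outer_iters=30, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W, H = rng.random((m, k)), rng.random((k, n))   # step 1
    for _ in range(outer_iters):                    # step 2
        H = nnls_stand_in(W, A, H)                  # 2(a): fix W, solve for H
        W = nnls_stand_in(H.T, A.T, W.T).T          # 2(b): fix H, solve for W
    norms = np.linalg.norm(W, axis=0)               # step 3
    norms[norms == 0] = 1.0                         # leave all-zero columns alone
    return W / norms, H * norms[:, None]
```

Rescaling H by the column norms in step 3 keeps the product WH unchanged while normalizing W.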

NMF/ANLS Algorithms

Problem to solve: $\min_{X \ge 0} \|CX - B\|_F^2$

● Active Set [H. Kim and Park, 2008]
✦ Classical algorithm for NNLS with single right hand side $\min_{h \ge 0} \|Wh - a\|_2$ is an active set algorithm by [Lawson and Hanson, 1995].
✦ Faster algorithms for multiple right hand side problems by [Bro and de Jong, 1997] and [Van Benthem and Keenan, 2004].
● Projected Gradient [Lin, 2007]

$$x^{k+1} \leftarrow P_+\left(x^k - \alpha_k \nabla f(x^k)\right)$$

✦ Improved selection of step constant $\alpha_k$
● Projected Quasi-Newton [Kim et al., 2007]

$$x^{k+1} \leftarrow \begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} P_+\left[y^k - \alpha \bar{D}^k \nabla f(y^k)\right] \\ 0 \end{bmatrix}$$

✦ Gradient scaling only for inactive variables
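As an illustration of the projected gradient iteration $x^{k+1} \leftarrow P_+(x^k - \alpha_k \nabla f(x^k))$ for $f(x) = \frac{1}{2}\|Cx - b\|_2^2$, here is a minimal sketch. It assumes a constant step $\alpha = 1/\|C^T C\|_2$ for simplicity; the improved step selection of [Lin, 2007] is not reproduced here, and the function name is hypothetical.

```python
import numpy as np

def nnls_projected_gradient(C, b, iters=500):
    """Projected gradient for min_{x>=0} 0.5 * ||Cx - b||^2 with a fixed step."""
    CtC, Ctb = C.T @ C, C.T @ b
    alpha = 1.0 / np.linalg.norm(CtC, 2)   # assumed constant step 1/L
    x = np.zeros(C.shape[1])
    for _ in range(iters):
        grad = CtC @ x - Ctb
        x = np.maximum(x - alpha * grad, 0.0)  # P_+ projects onto x >= 0
    return x
```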

Structure of NNLS problems

● Recognizing the long and thin structure is very important for developing a fast algorithm for NMF.
● $\min_{H \ge 0} \|WH - A\|_F^2$
● $\min_{W \ge 0} \|H^T W^T - A^T\|_F^2$

(The coefficient matrices $W \in \mathbb{R}^{m \times k}$ and $H^T \in \mathbb{R}^{n \times k}$ are long and thin when $k \ll \min(m, n)$.)

Block principal pivoting algorithm

Block principal pivoting algorithm [Portugal et al., 1994]

● Consider single right-hand side problem: for $x \in \mathbb{R}^q$,

$$\min_{x \ge 0} \|Cx - b\|_2^2 \qquad (1)$$

● KKT condition for (1):

$$y = C^T C x - C^T b \qquad (2a)$$
$$y \ge 0 \qquad (2b)$$
$$x \ge 0 \qquad (2c)$$
$$x_i y_i = 0, \quad i = 1, \cdots, q \qquad (2d)$$

● Find x and y that satisfy (2).
● Repeat:
✦ Guess two index sets F and G that partition $\{1, \cdots, q\}$
✦ Force $x_G = 0$ and $y_F = 0$. Solve $x_F = \arg\min_{x_F} \|C_F x_F - b\|_2^2$ and set $y_G = C_G^T (C_F x_F - b)$.
✦ Check: if $x_F \ge 0$ and $y_G \ge 0$, optimal values are found. Otherwise, update F and G.
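One pass of the "guess, solve, check" loop above can be sketched as follows. The function names, the boolean-mask representation of F, and the tolerance `tol` are assumptions of this sketch.

```python
import numpy as np

def solve_for_partition(C, b, F):
    """Given a boolean mask F (G is its complement), force x_G = 0 and y_F = 0."""
    q = C.shape[1]
    G = ~F
    x, y = np.zeros(q), np.zeros(q)
    if F.any():
        # x_F = argmin ||C_F x_F - b||_2^2 (unconstrained least squares)
        x[F] = np.linalg.lstsq(C[:, F], b, rcond=None)[0]
    # y_G = C_G^T (C_F x_F - b)
    y[G] = C[:, G].T @ (C[:, F] @ x[F] - b)
    return x, y

def is_optimal(x, y, tol=1e-12):
    """KKT check (2b)-(2c); (2a) and (2d) hold by construction."""
    return bool(np.all(x >= -tol) and np.all(y >= -tol))
```

For example, with `C = np.eye(2)` and `b = [1, -1]`, the guess `F = [True, False]` yields `x = [1, 0]`, `y = [0, 1]` and passes the check, while `F = [True, True]` yields `x = [1, -1]` and fails it.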

How block principal pivoting works

Update by $C_F^T C_F x_F = C_F^T b$, and $y_G = C_G^T C_F x_F - C_G^T b$.

Refining exchange rules

● Previous example: block exchange rule. One can also exchange only a subset of infeasible variables.
✦ Exchange only one variable → single principal pivoting
✦ Exchange several variables → block principal pivoting
● Active set algorithm is a special instance of single principal pivoting algorithm.
● Block exchange rule is not always safe.
✦ The residual is not guaranteed to monotonically decrease. Block exchange rule may lead to a cycle and fail to find an optimal solution (although it occurs rarely).
✦ Modification: if the block exchange rule fails to decrease the number of infeasible variables, use a backup exchange rule
✦ With this modification, block principal pivoting algorithm finds the solution of NNLS in a finite number of iterations. [Portugal et al., 1994]
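The exchange rules above can be combined into a single-right-hand-side solver along the following lines. This is a sketch under stated assumptions, not the authors' released code: C is assumed to have full column rank, the start is F = ∅, up to three consecutive non-improving block exchanges are tolerated before falling back, and the backup rule exchanges only the infeasible variable with the largest index (following the description in [Portugal et al., 1994] as I understand it). The names `p`, `ninf`, and `tol` are hypothetical.

```python
import numpy as np

def nnls_block_pivot(C, b, max_iter=1000, tol=1e-12):
    """Block principal pivoting for min_{x>=0} ||Cx - b||_2^2 (sketch)."""
    q = C.shape[1]
    CtC, Ctb = C.T @ C, C.T @ b
    F = np.zeros(q, dtype=bool)          # start with F empty, so x = 0
    x, y = np.zeros(q), -Ctb.copy()
    p, ninf = 3, q + 1                   # remaining retries, best infeasible count
    for _ in range(max_iter):
        V = (F & (x < -tol)) | (~F & (y < -tol))   # infeasible variables
        n_bad = int(V.sum())
        if n_bad == 0:
            break                        # KKT conditions hold
        if n_bad < ninf:                 # progress: keep the full block exchange
            ninf, p = n_bad, 3
        elif p >= 1:                     # no progress: retry a few block exchanges
            p -= 1
        else:                            # backup rule: exchange one variable only
            last = np.flatnonzero(V).max()
            V = np.zeros(q, dtype=bool)
            V[last] = True
        F = F ^ V                        # move exchanged variables between F and G
        x, y = np.zeros(q), np.zeros(q)
        if F.any():
            x[F] = np.linalg.solve(CtC[np.ix_(F, F)], Ctb[F])
        y[~F] = CtC[np.ix_(~F, F)] @ x[F] - Ctb[~F]
    return x
```

Only submatrices of `CtC` and entries of `Ctb` are touched inside the loop, which is what makes the precomputation discussed for the multiple right-hand side case pay off.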

Multiple right-hand side case

$$\min_{X \ge 0} \|CX - B\|_F^2$$

● It is possible to separately solve for each column of X. → SLOW
● Two improvements [Bro and de Jong, 1997, Van Benthem and Keenan, 2004]
✦ Precompute $C^T C$ and $C^T B$: updates of $x_F$ and $y_G$ are given by

$$C_F^T C_F x_F = C_F^T b$$
$$y_G = C_G^T C_F x_F - C_G^T b.$$

All coefficients can be directly retrieved from $C^T C$ and $C^T B$!
✦ Exploiting common F and G sets.
● Let us see why these improvements are effective for our problem.
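The two improvements can be illustrated with one update step over all right-hand sides: every coefficient is read from the precomputed $C^T C$ and $C^T B$, and columns that share the same F set are solved together with a single factorization. The function name and the boolean-matrix encoding of the F sets are assumptions of this sketch.

```python
import numpy as np

def update_grouped(CtC, CtB, F):
    """F is a boolean (q, r) matrix; column j is the F set of right-hand side j."""
    q, r = CtB.shape
    X = np.zeros((q, r))
    groups = {}
    for j in range(r):                       # group columns with identical F sets
        groups.setdefault(F[:, j].tobytes(), []).append(j)
    for cols in groups.values():
        mask = F[:, cols[0]]
        if mask.any():
            # One solve of C_F^T C_F X_F = C_F^T B for the whole group.
            X[np.ix_(mask, cols)] = np.linalg.solve(
                CtC[np.ix_(mask, mask)], CtB[np.ix_(mask, cols)])
    Y = CtC @ X - CtB                        # y = C^T C x - C^T b per column
    Y[F] = 0.0                               # y_F = 0 by construction
    return X, Y
```

When X is flat and wide, many columns fall into the same group, so the number of linear solves can be far smaller than the number of columns.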

Multiple right-hand side case

$$\min_{X \ge 0} \|CX - B\|_F^2$$

● Recall the long and thin structure.
✦ $C^T C$ and $C^T B$ are small. → Storage is not a problem.
✦ X is flat and wide. → More common cases of F and G sets.
● This completes the description of our algorithm for NMF: ANLS framework + Block principal pivoting algorithm with improvements for multiple right-hand sides

Extensions

● As other ANLS algorithms, easily extended to other formulations.
● Sparse NMF [H. Kim and Park, 2007]:

$$\min_{W,H} \left\{ \|A - WH\|_F^2 + \eta \|W\|_F^2 + \beta \sum_{j=1}^{n} \|H(:, j)\|_1^2 \right\}$$

subject to $\forall ij,\ W_{ij}, H_{ij} \ge 0$. ANLS reformulation [H. Kim and Park, 2007]: alternate the following

$$\min_{H \ge 0} \left\| \begin{pmatrix} W \\ \sqrt{\beta}\, e_{1 \times k} \end{pmatrix} H - \begin{pmatrix} A \\ 0_{1 \times n} \end{pmatrix} \right\|_F^2$$

$$\min_{W \ge 0} \left\| \begin{pmatrix} H^T \\ \sqrt{\eta}\, I_k \end{pmatrix} W^T - \begin{pmatrix} A^T \\ 0_{k \times m} \end{pmatrix} \right\|_F^2 \qquad (3)$$

● Similar reformulation for regularized NMF: [Pauca et al., 2006]

$$\min_{W,H} \left\{ \|A - WH\|_F^2 + \alpha \|W\|_F^2 + \beta \|H\|_F^2 \right\} \qquad (4)$$

subject to $\forall ij,\ W_{ij}, H_{ij} \ge 0$.
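The stacked matrices in reformulation (3) can be assembled as in this sketch, so that any NNLS solver for $\min_{X \ge 0} \|CX - B\|_F^2$ can be reused unchanged. The function name and return layout are assumptions.

```python
import numpy as np

def sparse_nmf_subproblems(A, W, H, eta, beta):
    """Return (C, B) pairs for the H-subproblem and the W-subproblem of (3)."""
    m, n = A.shape
    k = W.shape[1]
    # H-subproblem: [W; sqrt(beta) e_{1xk}] H ~ [A; 0_{1xn}]
    C_H = np.vstack([W, np.sqrt(beta) * np.ones((1, k))])
    B_H = np.vstack([A, np.zeros((1, n))])
    # W-subproblem: [H^T; sqrt(eta) I_k] W^T ~ [A^T; 0_{kxm}]
    C_W = np.vstack([H.T, np.sqrt(eta) * np.eye(k)])
    B_W = np.vstack([A.T, np.zeros((k, m))])
    return (C_H, B_H), (C_W, B_W)
```

For nonnegative H, the extra row in `C_H` contributes $\beta \sum_j \|H(:, j)\|_1^2$ to the squared residual, and the extra block in `C_W` contributes $\eta \|W\|_F^2$, which matches the penalty terms of the sparse NMF objective.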

Comparison results

Experimental Setup

● Stopping criterion: normalized KKT optimality condition as defined in [H. Kim and Park, 2007]

$$\Delta \le \epsilon \Delta_0, \quad \text{where } \Delta = \frac{\delta}{\delta_W + \delta_H}$$

● Datasets
✦ Synthetic: 300 × 200, create sparse W and H and produce A = WH with noise
✦ Text: Topic Detection and Tracking 2, randomly select 20 topics, 12617 × 1491
✦ Image: Olivetti Research Laboratory face image, 10304 × 400.
● Compared algorithms
✦ (mult) Lee and Seung’s multiplicative updating algorithm
✦ (als) Berry et al.’s alternating least squares algorithm
✦ (lsqnonneg) ANLS with Lawson and Hanson’s algorithm
✦ (projnewton) ANLS with Kim et al.’s projected quasi-Newton algorithm
✦ (projgrad) ANLS with Lin’s projected gradient algorithm
✦ (activeset) ANLS with Kim and Park’s active set algorithm
✦ (blockpivot) ANLS with block principal pivoting algorithm which is proposed in this paper

Synthetic dataset

time (sec)
k    mult     als      lsqnonneg  projnewton  projgrad  activeset  blockpivot
5    35.336   36.697   23.188     5.756       0.976     0.262      0.252
10   47.132   52.325   82.619     13.43       4.157     0.848      0.786
20   72.888   83.232   -          45.007      9.32      4.41       4.004
30   -        -        -          127.33      62.317    17.252     14.384
40   -        -        -          -           81.445    22.246     16.132
60   -        -        -          -           128.76    37.376     21.368
80   -        -        -          -           276.29    65.566     30.055

iterations
k    mult     als      lsqnonneg  projnewton  projgrad  activeset  blockpivot
5    9784.2   10000    25.6       25.8        30        26.4       26.4
10   10000    10000    34.8       35.2        45        35.2       35.2
20   10000    10000    -          70.8        104       69.8       69.8
30   -        -        -          166         205.2     166.6      166.6
40   -        -        -          -           234.8     118        117.8
60   -        -        -          -           157.8     84.2       84.2
80   -        -        -          -           131.8     67.2       67.2

residual
k    mult     als      lsqnonneg  projnewton  projgrad  activeset  blockpivot
5    0.04035  0.04043  0.04035    0.04035     0.04035   0.04035    0.04035
10   0.04345  0.04379  0.04343    0.04343     0.04344   0.04343    0.04343
20   0.04603  0.04556  -          0.04412     0.04414   0.04412    0.04412
30   -        -        -          0.04313     0.04316   0.04327    0.04327
40   -        -        -          -           0.04944   0.04943    0.04944
60   -        -        -          -           0.04106   0.04063    0.04063
80   -        -        -          -           0.03411   0.03390    0.03390

size 300 × 200, $\epsilon = 10^{-4}$. Average of 10 executions with different initial values. (- : no value reported)

Text dataset

time (sec)
k    projgrad  activeset  blockpivot
5    107.24    81.476     82.954
10   131.12    87.012     88.728
20   161.56    154.1      144.77
30   355.28    314.78     234.61
40   618.1     753.92     479.49
50   1299.6    1333.4     741.7
60   1616.05   2405.76    1041.78

iterations
k    projgrad  activeset  blockpivot
5    66.2      60.6       60.6
10   51.8      42         42
20   45.8      44.6       44.6
30   100.6     67.2       67.2
40   118       103.2      103.2
50   120.4     126.4      126.4
60   154.2     171.4      172.6

residual
k    projgrad  activeset  blockpivot
5    0.9547    0.9547     0.9547
10   0.9233    0.9229     0.9229
20   0.8898    0.8899     0.8899
30   0.8724    0.8727     0.8727
40   0.8600    0.8597     0.8597
50   0.8490    0.8488     0.8488
60   0.8386    0.8387     0.8387

size 12617 × 1491, $\epsilon = 10^{-4}$. Average of 10 executions with different initial values.

Image dataset

time (sec)
k    projgrad  activeset  blockpivot
16   68.529    11.751     11.998
25   124.05    25.675     22.305
36   109.1     53.528     35.249
49   150.49    115.54     57.85
64   169.7     270.64     91.035
81   249.45    545.94     146.76

iterations
k    projgrad  activeset  blockpivot
16   26.8      16.4       16.4
25   20.6      15         15
36   17.6      13.4       13.4
49   16.2      12.4       12.4
64   16.6      13.2       13.2
81   16.8      14.4       14.4

residual
k    projgrad  activeset  blockpivot
16   0.1905    0.1907     0.1907
25   0.1757    0.1751     0.1751
36   0.1630    0.1622     0.1622
49   0.1524    0.1514     0.1514
64   0.1429    0.1417     0.1417
81   0.1343    0.1329     0.1329

size 10304 × 400, $\epsilon = 5 \times 10^{-4}$. Average of 10 executions with different initial values.

Summary

Summary

● A new algorithm for NMF is proposed: ANLS framework + Block principal pivoting algorithm with improvements for multiple right-hand sides
✦ Important observation: long and thin structure
✦ Inherits good convergence property of ANLS framework
✦ Extensions for sparse/regularized NMF
✦ Outperforms other algorithms in computational experiments
✦ Source code will become available

Comparison by tolerance

[Figure: average elapsed time (seconds, axis from 0 to 1500) versus tolerance ($10^{-2}$, $10^{-4}$, $10^{-6}$) for activeset, blockpivot, and projgrad.]

12617 × 1491 text dataset. Average of 10 executions with different initial values.

Stopping Criterion

● KKT condition:

$$W \ge 0, \qquad H \ge 0$$
$$\partial f(W, H)/\partial W \ge 0, \qquad \partial f(W, H)/\partial H \ge 0$$
$$W \mathbin{.*} (\partial f(W, H)/\partial W) = 0, \qquad H \mathbin{.*} (\partial f(W, H)/\partial H) = 0$$

● These conditions can be simplified as

$$\min(W, \partial f(W, H)/\partial W) = 0 \qquad (5a)$$
$$\min(H, \partial f(W, H)/\partial H) = 0 \qquad (5b)$$

where the minimum is taken component wise [Gonzalez and Zhang, 2005].
● Normalized KKT residual:

$$\Delta = \frac{\delta}{\delta_W + \delta_H} \qquad (6)$$

where

$$\delta = \sum_{i=1}^{m} \sum_{q=1}^{k} \left| \min\left(W_{iq}, (\partial f(W, H)/\partial W)_{iq}\right) \right| + \sum_{q=1}^{k} \sum_{j=1}^{n} \left| \min\left(H_{qj}, (\partial f(W, H)/\partial H)_{qj}\right) \right| \qquad (7)$$
$$\delta_W = \#\left(\min(W, \partial f(W, H)/\partial W) \ne 0\right) \qquad (8)$$
$$\delta_H = \#\left(\min(H, \partial f(W, H)/\partial H) \ne 0\right). \qquad (9)$$

● Convergence criterion: $\Delta \le \epsilon \Delta_0$ where $\Delta_0$ was computed using the initial values.
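The normalized KKT residual (6) to (9) can be computed as in the sketch below. It assumes $f(W, H) = \|A - WH\|_F^2$, so that $\partial f/\partial W = 2(WH - A)H^T$ and $\partial f/\partial H = 2W^T(WH - A)$; the slides do not state the scaling of $f$, and the function name is hypothetical. Returning 0 when no entry violates (5) is also a convention of this sketch.

```python
import numpy as np

def normalized_kkt_residual(A, W, H):
    """Delta = delta / (delta_W + delta_H), assuming f = ||A - WH||_F^2."""
    R = W @ H - A
    grad_W, grad_H = 2 * R @ H.T, 2 * W.T @ R
    proj_W = np.minimum(W, grad_W)          # component-wise min, eq. (5a)
    proj_H = np.minimum(H, grad_H)          # component-wise min, eq. (5b)
    delta = np.abs(proj_W).sum() + np.abs(proj_H).sum()          # eq. (7)
    count = np.count_nonzero(proj_W) + np.count_nonzero(proj_H)  # eqs. (8)-(9)
    return delta / count if count else 0.0
```

In use, one would evaluate this once on the initial W and H to obtain $\Delta_0$, then stop the outer iteration when the current value drops below $\epsilon \Delta_0$.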

References

● S. Behnke. Discovering hierarchical speech features using convolutional non-negative matrix factorization. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), pages 2758–2763, 2003.
● D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, Mass., 1999.
● R. Bro and S. de Jong. A fast non-negativity-constrained least squares algorithm. Journal of Chemometrics, 11:393–401, 1997.
● J. Brunet, P. Tamayo, T. Golub, and J. Mesirov. Metagenes and molecular pattern discovery using matrix factorization. Proceedings of the National Academy of Sciences, 101(12):4164–4169, 2004.
● M. T. Chu and M. M. Lin. Low-dimensional polytope approximation and its applications to nonnegative matrix factorization. SIAM Journal on Scientific Computing, 30(3):1131–1155, 2008.
● K. Devarajan. Nonnegative matrix factorization: An analytical and interpretive tool in computational biology. PLoS Computational Biology, 4(7), 2008.
● Y. Gao and G. Church. Improving molecular cancer class discovery through sparse non-negative matrix factorization. Bioinformatics, 21(21):3970–3975, 2005.
● E. F. Gonzalez and Y. Zhang. Accelerating the Lee-Seung algorithm for non-negative matrix factorization. Technical report, Department of Computational and Applied Mathematics, Rice University, 2005.
● P. O. Hoyer. Non-negative matrix factorization with sparseness constraints. The Journal of Machine Learning Research, 5:1457–1469, 2004.
● D. Kim, S. Sra, and I. S. Dhillon. Fast Newton-type methods for the least squares nonnegative matrix approximation problem. In Proceedings of the 2007 SIAM International Conference on Data Mining, 2007.
● H. Kim and H. Park. Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis. Bioinformatics, 23(12):1495–1502, 2007.
● H. Kim and H. Park. Non-negative matrix factorization based on alternating non-negativity constrained least squares and active set method. SIAM Journal on Matrix Analysis and Applications, to appear.
● C. L. Lawson and R. J. Hanson. Solving Least Squares Problems. Society for Industrial Mathematics, 1995.
● D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems 13, pages 556–562. MIT Press, 2001.
● D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791, 1999.
● S. Z. Li, X. Hou, H. Zhang, and Q. Cheng. Learning spatially localized, parts-based representation. In CVPR ’01: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001.
● C.-J. Lin. Projected gradient methods for nonnegative matrix factorization. Neural Computation, 19(10):2756–2779, 2007.
● P. Paatero and U. Tapper. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5(1):111–126, 1994.
● V. P. Pauca, F. Shahnaz, M. W. Berry, and R. J. Plemmons. Text mining using non-negative matrix factorizations. In Proceedings of the 2004 SIAM International Conference on Data Mining, 2004.
● V. P. Pauca, J. Piper, and R. J. Plemmons. Nonnegative matrix factorization for spectral data analysis. Linear Algebra and Its Applications, 416(1):29–47, 2006.
● L. F. Portugal, J. J. Judice, and L. N. Vicente. A comparison of block pivoting and interior-point algorithms for linear least squares problems with nonnegative variables. Mathematics of Computation, 63(208):625–643, 1994.
● P. Smaragdis and J. C. Brown. Non-negative matrix factorization for polyphonic music transcription. In 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 177–180, 2003.
● M. H. Van Benthem and M. R. Keenan. Fast algorithm for the solution of large-scale non-negativity-constrained least squares problems. Journal of Chemometrics, 18:441–450, 2004.
● W. Xu, X. Liu, and Y. Gong. Document clustering based on non-negative matrix factorization. In SIGIR ’03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, pages 267–273, New York, NY, USA, 2003. ACM Press.


Similarity-based Clustering by Left-Stochastic Matrix Factorization
Figure 1: Illustration of conditions for uniqueness of the LSD clustering for the case k = 3 and for an LSDable K. ...... 3D face recognition using Euclidean integral invariants signa- ture. Proc. ... U. von Luxburg. A tutorial on spectral clustering

Semi-Supervised Clustering via Matrix Factorization
Feb 18, 2008 - ∗Department of Automation, Tsinghua University. †School of Computer ...... Computer Science Programming. C14. Entertainment. Music. C15.

Similarity-based Clustering by Left-Stochastic Matrix Factorization
Journal of Machine Learning Research 14 (2013) 1715-1746 ..... Figure 1: Illustration of conditions for uniqueness of the LSD clustering for the case k = 3 and.

non-negative matrix factorization on kernels
vm. 1. 2. 3. . . m. 1. 2. 3. 4. 5. 6. 7. 8 . . . n. •Attribute values represented as an n x m matrix: V=[v. 1. , v. 2. , …, v m. ] •Each column represents one of the m objects.

low-rank matrix factorization for deep neural network ...
of output targets to achieve good performance, the majority of these parameters are in the final ... recognition, the best performance with CNNs can be achieved when matching the number of ..... guage models,” Tech. Rep. RC 24671, IBM ...

Chapter 18 Toward a New World-View
The inductive, experimental method of modern science was formalized by Rene Descartes. ___ 4. Organized religion's responses to science in the late-sixteenth and early-seventeenth centuries was characterized by hostility in some countries, but neutra

Social Business A Step Toward Creating a New ...
Dec 9, 2009 - came up with some ideas for making it easier for the poor people to repay ... now branching out to Omaha, Nebraska and San Francisco, California. ..... Nursing Colleges as social business to train girls from Grameen Bank.

Faster and Better Global Placement by a New ...
Jun 17, 2005 - republish, to post on servers or to redistribute to lists, requires prior specific permission ...... http://ballade.cs.ucla.edu/ pubbench/placement.

Toward a new supermarket layout: from industrial ... - Semantic Scholar
Keywords: Data mining, market basket analysis, retailing, store layout. .... These new layout applications do not take the one stop shop phenomenon into ...