Maximum Margin Clustering The Proposed Method Theoretical Analysis Experimental Results Conclusions

Efficient Maximum Margin Clustering via Cutting Plane Algorithm
Bin Zhao, Fei Wang, Changshui Zhang
Dept. Automation, Tsinghua Univ.

SIAM International Conference on Data Mining, Atlanta, GA, USA, April 25th, 2008


Outline

1. Maximum Margin Clustering
2. The Proposed Method
3. Theoretical Analysis
4. Experimental Results
5. Conclusions


Support Vector Machine

Given $X=\{x_1,\dots,x_n\}$ and $y=(y_1,\dots,y_n)\in\{-1,+1\}^n$, the SVM finds a hyperplane $f(x)=w^{T}\phi(x)+b$ by solving

$$
\begin{aligned}
\min_{w,b,\xi_i}\quad & \frac{1}{2}w^{T}w+\frac{C}{n}\sum_{i=1}^{n}\xi_i \\
\text{s.t.}\quad & y_i\,\bigl(w^{T}\phi(x_i)+b\bigr)\ge 1-\xi_i,\quad \xi_i\ge 0,\quad i=1,\dots,n
\end{aligned}
\tag{1}
$$
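A minimal Python sketch of what problem (1) evaluates, assuming a linear kernel ($\phi(x)=x$) and a hypothetical four-point toy dataset. For a fixed $(w,b)$, the smallest feasible slacks are the hinge losses $\xi_i=\max(0,\,1-y_i f(x_i))$, and the objective follows directly. The helper names are illustrative only and are not taken from the authors' code.

```python
def decision_values(w, b, X):
    """f(x_i) = w^T x_i + b, assuming a linear kernel phi(x) = x."""
    return [sum(wj * xj for wj, xj in zip(w, x)) + b for x in X]


def svm_objective(w, b, X, y, C):
    """Objective of problem (1) with the smallest feasible slacks for fixed (w, b)."""
    n = len(X)
    f = decision_values(w, b, X)
    # The smallest xi_i with y_i f(x_i) >= 1 - xi_i and xi_i >= 0 is the hinge loss.
    slacks = [max(0.0, 1.0 - yi * fi) for yi, fi in zip(y, f)]
    return 0.5 * sum(wj * wj for wj in w) + C / n * sum(slacks), slacks


# Hypothetical toy data; labels are given, as in supervised SVM training.
X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0], [-3.0, 0.0]]
y = [1, 1, -1, -1]
obj, xi = svm_objective([1.0, 0.0], 0.0, X, y, C=1.0)
print(obj, xi)  # 0.625 [0.0, 0.5, 0.0, 0.0]
```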


Maximum Margin Clustering [Xu et al. 2004]

MMC aims to find not only the optimal hyperplane $(w^*,b^*)$ but also the optimal labeling vector $y^*$:

$$
\begin{aligned}
\min_{y\in\{-1,+1\}^n}\ \min_{w,b,\xi_i}\quad & \frac{1}{2}w^{T}w+\frac{C}{n}\sum_{i=1}^{n}\xi_i \\
\text{s.t.}\quad & y_i\,\bigl(w^{T}\phi(x_i)+b\bigr)\ge 1-\xi_i,\quad \xi_i\ge 0,\quad i=1,\dots,n
\end{aligned}
\tag{2}
$$


Representative Works

- Semi-definite programming [Xu et al. (NIPS 2004)]: several relaxations are made; the SDP has $n^2$ variables; time complexity $O(n^7)$
- Semi-definite programming [Valizadegan and Jin (NIPS 2006)]: reduces the number of variables from $n^2$ to $n$; time complexity $O(n^4)$
- Alternating optimization [Zhang et al. (ICML 2007)]


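Problem Reformulation

Theorem. Maximum margin clustering is equivalent to

$$
\begin{aligned}
\min_{w,b,\xi_i}\quad & \frac{1}{2}w^{T}w+\frac{C}{n}\sum_{i=1}^{n}\xi_i \\
\text{s.t.}\quad & \bigl\lvert w^{T}\phi(x_i)+b\bigr\rvert\ge 1-\xi_i,\quad \xi_i\ge 0,\quad i=1,\dots,n
\end{aligned}
\tag{3}
$$

where the labeling vector is given by $y_i=\operatorname{sign}\bigl(w^{T}\phi(x_i)+b\bigr)$.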
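The sketch below (assuming a linear kernel, with hypothetical toy data and helper names) evaluates reformulation (3) for a fixed hyperplane. It reads the labels off as $y_i=\operatorname{sign}(f(x_i))$ and checks that the resulting slacks $\max(0,\,1-\lvert f(x_i)\rvert)$ match the hinge slacks of problem (1) at that labeling. This shows the intuition behind the equivalence: for fixed $(w,b)$, choosing $y_i=\operatorname{sign}(f(x_i))$ minimizes each slack.

```python
def decision_values(w, b, X):
    """f(x_i) = w^T x_i + b, assuming a linear kernel phi(x) = x."""
    return [sum(wj * xj for wj, xj in zip(w, x)) + b for x in X]


def mmc_objective(w, b, X, C):
    """Objective of problem (3) for fixed (w, b), plus the implied labels and slacks."""
    n = len(X)
    f = decision_values(w, b, X)
    # The labeling is read off the hyperplane: y_i = sign(f(x_i)); ties go to +1 here.
    labels = [1 if fi >= 0 else -1 for fi in f]
    # Smallest slack with |f(x_i)| >= 1 - xi_i and xi_i >= 0.
    slacks = [max(0.0, 1.0 - abs(fi)) for fi in f]
    return 0.5 * sum(wj * wj for wj in w) + C / n * sum(slacks), labels, slacks


X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0], [-3.0, 0.0]]  # hypothetical toy data, no labels
obj, labels, slacks = mmc_objective([1.0, 0.0], 0.0, X, C=1.0)
# Since y_i f(x_i) = |f(x_i)| when y_i = sign(f(x_i)), these slacks coincide with
# the hinge slacks of problem (1) evaluated at that labeling.
f = decision_values([1.0, 0.0], 0.0, X)
hinge = [max(0.0, 1.0 - yi * fi) for yi, fi in zip(labels, f)]
print(obj, labels, slacks == hinge)  # 0.625 [1, 1, -1, -1] True
```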


Problem Reformulation

Theorem. Any solution $(w^*,b^*)$ to problem (4) is also a solution to problem (3) (and vice versa), with $\xi^*=\frac{1}{n}\sum_{i=1}^{n}\xi_i^*$.

$$
\begin{aligned}
\min_{w,b,\xi\ge 0}\quad & \frac{1}{2}w^{T}w+C\xi \\
\text{s.t.}\quad & \forall\, c\in\{0,1\}^n:\ \frac{1}{n}\sum_{i=1}^{n}c_i\,\bigl\lvert w^{T}\phi(x_i)+b\bigr\rvert\ge\frac{1}{n}\sum_{i=1}^{n}c_i-\xi
\end{aligned}
\tag{4}
$$
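To make the theorem concrete, the sketch below fixes hypothetical decision values $f_i=w^{T}\phi(x_i)+b$ and brute-forces all $2^n$ vectors $c$ in problem (4). The smallest feasible $\xi$ it finds equals the average of the slacks of problem (3), matching $\xi^*=\frac{1}{n}\sum_i\xi_i^*$. The enumeration is exponential and is meant only as an illustration; avoiding it is the point of the cutting plane approach.

```python
from itertools import product


def xi_from_all_constraints(f):
    """Smallest xi >= 0 satisfying every constraint of problem (4) for fixed f.

    Constraint c: (1/n) sum_i c_i |f_i| >= (1/n) sum_i c_i - xi,
    i.e. xi >= (1/n) sum_i c_i (1 - |f_i|). All 2^n vectors c are
    enumerated, so this is only usable for tiny n.
    """
    n = len(f)
    best = 0.0
    for c in product((0, 1), repeat=n):
        need = sum(ci * (1.0 - abs(fi)) for ci, fi in zip(c, f)) / n
        best = max(best, need)
    return best


def mean_slack(f):
    """(1/n) sum_i xi_i with xi_i = max(0, 1 - |f_i|), the optimal slacks of problem (3)."""
    return sum(max(0.0, 1.0 - abs(fi)) for fi in f) / len(f)


f = [2.0, 0.5, -0.25, -3.0]  # hypothetical decision values
print(xi_from_all_constraints(f), mean_slack(f))  # 0.3125 0.3125
```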


Problem Reformulation

Compared with problem (3):
- The number of variables is reduced by $n-1$, because the $n$ slack variables $\xi_i$ are replaced by a single $\xi$.
- The number of constraints increases from $n$ to $2^n$.
- A polynomially sized subset of constraints can always be found such that the solution of the relaxed problem satisfies all constraints of problem (4) up to a precision of $\varepsilon$.


Cutting Plane Algorithm [J. E. Kelley 1960]

- Start with an empty constraint subset $\Omega$.
- Compute the optimal solution to problem (4) subject only to the constraints in $\Omega$.
- Find the most violated constraint in problem (4) and add it to $\Omega$.
- Repeat the previous two steps, and stop when no constraint in (4) is violated by more than $\varepsilon$, i.e., when

$$
\forall\, c\in\{0,1\}^n:\ \frac{1}{n}\sum_{i=1}^{n}c_i\,\bigl\lvert w^{T}\phi(x_i)+b\bigr\rvert\ge\frac{1}{n}\sum_{i=1}^{n}c_i-(\xi+\varepsilon)
\tag{5}
$$
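The full loop also needs a QP solver, which is omitted here. The sketch below covers only the stopping test (5) for a single constraint vector $c$, using hypothetical decision values and helper names: a constraint passes if it is violated by at most $\varepsilon$. In practice one would presumably test only the most violated constraint rather than all $2^n$ vectors.

```python
def violation(c, f, xi):
    """How far constraint c of problem (4) is violated at decision values f and slack xi.

    Constraint: (1/n) sum_i c_i |f_i| >= (1/n) sum_i c_i - xi.
    Returns max(0, rhs - lhs); 0.0 means the constraint holds exactly.
    """
    n = len(f)
    lhs = sum(ci * abs(fi) for ci, fi in zip(c, f)) / n
    rhs = sum(c) / n - xi
    return max(0.0, rhs - lhs)


def holds_up_to_eps(c, f, xi, eps):
    """Condition (5): the constraint holds once xi is relaxed to xi + eps."""
    return violation(c, f, xi) <= eps


f = [2.0, 0.5, -0.25, -3.0]  # hypothetical decision values at the current iterate
c = [0, 1, 1, 0]
print(violation(c, f, xi=0.25))                  # 0.0625
print(holds_up_to_eps(c, f, xi=0.25, eps=0.1))   # True
print(holds_up_to_eps(c, f, xi=0.25, eps=0.01))  # False
```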


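The Most Violated Constraint

Theorem. The most violated constraint can be computed as follows:

$$
c_i=\begin{cases}1, & \text{if } \bigl\lvert w^{T}\phi(x_i)+b\bigr\rvert<1\\ 0, & \text{otherwise}\end{cases}
\tag{6}
$$

The feasibility of a constraint is measured by the corresponding value of $\xi$, i.e., the smallest $\xi$ for which

$$
\frac{1}{n}\sum_{i=1}^{n}c_i\,\bigl\lvert w^{T}\phi(x_i)+b\bigr\rvert\ge\frac{1}{n}\sum_{i=1}^{n}c_i-\xi
\tag{7}
$$

holds; the most violated constraint is the one requiring the largest $\xi$.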
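Rule (6) works because setting $c_i=1$ adds $(1-\lvert f_i\rvert)/n$ to the $\xi$ required by (7). That amount is positive exactly when $\lvert f_i\rvert<1$, so including precisely those samples maximizes the required $\xi$. Here is a minimal sketch with hypothetical decision values and helper names. The search costs $O(n)$ instead of $2^n$.

```python
def most_violated_constraint(f):
    """Rule (6): c_i = 1 if |f_i| < 1, else 0, with f_i = w^T phi(x_i) + b."""
    return [1 if abs(fi) < 1.0 else 0 for fi in f]


def required_xi(c, f):
    """Smallest xi for which constraint (7) holds: xi = (1/n) sum_i c_i (1 - |f_i|)."""
    return sum(ci * (1.0 - abs(fi)) for ci, fi in zip(c, f)) / len(f)


f = [2.0, 0.5, -0.25, -3.0]  # hypothetical decision values
c_star = most_violated_constraint(f)
print(c_star, required_xi(c_star, f))  # [0, 1, 1, 0] 0.3125
```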


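Enforcing the Class Balance Constraint

A class balance constraint is enforced to avoid trivially “optimal” solutions, such as assigning all samples to the same cluster:

$$
\begin{aligned}
\min_{w,b,\xi\ge 0}\quad & \frac{1}{2}w^{T}w+C\xi \\
\text{s.t.}\quad & \forall\, c\in\Omega:\ \frac{1}{n}\sum_{i=1}^{n}c_i\,\bigl\lvert w^{T}\phi(x_i)+b\bigr\rvert\ge\frac{1}{n}\sum_{i=1}^{n}c_i-\xi \\
& -l\le\sum_{i=1}^{n}\bigl(w^{T}\phi(x_i)+b\bigr)\le l
\end{aligned}
\tag{8}
$$

where the constant $l\ge 0$ controls the allowed class imbalance.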
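The sketch below illustrates the kind of trivial solution the balance constraint rules out, assuming a linear kernel and hypothetical toy data. With $w=0$ and $b=1$, every $\lvert f(x_i)\rvert=1$, so all slacks vanish and the objective of (4) reaches its lower bound 0 while all samples fall into one cluster. The balance constraint in (8) rejects this solution for the arbitrarily chosen $l=2$. Here $\xi$ is computed from the full constraint set of (4), not from a working set $\Omega$.

```python
def decision_values(w, b, X):
    """f(x_i) = w^T x_i + b, assuming a linear kernel phi(x) = x."""
    return [sum(wj * xj for wj, xj in zip(w, x)) + b for x in X]


def objective_4(w, b, X, C):
    """Objective of (4) at (w, b), using the optimal xi = (1/n) sum_i max(0, 1 - |f_i|)."""
    f = decision_values(w, b, X)
    xi = sum(max(0.0, 1.0 - abs(fi)) for fi in f) / len(f)
    return 0.5 * sum(wj * wj for wj in w) + C * xi


def balance_ok(w, b, X, l):
    """Class balance constraint of (8): -l <= sum_i (w^T x_i + b) <= l."""
    total = sum(decision_values(w, b, X))
    return -l <= total <= l


X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0], [-3.0, 0.0]]  # hypothetical toy data
# Trivial solution: w = 0, b = 1 puts every sample exactly on the margin, same side.
print(objective_4([0.0, 0.0], 1.0, X, C=1.0), balance_ok([0.0, 0.0], 1.0, X, l=2.0))  # 0.0 False
print(objective_4([1.0, 0.0], 0.0, X, C=1.0), balance_ok([1.0, 0.0], 0.0, X, l=2.0))  # 0.625 True
```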


The Constrained Concave-Convex Procedure [A. J. Smola et al. 2005]

The CCCP solves non-convex optimization problems whose objective and constraint functions can each be expressed as a difference of convex functions:

$$
\begin{aligned}
\min_{z}\quad & f_0(z)-g_0(z) \\
\text{s.t.}\quad & f_i(z)-g_i(z)\le c_i,\quad i=1,\dots,n
\end{aligned}
\tag{9}
$$

where $f_i$ and $g_i$ ($i=0,\dots,n$) are real-valued convex functions on a vector space $\mathcal{Z}$ and $c_i\in\mathbb{R}$ for all $i=1,\dots,n$.


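The Constrained Concave-Convex Procedure

Given an initial point $z_0$, the CCCP computes $z_{t+1}$ from $z_t$ by replacing each $g_i(z)$ with its first-order Taylor expansion at $z_t$, $T_1\{g_i,z_t\}(z)=g_i(z_t)+\nabla g_i(z_t)^{T}(z-z_t)$ (using a subgradient where $g_i$ is not differentiable):

$$
\begin{aligned}
\min_{z}\quad & f_0(z)-T_1\{g_0,z_t\}(z) \\
\text{s.t.}\quad & f_i(z)-T_1\{g_i,z_t\}(z)\le c_i,\quad i=1,\dots,n
\end{aligned}
\tag{10}
$$

Each subproblem is convex, since subtracting an affine function preserves convexity.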
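Here is a toy, unconstrained instance of iteration (10), chosen only for illustration and unrelated to MMC. It minimizes $f_0(z)-g_0(z)$ with $f_0(z)=z^4/4$ and $g_0(z)=z^2$, both convex. Linearizing $g_0$ at $z_t$ gives a convex subproblem whose minimizer has the closed form $z_{t+1}=(2z_t)^{1/3}$, so no solver is needed. Consistent with the CCCP descent property, the objective does not increase along the iterates.

```python
import math


def dc_objective(z):
    """Toy DC objective f0(z) - g0(z) with convex f0(z) = z**4 / 4 and g0(z) = z**2."""
    return z ** 4 / 4.0 - z ** 2


def cccp(z0, iters=60):
    """Unconstrained CCCP iterations (10) on the toy objective.

    At z_t, g0 is replaced by T1{g0, z_t}(z) = z_t**2 + 2 z_t (z - z_t), so the
    convex subproblem is min_z z**4 / 4 - 2 z_t z (+ const). Its stationarity
    condition z**3 = 2 z_t gives the closed-form update z_{t+1} = cbrt(2 z_t).
    """
    z = z0
    path = [z]
    for _ in range(iters):
        target = 2.0 * z
        z = math.copysign(abs(target) ** (1.0 / 3.0), target)  # real cube root
        path.append(z)
    return z, path


z_final, path = cccp(1.0)
print(z_final)  # approaches sqrt(2), a local minimizer of z**4/4 - z**2
```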


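Optimization via the CCCP

Substituting the first-order Taylor expansion into problem (8) yields the following quadratic programming (QP) problem:

$$
\begin{aligned}
\min_{w,b,\xi}\quad & \frac{1}{2}w^{T}w+C\xi \\
\text{s.t.}\quad & \xi\ge 0 \\
& -l\le\sum_{i=1}^{n}\bigl(w^{T}\phi(x_i)+b\bigr)\le l \\
& \forall\, c\in\Omega:\ \frac{1}{n}\sum_{i=1}^{n}c_i-\xi-\frac{1}{n}\sum_{i=1}^{n}c_i\operatorname{sign}\bigl(w_t^{T}\phi(x_i)+b_t\bigr)\bigl[w^{T}\phi(x_i)+b\bigr]\le 0
\end{aligned}
\tag{11}
$$

Here $(w_t,b_t)$ is the solution from the previous CCCP iteration.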
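Assuming a linear kernel, each constraint of (11) is linear in $(w,b,\xi)$. With $s_i=\operatorname{sign}(w_t^{T}x_i+b_t)$, it becomes $a^{T}w+a_0 b+\xi\ge d$, where $a=\frac{1}{n}\sum_i c_i s_i x_i$, $a_0=\frac{1}{n}\sum_i c_i s_i$ and $d=\frac{1}{n}\sum_i c_i$. Problem (11) is therefore a QP with linear constraints. The sketch below builds these coefficients. At the previous iterate the linearization is tight, since $s_i(w_t^{T}x_i+b_t)=\lvert w_t^{T}x_i+b_t\rvert$. Names and toy data are hypothetical.

```python
def linearized_cut(c, X, w_t, b_t):
    """Coefficients of one constraint of QP (11), assuming a linear kernel phi(x) = x.

    The constraint (1/n) sum c_i - xi - (1/n) sum c_i s_i (w^T x_i + b) <= 0,
    with s_i = sign(w_t^T x_i + b_t), is rewritten as a^T w + a0 * b + xi >= d.
    """
    n, dim = len(X), len(X[0])
    a = [0.0] * dim
    a0 = 0.0
    for ci, x in zip(c, X):
        if ci == 0:
            continue
        f_t = sum(wj * xj for wj, xj in zip(w_t, x)) + b_t
        s = 1.0 if f_t >= 0 else -1.0  # sign at the previous iterate; ties -> +1 (assumption)
        for j in range(dim):
            a[j] += s * x[j] / n
        a0 += s / n
    d = sum(c) / n
    return a, a0, d


def cut_satisfied(cut, w, b, xi, tol=1e-12):
    """Check a^T w + a0 * b + xi >= d for a candidate (w, b, xi)."""
    a, a0, d = cut
    return sum(aj * wj for aj, wj in zip(a, w)) + a0 * b + xi >= d - tol


X = [[2.0, 0.0], [0.5, 0.0], [-0.25, 0.0], [-3.0, 0.0]]  # hypothetical toy data
cut = linearized_cut([0, 1, 1, 0], X, w_t=[1.0, 0.0], b_t=0.0)
print(cut)  # ([0.1875, 0.0], 0.0, 0.5)
print(cut_satisfied(cut, [1.0, 0.0], 0.0, xi=0.3125))  # True
```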


Justification of CPMMC

Theorem. For any dataset $X=(x_1,\dots,x_n)$ and any $\varepsilon>0$, if $(w^*,b^*,\xi^*)$ is the optimal solution to problem (4), then the CPMMC algorithm returns a point $(w,b,\xi)$ for which $(w,b,\xi+\varepsilon)$ is feasible in problem (4). Moreover, its objective value is at least as good as the one corresponding to $(w^*,b^*,\xi^*)$.


Time Complexity Analysis

Theorem. Each iteration of CPMMC takes time $O(sn)$ for a constant working set size $|\Omega|$, where $s$ is the sparsity, i.e., the average number of nonzero features per sample.

Theorem. For any $\varepsilon>0$, $C>0$, and any dataset $X=\{x_1,\dots,x_n\}$, the CPMMC algorithm terminates after adding at most $CR^2/\varepsilon^2$ constraints, where $R$ is a constant independent of $n$ and $s$.


Time Complexity Analysis

Theorem. For any dataset $X=\{x_1,\dots,x_n\}$ with $n$ samples and sparsity $s$, and any fixed values of $C>0$ and $\varepsilon>0$, the CPMMC algorithm takes time $O(sn)$.
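The following is a rough illustration of where an $O(sn)$ cost can come from. It assumes a linear kernel and sparse samples stored as {feature_index: value} dictionaries, both of which are assumptions. Computing all decision values $f(x_i)$ touches each nonzero entry once, and rule (6) then needs one more $O(n)$ pass. This is not the authors' implementation. It also omits the QP over the working set, whose size the theorems above treat as bounded.

```python
def sparse_decision_values(w, b, X_sparse):
    """f(x_i) = w^T x_i + b for samples stored as {feature_index: value} dicts.

    Only nonzero entries are touched, so one pass over all n samples costs
    O(s * n) operations, where s is the average number of nonzeros per sample
    (linear kernel assumed).
    """
    values, ops = [], 0
    for x in X_sparse:
        total = b
        for j, v in x.items():
            total += w[j] * v
            ops += 1
        values.append(total)
    return values, ops


w = [1.0, -2.0, 0.0, 0.5]  # hypothetical weight vector
X_sparse = [{0: 1.0}, {1: 1.0, 3: 2.0}, {2: 3.0}]  # 4 nonzeros in total
values, ops = sparse_decision_values(w, 0.5, X_sparse)
print(values, ops)  # [1.5, -0.5, 0.5] 4
```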


Clustering Error Comparison

Clustering error (%); "-" marks entries for which no result is given.

Data            Size   KM          NC      MMC     GMMC    IterSVR       CPMMC
Digits 3-8      357    5.32 ± 0    35      10      5.6     3.36 ± 0      3.08
Digits 1-7      361    0.55 ± 0    45      31.25   2.2     0.55 ± 0      0.0
Digits 2-7      356    3.09 ± 0    34      1.25    0.5     0.0 ± 0       0.0
Digits 8-9      354    9.32 ± 0    48      3.75    16.0    3.67 ± 0      2.26
Ionosphere      351    32 ± 17.9   25      21.25   23.5    32.3 ± 16.6   27.64
Letter          1555   17.94 ± 0   23.2    -       -       7.2 ± 0       5.53
Satellite       2236   4.07 ± 0    4.21    -       -       3.18 ± 0      1.52
Text-1          1980   49.47 ± 0   6.21    -       -       3.18 ± 0      5.00
Text-2          1989   49.62 ± 0   8.65    -       -       6.01 ± 1.82   3.72
UCI digits¹     1797   3.62        2.43    -       -       1.82          0.62
MNIST digits¹   70000  10.79       10.08   -       -       7.59          4.29

¹ For the UCI digits and MNIST datasets, we give a thorough comparison by considering all 45 pairs of the digits 0-9. For NC/MMC/GMMC/IterSVR, results on the digits and ionosphere data are taken directly from (Zhang et al., 2007).


Speed of CPMMC

CPU time (seconds); "-" marks entries for which no result is given.

Data            KM      NC      GMMC      IterSVR   CPMMC
Digits 3-8      0.51    0.12    276.16    19.72     1.10
Digits 1-7      0.54    0.13    289.53    20.49     0.95
Digits 2-7      0.50    0.11    304.81    19.69     0.75
Digits 8-9      0.49    0.11    277.26    19.41     0.85
Ionosphere      0.07    0.12    273.04    18.86     0.78
Letter          0.08    2.24    -         2133      0.87
Satellite       0.19    5.01    -         6490      4.54
Text-1          66.09   6.04    -         5844      19.75
Text-2          52.32   5.35    -         6099      16.16


Dataset Size n vs. Speed

[Figure: CPU time (seconds) versus number of samples on log-log axes for Letter, Satellite, Text-1, Text-2, MNIST-1vs2 and MNIST-1vs7, with an O(n) reference line.]


ε vs. Accuracy & Speed

[Figure: (a) clustering accuracy and (b) CPU time (seconds) versus ε (log scale) for UCI Digits 3vs8, 1vs7, 2vs7, 8vs9, Letter, Text-1 and Text-2; an O(x^−0.5) reference line is also plotted.]


C vs. Accuracy & Speed

[Figure: (a) clustering accuracy and (b) CPU time (seconds) versus C (log scale) for UCI 3vs8, 1vs7, 2vs7, 8vs9, Letter, Satellite and Ionosphere; an O(x) reference line is also plotted.]


Conclusions

- No loss in clustering accuracy
- Major improvement in speed: O(sn) running time
- Handles large real-world datasets efficiently


Thanks for listening! MATLAB code is available at http://binzhao02.googlepages.com
