A Conscience On-line Learning Approach for Kernel-Based Clustering

Chang-Dong Wang¹, Jian-Huang Lai¹, Jun-Yong Zhu²

¹ School of Information Science and Technology, Sun Yat-sen University, P. R. China.
² School of Mathematics and Computational Science, Sun Yat-sen University, P. R. China.

The 10th IEEE International Conference on Data Mining (ICDM 2010)

Wang et al. (Sun Yat-sen University), COLL for Kernel-Based Clustering, ICDM 2010

Outline

1. Motivation: Kernel Clustering and Problem; Previous Work
2. Conscience On-Line Learning: The Model; Computation
3. Applications: Digit Clustering; Video Clustering

Kernel Clustering

Kernel clustering is one of the major methods for partitioning a nonlinearly separable dataset.

[Figure: a two-dimensional dataset that is not linearly separable.]

Kernel Clustering

Given X = {x_1, …, x_n}, a kernel mapping φ, and the number of clusters c, find an assignment function ν : X → {1, …, c} that minimizes the distortion error
\[ \sum_{i=1}^{n} \|\phi(x_i) - \mu_{\nu_i}\|^2, \]
where ν_i is short for ν(x_i), and
\[ \mu_k = \frac{1}{|\nu^{-1}(k)|} \sum_{\phi(x_i) \in \nu^{-1}(k)} \phi(x_i), \qquad \nu_i = \arg\min_{k=1,\dots,c} \|\phi(x_i) - \mu_k\|^2. \]

Problem

k-means is used to solve \(\min_\nu \sum_{i=1}^{n} \|\phi(x_i) - \mu_{\nu_i}\|^2\) by alternating two steps:

1. Assigning the samples:
\[ \nu_i \leftarrow \arg\min_{k=1,\dots,c} \|\phi(x_i) - \mu_k\|^2, \quad \forall i = 1,\dots,n, \]

2. Recomputing the prototypes:
\[ \mu_k \leftarrow \frac{1}{|\nu^{-1}(k)|} \sum_{\phi(x_i) \in \nu^{-1}(k)} \phi(x_i), \quad \forall k = 1,\dots,c. \]

However, an ill initialization can cause a degenerate result.
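The two alternating steps can be sketched in code. The following is an illustrative Python sketch (not from the paper) that runs the two-step iteration on explicitly mapped points `Phi`, so with `Phi = X` it reduces to ordinary k-means; all names are ours.

```python
import numpy as np

def kmeans_feature_space(Phi, c, n_iters=100, seed=0):
    """Lloyd-style two-step k-means on already-mapped points Phi (n x d).

    Phi plays the role of phi(x_i); with Phi = X this is ordinary k-means.
    Illustrative sketch only; an empty cluster is re-seeded with a random point.
    """
    rng = np.random.default_rng(seed)
    n = Phi.shape[0]
    # random initial assignment (the step that can be "ill-initialized")
    nu = rng.integers(0, c, size=n)
    for _ in range(n_iters):
        # Step 2: recompute each prototype as the mean of its cluster
        mu = np.stack([Phi[nu == k].mean(axis=0) if np.any(nu == k)
                       else Phi[rng.integers(n)] for k in range(c)])
        # Step 1: reassign each sample to its nearest prototype
        d2 = ((Phi[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        new_nu = d2.argmin(axis=1)
        if np.array_equal(new_nu, nu):
            break  # assignments stable: converged to a local minimum
        nu = new_nu
    return nu, mu
```

On two well-separated blobs the iteration recovers the blobs, but the random initial assignment is exactly where the ill-initialization problem discussed above enters.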


Ill-initialization and Degenerate Clustering

Figure: (a) Ill-initialization (μ2 is ill-initialized); (b) degenerate result by k-means (μ2 degenerates).

Previous Work

1. Methods refining the initial prototypes, e.g., Bradley and Fayyad '98; Khan and Ahmad '04.
2. Lower bound approaches, e.g., Zhang et al. '06.
3. Evolutionary algorithms, e.g., Krishna and Murty '99; Abolhassani et al. '04.
4. Global search strategies, i.e., global k-means, e.g., Likas et al. '03; Tzortzis and Likas '09.

All of these are computationally expensive.


Conscience On-Line Learning (COLL)

For each randomly taken data point φ(x_i):

1. Select the winner based on the conscience mechanism:
\[ \nu_i = \arg\min_{k=1,\dots,c} \{ f_k \, \|\phi(x_i) - \mu_k\|^2 \}, \]

2. Update the winner:
\[ \mu_{\nu_i} \leftarrow \mu_{\nu_i} + \eta_t \big( \phi(x_i) - \mu_{\nu_i} \big), \]

3. Update the winning frequencies {f_k, k = 1, …, c}:
\[ n_{\nu_i} \leftarrow n_{\nu_i} + 1, \qquad f_k = n_k \Big/ \sum_{l=1}^{c} n_l, \quad \forall k = 1,\dots,c. \]

This design aims to bring all available prototypes into the solution quickly and to allow all prototypes to win the competition fairly.
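The three COLL steps above can be sketched for a single competition. This illustrative Python fragment (not from the paper) assumes explicitly mapped points; `coll_step`, `n_wins`, and `eta` are our own names.

```python
import numpy as np

def coll_step(phi_x, mu, n_wins, eta):
    """One conscience on-line learning step for a single mapped point phi_x.

    mu: (c, d) prototypes; n_wins: (c,) win counts. Illustrative sketch only.
    """
    f = n_wins / n_wins.sum()             # winning frequencies f_k
    d2 = ((phi_x - mu) ** 2).sum(axis=1)  # ||phi(x) - mu_k||^2 for each k
    v = int(np.argmin(f * d2))            # conscience-weighted winner selection
    mu[v] += eta * (phi_x - mu[v])        # move the winner toward the point
    n_wins[v] += 1                        # frequent winners are penalized next time
    return v, mu, n_wins
```

The conscience weight f_k is what distinguishes this from plain on-line k-means: a prototype that has won often sees its effective distances inflated, so under-used prototypes get a chance to win.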


Conscience On-Line Learning

Figure: (a) Ill-initialization (μ2 is ill-initialized); (b) satisfactory result by COLL.

Conscience On-Line Learning

Figure: The winning frequencies {f1, f2, f3} of three prototypes as a function of the competition index over six iterations.

Kernel Matrix

The kernel mapping φ is often unknown or hard to obtain explicitly. The feature space Y is characterized by the kernel function κ(x, z) = ⟨φ(x), φ(z)⟩ and the corresponding kernel matrix K_{i,j} = κ(x_i, x_j). Distances in the kernel space are computed via the kernel trick; for instance, the distance between the point φ(x_i) and the prototype μ_k is
\[ \|\phi(x_i) - \mu_k\|^2 = K_{i,i} + \frac{\sum_{h,l \in \nu^{-1}(k)} K_{h,l}}{|\nu^{-1}(k)|^2} - \frac{2 \sum_{j \in \nu^{-1}(k)} K_{i,j}}{|\nu^{-1}(k)|}. \]
Hence the COLL model must work with only the kernel matrix K.
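As a sanity check on the kernel-trick distance, here is a small Python helper (our own, not from the paper) that evaluates the formula above using only K and the index set ν⁻¹(k):

```python
import numpy as np

def kernel_dist2(K, i, idx_k):
    """||phi(x_i) - mu_k||^2 from the kernel matrix K alone.

    idx_k: indices of points currently assigned to cluster k, i.e. nu^{-1}(k).
    Illustrative sketch only.
    """
    m = len(idx_k)
    return (K[i, i]
            + K[np.ix_(idx_k, idx_k)].sum() / m**2   # <mu_k, mu_k>
            - 2.0 * K[i, idx_k].sum() / m)           # -2 <phi(x_i), mu_k>
```

With a linear kernel K = X Xᵀ this reproduces the ordinary Euclidean distance to the cluster mean, which is a convenient way to verify the three terms.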


Prototype Descriptor

To this end, we develop an efficient framework for computing the COLL model based on the prototype descriptor.

Definition (Prototype descriptor). A prototype descriptor is a matrix \(W^\phi \in \mathbb{R}^{c \times (n+1)}\) whose k-th row represents the prototype μ_k by
\[ W^\phi_{k,i} = \langle \mu_k, \phi(x_i) \rangle, \; \forall i = 1,\dots,n, \qquad W^\phi_{k,n+1} = \langle \mu_k, \mu_k \rangle, \]
i.e.,
\[ W^\phi = \begin{pmatrix}
\langle\mu_1,\phi(x_1)\rangle & \cdots & \langle\mu_1,\phi(x_n)\rangle & \langle\mu_1,\mu_1\rangle \\
\langle\mu_2,\phi(x_1)\rangle & \cdots & \langle\mu_2,\phi(x_n)\rangle & \langle\mu_2,\mu_2\rangle \\
\vdots & \ddots & \vdots & \vdots \\
\langle\mu_c,\phi(x_1)\rangle & \cdots & \langle\mu_c,\phi(x_n)\rangle & \langle\mu_c,\mu_c\rangle
\end{pmatrix}. \]


Initialization

Theorem (Initialization). The random initialization of the prototype descriptor can be realized by
\[ W^\phi_{:,1:n} = A K, \qquad W^\phi_{:,n+1} = \operatorname{diag}(A K A^\top), \]
where the matrix \(A = [A_{k,i}] \in \mathbb{R}_+^{c \times n}\) reflects the initial assignment ν:
\[ A_{k,i} = \begin{cases} \frac{1}{|\nu^{-1}(k)|} & \text{if } i \in \nu^{-1}(k), \\ 0 & \text{otherwise.} \end{cases} \]
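The initialization theorem translates directly into two matrix operations. An illustrative Python sketch (all names are ours, not from the paper):

```python
import numpy as np

def init_prototype_descriptor(K, nu, c):
    """Build W^phi from the kernel matrix K and an initial assignment nu.

    nu: length-n integer array with nu[i] in {0, ..., c-1}. Sketch only.
    """
    n = K.shape[0]
    A = np.zeros((c, n))
    for k in range(c):
        members = (nu == k)
        A[k, members] = 1.0 / members.sum()   # row k averages cluster k
    W = np.empty((c, n + 1))
    W[:, :n] = A @ K                          # <mu_k, phi(x_i)> entries
    W[:, n] = np.diag(A @ K @ A.T)            # <mu_k, mu_k> entries
    return W
```

Again, a linear kernel makes the construction easy to check against explicit cluster means.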

Winner Selection & Updating

Theorem (Conscience-based winner selection). The winning prototype can be selected based on the conscience mechanism as
\[ \nu_i = \arg\min_{k=1,\dots,c} \{ f_k \cdot (K_{i,i} + W^\phi_{k,n+1} - 2 W^\phi_{k,i}) \}. \]

Theorem (On-line winner updating). The winner can be updated as
\[ W^\phi_{\nu_i,j} \leftarrow \begin{cases} (1-\eta_t) W^\phi_{\nu_i,j} + \eta_t K_{i,j} & j = 1,\dots,n, \\ (1-\eta_t)^2 W^\phi_{\nu_i,j} + \eta_t^2 K_{i,i} + 2(1-\eta_t)\eta_t W^\phi_{\nu_i,i} & j = n+1. \end{cases} \]
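The two theorems suggest a per-sample update that touches only one row of W^φ. A hedged Python sketch (our own naming, not the paper's implementation) of the on-line winner update:

```python
import numpy as np

def update_winner_row(W, K, v, i, eta):
    """Apply the on-line winner update to row v of W^phi for sample i.

    W: (c, n+1) prototype descriptor; K: (n, n) kernel matrix. Sketch only.
    """
    n = K.shape[0]
    w_vi = W[v, i]                                    # old <mu_v, phi(x_i)>, needed below
    W[v, :n] = (1 - eta) * W[v, :n] + eta * K[i, :]   # update <mu_v, phi(x_j)> entries
    W[v, n] = ((1 - eta) ** 2 * W[v, n]
               + eta ** 2 * K[i, i]
               + 2 * (1 - eta) * eta * w_vi)          # update <mu_v, mu_v> entry
```

The last entry expands ⟨(1-η)μ_v + ηφ(x_i), (1-η)μ_v + ηφ(x_i)⟩, which is why the old value W[v, i] must be saved before the row is overwritten.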


Iteration Stopping Criterion

Theorem (Iteration stopping criterion). If \(e^\phi < \epsilon\) or \(t > t_{\max}\), stop the iteration, where
\[ e^\phi = \sum_{k=1}^{c} \Big(1 - \frac{1}{(1-\eta_t)^{m_k}}\Big)^2 W^\phi_{k,n+1} + \eta_t^2 \sum_{k=1}^{c} \sum_{h=1}^{m_k} \sum_{l=1}^{m_k} \frac{K_{\pi^k_h, \pi^k_l}}{(1-\eta_t)^{h+l}} + 2\eta_t \sum_{k=1}^{c} \Big(1 - \frac{1}{(1-\eta_t)^{m_k}}\Big) \sum_{l=1}^{m_k} \frac{W^\phi_{k,\pi^k_l}}{(1-\eta_t)^l}. \]

Here, the array π^k stores the indices of the m_k ordered points assigned to the k-th prototype in one iteration.

Demonstration of COLL Winner Updating

Figure: The mapping φ embeds the data into a feature space where the nonlinear pattern becomes linear. COLL is then performed in this feature space. The linear separator in the feature space is nonlinear in the input space, and so is the update of the winner.

Digit Clustering

Table: Summary of four digit datasets.

Dataset   | n     | c  | d   | Balanced | σ
----------|-------|----|-----|----------|--------
Pendigits | 10992 | 10 | 16  | ×        | 60.53
Mfeat     | 2000  | 10 | 649 | √        | 809.34
USPS      | 11000 | 10 | 256 | √        | 1286.70
MNIST     | 5000  | 10 | 784 | √        | 2018.30

Figure: Some samples of MNIST.

Comparing Convergence Rate

Figure: Comparing the convergence rate of kernel k-means and the proposed COLL, in terms of log(e^φ) as a function of iteration step, on (a) Pendigits, (b) Mfeat, (c) USPS, (d) MNIST.

Comparing Distortion Error

Figure: Comparing the distortion error of kernel k-means, global kernel k-means, and the proposed COLL as a function of the number of clusters, on (a) Pendigits, (b) Mfeat, (c) USPS, (d) MNIST.

Comparison Using Internal & External Measures

Table: Average distortion error (DE).

Dataset   | Kk-means | Gkk-means | COLL
----------|----------|-----------|-------
Pendigits | 6704.0   | 6664.9    | 6619.3
Mfeat     | 1405.5   | 1363.3    | 1324.2
USPS      | 8004.7   | 7782.0    | 7561.0
MNIST     | 3034.3   | 2894.1    | 2754.9

Table: Average normalized mutual information (NMI).

Dataset   | Kk-means | Gkk-means | COLL
----------|----------|-----------|------
Pendigits | 0.715    | 0.736     | 0.753
Mfeat     | 0.533    | 0.542     | 0.604
USPS      | 0.354    | 0.369     | 0.461
MNIST     | 0.441    | 0.472     | 0.520

Video Clustering

Figure: Clustering the frames of ANNI002 into 16 scenes using COLL. Frame ranges per cluster: 1: 0001-0061; 2: 0062-0163; 3: 0164-0289; 4: 0290-0400; 5: 0401-0504; 6: 0505-0699; 7: 0700-0827; 8: 0828-1022; 9: 1023-1083; 10: 1084-1238; 11: 1239-1778; 12: 1779-1983; 13: 1984-2119; 14: 2120-2243; 15: 2244-2400; 16: 2401-2492.

Comparison Results

Table: Means of NMI and computational time in seconds on the 11 video sequences.

Video (#frames)  | kk-means NMI / Time | gkk-means NMI / Time | COLL NMI / Time
-----------------|---------------------|----------------------|-----------------
ANNI001 (914)    | 0.781 / 72.2        | 0.801 / 94.0         | 0.851 / 70.4
ANNI002 (2492)   | 0.705 / 94.7        | 0.721 / 126.4        | 0.741 / 89.0
ANNI003 (4265)   | 0.712 / 102.2       | 0.739 / 139.2        | 0.762 / 99.5
ANNI004 (3897)   | 0.731 / 98.3        | 0.750 / 121.6        | 0.759 / 93.6
ANNI005 (11361)  | 0.645 / 152.2       | 0.656 / 173.3        | 0.680 / 141.2
ANNI006 (16588)  | 0.622 / 193.0       | 0.638 / 255.5        | 0.642 / 182.3
ANNI007 (1588)   | 0.727 / 81.1        | 0.740 / 136.7        | 0.770 / 79.1
ANNI008 (2773)   | 0.749 / 95.9        | 0.771 / 119.0        | 0.794 / 81.5
ANNI009 (12304)  | 0.727 / 167.0       | 0.763 / 184.4        | 0.781 / 160.4
ANNI010 (30363)  | 0.661 / 257.2       | 0.709 / 426.4        | 0.734 / 249.0
ANNI011 (1987)   | 0.738 / 85.4        | 0.749 / 142.7        | 0.785 / 83.7

Summary

1. We have shown that existing methods for kernel clustering may suffer from the ill-initialization problem.
2. We present a conscience on-line learning (COLL) approach to address ill-initialization.
3. In the future, we may explore the on-line learning framework with other mechanisms, such as rival penalization, to realize automatic cluster-number selection in kernel clustering.


Thank you very much! Q&A
