Front. Electr. Electron. Eng. 2012, 7(1): 134–146 DOI 10.1007/s11460-012-0159-1

Jianhuang LAI, Changdong WANG

Kernel and graph: Two approaches for nonlinear competitive learning clustering

© Higher Education Press and Springer-Verlag Berlin Heidelberg 2012

Abstract  Competitive learning has attracted a significant amount of attention in the past decades in the field of data clustering. In this paper, we present two works by our group that address the nonlinearly separable problem suffered by classical competitive learning clustering algorithms: kernel competitive learning (KCL) and graph-based multi-prototype competitive learning (GMPCL). In KCL, data points are first mapped from the input data space into a high-dimensional kernel space where the nonlinearly separable pattern becomes a linear one. Classical competitive learning is then performed in this kernel space to generate a cluster structure. To realize on-line learning in the kernel space without knowing the explicit kernel mapping, we propose a prototype descriptor, each row of which represents a prototype by the inner products between the prototype and the data points, as well as the squared length of the prototype. In GMPCL, a graph-based method is employed to produce an initial, coarse clustering. After that, multi-prototype competitive learning is introduced to refine the coarse clustering and discover clusters of an arbitrary shape. In the multi-prototype competitive learning, to generate cluster boundaries of arbitrary shapes, each cluster is represented by multiple prototypes, whose subregions of the Voronoi diagram together approximately characterize one cluster of an arbitrary shape. Moreover, we introduce some extensions of these two approaches, with experiments demonstrating their effectiveness.

Keywords  competitive learning, clustering, nonlinearly separable, kernel, graph

Received September 25, 2011; accepted November 16, 2011

Jianhuang LAI, Changdong WANG
School of Information Science and Technology, Sun Yat-sen University, Guangzhou 510006, China
E-mail: [email protected]

1 Introduction

Competitive learning has received a significant amount of attention in the past decades [1,2]. Owing to its adaptive on-line learning, it has been widely applied in data clustering [3,4], vector quantization [5], RBF net learning [6], shape detection [7,8], discrete-valued source separation [9], Markov model identification [10], component analysis [11], scheduling [12], etc. Among them, clustering analysis remains one of the most active fields. Despite the significant success of competitive learning in data clustering, most competitive learning clustering algorithms assume that the data are linearly separable, an assumption that unfortunately does not always hold in real-world applications [13]. The squared-distance-based winner selection generates only a list of convex regions, one for each cluster. These convex regions can separate linear clusters (i.e., clusters with convex boundaries), as shown in Fig. 1(a); however, they fail to identify nonlinear clusters (i.e., clusters with concave boundaries), as shown in Fig. 1(b). Since real-world data do not always have convex boundaries, this motivates the development of nonlinear competitive learning. In this paper, two approaches are presented to realize nonlinear competitive learning clustering: one from the viewpoint of kernel clustering and the other based on the graph method.

The kernel method is one of the most popular methodologies for nonlinear clustering [13]. By mapping data points from the input data space into a linearly separable kernel space via a kernel mapping, nonlinear clustering can be easily realized. Recently, a few kernel versions of competitive learning [14–16] have been developed. However, these works are confined to some special cases. For instance, the two methods in Refs. [14,15] map the data into a normalization-invariant kernel space (i.e., onto a unit hypersphere, $\|\phi(x)\|^2 = 1$) before performing clustering.
The constraint of a normalization-invariant kernel helps avoid the computational challenge

Fig. 1 Linear limitation of the classical competitive learning clustering algorithm. The straight lines are linear separators which form the convex regions. The prototypes are marked as “×”. Points assigned to different clusters are plotted in different colors. The convex regions can successfully identify linear clusters in (a); however, they fail to identify nonlinear clusters of two moons dataset in (b).

of on-line learning without an explicit mapping $\phi$. Unfortunately, most kernels do not satisfy this constraint, e.g., the polynomial kernel $\kappa(x, z) = (\langle x, z\rangle + \alpha)^\beta$, the all-subsets kernel $\kappa(x, z) = \prod_{i=1}^{d} (1 + x_i z_i)$ and the ANOVA kernel $\kappa(x, z) = \sum_{1 \leqslant i_1 < \cdots < i_p \leqslant d} \prod_{j=1}^{p} x_{i_j} z_{i_j}$.

The graph-based method is another popular methodology for nonlinear clustering. A graph is first constructed over the dataset, where each node corresponds to a data point and each edge is weighted by the similarity between the corresponding nodes. Then the notion of useful links between data points, or the eigenstructure of the proximity matrix, is used to generate clusters of an arbitrary shape. For instance, in the shared nearest neighbor (SNN) clustering method [19], the weight between two data points (nodes) is computed as the number of shared neighbors between the nodes, given that the nodes are connected (i.e., have a useful link). Core points are defined according to some prespecified parameters MinPts and Eps and used to obtain clusters of arbitrary shapes. In spectral clustering [20], the clustering problem is transformed into a graph partitioning problem whose goal is to obtain a prespecified number of components of the graph while minimizing some objective function, such as the normalized cut. To realize this, the eigenstructure of the proximity matrix is computed, and the eigenvectors with the smallest eigenvalues are used to construct a partitioning. Each component obtained corresponds to a major structure of the graph, so that these components (i.e., clusters) can be of arbitrary shapes revealing the real structures. However, this type of method needs to prespecify either some parameters (e.g., MinPts and Eps in SNN) or the number of real clusters (e.g., in spectral clustering). Based on the graph method, we propose a novel nonlinear clustering algorithm named graph-based multi-prototype competitive learning (GMPCL). This algorithm employs a graph-based method to generate an initial, coarse clustering. Multi-prototype competitive learning is then performed to refine the clustering and identify clusters of an arbitrary shape. Therefore, the proposed approach consists of two phases, namely, graph-based initial clustering and multi-prototype competitive learning. In the multi-prototype competitive learning, to generate cluster boundaries of arbitrary shapes, each cluster is represented by multiple prototypes, whose subregions of the Voronoi diagram together


approximately characterize one cluster of an arbitrary shape.

The remainder of the paper is organized as follows. Section 2 introduces the classical competitive learning clustering algorithm and describes its linear clustering limitation. The KCL algorithm, as well as its extension, is described in Section 3. In Section 4, we describe the GMPCL algorithm. We conclude the paper in Section 5.

2 Classical competitive learning

Given a dataset $\mathcal{X} = \{x_1, x_2, \ldots, x_n\}$ of $n$ data points in a space $\mathbb{R}^d$ and the number of clusters $c$, the goal of classical competitive learning (CCL) is to find $c$ optimal prototypes $\{\mu_k^{\mathrm{opt}} : k = 1, 2, \ldots, c\}$ by minimizing the objective function

    J(\mu_1, \mu_2, \ldots, \mu_c) = \sum_{i=1}^{n} \| x_i - \mu_{\nu_i} \|^2,    (1)

with $\nu_i$ denoting the cluster label of $x_i$ as computed by Eq. (2), such that an optimal set of clusters $\{C_1, C_2, \ldots, C_c : x_i \in C_{\nu_i}, i = 1, 2, \ldots, n\}$ can be generated. To this end, the first step is to randomly initialize prototypes $\{\mu_k : k = 1, 2, \ldots, c\}$, which are iteratively updated as follows. In the $t$th iteration, for each randomly taken data point $x_i \in \mathcal{X}$, select a winning prototype $\mu_{\nu_i}$ via

    \nu_i = \arg\min_{k=1,2,\ldots,c} \| x_i - \mu_k \|^2,    (2)

and update the winner with learning rate $\eta_t \in (0, 1)$:

    \mu_{\nu_i} \leftarrow \mu_{\nu_i} + \eta_t (x_i - \mu_{\nu_i}).    (3)

The procedure continues until either all prototypes converge or the number of iterations reaches a prespecified value $t_{\max}$. Algorithm 1 summarizes the CCL method.

Algorithm 1  Classical Competitive Learning (CCL)
  Input: dataset $\mathcal{X} = \{x_i : i = 1, 2, \ldots, n\}$, $c$, $\{\eta_t\}$, $\varepsilon$, $t_{\max}$.
  Output: clusters $\{C_k : k = 1, 2, \ldots, c\}$.
  Randomly initialize $\{\mu_k : k = 1, 2, \ldots, c\}$, set $t = 0$.
  repeat
    Save the current prototypes $\{\hat{\mu}_k\}$ before this iteration, i.e., $\{\hat{\mu}_k\} = \{\mu_k\}$,
    set $t = t + 1$, randomly re-sort $\{x_i : i = 1, 2, \ldots, n\}$.
    for $i = 1, 2, \ldots, n$ do
      Select the winning prototype $\mu_{\nu_i}$ of $x_i$ by Eq. (2).
      Update $\mu_{\nu_i}$ with learning rate $\eta_t$ by Eq. (3).
    end for
  until $\sum_{k=1}^{c} \| \mu_k - \hat{\mu}_k \|^2 \leqslant \varepsilon$ or $t \geqslant t_{\max}$
  Assign $x_i$ to the cluster $C_{\nu_i}$, where $\nu_i$ is computed by Eq. (2), $\forall i = 1, 2, \ldots, n$.

To find the globally optimal clustering, the crucial components of CCL include the following four aspects.

1) Random initialization of prototypes. A rational choice is to initialize the prototypes as randomly weighted linear combinations of data points. That is,

    \mu_k = \sum_{i=1}^{n} A_{k,i} x_i, \quad \forall k = 1, 2, \ldots, c,    (4)

where $A$ is a randomly generated weight matrix,

    A = \begin{pmatrix} A_{1,1} & A_{1,2} & \cdots & A_{1,n} \\ A_{2,1} & A_{2,2} & \cdots & A_{2,n} \\ \vdots & \vdots & & \vdots \\ A_{c,1} & A_{c,2} & \cdots & A_{c,n} \end{pmatrix},    (5)

with each row summing to 1, i.e., $\sum_{i=1}^{n} A_{k,i} = 1$, $\forall k = 1, 2, \ldots, c$.

2) Competition to select the winning prototype corresponding to the input object by Eq. (2).

3) Learning to update the winner by Eq. (3), with learning rates $\{\eta_t\}$ satisfying the conditions [21]:

    \lim_{t \to \infty} \eta_t = 0, \quad \sum_{t=1}^{\infty} \eta_t = \infty, \quad \sum_{t=1}^{\infty} \eta_t^2 < \infty.    (6)

A practical choice is $\eta_t = \mathrm{const}/t$, where "const" is some small constant, e.g., 1.

4) The iteration stopping criterion, i.e., $\sum_{k=1}^{c} \| \mu_k - \hat{\mu}_k \|^2 \leqslant \varepsilon$ or $t \geqslant t_{\max}$.

It can be shown [21] that competitive learning given by the four components does indeed converge to the global optimum with probability one.

Proposition 1  Given appropriate initialization, CCL converges with probability one to the global optimum, i.e., $P(\{\mu_1, \mu_2, \ldots, \mu_c\} = \{\mu_1^{\mathrm{opt}}, \mu_2^{\mathrm{opt}}, \ldots, \mu_c^{\mathrm{opt}}\}) = 1$, where $\{\mu_1^{\mathrm{opt}}, \mu_2^{\mathrm{opt}}, \ldots, \mu_c^{\mathrm{opt}}\} = \arg\min J(\mu_1, \mu_2, \ldots, \mu_c)$.

The initialization problems, such as estimating the number of clusters and allocating their initial prototypes, can be effectively handled by various mechanisms. For instance, adding conscience to CCL avoids neuron underutilization [22], and employing rival penalization helps get rid of redundant prototypes [3]. Thus, the initialization condition for the convergence of CCL is easily satisfied. However, the rule for selecting a winning prototype, i.e., Eq. (2), generates a list of convex regions, one for each cluster, as shown in Fig. 1. Thus, CCL is well known to be limited to linearly separable datasets. In our work, two approaches are presented to realize nonlinear competitive learning clustering: one from the viewpoint of kernel clustering and the other based on the graph method.
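The four components above can be sketched as follows. This is a minimal NumPy illustration of CCL, not the authors' implementation; the function name, toy data, and default parameter values are ours, and the learning rate follows the suggested schedule $\eta_t = \mathrm{const}/t$.

```python
import numpy as np

def ccl(X, c, const=1.0, eps=1e-6, t_max=100, seed=None):
    """Classical competitive learning (Algorithm 1): prototypes and labels."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = len(X)
    # 1) Random initialization (Eqs. (4)-(5)): prototypes are randomly
    #    weighted combinations of data points, each weight row summing to 1.
    A = rng.random((c, n))
    A /= A.sum(axis=1, keepdims=True)
    mu = A @ X
    for t in range(1, t_max + 1):
        mu_old = mu.copy()
        eta = const / t                    # learning-rate schedule (Eq. (6))
        for i in rng.permutation(n):
            # 2) Competition: the winner is the nearest prototype (Eq. (2)).
            nu = int(np.argmin(((X[i] - mu) ** 2).sum(axis=1)))
            # 3) Learning: move the winner towards the input (Eq. (3)).
            mu[nu] += eta * (X[i] - mu[nu])
        # 4) Stopping criterion: total squared prototype movement.
        if ((mu - mu_old) ** 2).sum() <= eps:
            break
    labels = np.argmin(((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2), axis=1)
    return mu, labels
```

On a toy one-dimensional dataset with two well-separated groups, the two prototypes settle near the group means and the induced convex (half-line) regions recover the groups.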

3 Kernel competitive learning

3.1 Problem formulation

By mapping the unlabelled dataset $\mathcal{X} = \{x_1, x_2, \ldots, x_n\}$ of $n$ data points from $\mathbb{R}^d$ into a kernel space $\mathcal{Y}$ via a mapping $\phi$, we wish to find an assignment of each data point to one of $c$ clusters, such that in the kernel space $\mathcal{Y}$ the within-cluster similarity is high and the between-cluster similarity is low. That is, we seek a map

    \nu : \mathcal{X} \to \{1, 2, \ldots, c\}    (7)

to optimize [13]

    \nu = \arg\min_{\nu} \Big[ \sum_{i,j:\, \nu_i = \nu_j} \| \phi(x_i) - \phi(x_j) \|^2 - \lambda \sum_{i,j:\, \nu_i \neq \nu_j} \| \phi(x_i) - \phi(x_j) \|^2 \Big],    (8)

where $\lambda > 0$ is some parameter, and we use the short notation $\nu_i = \nu(x_i)$. In this paper, $\nu^{-1}(k)$ denotes the indices of all the data points that are assigned to the $k$th cluster.

Theorem 1  The optimization criterion Eq. (8) is equivalent to the criterion

    \nu = \arg\min_{\nu} \sum_{i=1}^{n} \| \phi(x_i) - \mu_{\nu_i} \|^2,    (9)

where $\mu_k$ is the mean of the data points assigned to cluster $k$,

    \mu_k = \frac{1}{|\nu^{-1}(k)|} \sum_{i \in \nu^{-1}(k)} \phi(x_i), \quad \forall k = 1, 2, \ldots, c,    (10)

and $\nu_i$ satisfies

    \nu_i = \arg\min_{k=1,2,\ldots,c} \| \phi(x_i) - \mu_k \|^2, \quad \forall i = 1, 2, \ldots, n.    (11)

Proof  See Ref. [13].

Thus, the goal of kernel clustering is to solve the optimization problem in Eq. (9). The objective term

    \sum_{i=1}^{n} \| \phi(x_i) - \mu_{\nu_i} \|^2    (12)

is known as the distortion error [21]. Ideally, all possible assignments of the data into clusters should be tested and the best one, with the smallest distortion error, selected. This procedure is unfortunately computationally infeasible even for a very small dataset, since the number of all possible partitions of a dataset grows exponentially with the number of data points. Hence, efficient algorithms are required.

In practice, the mapping function $\phi$ is often unknown or hard to obtain, and the dimensionality of $\mathcal{Y}$ is quite high. The feature space $\mathcal{Y}$ is characterized by the kernel function $\kappa$ and the corresponding kernel matrix $K$ [13].

Definition 1  A kernel is a function $\kappa$ such that $\kappa(x, z) = \langle \phi(x), \phi(z) \rangle$ for all $x, z \in \mathcal{X}$, where $\phi$ is a mapping from $\mathcal{X}$ to an (inner product) feature space $\mathcal{Y}$. A kernel matrix is a square matrix $K \in \mathbb{R}^{n \times n}$ such that $K_{i,j} = \kappa(x_i, x_j)$ for some $x_1, x_2, \ldots, x_n \in \mathcal{X}$ and some kernel function $\kappa$.

Thus, for an efficient approach, a computation procedure using only the kernel matrix is also required.

3.2 Kernel competitive learning

To perform on-line learning efficiently in the kernel space with only the kernel matrix $K$, we proposed an on-line learning (OLL) framework [23,24], which is based on a new prototype representation termed the prototype descriptor $W^\phi$. The rows of $W^\phi$ represent prototypes by the inner products between a prototype and the feature-space images of the data points, as well as the squared length of the prototype [23,24].

Definition 2 (Prototype descriptor)  A prototype descriptor is a real-valued matrix $W^\phi$, such that the $k$th row represents prototype $\mu_k$ by

    W^\phi_{k,i} = \langle \mu_k, \phi(x_i) \rangle, \ \forall i = 1, 2, \ldots, n, \quad W^\phi_{k,n+1} = \langle \mu_k, \mu_k \rangle,    (13)

i.e.,

    W^\phi = \begin{pmatrix} \langle \mu_1, \phi(x_1) \rangle & \cdots & \langle \mu_1, \phi(x_n) \rangle & \langle \mu_1, \mu_1 \rangle \\ \langle \mu_2, \phi(x_1) \rangle & \cdots & \langle \mu_2, \phi(x_n) \rangle & \langle \mu_2, \mu_2 \rangle \\ \vdots & & \vdots & \vdots \\ \langle \mu_c, \phi(x_1) \rangle & \cdots & \langle \mu_c, \phi(x_n) \rangle & \langle \mu_c, \mu_c \rangle \end{pmatrix}.    (14)

With this definition, the four components of on-line learning in the kernel space can be realized as follows [23,24].

Theorem 2 (Initialization)  The random initialization can be realized by

    W^\phi_{:,1:n} = AK, \qquad W^\phi_{:,n+1} = \mathrm{diag}(AKA^{\mathrm{T}}),    (15)

where $\mathrm{diag}(M)$ denotes the main diagonal of a matrix $M$, and the positive matrix $A = [A_{k,i}]_{c \times n}$ has the form

    A_{k,i} = \begin{cases} \dfrac{1}{|\nu^{-1}(k)|}, & \text{if } i \in \nu^{-1}(k), \\ 0, & \text{otherwise.} \end{cases}    (16)

That is, the matrix $A$ reflects the initial assignment $\nu$.

Proof  Assume the assignment is randomly initialized as $\nu$. Substituting the computation of the prototypes, Eq.


(10), into the definition of $W^\phi_{k,:}$ in Eq. (13), we get

    W^\phi_{k,i} = \langle \mu_k, \phi(x_i) \rangle = \Big\langle \frac{\sum_{j \in \nu^{-1}(k)} \phi(x_j)}{|\nu^{-1}(k)|}, \phi(x_i) \Big\rangle = \frac{1}{|\nu^{-1}(k)|} \sum_{j \in \nu^{-1}(k)} K_{j,i} = \sum_{j=1}^{n} A_{k,j} K_{j,i} = A_{k,:} K_{:,i}, \quad \forall i = 1, 2, \ldots, n,    (17)

    W^\phi_{k,n+1} = \langle \mu_k, \mu_k \rangle = \Big\langle \frac{\sum_{h \in \nu^{-1}(k)} \phi(x_h)}{|\nu^{-1}(k)|}, \frac{\sum_{l \in \nu^{-1}(k)} \phi(x_l)}{|\nu^{-1}(k)|} \Big\rangle = \sum_{h \in \nu^{-1}(k)} \sum_{l \in \nu^{-1}(k)} \frac{K_{h,l}}{|\nu^{-1}(k)|^2} = \sum_{h=1}^{n} \sum_{l=1}^{n} A_{k,h} A_{k,l} K_{h,l} = A_{k,:} K A_{k,:}^{\mathrm{T}}.    (18)

Thus, we obtain the initialization of $W^\phi$ as Eq. (15). The proof is finished.

Theorem 3 (Winner selection rule)  The winner selection rule can be realized by

    \nu_i = \arg\min_{k=1,2,\ldots,c} \{ K_{i,i} + W^\phi_{k,n+1} - 2 W^\phi_{k,i} \}.    (19)

Proof  From the winner selection rule, i.e., Eq. (2), one gets

    \nu_i = \arg\min_{k=1,2,\ldots,c} \{ \| \phi(x_i) - \mu_k \|^2 \} = \arg\min_{k=1,2,\ldots,c} \{ K_{i,i} + W^\phi_{k,n+1} - 2 W^\phi_{k,i} \}.    (20)

Thus, we get the required formula.

Theorem 4 (On-line winner updating rule)  The on-line winner updating rule can be realized by

    W^\phi_{\nu_i,j} \leftarrow \begin{cases} (1 - \eta_t) W^\phi_{\nu_i,j} + \eta_t K_{i,j}, & \text{if } j = 1, 2, \ldots, n, \\ (1 - \eta_t)^2 W^\phi_{\nu_i,j} + \eta_t^2 K_{i,i} + 2(1 - \eta_t)\eta_t W^\phi_{\nu_i,i}, & \text{if } j = n + 1. \end{cases}    (21)

Proof  Although we do not know the explicit expression of $\mu_{\nu_i}$, we can simply take $\mu_{\nu_i}$ as a symbol of this prototype and denote its updated version by $\acute{\mu}_{\nu_i}$. Substituting the on-line winner updating rule, Eq. (3), into the winning row $W^\phi_{\nu_i,:}$, we have

    W^\phi_{\nu_i,j} \leftarrow \langle \acute{\mu}_{\nu_i}, \phi(x_j) \rangle = \langle \mu_{\nu_i} + \eta_t (\phi(x_i) - \mu_{\nu_i}), \phi(x_j) \rangle = (1 - \eta_t) \langle \mu_{\nu_i}, \phi(x_j) \rangle + \eta_t \langle \phi(x_i), \phi(x_j) \rangle, \quad \forall j = 1, 2, \ldots, n,    (22)

and for $j = n + 1$,

    W^\phi_{\nu_i,j} \leftarrow \langle \acute{\mu}_{\nu_i}, \acute{\mu}_{\nu_i} \rangle = \langle \mu_{\nu_i} + \eta_t (\phi(x_i) - \mu_{\nu_i}), \mu_{\nu_i} + \eta_t (\phi(x_i) - \mu_{\nu_i}) \rangle = (1 - \eta_t)^2 \langle \mu_{\nu_i}, \mu_{\nu_i} \rangle + \eta_t^2 \langle \phi(x_i), \phi(x_i) \rangle + 2(1 - \eta_t)\eta_t \langle \mu_{\nu_i}, \phi(x_i) \rangle.    (23)

Then we get the on-line winner updating rule as Eq. (21).

It is somewhat complicated to compute the convergence criterion without explicit expressions of $\{\mu_1, \mu_2, \ldots, \mu_c\}$. Notice that, in one iteration, each point $\phi(x_i)$ is assigned to one and only one winning prototype. Let the array $\pi^k = [\pi^k_1, \pi^k_2, \ldots, \pi^k_{m_k}]$ store the indices of the $m_k$ ordered points assigned to the $k$th prototype in one iteration. For instance, if $\phi(x_1), \phi(x_{32}), \phi(x_8), \phi(x_{20}), \phi(x_{15})$ are the 5 ordered points assigned to the 2nd prototype in the $t$th iteration, then the index array of the 2nd prototype is $\pi^2 = [\pi^2_1, \pi^2_2, \ldots, \pi^2_{m_2}] = [1, 32, 8, 20, 15]$, with $\pi^2_1 = 1$, $\pi^2_2 = 32$, $\pi^2_3 = 8$, $\pi^2_4 = 20$, $\pi^2_5 = 15$ and $m_2 = 5$. The following lemma formulates the cumulative update of the $k$th prototype based on the array $\pi^k = [\pi^k_1, \pi^k_2, \ldots, \pi^k_{m_k}]$.

Lemma 1  In the $t$th iteration, the relationship between the updated prototype $\mu_k$ and the old $\hat{\mu}_k$ is

    \mu_k = (1 - \eta_t)^{m_k} \hat{\mu}_k + \sum_{l=1}^{m_k} (1 - \eta_t)^{m_k - l} \eta_t \phi(x_{\pi^k_l}),    (24)

where the array $\pi^k = [\pi^k_1, \pi^k_2, \ldots, \pi^k_{m_k}]$ stores the indices of the $m_k$ ordered points assigned to the $k$th prototype in this iteration.

Proof  To prove this relationship, we use the principle of mathematical induction. One can easily verify that Eq. (24) is true for $m_k = 1$, directly from Eq. (3):

    \mu_k = \hat{\mu}_k + \eta_t \big( \phi(x_{\pi^k_1}) - \hat{\mu}_k \big) = (1 - \eta_t)^1 \hat{\mu}_k + \sum_{l=1}^{1} (1 - \eta_t)^0 \eta_t \phi(x_{\pi^k_l}).    (25)

Assume that it is true for $m_k = m$, that is, for the first $m$ ordered points,

    \mu_k = (1 - \eta_t)^m \hat{\mu}_k + \sum_{l=1}^{m} (1 - \eta_t)^{m - l} \eta_t \phi(x_{\pi^k_l}).    (26)

Then for $m_k = m + 1$, i.e., after the $(m+1)$th point, from Eq. (3) we have

    \mu_k = \mu_k + \eta_t \big( \phi(x_{\pi^k_{m+1}}) - \mu_k \big) = (1 - \eta_t)\mu_k + \eta_t \phi(x_{\pi^k_{m+1}}) = (1 - \eta_t)\Big[ (1 - \eta_t)^m \hat{\mu}_k + \sum_{l=1}^{m} (1 - \eta_t)^{m - l} \eta_t \phi(x_{\pi^k_l}) \Big] + \eta_t \phi(x_{\pi^k_{m+1}}) = (1 - \eta_t)^{m+1} \hat{\mu}_k + \sum_{l=1}^{m+1} (1 - \eta_t)^{m+1-l} \eta_t \phi(x_{\pi^k_l}).    (27)


This expression shows that Eq. (24) is true for $m_k = m + 1$. Therefore, by mathematical induction, it is true for all positive integers $m_k$.

Theorem 5 (Convergence criterion)  The convergence criterion can be computed by

    e^\phi = \sum_{k=1}^{c} \Big( 1 - \frac{1}{(1 - \eta_t)^{m_k}} \Big)^2 W^\phi_{k,n+1} + \eta_t^2 \sum_{k=1}^{c} \sum_{h=1}^{m_k} \sum_{l=1}^{m_k} \frac{K_{\pi^k_h, \pi^k_l}}{(1 - \eta_t)^{h+l}} + 2\eta_t \sum_{k=1}^{c} \Big( 1 - \frac{1}{(1 - \eta_t)^{m_k}} \Big) \sum_{l=1}^{m_k} \frac{W^\phi_{k,\pi^k_l}}{(1 - \eta_t)^l}.    (28)

Proof  According to Lemma 1, the old $\hat{\mu}_k$ can be retrieved from the updated $\mu_k$ as

    \hat{\mu}_k = \frac{\mu_k}{(1 - \eta_t)^{m_k}} - \eta_t \sum_{l=1}^{m_k} \frac{\phi(x_{\pi^k_l})}{(1 - \eta_t)^l}.    (29)

Substituting it into $e^\phi = \sum_{k=1}^{c} \| \mu_k - \hat{\mu}_k \|^2$, we have

    e^\phi = \sum_{k=1}^{c} \Big\| \mu_k - \frac{\mu_k}{(1 - \eta_t)^{m_k}} + \eta_t \sum_{l=1}^{m_k} \frac{\phi(x_{\pi^k_l})}{(1 - \eta_t)^l} \Big\|^2 = \sum_{k=1}^{c} \Big( 1 - \frac{1}{(1 - \eta_t)^{m_k}} \Big)^2 \langle \mu_k, \mu_k \rangle + \eta_t^2 \sum_{k=1}^{c} \sum_{h=1}^{m_k} \sum_{l=1}^{m_k} \frac{\langle \phi(x_{\pi^k_h}), \phi(x_{\pi^k_l}) \rangle}{(1 - \eta_t)^{h+l}} + 2\eta_t \sum_{k=1}^{c} \Big( 1 - \frac{1}{(1 - \eta_t)^{m_k}} \Big) \sum_{l=1}^{m_k} \frac{\langle \mu_k, \phi(x_{\pi^k_l}) \rangle}{(1 - \eta_t)^l}.    (30)

Thus, $e^\phi$ can be computed by Eq. (28) using only the entries of $W^\phi$ and $K$. This ends the proof.

Algorithm 2  Kernel Competitive Learning (KCL)
  Input: kernel matrix $K \in \mathbb{R}^{n \times n}$, $c$, $\{\eta_t\}$, $\varepsilon$, $t_{\max}$.
  Output: clusters $\{C_k : k = 1, 2, \ldots, c\}$.
  Randomly initialize the prototype descriptor $W^\phi \in \mathbb{R}^{c \times (n+1)}$ via Eq. (15), set $t = 0$.
  repeat
    Set $t = t + 1$, get a random permutation $\{I_1, I_2, \ldots, I_n \,|\, I_i \in \{1, 2, \ldots, n\}, I_i \neq I_j, i \neq j\}$, and initialize $c$ empty index arrays.
    for $l = 1, 2, \ldots, n$ do
      Select the winning prototype $\nu_i$ of the $i$th sample ($i = I_l$) by Eq. (19), and append $i$ to the $\nu_i$th index array.
      Update the winner, i.e., the $\nu_i$th row of $W^\phi$, with learning rate $\eta_t$ by Eq. (21).
    end for
    Compute $e^\phi$ via Eq. (28).
  until $e^\phi \leqslant \varepsilon$ or $t \geqslant t_{\max}$
  Assign $x_i$ to the cluster $C_{\nu_i}$, where $\nu_i$ is computed by Eq. (19), $\forall i = 1, 2, \ldots, n$.

Algorithm 2 summarizes the schedule of KCL, while Fig. 2 illustrates its key procedure. Figure 3 demonstrates the nonlinear separation ability of KCL in clustering the nonlinearly separable two moons dataset; a satisfactory clustering result is generated.
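The kernel-space steps of Algorithm 2 can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function name, toy data, and default parameters are ours, and the convergence check uses the squared change of the descriptor as a simple surrogate for the exact criterion $e^\phi$ of Eq. (28).

```python
import numpy as np

def kcl(K, c, const=0.5, eps=1e-6, t_max=100, seed=None):
    """Kernel competitive learning via the prototype descriptor (Algorithm 2)."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    # Initialization (Eqs. (15)-(16)): A encodes a random initial assignment,
    # W[:, :n] = A K and W[:, n] = diag(A K A^T).
    assign = rng.integers(0, c, n)
    A = np.zeros((c, n))
    for k in range(c):
        idx = np.flatnonzero(assign == k)
        if idx.size:
            A[k, idx] = 1.0 / idx.size
    W = np.hstack([A @ K, np.diag(A @ K @ A.T)[:, None]])
    for t in range(1, t_max + 1):
        eta = const / t
        W_old = W.copy()
        for i in rng.permutation(n):
            # Winner selection (Eq. (19)):
            # ||phi(x_i) - mu_k||^2 = K_ii + <mu_k, mu_k> - 2<mu_k, phi(x_i)>.
            nu = int(np.argmin(K[i, i] + W[:, n] - 2 * W[:, i]))
            # Winner update (Eq. (21)); the squared-length column is updated
            # first, since it needs the pre-update value of W[nu, i].
            W[nu, n] = ((1 - eta) ** 2 * W[nu, n] + eta ** 2 * K[i, i]
                        + 2 * (1 - eta) * eta * W[nu, i])
            W[nu, :n] = (1 - eta) * W[nu, :n] + eta * K[i]
        # Surrogate convergence check (the paper uses e^phi of Eq. (28)).
        if ((W - W_old) ** 2).sum() <= eps:
            break
    return np.array([int(np.argmin(K[i, i] + W[:, n] - 2 * W[:, i]))
                     for i in range(n)])
```

Any kernel matrix can be supplied; with a linear kernel $K = XX^{\mathrm{T}}$ the procedure reduces to CCL, while a Gaussian kernel (as in the paper's experiments) yields nonlinear boundaries in the input space.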

3.3 Extension of KCL and experimental evaluation

Apart from its capability of partitioning nonlinearly separable datasets, another advantage of KCL lies in its potential for utilizing various mechanisms to eliminate the sensitivity to ill-initialization and to automatically estimate the cluster number. For instance, by adopting the conscience mechanism, which controls the winning-prototype selection via the winning frequency, the under-utilization problem [22] can be easily tackled. By utilizing rival penalization, which eliminates redundant prototypes by not only moving the winning prototype towards the input but also de-learning the rival at a smaller de-learning rate, nonlinear clustering and automatic cluster-number estimation can be realized simultaneously. In our work [23,24], the conscience mechanism is adopted to tackle the under-utilization problem caused by ill-initialization. The key idea is to introduce the

Fig. 2  Illustration of KCL. The function φ embeds the data into a feature space where the original nonlinear pattern becomes linear. Competitive learning is then performed in this linear feature space. The new input is denoted by the blue x and φ(x), while the red arrow plots the update of the winning prototype, i.e., W_{ν,:} and W^φ_{ν,:}. Note that the linear separator in the feature space is actually nonlinear in the input space, and so is the update of the winner.


Fig. 3 Demonstration of nonlinear separation ability of KCL. The original two moons dataset is shown in (a), and the nonlinear clustering results obtained by KCL are shown in (b).

winning frequency of each prototype; the winning prototype $\nu_i$ is then selected by taking the winning frequency into consideration for each input data point $\phi(x_i)$. That is,

    \nu_i = \arg\min_{k=1,2,\ldots,c} \{ f_k (K_{i,i} + W^\phi_{k,n+1} - 2 W^\phi_{k,i}) \}.    (31)

The winning frequencies $\{f_1, f_2, \ldots, f_c\}$ are updated as follows:

    n_{\nu_i} \leftarrow n_{\nu_i} + 1, \qquad f_k = n_k \Big/ \sum_{l=1}^{c} n_l, \quad \forall k = 1, 2, \ldots, c,    (32)

where $n_k$ denotes the cumulative winning number of the $k$th prototype. This approach is called conscience on-line learning (COLL) in our previous work [23,24].

To demonstrate the effectiveness of COLL in kernel-based clustering, we select four widely tested digit datasets: the Pen-based recognition of handwritten digits dataset (Pendigits), the Multi-feature digit dataset (Mfeat), USPS [25] and MNIST [26]. The first two datasets are from the UCI repository [27]. Table 1 summarizes the properties of the four datasets, as well as the σ used in constructing the Gaussian kernel matrix. Note that appropriate kernel selection is beyond the scope of this paper; we simply use the Gaussian kernel with an appropriate σ value chosen according to the distance matrix $X = [\| x_i - x_j \|]_{n \times n}$.

Table 1  Summary of digit datasets. n is the number of data points; c is the number of classes; d is the dimensionality; "balanced" indicates whether all classes are of the same size. The σ is fixed for all compared kernel-based methods.

  dataset    | n     | c  | d   | balanced | σ
  Pendigits  | 10992 | 10 | 16  | ×        | 60.53
  Mfeat      | 2000  | 10 | 649 | √        | 809.34
  USPS       | 11000 | 10 | 256 | √        | 1286.70
  MNIST      | 5000  | 10 | 784 | √        | 2018.30

Figure 4 plots the distortion error as a function of the preselected number of clusters obtained by COLL, kernel k-means [17] and global kernel k-means [28]. COLL clearly achieves the smallest distortion errors among the compared methods. In particular, it outperforms the latest development of kernel k-means, i.e., global kernel k-means, on all four datasets. It should be pointed out that, ideally, the globally minimal distortion error, e.g., the one obtained by testing all possible assignments ν and selecting the best, would decrease monotonically as the number of clusters increases. However, since all the compared methods are heuristics that attain only local optima, their curves need not be monotonically decreasing.
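The conscience step of COLL can be sketched as follows. This is a minimal illustration under our own conventions: the function name is ours, and the winning counts are initialized to ones (an assumption, so that no frequency is zero before the first win).

```python
import numpy as np

def coll_winner(i, K, W, counts):
    """Conscience-based winner selection (Eq. (31)) + frequency update (Eq. (32)).

    counts[k] holds the cumulative winning number n_k of prototype k;
    W is the prototype descriptor of Definition 2 (shape c x (n+1))."""
    n = K.shape[0]
    f = counts / counts.sum()                    # f_k = n_k / sum_l n_l
    # Kernel distance to each prototype, scaled by its winning frequency.
    nu = int(np.argmin(f * (K[i, i] + W[:, n] - 2 * W[:, i])))
    counts[nu] += 1                              # n_nu <- n_nu + 1
    return nu
```

With two prototypes at equal kernel distance from the input, the less frequent winner is selected, which is exactly how the conscience mechanism counters prototype under-utilization.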

4 Graph-based multi-prototype competitive learning

The proposed GMPCL approach consists of two phases, namely, graph-based initial clustering (Subsection 4.1) and multi-prototype competitive learning (Subsection 4.2) [29].

4.1 Graph-based initial clustering

Given a dataset $\mathcal{X} = \{x_1, x_2, \ldots, x_n\}$ of $n$ points in $\mathbb{R}^d$, the first step of the graph-based algorithm is to construct a graph $G_e = (\mathcal{V}, A, e)$. The vertex set $\mathcal{V}$ contains one node for each sample in $\mathcal{X}$. The affinity matrix $A = [A_{ij}]_{n \times n}$ is defined as

    A_{ij} \triangleq \begin{cases} \exp(-\| x_i - x_j \|^2), & \text{if } x_i \in N_k(x_j) \wedge x_j \in N_k(x_i), \\ 0, & \text{otherwise,} \end{cases}    (33)

where $N_k(x_i)$ denotes the set consisting of the $k$ nearest neighbors of $x_i$. The vertex energy vector $e = [e_1, e_2, \ldots, e_n]^{\mathrm{T}}$ is defined as

    e_i \triangleq \log_2 \Big( 1 + \frac{\sum_j A_{ij}}{\max_{l=1,2,\ldots,n} \sum_j A_{lj}} \Big), \quad i = 1, 2, \ldots, n.    (34)


Fig. 4 The distortion error as a function of the preselected number of clusters obtained by COLL, kernel k-means and global kernel k-means. The proposed COLL achieves the smallest distortion errors on the four digit datasets.

The component $e_i \in [0, 1]$ is the vertex energy of $x_i$, which measures how "important" $x_i$ is. Figure 5(a) shows a two moons dataset, and Fig. 5(b) plots its vertex energy. In Ref. [30], the density is measured simply by the number of points within the neighborhood of one point. In contrast, the vertex energy defined in Eq. (34) takes into account the correlations between all data points, which results in a global estimate of the vertex energy. Although both can be used to discover arbitrarily shaped clusters, the proposed vertex energy is more suitable for datasets containing clusters of differing densities. A possible limitation is that, in an extremely unbalanced dataset, the presence of a dense cluster containing a large number of points will hinder the detection of smaller ones, due to the global estimate of the vertex energy. This is a problem to be addressed in our future research.

A subset $\mathcal{S}$ consisting of the vertices of higher energy, termed the core point set, is then obtained (e.g., Fig. 5(c)).

Definition 3  Given a graph $G_e = (\mathcal{V}, A, e)$ and a percentage $\rho$, the core point set $\mathcal{S}$ is defined as $\mathcal{S} \triangleq \{x_i \,|\, e_i \geqslant \zeta\}$, with $\zeta \in [0, 1]$ chosen such that $|\mathcal{S}|/|\mathcal{V}| = \rho$.

The core-point-connectivity of any two core points $p$ and $q$ in $\mathcal{S}$ is defined as follows.

Definition 4 (Core-point-connectivity)  Two core points $p$ and $q$ in $\mathcal{S}$ are core-point-connected w.r.t. $k$ if there exists a chain of core points $p_1, p_2, \ldots, p_m$, $p_1 = p$, $p_m = q$, such that $p_{i+1} \in N_k(p_i) \cap \mathcal{S}$ and $p_i \in N_k(p_{i+1}) \cap \mathcal{S}$.

From the viewpoint of density-based clustering [19,30], core-point-connectivity separates $\mathcal{S}$ into some natural subgroups, as shown in Fig. 5(d), which are defined as connected components as follows.

Definition 5  A set of $c$ connected components $\{I_1, I_2, \ldots, I_c\}$ is obtained by separating the core point set $\mathcal{S}$ w.r.t. $k$, such that $\forall i \neq j$, $I_i \cap I_j = \emptyset$, $\mathcal{S} = \bigcup_{i=1}^{c} I_i$, any $p, q \in I_i$ are core-point-connected, while for $p \in I_i$, $q \in I_j$, $i \neq j$, they are not.

The connected components $\{I_1, I_2, \ldots, I_c\}$ are taken as initial clusters, which will be further refined via multi-prototype competitive learning.
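Phase one can be sketched as follows. This is a minimal illustration under our own simplifications: the function name and toy data are ours, distances are computed densely in O(n²), and Eq. (33) is used without a kernel width, exactly as written.

```python
import numpy as np
from collections import deque

def graph_initial_clustering(X, k=3, rho=0.5):
    """Phase one of GMPCL: mutual-kNN affinity (Eq. (33)), vertex energy
    (Eq. (34)), core point set (Def. 3) and its connected components (Def. 5)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)    # squared distances
    knn = [set(np.argsort(D[i])[1:k + 1]) for i in range(n)]  # k NNs, self excluded
    mutual = lambda i, j: j in knn[i] and i in knn[j]
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and mutual(i, j):
                A[i, j] = np.exp(-D[i, j])                    # Eq. (33)
    row = A.sum(axis=1)
    e = np.log2(1.0 + row / max(row.max(), 1e-12))            # Eq. (34)
    m = max(1, int(round(rho * n)))
    core = set(int(v) for v in np.argsort(-e)[:m])            # Def. 3
    # Connected components of core points under mutual-kNN links (Defs. 4-5).
    label, seen, c = {}, set(), 0
    for s in core:
        if s in seen:
            continue
        queue = deque([s]); seen.add(s)
        while queue:                       # breadth-first traversal
            p = queue.popleft()
            label[p] = c
            for q in core:
                if q not in seen and mutual(p, q):
                    seen.add(q); queue.append(q)
        c += 1
    return label, c        # core point index -> initial cluster label, and c
```

On two tight, well-separated groups, the mutual-kNN links never cross the gap, so the core points split into exactly two connected components, which become the initial clusters $I_1$ and $I_2$.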


Fig. 5  GMPCL: two moons example. (a) The original dataset; different classes are plotted with different markers. (b) The vertex energy; the color and size of each point are scaled according to its vertex energy. (c) The subset S comprising vertices of high energy (core points). (d) The k-NN graph of S with k = 12.

4.2 Multi-prototype competitive learning The initial clusters {I1 , I2 , . . . , Ic } obtained in the first phase take into account only data points of higher energy, and the remaining data points are not assigned with cluster labels. Therefore, the output of the first phase is only a coarse clustering that requires further refinement. Rather than directly assigning the unlabeled data points to the core points as in Ref. [19], this section employs classical competitive learning to refine the initial clustering and assign cluster labels to all data points. Experimental results show that the proposed approach can obtain at least 9.8% improvement over the direct assignment. Since the dataset is nonlinearly separable, a nonlinear cluster with concave boundaries would always exist, which cannot be characterized by a single prototype that produces convex boundaries [31]. However, multiple prototypes produce subregions of the Voronoi diagram which can approximately characterize one cluster of an arbitrary shape. Therefore, we represent each cluster by multiple prototypes. Every point in Ij can be taken as one of the initial prototypes representing the jth cluster Cj . But there is no need of using so many prototypes to represent one cluster, and some of them are more appropriate and more effective than others. These points should be as

few as possible to lower the computational complexity of multi-prototype competitive learning, meanwhile be scattered in the whole space of the initial cluster in order to suitably characterize the corresponding cluster. Affinity propagation [32] can generate suitable prototypes to represent an input dataset without preselecting the number of prototypes. In our work, the representative points are obtained by applying affinity propagation to each Ij . The similarity s(xi , xi ) between xi , xi ∈ Ij is set to −  xi − xi 2 , and the preferences are set to the median of the similarities, which outputs pj suitable p multi-prototypes μ1j , . . . , μj j . In this way, we obtain an initial multi-prototype set W = {μ11 , . . . , μp11 , μ12 , . . . , μp22 , . . . , μ1c , . . . , μpc c }. (35)   !   !   ! represent C1

represent C2

represent Cc

Throughout the paper, we use the index notation ωjq to denote the multi-prototype μqj . That is, referring to the ωjq th multi-prototype is equivalent to mentioning μqj , and ω = {ω11 , . . . , ω1p1 , ω21 , . . . , ω2p2 , . . . , ωc1 , . . . , ωcpc }. After the initial multi-prototype set W is obtained, classical competitive learning is performed to iteratively update the multi-prototypes such that the multiprototype objective function is minimized: J(W) =

n  i=1

 xi − μυνii 2 ,

(36)

Jianhuang LAI et al.

Kernel and graph: Two approaches for nonlinear competitive learning clustering

143

2

where μυνii satisfies ωνυii = arg minωjq ∈ω  xi − μqj  , i.e., the winning multi-prototype of xi that is nearest to xi . For each randomly taken xi , the winning multi-prototype ωνυii is selected via the winner selection rule: ωνυii

= arg min

ωjq ∈ω

 xi −

μqj

2

 ,

(37)

and is updated by the winner update rule:

$$\mu_{\nu_i}^{\upsilon_i} \leftarrow \mu_{\nu_i}^{\upsilon_i} + \eta_t (x_i - \mu_{\nu_i}^{\upsilon_i}), \tag{38}$$

with learning rates $\{\eta_t\}$ satisfying [21]: $\lim_{t \to \infty} \eta_t = 0$, $\sum_{t=1}^{\infty} \eta_t = \infty$, $\sum_{t=1}^{\infty} \eta_t^2 < \infty$. In practice, $\eta_t = \mathrm{const}/t$, where "const" is a small constant, e.g., 0.5. Figures 6(a) and 6(b) illustrate the procedure of updating the winning multi-prototype. The converged multi-prototype set $\mathcal{W}$ and the corresponding Voronoi diagram are shown in Fig. 6(c). The multi-prototypes representing different clusters are plotted with different markers. The piecewise linear separator consists of the hyperplanes shared by pairs of subregions that are induced by multi-prototypes representing different clusters. This piecewise linear separator is used to identify nonlinearly separable clusters, as shown in Fig. 6(d). Algorithm 3 summarizes the proposed graph-based multi-prototype competitive learning (GMPCL) method.

Algorithm 3  Graph-Based Multi-Prototype Competitive Learning (GMPCL)
Input: $\mathcal{X} = \{x_1, x_2, \ldots, x_n\}$, $\rho$, $t_{\max}$, $\{\eta_t\}$, $\epsilon$.
  Construct a graph $G_e = (V, A, e)$ and initialize a clustering $\{I_1, I_2, \ldots, I_c\}$.
  Initialize a multi-prototype set $\mathcal{W}$, set $t = 0$.
  repeat
    $\widehat{\mathcal{W}} = \mathcal{W}$, $t \leftarrow t + 1$. Randomly permute $\{x_1, x_2, \ldots, x_n\}$.
    for $i = 1, 2, \ldots, n$ do
      Select a winning multi-prototype $\omega_{\nu_i}^{\upsilon_i}$ by Eq. (37).
      Update $\mu_{\nu_i}^{\upsilon_i}$ with learning rate $\eta_t$ by Eq. (38).
    end for
    Compute $L = \| \mathcal{W} - \widehat{\mathcal{W}} \|^2 = \sum_{\omega_j^q \in \omega} \| \mu_j^q - \widehat{\mu}_j^q \|^2$.
  until $L \leqslant \epsilon$ or $t \geqslant t_{\max}$
Output: clusters $\{C_1, C_2, \ldots, C_c : x_i \in C_{\nu_i}, \text{ s.t. } \omega_{\nu_i}^{\upsilon_i} = \arg\min_{\omega_j^q \in \omega} \| x_i - \mu_j^q \|^2, \ \forall i = 1, 2, \ldots, n\}$.
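As a concrete illustration, the objective (36) and the winner selection and update rules (37) and (38) can be sketched in a few lines of NumPy. This is our own minimal sketch, not code from the paper; the function names (`mpcl_epoch`, `objective`) and the flat prototype array are simplifications of the paper's notation:

```python
import numpy as np

def mpcl_epoch(X, prototypes, eta, rng):
    """One pass of multi-prototype competitive learning over the data.

    X          : (n, d) data matrix
    prototypes : (m, d) all multi-prototypes stacked row-wise
    eta        : learning rate eta_t for this epoch
    """
    for i in rng.permutation(len(X)):
        # Winner selection rule, Eq. (37): the nearest multi-prototype wins
        d2 = np.sum((prototypes - X[i]) ** 2, axis=1)
        w = int(np.argmin(d2))
        # Winner update rule, Eq. (38): move the winner towards x_i
        prototypes[w] += eta * (X[i] - prototypes[w])
    return prototypes

def objective(X, prototypes):
    # Multi-prototype objective, Eq. (36): each point is charged the
    # squared distance to its winning (nearest) multi-prototype
    d2 = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()
```

Running `mpcl_epoch` repeatedly with a schedule such as $\eta_t = 0.5/t$ drives the objective down toward a local minimum, mirroring the stopping criterion of Algorithm 3.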

4.3 Extension of GMPCL and experiment evaluation High-dimensional clustering applications, such as video clustering, are characterized by a high computational load, which is mainly due to the redundant calculation

Fig. 6 GMPCL: the winner update and the clustering result on two moons. (a) and (b) show the procedure of updating the winning multi-prototype; comparing (a) and (b), both the winning multi-prototype and the corresponding lines in the Voronoi diagram move slightly. (c) shows the converged multi-prototypes and the corresponding Voronoi diagram, with the multi-prototypes representing different clusters plotted with different markers. (d) shows the final clusters.


of the distances between high-dimensional points in the update procedure of competitive learning. To overcome this problem, an approach similar to the kernel trick [17] is considered. First, an inner product matrix $M = [M_{i,j}]_{n \times n}$ of the dataset $\mathcal{X}$ is computed such that $M_{i,j} = \langle x_i, x_j \rangle$. Then the computation of $\| x_i - x_j \|^2$ is efficiently accomplished by $\| x_i - x_j \|^2 = M_{i,i} + M_{j,j} - 2M_{i,j}$. Thus, the redundant high-dimensional computation is avoided. Unfortunately, this trick cannot be directly applied in competitive learning due to the incremental update rule. Since the winning multi-prototype $w_{\nu_i}^{\upsilon_i}$ is updated by $w_{\nu_i}^{\upsilon_i} \leftarrow w_{\nu_i}^{\upsilon_i} + \eta_t (x_i - w_{\nu_i}^{\upsilon_i})$, it is unlikely that the updated $w_{\nu_i}^{\upsilon_i}$ satisfies $w_{\nu_i}^{\upsilon_i} \in \mathcal{X}$, so no pre-computed distance $\| x_i - w_{\nu_i}^{\upsilon_i} \|^2$ is available for calculating Eq. (37).

In KCL, a prototype descriptor $\boldsymbol{W}^{\phi}$ is designed to represent $c$ prototypes $\{\mu_1, \mu_2, \ldots, \mu_c\}$ in the kernel space induced by a mapping $\phi$. The prototype descriptor $\boldsymbol{W}^{\phi}$ is a $c \times (n+1)$ matrix, whose rows represent prototypes as the inner products between a prototype and the data points, as well as the squared length of the prototype. Similarly, we can develop a multi-prototype descriptor, which is a row-block matrix independent of the dimensionality, and extend GMPCL to deal with high-dimensional clustering. According to the initialization of multi-prototypes, the initial $\mathcal{W}$ satisfies $\mathcal{W} \subset \mathcal{X}$. The multi-prototype descriptor is defined as follows.

Definition 6 (Multi-prototype descriptor) A multi-prototype descriptor is a row-block matrix $\boldsymbol{W}$ of size $|\mathcal{W}| \times (n+1)$,

$$\boldsymbol{W} = \begin{pmatrix} \boldsymbol{W}_1 \\ \boldsymbol{W}_2 \\ \vdots \\ \boldsymbol{W}_c \end{pmatrix}, \tag{39}$$

such that the $j$th block $\boldsymbol{W}_j$ represents $C_j$, and the $q$th row of $\boldsymbol{W}_j$, i.e., $\boldsymbol{W}_{j,:}^q$, represents $w_j^q$ by

$$W_{j,i}^q = \langle w_j^q, x_i \rangle, \ i = 1, 2, \ldots, n, \qquad W_{j,n+1}^q = \langle w_j^q, w_j^q \rangle, \tag{40}$$

where $W_{j,i}^q$ denotes the $i$th column of $\boldsymbol{W}_{j,:}^q$. That is,

$$\boldsymbol{W} = \begin{pmatrix}
\langle w_1^1, x_1 \rangle & \langle w_1^1, x_2 \rangle & \cdots & \langle w_1^1, x_n \rangle & \langle w_1^1, w_1^1 \rangle \\
\langle w_1^2, x_1 \rangle & \langle w_1^2, x_2 \rangle & \cdots & \langle w_1^2, x_n \rangle & \langle w_1^2, w_1^2 \rangle \\
\vdots & \vdots & & \vdots & \vdots \\
\langle w_1^{p_1}, x_1 \rangle & \langle w_1^{p_1}, x_2 \rangle & \cdots & \langle w_1^{p_1}, x_n \rangle & \langle w_1^{p_1}, w_1^{p_1} \rangle \\
\vdots & \vdots & & \vdots & \vdots \\
\langle w_c^1, x_1 \rangle & \langle w_c^1, x_2 \rangle & \cdots & \langle w_c^1, x_n \rangle & \langle w_c^1, w_c^1 \rangle \\
\langle w_c^2, x_1 \rangle & \langle w_c^2, x_2 \rangle & \cdots & \langle w_c^2, x_n \rangle & \langle w_c^2, w_c^2 \rangle \\
\vdots & \vdots & & \vdots & \vdots \\
\langle w_c^{p_c}, x_1 \rangle & \langle w_c^{p_c}, x_2 \rangle & \cdots & \langle w_c^{p_c}, x_n \rangle & \langle w_c^{p_c}, w_c^{p_c} \rangle
\end{pmatrix}. \tag{41}$$

Using the $\omega$-notation, the $\omega_j^q$th row, i.e., $\boldsymbol{W}_{j,:}^q$, represents the $\omega_j^q$th multi-prototype, i.e., $w_j^q$. The initial multi-prototype descriptor $\boldsymbol{W}$ is obtained as a sub-matrix of $M$. In Algorithm 3, three key procedures of multi-prototype competitive learning involve the redundant computation of distances: the winning multi-prototype selection, the winner update, and the computation of the sum of prototype updates, i.e., $L$. Based on the multi-prototype descriptor, and similar to kernel competitive learning, we implement these procedures with a computational complexity that is independent of the dimensionality. This fast approach is termed fast graph-based multi-prototype competitive learning (FGMPCL) in our previous work [29].

To demonstrate the effectiveness of FGMPCL, we report experimental results for the video clustering task. Video clustering aims at clustering video frames according to different scenes. It plays an important role in automatic video summarization/abstraction as a preprocessing step [33]. Since our intention here is to show that FGMPCL provides an effective and efficient tool for video clustering, it is beyond the scope of this paper to use domain-specific cues [33]. The gray-scale values of raw pixels were used as the feature vector for each frame. One video sequence was used, which we termed "qiangqiangsrx20100603"; it was downloaded from v.ifeng.com1). This video has a duration of 101 seconds, consisting of 5402 frames of size 576 × 432. Different types of scenes are present in this video. Figure 7

Fig. 7 Clustering frames of the video "qiangqiangsrx20100603" into seven groups of scenes using FGMPCL.
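The inner-product trick described above can be made concrete with a short NumPy sketch (our illustration, not code from the paper): once the matrix $M$ is in hand, pairwise squared distances follow from $M$ alone, with no pass over the $d$ coordinates:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 1000))   # six high-dimensional points
M = X @ X.T                          # inner-product matrix, M[i,j] = <x_i, x_j>

def dist2(i, j):
    # ||x_i - x_j||^2 = M[i,i] + M[j,j] - 2 M[i,j]
    return M[i, i] + M[j, j] - 2 * M[i, j]

# agrees with the direct d-dimensional computation
assert np.allclose(dist2(2, 4), np.sum((X[2] - X[4]) ** 2))
```

Building $M$ costs one up-front pass over the data; every subsequent distance query is then $O(1)$ instead of $O(d)$.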

1) http://v.ifeng.com/society/201006/999a2cc5-d6b0-43f4-8bf9-b829fce16540.shtml


plots the clustering result obtained on this video by the proposed FGMPCL algorithm, with the parameter $k$ in the $k$-NN graph set to 100. From the figure, we can see that a good segmentation of this video has been obtained: the 5402 frames are correctly clustered into the seven different scenes present in the video, as listed in Fig. 7, with each scene illustrated by two representative frames.
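This excerpt does not spell out how FGMPCL maintains the descriptor rows under the winner update. One consistent reading, which we sketch below as our own reconstruction (the helper names `winner_row` and `update_row` are ours), rewrites Eq. (38) as $w \leftarrow (1-\eta)w + \eta x_i$ and keeps every row of $\boldsymbol{W}$ exact using only entries of $M$:

```python
import numpy as np

def winner_row(W, M, i):
    # Eq. (37) via Definition 6: ||x_i - w||^2 = M[i,i] + <w,w> - 2<w,x_i>,
    # where W[r, :n] holds <w, x_k> and W[r, n] holds <w, w>.
    d2 = M[i, i] + W[:, -1] - 2 * W[:, i]
    return int(np.argmin(d2))

def update_row(W, M, r, i, eta):
    # Eq. (38) rewritten as w' = (1 - eta) w + eta x_i implies
    #   <w', x_k> = (1 - eta) <w, x_k> + eta M[i, k]
    #   <w', w'>  = (1-eta)^2 <w,w> + 2 eta (1-eta) <w, x_i> + eta^2 M[i,i]
    old_sq, old_dot = W[r, -1], W[r, i]
    W[r, :-1] = (1 - eta) * W[r, :-1] + eta * M[i, :]
    W[r, -1] = ((1 - eta) ** 2 * old_sq
                + 2 * eta * (1 - eta) * old_dot
                + eta ** 2 * M[i, i])
```

Because the initial multi-prototypes are drawn from $\mathcal{X}$ itself ($\mathcal{W} \subset \mathcal{X}$), the initial descriptor is simply the corresponding rows of $M$ with the matching diagonal entries appended, and each update touches one row in $O(n)$ time, independent of the dimensionality $d$.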


5 Conclusions

Competitive learning clustering is a hot research topic in the field of data clustering. This paper has presented two works by our group that address the nonlinearly separable problem suffered by the classical competitive learning clustering algorithms: kernel competitive learning (KCL) and graph-based multi-prototype competitive learning (GMPCL). The former is derived from the viewpoint of kernel clustering, and the latter is based on the graph method. In KCL, a novel prototype representation termed the prototype descriptor is proposed, which represents prototypes without the explicit kernel mapping. In GMPCL, the multi-prototype representation is introduced to characterize clusters of irregular shapes. Some extensions of these two approaches have been presented, and experimental results have been reported to demonstrate their effectiveness.

Acknowledgements This project was supported by the National Natural Science Foundation of China (NSFC) (Grant No. 61173084) and the NSFC-GuangDong (U0835005).

References

1. Rumelhart D E, Zipser D. Feature discovery by competitive learning. Cognitive Science, 1985, 9(1): 75–112
2. Xu L. A unified perspective and new results on RHT computing, mixture based learning, and multi-learner based problem solving. Pattern Recognition, 2007, 40(8): 2129–2153
3. Xu L, Krzyzak A, Oja E. Rival penalized competitive learning for clustering analysis, RBF net, and curve detection. IEEE Transactions on Neural Networks, 1993, 4(4): 636–649
4. Wang C D, Lai J H. Energy based competitive learning. Neurocomputing, 2011, 74(12–13): 2265–2275
5. Hwang W-J, Ye B-Y, Liao S-C. A novel entropy-constrained competitive learning algorithm for vector quantization. Neurocomputing, 1999, 25(1–3): 133–147
6. Xu L. RBF nets, mixture experts, and Bayesian Ying-Yang learning. Neurocomputing, 1998, 19(1–3): 223–257
7. Liu Z-Y, Chiu K-C, Xu L. Strip line detection and thinning by RPCL-based local PCA. Pattern Recognition Letters, 2003, 24(14): 2335–2344
8. Liu Z-Y, Qiao H, Xu L. Multisets mixture learning-based ellipse detection. Pattern Recognition, 2006, 39(4): 731–735
9. Cheung Y-M, Xu L. Rival penalized competitive learning based approach for discrete-valued source separation. International Journal of Neural Systems, 2000, 10(6): 483–490
10. Cheung Y-M, Xu L. An RPCL-based approach for Markov model identification with unknown state number. IEEE Signal Processing Letters, 2000, 7(10): 284–287
11. Liu Z-Y, Xu L. Topological local principal component analysis. Neurocomputing, 2003, 55(3–4): 739–745
12. Chen R-M, Huang Y-M. Competitive neural network to solve scheduling problems. Neurocomputing, 2001, 37(1–4): 177–196
13. Shawe-Taylor J, Cristianini N. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004
14. Mizutani K, Miyamoto S. Kernel-based fuzzy competitive learning clustering. In: Proceedings of the IEEE International Conference on Fuzzy Systems. 2005, 636–639
15. Bacciu D, Starita A. Expansive competitive learning for kernel vector quantization. Pattern Recognition Letters, 2009, 30(6): 641–651
16. Inokuchi R, Miyamoto S. Kernel methods for clustering: Competitive learning and c-means. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2006, 14(4): 481–493
17. Schölkopf B, Smola A, Müller K-R. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 1998, 10(5): 1299–1319
18. Tan P-N, Steinbach M, Kumar V. Introduction to Data Mining. Pearson Addison Wesley, 2006
19. Ertöz L, Steinbach M, Kumar V. Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In: Proceedings of the SIAM International Conference on Data Mining. 2003, 47–58
20. Shi J, Malik J. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(8): 888–905
21. Bishop C M. Pattern Recognition and Machine Learning. Springer, 2006
22. DeSieno D. Adding a conscience to competitive learning. In: Proceedings of the IEEE International Conference on Neural Networks. 1988, 117–124
23. Wang C D, Lai J H, Zhu J Y. A conscience on-line learning approach for kernel-based clustering. In: Proceedings of the 10th International Conference on Data Mining. 2010, 531–540
24. Wang C D, Lai J H, Zhu J Y. Conscience online learning: An efficient approach for robust kernel-based clustering. Knowledge and Information Systems (in press), DOI: 10.1007/s10115-011-0416-2
25. Hull J J. A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1994, 16(5): 550–554
26. LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998, 86(11): 2278–2324
27. Asuncion A, Newman D. UCI Machine Learning Repository. 2007, http://www.ics.uci.edu/~mlearn/MLRepository.html
28. Tzortzis G F, Likas A C. The global kernel k-means algorithm for clustering in feature space. IEEE Transactions on Neural Networks, 2009, 20(7): 1181–1194
29. Wang C D, Lai J H, Zhu J Y. Graph-based multiprototype competitive learning and its applications. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews (in press)
30. Ester M, Kriegel H-P, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining. 1996, 226–231
31. Liu M, Jiang X, Kot A C. A multi-prototype clustering algorithm. Pattern Recognition, 2009, 42(5): 689–698
32. Frey B J, Dueck D. Clustering by passing messages between data points. Science, 2007, 315(5814): 972–976
33. Truong B T, Venkatesh S. Video abstraction: A systematic review and classification. ACM Transactions on Multimedia Computing, Communications, and Applications, 2007, 3(1): 1–37

Jianhuang LAI received his M.Sc. degree in applied mathematics in 1989 and his Ph.D. degree in mathematics in 1999 from Sun Yat-sen University, Guangzhou, China. He joined Sun Yat-sen University in 1989 as an Assistant Professor, where currently he is a Professor with the Department of Automation of the School of Information Science and Technology and Vice Dean of the School of Information Science and Technology. His current research interests are in the areas of digital image processing, pattern recognition, multimedia communication, and wavelets and their applications. He has published over 80 scientific papers in international journals and conferences on image processing and pattern recognition, e.g., IEEE TNN, IEEE TIP, IEEE TSMC (Part B), Pattern Recognition, ICCV, CVPR, and ICDM. Prof. Lai is chair of the Image and Graphics Association of Guangdong Province and also serves as a standing member of the Image and Graphics Association of China.

Changdong WANG received the B.S. degree in applied mathematics in 2008 and the M.Sc. degree in computer science in 2010 from Sun Yat-sen University, Guangzhou, China. He started the pursuit of the Ph.D. degree with Sun Yat-sen University in September 2010. His current research interests include machine learning and data mining, especially data clustering and its applications. He has published over 10 scientific papers in international journals and conferences such as Neurocomputing, Knowledge and Information Systems, IEEE TSMC-C, IEEE TKDE, and IEEE ICDM. His ICDM 2010 paper won the Honorable Mention for Best Research Paper Awards. He won the Student Travel Award from ICDM 2010 and ICDM 2011, respectively.
