Pattern Recognition 39 (2006) 2010 – 2024 www.elsevier.com/locate/patcog

On relational possibilistic clustering

Miquel De Cáceres (a,b,*), Francesc Oliva (b), Xavier Font (a)

(a) Departament de Biologia Vegetal, Universitat de Barcelona, Avda. Diagonal 645, 08028 Barcelona, Spain
(b) Departament d'Estadística, Universitat de Barcelona, Avda. Diagonal 645, 08028 Barcelona, Spain

Received 16 March 2006; accepted 6 April 2006

Abstract

This paper initially describes the relational counterpart of the possibilistic c-means (PCM) algorithm, called relational PCM (or RPCM). RPCM is then improved to better handle arbitrary dissimilarity data. First, a re-scaling of the PCM membership function is proposed in order to obtain zero membership values when the distance to prototype equals the maximum value allowed in bounded dissimilarity measures. Second, a heuristic method of reference distance initialisation is provided which diminishes the known PCM tendency of producing coincident clusters. Finally, RPCM improved with our initialisation strategy is tested on both synthetic and real data sets with satisfactory results.
© 2006 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.

Keywords: Cluster analysis; Possibilistic c-means; Relational data; Dissimilarity measures

1. Introduction

Clustering is a popular approach to study the structure of multivariate data. Clustering methods have been successfully applied in many scientific domains, including artificial intelligence, psychometrics, ecology, economics and bioinformatics. Generally speaking, the goal of clustering is to derive c "natural" classes or clusters from a set of n unlabelled objects, O = {o_1, o_2, ..., o_n}. The objects inside a "natural" cluster show a certain degree of closeness or similarity, and the cluster itself shows a certain isolation from external objects.

1.1. Prototype-based clustering

One of the fundamental distinctions between clustering algorithms rests with the type of data available. Numerical data is often in the form of a set of n vectors in the feature space R^p of p features or variables. That is, the

* Corresponding author. Tel.: +34 934035896; fax: +34 934112842.

E-mail address: [email protected] (M. De Cáceres).

matrix to analyse is usually a rectangular (p × n) data matrix X = [x_1 x_2 ... x_n], where x_j ∈ R^p is a feature vector representing object o_j (j = 1, ..., n), and x_kj ∈ R (k = 1, ..., p) is the kth measured feature of o_j. Prototype-based clustering algorithms assume that the properties of member objects can be averaged into a cluster prototype, normally represented as a point. Two prototype clustering algorithms widely used in many unsupervised pattern recognition applications are hard c-means (HCM) [1,2] and fuzzy c-means (FCM) [3]. HCM and FCM are partitioning algorithms, meaning that cluster memberships, either crisp or fuzzy, are constrained to sum to one for each object. FCM fuzzy memberships can be interpreted as degrees of sharing or relative memberships but not as degrees of true typicality (i.e. the compatibility between the object and the class prototype). The most serious disadvantage of using relative memberships is that the performance can be inadequate when the data set is contaminated by noise and/or outliers. This problem has motivated the development of more robust clustering methods [4–9]. Among them, we will focus here on possibilistic c-means (PCM), introduced by Krishnapuram and Keller [7]. Their proposal consisted of modifying FCM to cast the clustering problem into the framework of possibility theory,



where fuzzy membership values are absolute and can be interpreted in terms of true cluster typicalities. The resulting PCM is a mode-seeking algorithm, i.e., each cluster corresponds to a dense region. In fact, each cluster is independent of the others, so a single PCM run can be regarded as c independent runs of a robust mode-seeking algorithm, each looking for a single cluster [9]. Barni et al. [10] criticised the tendency of PCM to produce coincident clusters and its dependence on initial conditions. As a matter of fact, PCM clusters have a lot of mobility because cluster repulsion was eliminated from the original FCM objective function. Some authors have proposed modifying the PCM objective function to include cluster repulsion terms [11,12], but from our point of view these modifications make PCM less robust as a mode-seeking algorithm.

1.2. Relational data clustering

Another form of data that may be used in clustering is relational. Let R = [r_hl] denote a symmetric (n × n) relational or resemblance matrix, where r_hl measures the strength of the relationship between objects o_h and o_l. R may be either a similarity (S) or a dissimilarity (D) matrix. A dissimilarity (or distance) matrix D = [d_hl] satisfies the following well-known conditions:

d_hh = 0 for all h = 1, ..., n;   (1a)
d_hl ≥ 0 for all h = 1, ..., n and l = 1, ..., n; and   (1b)
d_hl = d_lh for all h = 1, ..., n and l = 1, ..., n.   (1c)

For the rest of this paper, we will deal only with dissimilarity matrices. If the available data is a matrix of similarities S = [s_hl], the corresponding dissimilarity matrix can be obtained by applying a transformation to the similarity values, such as Gower's [13] d_hl = √(s_hh + s_ll − 2·s_hl), among others. Relational clustering algorithms are frequently graph-theoretic in nature, owing to the fact that R can be viewed as the adjacency matrix of a weighted digraph on the n objects (nodes) in O. However, prototype clustering algorithms such as HCM, FCM and PCM allow the definition of algorithm duals based on dissimilarity matrices. The relational duality for HCM has been known for a long time [14,15], and an analogous duality can be found in the context of discriminant analysis [16,17]. Hathaway et al. [18] presented relational dual algorithms for HCM and FCM, which were called relational hard c-means (RHCM) and relational fuzzy c-means (RFCM). Afterwards, Hathaway and Bezdek [19] introduced a modification of RFCM in order to make it suitable for non-Euclidean dissimilarity data. Later on, Hathaway et al. [20] reviewed relational versions of c-means algorithms and outlined that it was straightforward to define an analogous relational dual for the PCM algorithm. As far as we know, to date no other contributions have been made
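As a minimal illustration (not part of the original paper) of how such dissimilarity matrices can be obtained in practice, the following Python/NumPy sketch implements Gower's transformation of a similarity matrix and, for later use in Section 6.2, the complementary of the Pearson correlation coefficient; the function names are our own choices.

```python
import numpy as np

def gower_dissimilarity(S):
    """Gower's transformation d_hl = sqrt(s_hh + s_ll - 2*s_hl)
    applied to a symmetric similarity matrix S."""
    diag = np.diag(S)
    D2 = diag[:, None] + diag[None, :] - 2.0 * S
    return np.sqrt(np.maximum(D2, 0.0))   # clip tiny negatives caused by rounding

def correlation_dissimilarity(X):
    """Complementary of the Pearson correlation between the rows of X
    (e.g. gene expression profiles), as used later in Section 6.2."""
    return 1.0 - np.corrcoef(X)
```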


addressing the issue of the PCM algorithm with relational data.

In the next section we review the possibilistic approach and the usual object data PCM algorithm. Then, in Section 3, we state the relational duality and describe the relational PCM (RPCM) algorithm, mainly adapting the work done in [18,19]. After that, the case of non-Euclidean dissimilarity matrices is briefly discussed. In the following sections we focus our attention on solving two problems that arise when running PCM. Section 4 tackles the problem of PCM clustering on bounded dissimilarity matrices; when a given dissimilarity measure has an upper distance bound, the PCM membership function gives overestimated memberships for high distances to the cluster prototype, because it only approaches zero asymptotically. Therefore, we propose to re-scale the PCM membership function in order to make it attain zero values for the distance upper bound. On the other hand, one of the main disadvantages of PCM is that it needs a good initialisation of the reference distance parameter to provide accurate clustering results [10,21]. This holds true for both object data and relational data PCM. Therefore, we dedicate Section 5 to providing a new reference distance initialisation strategy for PCM. Specifically, our second contribution provides suitable reference distance values by studying the dependence of cluster variability on this parameter. Finally, in Section 6 we provide two simulated examples and one real data example of application in order to illustrate the proposed modifications.

2. Object-data PCM

We begin with a description of the notation used to represent the results of a possibilistic clustering. U = [u_ij] is a (c × n) fuzzy possibilistic membership matrix if it satisfies the following three conditions:

0 ≤ u_ij ≤ 1   for i = 1, ..., c and j = 1, ..., n;   (2a)
Σ_{j=1}^{n} u_ij > 0   for i = 1, ..., c; and   (2b)
Σ_{i=1}^{c} u_ij > 0   for j = 1, ..., n.   (2c)

This possibilistic membership matrix defines c distinct (uncoupled) possibility distributions. Strictly speaking, U is not a fuzzy "partition" matrix since object cluster memberships are not constrained to sum to one (like those of FCM). In other words, a possibilistic matrix U does not satisfy the partition constraint

Σ_{i=1}^{c} u_ij = 1   for j = 1, ..., n.   (3)

Given the possibilistic constraints (2), it is important to note that many functions could be used as fuzzy possibilistic



membership functions, provided they were continuous, monotonically decreasing with increasing distances and ranged in the interval [0, 1]. The standard possibilistic c-means (PCM) algorithm chooses the following membership function:

u_ij = u(e_ij²) = 1 / (1 + (e_ij²/η_i)^(1/(m−1))),   (4)

where e_ij² is the squared distance from feature vector x_j to the ith cluster prototype, m ≥ 1 is the "fuzzification" parameter, and η_i is a suitable positive number called the reference distance. We will use here the letter e to denote object to cluster prototype distances to avoid confusion with inter-object distances, which will be noted with the usual d. As can easily be shown, Eq. (4) minimises the following PCM objective function [7]:

J_PCM(L, U; X) = Σ_{i=1}^{c} Σ_{j=1}^{n} (u_ij)^m e_ij² + Σ_{i=1}^{c} η_i Σ_{j=1}^{n} (1 − u_ij)^m,   (5)

where L = (λ_1, λ_2, ..., λ_c) is a c-tuple of cluster prototypes. As stated before, PCM can be viewed as c independent runs of a one-cluster possibilistic algorithm. Indeed, the PCM objective function is a sum of c independent one-cluster objective functions, each one being [21]:

J_i(λ_i, u_i; X) = Σ_{j=1}^{n} (u_ij)^m e_ij² + η_i Σ_{j=1}^{n} (1 − u_ij)^m.   (6)

It is important to realise that the possibilistic membership value of object j in cluster i depends only on the distance from the feature vector x_j to the cluster prototype λ_i and not on the distances from x_j to other cluster prototypes, as is the case in FCM. This fact ensures cluster independence. In contrast, under the possibilistic approach the distance to prototype must be sized with a reference distance (η_i). This is reflected in the second term of the PCM objective function. Note that after having discarded the partition restriction (3), the PCM algorithm would not be able to avoid the trivial solution without the inclusion of this second term in its objective function.

Interpretation of the parameter m is also different in FCM and PCM. In the former, increasing values of m represent increased sharing of points among all clusters, whereas in the latter increasing values of m represent increased possibility of all points completely belonging to a given cluster. Therefore, the m value that gives satisfactory results can be different in the two clustering algorithms. While a usual value for FCM is m = 2.0, Krishnapuram and Keller [21] state that, in the case of Gaussian clusters, an appropriate choice for PCM would be m = 1.5.

In the original definition [7], the PCM algorithm deals only with object–feature data matrices. To stress this object–data orientation, from now on we will denote the original PCM algorithm as OPCM. Cluster prototype vectors are obtained in OPCM through

λ_i = (Σ_{j=1}^{n} u_ij^m · x_j) / (Σ_{j=1}^{n} u_ij^m),   (7)

and then the squared distances to prototypes e_ij² are given by the usual inner product norm

e_ij² = ‖λ_i − x_j‖² = (λ_i − x_j)′ A_i (λ_i − x_j),   (8)

where A_i is a norm-inducing matrix. If A_i = I, data clusters are assumed to be spherical. In a more general context, hyper-ellipsoidal cluster forms can be adequately detected by using the inverse of the cluster covariance matrix, that is A_i = |S_fi|^(1/p) · S_fi^(−1), where

S_fi = (Σ_{j=1}^{n} u_ij^m · (x_j − λ_i)(x_j − λ_i)′) / (Σ_{j=1}^{n} u_ij^m)

is the fuzzy covariance matrix of cluster i. This last approach gives the possibilistic Gustafson–Kessel (PGK) algorithm [7,22].

The OPCM algorithm alternates the calculation of cluster prototypes and object to prototype distances with the update of the possibilistic membership matrix U. As a result, cluster prototypes are successively attracted to dense regions in feature space. The OPCM steps are as follows:

OPCM algorithm
OPCM-1: Given object data matrix X = [x_1 x_2 ... x_n], where x_j ∈ R^p, choose an inner product norm on R^p. Fix m, 1 < m < ∞, and η_i. Initialise the possibilistic membership matrix U^(0). Then for r = 0, 1, 2, ...
OPCM-2: Calculate new cluster prototypes and squared distances to prototype using Eqs. (7) and (8) for all 1 ≤ j ≤ n and 1 ≤ i ≤ c.
OPCM-3: Update the possibilistic membership matrix U^(r) to U^(r+1) using (4) for all 1 ≤ i ≤ c and 1 ≤ j ≤ n.
OPCM-4: Check for convergence using any convenient matrix norm ‖·‖: if ‖U^(r+1) − U^(r)‖ ≤ ε then stop. Otherwise set r = r + 1 and return to OPCM-2.
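The OPCM iteration can be summarised in a few lines of code. The following Python/NumPy sketch assumes the spherical case A_i = I and fixed reference distances; the function name, array conventions and defaults are illustrative choices of ours, not the authors' implementation.

```python
import numpy as np

def opcm(X, U0, eta, m=1.5, tol=1e-5, max_iter=100):
    """Object-data possibilistic c-means (OPCM) sketch for the spherical case A_i = I.
    X: (n, p) object data, U0: (c, n) initial memberships, eta: (c,) reference distances."""
    U = U0.copy()
    for _ in range(max_iter):
        W = U ** m                                    # (c, n) membership weights
        V = (W @ X) / W.sum(axis=1, keepdims=True)    # Eq. (7): cluster prototypes, (c, p)
        # Eq. (8) with A_i = I: squared Euclidean distances to prototypes, (c, n)
        E2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        # Eq. (4): possibilistic membership update
        U_new = 1.0 / (1.0 + (E2 / eta[:, None]) ** (1.0 / (m - 1.0)))
        if np.abs(U_new - U).max() < tol:             # OPCM-4: convergence test
            return U_new, V
        U = U_new
    return U, V
```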

3. Relational data PCM

3.1. Relational possibilistic c-means (RPCM)

A relational version of OPCM is possible if we calculate the distances in step OPCM-2 directly from relational data. This was shown to be possible for HCM and FCM by Hathaway et al. [18]. Later on, Hathaway et al. [20] outlined that the same duality could be applied in the case of PCM. The results in [18] are repeated here in the following theorem, specifically adapted for OPCM.


Theorem 1. Let X = [x_1 x_2 ... x_n] be a given data matrix of object-feature data (x_j ∈ R^p) and ‖·‖ denote a given inner product norm on R^p. Define the corresponding matrix of squared distances R = [r_hl] = [d_hl²], where d_hl² = ‖x_h − x_l‖² for 1 ≤ h, l ≤ n. Then the distance values calculated in OPCM-2 equal the distance values calculated via

v_i = (u_i1^m, ..., u_in^m)′ / Σ_{j=1}^{n} u_ij^m   for i = 1, ..., c,   (9a)

e_ij² = (R v_i)_j − (1/2) (v_i)′ R v_i   for i = 1, ..., c and j = 1, ..., n.   (9b)

Furthermore, the sequence of partitions produced by OPCM using OPCM-2 is identical to the sequence produced by the same algorithm replacing (7) and (8) with (9) and using the relational data R instead of object data X.

The proof of the above theorem can be found in Ref. [18] for HCM and FCM, and it is straightforwardly valid for PCM. On the other hand, it is easy to see that Eqs. (9a) and (9b) skip the computation of cluster prototype coordinates and thus avoid the need of an object-feature data matrix X. The same equations can be used for the HCM, FCM and PCM algorithms, the u values being crisp memberships in the first case, relative memberships in the second and absolute memberships (i.e. cluster typicalities) in the third. Without loss of generality, Eqs. (9a) and (9b) can be summarised in a single simpler equation, avoiding the definition of vector v_i:

e_ij² = (Σ_{h=1}^{n} u_ih^m d_jh²) / (Σ_{h=1}^{n} u_ih^m) − (1/2) · (Σ_{h,l=1}^{n} u_ih^m u_il^m d_hl²) / (Σ_{h=1}^{n} u_ih^m)².   (10)

Replacing Eqs. (7) and (8) with (10) in OPCM-2 gives the relational possibilistic c-means (RPCM) algorithm. Its objective function is, despite the new meaning of e_ij², equal to (5).

Relational possibilistic c-means (RPCM) algorithm
RPCM-1: Given a relational dissimilarity matrix D = [d_hl], where d_hl represents the distance between the (unknown) feature vectors x_h and x_l of objects o_h and o_l. Fix m, 1 < m < ∞, and η_i. Initialise the possibilistic membership matrix U^(0). Then for r = 0, 1, 2, ...
RPCM-2: Calculate the squared distances to the cluster prototypes using (10) for all 1 ≤ i ≤ c and 1 ≤ j ≤ n.
RPCM-3: Update the possibilistic membership matrix U^(r) to U^(r+1) using (4) for all 1 ≤ i ≤ c and 1 ≤ j ≤ n.
RPCM-4: Check for convergence using any convenient matrix norm ‖·‖: if ‖U^(r+1) − U^(r)‖ ≤ ε then stop. Otherwise set r = r + 1 and return to RPCM-2.
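Under the assumption that D is Euclidean (so that the squared distances of Eq. (10) stay non-negative), a minimal Python/NumPy sketch of RPCM-2 and RPCM-3 could look as follows; again, the names are our own and not the authors' implementation.

```python
import numpy as np

def rpcm_distances(D2, U, m):
    """Eq. (10): squared object-to-prototype distances from the squared
    inter-object dissimilarities D2 (n x n) and memberships U (c x n)."""
    W = U ** m
    s = W.sum(axis=1)                                   # (c,)
    term1 = (W @ D2) / s[:, None]                       # sum_h u_ih^m d_jh^2 / sum_h u_ih^m
    term2 = np.einsum('ih,il,hl->i', W, W, D2) / (2.0 * s ** 2)
    return term1 - term2[:, None]                       # (c, n)

def rpcm(D, U0, eta, m=1.5, tol=1e-5, max_iter=100):
    """Relational possibilistic c-means (RPCM) sketch for Euclidean D."""
    D2, U = D ** 2, U0.copy()
    for _ in range(max_iter):
        E2 = rpcm_distances(D2, U, m)                   # RPCM-2
        U_new = 1.0 / (1.0 + (E2 / eta[:, None]) ** (1.0 / (m - 1.0)))  # RPCM-3, Eq. (4)
        if np.abs(U_new - U).max() < tol:               # RPCM-4
            return U_new
        U = U_new
    return U
```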


3.2. Relational duality and non-Euclidean dissimilarity matrices

As Hathaway et al. [18] state, the relational duality holds not only for inner product norms, but for any Euclidean relational data matrix. We mean by Euclidean a dissimilarity matrix whose squared distances can be fully represented in a Euclidean space; that is, there exists some X = [x_1 x_2 ... x_n], where x_j ∈ R^p, whose inter-object distances measured with the Euclidean norm yield the same original dissimilarity values. Classical multidimensional scaling (MDS) [13] is an ordination technique that performs exactly this, i.e. it obtains a Euclidean representation X in R^p from an arbitrary dissimilarity matrix D, where the inter-object distances in the full space equal those of matrix D. Thus, the relational approach for prototype clustering methods is equivalent to running classical MDS on the dissimilarity matrix D and using the resulting X matrix as the input for the object data PCM version.

There is no guarantee that an arbitrary dissimilarity matrix satisfying (1) will be Euclidean. If it is not, classical MDS will yield some negative eigenvalues [13] and X will have some corresponding imaginary axes. Under such a situation, the original dissimilarity matrix D can be recovered too, because inter-object distances can be obtained by subtracting the distances in the imaginary axes from those in the real ones. That is, let {a_1, ..., a_n} be the set of real components, and {b_1, ..., b_n} the corresponding set of imaginary components. Then the inter-object squared distances can be calculated via

d_hl² = ‖a_h − a_l‖² + ‖i·b_h − i·b_l‖² = ‖a_h − a_l‖² + (i²)·‖b_h − b_l‖² = ‖a_h − a_l‖² − ‖b_h − b_l‖²   (11)

for 1 ≤ h, l ≤ n. As the original distance values in D are not negative, the squared distance values obtained by (11) will not be negative either. Nevertheless, if new points had to be located in this ordination space, some negative "proximities" could appear, breaking the non-negativity dissimilarity condition (1b).

What is the corresponding effect of non-Euclidean dissimilarity matrices on relational dual approaches? Following the comparison with classical MDS, cluster prototypes are new points to be located in the MDS space. Therefore, if D is not Euclidean there can appear negative "proximities" to cluster prototypes. When calculating PCM membership values these negative "proximities" are translated into membership values slightly above 1, which in turn breaks the possibilistic membership condition (2a). In the context of the HCM and FCM relational duals, Hathaway and Bezdek [19] propose a correction for non-Euclidean D matrices. It consists in applying a "spreading" transformation to all distance to prototype values whenever a negative value is encountered in (10). Their transformation can be interpreted as effecting



a geometric spreading or dilation of the corresponding object data. Of course, the same correction could be applied to RPCM. That is, we could replace RPCM-2 by

RPCM-2′: Calculate the squared distances to cluster prototypes e_ij² using (10) for all 1 ≤ i ≤ c and 1 ≤ j ≤ n. If e_ij² < 0 for any i and j, then calculate

Δβ = max{−2·e_ij² / ‖v_i − a_j‖²}   (12a)

and change

e_ij² ← e_ij² + (Δβ/2) · ‖v_i − a_j‖²   for all 1 ≤ i ≤ c and 1 ≤ j ≤ n.   (12b)

In the above, a_j denotes the jth unit vector in R^n. A complete discussion regarding the theory and implementation of the transformation (12) is given in Ref. [19]. The geometric spreading introduced in (12), while being small, induces an increase in data fuzziness, because small distances are proportionally more affected by the transform than large distances. From our point of view, this can be a drawback in noisy data sets. For this reason, we prefer not to use that correction for non-Euclidean dissimilarity matrices. A modification, which we estimate is less prone to increase fuzziness, would be simply to replace RPCM-2 by

RPCM-2″: Calculate the squared distances to cluster prototypes using (10) for all 1 ≤ i ≤ c and 1 ≤ j ≤ n. If e_ij² < 0 for any i and j, then set e_ij² = 0.

As PCM clusters are independent, setting e_ij² to zero (or any other value) for one cluster does not have any influence on the other clusters. Therefore, putting negative "dissimilarities" to zero seems an easy and valid strategy to avoid tackling them. In addition, maybe the strongest argument against the relational approach on prototype clustering methods is not the presence of negative distances to prototype, but that it assumes that under all dissimilarity spaces, including non-Euclidean ones, cluster prototypes are adequately represented by the weighted average of patterns of Eq. (7).
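In code, the simpler correction RPCM-2″ amounts to clamping the output of Eq. (10) before the membership update; applied to the illustrative sketch of Section 3.1 (our code, not the authors'), it is a one-line change:

```python
# RPCM-2'': set negative "squared distances" produced by a non-Euclidean D
# to zero before applying the membership update of Eq. (4).
E2 = np.maximum(rpcm_distances(D2, U, m), 0.0)
```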

4. RPCM and bounded dissimilarity measures

We will call bounded dissimilarities those dissimilarity measures with an upper bound (i.e. a maximum value), regardless of their metric or Euclidean properties. Let d_b be the upper dissimilarity bound of one such measure. In the case of dissimilarity matrices obtained using a bounded dissimilarity, it is easy to see that a problem arises in RPCM, concerning the possibilistic membership function (4). Whereas one would expect a zero membership value for distances to centroid equal to d_b, the possibilistic membership function yields null values for large distances only asymptotically. As a result, the calculated membership value for the upper distance bound, u(d_b²), is always non-zero. In Fig. 1a we plotted u_ij against e_ij² values. The magnitude of the departure from zero depends on the fuzziness value (i.e. the higher the fuzziness level, the larger u(d_b²) will be) and on the reference distance used (i.e. the closer η_i gets to d_b², the larger u(d_b²) will be). As bounded dissimilarities will often appear when transforming similarity values into dissimilarities, these "overestimated" membership values could be potentially misleading in several similarity-based clustering applications.

Fig. 1. (a) Possibilistic membership function u for different values of m and constant η_i. (b) Re-scaled possibilistic membership function u* for different values of m and constant η_i.

In order to solve this problem, we propose simply to correct function (4) by re-scaling it from the interval [u(d_b²), 1] to the interval [0, 1]. That is, we define a new possibilistic membership function, noted u*, as

u*_ij = (u_ij − u(d_b²)) / (1 − u(d_b²)) = u_ij · a_ij,   (13)

where the right product member a_ij = 1 − (e_ij²/d_b²)^(1/(m−1)) is the membership function correcting factor. The closer e_ij² gets to d_b², the closer to zero the factor a_ij will be, and so will be u*. The effect of the proposed correction on possibilistic memberships can be checked by comparing Fig. 1a with Fig. 1b, where the new possibilistic membership function u* is plotted against e_ij². As a collateral effect, the correction for bounded distances induces a bias on the squared distance value that yields a membership of 0.5, which we will refer to as h_i. In the original possibilistic membership function h_i was always equal to η_i, and hence η_i could be interpreted as h_i. However, in the modified membership function, h_i is always equal to or smaller than η_i. The following expression gives h_i for any η_i, m and d_b values:

h_i = η_i · (1 + 2 · (η_i/d_b²)^(1/(m−1)))^(1−m).   (14)

The larger m is, the smaller h_i will be with respect to η_i (see Fig. 1b). Notwithstanding these last considerations, one does not normally want to specify a concrete value for h_i or η_i. Rather, what is normally needed is a good estimate of the parameter η_i in terms of the specific cluster size, regardless of the resulting value of h_i. This is what we try to solve in the next section. It is important to say that one can obtain a new algorithmic variant of RPCM by replacing Eq. (4) with (13) in step RPCM-3. However, the effect of the correction on the algorithm's functional and result is small. That is, as the "overestimated" membership values only occur for objects far away from the cluster centroid, there are no big differences in the clustering solution. Therefore, while we believe that membership assignments had to be corrected, we do not consider a new algorithm definition to be necessary.
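A minimal sketch of the re-scaled membership of Eq. (13) and of the half-membership distance of Eq. (14), under the same NumPy conventions as the earlier sketches (the squared bound d_b² is passed explicitly as db2; the function names are ours):

```python
import numpy as np

def rescaled_membership(E2, eta, m, db2):
    """Eq. (13): u* = u * (1 - (e^2/d_b^2)^(1/(m-1))), so that memberships
    vanish when the squared distance reaches the bound d_b^2."""
    u = 1.0 / (1.0 + (E2 / eta[:, None]) ** (1.0 / (m - 1.0)))
    a = 1.0 - (E2 / db2) ** (1.0 / (m - 1.0))
    return u * np.maximum(a, 0.0)          # clip defensively if E2 exceeds db2

def half_membership_distance(eta, m, db2):
    """Eq. (14): squared distance h_i at which the re-scaled membership equals 0.5."""
    return eta * (1.0 + 2.0 * (eta / db2) ** (1.0 / (m - 1.0))) ** (1.0 - m)
```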

5. Estimating the reference distance in possibilistic clustering

5.1. The reference distance parameter

The reference distance, η_i, can be interpreted as a "scale" or "bandwidth" parameter. In general, it is desirable that η_i relates to the overall size of cluster i. This parameter also determines the zone of influence of a cluster. Indeed, the larger η_i is for a cluster, the more mobility it has, since it can "see" more points. Therefore, overestimating η_i for a "small" cluster located beside a "larger" one increases the probability of missing the small one. In addition, recall that mobility can be further increased when using large values of m. The high mobility resulting from an inadequate initialisation of η_i (and m) is the reason why Barni et al. [10] reported a loss of cluster structure when running OPCM, even with the correct partition as the initial starting point. Thus, it is obvious that a good estimation of this parameter is crucial for the success of both the OPCM algorithm and its relational dual RPCM. When the nature of the clusters is known, values for η_i may be fixed a priori. On the other hand, when running the PGK algorithm [7,22], the use of the Mahalanobis norm can simplify the choice of η_i. Unfortunately, this is not generally the case in relational clustering problems. Krishnapuram and


Keller [7] provide two ways of initialising η_i. The first one is a K multiple of the cluster geometric variance (V_i):

η_i = K · (Σ_{j=1}^{n} u_ij^m · e_ij²) / (Σ_{j=1}^{n} u_ij^m) = K · V_i   (15)

and the second way is

η_i = (Σ_{x_j ∈ (u_i)_α} e_ij²) / |(u_i)_α|,   (16)

where (u_i)_α is an appropriate α-cut of u_i. Specifically, these authors recommend computing first an approximate value for η_i based on an initial fuzzy partition and using Eq. (15) with K = 1 and, after convergence, re-calculating η_i using Eq. (16) and running the algorithm again [7]. Estimating the reference distance from FCM and using the above equations can overestimate or underestimate this parameter, due to the inclusion of outliers and inliers in the FCM partition. In addition, cluster variance may not provide good η_i estimates when clusters exhibit large deviations from multi-normality, which is expected to be the case in many arbitrary dissimilarity matrices. What we propose in the following subsection is to perform a small search for suitable η_i values before running OPCM or RPCM.

5.2. Estimating η_i on the basis of cluster variability

Let us accept that we have more or less correctly determined the cluster prototype of a dense region (explicitly as in OPCM or implicitly as in RPCM) but lack a good estimation of η_i. What we need is a function to provide suitable values for η_i, that is, values that yield compact and isolated (i.e. "natural") clusters. Our approach can be explained using the following rationale. For very small values, η_i is obviously underestimated in the dense region. Then, each increment of η_i provokes an increase in possibilistic memberships. As new cluster points are progressively "seen" and included, the cluster variance increases too. However, when the cluster growth reaches a less dense (or even empty) region, new increments in η_i do not include so many new objects and the cluster variance progressively stops increasing. As soon as this low density region is stepped over, and new external objects are about to be included, the cluster variance increases again.

Following the above reasoning, we believe that a heuristic criterion to provide suitable η_i values would be to search for those values which correspond to local minimum values of the partial derivative of the cluster variance with respect to the reference distance: ∂V_i/∂η_i. However, cluster variance increases quadratically and not linearly in an isotropic space. The solution given to this fact is to use the cluster "standard deviation" (Std_i = V_i^(1/2)) instead of the cluster variance, though we have obtained similar applied results using V_i. The analytical partial derivative of Std_i with respect to η_i is

∂Std_i/∂η_i = (∂V_i/∂η_i) / (2 · Std_i),   (17)



where

∂V_i/∂η_i = (Σ_{j=1}^{n} e_ij² · u_ij^m · ψ_ij) / (Σ_{j=1}^{n} u_ij^m) − V_i · (Σ_{j=1}^{n} u_ij^m · ψ_ij) / (Σ_{j=1}^{n} u_ij^m)

and

ψ_ij = (m/(m − 1)) · η_i^(−1) · u_ij · (e_ij²/η_i)^(1/(m−1)).

In order to find minimum values of ∂Std_i/∂η_i we used its derivative, that is, the second partial derivative of the cluster standard deviation with respect to the reference distance:

∂²Std_i/∂η_i² = ((∂²V_i/∂η_i²) − 2 · (∂Std_i/∂η_i)²) / (2 · Std_i),   (18)

where

∂²V_i/∂η_i² = (Σ_{j=1}^{n} e_ij² · u_ij^m · (ψ_ij + ∂ψ_ij/∂η_i)) / (Σ_{j=1}^{n} u_ij^m) − V_i · (Σ_{j=1}^{n} u_ij^m · (ψ_ij + ∂ψ_ij/∂η_i)) / (Σ_{j=1}^{n} u_ij^m)

and

∂ψ_ij/∂η_i = (ψ_ij / ((m − 1) · η_i)) · (u_ij · (e_ij²/η_i)^(1/(m−1)) − m).
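Although the paper evaluates these minima through the analytical derivatives (17) and (18), the search can also be carried out numerically; the sketch below, with a grid, step size and function names chosen purely for illustration, scans η values around the Eq. (15) estimate and returns the local minimum of ∂Std_i/∂η_i closest to it.

```python
import numpy as np

def std_derivative(E2_i, eta, m):
    """Numerical dStd_i/d(eta) of Eq. (17) for one cluster, with memberships
    recomputed from the fixed vector of squared distances E2_i."""
    def std(eta_val):
        u = 1.0 / (1.0 + (E2_i / eta_val) ** (1.0 / (m - 1.0)))
        w = u ** m
        return np.sqrt((w * E2_i).sum() / w.sum())    # Std_i = V_i^(1/2), Eq. (15) with K = 1
    h = 1e-4 * eta
    return (std(eta + h) - std(eta - h)) / (2.0 * h)  # central finite difference

def closest_derivative_minimum(E2_i, eta0, m, factors=np.linspace(0.05, 5.0, 200)):
    """Step (2) of the proposed strategy: scan eta = factor * eta0 and return the
    local minimum of dStd/d(eta) closest to the Eq. (15) estimate eta0."""
    grid = factors * eta0
    deriv = np.array([std_derivative(E2_i, e, m) for e in grid])
    idx = [k for k in range(1, len(grid) - 1)
           if deriv[k] < deriv[k - 1] and deriv[k] < deriv[k + 1]]
    if not idx:
        return eta0
    return grid[min(idx, key=lambda k: abs(grid[k] - eta0))]
```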

The equations (17) and (18) above are valid for both OPCM and RPCM, and u* values can be used instead of u. Finally, our reference distance initialisation strategy is as follows:

(1) Starting from a suitable initial membership matrix, calculate for each cluster the usual estimate of the reference distance using Eq. (15).
(2) For each cluster, find the closest reference distance that yields a ∂Std_i/∂η_i minimum.

6. Example applications

In this section we illustrate the application of the reference distance estimation approach in PCM clustering through the study of both simulated and real data examples. The aim is to compare the performance of OPCM and RPCM when using the usual or the proposed reference distance initialisation strategies. We will also show FCM (or RFCM) results because this clustering method is normally used to provide starting cluster memberships for PCM.

6.1. Simulated examples

In this subsection we study two simulated data sets in R². Note that, despite the fact that we will display here

object–feature data, one would obtain the same results running either the OPCM or the RPCM algorithm. In addition, as both possibilistic algorithms are better suited to detect spheres than ellipses or circles (which would be better tackled using PGK), all synthetic clusters are made spherical on purpose.

The first example (Fig. 2a) consists of two spherical Gaussian clusters without noise. Cluster 1 has 100 objects and its prototype lies at (2, 2); Cluster 2 is much larger, has 400 objects and its prototype lies at (10, 10). Running FCM on this data set with m = 2.0 resulted in an unsatisfactory partition due to the differences in cluster size (shown in Fig. 2b, where symbol size and colour intensity are a function of the object's largest fuzzy membership value). On the other hand, cluster prototypes were not badly positioned. The small cluster prototype was lightly attracted towards the large cluster due to the inclusion of some of its points. At the same time, the large cluster was underestimated and its prototype position was pulled apart because of cluster repulsion.

We used the FCM result to initialise OPCM. Following the recommendations made by Krishnapuram and Keller [21] for Gaussian clusters, the fuzziness exponent was now set to m = 1.5, and the reference distances were estimated using (15) with K = 1. As a result, the reference distance for cluster 1 was overestimated (η_1 = 19.6), and for cluster 2 it was underestimated (η_2 = 23.9). The membership values obtained after running OPCM with this initialisation are displayed in Fig. 2c. Again some points from the large group are clustered in the small one. Nevertheless the solution is slightly better than with FCM.

Cluster 1 standard deviation and partial derivative values are plotted in Fig. 3 against η_1. That is, assuming that the cluster prototypes and e_1j² values are fixed, for each value of η_1 we computed the possibilistic memberships and the resulting cluster statistics. The closest minimum of ∂Std_i/∂η_i for cluster 1 was at η_1 = 6.7. In the case of cluster 2, the closest minimum was at η_2 = 45. When we ran OPCM initialised with these reference distance values, we ended up with a completely satisfactory solution for both clusters, which is displayed in Fig. 2d. In addition, Table 1 shows that the cluster prototypes were better located when estimating reference distances using our approach.

Recall from Eq. (17) that the cluster standard deviation derivatives depend, like possibilistic memberships, on the m value. We plotted in Fig. 4 the small cluster standard deviation first derivative values obtained using different "fuzzification" levels. Low values (i.e. m = 1.1) yield more derivative minima points. Higher values progressively make the less isolated clusters unnoticed. At the same time, the η_i "optimum" for a given cluster decreases and its associated derivative minimum is higher. For very high fuzziness values (i.e. m = 2.0 or higher) only two partial derivative minima remain: those corresponding to η_i = 0.0 and ∞.

The second simulated example is made up of three spherical clusters immersed inside a uniform random noise




Fig. 2. A small and a big cluster (synthetic data 1). (a) Point coordinates, in R². (b) FCM (m = 2.0) solution. (c) OPCM (m = 1.5) solution with initialisation of η_i using Eq. (15). (d) OPCM (m = 1.5) solution with initialisation of η_i by finding the closest ∂Std_i/∂η_i minimum. Symbol size and colour intensity are a function of the object's largest membership value in (b–d).

background (Fig. 5a). Two of the three clusters are equally sized and the third one is only slightly larger. At the same time, one of the small clusters lies somewhat closer to the larger one than the other. As with the previous example, we first ran FCM (m = 2.0). The resulting relative memberships are displayed in Fig. 5b. The "fuzzification" parameter for PCM was this time lowered to m = 1.3 because of the presence of noise. When initialising the PCM reference distances from this membership matrix and using Eq. (15), all η_i values became overestimated due to the effect of noise points on the cluster variance. Therefore, running OPCM (m = 1.3) resulted in the recovery of only two of the three clusters, failing to recognise the small cluster closest to the large one (Fig. 5c).

Under the possibilistic approach, there are two ways to solve the problem of not recognising a cluster. The first implies reducing m to more crisp values, but has the drawback of decreasing PCM mobility. The second one consists in performing a better reference distance estimation. Fig. 5d shows the complete cluster recovery obtained with OPCM (m = 1.3) using our reference distance estimation approach. In addition, as can be seen in Table 2, cluster prototypes were again set closer to the true values than those provided by FCM.

6.2. Microarray data example Recent technological advances such as cDNA microarray technology have made it possible to simultaneously interrogate thousands of genes in a biological specimen, based on the relative abundance of each gene’s mRNA, by using a two-colour fluorescent probe hybridization system [23]. The gene expression profile obtained for a specimen consists of log transformed normalized expression ratios measured on the full set of genes represented in the microarray. For a given spot (e.g. gene) on an array, the expression ratio is formed by dividing the fluorescent signal measured for a test sample at that spot by the fluorescent signal measured from a reference sample. Cluster analysis techniques have frequently been used for investigating structure in microarray data. Hierarchical clustering methods were introduced by Eisen et al. [24] as a means of visualizing and interpreting the high dimensional data generated by microarray technology. In this sense, hierarchical clustering has been utilized with some success for finding disease subtypes in cancer studies [25], and as the first step to discovering regulatory elements in transcriptional regulatory networks [26,27]. Co-expressed genes in



Table 1 Synthetic data 1 cluster prototype coordinates and overall displacement from the true solution, obtained using different clustering strategies. Displacement is measured as the sum of Euclidean distances between the centroids of the true solution and those found by the clustering algorithm

                                           λ11    λ12    λ21    λ22    Displacement
True                                       2.00   2.00  10.00  10.00   0
FCM m = 2.0                                3.32   3.45  10.83  10.34   2.86
OPCM m = 1.5, η_i from Eq. (15)            2.84   2.92   9.88   9.60   1.66
OPCM m = 1.5, η_i using the new strategy   2.20   2.10   9.68   9.47   0.84

the same cluster are probably involved in the same cellular process, and strong correlation between gene expression patterns indicates co-regulation. Once a clustering algorithm has grouped similar objects (genes or samples) together, the biologist is then faced with the task of interpreting these clusters. For example, if a gene of unknown function is clustered together with many genes of similar, known function, one might hypothesize that the unknown gene also has a related function.

Fig. 3. Functions Std_1 (top), ∂Std_1/∂η_1 (middle) and ∂²Std_1/∂η_1² (bottom) plotted against η_1, for the small cluster of synthetic data 1 using m = 1.5.

Fig. 4. ∂Std_i/∂η_i plotted against η_1 for the small cluster of synthetic data 1 using different m values.

As an example to demonstrate the clustering ability of RPCM with the proposed modifications on real data, and only for illustrative purposes, we include here the analysis of data from a study of human fibroblast differential expression after serum addition [28]. We chose for our analysis

a subset of 517 genes which was studied in [28] with hierarchical clustering methods, concluding that 10 different clusters could be distinguished. The same data set was later used as a test data for OFCM [29]. It can be downloaded at http://www.sciencemag.org/feature/data/984559.shl. To begin with, we computed the complementary dissimilarity of Pearson correlation coefficient between genes. It is usual with this kind of data to measure gene resemblance using the Pearson correlation coefficient, as it assesses the similarity of their expression patterns in the sense of how closely the values in one pattern can be approximated by a linear function of the values in the other [30]. In order to graphically display the scatter of the dissimilarity matrix obtained, we computed classical MDS. The first two principal coordinates are shown in Fig. 6a. To start exploring the cluster structure of this data set, we ran RFCM on the dissimilarity matrix using c = 10 and m = 1.25 (the same exponent was used in [29]). The relative membership matrix obtained is displayed in Fig. 6b. The crisp agreement between RFCM solution, after defuzzification, and the ten original clusters, as measured with the Rand index corrected for random effects [31], is rather poor (Rand = 0.352). Note that the ten FCM clusters appear as “segmentations” of the global circular structure but not all of them correspond to dense regions of genes with correlated expression patterns. In our opinion (R)PCM may be a useful tool to



Fig. 5. Three clusters with noise (synthetic data 2). (a) Point coordinates, in R². (b) FCM (m = 2.0) solution. (c) OPCM (m = 1.3) solution with initialisation of η_i using Eq. (15). (d) OPCM (m = 1.3) solution with initialisation of η_i by finding the closest ∂Std_i/∂η_i minimum.

Table 2 Synthetic data 2 cluster prototype co-ordinates and overall displacement from the true solution, obtained using different clustering strategies. Displacement is measured as the sum of Euclidean distances between the centroids of the true solution and those found by the clustering algorithm

                                        λ11    λ12    λ21    λ22    λ31    λ32    Displacement
True                                    2.00   2.00   9.00   9.00  13.00   2.00   0
FCM m = 2.0                             1.90   2.45   8.60  10.07  13.34   2.28   2.04
OPCM m = 1.3, η_i from Eq. (15)         2.47   2.33   8.90   8.60   8.87   8.60   N/A
OPCM m = 1.3, η_i from our approach     1.97   1.95   8.88   8.71  12.98   1.99   0.39

avoid the effect of loosely related genes (i.e. outliers) on the clustering solution of the others. We ran RPCM, using the 10 FCM clusters as the starting configuration and with the fuzziness parameter set to m = 1.2. PCM cluster names keep the number of the FCM cluster from which the algorithm was initialised. RPCM was run twice, as with the synthetic data examples, first initialising the cluster reference distances using Eq. (15), and secondly initialising them with our new strategy. Cluster standard deviation derivative profiles can be seen in Fig. 7. The difference between cluster derivative values at the local minima can be interpreted as

differences in cluster compactness and isolation. Some clusters (e.g. 8, 9 and 10) show only shallow minima, which would soon become unnoticed if using higher fuzziness exponent levels. The reference distances used in each RPCM run are signalled in Fig. 7 with arrows for the first strategy and dots for the second. In seven clusters the reference distances resulting from (15) are higher than those of the proposed initialisation strategy. Table 3 shows the fuzzy intersection matrices between the resulting possibilistic fuzzy sets. While in both cases there is a certain amount of fuzzy overlap and inclusion, the classical



Fig. 6. Clustering of microarray data using the complementary of the Pearson correlation coefficient. (a) Scatter graphic using the first two coordinates obtained from classical MDS. (b) RFCM (m = 1.25) solution for c = 10. (c) Four top-level clusters for the RPCM (m = 1.2) solution with initialisation of η_i using Eq. (15). (d) Six top-level clusters for the RPCM (m = 1.2) solution with initialisation of η_i by finding the closest ∂Std_i/∂η_i minimum. Cluster labels are used to indicate the correspondence between the FCM clusters used to initialise PCM and its results.

reference distance initialisation gives more cases of partial overlap (e.g. between clusters 4 and 7), because cluster size is usually inadequately assessed. Although clusters 3 and 4 still have an overlap of about 16 points, one can conclude that four main structures, i.e. clusters 1–4, are identified in this run. These groups are indicated with an asterisk in Table 3a and displayed in Fig. 6c. In contrast, using the proposed reference distance estimation substantially reduces the amount of cluster overlap, though there are still some cases of inclusion (Table 3b). Moreover, this time six distinct clusters (1–4, 6 and 9) can be recognized (Fig. 6d).

In order to compare the two solutions found with RPCM (i.e. the four- and six-cluster solutions) with the original classification into ten clusters, the following was done in each case. To begin with, the genes not classified (i.e. considered outliers by PCM) were removed by using a minimum membership threshold of 0.05. The number of genes to be compared was 342 for the first solution and 316 for the second. Afterwards, a crisp partition was obtained by choosing for each object the closest cluster prototype. This partition was compared to the original classification using the Rand index corrected for random effects, which allows the comparison of partitions with different numbers of clusters. The corrected Rand values were 0.520 for the solution coming from the classical

PCM reference distance initialisation and 0.647 for the solution obtained using the proposed strategy. These scores and the number of clusters found confirm that the proposed reference distance initialisation strategy substantially improves the ability of (R)PCM algorithm to detect true cluster structures. Table 4 shows the confusion matrix between the clusters of the original analysis [28] and those found in the second PCM run. The cluster average silhouette [32] has been computed for each cluster as a measure of isolation. First, note that the clusters of PCM are by far more isolated on average than the original clusters, a fact that stresses the “naturality” of PCM solutions. Second, almost all genes in original clusters I and J, which have negative isolations, are left unclassified by PCM. Third, approximately two thirds of the genes originally belonging to the next less isolated clusters (A, B and C) have been rearranged into PCM clusters 3, 4 and 9. In contrast, the three originally most isolated clusters (D, F and H) can be mainly mapped into three corresponding PCM clusters (i.e. clusters 2, 1 and 6, respectively). To conclude, although there is not a complete correspondence between the original and the PCM clusters, results show that PCM fairly keeps the originally isolated gene groups while discarding or rearranging those clusters with lower isolation values.
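The defuzzification and comparison procedure just described can be sketched as follows; we use scikit-learn's adjusted_rand_score as one available implementation of the Rand index corrected for chance [31], and assigning each retained object to its highest-membership cluster is our simplification of choosing the closest cluster prototype.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

def evaluate_possibilistic_solution(U, true_labels, threshold=0.05):
    """Drop objects whose largest possibilistic membership is below the threshold,
    assign the remaining objects to their highest-membership cluster, and compare
    the resulting crisp partition with the reference partition using the corrected
    Rand index (Hubert and Arabie [31])."""
    max_u = U.max(axis=0)
    kept = max_u >= threshold                 # objects considered classified
    crisp = U.argmax(axis=0)
    score = adjusted_rand_score(np.asarray(true_labels)[kept], crisp[kept])
    return score, kept.sum()
```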


Fig. 7. Cluster standard deviation derivatives computed using different reference distances for FCM clusters 1–5 (a) and 6–10 (b). Reference distances used in the first RPCM run are indicated by arrows and those used in the second run are indicated by dots.

7. Discussion

In this paper we have presented the relational dual of the possibilistic c-means algorithm (RPCM) and improved some aspects of its application. As a general rule, it is computationally much cheaper to do a PCM clustering with object data rather than with relational data (the same happens with the relational versions of the partitioning methods HCM and FCM). RPCM may be cheaper only when there is a large number of features in X. Nevertheless, the obvious application for relational data clustering algorithms is when only direct relational data is available (that is, when D is not derived from a matrix X, but expresses directly measured relations between objects). Another case of application is when the desired dissimilarity measure cannot be obtained by means of a norm-inducing matrix but is calculated from a matrix X by other means, including non-Euclidean dissimilarity measures. There are many applications where non-Euclidean


relational data is generated (e.g. Gowda and Diday [33] derive relational data for clustering applications ranging from fat-oil to microcomputers). We provided in Section 6 an application example of microarray data where the desired dissimilarity measure was non-Euclidean, but other examples could be found. Concerning the application of RPCM on dissimilarity measures with an upper bound, it is important to say that a modification of the possibilistic membership function was conceptually needed, regardless of the small influence on the clustering ability. Nevertheless, we believe the second contribution is by far more useful. Indeed, a good estimation of reference distance is crucial if one wants to be able to detect dense regions in noisy or complex data. The simulated and real examples presented show that the proposed reference distance initialisation strategy diminishes the PCM tendency to produce coincident or partially overlapped clusters. We believe it can be helpful in many possibilistic clustering applications, especially when there is a lack of cluster nature knowledge. In addition the proposed strategy applies both to object data and relational data versions of PCM. Despite the fact that PCM is based on the search of dense regions in the feature space, it is important to remember that it is also a prototype-based clustering algorithm. This means that summarising the cluster using an object with average properties makes sense. There exists another type of density-based clustering algorithms which can also be run on dissimilarity matrices, but whose cluster concept is based on cluster connectedness rather than on prototypes (e.g. DBSCAN [34]). This conceptual difference naturally drives the two types of algorithms to different applicative contexts. For example, if a clustering algorithm based on connectedness is applied to the clustering of genes in microarray data, the resulting clusters may include loosely correlated genes, thus changing the clustering application objective. Therefore, it is difficult to compare the performance of this kind of algorithms with relational versions of PCM. Besides not having included comparisons with other clustering algorithms (apart from the starting FCM solutions) there exist some limitations in our approach that have to be acknowledged. With highly noisy data, under many values of m some clusters may not exhibit first derivative minima or the possibilistic clustering algorithm may have too much mobility to detect them. In such situations, first derivative minima may only appear when m is close to 1. However, setting m too close to 1 may result in many alternative overlapping clusters, due to the lack of mobility. Therefore, there exists an obvious compromise in the setting of the “fuzzification” parameter. Secondly, note that the proposed estimation approach assumes that cluster prototypes are correctly set. If large deviations from this assumption occur, the reference distance indicated by the minimum of the cluster standard deviation derivative will not be correct and some clusters can be still missed. This suggests that a successful possibilistic clustering algorithm could result if we integrate reference distance estimation as a step of OPCM,



Table 3 Fuzzy intersection matrices between RPCM clusters. In (a) reference distance estimation was done using Eq. (15); in (b) reference distance estimation was done using the proposed strategy. Diagonal values (in italics) correspond to cluster cardinalities. Top-level clusters are indicated with an asterisk ‘*’ (a) Cluster Cluster Cluster Cluster Cluster Cluster Cluster Cluster Cluster Cluster

1* 2* 3* 4* 5 6 7 8 9 10

(b) Cluster Cluster Cluster Cluster Cluster Cluster Cluster Cluster Cluster Cluster

1* 2* 3* 4* 5 6* 7 8 9* 10

Cl. 1*

Cl. 2*

Cl. 3*

Cl. 4*

Cl. 5

Cl. 6

Cl. 7

Cl. 8

Cl. 9

Cl. 10

55.1 0.1 0.0 0.0 41.0 50.2 0.0 0.0 0.0 0.0

51.3 0.5 0.0 0.0 0.0 0.5 0.3 0.0 0.3

89.1 16.1 0.0 0.0 88.3 64.7 11.0 72.0

68.2 0.0 0.0 15.8 6.3 58.2 8.9

41.0 41.0 0.0 0.0 0.0 0.0

50.2 0.0 0.0 0.0 0.0

88.5 64.7 10.7 72.1

64.8 3.9 64.8

59.0 5.6

72.2

Cl.1*

Cl.2*

Cl.3*

Cl.4*

Cl.5

Cl.6*

Cl.7

Cl.8

Cl.9*

Cl.10

27.8 0.0 0.0 0.0 2.5 3.0 0.0 0.0 0.0 0.0

40.9 0.1 0.0 0.0 0.0 0.1 0.1 0.0 0.1

79.7 0.0 0.0 0.0 62.5 74.7 0.6 63.1

17.7 0.0 0.0 0.0 0.0 2.1 0.0

46.4 46.4 0.0 0.0 0.0 0.0

51.8 0.0 0.0 0.0 0.0

62.6 62.5 0.2 62.5

74.8 0.4 63.1

22.0 0.2

63.1

Table 4 Confusion matrix between the original ten clusters reported in [28] (A–J) and the six distinct clusters of the second RPCM run (after defuzzification). Percentages indicate the proportion of the genes in the original cluster classified into the corresponding PCM cluster. Cluster cardinalities and average silhouette values are also indicated Clusters

A

B

C

D

E

F

G

H

I

J

99

144

34

43

7

34

14

63

19

25

Av. Silh.

0.02

0.00

0.22

0.37

0.31

0.40

0.35

0.24

−0.16

−0.03

0.62 0.75 0.31 0.64 0.54 0.37

0 0 15 (15%) 6 (6%) 0 31 (31%)

0 9 (6%) 86 (60%) 0 0 4 (3%)

0 0 0 15 (44%) 0 7 (21%)

0 37 (86%) 2 (5%) 0 0 0

0 0 1 (14%) 0 0 0

24 (71%) 0 0 0 2 (6%) 0

0 5 (36%) 0 0 2 (14%) 0

2 (3%) 0 0 0 54 (86%) 0

3 (16%) 0 0 0 2 (11%) 0

1 (4%) 0 0 0 0 0

% class.

52 (53%)

99 (69%)

22 (65%)

39 (91%)

1 (14%)

26 (76%)

7 (50%)

56 (89%)

5 (26%)

1 (4%)

Card.

Cluster Cluster Cluster Cluster Cluster Cluster

1 2 3 4 6 9

33 53 104 22 60 43

or RPCM. Unfortunately, the optimisation of the objective function of those algorithms is assured only under constant η_i values, so we believe that future research should be done in this direction.

8. Summary

Clustering is a popular approach to study the structure of multivariate data. Prototype-based clustering algorithms such as hard c-means (HCM), fuzzy c-means (FCM) and possibilistic c-means (PCM) allow the definition of algorithm duals based on dissimilarity matrices. We have de-

scribed the relational counterpart of PCM algorithm, called relational PCM (or RPCM). RPCM has been improved in two ways to better handle arbitrary dissimilarity data. First, a re-scaling of PCM membership function has been proposed in order to obtain zero membership values when the distance to prototype equals the maximum value of bounded dissimilarity measures. Second, a heuristic method for the initialisation of the PCM reference distance has been provided based on the search of cluster standard deviation partial derivative minima. The proposed strategy has been tested on two synthetic data sets, containing different cluster sizes and background noise, and on a real microarray data set containing gene expression profiles. Both simulated


and real data results show that our strategy produces better estimates of cluster prototypes and diminishes the known PCM tendency of producing coincident or overlapping clusters.

Acknowledgements

This research was funded by a pre-doctoral fellowship from the "Departament d'Universitats, Recerca i Societat de la Informació de la Generalitat de Catalunya" (2001 FI 00269), and with the assistance of the Spanish "Ministerio de Educación y Ciencia" (MTM2004-00440, CUR: 2001SGR0006). The authors would like to thank Sergi Vives, Ph.D., from the Statistics Department, for his invaluable help in the example of application with microarray data.

References

[1] J. MacQueen, Some methods for classification and analysis of multivariate observation, in: L.M. LeCam, J. Neyman (Eds.), Proceedings of the Fifth Berkeley Symposium on Math. Stat. Prob., University of California Press, Berkeley, Los Angeles, 1967, pp. 281–297.
[2] G.H. Ball, D.J. Hall, A clustering technique for summarizing multivariate data, Behav. Sci. 12 (1967) 153–155.
[3] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Functions, Plenum Press, New York, 1981.
[4] Y. Oshashi, Fuzzy clustering and robust estimation, in: Ninth Meet, SAS Users Grp. Int., Hollywood Beach, FL, 1984.
[5] J.J. De Gruijter, A.B. McBratney, A modified fuzzy K-means method for predictive classification, in: H.H. Bock (Ed.), Classification and Related Methods of Data Analysis, Elsevier, Amsterdam, The Netherlands, 1988.
[6] R.N. Davé, Characterisation and detection of noise in clustering, Pattern Recognition Lett. 12 (1991) 657–664.
[7] R. Krishnapuram, J.M. Keller, A possibilistic approach to clustering, IEEE Trans. Fuzzy Systems 1 (1993) 98–110.
[8] H. Frigui, R. Krishnapuram, A robust algorithm for automatic extraction of an unknown number of clusters from noisy data, Pattern Recognition Lett. 17 (1996) 1223–1232.
[9] R.N. Davé, R. Krishnapuram, Robust clustering methods: a unified view, IEEE Trans. Fuzzy Systems 5 (1997) 270–293.
[10] M. Barni, V. Cappellini, A. Mecocci, Comments on a possibilistic approach to clustering, IEEE Trans. Fuzzy Systems 4 (1996) 393–396.
[11] N.R. Pal, K. Pal, J.C. Bezdek, A mixed c-means clustering model, in: Proceedings of the Sixth IEEE International Conference on Fuzzy Systems, 1997, pp. 11–21.
[12] H. Timm, C. Borgelt, C. Döring, R. Kruse, An extension to possibilistic fuzzy cluster analysis, Fuzzy Sets and Systems 147 (2004) 3–16.
[13] J.C. Gower, Some distance properties of latent root and vector methods used in multivariate analysis, Biometrika 53 (1966) 325–338.
[14] H. Späth, Cluster Analysis Algorithms, Ellis Horwood Ltd., Chichester, England, 1980.


[15] W. DeSarbo, J.D. Carroll, L.A. Clark, P.E. Green, Synthesized clustering: a method for amalgamating alternative clustering bases with differential weighting of variables, Psychometrika 49 (1984) 57–78.
[16] C.M. Cuadras, J. Fortiana, A continuous metric scaling solution for a random variable, J. Multivar. Anal. 52 (1995) 1–14.
[17] C.M. Cuadras, J. Fortiana, F. Oliva, The proximity of an individual to a population with applications in discriminant analysis, J. Class. 14 (1997) 117–136.
[18] R.J. Hathaway, J.W. Davenport, J.C. Bezdek, Relational duals of the c-means clustering algorithms, Pattern Recognition 22 (1989) 205–212.
[19] R.J. Hathaway, J.C. Bezdek, NERF c-means: non-Euclidean relational fuzzy clustering, Pattern Recognition 27 (1994) 429–437.
[20] R.J. Hathaway, J.C. Bezdek, J.W. Davenport, On relational data versions of c-means algorithm, Pattern Recognition Lett. 17 (1996) 607–612.
[21] R. Krishnapuram, J.M. Keller, The possibilistic c-means algorithm: insights and recommendations, IEEE Trans. Fuzzy Systems 4 (1996) 385–393.
[22] M. Barni, R. Gualtieri, A new possibilistic clustering algorithm for line detection in real world imagery, Pattern Recognition 32 (1999) 1897–1909.
[23] M. Schena, D. Shalon, R.W. Davis, P.O. Brown, Quantitative monitoring of gene expression patterns with a complementary DNA microarray, Science 270 (1995) 467–470.
[24] M.B. Eisen, P.T. Spellman, P.O. Brown, D. Botstein, Cluster analysis and display of genome-wide expression patterns, Proc. Natl. Acad. Sci. (1998) 14863–14868.
[25] A.A. Alizadeh, M.B. Eisen, et al., Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling, Nature 403 (2000) 503–511.
[26] U. Alon, N. Barkai, D.A. Notterman, K. Gish, S. Ybarra, D. Mack, A.J. Levine, Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, Proc. Natl. Acad. Sci. 96 (1999) 6745–6750.
[27] S. Tavazoie, J.D. Hughes, M.J. Campbell, R.J. Cho, G.M. Church, Systematic determination of genetic network architecture, Nature Gen. 22 (1999) 281–285.
[28] V.R. Iyer, M.B. Eisen, D.T. Ross, G. Schuler, T. Moore, J.C.F. Lee, J.M. Trent, L.M. Staudt, J. Hudson Jr., M.S. Boguski, D. Lashkari, D. Shalon, D. Botstein, P.O. Brown, The transcriptional program in the response of human fibroblasts to serum, Science (1999) 83–87.
[29] D. Dembélé, P. Kastner, Fuzzy C-means method for clustering microarray data, Bioinformatics 19 (2003) 973–980.
[30] R.M. Simon, E.L. Korn, L.M. McShane, M.D. Radmacher, G.W. Wright, Y. Zao, Design and Analysis of DNA Microarray Investigations, Springer, New York, 2003.
[31] L. Hubert, P. Arabie, Comparing partitions, J. Class. 2 (1985) 193–218.
[32] P.J. Rousseuw, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math. 20 (1987) 53–65.
[33] K.C. Gowda, E. Diday, Symbolic clustering using a new similarity measure, IEEE Trans. Systems Man Cybernet. 22 (1992) 368–378.
[34] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, in: Proceedings of Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, 1996, pp. 226–231.

About the Author—MIQUEL DE CÁCERES received his degree in Biology from the University of Barcelona in 1998. He has completed a Ph.D. at the Plant Biology Department, applying multivariate statistics to vegetation data. He is now an associate lecturer in the Department of Statistics. His research interests include multivariate statistics, clustering applications, numerical ecology and computational chemistry.


About the Author—FRANCESC OLIVA received his Ph.D. degree from the Department of Statistics, University of Barcelona, Spain in 1995. Part of his Ph.D. work focused on the development of distance-based methods in discriminant analysis. He is now a professor in the Department of Statistics, University of Barcelona. His research interests include classification methods and their application to pattern recognition and data mining in the fields of ecology, brain-computer interfaces and marketing.

About the Author—XAVIER FONT obtained his Ph.D. degree in 1989 from the Plant Biology Department, University of Barcelona, Spain. He is at present a professor in the Plant Biology Department, University of Barcelona. His research interests include biodiversity databases, plant biogeography, phytosociology and numerical ecology.
