Fuzzy Clustering Bisen Vikrantsingh Mohansingh [MT2012036]

Contents

1. Introduction
   1.1 Hard clustering
   1.2 Soft clustering
2. Techniques
   2.1 Fuzzy C-Means
      2.1.1 Algorithm
      2.1.3 Pseudo code
      2.1.2 Advantages
      2.1.3 Limitations
      2.1.4 Example
References

1. Introduction

Clustering is the process of partitioning a given set of data objects into subsets (clusters). In classical clustering these subsets are mutually exclusive, which means we can say clearly whether an object belongs to a cluster or not. In real applications, however, there is very often no sharp boundary between clusters. Fuzzy clustering methods are therefore used to construct clusters with uncertain boundaries, allowing one object to belong to several overlapping clusters with some degree of membership. In other words, fuzzy clustering records not only which clusters an object belongs to, but also the degree to which it belongs to each of them.

1.1 Hard clustering

In hard clustering, each example is placed definitively in a class. The class is then used to predict the feature values of the example. Hard clustering methods are based on classical set theory and require that an object either does or does not belong to a cluster. Hard clustering therefore means partitioning the data into a specified number of mutually exclusive subsets.

1.2 Soft clustering

Fuzzy clustering methods, also known as soft clustering, allow objects to belong to several clusters at the same time, with different degrees of membership. In many cases, fuzzy clustering is more natural than hard clustering. Objects on the boundaries between several classes are not forced to fully fit into one of the classes, but are instead assigned membership degrees between 0 and 1 that indicate their partial membership. The discrete nature of hard partitioning also causes difficulties for algorithms based on analytic functionals, since these functionals are not differentiable.
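To make the distinction between hard and soft partitions concrete, the following minimal sketch represents a partition as a matrix with one row per object and one column per cluster. The function names and the sample matrices are hypothetical, chosen only for illustration, and the sketch assumes the usual convention that an object's memberships sum to one.

```python
def is_hard_partition(U):
    """True if every row puts the object in exactly one cluster (one 1, rest 0)."""
    return all(
        all(u in (0, 1) for u in row) and sum(row) == 1
        for row in U
    )


def is_fuzzy_partition(U, tol=1e-9):
    """True if all memberships lie in [0, 1] and each row sums to 1."""
    return all(
        all(0.0 <= u <= 1.0 for u in row) and abs(sum(row) - 1.0) <= tol
        for row in U
    )


# Hypothetical memberships of three objects in two clusters.
hard = [[1, 0], [0, 1], [0, 1]]               # each object in exactly one cluster
soft = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]   # the second object sits on the boundary

print(is_hard_partition(hard), is_hard_partition(soft))
print(is_fuzzy_partition(hard), is_fuzzy_partition(soft))
```

Under these definitions a hard partition is a special case of a fuzzy one in which every membership degree is either 0 or 1.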

2. Techniques

2.1 Fuzzy C-Means

This algorithm assigns each data point a membership for each cluster center, based on the distance between the cluster center and the data point. The nearer a data point is to a cluster center, the higher its membership in that cluster. The memberships of each data point across all clusters must sum to one. After each iteration, the memberships and cluster centers are updated according to the formulas:

$$\mu_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{ik} \right)^{2/(m-1)}}, \qquad v_j = \frac{\sum_{i=1}^{n} \mu_{ij}^{m} \, x_i}{\sum_{i=1}^{n} \mu_{ij}^{m}}, \quad j = 1, 2, \ldots, c$$

where
'n' is the number of data points,
'vj' is the jth cluster center,
'm' is the fuzziness index, m ∈ [1, ∞) (the update formulas require m > 1),
'c' is the number of cluster centers,
'µij' is the membership of the ith data point in the jth cluster,
'dij' is the Euclidean distance between the ith data point and the jth cluster center.

The main objective of the fuzzy c-means algorithm is to minimize:

$$J(U, V) = \sum_{i=1}^{n} \sum_{j=1}^{c} \mu_{ij}^{m} \, \lVert x_i - v_j \rVert^{2}$$

where '||xi – vj||' is the Euclidean distance between the ith data point and the jth cluster center.
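As a small illustration of the objective, the sketch below evaluates J for a given membership matrix and set of centers. It assumes the standard fuzzy c-means form, in which each squared Euclidean distance is weighted by the membership raised to the power m. The function name and the toy data are hypothetical.

```python
def objective(X, V, U, m=2.0):
    """Sum over points i and centers j of U[i][j]**m * ||X[i] - V[j]||**2."""
    total = 0.0
    for i, x in enumerate(X):
        for j, v in enumerate(V):
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
            total += (U[i][j] ** m) * sq_dist
    return total


# Toy data: two 1-D points, with one center placed on each point.
X = [[0.0], [4.0]]
V = [[0.0], [4.0]]

crisp = [[1.0, 0.0], [0.0, 1.0]]      # each point fully in its own cluster
uniform = [[0.5, 0.5], [0.5, 0.5]]    # each point split evenly between clusters

print(objective(X, V, crisp))    # 0.0
print(objective(X, V, uniform))  # 8.0
```

With the centers sitting exactly on the points, the crisp assignment costs nothing, while the evenly split assignment pays 0.25 × 16 for each of the two cross terms.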

2.1.1 Algorithm

Let X = {x1, x2, x3, ..., xn} be the set of data points and V = {v1, v2, v3, ..., vc} be the set of centers.

1) Randomly select 'c' cluster centers.

2) Calculate the fuzzy membership 'µij' using:

$$\mu_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{ik} \right)^{2/(m-1)}}$$

3) Compute the fuzzy centers 'vj' using:

$$v_j = \frac{\sum_{i=1}^{n} \mu_{ij}^{m} \, x_i}{\sum_{i=1}^{n} \mu_{ij}^{m}}, \quad j = 1, 2, \ldots, c$$

4) Repeat steps 2) and 3) until the minimum 'J' value is achieved or ||U(k+1) - U(k)|| < β,

where
'k' is the iteration step,
'β' is the termination criterion, chosen in [0, 1],
'U = (µij)n×c' is the fuzzy membership matrix,
'J' is the objective function.
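The four steps can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation. It assumes the standard fuzzy c-means update rules, measures ||U(k+1) - U(k)|| as the largest absolute change in any membership, and gives a point that coincides with a center full membership in that center. The function names, the `init` and `max_iter` parameters, and the sample data are hypothetical.

```python
import math
import random


def update_memberships(X, V, m):
    """Step 2: mu_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))."""
    exponent = 2.0 / (m - 1.0)
    U = []
    for x in X:
        d = [math.sqrt(sum((a - b) ** 2 for a, b in zip(x, v))) for v in V]
        zeros = [j for j, dj in enumerate(d) if dj == 0.0]
        if zeros:
            # Assumption: a point lying exactly on a center belongs fully to it.
            row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(len(V))]
        else:
            row = [1.0 / sum((dj / dk) ** exponent for dk in d) for dj in d]
        U.append(row)
    return U


def update_centers(X, U, m):
    """Step 3: v_j = sum_i mu_ij^m x_i / sum_i mu_ij^m."""
    dim = len(X[0])
    V = []
    for j in range(len(U[0])):
        w = [row[j] ** m for row in U]
        total = sum(w)
        V.append([sum(wi * x[k] for wi, x in zip(w, X)) / total for k in range(dim)])
    return V


def fcm(X, c, m=2.0, beta=1e-5, max_iter=100, init=None, seed=0):
    """Return (centers V, memberships U) after iterating steps 2 and 3."""
    if m <= 1.0:
        raise ValueError("the fuzziness index m must be greater than 1")
    rng = random.Random(seed)
    # Step 1: pick c data points at random as initial centers unless given.
    V = [list(v) for v in init] if init is not None else [list(x) for x in rng.sample(X, c)]
    U = update_memberships(X, V, m)
    for _ in range(max_iter):
        V = update_centers(X, U, m)
        U_new = update_memberships(X, V, m)
        # Step 4: stop once the largest membership change drops below beta.
        change = max(abs(a - b) for r_old, r_new in zip(U, U_new) for a, b in zip(r_old, r_new))
        U = U_new
        if change < beta:
            break
    return V, U


# Hypothetical data: two well-separated groups of 2-D points.
X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]]
V, U = fcm(X, c=2, init=[X[0], X[3]])
for x, row in zip(X, U):
    print(x, [round(u, 3) for u in row])
```

Passing `init` explicitly makes the run reproducible. With random initial centers the result can depend on the seed, which is why the algorithm is often restarted several times in practice.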

2.1.3 Pseudo code

    // C is the initial number of clusters, k is the number of fuzzy c-means
    // iterations, p is the weight exponent.
    // As implied by the indexing x[n,q], Q is the number of data points and
    // N is the number of features per point.
    // Input: initial number of clusters C, k, p

    //------------ step 0: initialize weights of prototypes ------------
    for c = 0 to C-1
        for q = 0 to Q-1
            u[q,c] = random();

    // standardize the initial weights over C
    for q = 0 to Q-1
        sum = 0.0;
        for c = 0 to C-1
            sum = sum + u[q,c];
        for c = 0 to C-1
            u[q,c] = u[q,c] / sum;

    // starting fuzzy c-means loop
    I = 0

    //------------ step 1: standardize cluster weights over Q ------------
    for c = 0 to C-1
        min = 99999.0; max = 0.0;
        for q = 0 to Q-1
            if (u[q,c] > max) max = u[q,c];
            if (u[q,c] < min) min = u[q,c];
        sum = 0.0;
        for q = 0 to Q-1
            sum = sum + (u[q,c] - min) / (max - min);
        for q = 0 to Q-1
            u[q,c] = u[q,c] / sum;

    //------------ step 2: compute new prototype centers ------------
    for c = 0 to C-1
        for n = 0 to N-1
            sum = 0.0;
            for q = 0 to Q-1
                sum = sum + u[q,c] * x[n,q];
            z[n,c] = sum;    // z holds the prototype centers used in step 3

    //------------ step 3: compute new weights ------------
    for q = 0 to Q-1
        sum = 0.0;
        for c = 0 to C-1
            D[q,c] = 0.0;
            for n = 0 to N-1
                D[q,c] = D[q,c] + (x[n,q] - z[n,c])^2;
            sum = sum + (1/(1 + D[q,c]))^(1/(p-1));
        for c = 0 to C-1
            u[q,c] = (1/(1 + D[q,c]))^(1/(p-1)) / sum;

    //------------ step 4: ------------
    I = I + 1
    if I < k goto step 2;
    // end of fuzzy c-means loop

2.1.2 Advantages

• It works best on overlapping data sets, where it performs comparatively better than the k-means algorithm.
• Data points are assigned to clusters with a degree of membership, so a point on a boundary is not forced into one particular category.

2.1.3 Limitations

• The number of clusters must be specified a priori. The correct choice of 'c' is often ambiguous, with interpretations depending on the shape and scale of the distribution of points in a data set and on the clustering resolution desired by the user.
• A lower value of β gives a better result, but at the expense of more iterations.
• Euclidean distance measures can weight the underlying factors unequally.
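The last limitation can be seen in a small sketch: when features are measured on very different scales, the Euclidean distance is dominated by the feature with the largest numeric range. The data below is hypothetical. Rescaling each feature to zero mean and unit standard deviation is shown only as one common mitigation and is an assumption of this example, not something the algorithm itself prescribes.

```python
import math
import statistics


def euclidean(a, b):
    """Plain Euclidean distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def standardize(X):
    """Rescale each feature (column) to zero mean and unit standard deviation."""
    cols = list(zip(*X))
    means = [statistics.mean(col) for col in cols]
    stds = [statistics.pstdev(col) or 1.0 for col in cols]  # guard constant columns
    return [[(v - mu) / sd for v, mu, sd in zip(row, means, stds)] for row in X]


# Hypothetical records: (height in metres, income in currency units).
X = [[1.60, 30000.0], [1.90, 30100.0], [1.61, 32000.0]]

raw_01 = euclidean(X[0], X[1])   # the two records differ mainly in height
raw_02 = euclidean(X[0], X[2])   # the two records differ mainly in income

Z = standardize(X)
std_01 = euclidean(Z[0], Z[1])
std_02 = euclidean(Z[0], Z[2])

print(round(raw_01, 1), round(raw_02, 1))
print(round(std_01, 2), round(std_02, 2))
```

On the raw values the income difference swamps the height difference, so the third record looks far more distant than the second. After rescaling, the two distances are of comparable size.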

2.1.4 Example

An interactive demo of the fuzzy c-means algorithm is available at:
http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletFCM.html

References

1. Bezdek, J. C., Ehrlich, R., & Full, W. (n.d.). The Fuzzy c-means Clustering Algorithm.
2. Raut, A. B., & Bamnote, G. R. (n.d.). Web Document Clustering Using Fuzzy Equivalence Relations. Journal of Emerging Trends in Computing and Information Sciences.
3. Fuzzy Clustering. http://aerostudents.com/files/knowledgeBasedControlSystems/fuzzyClustering.pdf
4. Matjaz Jursic, Nada Lavrac. Fuzzy Clustering of Document. Department of Knowledge Discovery, Jozef Stefan Institute.
5. Sonali A., P. R. Deshmukh. Categorization of Unstructured Web Data Using Fuzzy Clustering. International Journal of Emerging Technology and Advanced Engineering.
