IJRIT International Journal of Research in Information Technology, Volume 2, Issue 2, February 2014, Pg: 11-13

International Journal of Research in Information Technology (IJRIT) www.ijrit.com

ISSN 2001-5569

An Efficient Approach for Subspace Clustering By Using Cat Seeker

G. Rajasekar#1, T. Aravind*2

# Research Scholar, Department of Computer Science and Engineering, Muthayammal Engineering College, Namakkal
* Assistant Professor, Department of Computer Science and Engineering, Muthayammal Engineering College, Namakkal

1 [email protected], [email protected]

Abstract

Subspace clustering addresses clustering problems that require mining actionable subspaces defined by objects and attributes at the same time. Such subspaces can suggest profitable actions to decision makers. The CAT Seeker algorithm performs subspace clustering on three-dimensional databases of the form object-attribute-time, such as financial or biological databases, and finds the most profitable objects based on their centroid values. Furthermore, we propose a novel subspace clustering technique known as optimal centroid. This paper extends the CAT Seeker algorithm with optimal centroid values in order to reduce running time and to retrieve more profitable objects from the database. The optimal centroid allows the user to move the centroids according to the profitable objects in the database.

Keywords- Clustering, Data mining, Optimal centroid, Subspace clustering

1. INTRODUCTION

In the past few years, a variety of databases have been used to store information such as financial and biological data. These databases hold large data sets that maintain accounts and documents for users. The process of collecting and extracting the information a user requires from such databases is known as data mining. Clustering plays an important role in data mining, and a lot of work has been done in this area [1]. Clustering aims to group similar data (objects) from a large database; it is a technique for grouping together objects whose attributes share similar values. It can use a large number of variables, with no fixed limit.


Fig 1: Clustering

Simply put, clustering is the process of grouping physical or abstract objects into classes of similar objects [2]. The quality of a clustering result depends on both the similarity measure used by the method and its implementation. Clustering typically assumes that each instance is given a "hard" assignment to exactly one cluster.

APPLICATION

Clustering is a difficult combinatorial problem, and differences in assumptions and contexts across communities have made the transfer of useful generic concepts and methodologies slow to occur. Clustering also has important applications [3] such as image segmentation, object recognition and information retrieval. The main applications of clustering in data mining are:

• WWW: document classification; clustering weblog data to discover groups of similar access patterns.
• Economic science: particularly market investigation.
• Natural language processing: linguistic analysis, parsing, learning languages, hyphenation patterns.
• Image recognition and processing: segmentation, object recognition, texture recognition.
• Signal processing: adaptive filters, real-time signal analysis, radar, sonar, seismic, USG, EKG, EEG and other medical signals.
• Optimization: configuration of telephone connections, VLSI design, time series prediction, scheduling algorithms.

2. RELATED WORK

The usefulness and usability of subspace clusters are important issues in subspace clustering [4]. Subspace clustering is a branch of clustering that is able to find low-dimensional clusters in very high-dimensional datasets. This approach allows a system to find groups of users who share a common interest in a particular field or sub-field, regardless of differences in other fields. In high-dimensional datasets the number of potential subspaces is enormous: if the data has N dimensions, the number of possible subspaces is 2^N [5]. In this paper, we identify real-world problems that motivate the need for subspace clustering with actionability and with users' domain knowledge incorporated via centroids. The approach can be used to compare and find datasets in marketing, land use, insurance, city planning and many other areas.
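The growth in the number of candidate subspaces can be checked with a short enumeration. The following is only an illustrative sketch: the attribute names are hypothetical, and the count includes the empty subspace, which is what gives exactly 2^N.

```python
from itertools import combinations


def enumerate_subspaces(attributes):
    """Return every subset of the attribute list, including the empty one."""
    subspaces = []
    for size in range(len(attributes) + 1):
        subspaces.extend(combinations(attributes, size))
    return subspaces


# Hypothetical attribute names, used for illustration only.
attributes = ["price", "volume", "debt_ratio", "earnings"]
subspaces = enumerate_subspaces(attributes)

# With N = 4 attributes there are 2 ** 4 candidate subspaces.
print(len(subspaces), 2 ** len(attributes))
```

Each additional dimension doubles the count, which is why exhaustive search over subspaces quickly becomes impractical and pruning is needed.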

3. PROBLEM STATEMENT

In pattern-based subspace clustering, the values in a subspace cluster satisfy some distance- or similarity-based function, and these functions normally require thresholds to be set. The existing approach uses fixed centroids to handle and prune the dataset. With fixed centroids, the sensitivity to thresholds is mitigated, because the clustering results are not sensitive to the optimization parameter. However, methods of this kind that focus on two-dimensional datasets are not suitable for subspace clustering on three-dimensional datasets. We therefore use the 3D subspace clustering algorithm CAT Seeker with fixed centroids to mine CATSs (Centroid-based Actionable 3D Subspace clusters). It operates on three-dimensional (3D) datasets of the form object-attribute-time, for example the 'stock-ratio-year' data in the financial domain.

The fixed-centroid approach is a homogeneous model, i.e., only data of the same type is handled and returned. It assumes full domain knowledge, so with only partial knowledge of the domain it does not produce a proper result. It also partitions the objects into separate groups in order to maintain the dataset information. Its main drawback is therefore that an object belonging to multiple groups cannot be handled properly.
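The partitioning behaviour described above can be illustrated with a minimal sketch. It is not the CAT Seeker algorithm: the object-attribute-time values, the distance measure and the function names are assumptions chosen only to show that nearest-centroid assignment with fixed centroids places every object in exactly one group.

```python
# Hypothetical object-attribute-time data: data[obj][attr] is a list with
# one value per timestamp (for example 'stock-ratio-year').
data = {
    "stock_A": {"pe_ratio": [10.0, 11.0], "debt_ratio": [0.5, 0.25]},
    "stock_B": {"pe_ratio": [10.5, 11.5], "debt_ratio": [0.5, 0.5]},
    "stock_C": {"pe_ratio": [30.0, 31.0], "debt_ratio": [1.5, 1.25]},
    "stock_D": {"pe_ratio": [28.0, 29.0], "debt_ratio": [1.25, 1.5]},
}


def distance(data, a, b):
    """Mean absolute difference over all attributes and timestamps (assumed measure)."""
    total, count = 0.0, 0
    for attr in data[a]:
        for va, vb in zip(data[a][attr], data[b][attr]):
            total += abs(va - vb)
            count += 1
    return total / count


def hard_assign(data, centroids):
    """Assign each object to its single nearest fixed centroid."""
    groups = {c: [] for c in centroids}
    for obj in data:
        nearest = min(centroids, key=lambda c: distance(data, obj, c))
        groups[nearest].append(obj)
    return groups


groups = hard_assign(data, ["stock_A", "stock_C"])
print(groups)
```

Because `min` returns a single centroid per object, an object that is similar to several centroids still ends up in only one group, which is the limitation the proposed work targets.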


4. PROPOSED WORK

We propose a new algorithm, CAT Seeker with optimal centroids, for handling multiple groups at the same time. The optimal centroid technique works on a heterogeneous model, i.e., it compares multiple groups of data and provides the appropriate result. Partial domain knowledge is enough to obtain results. CATSeeker uses SVD to prune the search space: the SVD pruning algorithm detects values with high homogeneity. CATSeeker allows users to incorporate their domain knowledge by selecting their preferred objects as centroids of the actionable subspace clusters. We denote such clusters as centroid-based, actionable 3D subspace clusters (CATSs), and we denote utility as a function measuring the profits or benefits of the objects.

GS-search [7] and MASC [8] 'flatten' the continuous-valued 3D dataset into a dataset with a single timestamp. They require the clusters to occur in every timestamp, so it is hard to find clusters in a dataset that has a large number of timestamps. CATSeeker, TRICLUSTER [9] and MIC [10] use the concept of a subspace in all three dimensions; that is, they mine 3D subspace clusters that are subsets of attributes and subsets of timestamps.
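The idea of user-chosen centroids combined with a utility function can be sketched as follows. This is a simplified illustration, not the CATSeeker procedure: it omits the SVD pruning step, uses a single attribute, and the homogeneity formula, thresholds, data values and function names are all assumptions. It shows how an object can appear in the clusters of more than one centroid when it is homogeneous to each of them and has sufficient utility.

```python
def homogeneity(values_a, values_b):
    """Similarity in (0, 1]; 1 means identical series (assumed measure)."""
    diffs = [abs(a - b) for a, b in zip(values_a, values_b)]
    return 1.0 / (1.0 + sum(diffs) / len(diffs))


def mine_centroid_clusters(series, utility, centroids,
                           min_homogeneity, min_utility):
    """For each centroid, keep objects that are similar to it and profitable."""
    clusters = {}
    for c in centroids:
        clusters[c] = [
            obj for obj in series
            if utility[obj] >= min_utility
            and homogeneity(series[obj], series[c]) >= min_homogeneity
        ]
    return clusters


# Hypothetical values of one attribute over two timestamps per object.
series = {
    "s1": [1.0, 1.0],
    "s2": [1.5, 1.5],
    "s3": [2.0, 2.0],
    "s4": [5.0, 5.0],
}
# Hypothetical utility (profit) of each object.
utility = {"s1": 0.25, "s2": 0.5, "s3": 0.25, "s4": -0.5}

clusters = mine_centroid_clusters(series, utility, ["s1", "s3"],
                                  min_homogeneity=0.6, min_utility=0.0)
print(clusters)
```

In this toy data, object `s2` lies between the two chosen centroids and is kept in both clusters, while `s4` is dropped by both because of its negative utility and low homogeneity.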

5. CONCLUSION AND FUTURE WORK

Subspace clustering with optimal centroids is expected to improve efficiency and reduce running time. It also allows domain knowledge to be incorporated. The CAT Seeker algorithm supports three-dimensional databases only. In future work, we plan to investigate an algorithm that supports four-dimensional datasets of the form object-attribute-time-place.

REFERENCES

[1] Karin Kailing, Hans-Peter Kriegel and Peer Kröger, "Density Connected Subspace Clustering for High dimensional Data," SIAM, pp. 246-256.

[2] Jerzy Stefanowski, "Data Mining – Clustering," Institute of Computing Sciences, Poznan University of Technology, Poznan, Poland, Lecture 7, SE Master Course, 2008/2009.

[3] A.K. Jain, M.N. Murty and P.J. Flynn, "Data Clustering - A Review," Michigan State University, 2008.

[4] H.P. Kriegel, P. Kröger and A. Zimek, "Clustering high dimensional data: A survey on subspace clustering, pattern based clustering, and correlation clustering," ACM Transaction Knowledge Disc Data, 3, pp. 1–58, 2009.

[5] Nitin Agarwal, Ehtesham Haque, Huan Liu and Lance Parsons, "A Subspace Clustering Framework for Research Group Collaboration," Department of Computer Science Engineering, Arizona State University, Tempe, AZ 85281.

[6] Kelvin Sim, Ghim-Eng Yap, David R. Hardoon, Vivekanand Gopalkrishnan, Gao Cong and Suryani Lukman, "Centroid-based Actionable 3D Subspace Clustering," IEEE trans on knowl and data engineering, vol. 25, I-6, June 2013.

[7] D. Jiang, J. Pei, M. Ramanathan, C. Tang and A. Zhang, "Mining coherent gene clusters from gene-sample-time microarray data," In KDD, pp. 430–439, 2004.

[8] K. Sim, A.K. Poernomo and V. Gopalkrishnan, "Mining actionable subspace clusters in sequential data," In SDM, pp. 442–453, 2010.

[9] L. Zhao and M.J. Zaki, "TRICLUSTER: An effective algorithm for mining coherent clusters in 3D microarray data," In SIGMOD, pp. 694–705, 2005.

[10] K. Sim, Z. Aung and V. Gopalkrishnan, "Discovering correlated subspace clusters in 3D continuous-valued data," In ICDM, pp. 471–480, 2010.
