Clustering High Dimensional Data Using Subspace and Projected Clustering Algorithms

Author(s):  
Rahmat Widia Sembiring ◽  
Jasni Mohamad Zain ◽  
Abdullah Embong
Author(s):  
Parul Agarwal ◽  
Shikha Mehta

Subspace clustering approaches cluster high dimensional data in different subspaces, that is, they group the data using different relevant subsets of dimensions. The technique has become important because distance measures lose their effectiveness in high dimensional spaces. This chapter presents SUBSPACE_DE, a novel evolutionary approach to bottom-up subspace clustering that is scalable to high dimensional data. SUBSPACE_DE uses a self-adaptive DBSCAN algorithm to perform clustering on the data instances of each attribute and on the maximal subspaces, where the self-adaptive DBSCAN accepts its input parameters from a differential evolution algorithm. The proposed SUBSPACE_DE algorithm is tested on 14 datasets, both real and synthetic, and compared with 11 existing subspace clustering algorithms using evaluation metrics such as F1_Measure and accuracy. Performance of the proposed algorithm is considerably better on the success rate ratio ranking in both accuracy and F1_Measure, and SUBSPACE_DE also shows potential scalability on high dimensional datasets.
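The abstract gives no implementation details, but the interplay it describes, a differential evolution loop supplying parameters to a self-adaptive DBSCAN, can be sketched roughly as follows. This is a minimal illustration using scipy and scikit-learn; the silhouette-based fitness function, the parameter bounds, and the synthetic data are assumptions made here for demonstration, not the published SUBSPACE_DE design.

```python
# Minimal sketch: tuning DBSCAN's eps and min_samples with differential
# evolution, in the spirit of a "self-adaptive DBSCAN".  The fitness function
# (negative silhouette score), the parameter bounds, and the synthetic data
# are illustrative assumptions, not the published SUBSPACE_DE design.
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, n_features=10, centers=4, random_state=0)

def fitness(params):
    eps, min_samples = params[0], int(round(params[1]))
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    # Penalise degenerate solutions (everything noise or a single cluster).
    if len(set(labels) - {-1}) < 2:
        return 1.0
    mask = labels != -1
    return -silhouette_score(X[mask], labels[mask])

bounds = [(0.1, 5.0), (2, 20)]   # search ranges for eps and min_samples
result = differential_evolution(fitness, bounds, seed=0, maxiter=30)
best_eps, best_min = result.x[0], int(round(result.x[1]))
print("best eps:", best_eps, "best min_samples:", best_min)
```

In this sketch the evolutionary search simply picks the DBSCAN parameters that maximise silhouette quality on one dataset; the actual SUBSPACE_DE procedure applies such adaptation per attribute and per maximal subspace.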


Author(s):  
Ping Deng ◽  
Qingkai Ma ◽  
Weili Wu

Clustering can be considered the most important unsupervised learning problem. It has been discussed thoroughly by both the statistics and database communities due to its numerous applications in problems such as classification, machine learning, and data mining. A summary of clustering techniques can be found in (Berkhin, 2002). Most well-known clustering algorithms such as DBSCAN (Ester, Kriegel, Sander, & Xu, 1996) and CURE (Guha, Rastogi, & Shim, 1998) cluster data points based on full dimensions. When the dimensional space grows higher, these algorithms lose their efficiency and accuracy because of the so-called "curse of dimensionality". It is shown in (Beyer, Goldstein, Ramakrishnan, & Shaft, 1999) that computing distances based on full dimensions is not meaningful in high dimensional space, since the distance of a point to its nearest neighbor approaches its distance to its farthest neighbor as dimensionality increases. In fact, natural clusters might exist in subspaces: data points in different clusters may be correlated with respect to different subsets of dimensions. To address this problem, feature selection (Kohavi & Sommerfield, 1995) and dimension reduction (Raymer, Punch, Goodman, Kuhn, & Jain, 2000) have been proposed to find the dimensions that are closely correlated for all the data and the clusters in those dimensions. Although both methods reduce the dimensionality of the space before clustering, they do not handle well the case where clusters exist in different subspaces of the full dimensions. Projected clustering has been proposed recently to deal effectively with high dimensionalities. Finding clusters and their relevant dimensions is the objective of projected clustering algorithms. Instead of projecting the entire dataset onto the same subspace, projected clustering focuses on finding a specific projection for each cluster such that similarity is preserved as much as possible.
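The nearest-neighbor/farthest-neighbor observation cited from Beyer et al. is easy to reproduce empirically. The sketch below is a rough illustration only; the uniform data distribution and the chosen dimensions are arbitrary assumptions. It shows the ratio of the nearest to the farthest distance from a query point approaching 1 as dimensionality grows.

```python
# Rough empirical illustration of distance concentration: as dimensionality
# grows, the nearest and farthest neighbours of a query point become almost
# equally distant.  The data distribution and dimensions are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    points = rng.uniform(size=(1000, d))
    query = rng.uniform(size=d)
    dists = np.linalg.norm(points - query, axis=1)
    print(f"d={d:5d}  nearest/farthest = {dists.min() / dists.max():.3f}")
```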


Clustering is a data mining task devoted to the automatic grouping of data based on mutual similarity. Clustering in high-dimensional spaces is a recurrent problem in many domains; it affects the time complexity, space complexity, scalability, and accuracy of clustering methods. High-dimensional non-linear data usually live in different low-dimensional subspaces hidden in the original space, and as high-dimensional objects appear almost alike, new approaches for clustering are required. This research focuses on developing mathematical models, techniques, and clustering algorithms specifically for high-dimensional data. With the rapid growth in the fields of communication and technology, there is tremendous growth in high dimensional data spaces. As the number of dimensions in high dimensional non-linear data increases, many clustering techniques begin to suffer from the curse of dimensionality, degrading the quality of their results. In high dimensional non-linear data, the data become very sparse and distance measures become increasingly meaningless. The principal challenge for clustering high dimensional data is therefore to overcome the "curse of dimensionality". This work concentrates on devising an enhanced algorithm for clustering high dimensional non-linear data.
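As a rough illustration of clusters that live in low-dimensional subspaces hidden in the original space (the synthetic data, dimension counts, and separation below are assumptions made purely for demonstration), the following sketch generates two clusters that are well separated in their three relevant dimensions but nearly indistinguishable when all 100 dimensions are compared.

```python
# Rough illustration (synthetic data, arbitrary sizes): two clusters that
# differ only in 3 of 100 dimensions.  In the relevant subspace the
# between-cluster distances clearly exceed the within-cluster ones; in the
# full space the two become nearly indistinguishable.
import numpy as np

rng = np.random.default_rng(1)
n, d, relevant = 200, 100, 3
labels = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, d))                  # irrelevant noise dimensions
X[:, :relevant] += labels[:, None] * 3.0     # shift only the relevant dimensions

def ratio(data):
    # mean between-cluster distance / mean within-cluster distance
    a, b = data[labels == 0], data[labels == 1]
    between = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2).mean()
    within = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=2).mean()
    return between / within

print("between/within, relevant 3 dims:", ratio(X[:, :relevant]))
print("between/within, all 100 dims   :", ratio(X))
```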


Author(s):  
Charu C. Aggarwal ◽  
Jiawei Han ◽  
Jianyong Wang ◽  
Philip S. Yu

2018 ◽  
Vol 7 (2.21) ◽  
pp. 291
Author(s):  
S Sivakumar ◽  
Kumar Narayanan ◽  
Swaraj Paul Chinnaraju ◽  
Senthil Kumar Janahan

Extraction of useful data from a data set is known as data mining. Clustering is a key data mining process; it helps a user divide and recognize groups of data within records according to some similarity measure. Clustering high dimensional data has been a major challenge. Most existing clustering algorithms are inefficient when the required similarity is computed between data points in the full dimensional space. A variety of projected clustering algorithms have been proposed to address these problems, but many of them face difficulties when clusters hide in subspaces of low dimensionality. These challenges motivate us to propose a partitional distance-based projected clustering algorithm. The proposed work is designed to detect projected clusters in high dimensional space by adapting an improved k-medoids algorithm. The second phase aims to remove outliers, while the third phase finds clusters in different subspaces. The clustering technique is based on the k-medoids algorithm, with the distance measure restricted to the sets of attributes where the values are dense.
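The abstract does not recover the algorithmic details, but the partitional building block it adapts, a k-medoids style alternation of assignment and medoid update, can be sketched as follows. This is a generic minimal sketch in numpy: the Manhattan metric, the initialisation, and the data are assumptions, and the outlier removal and per-cluster attribute selection phases of the proposed method are not shown.

```python
# Minimal k-medoids sketch (alternating assignment / medoid update), the
# partitional building block that a projected clustering method can adapt.
# Metric, initialisation, and data are generic assumptions; the outlier
# removal and per-cluster subspace selection phases are not shown.
import numpy as np

def k_medoids(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Pairwise Manhattan distances; a metric restricted to each cluster's
    # dense attributes could be substituted here.
    dist = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)        # assignment step
        new_medoids = medoids.copy()
        for c in range(k):                                   # medoid update step
            members = np.where(labels == c)[0]
            if len(members) == 0:
                continue
            costs = dist[np.ix_(members, members)].sum(axis=0)
            new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)             # final assignment
    return medoids, labels

# Tiny usage example with two well-separated synthetic clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=0.0, size=(50, 5)),
               rng.normal(loc=5.0, size=(50, 5))])
medoids, labels = k_medoids(X, k=2)
print("medoid indices:", medoids, "cluster sizes:", np.bincount(labels))
```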

