Density Conscious Subspace Clustering for High Dimensional Data using Genetic Algorithms

Subspace clustering approaches cluster high dimensional data in different subspaces; that is, they group the data using different relevant subsets of dimensions. This technique has become very effective because distance measures become ineffective in a high dimensional space. This chapter presents SUBSPACE_DE, a novel evolutionary approach to bottom-up subspace clustering that is scalable to high dimensional data. SUBSPACE_DE uses a self-adaptive DBSCAN algorithm to cluster the data instances of each attribute and of the maximal subspaces. The self-adaptive DBSCAN algorithm takes its input from a differential evolution algorithm. The proposed SUBSPACE_DE algorithm is tested on 14 datasets, both real and synthetic, and compared with 11 existing subspace clustering algorithms using evaluation metrics such as F1_Measure and accuracy. In the performance analysis, the proposed algorithm ranks considerably better on a success rate ratio ranking for both accuracy and F1_Measure. SUBSPACE_DE also shows potential scalability on high dimensional datasets.
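The per-attribute clustering step mentioned in the abstract can be illustrated with a minimal sketch. It is not the SUBSPACE_DE implementation: the function names `dbscan_1d` and `cluster_each_attribute` are hypothetical, the DBSCAN parameters `eps` and `min_pts` are fixed by hand here rather than supplied by differential evolution, and the maximal-subspace stage is left out.

```python
from collections import deque


def dbscan_1d(values, eps, min_pts):
    """Plain DBSCAN on one attribute: returns a cluster id per value, -1 for noise."""
    n = len(values)
    # Neighbourhood of i: every index within eps of values[i], including i itself.
    neighbours = [
        [j for j in range(n) if abs(values[i] - values[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        labels[i] = cluster
        queue = deque(neighbours[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # only core points expand the cluster
        cluster += 1
    return labels


def cluster_each_attribute(rows, eps, min_pts):
    """Run the 1-D clustering independently on every attribute (column)."""
    columns = list(zip(*rows))
    return [dbscan_1d(col, eps, min_pts) for col in columns]


# Assumed toy data: attribute 0 has two dense groups, attribute 1 has none.
rows = [(1.0, 10.0), (1.1, 50.0), (1.2, 90.0),
        (5.0, 10.1), (5.1, 50.2), (5.2, 90.2)]
per_attribute = cluster_each_attribute(rows, eps=0.3, min_pts=3)
```

In a self-adaptive variant, an optimizer such as differential evolution would propose `eps` and `min_pts` instead of the hand-picked values used above.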

A novel algorithm for fast and scalable subspace clustering of high-dimensional data

Journal Of Big Data ◽

10.1186/s40537-015-0027-y ◽

2015 ◽

Vol 2 (1) ◽

Cited By ~ 16

Author(s):

Amardeep Kaur ◽

Amitava Datta

Keyword(s):

High Dimensional Data ◽

Subspace Clustering ◽

High Dimensional ◽

Novel Algorithm

Dynamic Sparse Subspace Clustering for Evolving High-Dimensional Data Streams

IEEE Transactions on Cybernetics ◽

10.1109/tcyb.2020.3023973 ◽

2020 ◽

pp. 1-14

Author(s):

Jinping Sui ◽

Zhen Liu ◽

Li Liu ◽

Alexander Jung ◽

Xiang Li

Keyword(s):

Data Streams ◽

High Dimensional Data ◽

Subspace Clustering ◽

High Dimensional ◽

Sparse Subspace Clustering

A feature group weighting method for subspace clustering of high-dimensional data

Pattern Recognition ◽

10.1016/j.patcog.2011.06.004 ◽

2012 ◽

Vol 45 (1) ◽

pp. 434-446 ◽

Cited By ~ 83

Author(s):

Xiaojun Chen ◽

Yunming Ye ◽

Xiaofei Xu ◽

Joshua Zhexue Huang

Keyword(s):

High Dimensional Data ◽

Subspace Clustering ◽

High Dimensional ◽

Weighting Method ◽

Feature Group

RMSC: Robust modeling of subspace clustering for high dimensional data

2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI) ◽

10.1109/icacci.2017.8126059 ◽

2017 ◽

Author(s):

K. R. Radhika ◽

C. N. Pushpa ◽

J. Thriveni ◽

K. R. Venugopal

Keyword(s):

High Dimensional Data ◽

Subspace Clustering ◽

High Dimensional ◽

Robust Modeling

A New Approach for Subspace Clustering of High Dimensional Data

International Journal of Computer Science and Application ◽

10.14355/ijcsa.2014.0302.02 ◽

2014 ◽

Vol 3 (2) ◽

pp. 74

Author(s):

M. Suguna ◽

S. Palaniammal

Keyword(s):

High Dimensional Data ◽

Subspace Clustering ◽

High Dimensional ◽

New Approach

Clustering High-Dimensional Data Stream: A Survey on Subspace Clustering, Projected Clustering on Bioinformatics Applications

Advanced Science Engineering and Medicine ◽

10.1166/asem.2016.1915 ◽

2016 ◽

Vol 8 (9) ◽

pp. 749-757

Author(s):

Ali Baghernia ◽

Hamid Pavin ◽

Miresmail Mirnabibaboli ◽

Hamid Alinejad-Rokny

Keyword(s):

Data Stream ◽

High Dimensional Data ◽

Subspace Clustering ◽

High Dimensional ◽

Projected Clustering

Automatic subspace clustering of high dimensional data for data mining applications

ACM SIGMOD Record ◽

10.1145/276305.276314 ◽

1998 ◽

Vol 27 (2) ◽

pp. 94-105 ◽

Cited By ~ 378

Author(s):

Rakesh Agrawal ◽

Johannes Gehrke ◽

Dimitrios Gunopulos ◽

Prabhakar Raghavan

Keyword(s):

Data Mining ◽

High Dimensional Data ◽

Subspace Clustering ◽

High Dimensional

A Preview on Subspace Clustering of High Dimensional Data

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽

10.24297/ijct.v6i3.4466 ◽

2013 ◽

Vol 6 (3) ◽

pp. 441-448 ◽

Cited By ~ 1

Author(s):

Sajid Nagi ◽

Dhruba Kumar Bhattacharyya ◽

Jugal K. Kalita

Keyword(s):

Search Strategy ◽

High Dimensional Data ◽

Subspace Clustering ◽

High Dimensional ◽

Expression Data ◽

Clustering Methods ◽

Top Down ◽

Data Points ◽

Low Dimensional ◽

Entire Dataset

When clustering high dimensional data, traditional clustering methods are found to be lacking because they consider all of the dimensions of the dataset when discovering clusters, whereas only some of the dimensions are relevant. This may give rise to subspaces within the dataset where clusters may be found. Using feature selection, we can remove irrelevant and redundant dimensions by analyzing the entire dataset. The problem of automatically identifying clusters that exist in multiple, possibly overlapping subspaces of high dimensional data, allowing better clustering of the data points, is known as Subspace Clustering. There are two major approaches to subspace clustering based on search strategy. Top-down algorithms find an initial clustering in the full set of dimensions and evaluate the subspaces of each cluster, iteratively improving the results. Bottom-up approaches start by finding low dimensional dense regions and then use them to form clusters. Based on a survey of subspace clustering, we identify the challenges and issues involved with clustering gene expression data.
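The bottom-up strategy described above can be sketched as follows. This is an illustrative, grid-based example and not a method taken from the survey: it assumes every attribute is scaled to the range [`lo`, `hi`], treats a grid cell as dense when it holds at least `min_count` rows, and stops at two-dimensional subspaces. The names `dense_units` and `bottom_up` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def dense_units(rows, dims, bins, lo, hi, min_count):
    """Return the grid cells over `dims` that contain at least `min_count` rows."""
    width = (hi - lo) / bins
    counts = Counter()
    for row in rows:
        # Map each selected coordinate to a bin index; clamp hi into the last bin.
        cell = tuple(min(int((row[d] - lo) / width), bins - 1) for d in dims)
        counts[cell] += 1
    return {cell for cell, count in counts.items() if count >= min_count}


def bottom_up(rows, bins=4, lo=0.0, hi=1.0, min_count=3):
    """Find dense 1-D intervals first, then test only pairs of those dimensions."""
    n_dims = len(rows[0])
    result = {}
    # Step 1: low dimensional dense regions, one dimension at a time.
    for d in range(n_dims):
        units = dense_units(rows, (d,), bins, lo, hi, min_count)
        if units:
            result[(d,)] = units
    # Step 2: a 2-D cell can only be dense if both of its 1-D projections are,
    # so dimensions without any dense interval are pruned before combining.
    dense_dims = [dims[0] for dims in result]
    for pair in combinations(dense_dims, 2):
        units = dense_units(rows, pair, bins, lo, hi, min_count)
        if units:
            result[pair] = units
    return result


# Assumed toy data: a cluster lives in dimensions 0 and 1; dimension 2 is spread out.
rows = [
    (0.10, 0.80, 0.05),
    (0.12, 0.82, 0.30),
    (0.15, 0.85, 0.55),
    (0.18, 0.88, 0.80),
    (0.60, 0.30, 0.95),
]
subspaces = bottom_up(rows)
```

The pruning in step 2 is what keeps a bottom-up search tractable: subspaces are built only from dimensions that already showed a dense region, so dimension 2 is never combined with the others in this example.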
