A Survey on Various Clustering Algorithms in High Dimensional Data

2019, pp. 47-74, Cited By ~ 1

Author(s): Parul Agarwal, Shikha Mehta

Keyword(s): Differential Evolution, Distance Measure, Dimensional Space, Clustering Algorithms, High Dimensional Data, Subspace Clustering, High Dimensional, DBSCAN Clustering, Evolution Algorithms, Self Adaptive

Subspace clustering approaches cluster high dimensional data in different subspaces, that is, they group the data using different relevant subsets of dimensions. This technique has become very effective because distance measures become ineffective in a high dimensional space. This chapter presents a novel evolutionary approach to bottom-up subspace clustering, SUBSPACE_DE, which is scalable to high dimensional data. SUBSPACE_DE uses a self-adaptive DBSCAN algorithm to cluster the data instances of each attribute and of the maximal subspaces. The self-adaptive DBSCAN algorithm accepts its input from a differential evolution algorithm. The proposed SUBSPACE_DE algorithm is tested on 14 datasets, both real and synthetic, and compared with 11 existing subspace clustering algorithms using evaluation metrics such as F1_Measure and accuracy. On a success rate ratio ranking, the proposed algorithm performs considerably better in both accuracy and F1_Measure. SUBSPACE_DE also shows potential scalability on high dimensional datasets.
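The first bottom-up step described above, density-based clustering of the data instances of each single attribute, can be sketched as follows. This is a minimal illustration and not the SUBSPACE_DE implementation: the helpers `dbscan_1d` and `cluster_each_attribute` are hypothetical names, and `eps` and `min_pts` are fixed by hand here, whereas the chapter states that the self-adaptive DBSCAN receives its input from differential evolution.

```python
def dbscan_1d(values, eps, min_pts):
    """Label 1-D values with DBSCAN semantics: -1 is noise, 0, 1, ... are cluster ids."""
    n = len(values)
    # Neighbourhoods include the point itself, as in the usual DBSCAN definition.
    neighbours = [
        [j for j in range(n) if abs(values[i] - values[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_pts for nb in neighbours]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        # Grow a new cluster from this unlabelled core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neighbours[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    # Only core points keep expanding the cluster;
                    # border points are labelled but not expanded.
                    if core[q]:
                        stack.append(q)
        cluster += 1
    return labels


def cluster_each_attribute(rows, eps, min_pts):
    """Run dbscan_1d independently on every column of a row-major dataset."""
    n_attrs = len(rows[0])
    return [
        dbscan_1d([row[a] for row in rows], eps, min_pts)
        for a in range(n_attrs)
    ]
```

An attribute on which every instance is labelled -1 carries no dense region under the chosen parameters, so a bottom-up method would typically not combine it into higher-dimensional candidate subspaces.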


Efficiency and Effectiveness of Clustering Algorithms for High Dimensional Data

International Journal of Computer Applications, DOI 10.5120/ijca2015906144, 2015, Vol 125 (11), pp. 35-40

Author(s): Smita Chormunge, Sudarson Jena

Keyword(s): Clustering Algorithms, High Dimensional Data, High Dimensional, Efficiency And Effectiveness


M-Denclue for Effective Data Clustering in High Dimensional Non-Linear Data

International Journal of Innovative Technology and Exploring Engineering - Special Issue, DOI 10.35940/ijitee.a9109.119119, 2019, Vol 9 (1), pp. 2925-2927

Keyword(s): Clustering Algorithms, High Dimensional Data, Research Work, Curse Of Dimensionality, Distance Measures, High Dimensional, Clustering Methods, Non Linear, Low Dimensional, Automatic Grouping

Clustering is a data mining task devoted to the automatic grouping of data based on mutual similarity. Clustering in high-dimensional spaces is a recurrent problem in many domains. High dimensionality affects the time complexity, space complexity, scalability and accuracy of clustering methods. High-dimensional non-linear data usually live in different low-dimensional subspaces hidden in the original space. As high-dimensional objects appear almost alike, new approaches for clustering are required. This research has focused on developing mathematical models, techniques and clustering algorithms specifically for high-dimensional data. With the growth in the fields of communication and technology, there is tremendous growth in high-dimensional data spaces. As the number of dimensions of high-dimensional non-linear data increases, many clustering techniques begin to suffer from the curse of dimensionality, degrading the quality of the results. In high-dimensional non-linear data, the data become very sparse and distance measures become increasingly meaningless. The principal challenge for clustering high-dimensional data is to overcome the “curse of dimensionality”. This research work concentrates on devising an enhanced algorithm for clustering high-dimensional non-linear data.
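The statement that distance measures become increasingly meaningless in sparse high-dimensional data can be illustrated numerically. The following is a minimal sketch and is not taken from the paper: it assumes points drawn uniformly at random from the unit hypercube, uses Euclidean distance, and introduces a hypothetical helper `relative_contrast` that compares the farthest and nearest neighbour of a random query point.

```python
import math
import random


def relative_contrast(n_points: int, n_dims: int, seed: int = 0) -> float:
    """Return (d_max - d_min) / d_min for the Euclidean distances from one
    random query point to n_points uniform random points in [0, 1]^n_dims."""
    rng = random.Random(seed)  # seeded so repeated runs give the same value
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast shrinks as the number of dimensions grows.
    for dims in (2, 10, 100, 1000):
        print(dims, round(relative_contrast(500, dims), 3))
```

When the relative contrast approaches zero, the nearest and farthest neighbours are almost equally far away, which is one way of making precise the observation that high-dimensional objects appear almost alike.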


Clustering for High Dimensional Data: Density based Subspace Clustering Algorithms

International Journal of Computer Applications, DOI 10.5120/10584-5732, 2013, Vol 63 (20), pp. 29-35, Cited By ~ 1

Author(s): Sunita Jahirabadkar, Parag Kulkarni

Keyword(s): Clustering Algorithms, High Dimensional Data, Subspace Clustering, High Dimensional, Data Density


Clustering High Dimensional Data Using Subspace and Projected Clustering Algorithms

International Journal of Computer Science and Information Technology, DOI 10.5121/ijcsit.2010.2414, 2010, Vol 2 (4), pp. 162-170, Cited By ~ 7

Author(s): Rahmat Widia Sembiring, Jasni Mohamad Zain, Abdullah Embong

Keyword(s): Clustering Algorithms, High Dimensional Data, High Dimensional, Projected Clustering


Clustering Algorithms For High Dimensional Data – A Survey Of Issues And Existing Approaches

International Journal of Computer Science and Informatics, DOI 10.47893/ijcsi.2013.1108, 2013, pp. 293-299

Author(s): B. Hari Babu, N. Subash Chandra, T. Venu Gopal

Keyword(s): Dimensional Space, Clustering Algorithms, High Dimensional Data, Microarray Gene Expression Data, Distance Measures, High Dimensional, Data Mining Technique, Microarray Gene Expression, Redundancy Elimination, Different Types

Clustering is the most prominent data mining technique used for grouping data into clusters based on distance measures. With the growth of high dimensional data such as microarray gene expression data, grouping such data into clusters runs into the problem that similarity between objects in the full dimensional space is often invalid, because the space contains different types of data. Grouping high dimensional data into clusters is not accurate, and perhaps not up to the level of expectation, when the dimension of the dataset is high. The problem is now attracting tremendous attention in research and development. To address the performance issues of clustering high dimensional data, it is necessary to study, analyze and improve issues such as dimensionality reduction, redundancy elimination, subspace clustering, co-clustering and data labeling for clusters. In this paper, we present a brief comparison of existing algorithms that focus mainly on clustering high dimensional data.


Subspace Clustering for High-Dimensional Data Using Cluster Structure Similarity

International Journal of Intelligent Information Technologies, DOI 10.4018/ijiit.2018070103, 2018, Vol 14 (3), pp. 38-55, Cited By ~ 2

Author(s): Kavan Fatehi, Mohsen Rezvani, Mansoor Fateh, Mohammad-Reza Pajoohan

Keyword(s): Similarity Measure, State Of The Art, Clustering Algorithms, Cluster Structure, High Dimensional Data, Subspace Clustering, The State, High Dimensional, Running Time, Structure Similarity

This article describes how, because of the curse of dimensionality in high dimensional data, a significant amount of recent research has been conducted on subspace clustering, which aims at discovering clusters embedded in any possible combination of attributes. The main goal of subspace clustering algorithms is to find all clusters in all subspaces. Previous studies have mostly generated redundant subspace clusters, which reduces clustering accuracy and increases the running time of the algorithms. A bottom-up density-based approach is suggested in this article, in which the cluster structure serves as a similarity measure to generate the optimal subspaces, which in turn raises the accuracy of the subspace clustering. Based on this idea, the algorithm discovers similar subspaces by considering similarity in their cluster structure, then combines them and clusters the data again in the new subspaces. Finally, the algorithm determines all the subspaces and also finds all clusters within them. Experiments on various synthetic and real datasets show that the results of the proposed approach are significantly better in quality and runtime than the state-of-the-art on clustering high-dimensional data.


Urban green economic development indicators based on spatial clustering algorithm and blockchain

Journal of Intelligent & Fuzzy Systems, DOI 10.3233/jifs-189535, 2020, pp. 1-12

Author(s): Xiaoguang Gao

Keyword(s): Development Strategy, Clustering Algorithm, Spatial Clustering, Clustering Algorithms, High Dimensional Data, Large Data, Experimental Comparison, High Dimensional, Density Peak, Data Set

An unbalanced development strategy makes regional development unbalanced. Therefore, in the development process, resources must be effectively utilized according to the level and characteristics of each region. Considering resource and environmental constraints, this paper measures and analyzes China’s green economic efficiency and green total factor productivity. Moreover, by expounding the characteristics of high-dimensional data, this paper points out the problems of traditional clustering algorithms in high-dimensional data clustering. This paper proposes a density peak clustering algorithm based on sampling and residual squares, which is suitable for large high-dimensional data sets. The algorithm finds abnormal points and boundary points by identifying halo points, and finally determines the clusters. In addition, the experimental comparison on the data set shows that the improved algorithm is better than the DPC algorithm in both time complexity and clustering results. Finally, this article analyzes data based on actual cases. The research results show that the method proposed in this paper is effective.
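For orientation, the two quantities on which baseline density peak clustering (the DPC algorithm used as the comparison above) relies can be sketched as follows. This is a minimal illustration of the baseline only, under stated assumptions: it uses a hard cutoff distance `d_c` and Euclidean distance, the helper name `density_peak_scores` is hypothetical, and the sampling, residual-square and halo-point steps of the proposed algorithm are not reproduced.

```python
import math


def density_peak_scores(points, d_c):
    """Return (rho, delta) for each point, as in baseline density peak clustering.

    rho[i]   = number of other points closer to point i than the cutoff d_c
    delta[i] = distance from point i to the nearest point with strictly
               higher rho; for a point with no denser point, the largest
               distance from it to any other point
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


if __name__ == "__main__":
    pts = [(0.5, 0.5), (0, 0), (0, 1), (1, 0), (1, 1),
           (10.5, 10.5), (10, 10), (10, 11), (11, 10)]
    rho, delta = density_peak_scores(pts, d_c=0.8)
    # Points with both large rho and large delta are candidate cluster centres.
    for p, r, d in zip(pts, rho, delta):
        print(p, r, round(d, 2))
```

Candidate centres are the points that combine a high local density with a large distance to any denser point; the remaining points are then usually assigned to the cluster of their nearest denser neighbour.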


Cross Breed Clustering Algorithm for High Dimensional Data

International Journal of Innovative Technology and Exploring Engineering - Special Issue, DOI 10.35940/ijitee.a5313.119119, 2019, Vol 9 (1), pp. 5049-5052

Keyword(s): Machine Learning, Deep Learning, Data Clustering, Clustering Algorithm, Clustering Algorithms, High Dimensional Data, High Dimensional, Growing Domain, Present World

Clustering plays a major role in machine learning and in data mining. Deep learning is a fast-growing domain in the present world. The quality of clustering results can be improved by adopting deep learning algorithms. Many clustering algorithms process various datasets to get better results. For high dimensional data, however, it is still difficult to obtain quality clustering results with the existing clustering algorithms. In this paper, the cross breed clustering algorithm for high dimensional data is utilized. Various datasets are used to get the results.


High dimensional data clustering

APTIKOM Journal on Computer Science and Information Technologies, DOI 10.11591/aptikom.j.csit.82, 2018, Vol 3 (1), pp. 21-30

Author(s): M. Pavithra, R.M.S. Parvathi

Keyword(s): Clustering Algorithms, High Dimensional Data, Quality Metrics, Experimental Results, High Dimensional, Future Research, High Quality, Feature Subspace, Data Points, Visualization Techniques

Clustering becomes difficult due to the increasing sparsity of high-dimensional data, as well as the increasing difficulty in distinguishing distances between data points. The proposed methods, called “kernel trick” and “Collective Neighbour Clustering”, take as input measures of correspondence between pairs of data points. Real-valued hubs are exchanged between data points until a high-quality set of patterns and corresponding clusters gradually emerges [2]. We validate our theory by demonstrating that hubness is a high-quality measure of point centrality within a high dimensional information cluster, and by proposing several hubness-based clustering algorithms, showing that main hubs can be used effectively as cluster prototypes or as guides during the search for centroid-based cluster patterns [4]. Experimental results demonstrate the good performance of our proposed algorithms in manifold settings, mainly focused on large quantities of overlapping noise. The proposed methods are designed mostly for detecting approximately hyperspherical clusters and need to be extended to properly handle clusters of arbitrary shapes [6]. For this purpose, we provide an overview of approaches that use quality metrics in high-dimensional data visualization and propose a systematization based on a thorough literature review. We carefully analyze the papers and derive a set of factors for discriminating the quality metrics, visualization techniques, and the process itself [10]. The process is described through a reworked version of the well-known information visualization pipeline. We demonstrate the usefulness of our model by applying it to several existing approaches that use quality metrics, and we provide reflections on implications of our model for future research. High-dimensional data arise naturally in many domains, and have regularly presented a great challenge for traditional data-mining techniques, both in terms of effectiveness and efficiency [7].
Clustering becomes difficult due to the increasing sparsity of such data, as well as the increasing difficulty in distinguishing distances between data points. In this paper we take a novel perspective on the problem of clustering high-dimensional data [8]. Instead of attempting to avoid the curse of dimensionality by observing a lower-dimensional feature subspace, we embrace dimensionality by taking advantage of some inherently high-dimensional phenomena. More specifically, we show that hubness, i.e., the tendency of high-dimensional data to contain points (hubs) that frequently occur in k-nearest neighbour lists of other points, can be successfully exploited in clustering. We validate our hypothesis by proposing several hubness-based clustering algorithms and testing them on high-dimensional data. Experimental results demonstrate good performance of our algorithms in multiple settings, particularly in the presence of large quantities of noise [9].
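The notion of hubness defined above, how often a point occurs in the k-nearest neighbour lists of other points, can be computed directly. The following is a minimal brute-force sketch, assuming Euclidean distance; the helper name `k_occurrences` is hypothetical, and the hubness-based clustering algorithms themselves are not reproduced here.

```python
import math
import random


def k_occurrences(points, k):
    """Return, for every point, how many times it appears in the
    k-nearest-neighbour lists of the other points (Euclidean distance)."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Rank all other points by distance to point i and keep the k closest.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in others[:k]:
            counts[j] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: 200 uniform random points in 50 dimensions.
    data = [[rng.random() for _ in range(50)] for _ in range(200)]
    counts = k_occurrences(data, k=5)
    # Points with the largest counts are the hubs.
    print(sorted(counts, reverse=True)[:10])
```

Because every point contributes exactly k entries, the counts always sum to n times k; hubness shows up as a few points receiving far more than their share, and the abstract proposes using such points as cluster prototypes.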



Subspace Clustering of High Dimensional Data Using Differential Evolution
