A new Kernelized Fuzzy Possibilistic C-Means for high dimensional data clustering based on kernel-induced distance measure

In this paper, we propose a latent feature group learning (LFGL) algorithm to discover the feature grouping structures and subspace clusters in high-dimensional data. The feature grouping structures, which are learned in an analytical way, can enhance the accuracy and efficiency of high-dimensional data clustering. In the LFGL algorithm, a Darwinian evolutionary process is used to explore the optimal feature grouping structures, which are coded as chromosomes in a genetic algorithm. The feature grouping weighting k-means algorithm serves as the fitness function to evaluate the chromosomes, i.e., the feature grouping structures, in each generation of evolution. To better handle the diverse densities of clusters in high-dimensional data, the original feature grouping weighting k-means is revised to use a mass-based dissimilarity measure rather than the Euclidean distance, and the feature weights are optimized as a nonnegative matrix factorization problem under an orthogonality constraint on the feature weight matrix. The genetic operations of mutation and crossover generate the new chromosomes for the next generation. In comparison with well-known clustering algorithms, the LFGL algorithm produced encouraging experimental results on real-world datasets, demonstrating better performance when clustering high-dimensional data.
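The evolutionary search described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the chromosome encoding (one group index per feature), the single-point crossover, the elitist selection, and all function names are assumptions made for illustration, and the fitness function is left as a caller-supplied placeholder rather than the feature grouping weighting k-means used in the paper.

```python
import random

def random_chromosome(n_features, n_groups, rng):
    # Assumed encoding: gene j holds the group index assigned to feature j.
    return [rng.randrange(n_groups) for _ in range(n_features)]

def mutate(chrom, n_groups, rate, rng):
    # Reassign each gene to a random group with probability `rate`.
    return [rng.randrange(n_groups) if rng.random() < rate else g for g in chrom]

def crossover(a, b, rng):
    # Single-point crossover; requires at least two features.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def evolve(fitness, n_features, n_groups, pop_size=20,
           generations=30, rate=0.1, seed=0):
    # `fitness` maps a chromosome to a score (higher is better). In the paper
    # this role is played by a clustering run; here it is a placeholder.
    rng = random.Random(seed)
    pop = [random_chromosome(n_features, n_groups, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]          # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), n_groups, rate, rng))
        pop = elite + children
    return max(pop, key=fitness)

if __name__ == "__main__":
    # Toy fitness: agreement with a hypothetical "true" grouping.
    target = [0, 0, 0, 1, 1, 1, 2, 2]
    score = lambda c: sum(g == t for g, t in zip(c, target))
    print(evolve(score, n_features=8, n_groups=3))
```

Because the better half of each generation is carried over unchanged, the best score never decreases from one generation to the next. A real fitness function would run the clustering step on the data for each candidate grouping, which dominates the cost of the search.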

High dimensional data clustering through fuzzy possibilistic C-means with symmetry-based distance measure

International Journal of Computational Intelligence Studies, DOI 10.1504/ijcistudies.2013.057646, 2013, Vol 2 (3/4), pp. 288

Author(s): B. Shanmugapriya, M. Punithavalli

Keyword(s): Data Clustering, Distance Measure, High Dimensional Data, High Dimensional

Robust models and novel similarity measures for high-dimensional data clustering

DOI 10.32657/10356/48657, 2012

Author(s): Duc Thang Nguyen

Keyword(s): Data Clustering, High Dimensional Data, Similarity Measures, High Dimensional

Subspace Clustering of High Dimensional Data Using Differential Evolution

Nature-Inspired Algorithms for Big Data Frameworks - Advances in Computational Intelligence and Robotics, DOI 10.4018/978-1-5225-5852-1.ch003, 2019, pp. 47-74, Cited By ~ 1

Author(s): Parul Agarwal, Shikha Mehta

Keyword(s): Differential Evolution, Distance Measure, Dimensional Space, Clustering Algorithms, High Dimensional Data, Subspace Clustering, High Dimensional, Dbscan Clustering, Evolution Algorithms, Self Adaptive

Subspace clustering approaches cluster high-dimensional data in different subspaces, that is, they group the data using different relevant subsets of dimensions. This technique has become very effective because distance measures become ineffective in a high-dimensional space. This chapter presents SUBSPACE_DE, a novel evolutionary approach to bottom-up subspace clustering that is scalable to high-dimensional data. SUBSPACE_DE uses a self-adaptive DBSCAN algorithm to perform clustering on the data instances of each attribute and of maximal subspaces. The self-adaptive DBSCAN clustering algorithms accept their input from differential evolution algorithms. The proposed SUBSPACE_DE algorithm is tested on 14 datasets, both real and synthetic, and is compared with 11 existing subspace clustering algorithms using evaluation metrics such as F1_Measure and accuracy. The proposed algorithm performs considerably better on a success rate ratio ranking in both accuracy and F1_Measure. SUBSPACE_DE also has potential scalability on high-dimensional datasets.
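The claim that distance measures lose their usefulness in high-dimensional spaces can be checked with a small experiment. The sketch below is an illustration only and is not taken from the chapter: it draws uniform random points, measures Euclidean distances from one random query point, and reports the relative contrast (farthest minus nearest, divided by nearest). The function name and the choice of uniform data are assumptions.

```python
import math
import random

def relative_contrast(n_points, dim, seed=0):
    # (d_max - d_min) / d_min for Euclidean distances from one random query
    # point to n_points uniform random points in the unit hypercube.
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

if __name__ == "__main__":
    for dim in (2, 10, 100, 500):
        print(dim, round(relative_contrast(200, dim), 3))
```

As the dimension grows, the nearest and farthest neighbours end up at almost the same distance, so the contrast shrinks toward zero. This is the motivation for clustering within relevant subsets of dimensions instead of the full space.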

Data Visualization and High-Dimensional Data Clustering

Clustering, DOI 10.1002/9780470382776.ch9, 2009, pp. 237-261

Keyword(s): Data Visualization, Data Clustering, High Dimensional Data, High Dimensional

HDGSOM: A Modified Growing Self-Organizing Map for High Dimensional Data Clustering

Fourth International Conference on Hybrid Intelligent Systems (HIS'04), DOI 10.1109/ichis.2004.52, 2005, Cited By ~ 19

Author(s): R. Amarasiri, D. Alahakoon, K.A. Smith

Keyword(s): Data Clustering, High Dimensional Data, High Dimensional, Self Organizing Map, Self Organizing

IQRAM: a high dimensional data clustering technique

International Journal of Knowledge Engineering and Data Mining, DOI 10.1504/ijkedm.2012.051237, 2012, Vol 2 (2/3), pp. 117

Author(s): Dharmveer Singh Rajput, Pramod Kumar Singh, Mahua Bhattacharya

Keyword(s): Data Clustering, High Dimensional Data, High Dimensional, Clustering Technique

Evolution of SOMs’ Structure and Learning Algorithm: From Visualization of High-Dimensional Data to Clustering of Complex Data

Algorithms, DOI 10.3390/a13050109, 2020, Vol 13 (5), pp. 109, Cited By ~ 1

Author(s): Marian B. Gorzałczany, Filip Rudziński

Keyword(s): Data Visualization, Data Clustering, Learning Algorithm, High Dimensional Data, High Dimensional, Data Sets, Complex Data, Self Organizing Maps, Grid Networks, Self Organizing

In this paper, we briefly present several modifications and generalizations of the concept of self-organizing neural networks, usually referred to as self-organizing maps (SOMs), to illustrate their advantages in applications that range from high-dimensional data visualization to complex data clustering. Starting from conventional SOMs, we discuss Growing SOMs (GSOMs), Growing Grid Networks (GGNs), the Incremental Grid Growing (IGG) approach, and the Growing Neural Gas (GNG) method, as well as our two original solutions: Generalized SOMs with 1-Dimensional Neighborhood (GeSOMs with 1DN, also referred to as Dynamic SOMs (DSOMs)) and Generalized SOMs with Tree-Like Structures (GeSOMs with T-LSs). They are characterized in terms of (i) the modification mechanisms used, (ii) the range of network modifications introduced, (iii) the structure regularity, and (iv) the data-visualization/data-clustering effectiveness. The performance of particular solutions is illustrated and compared by means of selected data sets. We also show that the proposed original solutions, i.e., GeSOMs with 1DN (DSOMs) and GeSOMs with T-LSs, outperform alternative approaches in various complex clustering tasks by providing up to a 20% increase in clustering accuracy. The contribution of this work is threefold. First, algorithm-oriented original computer implementations of particular SOM generalizations are developed. Second, their detailed simulation results are presented and discussed. Third, the advantages of our earlier-mentioned original solutions are demonstrated.
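For orientation, the conventional SOM that these generalizations start from can be sketched in a few lines. This is a minimal illustration with a fixed 1-dimensional chain of units, not the authors' GeSOM implementations: the linear decay schedules, the Gaussian neighbourhood, the lower bound on the neighbourhood width, and all names and default values are assumptions.

```python
import math
import random

def bmu_index(weights, x):
    # Index of the best-matching unit: the weight vector closest to x.
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))

def train_som(data, n_units=6, epochs=30, lr0=0.5, sigma0=2.0, seed=0):
    # Conventional SOM on a fixed chain of n_units neurons.
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    steps = epochs * len(data)
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            frac = t / steps
            lr = lr0 * (1 - frac)                     # learning rate decays linearly
            sigma = max(sigma0 * (1 - frac), 0.3)     # neighbourhood width shrinks
            b = bmu_index(weights, data[i])
            for j in range(n_units):
                # Units near the winner on the chain move more than distant ones.
                h = math.exp(-((j - b) ** 2) / (2 * sigma ** 2))
                for d in range(dim):
                    weights[j][d] += lr * h * (data[i][d] - weights[j][d])
            t += 1
    return weights

def quantization_error(weights, data):
    # Mean distance from each sample to its best-matching unit.
    return sum(math.dist(weights[bmu_index(weights, x)], x) for x in data) / len(data)

if __name__ == "__main__":
    rng = random.Random(1)
    blobs = [[rng.gauss(c, 0.3), rng.gauss(c, 0.3)] for c in (0, 5) for _ in range(50)]
    print(quantization_error(train_som(blobs), blobs))
```

The growing variants listed in the abstract differ from this baseline mainly in that the number of units and the connections between them change during training, whereas here the chain is fixed in advance.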
