Research on self-adaptive clustering algorithms for large data sparse networks based on information entropy

2021 ◽  
Vol 1941 (1) ◽  
pp. 012041
Author(s):  
Tingting Ma ◽  
Guanhong Zhang


Author(s):  
B. K. Tripathy ◽  
Hari Seetha ◽  
M. N. Murty

Data clustering plays a very important role in data mining, machine learning, and image processing. Because modern databases carry inherent uncertainty, many uncertainty-based clustering algorithms have been developed, among them fuzzy c-means, rough c-means, intuitionistic fuzzy c-means, and algorithms based on hybrid models such as rough fuzzy c-means and rough intuitionistic fuzzy c-means. There are also many variants that improve these algorithms in different directions, such as their kernelised, possibilistic, and possibilistic kernelised versions. However, none of the above algorithms is effective on big data, for various reasons, so researchers have been trying for the past few years to improve them so that they can be applied to cluster big data. Such algorithms are still relatively few in comparison to those for datasets of moderate size. Our aim in this chapter is to present the uncertainty-based clustering algorithms developed so far and to propose a few new algorithms that can be developed further.
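Of the algorithms named above, fuzzy c-means is the simplest to state: it alternates a centre update with a membership update until the memberships stabilise. A minimal sketch follows; the fuzzifier m, tolerance, and random initialisation are illustrative defaults, not any particular chapter's implementation.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Alternate centre and membership updates; m > 1 is the fuzzifier."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)        # memberships of each point sum to 1
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                # avoid division by zero at a centre
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return centers, U
```

The rough, intuitionistic, and hybrid variants listed above replace the membership matrix U with their own approximation structures but keep the same alternating skeleton.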


Author(s):  
Parul Agarwal ◽  
Shikha Mehta

Subspace clustering approaches cluster high-dimensional data in different subspaces, that is, they group the data using different relevant subsets of dimensions. The technique has become important because distance measures lose their discriminative power in high-dimensional spaces. This chapter presents SUBSPACE_DE, a novel evolutionary, bottom-up subspace clustering approach that scales to high-dimensional data. SUBSPACE_DE uses a self-adaptive DBSCAN algorithm to cluster the data instances of each attribute and of the maximal subspaces; the self-adaptive DBSCAN receives its input parameters from a differential evolution algorithm. The proposed algorithm is tested on 14 real and synthetic datasets and compared with 11 existing subspace clustering algorithms using evaluation metrics such as accuracy and F1_Measure. On a success-rate-ratio ranking, the proposed algorithm performs considerably better on both measures, and SUBSPACE_DE also shows potential scalability to high-dimensional datasets.
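The coupling described above can be imitated as a toy loop in which differential evolution supplies the DBSCAN radius eps for clustering the instances of a single attribute. Everything here is an assumption for illustration: the one-dimensional DBSCAN, the fitness function (number of noise points), the search bounds, and the DE settings are not the SUBSPACE_DE design.

```python
import numpy as np

def dbscan_1d(x, eps, min_pts=3):
    """Minimal DBSCAN over a single attribute; label -1 marks noise."""
    n = len(x)
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cid = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        nbrs = np.flatnonzero(np.abs(x - x[i]) <= eps)
        if len(nbrs) < min_pts:
            continue                         # not a core point (may still become border)
        labels[i] = cid
        seeds = list(nbrs)
        while seeds:                         # expand the cluster from core points
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cid
            if not visited[j]:
                visited[j] = True
                jn = np.flatnonzero(np.abs(x - x[j]) <= eps)
                if len(jn) >= min_pts:
                    seeds.extend(jn)
        cid += 1
    return labels

def de_tune_eps(x, pop=8, gens=20, F=0.8, CR=0.9, seed=0):
    """Toy differential evolution over eps; fitness counts noise points."""
    rng = np.random.default_rng(seed)
    lo, hi = 1e-3, np.ptp(x) / 4.0           # illustrative bounds
    P = rng.uniform(lo, hi, pop)
    fit = lambda e: (dbscan_1d(x, e) == -1).sum()
    for _ in range(gens):
        for i in range(pop):
            a, b, c = rng.choice([k for k in range(pop) if k != i], 3, replace=False)
            trial = np.clip(P[a] + F * (P[b] - P[c]), lo, hi) if rng.random() < CR else P[i]
            if fit(trial) <= fit(P[i]):      # greedy selection
                P[i] = trial
    return min(P, key=fit)
```

The "self-adaptive" aspect is that DBSCAN never receives a hand-tuned eps: the evolutionary loop proposes candidates and keeps whichever clusters the attribute with the least noise.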


Author(s):  
Wilson Wong

Feature-based semantic measurements have played a dominant role in conventional data clustering algorithms across many existing applications. However, the applicability of existing clustering approaches to a wider range of applications is limited by issues such as the complexity of semantic computation, the long pre-processing time required for feature preparation, and the poor extensibility of semantic measurement caused by non-incremental feature sources. This chapter first summarises commonly used clustering algorithms and feature-based semantic measurements, then highlights their shortcomings to motivate the proposal of an adaptive clustering approach based on featureless semantic measurements. The chapter concludes with experiments demonstrating the performance and wide applicability of the proposed clustering approach.


2020 ◽  
Vol 11 (3) ◽  
pp. 42-67
Author(s):  
Soumeya Zerabi ◽  
Souham Meshoul ◽  
Samia Chikhi Boucherkha

Cluster validation aims both to evaluate the results of clustering algorithms and to predict the number of clusters, and is usually achieved using several indexes. Traditional internal cluster validation indexes (CVIs) are mainly based on computing pairwise distances, which gives the related algorithms quadratic complexity. Existing CVIs therefore cannot handle large data sets properly and need to be revisited to keep pace with ever-increasing data volumes, which calls for parallel and distributed implementations of these indexes. To address this, the authors propose two parallel and distributed models of internal CVIs, namely the Silhouette and Dunn indexes, using the MapReduce framework under Hadoop. The proposed models, termed MR_Silhouette and MR_Dunn, have been tested both on evaluating clustering results and on identifying the optimal number of clusters. The experimental results are very promising and show that the proposed parallel and distributed models accomplish the expected tasks successfully.
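The MapReduce formulation of the Silhouette index can be imitated in miniature: mappers compute, for every point, partial sums of distances to the points of their own data chunk grouped by cluster, and a reduce step aggregates the partial sums before each point's silhouette is formed. This is a single-machine sketch of the idea, not the MR_Silhouette model itself; the chunking scheme and function names are assumptions.

```python
import numpy as np
from functools import reduce

def mapper(chunk_X, chunk_y, X, k):
    """Map step: partial sums of distances from every point in X to the
    points of this chunk, grouped by the chunk points' cluster labels."""
    d = np.linalg.norm(X[:, None, :] - chunk_X[None, :, :], axis=2)
    out = np.zeros((len(X), k))
    for j in range(k):
        out[:, j] = d[:, chunk_y == j].sum(axis=1)
    return out

def silhouette_mr(X, y, n_chunks=4):
    """Reduce step aggregates partial sums; silhouette is then formed per point."""
    k = int(y.max()) + 1
    chunks = np.array_split(np.arange(len(X)), n_chunks)
    partials = map(lambda ix: mapper(X[ix], y[ix], X, k), chunks)
    sums = reduce(np.add, partials)          # distance sums per (point, cluster)
    counts = np.bincount(y, minlength=k)
    s = np.zeros(len(X))
    for i in range(len(X)):
        a = sums[i, y[i]] / max(counts[y[i]] - 1, 1)   # mean intra-cluster distance
        b = min(sums[i, j] / counts[j] for j in range(k) if j != y[i])
        s[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return s.mean()
```

The quadratic pairwise-distance work is what the map step distributes; under Hadoop each mapper would hold one chunk and the reducer would aggregate the emitted partial sums per point.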


Author(s):  
Julian Garbiso ◽  
Ada Diaconescu ◽  
Marceau Coupechoux ◽  
Bertrand Leroy

Author(s):  
SUNG-GI LEE ◽  
DEOK-KYUN YUN

In this paper, we present a concept based on the similarity of categorical attribute values that takes implicit relationships into account, and we propose a new and effective clustering procedure for mixed data. Our procedure derives similarities between categorical values from careful analysis and, using multidimensional scaling, maps the values of each categorical attribute to points in a two-dimensional coordinate space. These mapped values make it possible to interpret the relationships between attribute values and to apply categorical attributes directly to clustering algorithms that use a Euclidean distance. With trivial modifications, our procedure clusters mixed data using the k-means algorithm, well known for its efficiency on large data sets. We use the familiar soybean disease and adult data sets to demonstrate the performance of our clustering procedure, and the satisfactory results demonstrate the effectiveness of our algorithm in discovering structure in data.
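The multidimensional scaling step can be illustrated with classical (Torgerson) MDS: given a dissimilarity matrix between the values of one categorical attribute (the paper derives these from its own similarity analysis; the matrix here stands in for that step), the values are embedded as 2-D points that a Euclidean k-means can consume. A sketch:

```python
import numpy as np

def classical_mds(D, dim=2):
    """Embed a value-by-value dissimilarity matrix D into dim coordinates
    via classical multidimensional scaling."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centred Gram matrix
    w, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:dim]        # keep the dim largest
    return V[:, order] * np.sqrt(np.maximum(w[order], 0.0))
```

Each categorical value is then replaced by its embedded 2-D point, after which the whole mixed record is numeric and k-means applies directly.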


2017 ◽  
Vol 14 (S339) ◽  
pp. 310-313
Author(s):  
R. Kgoadi ◽  
I. Whittingham ◽  
C. Engelbrecht

Clustering algorithms constitute a multi-disciplinary analytical tool commonly used to summarise large data sets. Astronomical classifications are based on similarity: celestial objects are assigned to a specific class according to specific physical features. The aim of this project is to obtain relevant information from high-dimensional data (at least three input variables in a data frame) derived from stellar light curves, using a number of clustering algorithms such as K-means and Expectation Maximisation. In addition to identifying the best-performing algorithm, we also identify the subset of features that best defines stellar groups. Three methodologies are applied to a sample of Kepler time series in the temperature range 6500–19,000 K. In that spectral range, at least four classes of variable stars are expected to be found: δ Scuti, γ Doradus, Slowly Pulsating B (SPB), and (the still equivocal) Maia stars.
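As an illustration of the first algorithm mentioned, a minimal K-means over a table of light-curve features (rows as stars, columns as features such as dominant pulsation frequency and amplitude, which are hypothetical here; the farthest-point initialisation is likewise an assumption, not the project's pipeline):

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's K-means with farthest-point initialisation."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):                   # spread the initial centres out
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):          # leave a centre alone if its cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers
```

Expectation Maximisation replaces the hard argmin assignment with posterior responsibilities under a Gaussian mixture, but follows the same assign-then-update alternation.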

