Data clustering algorithms based on Swarm Intelligence

Swarm intelligence has emerged as an active field for solving numerous machine-learning tasks. In this paper, we address the problem of clustering data with missing values, where the patterns are described by mixed (or hybrid) features. We introduce a generic modification to three swarm intelligence algorithms (Artificial Bee Colony, Firefly Algorithm, and Novel Bat Algorithm). We experimentally determine adequate parameter values for these three modified algorithms in order to apply them to the clustering task. We also provide an unbiased comparison among several metaheuristic-based clustering algorithms, concluding that the clusters obtained by our proposals are highly representative of the “natural structure” of the data.
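
To make the setting concrete, the sketch below shows one common way to compare two patterns with mixed features and missing values. It is a Gower-style dissimilarity chosen for illustration, not the measure used in the paper above; the function name `mixed_distance`, the `numeric_ranges` argument, and the use of `None` for a missing value are all assumptions.

```python
def mixed_distance(a, b, numeric_ranges):
    """Gower-style dissimilarity between two patterns (illustrative sketch).

    a, b           -- sequences of feature values; None marks a missing value
    numeric_ranges -- dict mapping the index of each numeric feature to its
                      value range (max - min) over the data set; every other
                      index is treated as categorical
    """
    total, used = 0.0, 0
    for idx, (x, y) in enumerate(zip(a, b)):
        if x is None or y is None:
            continue  # skip features that are missing in either pattern
        if idx in numeric_ranges:
            span = numeric_ranges[idx]
            # Range-normalised absolute difference for numeric features.
            total += abs(x - y) / span if span > 0 else 0.0
        else:
            # Simple matching for categorical features.
            total += 0.0 if x == y else 1.0
        used += 1
    # Assumption: patterns with no comparable feature are maximally dissimilar.
    return total / used if used else 1.0
```

A dissimilarity of this kind could serve as the building block of the fitness function that a swarm-based clustering algorithm optimises, since it needs neither imputation nor a purely numeric encoding.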

Data Clustering Algorithms Using Rough Sets

Handbook of Research on Computational Intelligence for Engineering, Science, and Business ◽ 10.4018/978-1-4666-2518-1.ch012 ◽ 2013 ◽ pp. 297-327 ◽ Cited By ~ 6

Author(s): B.K. Tripathy ◽ Adhir Ghosh

Keyword(s): Comparative Study ◽ Rough Set ◽ Fuzzy Clustering ◽ Fuzzy Set ◽ Rough Sets ◽ Data Clustering ◽ Clustering Algorithms ◽ Clustering Methods ◽ Future Studies ◽ Multiple Clusters

The development of data clustering algorithms has been pursued by researchers since the introduction of the k-means algorithm (MacQueen 1967; Lloyd 1982). These algorithms were subsequently modified to handle categorical data. To handle situations where objects can have memberships in multiple clusters, fuzzy clustering and rough clustering methods were introduced (Lingras et al. 2003, 2004a). There are many extensions of these initial algorithms (Lingras et al. 2004b; Lingras 2007; Mitra 2004; Peters 2006, 2007). The MMR algorithm (Parmar et al. 2007), its extensions (Tripathy et al. 2009, 2011a, 2011b), and the MADE algorithm (Herawan et al. 2010) use rough set techniques for clustering. In this chapter, the authors focus on rough set based clustering algorithms and provide a comparative study of the fuzzy set based and rough set based clustering algorithms in terms of their efficiency. They also present problems for future study related to the topics covered.

Uncertainty-Based Clustering Algorithms for Large Data Sets

Modern Technologies for Big Data Classification and Clustering - Advances in Data Mining and Database Management ◽ 10.4018/978-1-5225-2805-0.ch001 ◽ 2018 ◽ pp. 1-33 ◽ Cited By ~ 1

Author(s): B. K. Tripathy ◽ Hari Seetha ◽ M. N. Murty

Keyword(s): Big Data ◽ Data Clustering ◽ Clustering Algorithms ◽ Large Data ◽ Large Data Sets ◽ Mining Machine ◽ Data Sets ◽ Fuzzy C Means ◽ Intuitionistic Fuzzy ◽ New Algorithms

Data clustering plays a very important role in data mining, machine learning, and image processing. As modern databases have inherent uncertainties, many uncertainty-based data clustering algorithms have been developed. These include fuzzy c-means, rough c-means, and intuitionistic fuzzy c-means, as well as algorithms based on hybrid models, such as rough fuzzy c-means and rough intuitionistic fuzzy c-means. There are also many variants that improve these algorithms in different directions, such as their kernelised, possibilistic, and possibilistic kernelised versions. However, the above algorithms are not effective on big data, for various reasons. Researchers have therefore been trying for the past few years to improve these algorithms so that they can be applied to cluster big data. Such algorithms are relatively few in comparison to those for datasets of reasonable size. Our aim in this chapter is to present the uncertainty-based clustering algorithms developed so far and to propose a few new algorithms that can be developed further.
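
As a point of reference for the algorithm family named above, the following is a minimal sketch of standard fuzzy c-means in plain Python, assuming Euclidean distance and a fuzzifier `m` greater than 1. It is not taken from the chapter; the function names, the default `m=2.0`, and the fixed iteration count are illustrative assumptions, and it makes no attempt to scale to big data.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Return u[i][j], the degree to which points[i] belongs to centers[j]."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with one or more centers: share the
            # membership equally among those centers.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            memberships.append([h / sum(hits) for h in hits])
            continue
        # Standard update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        memberships.append(
            [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]
        )
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Return each center as the mean of all points weighted by u_ij^m."""
    dim = len(points[0])
    centers = []
    for j in range(len(memberships[0])):
        weights = [u[j] ** m for u in memberships]
        total = sum(weights)
        centers.append(
            [sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(dim)]
        )
    return centers


def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Alternate the two updates for a fixed number of iterations."""
    for _ in range(iterations):
        memberships = fcm_memberships(points, centers, m)
        centers = fcm_centers(points, memberships, m)
    return centers, fcm_memberships(points, centers, m)
```

Calling `fuzzy_c_means(points, initial_centers)` returns the final centers together with one membership row per point, and each row sums to 1. Every iteration touches every point-center pair, which gives some intuition for why a full-batch scheme like this becomes costly on very large data sets.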

Featureless Data Clustering

Handbook of Research on Text and Web Mining Technologies ◽ 10.4018/978-1-59904-990-8.ch009 ◽ 2010 ◽ pp. 141-164 ◽ Cited By ~ 2

Author(s): Wilson Wong

Keyword(s): Data Clustering ◽ Dominant Role ◽ Clustering Algorithms ◽ Adaptive Clustering ◽ Feature Based ◽ Clustering Approach ◽ Semantic Computation ◽ The Many ◽ Time Required ◽ Existing Data

Feature-based semantic measurements have played a dominant role in conventional data clustering algorithms for many existing applications. However, the applicability of existing data clustering approaches to a wider range of applications is limited by issues such as the complexity involved in semantic computation, the long pre-processing time required for feature preparation, and the poor extensibility of semantic measurement due to non-incremental feature sources. This chapter first summarises the many commonly used clustering algorithms and feature-based semantic measurements, and then highlights their shortcomings to make way for the proposal of an adaptive clustering approach based on featureless semantic measurements. The chapter concludes with experiments demonstrating the performance and wide applicability of the proposed clustering approach.

A Review on Swarm Intelligence Algorithms Applied for Data Clustering

Advances in Intelligent Systems and Computing - Proceedings of International Conference on Intelligent Computing, Information and Control Systems ◽ 10.1007/978-981-15-8443-5_36 ◽ 2021 ◽ pp. 429-440

Author(s): N. Yashaswini Gowda ◽ B. R. Lakshmikantha

Keyword(s): Swarm Intelligence ◽ Data Clustering

A Research Roadmap of Big Data Clustering Algorithms for Future Internet of Things

International Journal of Organizational and Collective Intelligence ◽ 10.4018/ijoci.2019040102 ◽ 2019 ◽ Vol 9 (2) ◽ pp. 16-30 ◽ Cited By ~ 1

Author(s): Hind Bangui ◽ Mouzhi Ge ◽ Barbora Buhnova

Keyword(s): Big Data ◽ Internet Of Things ◽ Mobile Networks ◽ Data Clustering ◽ Clustering Algorithm ◽ Clustering Algorithms ◽ Future Internet ◽ Research Challenges ◽ Initial Stage ◽ Big Data Technologies

Due to the massive increase in data in different Internet of Things (IoT) domains, such as healthcare IoT and Smart City IoT, Big Data technologies have emerged as critical analytics tools for analyzing IoT data. Among the Big Data technologies, data clustering is one of the essential approaches to processing IoT data. However, how to select a suitable clustering algorithm for IoT data is still unclear. Furthermore, since Big Data technologies are still in their initial stage in different IoT domains, it is valuable to propose and structure the research challenges between Big Data and IoT. Therefore, this article starts by reviewing and comparing the data clustering algorithms that can be applied to IoT datasets, and then extends the discussion to a broader IoT context, such as IoT dynamics and IoT mobile networks. Finally, this article identifies a set of research challenges that yield a research roadmap for Big Data research in IoT domains. The proposed research roadmap aims at bridging the research gaps between Big Data and various IoT contexts.

Single-cell RNA-seq data clustering: A survey with performance comparison study

Journal of Bioinformatics and Computational Biology ◽ 10.1142/s0219720020400053 ◽ 2020 ◽ Vol 18 (04) ◽ pp. 2040005

Author(s): Ruiyi Li ◽ Jihong Guan ◽ Shuigeng Zhou

Keyword(s): Single Cell ◽ Data Clustering ◽ Performance Metrics ◽ Clustering Algorithms ◽ Cell Types ◽ Performance Comparison ◽ Cellular Heterogeneity ◽ Clustering Methods ◽ Multiple Perspectives ◽ Underlying Mechanisms

Clustering analysis has been widely applied to single-cell RNA-sequencing (scRNA-seq) data to discover cell types and cell states. Algorithms developed in recent years have greatly helped the understanding of cellular heterogeneity and the underlying mechanisms of biological processes. However, these algorithms often use different techniques, are evaluated on different datasets, and are compared with only some of their counterparts, usually using different performance metrics. Consequently, an accurate and complete picture of their merits and demerits is lacking, which makes it difficult for users to select proper algorithms for analyzing their data. To fill this gap, we first review the major existing scRNA-seq data clustering methods, and then conduct a comprehensive performance comparison among them from multiple perspectives. We consider 13 state-of-the-art scRNA-seq data clustering algorithms, and collect 12 publicly available real scRNA-seq datasets from existing works to evaluate and compare these algorithms. Our comparative study shows that the existing methods are very diverse in performance. Even the top-performing algorithms do not perform well on all datasets, especially those with complex structures. This suggests that further research is required to explore more stable, accurate, and efficient clustering algorithms for scRNA-seq data.

Swarm Intelligence-Based Clustering Algorithms: A Survey

Unsupervised Learning Algorithms ◽ 10.1007/978-3-319-24211-8_12 ◽ 2016 ◽ pp. 303-341 ◽ Cited By ~ 9

Author(s): Tülin İnkaya ◽ Sinan Kayalıgil ◽ Nur Evin Özdemirel

Keyword(s): Swarm Intelligence ◽ Clustering Algorithms

Effective Data Clustering Algorithms

Advances in Intelligent Systems and Computing - Soft Computing: Theories and Applications ◽ 10.1007/978-981-13-0589-4_39 ◽ 2018 ◽ pp. 419-432 ◽ Cited By ~ 2

Author(s): Kamalpreet Bindra ◽ Anuranjan Mishra ◽ Suryakant

Keyword(s): Data Clustering ◽ Clustering Algorithms

Understanding and Enhancement of Internal Clustering Validation Indexes for Categorical Data

Algorithms ◽ 10.3390/a11110177 ◽ 2018 ◽ Vol 11 (11) ◽ pp. 177 ◽ Cited By ~ 2

Author(s): Xuedong Gao ◽ Minghan Yang

Keyword(s): Machine Learning ◽ Categorical Data ◽ Data Clustering ◽ Information Gain ◽ Clustering Algorithms ◽ Number Of Clusters ◽ Cluster Compactness ◽ Clustering Validation ◽ Categorical Data Clustering

Clustering is one of the main tasks of machine learning. Internal clustering validation indexes (CVIs) are used to measure the quality of several clustered partitions in order to determine the locally optimal clustering results in an unsupervised manner, and can act as the objective function of clustering algorithms. In this paper, we first studied several well-known internal CVIs for categorical data clustering, and proved the ineffectiveness of evaluating partitions with different numbers of clusters without any inter-cluster separation measures or assumptions; the accuracy of the separation measure, along with its coordination with the intra-cluster compactness measures, can notably affect performance. Then, aiming to enhance internal clustering validation measurement, we proposed a new internal CVI, clustering utility based on the averaged information gain of isolating each cluster (CUBAGE), which measures both the compactness and the separation of the partition. The experimental results supported our findings with regard to the existing internal CVIs, and showed that the proposed CUBAGE outperforms other internal CVIs with or without a pre-known number of clusters.
