Chaotic Tornadogenesis Optimization Algorithm for Data Clustering Problems

This article describes clustering as an attractive and major task in data mining, in which a particular set of objects is grouped according to similarity under some criterion. Among the numerous algorithms, k-Means is described as the best and most efficient at addressing clustering problems. An expert system is considered good only if it returns the optimal data clusters. The challenge of optimal clustering lies in finding the optimal number of clusters and identifying all the data groups correctly, which is an NP-hard problem. Recently, a new optimization algorithm, the Tornadogenesis Optimization Algorithm (TOA), was developed to address these problems. However, the standard TOA is too often trapped in local optima and suffers from premature convergence. To overcome this, the article proposes the Chaotic TOA (CTOA). The main objective of embedding chaotic maps into the standard TOA is to compute and automatically adapt its internal parameters. The proposed CTOA is first benchmarked on standard mathematical functions and later applied to 10 data clustering problems. The graphical and statistical results, along with comparisons, illustrate the capabilities of CTOA in terms of accuracy and robustness.
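
The abstract does not say which chaotic maps are used or which TOA parameters they drive. As a rough illustration of the general idea only, the sketch below uses the logistic map, a common choice in chaotic metaheuristics, to generate a sequence in [0, 1] and rescale it into a parameter range. The function names, the map, and the parameter range are all assumptions made for this example and are not taken from the article.

```python
def logistic_map(x0: float, steps: int, r: float = 4.0) -> list[float]:
    """Return `steps` successive values of the logistic map x <- r * x * (1 - x).

    With r = 4.0 and a seed in (0, 1), the sequence stays in [0, 1] and
    behaves chaotically for almost all seeds.
    """
    values = []
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


def chaotic_parameter(lo: float, hi: float, chaos_value: float) -> float:
    """Rescale a chaotic value from [0, 1] into the range [lo, hi]."""
    return lo + (hi - lo) * chaos_value


if __name__ == "__main__":
    # Hypothetical use: one fresh parameter value per optimizer iteration,
    # replacing a fixed constant or a uniform random draw.
    for iteration, c in enumerate(logistic_map(x0=0.7, steps=5)):
        print(iteration, round(chaotic_parameter(0.2, 0.8, c), 4))
```

Replacing a fixed or uniformly random parameter with such a sequence is one way an optimizer's internal parameters can be adapted automatically, as the abstract describes.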

Integrating Grasshopper Optimization Algorithm with Local Search for Solving Data Clustering Problems

International Journal of Computational Intelligence Systems ◽

10.2991/ijcis.d.210203.008 ◽

2021 ◽

Vol 14 (1) ◽

pp. 783

Author(s):

M. A. El-Shorbagy ◽

A. Y. Ayoub

Keyword(s):

Local Search ◽

Optimization Algorithm ◽

Data Clustering ◽

Grasshopper Optimization Algorithm ◽

Grasshopper Optimization ◽

Clustering Problems

Multi-Objective Genetic Algorithm for Robust Clustering with Unknown Number of Clusters

International Journal of Applied Evolutionary Computation ◽

10.4018/jaec.2012010101 ◽

2012 ◽

Vol 3 (1) ◽

pp. 1-20

Author(s):

Amit Banerjee

Keyword(s):

Genetic Algorithm ◽

Data Clustering ◽

Optimal Number ◽

Least Trimmed Squares ◽

Cluster Assignment ◽

Objective Criterion ◽

Number Of Clusters ◽

Multi Objective ◽

Multi Objective Genetic Algorithm ◽

Optimal Number Of Clusters

In this paper, a multi-objective genetic algorithm for data clustering based on the robust fuzzy least trimmed squares estimator is presented. The proposed clustering methodology addresses two critical issues in unsupervised data clustering: the ability to produce a meaningful partition of noisy data, and the requirement that the number of clusters be known a priori. The multi-objective genetic algorithm-driven clustering technique optimizes the number of clusters as well as the cluster assignment and cluster prototypes. A two-parameter, mapped, fixed-point coding scheme is used to represent the assignment of data into the true retained set and the noisy trimmed set, and the optimal number of clusters in the retained set. A three-objective criterion is also used as the minimization functional for the multi-objective genetic algorithm. Results on well-known data sets from the literature suggest that the proposed methodology is superior to conventional fuzzy clustering algorithms that assume a known value for the optimal number of clusters.

A Dimensionality reduced Text data clustering with prediction of optimal number of clusters

International Journal of Applied Research on Information Technology and Computing ◽

10.5958/j.0975-8070.2.2.010 ◽

2011 ◽

Vol 2 (2) ◽

pp. 41 ◽

Cited By ~ 3

Author(s):

M. Ramakrishna Murty ◽

JVR Murthy ◽

Prasad Reddy ◽

Suresh Chandra Satapathy

Keyword(s):

Data Clustering ◽

Optimal Number ◽

Text Data ◽

Number Of Clusters ◽

Optimal Number Of Clusters

Estimating the Optimal Number of Clusters in Categorical Data Clustering by Silhouette Coefficient

Communications in Computer and Information Science - Knowledge and Systems Sciences ◽

10.1007/978-981-15-1209-4_1 ◽

2019 ◽

pp. 1-17 ◽

Cited By ~ 3

Author(s):

Duy-Tai Dinh ◽

Tsutomu Fujinami ◽

Van-Nam Huynh

Keyword(s):

Categorical Data ◽

Data Clustering ◽

Optimal Number ◽

Number Of Clusters ◽

Silhouette Coefficient ◽

Categorical Data Clustering ◽

Optimal Number Of Clusters

A novel krill herd algorithm with orthogonality and its application to data clustering

Intelligent Data Analysis ◽

10.3233/ida-195056 ◽

2021 ◽

Vol 25 (3) ◽

pp. 605-626

Author(s):

Chen Zhao ◽

Zhongxin Liu ◽

Zengqiang Chen ◽

Yao Ning

Keyword(s):

Data Clustering ◽

Historical Data ◽

Improved Method ◽

Local Optima ◽

Krill Herd Algorithm ◽

Krill Herd ◽

Promising Solution ◽

First Time ◽

Clustering Problems ◽

Orthogonal Learning

The krill herd algorithm (KHA) is an emerging nature-inspired approach that has been successfully applied to optimization. However, KHA may get stuck in local optima owing to its poor exploitation. In this paper, the orthogonal learning (OL) mechanism is incorporated into KHA for the first time to enhance its performance, yielding an improved method named the orthogonal krill herd algorithm (OKHA). Compared with existing hybridizations of KHA, OKHA can discover more useful information from historical data and construct a more promising solution. The proposed algorithm is applied to the CEC2017 numerical problems, and its robustness is verified by the simulation results. Moreover, OKHA is applied to data clustering problems selected from the UCI Machine Learning Repository. The experimental results illustrate that OKHA is superior to, or at least competitive with, other representative clustering techniques.

Multi-Swarm Whale Optimization Algorithm for Data Clustering Problems using Multiple Cooperative Strategies

International Journal of Intelligent Systems and Applications ◽

10.5815/ijisa.2018.08.04 ◽

2018 ◽

Vol 10 (8) ◽

pp. 36-53 ◽

Cited By ~ 4

Author(s):

Ravi Kumar Saidala ◽

Nagaraju Devarakonda

Keyword(s):

Optimization Algorithm ◽

Data Clustering ◽

Whale Optimization Algorithm ◽

Cooperative Strategies ◽

Whale Optimization ◽

Clustering Problems

Data Clustering Using Moth-Flame Optimization Algorithm

Sensors ◽

10.3390/s21124086 ◽

2021 ◽

Vol 21 (12) ◽

pp. 4086

Author(s):

Tribhuvan Singh ◽

Nitin Saxena ◽

Manju Khurana ◽

Dilbag Singh ◽

Mohamed Abdalla ◽

...

Keyword(s):

Data Clustering ◽

Research Work ◽

Local Optima ◽

Suggested Approach ◽

Wide Range ◽

Benchmark Datasets ◽

The Mean ◽

Moth Flame Optimization Algorithm ◽

Clustering Problems ◽

State Of Art

The k-means algorithm is a clustering method that has gained wide acceptance. However, its performance depends heavily on the initial cluster centers. Moreover, because of its weak exploration capability, it easily gets stuck at local optima. Recently, a new metaheuristic called the Moth Flame Optimizer (MFO) was proposed to handle complex problems. MFO simulates the mechanism moths use to navigate in nature, known as transverse orientation. In various research works, the performance of MFO has been found quite satisfactory. This paper suggests a novel heuristic approach based on MFO to solve data clustering problems. To validate the competitiveness of the proposed approach, experiments were conducted using Shape and UCI benchmark datasets. The proposed approach is compared with five state-of-the-art algorithms over twelve datasets. The mean performance of the proposed algorithm is superior on 10 datasets and comparable on the remaining two. The analysis of the experimental results confirms the efficacy of the suggested approach.
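
The abstract does not give the objective function the MFO-based approach optimizes. A common formulation in metaheuristic clustering, assumed here purely for illustration, treats a candidate solution as a set of cluster centers and scores it by the sum of distances from each point to its nearest center. The sketch below shows that score; `clustering_cost` and the sample data are hypothetical.

```python
import math


def clustering_cost(points, centers):
    """Sum of Euclidean distances from each point to its nearest center.

    Lower is better. A metaheuristic would search over `centers` to
    minimize this value instead of relying on one initial guess.
    """
    total = 0.0
    for p in points:
        total += min(math.dist(p, c) for c in centers)
    return total


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    good_centers = [(0.0, 1.0), (10.0, 1.0)]   # one center per visible group
    poor_centers = [(5.0, 1.0), (5.0, 1.0)]    # both centers between groups
    print(clustering_cost(points, good_centers))
    print(clustering_cost(points, poor_centers))
```

The gap between the two scores illustrates why the choice of starting centers matters: a purely local method started from the poor centers has to move a long way before the cost reflects the true grouping.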

Method for determining optimal number of clusters in K-means clustering algorithm

Journal of Computer Applications ◽

10.3724/sp.j.1087.2010.01995 ◽

2010 ◽

Vol 30 (8) ◽

pp. 1995-1998 ◽

Cited By ~ 18

Author(s):

Shi-bing ZHOU ◽

Zhen-yuan XU ◽

Xu-qing TANG

Keyword(s):

Clustering Algorithm ◽

Optimal Number ◽

Number Of Clusters ◽

Optimal Number Of Clusters

Clustering Count-based RNA Methylation Data Using a Nonparametric Generative Model

Current Bioinformatics ◽

10.2174/1574893613666180601080008 ◽

2018 ◽

Vol 14 (1) ◽

pp. 11-23 ◽

Cited By ~ 3

Author(s):

Lin Zhang ◽

Yanling He ◽

Huaizhi Wang ◽

Hui Liu ◽

Yufei Huang ◽

...

Keyword(s):

Clustering Analysis ◽

Methylation Level ◽

Optimal Number ◽

Generative Model ◽

Methylation Data ◽

Sequencing Data ◽

Number Of Clusters ◽

Rna Methylation ◽

Clustering Effect ◽

Optimal Number Of Clusters

Background: The RNA methylome has been discovered as an important layer of gene regulation and can be profiled directly with count-based measurements from high-throughput sequencing data. Although the detailed regulatory circuit of the epitranscriptome remains uncharted, a clustering effect in methylation status among different RNA methylation sites can be identified from transcriptome-wide RNA methylation profiles and may reflect epitranscriptomic regulation. Count-based RNA methylation sequencing data has unique features, such as low read coverage, which call for novel clustering approaches. Objective: Besides handling the low read coverage, it is also necessary to preserve the integer property of the data when clustering count-based RNA methylation sequencing data. Method: We proposed a nonparametric generative model together with its Gibbs sampling solution for clustering analysis. The proposed approach implements a beta-binomial mixture model to capture the clustering effect in methylation level with the original count-based measurements rather than an estimated continuous methylation level. In addition, it adopts a nonparametric Dirichlet process to automatically determine an optimal number of clusters, avoiding the common model selection problem in clustering analysis. Results: When tested on the simulated system, the method demonstrated improved clustering performance over hierarchical clustering, K-means, MClust, NMF and EMclust. On a real dataset, it also revealed two novel RNA N6-methyladenosine (m6A) co-methylation patterns that may be induced directly by METTL14 and WTAP, two known regulatory components of the RNA m6A methyltransferase complex. Conclusion: Our proposed DPBBM method not only properly handles the count-based measurements of RNA methylation data from sites of very low read coverage, but also learns an optimal number of clusters adaptively from the data analyzed.
Availability: The source code and documents of DPBBM R package are freely available through the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/DPBBM/.
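
To make the beta-binomial component concrete, the sketch below evaluates the standard beta-binomial log-probability of observing `k` methylated reads out of `n` total reads, given shape parameters `alpha` and `beta`. It is a generic textbook formula written for illustration, not code from the DPBBM package, and the function and argument names are assumptions.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(k: int, n: int, alpha: float, beta: float) -> float:
    """log P(k | n, alpha, beta) = log[C(n, k) * B(k+alpha, n-k+beta) / B(alpha, beta)]."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


if __name__ == "__main__":
    # Low-coverage site: 2 methylated reads out of 5, under a hypothetical
    # cluster with shape parameters (2, 3).
    print(math.exp(beta_binomial_logpmf(2, 5, 2.0, 3.0)))
```

Because the likelihood is defined directly on integer counts, a site with very few reads contributes a correspondingly flat likelihood instead of a noisy point estimate of its methylation level.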

Solving knapsack problems using a binary gaining sharing knowledge-based optimization algorithm

Complex & Intelligent Systems ◽

10.1007/s40747-021-00351-8 ◽

2021 ◽

Author(s):

Prachi Agrawal ◽

Talari Ganesh ◽

Ali Wagdy Mohamed

Keyword(s):

Life Span ◽

Population Size ◽

Linear Function ◽

Optimization Algorithm ◽

Optimization Problems ◽

Search Space ◽

Knapsack Problems ◽

Local Optima ◽

Knowledge Based ◽

Two Stages

This article proposes a novel binary version of the recently developed Gaining Sharing Knowledge-based optimization algorithm (GSK) to solve binary optimization problems. The GSK algorithm is based on how humans acquire and share knowledge during their life span. The binary version, named novel binary Gaining Sharing Knowledge-based optimization algorithm (NBGSK), depends mainly on two binary stages: a binary junior gaining-sharing stage and a binary senior gaining-sharing stage, with knowledge factor 1. These two stages enable NBGSK to explore and exploit the search space efficiently and effectively when solving problems in binary space. Moreover, to enhance the performance of NBGSK and prevent solutions from becoming trapped in local optima, NBGSK with population size reduction (PR-NBGSK) is introduced. It decreases the population size gradually using a linear function. The proposed NBGSK and PR-NBGSK are applied to a set of knapsack instances with small and large dimensions, and the results show that both are more efficient and effective in terms of convergence, robustness, and accuracy.
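
The abstract states only that the population size decreases gradually with a linear function. One plausible form of such a schedule, assumed here for illustration and not confirmed by the article, interpolates linearly from an initial size to a minimum size as the evaluation budget is consumed. All names and numbers in the sketch are hypothetical.

```python
def linear_population_size(evals_used: int, max_evals: int,
                           n_init: int, n_min: int) -> int:
    """Population size after `evals_used` of `max_evals` function evaluations.

    Interpolates linearly from n_init (at the start) down to n_min
    (when the budget is exhausted).
    """
    fraction = min(max(evals_used / max_evals, 0.0), 1.0)
    return round(n_init + (n_min - n_init) * fraction)


if __name__ == "__main__":
    for used in (0, 250, 500, 750, 1000):
        print(used, linear_population_size(used, 1000, n_init=100, n_min=12))
```

A large early population favors exploration, while the shrinking population later concentrates the remaining evaluations on the best candidates.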
