A Genetic XK-Means Algorithm with Empty Cluster Reassignment

K-Means is a well-known and widely used classical clustering algorithm. It easily falls into a local optimum and is sensitive to the initial choice of cluster centers. XK-Means (eXploratory K-Means) was introduced in the literature; it adds an exploratory disturbance to the vector of cluster centers so as to jump out of the local optimum and reduce the sensitivity to the initial centers. However, empty clusters may appear during the iterations of XK-Means, which harms the efficiency of the algorithm. The aim of this paper is to introduce an empty-cluster-reassignment technique and use it to modify XK-Means, resulting in the EXK-Means clustering algorithm. Furthermore, we combine EXK-Means with a genetic mechanism to form a genetic XK-Means algorithm with empty-cluster reassignment, referred to as the GEXK-Means clustering algorithm. The convergence of GEXK-Means to the global optimum is proved theoretically. Numerical experiments on a few real-world clustering problems show the advantage of EXK-Means over XK-Means, and of GEXK-Means over EXK-Means, XK-Means, K-Means and GXK-Means (genetic XK-Means).
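The abstract does not say how the empty clusters are reassigned, so the following is only a minimal sketch of one common heuristic, not the paper's technique: when a cluster loses all its members, its center is reseeded at the point that lies farthest from its currently assigned center. The function names `sq_dist` and `kmeans_reassign_empty` are hypothetical, and the exploratory disturbance of XK-Means is deliberately left out.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))

def kmeans_reassign_empty(points, centers, iters=20):
    """Lloyd-style K-means that reseeds empty clusters (assumed heuristic)."""
    centers = [list(c) for c in centers]
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [nearest(p, centers) for p in points]
        for j in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                # Update step: move the center to the mean of its members.
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
            else:
                # Empty cluster: reseed at the point farthest from its own center
                # and hand that point over to the reseeded cluster.
                far = max(range(len(points)),
                          key=lambda i: sq_dist(points[i], centers[labels[i]]))
                centers[j] = list(points[far])
                labels[far] = j
    return centers, [nearest(p, centers) for p in points]
```

With a deliberately bad start such as `centers = [(0, 0), (100, 100)]`, the second cluster is empty after the first assignment step; the reseeding rule pulls it back into the data so that both clusters end up populated.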

A HYBRID CLUSTERING ALGORITHM COMBINING CLOUD MODEL IWO AND K-MEANS

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001414500153 ◽

2014 ◽

Vol 28 (06) ◽

pp. 1450015 ◽

Cited By ~ 14

Author(s):

GUO PAN ◽

KENLI LI ◽

AIJIA OUYANG ◽

XU ZHOU ◽

YUMING XU

Keyword(s):

Iterative Process ◽

Clustering Algorithm ◽

Cloud Model ◽

Initial Guess ◽

Experimental Results ◽

Local Optimum ◽

Invasive Weed Optimization ◽

Invasive Weed ◽

Hybrid Clustering ◽

Clustering Problems

To overcome the drawbacks of K-means (KM) in clustering problems, such as excessive dependence on the initial guess and a tendency to get trapped in a local optimum, this paper proposes a clustering algorithm that combines invasive weed optimization (IWO) and KM based on the cloud model. The so-called cloud model IWO (CMIWO) directs the search of the KM algorithm so that the population has a definite evolution direction during the iterative process, thus improving the performance of the CMIWO K-means (CMIWOKM) algorithm in terms of convergence speed, computing precision and robustness. The experimental results show that the proposed algorithm offers higher accuracy, faster convergence, and stronger stability.

Sentiment Analysis on Microblogging with K-Means Clustering and Artificial Bee Colony

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026819500172 ◽

2019 ◽

Vol 18 (03) ◽

pp. 1950017 ◽

Cited By ~ 4

Author(s):

Korawit Orkphol ◽

Wu Yang

Keyword(s):

Artificial Bee Colony ◽

Clustering Algorithm ◽

Global Optimum ◽

High Dimensional ◽

Local Optimum ◽

Short Message ◽

Initial State ◽

Bee Colony ◽

Analysis Technique ◽

Value Decomposition

Microblogging is a type of blog that people use to express their opinions, attitudes, and feelings toward entities in a short message, which is easily shared through the network of connected people. Knowing their sentiments is beneficial for decision-making, planning, visualization, and so on. Grouping similar microblogging messages can convey meaningful sentiments toward an entity. This task can be accomplished with a simple and fast clustering algorithm, k-means. Because microblogging messages are short and noisy, they produce a sparse, high-dimensional dataset. To overcome this problem, the term frequency–inverse document frequency (tf–idf) technique is employed to select the relevant features, and the singular value decomposition (SVD) technique is employed to reduce the dimensionality of the dataset while retaining the most relevant features. These two techniques adjust the dataset so that k-means works more efficiently. Another problem comes from k-means itself: its result relies on the initial state of the centroids, and a random initial state usually causes convergence to a local optimum. To find a global optimum, artificial bee colony (ABC), a novel swarm intelligence algorithm, is employed to find the best initial state of the centroids. Silhouette analysis is also used to find the optimal k. After clustering into k groups, each group is scored with SentiWordNet and the sentiment polarity of each group is analyzed. Our approach shows that combining these techniques (i.e., tf–idf, SVD, and ABC) can significantly improve the k-means result (by 41% over plain k-means).
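As an illustration of the silhouette analysis mentioned above, here is a minimal sketch of the standard mean silhouette coefficient in plain Python. It is not taken from the paper: the function name `silhouette_score` is chosen for this example, Euclidean distance is assumed, and the input is assumed to contain at least two clusters of distinct points.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # s(i) = 0 by convention for a singleton cluster
        # a: mean distance to the other members of the point's own cluster
        # (the point itself contributes a distance of 0 to the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for k, other in clusters.items() if k != l)
        total += (b - a) / max(a, b)
    return total / len(points)
```

To pick k, one would run the clustering for each candidate k and keep the value whose labels give the highest score; scores near 1 indicate compact, well-separated groups, while negative scores suggest misassigned points.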

Three-Mode Fuzzy Co-Clustering Based on Probabilistic Concept and Comparison with FCM-Type Algorithms

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2021.p0478 ◽

2021 ◽

Vol 25 (4) ◽

pp. 478-488

Author(s):

Katsuhiro Honda ◽

Issei Hayashi ◽

Seiki Ubukata ◽

Akira Notsu

Keyword(s):

Real World ◽

Numerical Experiments ◽

Clustering Algorithm ◽

Single Parameter ◽

Promising Technique ◽

Evaluation Data ◽

Fuzzy Partition ◽

Artificial Dataset ◽

Probabilistic Concept ◽

Characteristic Features

Three-mode fuzzy co-clustering is a promising technique for analyzing relational co-occurrence information among the elements of three modes. The conventional FCM-type algorithms achieve a simultaneous fuzzy partition of the three-mode elements based on the fuzzy c-means (FCM) concept, and therefore often require careful tuning of three independent fuzzification parameters. In this paper, a novel three-mode fuzzy co-clustering algorithm is proposed by modifying the conventional aggregation criterion of the three elements based on a probabilistic concept. The fuzziness degree of the three-mode partition can be tuned easily with only a single parameter under the guideline of the probabilistic standard. The characteristic features of the proposed method are compared with those of the conventional algorithms through numerical experiments using an artificial dataset and are demonstrated in an application to a real-world dataset of MovieLens movie evaluation data.

K-Means Clustering Algorithm Based on Chaotic Adaptive Artificial Bee Colony

Algorithms ◽

10.3390/a14020053 ◽

2021 ◽

Vol 14 (2) ◽

pp. 53

Author(s):

Qibing Jin ◽

Nan Lin ◽

Yuming Zhang

Keyword(s):

Artificial Bee Colony ◽

Clustering Algorithm ◽

Heuristic Algorithms ◽

Fitness Function ◽

Step Length ◽

Standard Test ◽

Local Optimum ◽

Bee Colony ◽

Clustering Problems ◽

Speed And Accuracy

K-Means Clustering is a popular technique in data analysis and data mining. To remedy two defects of the K-Means Clustering (KMC) algorithm, namely its reliance on the initialization and its convergence towards a local minimum, this study presents a chaotic adaptive artificial bee colony (CAABC) clustering algorithm that optimally partitions objects into K clusters. The algorithm adopts the max–min distance product method for initialization. In addition, a new fitness function is adapted to the KMC algorithm. The iteration follows an adaptive search strategy, and a Fuch chaotic disturbance is added to avoid converging on a local optimum. The step length decreases linearly during the iteration. To overcome the shortcomings of the classic ABC algorithm, the simulated annealing criterion is introduced into the CAABC. Finally, the combined algorithm is compared with other stochastic heuristic algorithms on 20 standard test functions and 11 datasets. The results demonstrate that the improvements in CAABC-K-means give it an advantage in speed and accuracy of convergence over some conventional algorithms for solving clustering problems.
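Two of the ingredients named above, the linearly decreasing step length and the simulated annealing criterion, can be sketched generically as follows. The abstract gives no formulas, so this assumes a plain linear schedule and the classic Metropolis acceptance rule; `linear_step`, `sa_accept`, and their default parameter values are hypothetical and not the paper's.

```python
import math
import random

def linear_step(t, t_max, step_start=1.0, step_end=0.01):
    """Step length that shrinks linearly from step_start (t=0) to step_end (t=t_max)."""
    return step_start - (step_start - step_end) * t / t_max

def sa_accept(old_cost, new_cost, temperature, rng=random):
    """Metropolis rule: always accept improvements, sometimes accept worse moves."""
    if new_cost <= old_cost:
        return True
    # A worse candidate is accepted with probability exp(-delta / T),
    # which shrinks as the temperature is lowered.
    return rng.random() < math.exp(-(new_cost - old_cost) / temperature)
```

Accepting an occasional worse candidate while the temperature is high is what lets the search leave a local optimum; as the temperature falls, the rule degenerates into greedy selection.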

An Affinity Propagation Clustering Algorithm for Mixed Numeric and Categorical Datasets

Mathematical Problems in Engineering ◽

10.1155/2014/486075 ◽

2014 ◽

Vol 2014 ◽

pp. 1-8 ◽

Cited By ~ 7

Author(s):

Kang Zhang ◽

Xingsheng Gu

Keyword(s):

Real World ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Affinity Propagation ◽

Mixed Data ◽

Clustering Methods ◽

Affinity Propagation Clustering ◽

Real World Datasets ◽

Data Objects ◽

Clustering Problems

Clustering has been widely used in different fields of science, technology, social science, and so forth. In the real world, numeric as well as categorical features are usually used to describe data objects. However, many clustering methods can process only datasets that are either numeric or categorical. Recently, algorithms that can handle mixed-data clustering problems have been developed. The affinity propagation (AP) algorithm is an exemplar-based clustering method that has demonstrated good performance on a wide variety of datasets, but it has limitations in processing mixed datasets. In this paper, we propose a novel similarity measure for mixed-type datasets and an adaptive AP clustering algorithm to cluster them. Several real-world datasets are studied to evaluate the performance of the proposed algorithm. Comparisons with other clustering algorithms demonstrate that the proposed method works well not only on mixed datasets but also on purely numeric and purely categorical datasets.

Opinion Dynamics Optimization by Varying Susceptibility to Persuasion via Non-Convex Local Search

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3466617 ◽

2021 ◽

Vol 16 (2) ◽

pp. 1-34

Author(s):

Rediet Abebe ◽

T.-H. HUBERT Chan ◽

Jon Kleinberg ◽

Zhibin Liang ◽

David Parkes ◽

...

Keyword(s):

Local Search ◽

Objective Function ◽

Iterative Process ◽

Large Scale ◽

Opinion Dynamics ◽

Theoretical Models ◽

Global Optimum ◽

Local Optimum ◽

Opinion Formation ◽

Long Line

A long line of work in social psychology has studied variations in people’s susceptibility to persuasion, that is, the extent to which they are willing to modify their opinions on a topic. This body of literature suggests an interesting perspective on theoretical models of opinion formation by interacting parties in a network: in addition to considering interventions that directly modify people’s intrinsic opinions, it is also natural to consider interventions that modify people’s susceptibility to persuasion. Motivated by this observation, we propose an influence optimization problem. Specifically, we adopt a popular model for social opinion dynamics, where each agent has some fixed innate opinion and a resistance that measures the importance it places on its innate opinion; agents influence one another’s opinions through an iterative process. Under certain conditions, this iterative process converges to some equilibrium opinion vector. For the unbudgeted variant of the problem, the goal is to modify the resistance of any number of agents (within some given range) such that the sum of the equilibrium opinions is minimized; for the budgeted variant, the algorithm is additionally given upfront a restriction on the number of agents whose resistance may be modified. We prove that the objective function is in general non-convex. Hence, formulating the problem as a convex program, as in an early version of this work (Abebe et al., KDD’18), might have potential correctness issues. We instead analyze the structure of the objective function and show that any local optimum is also a global optimum, which is somewhat surprising as the objective function might not be convex. Furthermore, we combine the iterative process and the local search paradigm to design very efficient algorithms that can solve the unbudgeted variant of the problem optimally on large-scale graphs containing millions of nodes. Finally, we propose and experimentally evaluate a family of heuristics for the budgeted variant of the problem.
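The iterative process described above can be illustrated with a small sketch. The abstract does not state the update rule, so the code assumes a Friedkin-Johnsen-style update in which each agent mixes its innate opinion (weighted by its resistance) with a weighted average of the other agents' current opinions. The function name `equilibrium_opinions` and the row-stochastic `weights` matrix are assumptions made for this example.

```python
def equilibrium_opinions(innate, resistance, weights, iters=1000, tol=1e-12):
    """Iterate z_i <- a_i * s_i + (1 - a_i) * sum_j w_ij * z_j until it settles.

    innate:     innate opinions s_i
    resistance: resistances a_i in (0, 1]
    weights:    row-stochastic influence matrix w_ij (assumed input format)
    """
    z = list(innate)
    for _ in range(iters):
        new = [a * s + (1 - a) * sum(w * zj for w, zj in zip(row, z))
               for s, a, row in zip(innate, resistance, weights)]
        # Stop once no opinion moved by more than tol in this round.
        if max(abs(x - y) for x, y in zip(new, z)) < tol:
            return new
        z = new
    return z
```

For two agents with innate opinions 0 and 1, equal resistance 0.5, and each listening only to the other, the process settles at opinions 1/3 and 2/3. Raising the resistance of the agent whose innate opinion is 0 lowers the sum of equilibrium opinions, which is the quantity the unbudgeted problem minimizes.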

An Optimal and Stable Algorithm for Clustering Numerical Data

Algorithms ◽

10.3390/a14070197 ◽

2021 ◽

Vol 14 (7) ◽

pp. 197

Author(s):

Ali Seman ◽

Azizian Mohd Sapawi

Keyword(s):

Standard Deviation ◽

Real World ◽

Clustering Algorithm ◽

Numerical Data ◽

Zero Point ◽

The Other ◽

Suitable Alternative ◽

Stable Algorithm ◽

Real World Applications

In the conventional k-means framework, seeding is the first step toward optimization before the objects are clustered. Random seeding raises two main issues: the clustering results may be less than optimal, and different clustering results may be obtained on every run. In real-world applications, optimal and stable clustering is highly desirable. This report introduces a new clustering algorithm called the zero k-approximate modal haplotype (Zk-AMH) algorithm, which uses a simple and novel seeding mechanism known as zero-point multidimensional spaces. The Zk-AMH provides cluster optimality and stability, thereby resolving the aforementioned issues. Notably, the Zk-AMH algorithm yielded identical mean, maximum, and minimum scores over 100 runs, producing a standard deviation of zero, which shows its stability. Additionally, when the Zk-AMH algorithm was applied to eight datasets, it achieved the highest mean scores for four datasets, produced an approximately equal score for one dataset, and yielded marginally lower scores for the other three datasets. With its optimality and stability, the Zk-AMH algorithm could be a suitable alternative for developing future clustering tools.

A Clustering Algorithm in Stream Data Using Strong Coreset

Journal of Interconnection Networks ◽

10.1142/s0219265921430118 ◽

2021 ◽

Author(s):

Manmohan Singh ◽

Rajendra Pamula ◽

Alok Kumar

Keyword(s):

Data Mining ◽

Clustering Algorithm ◽

Local Optimum ◽

Reduction Algorithm ◽

Stream Data ◽

Stream Data Mining ◽

Clustering Approach ◽

Approximation Guarantee ◽

Competitive Algorithms ◽

Learning Data

Clustering has various applications in machine learning, data mining, data compression and pattern recognition. Existing techniques such as Lloyd's algorithm (sometimes called k-means) suffer from convergence to a local optimum with no approximation guarantee. To overcome these shortcomings, this paper offers an efficient k-means clustering approach for stream data mining. The coreset is a popular and fundamental concept for k-means clustering in stream data. In each step, reduction determines a coreset of inputs, and represents the error, where P represents the number of input points according to the nested property of coresets. Hence, a bit reduction in error of final coreset gets n times more accurate. This motivated the authors to propose a new coreset-reduction algorithm. The proposed algorithm was executed on the Covertype, Spambase, Census 1990, Bigcross, and Tower datasets. It outperforms competing algorithms such as StreamKM++, BICO (BIRCH meets Coresets for k-means clustering), and BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies).

Application of Honey-Bee Mating Optimization to Naphtha Reforming Reactor

International Journal of Chemical Reactor Engineering ◽

10.1515/ijcre-2013-0035 ◽

2013 ◽

Vol 11 (1) ◽

pp. 293-308 ◽

Cited By ~ 2

Author(s):

Somayeh Karimi ◽

Navid Mostoufi ◽

Rahmat Sotudeh-Gharebagh

Keyword(s):

Objective Function ◽

Honey Bee ◽

Process Model ◽

Global Optimum ◽

Inlet Temperature ◽

Local Optimum ◽

Naphtha Reforming ◽

Decision Variables ◽

Algorithm Comparison ◽

Honey Bee Mating Optimization

Modeling and optimization of the continuous catalytic reforming (CCR) of naphtha were investigated. The process model is based on a network of four main reactions, which has proved to be quite effective in industrial applications. The inlet temperatures of the four reactors were selected as the decision variables, and the profit was taken as the objective function to be maximized. The honey-bee mating optimization (HBMO) and the genetic algorithm (GA) were applied to solve the optimization problem, and the results of the two methods were compared. The optimization results showed that a 3.01% increase in profit can be reached with the HBMO algorithm. Comparison of the HBMO and the GA on the naphtha reforming model showed that the HBMO is an effective and rapidly converging technique that can reach a better optimum than the GA. The HBMO also performed better than the GA in finding the global optimum with fewer objective function evaluations, and it was shown to be less likely to get stuck in a local optimum.

A Multiple Mitosis Genetic Algorithm

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v8.i3.pp252-258 ◽

2019 ◽

Vol 8 (3) ◽

pp. 252

Author(s):

K. Kamil ◽

K.H Chong ◽

H. Hashim ◽

S.A. Shaaya

Keyword(s):

Genetic Algorithm ◽

Optimization Problem ◽

Total Population ◽

Global Optimum ◽

Natural Process ◽

Local Optimum ◽

Simple Genetic Algorithm ◽

Cell Reproduction ◽

Solve Optimization Problem ◽

High Diversity

The genetic algorithm is a well-known metaheuristic method for solving optimization problems that mimics the natural process of cell reproduction. Its strengths in solving optimization problems have made it popular among researchers, who seek to improve the performance of the simple Genetic Algorithm and apply it in many areas. However, the Genetic Algorithm suffers from low diversity, which causes premature convergence in which the candidate solution is trapped in a local optimum. This paper proposes the Multiple Mitosis Genetic Algorithm, which improves the performance of the simple Genetic Algorithm by promoting high diversity of high-quality individuals through three steps: setting a multiplying factor before the crossover process, conducting multiple mitosis crossover, and introducing a mini loop in each generation. Results show that the percentage of high-quality individuals rises to 90 percent of the total population in finding the global optimum.
