A Genetic XK-Means Algorithm with Empty Cluster Reassignment

Symmetry ◽  
2019 ◽  
Vol 11 (6) ◽  
pp. 744 ◽  
Author(s):  
Chun Hua ◽  
Feng Li ◽  
Chao Zhang ◽  
Jie Yang ◽  
Wei Wu

K-Means is a well-known and widely used classical clustering algorithm. It tends to fall into local optima and is sensitive to the initial choice of cluster centers. XK-Means (eXploratory K-Means) was introduced in the literature to address this: it adds an exploratory disturbance to the vector of cluster centers so as to escape local optima and reduce sensitivity to the initial centers. However, empty clusters may appear during the XK-Means iterations, reducing the efficiency of the algorithm. The aim of this paper is to introduce an empty-cluster-reassignment technique and use it to modify XK-Means, resulting in the EXK-Means clustering algorithm. Furthermore, we combine EXK-Means with a genetic mechanism to form a genetic XK-Means algorithm with empty-cluster reassignment, referred to as the GEXK-Means clustering algorithm. The convergence of GEXK-Means to the global optimum is proved theoretically. Numerical experiments on several real-world clustering problems show the advantage of EXK-Means over XK-Means, and the advantage of GEXK-Means over EXK-Means, XK-Means, K-Means and GXK-Means (genetic XK-Means).
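The abstract does not spell out the reassignment rule, so the sketch below assumes one common choice: an empty cluster takes over the point farthest from its current center, drawn only from clusters that keep at least one other member. It is plain K-Means with that fix, not XK-Means' exploratory disturbance or the paper's EXK-Means, and `kmeans_reassign_empty` is a hypothetical name.

```python
import numpy as np

def kmeans_reassign_empty(X, k, init=None, n_iter=100, seed=0):
    """Lloyd-style K-Means that never returns an empty cluster.

    Assumed reassignment rule (illustrative only): an empty cluster takes the
    point farthest from its current center, chosen among clusters that still
    have at least two members, so no other cluster is emptied.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of points")
    rng = np.random.default_rng(seed)
    if init is None:
        centers = X[rng.choice(n, size=k, replace=False)].copy()
    else:
        centers = np.asarray(init, dtype=float).copy()
    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                continue
            counts = np.bincount(labels, minlength=k)
            own = d2[np.arange(n), labels]   # each point's cost under its cluster
            own[counts[labels] < 2] = -1.0   # never empty a singleton cluster
            labels[int(own.argmax())] = j    # hand the worst-fit point to cluster j
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    return centers, labels
```

For example, starting from centers (0, 0), (10, 10) and (100, 100) on two small blobs near the first two, the third cluster is empty after the first assignment and is repopulated instead of yielding a NaN mean.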

Author(s):  
Guo Pan ◽  
Kenli Li ◽  
Aijia Ouyang ◽  
Xu Zhou ◽  
Yuming Xu

To overcome the drawbacks of K-means (KM) in clustering problems, such as its excessive dependence on the initial guesses and its tendency to get trapped in local optima, this paper proposes a clustering algorithm that combines invasive weed optimization (IWO) and KM based on the cloud model. The so-called cloud model IWO (CMIWO) is used to direct the search of the KM algorithm so that the population has a definite evolution direction during the iterative process, thus improving the convergence speed, computing precision and robustness of the resulting CMIWO K-means (CMIWOKM) algorithm. The experimental results show that the proposed algorithm offers higher accuracy, faster convergence, and stronger stability.
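A minimal sketch of plain invasive weed optimization searching over sets of K centers, assuming the classic IWO schedule: seed count linear in fitness, Gaussian dispersal with a nonlinearly shrinking spread, and competitive exclusion. The cloud-model dispersal that defines CMIWO is not reproduced; ordinary Gaussian noise stands in for it, and all names and defaults are illustrative.

```python
import numpy as np

def sse(X, centers):
    """Sum of squared distances from each point to its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())

def iwo_kmeans(X, k, pop0=5, pmax=15, smin=1, smax=5, iters=60,
               sigma_init=0.5, sigma_final=0.01, mod_n=3, seed=0):
    """Plain IWO over candidate center sets, then a few Lloyd steps.

    Gaussian seed dispersal stands in for the cloud model (assumption).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = len(X)
    span = X.max(axis=0) - X.min(axis=0)       # scale noise to the data range
    weeds = [X[rng.choice(n, size=k, replace=False)].copy() for _ in range(pop0)]
    for it in range(iters):
        # Classic IWO schedule: spread shrinks nonlinearly from sigma_init to sigma_final.
        sigma = ((iters - it) / iters) ** mod_n * (sigma_init - sigma_final) + sigma_final
        costs = np.array([sse(X, w) for w in weeds])
        best, worst = costs.min(), costs.max()
        seeds = []
        for w, c in zip(weeds, costs):
            # Fitter weeds (lower SSE) produce more seeds, linearly between smin and smax.
            ratio = 1.0 if worst == best else (worst - c) / (worst - best)
            n_seeds = int(round(smin + ratio * (smax - smin)))
            for _ in range(n_seeds):
                seeds.append(w + rng.normal(0.0, sigma, size=w.shape) * span)
        # Competitive exclusion: parents and seeds compete, only pmax survive.
        weeds = sorted(weeds + seeds, key=lambda w: sse(X, w))[:pmax]
    centers = weeds[0]
    for _ in range(10):                        # polish the best weed with Lloyd steps
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return centers, sse(X, centers)
```

Because parents compete with their own seeds, the best SSE never gets worse from one generation to the next; the closing Lloyd steps cannot increase it either.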


Author(s):  
Korawit Orkphol ◽  
Wu Yang

Microblogging is a type of blogging in which people express their opinions, attitudes, and feelings toward entities in short messages that are easily shared through networks of connected people. Knowing these sentiments is beneficial for decision-making, planning, visualization, and so on. Grouping similar microblogging messages can convey meaningful sentiments toward an entity, a task that can be accomplished with a simple and fast clustering algorithm, k-means. Because microblogging messages are short and noisy, they produce a sparse, high-dimensional dataset. To overcome this problem, the term frequency–inverse document frequency (tf–idf) technique is employed to select relevant features, and singular value decomposition (SVD) is employed to reduce the dimensionality of the dataset while retaining the most relevant features. Together, these two techniques prepare the dataset so that k-means runs more efficiently. Another problem comes from k-means itself: its result depends on the initial centroids, and a random initialization usually leads to convergence to a local optimum. To find a global optimum, the artificial bee colony (ABC) algorithm, a novel swarm intelligence algorithm, is employed to find the best initial centroids. Silhouette analysis is also used to find the optimal k. After the messages are clustered into k groups, each group is scored with SentiWordNet and its sentiment polarity is analyzed. Our approach shows that combining various techniques (i.e., tf–idf, SVD, and ABC) can significantly improve the k-means result (a 41% improvement over standard k-means).
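The tf–idf and SVD steps can be sketched with NumPy alone. This assumes whitespace tokenization and the unsmoothed idf log(N/df); the paper's exact weighting, tokenizer and number of retained dimensions are not given in the abstract, and `tfidf_matrix` and `truncated_svd` are hypothetical names.

```python
from collections import Counter
import numpy as np

def tfidf_matrix(docs):
    """Documents x vocabulary matrix of raw term counts times log(N / df), rows L2-normalized."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    col = {term: i for i, term in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for row, toks in enumerate(tokenized):
        for term, count in Counter(toks).items():
            tf[row, col[term]] = count
    df = (tf > 0).sum(axis=0)                    # documents containing each term
    X = tf * np.log(len(docs) / df)              # terms found in every doc get weight 0
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = np.divide(X, norms, out=np.zeros_like(X), where=norms > 0)
    return X, vocab

def truncated_svd(X, r):
    """Coordinates of each document in the top-r singular directions (LSA)."""
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :r] * S[:r]
```

The r-dimensional rows returned by `truncated_svd` are what a k-means step, seeded by ABC or otherwise, would then cluster.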


Author(s):  
Katsuhiro Honda ◽  
Issei Hayashi ◽  
Seiki Ubukata ◽  
Akira Notsu

Three-mode fuzzy co-clustering is a promising technique for analyzing relational co-occurrence information among elements of three modes. Conventional FCM-type algorithms achieve a simultaneous fuzzy partition of the three mode elements based on the fuzzy c-means (FCM) concept, but they often require careful tuning of three independent fuzzification parameters. In this paper, a novel three-mode fuzzy co-clustering algorithm is proposed that modifies the conventional aggregation criterion of the three elements based on a probabilistic concept. The fuzziness of the three-mode partition can then be tuned easily with a single parameter, guided by the probabilistic standard. The characteristic features of the proposed method are compared with those of conventional algorithms through numerical experiments on an artificial dataset, and the method is demonstrated on a real-world dataset of MovieLens movie evaluations.
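For reference, the FCM concept builds on a single fuzzifier m per partition, whereas a three-mode method has one per mode to tune. The sketch below is ordinary two-way fuzzy c-means, not the proposed three-mode algorithm; it shows how m controls how crisp the memberships are. Names and defaults are illustrative.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=200, tol=1e-8, seed=0):
    """Two-way fuzzy c-means; m > 1 is the fuzzification parameter."""
    if m <= 1:
        raise ValueError("fuzzifier m must be > 1")
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)              # each point's memberships sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]   # membership-weighted means
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)                    # guard against a point on a center
        # u_ik is proportional to d2_ik^(-1/(m-1)); smaller m gives crisper memberships
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        done = np.abs(U_new - U).max() < tol
        U = U_new
        if done:
            break
    return centers, U
```

On two well-separated blobs, m = 1.5 yields memberships very close to 0 or 1, while m = 3 spreads noticeably more membership to the far cluster.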


Algorithms ◽  
2021 ◽  
Vol 14 (2) ◽  
pp. 53
Author(s):  
Qibing Jin ◽  
Nan Lin ◽  
Yuming Zhang

K-Means Clustering is a popular technique in data analysis and data mining. To remedy the K-Means Clustering (KMC) algorithm's dependence on initialization and its tendency to converge to a local minimum, this study presents a chaotic adaptive artificial bee colony (CAABC) clustering algorithm that optimally partitions objects into K clusters. The algorithm uses the max–min distance product method for initialization. In addition, a new fitness function is adapted to the KMC algorithm. The iterations follow an adaptive search strategy, and a Fuch chaotic disturbance is added to avoid converging on a local optimum. The step length decreases linearly during the iterations. To overcome the shortcomings of the classic ABC algorithm, the simulated annealing criterion is introduced into CAABC. Finally, the combined algorithm is compared with other stochastic heuristic algorithms on 20 standard test functions and 11 datasets. The results demonstrate that CAABC-K-means converges faster and more accurately than several conventional algorithms for solving clustering problems.
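A simplified ABC clustering loop can illustrate three ingredients the abstract names: a linearly decreasing step length, a simulated-annealing (Metropolis) acceptance rule, and a chaotic sequence for restarting exhausted food sources. This is a sketch under assumptions, not CAABC itself: the logistic map stands in for the Fuch map, random initialization and plain SSE replace the max–min distance product initialization and the paper's fitness function, and all names are hypothetical.

```python
import math
import numpy as np

def sse(X, flat, k):
    """Clustering cost of a food source encoded as k centers flattened to one vector."""
    C = flat.reshape(k, -1)
    return float(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1).sum())

def abc_sa_cluster(X, k, n_src=10, iters=100, limit=15, cooling=0.95, seed=0):
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    lo, hi = np.tile(X.min(axis=0), k), np.tile(X.max(axis=0), k)
    dim = lo.size
    foods = lo + rng.random((n_src, dim)) * (hi - lo)
    cost = np.array([sse(X, f, k) for f in foods])
    trials = np.zeros(n_src, dtype=int)
    best_i = int(cost.argmin())
    best, best_cost = foods[best_i].copy(), cost[best_i]
    T = cost.mean()                     # initial temperature on the scale of the costs
    chaos = 0.7                         # logistic-map state (stand-in for the Fuch map)
    history = []

    def neighbour_move(i, step):
        nonlocal best, best_cost
        j = rng.integers(dim)
        p = rng.choice([q for q in range(n_src) if q != i])
        cand = foods[i].copy()
        cand[j] += step * rng.uniform(-1.0, 1.0) * (foods[i, j] - foods[p, j])
        cand = np.clip(cand, lo, hi)
        c = sse(X, cand, k)
        # Metropolis acceptance: always take improvements, sometimes take worse moves.
        if c < cost[i] or rng.random() < math.exp(-(c - cost[i]) / T):
            trials[i] = 0 if c < cost[i] else trials[i] + 1
            foods[i], cost[i] = cand, c
        else:
            trials[i] += 1
        if c < best_cost:
            best, best_cost = cand.copy(), c

    for it in range(iters):
        step = 1.0 - 0.9 * it / iters    # step length shrinks linearly from 1.0 to 0.1
        for i in range(n_src):           # employed bees
            neighbour_move(i, step)
        fit = 1.0 / (1.0 + cost)
        for i in rng.choice(n_src, size=n_src, p=fit / fit.sum()):  # onlooker bees
            neighbour_move(int(i), step)
        for i in np.flatnonzero(trials > limit):   # scouts: chaotic restart
            for d in range(dim):
                chaos = 4.0 * chaos * (1.0 - chaos)
                foods[i, d] = lo[d] + chaos * (hi[d] - lo[d])
            cost[i], trials[i] = sse(X, foods[i], k), 0
        T *= cooling
        history.append(best_cost)
    return best.reshape(k, -1), best_cost, history
```

The global best is tracked separately from the food sources, so accepting a worse move under the annealing rule never loses the best solution found so far.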


2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Kang Zhang ◽  
Xingsheng Gu

Clustering has been widely used in many fields of science, technology, the social sciences, and so forth. In the real world, data objects are usually described by both numeric and categorical features. However, many clustering methods can process only datasets that are either purely numeric or purely categorical. Recently, algorithms that can handle mixed-data clustering problems have been developed. The affinity propagation (AP) algorithm is an exemplar-based clustering method that has demonstrated good performance on a wide variety of datasets; however, it has limitations in processing mixed datasets. In this paper, we propose a novel similarity measure for mixed-type datasets and an adaptive AP clustering algorithm to cluster them. Several real-world datasets are used to evaluate the performance of the proposed algorithm. Comparisons with other clustering algorithms demonstrate that the proposed method works well not only on mixed datasets but also on purely numeric and purely categorical datasets.
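A sketch of standard affinity propagation (responsibility and availability message passing with damping) on a small mixed numeric/categorical dataset. The paper's novel similarity measure is not given in the abstract, so this assumes a k-prototypes-style stand-in: negative squared Euclidean distance on the numeric columns plus `gamma` times the number of categorical mismatches. The preference is set to the median similarity, a common default; the adaptive part of the proposed AP is not reproduced, and the function names are hypothetical.

```python
import numpy as np

def mixed_similarity(num, cat, gamma=1.0):
    """Stand-in similarity: -(squared Euclidean on numeric + gamma * categorical mismatches)."""
    num = np.asarray(num, dtype=float)
    cat = np.asarray(cat, dtype=str)
    d_num = ((num[:, None, :] - num[None, :, :]) ** 2).sum(axis=2)
    d_cat = (cat[:, None, :] != cat[None, :, :]).sum(axis=2)
    return -(d_num + gamma * d_cat)

def affinity_propagation(S, damping=0.5, iters=200, seed=0):
    S = np.array(S, dtype=float)
    n = len(S)
    rng = np.random.default_rng(seed)
    off = ~np.eye(n, dtype=bool)
    np.fill_diagonal(S, np.median(S[off]))        # preference = median similarity
    # Tiny noise breaks ties that can make the messages oscillate.
    S += 1e-12 * np.abs(S).max() * rng.standard_normal(S.shape)
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    rows = np.arange(n)
    for _ in range(iters):
        # Responsibilities: r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')].
        AS = A + S
        top = AS.argmax(axis=1)
        first = AS[rows, top]
        AS[rows, top] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, top] = S[rows, top] - second
        R = damping * R + (1 - damping) * R_new
        # Availabilities: a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k}).
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = self_avail
        A = damping * A + (1 - damping) * A_new
    exemplars = np.flatnonzero(np.diag(A) + np.diag(R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = S[:, exemplars].argmax(axis=1)
    labels[exemplars] = np.arange(exemplars.size)   # each exemplar labels itself
    return exemplars, labels
```

Only the similarity function knows about the mixed types; the message passing itself is unchanged, which is why a new mixed-type similarity measure is enough to extend AP to such data.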


2021 ◽  
Vol 16 (2) ◽  
pp. 1-34
Author(s):  
Rediet Abebe ◽  
T.-H. Hubert Chan ◽  
Jon Kleinberg ◽  
Zhibin Liang ◽  
David Parkes ◽  
...  

A long line of work in social psychology has studied variations in people’s susceptibility to persuasion, that is, the extent to which they are willing to modify their opinions on a topic. This body of literature suggests an interesting perspective on theoretical models of opinion formation by interacting parties in a network: in addition to considering interventions that directly modify people’s intrinsic opinions, it is also natural to consider interventions that modify people’s susceptibility to persuasion. Motivated by this, we propose an influence optimization problem. Specifically, we adopt a popular model for social opinion dynamics, in which each agent has a fixed innate opinion and a resistance that measures the importance it places on that innate opinion; agents influence one another’s opinions through an iterative process. Under certain conditions, this iterative process converges to an equilibrium opinion vector. In the unbudgeted variant of the problem, the goal is to modify the resistance of any number of agents (within a given range) so that the sum of the equilibrium opinions is minimized; in the budgeted variant, the algorithm is additionally given an upfront limit on the number of agents whose resistance may be modified. We prove that the objective function is in general non-convex. Hence, formulating the problem as a convex program, as in an early version of this work (Abebe et al., KDD’18), may have correctness issues. We instead analyze the structure of the objective function and show that any local optimum is also a global optimum, which is somewhat surprising given that the objective function might not be convex. Furthermore, we combine the iterative process and the local search paradigm to design very efficient algorithms that solve the unbudgeted variant of the problem optimally on large-scale graphs containing millions of nodes.
Finally, we propose and evaluate experimentally a family of heuristics for the budgeted variant of the problem.
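Assuming one common form of the Friedkin–Johnsen dynamics, agent i with innate opinion s_i and resistance α_i updates z_i ← α_i s_i + (1 − α_i) Σ_j P_ij z_j, where P is the row-normalized weight matrix. With A = diag(α), the equilibrium then solves (I − (I − A)P) z = A s, which is well posed when every α_i > 0. The sketch computes it both by iterating and by a direct solve; the function names are hypothetical, and the objective is simply the sum of the equilibrium opinions.

```python
import numpy as np

def _row_normalise(W):
    W = np.asarray(W, dtype=float)
    return W / W.sum(axis=1, keepdims=True)

def equilibrium_opinions(W, s, alpha):
    """Solve (I - (I - A) P) z = A s, with A = diag(alpha) and P the row-normalized weights."""
    P = _row_normalise(W)
    s = np.asarray(s, dtype=float)
    a = np.asarray(alpha, dtype=float)
    M = np.eye(len(s)) - (1.0 - a)[:, None] * P
    return np.linalg.solve(M, a * s)

def iterate_opinions(W, s, alpha, steps=500):
    """Run the update z <- A s + (I - A) P z from z = s; converges when every alpha_i > 0."""
    P = _row_normalise(W)
    s = np.asarray(s, dtype=float)
    a = np.asarray(alpha, dtype=float)
    z = s.copy()
    for _ in range(steps):
        z = a * s + (1.0 - a) * (P @ z)
    return z
```

On a three-agent triangle with innate opinions (0, 1, 1) and all resistances 0.5, the equilibrium is (0.4, 0.8, 0.8) with sum 2.0; raising only agent 0's resistance to 0.9 lowers the sum to 42/29 ≈ 1.45, the kind of intervention the optimization problem searches over.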


Algorithms ◽  
2021 ◽  
Vol 14 (7) ◽  
pp. 197
Author(s):  
Ali Seman ◽  
Azizian Mohd Sapawi

In the conventional k-means framework, seeding is the first step toward optimization before the objects are clustered. Random seeding raises two main issues: the clustering results may be less than optimal, and different results may be obtained in every run. In real-world applications, optimal and stable clustering is highly desirable. This report introduces a new clustering algorithm called the zero k-approximate modal haplotype (Zk-AMH) algorithm, which uses a simple and novel seeding mechanism known as zero-point multidimensional spaces. Zk-AMH provides cluster optimality and stability, thereby resolving the aforementioned issues. Notably, over 100 runs the Zk-AMH algorithm yielded mean scores identical to its maximum and minimum scores, giving a standard deviation of zero and demonstrating its stability. Additionally, when Zk-AMH was applied to eight datasets, it achieved the highest mean scores on four, an approximately equal score on one, and marginally lower scores on the other three. With its optimality and stability, Zk-AMH could be a suitable alternative for developing future clustering tools.
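The stability result (identical mean, maximum and minimum over 100 runs) is what any deterministic seeding yields. The abstract does not describe how zero-point seeding works, so the sketch below uses a hypothetical stand-in, `zero_point_seeds`: start from the point nearest the origin, then add seeds by farthest-first traversal. It only illustrates measuring run-to-run spread for random versus deterministic seeding with plain K-means; it is not Zk-AMH.

```python
import numpy as np

def lloyd(X, centers, n_iter=50):
    """Plain K-means from the given seeds; returns the final SSE."""
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(len(centers))])
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())

def random_seeds(X, k, rng):
    return X[rng.choice(len(X), size=k, replace=False)]

def zero_point_seeds(X, k):
    """HYPOTHETICAL deterministic rule: start at the point nearest the origin,
    then repeatedly add the point farthest from all chosen seeds."""
    chosen = [int(np.linalg.norm(X, axis=1).argmin())]
    d2 = ((X - X[chosen[0]]) ** 2).sum(axis=1)
    while len(chosen) < k:
        nxt = int(d2.argmax())
        chosen.append(nxt)
        d2 = np.minimum(d2, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[chosen]

def run_summary(scores):
    """Mean, min, max and standard deviation of a score over repeated runs."""
    scores = np.asarray(scores)
    return {"mean": scores.mean(), "min": scores.min(), "max": scores.max(), "std": scores.std()}
```

Running both seedings 100 times, the deterministic one gives min equal to max by construction, whereas random seeding generally spreads the SSE across runs.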


Author(s):  
Manmohan Singh ◽  
Rajendra Pamula ◽  
Alok Kumar

Clustering has various applications in machine learning, data mining, data compression, and pattern recognition. Existing techniques such as Lloyd's algorithm (sometimes called k-means) suffer from converging to a local optimum and offer no approximation guarantee. To overcome these shortcomings, this paper offers an efficient k-means clustering approach for stream data mining. The coreset is a popular and fundamental concept for k-means clustering on stream data. In each step, the reduction computes a coreset of its inputs and represents the error, where P is the number of input points, according to the nested property of coresets. Hence, a small reduction in this error makes the final coreset n times more accurate. This motivated the authors to propose a new coreset-reduction algorithm. The proposed algorithm was evaluated on the Covertype, Spambase, Census 1990, Bigcross, and Tower datasets, and it outperforms competing algorithms such as StreamKM++, BICO (BIRCH meets Coresets for k-means clustering), and BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies).
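The merge-and-reduce idea behind streaming coresets can be sketched as follows. The reduce step follows the StreamKM++ style of choosing m representatives by weighted D² (k-means++) sampling and giving each the total weight of the points nearest to it; it omits StreamKM++'s coreset tree and is not the new reduction proposed in the paper. Function names and the chunked-stream interface are assumptions.

```python
import numpy as np

def reduce_kmeanspp(P, w, m, rng):
    """Pick up to m representatives by weighted D^2 (k-means++) sampling and give
    each one the total weight of the points closest to it."""
    if len(P) <= m:
        return P, w
    idx = [int(rng.choice(len(P), p=w / w.sum()))]
    d2 = ((P - P[idx[0]]) ** 2).sum(axis=1)
    while len(idx) < m:
        probs = w * d2
        if probs.sum() == 0:                 # every point coincides with a chosen one
            break
        nxt = int(rng.choice(len(P), p=probs / probs.sum()))
        idx.append(nxt)
        d2 = np.minimum(d2, ((P - P[nxt]) ** 2).sum(axis=1))
    reps = P[idx]
    nearest = ((P[:, None, :] - reps[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return reps, np.bincount(nearest, weights=w, minlength=len(idx))

def stream_coreset(chunks, m, seed=0):
    """Merge-and-reduce over a stream of point chunks, like a binary counter."""
    rng = np.random.default_rng(seed)
    buckets = {}                              # level -> (points, weights)
    for chunk in chunks:
        P = np.asarray(chunk, dtype=float)
        P, w = reduce_kmeanspp(P, np.ones(len(P)), m, rng)
        level = 0
        while level in buckets:               # two summaries at one level merge upward
            Q, v = buckets.pop(level)
            P, w = reduce_kmeanspp(np.vstack([P, Q]), np.concatenate([w, v]), m, rng)
            level += 1
        buckets[level] = (P, w)
    P = np.vstack([b[0] for b in buckets.values()])
    w = np.concatenate([b[1] for b in buckets.values()])
    return reduce_kmeanspp(P, w, m, rng)
```

Each input point contributes its weight to exactly one representative, so the coreset's total weight always equals the number of points seen. Because summaries are nested level by level, per-step reduction error compounds, which is why the abstract focuses on tightening the reduction.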


2013 ◽  
Vol 11 (1) ◽  
pp. 293-308 ◽  
Author(s):  
Somayeh Karimi ◽  
Navid Mostoufi ◽  
Rahmat Sotudeh-Gharebagh

Modeling and optimization of the continuous catalytic reforming (CCR) of naphtha were investigated. The process model is based on a network of four main reactions and has proved quite effective for industrial application. The inlet temperatures of the four reactors were selected as the decision variables. Honey-bee mating optimization (HBMO) and a genetic algorithm (GA) were applied to the optimization problem, and the results of the two methods were compared. Profit was taken as the objective function to be maximized. Optimization of the CCR moving-bed reactors for maximum profit was carried out with the HBMO algorithm, using the reactor inlet temperatures as decision variables. The optimization results showed that, based on the HBMO algorithm, profit can be increased by 3.01%. Comparing the HBMO and the GA on the naphtha reforming model showed that HBMO is an effective, rapidly converging technique that can reach better optima than the GA. HBMO found the global optimum with fewer objective function evaluations than the GA, and it was also shown to be less likely to get stuck in a local optimum.
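A minimal real-coded GA over four bounded decision variables mirrors the setup of four reactor inlet temperatures and a profit objective. The CCR kinetic model is not given, so `toy_profit` is a hypothetical concave placeholder and the temperature bounds are invented for illustration; HBMO is not reproduced. The evaluation counter shows how optimizers can be compared by the number of objective function evaluations, as the abstract does.

```python
import numpy as np

LOW, HIGH = 480.0, 540.0      # hypothetical inlet-temperature bounds (deg C), illustration only
TARGET = np.array([505.0, 510.0, 515.0, 520.0])   # optimum of the toy surface below

def toy_profit(T):
    """HYPOTHETICAL placeholder for the CCR profit model: concave, peak at TARGET."""
    return -float(((np.asarray(T) - TARGET) ** 2).sum())

def genetic_algorithm(profit, dim=4, pop=30, gens=80, p_mut=0.2, sigma=2.0, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(LOW, HIGH, size=(pop, dim))
    f = np.array([profit(x) for x in X])
    evals = pop

    def tournament():
        i, j = rng.choice(pop, size=2, replace=False)
        return X[i] if f[i] >= f[j] else X[j]

    for _ in range(gens):
        children = [X[f.argmax()].copy()]             # elitism: keep the best unchanged
        while len(children) < pop:
            a, b = tournament(), tournament()
            u = rng.random(dim)
            child = u * a + (1.0 - u) * b             # arithmetic (blend) crossover
            mask = rng.random(dim) < p_mut
            child[mask] += rng.normal(0.0, sigma, size=mask.sum())   # Gaussian mutation
            children.append(np.clip(child, LOW, HIGH))
        X = np.array(children)
        f = np.array([profit(x) for x in X])
        evals += pop
    best = int(f.argmax())
    return X[best], f[best], evals
```

Replacing `toy_profit` with a real process model, and running a second optimizer under the same evaluation budget, would reproduce the style of comparison the abstract reports.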


Author(s):  
K. Kamil ◽  
K.H Chong ◽  
H. Hashim ◽  
S.A. Shaaya

The genetic algorithm is a well-known metaheuristic for solving optimization problems that mimics the natural process of cell reproduction. Its strengths in solving optimization problems have made it popular among researchers, who work to improve the performance of the simple genetic algorithm and apply it in many areas. However, the genetic algorithm has its own weakness: low diversity, which causes premature convergence in which the candidate solution is trapped in a local optimum. This paper proposes the Multiple Mitosis Genetic Algorithm, which improves the performance of the simple genetic algorithm by promoting high diversity among high-quality individuals through three steps: setting a multiplying factor before the crossover process, conducting multiple mitosis crossover, and introducing a mini loop in each generation. Results show that the percentage of high-quality individuals rises to as much as 90 percent of the total population, helping to find the global optimum.
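Two simple population metrics make the diversity argument concrete: the mean pairwise Hamming distance between bit-string individuals, which falls toward zero as a population converges prematurely, and the share of individuals above a quality threshold, the kind of percentage reported in the results. Both functions and the threshold are assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def mean_pairwise_hamming(pop):
    """Average normalized Hamming distance over all pairs of 0/1 individuals."""
    pop = np.asarray(pop, dtype=np.int64)
    n, length = pop.shape
    if n < 2:
        raise ValueError("need at least two individuals")
    ones = pop.sum(axis=0)
    # At each locus, ones * (n - ones) pairs disagree; no O(n^2) loop needed.
    disagreements = float((ones * (n - ones)).sum())
    return disagreements / (n * (n - 1) / 2) / length

def high_quality_share(fitness, threshold):
    """Fraction of the population whose fitness reaches the (assumed) quality threshold."""
    return float(np.mean(np.asarray(fitness) >= threshold))
```

Tracking both values per generation shows whether a variant raises the share of good individuals without collapsing diversity, which is the trade-off the proposed method targets.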

