scholarly journals An Improved Animal Migration Optimization Algorithm for Clustering Analysis

2015 ◽  
Vol 2015 ◽  
pp. 1-12 ◽  
Author(s):  
Mingzhi Ma ◽  
Qifang Luo ◽  
Yongquan Zhou ◽  
Xin Chen ◽  
Liangliang Li

Animal migration optimization (AMO) is one of the most recently introduced algorithms based on the behavior of animal swarm migration. This paper presents an improved AMO algorithm (IAMO), which significantly improves the original AMO in solving complex optimization problems. Clustering is a popular data analysis and data mining technique and it is used in many fields. The well-known method in solving clustering problems isk-means clustering algorithm; however, it highly depends on the initial solution and is easy to fall into local optimum. To improve the defects of thek-means method, this paper used IAMO for the clustering problem and experiment on synthetic and real life data sets. The simulation results show that the algorithm has a better performance than that of thek-means, PSO, CPSO, ABC, CABC, and AMO algorithm for solving the clustering problem.

Author(s):  
SANGHAMITRA BANDYOPADHYAY ◽  
UJJWAL MAULIK ◽  
MALAY KUMAR PAKHIRA

An efficient partitional clustering technique, called SAKM-clustering, that integrates the power of simulated annealing for obtaining minimum energy configuration, and the searching capability of K-means algorithm is proposed in this article. The clustering methodology is used to search for appropriate clusters in multidimensional feature space such that a similarity metric of the resulting clusters is optimized. Data points are redistributed among the clusters probabilistically, so that points that are farther away from the cluster center have higher probabilities of migrating to other clusters than those which are closer to it. The superiority of the SAKM-clustering algorithm over the widely used K-means algorithm is extensively demonstrated for artificial and real life data sets.


2013 ◽  
Vol 3 (4) ◽  
pp. 1-14 ◽  
Author(s):  
S. Sampath ◽  
B. Ramya

Cluster analysis is a branch of data mining, which plays a vital role in bringing out hidden information in databases. Clustering algorithms help medical researchers in identifying the presence of natural subgroups in a data set. Different types of clustering algorithms are available in the literature. The most popular among them is k-means clustering. Even though k-means clustering is a popular clustering method widely used, its application requires the knowledge of the number of clusters present in the given data set. Several solutions are available in literature to overcome this limitation. The k-means clustering method creates a disjoint and exhaustive partition of the data set. However, in some situations one can come across objects that belong to more than one cluster. In this paper, a clustering algorithm capable of producing rough clusters automatically without requiring the user to give as input the number of clusters to be produced. The efficiency of the algorithm in detecting the number of clusters present in the data set has been studied with the help of some real life data sets. Further, a nonparametric statistical analysis on the results of the experimental study has been carried out in order to analyze the efficiency of the proposed algorithm in automatic detection of the number of clusters in the data set with the help of rough version of Davies-Bouldin index.


2015 ◽  
Vol 2015 ◽  
pp. 1-8 ◽  
Author(s):  
Hong Peng ◽  
Xiaohui Luo ◽  
Zhisheng Gao ◽  
Jun Wang ◽  
Zheng Pei

P systems are a class of distributed parallel computing models; this paper presents a novel clustering algorithm, which is inspired from mechanism of a tissue-like P system with a loop structure of cells, called membrane clustering algorithm. The objects of the cells express the candidate centers of clusters and are evolved by the evolution rules. Based on the loop membrane structure, the communication rules realize a local neighborhood topology, which helps the coevolution of the objects and improves the diversity of objects in the system. The tissue-like P system can effectively search for the optimal partitioning with the help of its parallel computing advantage. The proposed clustering algorithm is evaluated on four artificial data sets and six real-life data sets. Experimental results show that the proposed clustering algorithm is superior or competitive tok-means algorithm and several evolutionary clustering algorithms recently reported in the literature.


2014 ◽  
Vol 2014 ◽  
pp. 1-16 ◽  
Author(s):  
Xin Chen ◽  
Yongquan Zhou ◽  
Qifang Luo

Clustering is a popular data analysis and data mining technique. Thek-means clustering algorithm is one of the most commonly used methods. However, it highly depends on the initial solution and is easy to fall into local optimum solution. In view of the disadvantages of thek-means method, this paper proposed a hybrid monkey algorithm based on search operator of artificial bee colony algorithm for clustering analysis and experiment on synthetic and real life datasets to show that the algorithm has a good performance than that of the basic monkey algorithm for clustering analysis.


2021 ◽  
Vol 25 (6) ◽  
pp. 1507-1524
Author(s):  
Chunying Zhang ◽  
Ruiyan Gao ◽  
Jiahao Wang ◽  
Song Chen ◽  
Fengchun Liu ◽  
...  

In order to solve the clustering problem with incomplete and categorical matrix data sets, and considering the uncertain relationship between samples and clusters, a set pair k-modes clustering algorithm is proposed (MD-SPKM). Firstly, the correlation theory of set pair information granule is introduced into k-modes clustering. By improving the distance formula of traditional k-modes algorithm, a set pair distance measurement method between incomplete matrix samples is defined. Secondly, considering the uncertain relationship between the sample and the cluster, the definition of the intra-cluster average distance and the threshold calculation formula to determine whether the sample belongs to multiple clusters is given, and then the result of set pair clustering is formed, which includes positive region, boundary region and negative region. Finally, through the selected three data sets and four contrast algorithms for experimental evaluation, the experimental results show that the set pair k-modes clustering algorithm can effectively handle incomplete categorical matrix data sets, and has good clustering performance in Accuracy, Recall, ARI and NMI.


2008 ◽  
pp. 1231-1249
Author(s):  
Jaehoon Kim ◽  
Seong Park

Much of the research regarding streaming data has focused only on real time querying and analysis of recent data stream allowable in memory. However, as data stream mining, or tracking of past data streams, is often required, it becomes necessary to store large volumes of streaming data in stable storage. Moreover, as stable storage has restricted capacity, past data stream must be summarized. The summarization must be performed periodically because streaming data flows continuously, quickly, and endlessly. Therefore, in this paper, we propose an efficient periodic summarization method with a flexible storage allocation. It improves the overall estimation error by flexibly adjusting the size of the summarized data of each local time section. Additionally, as the processing overhead of compression and the disk I/O cost of decompression can be an important factor for quick summarization, we also consider setting the proper size of data stream to be summarized at a time. Some experimental results with artificial data sets as well as real life data show that our flexible approach is more efficient than the existing fixed approach.


2008 ◽  
Vol 20 (4) ◽  
pp. 1042-1064
Author(s):  
Maciej Pedzisz ◽  
Danilo P. Mandic

A homomorphic feedforward network (HFFN) for nonlinear adaptive filtering is introduced. This is achieved by a two-layer feedforward architecture with an exponential hidden layer and logarithmic preprocessing step. This way, the overall input-output relationship can be seen as a generalized Volterra model, or as a bank of homomorphic filters. Gradient-based learning for this architecture is introduced, together with some practical issues related to the choice of optimal learning parameters and weight initialization. The performance and convergence speed are verified by analysis and extensive simulations. For rigor, the simulations are conducted on artificial and real-life data, and the performances are compared against those obtained by a sigmoidal feedforward network (FFN) with identical topology. The proposed HFFN proved to be a viable alternative to FFNs, especially in the critical case of online learning on small- and medium-scale data sets.


2013 ◽  
Vol 411-414 ◽  
pp. 1884-1893
Author(s):  
Yong Chun Cao ◽  
Ya Bin Shao ◽  
Shuang Liang Tian ◽  
Zheng Qi Cai

Due to many of the clustering algorithms based on GAs suffer from degeneracy and are easy to fall in local optima, a novel dynamic genetic algorithm for clustering problems (DGA) is proposed. The algorithm adopted the variable length coding to represent individuals and processed the parallel crossover operation in the subpopulation with individuals of the same length, which allows the DGA algorithm clustering to explore the search space more effectively and can automatically obtain the proper number of clusters and the proper partition from a given data set; the algorithm used the dynamic crossover probability and adaptive mutation probability, which prevented the dynamic clustering algorithm from getting stuck at a local optimal solution. The clustering results in the experiments on three artificial data sets and two real-life data sets show that the DGA algorithm derives better performance and higher accuracy on clustering problems.


Author(s):  
Mohamed Ibrahim Mohamed ◽  
Laba Handique ◽  
Subrata Chakraborty ◽  
Nadeem Shafique Butt ◽  
Haitham M. Yousof

In this article an attempt is made to introduce a new extension of the Fréchet model called the Xgamma Fréchet model. Some of its properties are derived. The estimation of the parameters via different estimation methods are discussed. The performances of the proposed estimation methods are investigated through simulations as well as real life data sets. The potentiality of the proposed model is established through modelling of two real life data sets. The results have shown clear preference for the proposed model compared to several know competing ones.


Sign in / Sign up

Export Citation Format

Share Document