Ant Custering Algorithms

Author(s):  
Yu-Chiun Chiou ◽  
Shih-Ta Chou

This paper proposes three ant clustering algorithms (ACAs): ACA-1, ACA-2 and ACA-3. The core logic of the proposed ACAs is to modify the ant colony metaheuristic by reformulating the clustering problem into a network problem. For a clustering problem of N objects and K clusters, a fully connected network of N nodes is formed with link costs, representing the dissimilarity of any two nodes it connects. K ants are then to collect their own nodes according to the link costs and following the pheromone trail laid by previous ants. The proposed three ACAs have been validated on a small-scale problem solved by a total enumeration method. The solution effectiveness at different problem scales consistently shows that ACA-2 outperforms among these three ACAs. A further comparison of ACA-2 with other commonly used clustering methods, including agglomerative hierarchy clustering algorithm (AHCA), K-means algorithm (KMA) and genetic clustering algorithm (GCA), shows that ACA-2 significantly outperforms them in solution effectiveness for the most of cases and also performs considerably better in solution stability as the problem scales or the number of clusters gets larger.

2010 ◽  
Vol 1 (1) ◽  
pp. 1-15 ◽  
Author(s):  
Yu-Chiun Chiou ◽  
Shih-Ta Chou

This paper proposes three ant clustering algorithms (ACAs): ACA-1, ACA-2 and ACA-3. The core logic of the proposed ACAs is to modify the ant colony metaheuristic by reformulating the clustering problem into a network problem. For a clustering problem of N objects and K clusters, a fully connected network of N nodes is formed with link costs, representing the dissimilarity of any two nodes it connects. K ants are then to collect their own nodes according to the link costs and following the pheromone trail laid by previous ants. The proposed three ACAs have been validated on a small-scale problem solved by a total enumeration method. The solution effectiveness at different problem scales consistently shows that ACA-2 outperforms among these three ACAs. A further comparison of ACA-2 with other commonly used clustering methods, including agglomerative hierarchy clustering algorithm (AHCA), K-means algorithm (KMA) and genetic clustering algorithm (GCA), shows that ACA-2 significantly outperforms them in solution effectiveness for the most of cases and also performs considerably better in solution stability as the problem scales or the number of clusters gets larger.


2007 ◽  
Vol 16 (06) ◽  
pp. 919-934
Author(s):  
YONGGUO LIU ◽  
XIAORONG PU ◽  
YIDONG SHEN ◽  
ZHANG YI ◽  
XIAOFENG LIAO

In this article, a new genetic clustering algorithm called the Improved Hybrid Genetic Clustering Algorithm (IHGCA) is proposed to deal with the clustering problem under the criterion of minimum sum of squares clustering. In IHGCA, the improvement operation including five local iteration methods is developed to tune the individual and accelerate the convergence speed of the clustering algorithm, and the partition-absorption mutation operation is designed to reassign objects among different clusters. By experimental simulations, its superiority over some known genetic clustering methods is demonstrated.


2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Chao Tong ◽  
Jianwei Niu ◽  
Bin Dai ◽  
Zhongyu Xie

In complex networks, cluster structure, identified by the heterogeneity of nodes, has become a common and important topological property. Network clustering methods are thus significant for the study of complex networks. Currently, many typical clustering algorithms have some weakness like inaccuracy and slow convergence. In this paper, we propose a clustering algorithm by calculating the core influence of nodes. The clustering process is a simulation of the process of cluster formation in sociology. The algorithm detects the nodes with core influence through their betweenness centrality, and builds the cluster’s core structure by discriminant functions. Next, the algorithm gets the final cluster structure after clustering the rest of the nodes in the network by optimizing method. Experiments on different datasets show that the clustering accuracy of this algorithm is superior to the classical clustering algorithm (Fast-Newman algorithm). It clusters faster and plays a positive role in revealing the real cluster structure of complex networks precisely.


2021 ◽  
Vol 10 (4) ◽  
pp. 2170-2180
Author(s):  
Untari N. Wisesty ◽  
Tati Rajab Mengko

This paper aims to conduct an analysis of the SARS-CoV-2 genome variation was carried out by comparing the results of genome clustering using several clustering algorithms and distribution of sequence in each cluster. The clustering algorithms used are K-means, Gaussian mixture models, agglomerative hierarchical clustering, mean-shift clustering, and DBSCAN. However, the clustering algorithm has a weakness in grouping data that has very high dimensions such as genome data, so that a dimensional reduction process is needed. In this research, dimensionality reduction was carried out using principal component analysis (PCA) and autoencoder method with three models that produce 2, 10, and 50 features. The main contributions achieved were the dimensional reduction and clustering scheme of SARS-CoV-2 sequence data and the performance analysis of each experiment on each scheme and hyper parameters for each method. Based on the results of experiments conducted, PCA and DBSCAN algorithm achieve the highest silhouette score of 0.8770 with three clusters when using two features. However, dimensionality reduction using autoencoder need more iterations to converge. On the testing process with Indonesian sequence data, more than half of them enter one cluster and the rest are distributed in the other two clusters.


2021 ◽  
Author(s):  
Yizhang Wang ◽  
Di Wang ◽  
You Zhou ◽  
Chai Quek ◽  
Xiaofeng Zhang

<div>Clustering is an important unsupervised knowledge acquisition method, which divides the unlabeled data into different groups \cite{atilgan2021efficient,d2021automatic}. Different clustering algorithms make different assumptions on the cluster formation, thus, most clustering algorithms are able to well handle at least one particular type of data distribution but may not well handle the other types of distributions. For example, K-means identifies convex clusters well \cite{bai2017fast}, and DBSCAN is able to find clusters with similar densities \cite{DBSCAN}. </div><div>Therefore, most clustering methods may not work well on data distribution patterns that are different from the assumptions being made and on a mixture of different distribution patterns. Taking DBSCAN as an example, it is sensitive to the loosely connected points between dense natural clusters as illustrated in Figure~\ref{figconnect}. The density of the connected points shown in Figure~\ref{figconnect} is different from the natural clusters on both ends, however, DBSCAN with fixed global parameter values may wrongly assign these connected points and consider all the data points in Figure~\ref{figconnect} as one big cluster.</div>


2017 ◽  
Vol 9 (2) ◽  
pp. 195-213
Author(s):  
Richárd Forster ◽  
Ágnes Fülöp

AbstractThe reconstruction and analyze of measured data play important role in the research of high energy particle physics. This leads to new results in both experimental and theoretical physics. This requires algorithm improvements and high computer capacity. Clustering algorithm makes it possible to get to know the jet structure more accurately. More granular parallelization of the kt cluster algorithms was explored by combining it with the hierarchical clustering methods used in network evaluations. The kt method allows to know the development of particles due to the collision of high-energy nucleus-nucleus. The hierarchical clustering algorithms works on graphs, so the particle information used by the standard kt algorithm was first transformed into an appropriate graph, representing the network of particles. Testing was done using data samples from the Alice offine library, which contains the required modules to simulate the ALICE detector that is a dedicated Pb-Pb detector. The proposed algorithm was compared to the FastJet toolkit's standard longitudinal invariant kt implementation. Parallelizing the standard non-optimized version of this algorithm utilizing the available CPU architecture proved to be 1:6 times faster, than the standard implementation, while the proposed solution in this paper was able to achieve a 12 times faster computing performance, also being scalable enough to efficiently run on GPUs.


2011 ◽  
Vol 121-126 ◽  
pp. 4675-4679
Author(s):  
Ming Wei Leng ◽  
Xiao Yun Chen ◽  
Jian Jun Cheng ◽  
Long Jie Li

In many data mining domains, labeled data is very expensive to generate, how to make the best use of labeled data to guide the process of unlabeled clustering is the core problem of semi-supervised clustering. Most of semi-supervised clustering algorithms require a certain amount of labeled data and need set the values of some parameters, different values maybe have different results. In view of this, a new algorithm, called semi-supervised clustering algorithm based on small size of labeled data, is presented, which can use the small size of labeled data to expand labeled dataset by labeling their k-nearest neighbors and only one parameter. We demonstrate our clustering algorithm with three UCI datasets, compared with SSDBSCAN[4] and KNN, the experimental results confirm that accuracy of our clustering algorithm is close to that of KNN classification algorithm.


Author(s):  
Kevin E. Voges

Cluster analysis is a fundamental data reduction technique used in the physical and social sciences. It is of potential interest to managers in Information Science, as it can be used to identify user needs though segmenting users such as Web site visitors. In addition, the theory of Rough sets is the subject of intense interest in computational intelligence research. The extension of this theory into rough clustering provides an important and potentially useful addition to the range of cluster analysis techniques available to the manager. Cluster analysis is defined as the grouping of “individuals or objects into clusters so that objects in the same cluster are more similar to one another than they are to objects in other clusters” (Hair, Black, Babin, Anderson, & Tatham, 2006). There are a number of comprehensive introductions to cluster analysis (Abonyi & Feil, 2007; Arabie, Hubert, & De Soete, 1994; Cramer, 2003; Everitt, Landau, & Leese, 2001; Gan, Ma, & Wu, 2007; Härdle & Hlávka, 2007). Techniques are often classified as hierarchical or nonhierarchical (Hair et al., 2006), and the most commonly used nonhierarchical technique is the k-means approach developed by MacQueen (1967). Recently, techniques based on developments in computational intelligence have also been used as clustering algorithms. For example, the theory of fuzzy sets developed by Zadeh (1965), which introduced the concept of partial set membership, has been applied to clustering (Abonyi & Feil, 2007; Dumitrescu, Lazzerini, & Jain, 2000). Another technique receiving considerable attention is the theory of rough sets (Pawlak, 1982), which has led to clustering algorithms referred to as rough clustering (do Prado, Engel, & Filho, 2002; Kumar, Krishna, Bapi, & De, 2007; Parmar, Wu, & Blackhurst, 2007; Voges, Pope, & Brown, 2002). This article provides brief introductions to k-means cluster analysis, rough sets theory, and rough clustering, and compares k-means clustering and rough clustering. It shows that rough clustering provides a more flexible solution to the clustering problem, and can be conceptualized as extracting concepts from the data, rather than strictly delineated subgroupings (Pawlak, 1991). Traditional clustering methods generate extensional descriptions of groups (i.e., which objects are members of each cluster), whereas clustering techniques based on rough sets theory generate intentional descriptions (i.e., what are the main characteristics of each cluster) (do Prado et al., 2002). These different goals suggest that both k-means clustering and rough clustering have their place in the data analyst’s and the information manager’s toolbox.


Author(s):  
Yasunori Endo ◽  
◽  
Arisa Taniguchi ◽  
Yukihiro Hamasuna ◽  
◽  
...  

Clustering is an unsupervised classification technique for data analysis. In general, each datum in real space is transformed into a point in a pattern space to apply clustering methods. Data cannot often be represented by a point, however, because of its uncertainty, e.g., measurement error margin and missing values in data. In this paper, we will introduce quadratic penalty-vector regularization to handle such uncertain data using Hard c-Means (HCM), which is one of the most typical clustering algorithms. We first propose a new clustering algorithm called hard c-means using quadratic penalty-vector regularization for uncertain data (HCMP). Second, we propose sequential extraction hard c-means using quadratic penalty-vector regularization (SHCMP) to handle datasets whose cluster number is unknown. Furthermore, we verify the effectiveness of our proposed algorithms through numerical examples.


2017 ◽  
Vol 13 (2) ◽  
pp. 1-12 ◽  
Author(s):  
Jungmok Ma

One of major obstacles in the application of the k-means clustering algorithm is the selection of the number of clusters k. The multi-attribute utility theory (MAUT)-based k-means clustering algorithm is proposed to tackle the problem by incorporating user preferences. Using MAUT, the decision maker's value structure for the number of clusters and other attributes can be quantitatively modeled, and it can be used as an objective function of the k-means. A target clustering problem for military targeting process is used to demonstrate the MAUT-based k-means and provide a comparative study. The result shows that the existing clustering algorithms do not necessarily reflect user preferences while the MAUT-based k-means provides a systematic framework of preferences modeling in cluster analysis.


Sign in / Sign up

Export Citation Format

Share Document