On the stability of software clustering algorithms

Author(s):  
V. Tzerpos ◽  
R.C. Holt
2016 ◽  
Vol 54 (3) ◽  
pp. 300 ◽  
Author(s):  
Mai Dinh Sinh ◽  
Le Hung Trinh ◽  
Ngo Thanh Long

This paper proposes a method that combines fuzzy probability with a fuzzy clustering algorithm to classify multispectral satellite images: fuzzy probability is used to calculate the number of clusters and their centroids, and fuzzy clustering is then used to classify land cover in the satellite image. In classification algorithms, the initialization of the clusters and of their initial centroids has a great influence on stability, processing time, and classification results. Unsupervised classification algorithms such as k-means, c-means, and ISODATA are commonly used for many problems, but they suffer from low accuracy and instability, especially on satellite images. The results of the proposed algorithm show a significant reduction of noise in the clusters in comparison with other clustering algorithms such as k-means and ISODATA.
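
To make the role of the initial centroids concrete, here is a minimal sketch of standard fuzzy c-means on one-dimensional data. It is not the method proposed in the paper: the fuzzy-probability initialization is not reproduced, and the function names, the fuzzifier `m=2.0`, and the fixed iteration count are assumptions chosen for illustration.

```python
def fcm_memberships(points, centroids, m=2.0):
    """Standard fuzzy c-means membership of every point in every cluster."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [abs(x - c) for c in centroids]
        if any(d == 0.0 for d in dists):
            # The point sits exactly on one or more centroids: share full
            # membership among those centroids to avoid dividing by zero.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        inverse = [d ** -power for d in dists]
        total = sum(inverse)
        memberships.append([v / total for v in inverse])
    return memberships


def fcm_update_centroids(points, memberships, previous, m=2.0):
    """Move each centroid to the membership-weighted mean of the points."""
    centroids = []
    for k, old in enumerate(previous):
        weights = [row[k] ** m for row in memberships]
        total = sum(weights)
        if total > 0.0:
            centroids.append(sum(w * x for w, x in zip(weights, points)) / total)
        else:
            centroids.append(old)  # no point supports this cluster; keep it
    return centroids


def fuzzy_c_means(points, initial_centroids, m=2.0, iterations=50):
    """Run a fixed number of iterations from caller-supplied centroids."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        u = fcm_memberships(points, centroids, m)
        centroids = fcm_update_centroids(points, u, centroids, m)
    return centroids, fcm_memberships(points, centroids, m)


if __name__ == "__main__":
    data = [0.8, 1.0, 1.2, 8.8, 9.0, 9.2]
    print(fuzzy_c_means(data, [0.0, 10.0]))
```

Because `initial_centroids` is an explicit argument, the same data can be rerun from different starting points to see how much the final partition depends on initialization.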


Entropy ◽  
2019 ◽  
Vol 21 (10) ◽  
pp. 951
Author(s):  
Jérémie Sublime ◽  
Guénaël Cabanes ◽  
Basarab Matei

The aim of collaborative clustering is to enhance the performance of clustering algorithms by enabling them to work together and exchange information to tackle difficult data sets. The fundamental concept of collaboration is that clustering algorithms operate locally but collaborate by exchanging information about the local structures found by each algorithm. This kind of collaborative learning can be beneficial to a wide range of tasks, including multi-view clustering, clustering of distributed data with privacy constraints, multi-expert clustering, and multi-scale analysis. Within this context, the main difficulty of collaborative clustering is to determine how to weight the influence of the different clustering methods with the goal of maximizing the final results and minimizing the risk of negative collaborations, where the results are worse after collaboration than before. In this paper, we study how the quality and diversity of the different collaborators, as well as the stability of the partitions, can influence the final results. We propose both a theoretical analysis based on mathematical optimization and a second study based on empirical results. Our findings show that, on the one hand, in the absence of a clear criterion to optimize, a low-diversity pool of solutions with high stability is the best option to ensure good performance. On the other hand, if there is a known criterion to maximize, it is best to rely on a higher-diversity pool of solutions with high quality on that criterion. While our approach focuses on entropy-based collaborative clustering, we believe that most of our results could be extended to other collaborative algorithms.


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 1141 ◽  
Author(s):  
Angelo Duò ◽  
Mark D. Robinson ◽  
Charlotte Soneson

Subpopulation identification, usually via some form of unsupervised clustering, is a fundamental step in the analysis of many single-cell RNA-seq data sets. This has motivated the development and application of a broad range of clustering methods, based on various underlying algorithms. Here, we provide a systematic and extensible performance evaluation of 12 clustering algorithms, including both methods developed explicitly for scRNA-seq data and more general-purpose methods. The methods were evaluated using 9 publicly available scRNA-seq data sets as well as three simulations with varying degree of cluster separability. The same feature selection approaches were used for all methods, allowing us to focus on the investigation of the performance of the clustering algorithms themselves. We evaluated the ability of recovering known subpopulations, the stability and the run time of the methods. Additionally, we investigated whether the performance could be improved by generating consensus partitions from multiple individual clustering methods. We found substantial differences in the performance, run time and stability between the methods, with SC3 and Seurat showing the most favorable results. Additionally, we found that consensus clustering typically did not improve the performance compared to the best of the combined methods, but that several of the top-performing methods already perform some type of consensus clustering. The R scripts providing an extensible framework for the evaluation of new methods and data sets are available on GitHub (https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison).
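
The abstract does not name the agreement measure used to compare a partition with the known subpopulations. As an illustration only, the sketch below computes the adjusted Rand index, one common choice for this kind of comparison; the function name and the convention of returning 1.0 for degenerate inputs are assumptions, not taken from the paper or its R scripts.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treated as identical (assumption)

    # Count item pairs that are grouped together in both labelings,
    # in the first labeling only, and in the second labeling only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # both partitions are trivial in the same way (assumption)
    return (together_both - expected) / (maximum - expected)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    print(adjusted_rand_index(truth, [1, 1, 0, 0]))  # same grouping, renamed
    print(adjusted_rand_index(truth, [0, 1, 0, 1]))  # grouping that cuts across
```

The same function can serve as a stability score by comparing the partitions a method produces on repeated runs or on subsamples of the data, restricted to the items the two runs share.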


2012 ◽  
Vol 44 (1) ◽  
pp. 33-46 ◽  
Author(s):  
Mark Shtern ◽  
Vassilios Tzerpos

2021 ◽  
Author(s):  
Mohammed Saad Talib ◽  
Aslinda Hassan ◽  
Zuraida Abal Abas ◽  
Ali A Mohammed ◽  
Arif Razzaq ◽  
...  

The development of technologies and connected devices such as the internet of things (IoT), the internet of vehicles (IoV), and 5G has motivated researchers to pay more attention to this field. Clustering is a key factor in vehicular ad hoc networks (VANETs), where a number of vehicles join to form a group based on common characteristics. Vehicles in such networks are distinguished by their high mobility. The topology of a VANET changes frequently, causing continuous failures in network communication. In such a dynamic environment, creating and maintaining a stable cluster are significant challenges. Evaluating stability is therefore an important part of evaluating VANET clustering approaches. In this paper, a mathematical technique (based on the birth-death process) is developed to evaluate cluster stability based on the number of vehicles leaving and joining each cluster after its creation. The stability of the created clusters is tested by checking the number of vehicles in each cluster at successive times. These tests indicate the vehicles joining and leaving each cluster and their effects on cluster stability. When the results of the technique show that the standard deviation is small for each cluster, it can be concluded that the proposed clustering algorithm is able to achieve stability in the cluster maintenance phase.
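
The final check described above, a small standard deviation of each cluster's size over successive observation times, can be sketched as follows. This covers only that last step and does not model the birth-death process itself; the function names, the input layout, and the threshold are hypothetical, since the abstract gives no numeric criterion.

```python
from statistics import pstdev


def cluster_size_deviation(size_history):
    """Population standard deviation of each cluster's size over time.

    size_history maps a cluster id to the number of vehicles observed in
    that cluster at successive sampling times (assumed input layout).
    """
    return {cid: pstdev(sizes) for cid, sizes in size_history.items()}


def clusters_are_stable(size_history, threshold):
    """True when every cluster's deviation is at or below the threshold.

    The threshold is a hypothetical parameter chosen by the caller.
    """
    deviations = cluster_size_deviation(size_history)
    return all(sd <= threshold for sd in deviations.values())


if __name__ == "__main__":
    history = {
        "c1": [10, 10, 10, 10],  # no vehicles joining or leaving
        "c2": [8, 10, 12, 10],   # moderate churn
    }
    print(cluster_size_deviation(history))
    print(clusters_are_stable(history, threshold=2.0))
```

A cluster whose size never changes scores 0.0, and heavier joining and leaving raises the score, so the threshold sets how much churn still counts as stable.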

