An Ensemble of Locally Reliable Cluster Solutions

A clustering ensemble is an approach in which a number of (usually weak) base clusterings are generated and their consensus clustering is used as the final clustering. Because democratic decisions tend to be better than dictatorial ones, it seems natural that the decisions of an ensemble (here, a clustering ensemble) should be better than those of a simple model (here, a clustering). However, it is not guaranteed that every ensemble is better than a simple model. An ensemble is considered better if its members are valid or of high quality and if they contribute to the consensus clustering according to their quality. In this paper, we propose a clustering ensemble framework that uses a simple clustering algorithm based on the k-medoids clustering algorithm. This simple clustering algorithm guarantees that the discovered clusters are valid. In addition, the framework includes a mechanism that uses each discovered cluster according to its quality. To implement this mechanism, an auxiliary ensemble called the reference set is created by running several k-means clustering algorithms.

An Ensemble Clusterer Framework based on Valid and Diverse Basic Small Clusters

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622021500309 ◽

2021 ◽

pp. 1-31

Author(s):

Tao Sun ◽

Saeed Mashdour ◽

Mohammad Reza Mahmoudi

Keyword(s):

Clustering Algorithm ◽

State Of The Art ◽

Clustering Algorithms ◽

Clustering Ensemble ◽

High Quality ◽

Consensus Function ◽

Consensus Functions ◽

Consensus Partition ◽

Small Clusters ◽

Minimum Quality

Clustering ensemble is a new problem in which the aim is to extract a single clustering from a pool of base clusterings. The pool of base clusterings is sometimes referred to as the ensemble. An ensemble is considered suitable if its members are diverse and each of them has a minimum quality. The method that maps an ensemble to an output partition (also called the consensus partition) is called the consensus function. The consensus function should find a consensus partition with which all of the ensemble members agree as much as possible. In this paper, a novel clustering ensemble framework is introduced that guarantees the generation of a pool of base clusterings satisfying both conditions (diversity among ensemble members and high-quality members). In view of its limitations, a novel consensus function is also introduced. We experimentally show that the proposed clustering ensemble framework is scalable, efficient and general. Using different base clustering algorithms, we show that our improved base clustering algorithm is better. Also, among different consensus functions, we show the effectiveness of our consensus function. Finally, in comparison with the state of the art, we find that the clustering ensemble framework is comparable or even better in terms of scalability and efficacy.
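
To make the idea of a consensus function concrete, the following is a minimal sketch of one common choice, a co-association (evidence accumulation) consensus. It is not the consensus function proposed in the paper, whose details are not given in the abstract. The function names, the 0.5 threshold, and the toy ensemble are assumptions made for illustration only.

```python
def co_association(partitions):
    """Return an n x n matrix whose entry (i, j) is the fraction of
    base clusterings that place points i and j in the same cluster."""
    n = len(partitions[0])
    weight = 1.0 / len(partitions)
    co = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += weight
    return co


def consensus_partition(partitions, threshold=0.5):
    """Hypothetical consensus function: link two points when more than
    `threshold` of the base clusterings agree on them, then return the
    connected components of that graph as the consensus labels."""
    co = co_association(partitions)
    n = len(co)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first search over strongly co-associated points
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Toy ensemble of three base clusterings over six points (made-up data).
# The second uses different label names for the same grouping as the first;
# the third disagrees about point 2.
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
print(consensus_partition(ensemble))  # [0, 0, 0, 1, 1, 1]
```

Because the co-association matrix only records whether two points share a cluster, the consensus does not depend on how each base clustering names its clusters.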

Clustering Ensemble for Identifying Defective Wafer Bin Map in Semiconductor Manufacturing

Mathematical Problems in Engineering ◽

10.1155/2015/707358 ◽

2015 ◽

Vol 2015 ◽

pp. 1-11 ◽

Cited By ~ 3

Author(s):

Chia-Yu Hsu

Keyword(s):

Semiconductor Manufacturing ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Human Vision ◽

Clustering Ensemble ◽

Data Space ◽

Ensemble Approach ◽

Squared Error ◽

Defect Pattern ◽

Bin Map

A wafer bin map (WBM) represents a specific defect pattern that provides information for diagnosing the root causes of low yield in semiconductor manufacturing. In practice, most semiconductor engineers use subjective and time-consuming eyeball analysis to assess WBM patterns. Given shrinking feature sizes and increasing wafer sizes, various types of WBMs occur; thus, relying on human vision to judge defect patterns is complex, inconsistent, and unreliable. In this study, a clustering ensemble approach is proposed to bridge the gap, facilitating WBM pattern extraction and assisting engineers in recognizing systematic defect patterns efficiently. The clustering ensemble approach not only generates diverse clusters in data space, but also integrates them in label space. First, the mountain function is used to transform data by using pattern density. Subsequently, k-means and particle swarm optimization (PSO) clustering algorithms are used to generate diverse partitions and various label results. Finally, the adaptive resonance theory (ART) neural network is used to attain consensus partitions and integration. An experiment was conducted to evaluate the effectiveness of the proposed WBM clustering ensemble approach. Several criteria, namely sum of squared error, precision, recall, and F-measure, were used for evaluating clustering results. The numerical results showed that the proposed approach outperforms the individual clustering algorithms.

A Quantitative Discriminant Method of Elbow Point for the Optimal Number of Clusters in Clustering Algorithm

10.21203/rs.3.rs-58011/v3 ◽

2021 ◽

Author(s):

Congming Shi ◽

Bingtao Wei ◽

Shoulin Wei ◽

Wen Wang ◽

Hai Liu ◽

...

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Optimal Number ◽

Machine Learning Method ◽

Cluster Number ◽

Number Of Clusters ◽

Public Dataset ◽

Optimal Cluster ◽

Better Than ◽

Optimal Number Of Clusters

Clustering, a traditional machine learning method, plays a significant role in data analysis. Most clustering algorithms depend on a predetermined exact number of clusters, whereas, in practice, the number of clusters is usually unpredictable. Although the Elbow method is one of the most commonly used methods to determine the optimal cluster number, it depends on the manual identification of the elbow point on the visualization curve. Thus, even experienced analysts cannot clearly identify the elbow point when the plotted curve is fairly smooth. To solve this problem, a new elbow point discriminant method is proposed to yield a statistical metric that estimates an optimal cluster number when clustering a dataset. First, the average degree of distortion obtained by the Elbow method is normalized to the range of 0 to 10. Second, the normalized results are used to calculate the cosines of the intersection angles between elbow points. Third, these cosines and the arccosine function are used to compute the intersection angles between elbow points. Finally, the index of the minimal intersection angle is used as the estimated optimal cluster number. The experimental results based on simulated datasets and a well-known public dataset (Iris Dataset) demonstrated that the optimal cluster number estimated by the proposed method is better than that of the widely used Silhouette method.
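
The four steps above can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation: it assumes the angle at each candidate k is measured between the segments joining the point (k, distortion) to its left and right neighbours on the normalized curve, and that the first distortion value corresponds to k = 1. The function names and the sample distortion values are made up.

```python
import math


def normalize(distortions, upper=10.0):
    """Step 1: rescale the distortion values to the range [0, upper]."""
    lo, hi = min(distortions), max(distortions)
    if hi == lo:
        raise ValueError("distortions are constant; there is no elbow to find")
    return [upper * (d - lo) / (hi - lo) for d in distortions]


def elbow_angles(distortions, k_start=1):
    """Steps 2 and 3: angle (in radians) at every interior point of the
    normalized curve, keyed by the corresponding number of clusters."""
    y = normalize(distortions)
    angles = {}
    for idx in range(1, len(y) - 1):
        a = (-1.0, y[idx - 1] - y[idx])  # vector to the left neighbour
        b = (1.0, y[idx + 1] - y[idx])   # vector to the right neighbour
        cos_theta = (a[0] * b[0] + a[1] * b[1]) / (math.hypot(*a) * math.hypot(*b))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
        angles[k_start + idx] = math.acos(cos_theta)
    return angles


def estimate_k(distortions, k_start=1):
    """Step 4: the k with the sharpest (smallest) angle."""
    angles = elbow_angles(distortions, k_start)
    return min(angles, key=angles.get)


# Made-up average distortions for k = 1..6.
sample = [100.0, 40.0, 10.0, 8.0, 7.0, 6.5]
print(estimate_k(sample))  # 3
```

At least three distortion values are needed, since the first and last points have no angle. Normalizing to a fixed range before computing angles matters because the angle depends on the relative scale of the two axes.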

Spectral Clustering Based on Sparse Representation

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.556-562.3822 ◽

2014 ◽

Vol 556-562 ◽

pp. 3822-3826

Author(s):

Chen Xiao Hu ◽

Xian Chun Zou

Keyword(s):

Sparse Representation ◽

Spectral Clustering ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Similarity Metrics ◽

Distance Metrics ◽

Information Propagation ◽

Discriminative Ability ◽

Two Samples ◽

Better Than

Spectral clustering is an efficient clustering algorithm based on information propagation between neighboring nodes. Its performance depends largely on the distance metric, so it is possible to boost its performance by adopting a more reliable distance metric. Given the advantages of sparse representation, namely its discriminative ability, its robustness to noise, and its more faithful measurement of the similarity between two samples, we propose a spectral clustering algorithm based on sparse representation. The experimental study on several datasets shows that the proposed algorithm performs better than spectral clustering algorithms based on other similarity metrics.

From Paths to Routes: A Method for Path Classification

Frontiers in Behavioral Neuroscience ◽

10.3389/fnbeh.2020.610560 ◽

2021 ◽

Vol 14 ◽

Author(s):

Andrea Gonsek ◽

Manon Jeschke ◽

Silvia Rönnau ◽

Olivier J. N. Bertrand

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Distance Functions ◽

Consensus Clustering ◽

Time Warping ◽

Cluttered Environment ◽

Qualitative Classification ◽

One Step ◽

Novel Method ◽

Dynamic Time

Many animals establish, learn and optimize routes between locations to commute efficiently. One step in understanding route following is defining measures of similarities between the paths taken by the animals. Paths have commonly been compared by using several descriptors (e.g., the speed, distance traveled, or the amount of meandering) or were visually classified into categories by the experimenters. However, similar quantities obtained from such descriptors do not guarantee similar paths, and qualitative classification by experimenters is prone to observer biases. Here we propose a novel method to classify paths based on their similarity with different distance functions and clustering algorithms based on the trajectories of bumblebees flying through a cluttered environment. We established a method based on two distance functions (Dynamic Time Warping and Fréchet Distance). For all combinations of trajectories, the distance was calculated with each measure. Based on these distance values, we grouped similar trajectories by applying the Monte Carlo Reference-Based Consensus Clustering algorithm. Our procedure provides new options for trajectory analysis based on path similarities in a variety of experimental paradigms.
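
As an illustration of one of the two distance functions named above, here is a minimal sketch of Dynamic Time Warping between two trajectories, followed by the pairwise distance matrix that a clustering step would consume. It assumes trajectories are lists of coordinate tuples and uses the Euclidean distance between points. The function names and sample paths are hypothetical, and the consensus clustering step itself is not shown.

```python
import math


def dtw_distance(path_a, path_b):
    """Dynamic Time Warping cost between two trajectories, using the
    Euclidean distance between individual points as the local cost."""
    n, m = len(path_a), len(path_b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning the first i points
    # of path_a with the first j points of path_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(path_a[i - 1], path_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # path_a advances, path_b repeats a point
                cost[i][j - 1],      # path_b advances, path_a repeats a point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def pairwise_distances(paths):
    """Distance for all combinations of trajectories, as a square matrix."""
    return [[dtw_distance(p, q) for q in paths] for p in paths]


# Made-up 2-D paths: b follows the same route as a but is sampled
# differently; c runs parallel to a at a distance of 1.
a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
b = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
c = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(dtw_distance(a, b))  # 0.0
print(dtw_distance(a, c))  # 3.0
```

The first result shows why warping is useful for path comparison: two recordings of the same route with different sampling still have distance zero.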

Service Partition Method Based on Particle Swarm Fuzzy Clustering

Wireless Communications and Mobile Computing ◽

10.1155/2021/7225552 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Hong Xia ◽

Qingyi Dong ◽

Hui Gao ◽

Yanping Chen ◽

ZhongMin Wang

Keyword(s):

Fuzzy Clustering ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Particle Swarm ◽

Cluster Center ◽

Fuzzy Clustering Algorithm ◽

Partition Method ◽

Service Data ◽

Optimal Cluster ◽

Better Than

It is difficult to accurately classify a service into specific service clusters because of the multirelationships between services. To solve this problem, this paper proposes a service partition method based on particle swarm fuzzy clustering, which can effectively consider multirelationships between services by using a fuzzy clustering algorithm. First, an algorithm for automatically determining the number of clusters determines the number of service clusters based on the density of the service core point. Second, fuzzy c-means is combined with the particle swarm optimization algorithm to find the optimal cluster center of the service. Finally, the fuzzy clustering algorithm uses the improved Gram-cosine similarity to obtain the final results. Extensive experiments on real web service data show that our method is better than mainstream clustering algorithms in accuracy.
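
Fuzzy clustering lets an item belong to several clusters at once, which is what allows it to reflect multiple relationships. The sketch below shows the membership update of standard fuzzy c-means with Euclidean distance. It is an assumption-laden illustration only: the paper's particle swarm search and its improved Gram-cosine similarity are not reproduced, and the function name and sample values are hypothetical.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership of one point in each cluster:
    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)),
    where d_i is the distance to center i and m > 1 is the fuzzifier."""
    dists = [math.dist(point, c) for c in centers]
    coincident = [i for i, d in enumerate(dists) if d == 0.0]
    if coincident:
        # The point sits exactly on one or more centers: share full
        # membership among them and avoid dividing by zero.
        share = 1.0 / len(coincident)
        return [share if i in coincident else 0.0 for i in range(len(centers))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


# A point at distance 1 from the first center and 2 from the second.
u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
print([round(x, 3) for x in u])  # [0.8, 0.2]
```

The memberships always sum to 1, and a larger fuzzifier m makes them more even across clusters.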

Cluster Ensemble and Multi-Objective Clustering Methods

Pattern Recognition Technologies and Applications ◽

10.4018/978-1-59904-807-9.ch015 ◽

2008 ◽

pp. 325-343 ◽

Cited By ~ 4

Author(s):

Katti Faceli ◽

Andre C.P.L.F. de Carvalho ◽

Marcilio C.P. de Souto

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Ensemble Methods ◽

Data Exploration ◽

Clustering Methods ◽

Clustering Ensemble ◽

Cluster Ensemble ◽

Multi Objective ◽

Ensemble Algorithm ◽

New Algorithms

Clustering is an important tool for data exploration. Several clustering algorithms exist, and new algorithms are frequently proposed in the literature. These algorithms have been very successful in a large number of real-world problems. However, there is no clustering algorithm, optimizing only a single criterion, able to reveal all types of structures (homogeneous or heterogeneous) present in a dataset. In order to deal with this problem, several multi-objective clustering and cluster ensemble methods have been proposed in the literature, including our multi-objective clustering ensemble algorithm. In this chapter, we present an overview of these methods, which, to a great extent, are based on the combination of various aspects of traditional clustering algorithms.

Exploring performance of clustering methods on document sentiment analysis

Journal of Information Science ◽

10.1177/0165551515617374 ◽

2016 ◽

Vol 43 (1) ◽

pp. 54-74 ◽

Cited By ~ 14

Author(s):

Baojun Ma ◽

Hua Yuan ◽

Ye Wu

Keyword(s):

Sentiment Analysis ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Experimental Studies ◽

Experimental Results ◽

Clustering Methods ◽

Term Weighting ◽

Weighting Method ◽

Clustering Techniques ◽

Better Than

Clustering is a powerful unsupervised tool for sentiment analysis from text. However, the clustering results may be affected by any step of the clustering process, such as the data pre-processing strategy, the term weighting method in the Vector Space Model, and the clustering algorithm. This paper presents the results of an experimental study of some common clustering techniques with respect to the task of sentiment analysis. Unlike previous studies, we investigate the combined effects of these factors with a series of comprehensive experimental studies. The experimental results indicate that, first, the K-means-type clustering algorithms show clear advantages on balanced review datasets, while performing rather poorly on unbalanced datasets in terms of clustering accuracy. Second, the comparatively newly designed weighting models are better than the traditional weighting models for sentiment clustering on both balanced and unbalanced datasets. Furthermore, an adjective and adverb extraction strategy can offer clear improvements in clustering performance, while stemming and stopword removal have a negative influence on sentiment clustering. The experimental results would be valuable for both the study and usage of clustering methods in online review sentiment analysis.

A SEQUENCE-ELEMENT-BASED HIERARCHICAL CLUSTERING ALGORITHM FOR CATEGORICAL SEQUENCE DATA

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622005001398 ◽

2005 ◽

Vol 04 (01) ◽

pp. 81-96 ◽

Cited By ~ 5

Author(s):

SEUNG-JOON OH ◽

JAE-YEARN KIM

Keyword(s):

Hierarchical Clustering ◽

Clustering Algorithm ◽

Sequence Data ◽

Clustering Algorithms ◽

Scientific Data ◽

Sequence Element ◽

Hierarchical Clustering Algorithm ◽

Synthetic Datasets ◽

Better Than

Recently, there has been enormous growth in the amount of commercial and scientific data, such as protein sequences, retail transactions, and web-logs. Such datasets consist of sequence data that have an inherent sequential nature. However, few existing clustering algorithms consider sequentiality. In this paper, we study how to cluster these sequence datasets. We propose a new similarity measure to compute the similarity between two sequences. In the proposed measure, subsets of a sequence are considered, and the more identical subsets there are, the more similar the two sequences. In addition, we propose a hierarchical clustering algorithm and an efficient method for measuring similarity. Using a splice dataset and synthetic datasets, we show that the quality of clusters generated by our proposed approach is better than that of clusters produced by traditional clustering algorithms.

Identify High-Quality Protein Structural Models by Enhanced K-Means

BioMed Research International ◽

10.1155/2017/7294519 ◽

2017 ◽

Vol 2017 ◽

pp. 1-9 ◽

Cited By ~ 2

Author(s):

Hongjie Wu ◽

Haiou Li ◽

Min Jiang ◽

Cheng Chen ◽

Qiang Lv ◽

...

Keyword(s):

Structure Prediction ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Structural Models ◽

Comparative Modeling ◽

Critical Issue ◽

Dimensional Structure ◽

Structure Identification ◽

High Quality ◽

High Quality Protein

Background. One critical issue in protein three-dimensional structure prediction using either ab initio or comparative modeling involves identification of high-quality protein structural models from generated decoys. Currently, clustering algorithms are widely used to identify near-native models; however, their performance is dependent upon different conformational decoys, and, for some algorithms, the accuracy declines when the decoy population increases. Results. Here, we proposed two enhanced K-means clustering algorithms capable of robustly identifying high-quality protein structural models. The first one employs the clustering algorithm SPICKER to determine the initial centroids for basic K-means clustering (SK-means), whereas the other employs squared distance to optimize the initial centroids (K-means++). Our results showed that SK-means and K-means++ were more robust as compared with SPICKER alone, detecting 33 (59%) and 42 (75%) of 56 targets, respectively, with template modeling scores better than or equal to those of SPICKER. Conclusions. We observed that the classic K-means algorithm showed a similar performance to that of SPICKER, which is a widely used algorithm for protein-structure identification. Both SK-means and K-means++ demonstrated substantial improvements relative to results from SPICKER and classical K-means.
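
The squared-distance initialization mentioned above can be sketched as the standard K-means++ seeding rule: each new centroid is drawn with probability proportional to its squared distance from the nearest centroid chosen so far. This is a generic sketch on plain coordinate tuples, not the authors' code. How the paper measures distance between protein decoys is not stated in the abstract, so Euclidean distance, the function name, and the sample points are assumptions.

```python
import math
import random


def kmeans_pp_init(points, k, rng=None):
    """Pick k initial centroids from `points` with K-means++ seeding."""
    rng = rng or random.Random(0)  # fixed seed only to make the demo repeatable
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from every point to its nearest chosen centroid.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        if sum(d2) == 0.0:
            # Every point coincides with a chosen centroid; fall back to
            # a uniform draw because the weights are all zero.
            centers.append(rng.choice(points))
            continue
        # Far-away points are more likely to be chosen; points that are
        # already centroids have weight zero and cannot be chosen again.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


# Made-up data: two well-separated groups of points.
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
seeds = kmeans_pp_init(data, k=2)
print(seeds)
```

The seeds are then passed to ordinary K-means iterations; spreading them out this way reduces the chance that two initial centroids start in the same dense region.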
