Determination of the optimal number of clusters using a spectral clustering optimization

Abstract: The systematic processing of unstructured communication data and the recognition of patterns that identify communication groups in negotiations pose many challenges in machine learning. In particular, the so-called curse of dimensionality makes pattern recognition demanding and calls for further research in the negotiation environment. In this paper, selected well-known clustering approaches are evaluated with regard to their pattern recognition potential on high-dimensional negotiation communication data. A research approach is presented that evaluates the application potential of the selected methods via a holistic framework with three main evaluation milestones: the determination of the optimal number of clusters, the main clustering application, and the performance evaluation. Quantified Term Document Matrices are first pre-processed and then used as the underlying databases to investigate the pattern recognition potential of clustering techniques, taking into account the information regarding the optimal number of clusters and measuring the respective internal and external performance. Overall, the holistic evaluation shows that certain cluster separations are recommended by both internal and external performance measures, whereas three of the clustering separations are eliminated based on the evaluation results.
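
As a rough illustration of the "quantified Term Document Matrix" input mentioned in the abstract, the sketch below builds raw term counts for a few made-up messages. The sample messages, the whitespace tokenizer, and the function name `term_document_matrix` are assumptions for illustration only; the pre-processing actually used in the paper is not described here.

```python
from collections import Counter


def term_document_matrix(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts term i in document j."""
    # Assumed tokenizer: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    counts = [Counter(doc) for doc in tokenized]
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix


# Hypothetical negotiation messages, one per document.
messages = [
    "we accept the offer",
    "we reject the offer",
    "please send the contract",
]

vocab, tdm = term_document_matrix(messages)
for term, row in zip(vocab, tdm):
    print(f"{term:10s} {row}")
```

Each row is a term and each column a message, so the columns are the high-dimensional vectors that a clustering method would then group.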

Determination of The Optimal Number of Clusters: A Fuzzy-set based Method

IEEE Transactions on Fuzzy Systems ◽ 10.1109/tfuzz.2021.3118113 ◽ 2021 ◽ pp. 1-1

Author(s): Sy Dzung Nguyen ◽ Vu Song Thuy Nguyen ◽ Nhat Truong Pham

Keyword(s): Fuzzy Set ◽ Optimal Number ◽ Number Of Clusters ◽ Optimal Number Of Clusters

Clustering optimization in RFM analysis Based on k-Means

Indonesian Journal of Electrical Engineering and Computer Science ◽ 10.11591/ijeecs.v18.i1.pp470-477 ◽ 2020 ◽ Vol 18 (1) ◽ pp. 470 ◽ Cited By ~ 1

Author(s): Rendra Gustriansyah ◽ Nazori Suhandi ◽ Fery Antony

Keyword(s): Market Segmentation ◽ Optimal Number ◽ Management Process ◽ Mining Method ◽ Stock Management ◽ Number Of Clusters ◽ Silhouette Index ◽ Clustering Optimization ◽ Product Segmentation ◽ Optimal Number Of Clusters

RFM stands for Recency, Frequency, and Monetary. RFM is a simple but effective method that can be applied to market segmentation. RFM analysis is used to analyze customer behavior in terms of how recently customers have purchased (recency), how often they purchase (frequency), and how much money they spend (monetary). In this study, RFM analysis is used for product segmentation, with products arrayed in terms of recent sales (R), frequent sales (F), and total money spent (M) using a data mining method. The study proposes a new procedure for RFM analysis (in product segmentation) using the k-Means method and eight validity indexes to determine the optimal number of clusters, namely the Elbow Method, Silhouette Index, Calinski-Harabasz Index, Davies-Bouldin Index, Ratkowsky Index, Hubert Index, Ball-Hall Index, and Krzanowski-Lai Index. This can improve the objectivity and similarity of data in product segmentation and thereby the accuracy of the stock management process. The evaluation results showed that the optimal number of clusters for the k-Means method applied in the RFM analysis is three (segments), with a variance value of 0.19113.
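
The idea of choosing the number of clusters with a validity index can be sketched as follows. This is a minimal pure-Python illustration, not the study's procedure: it uses only one of the eight indexes (the Silhouette Index), the product data are invented and assumed to be already scaled to [0, 1], and the deterministic farthest-first initialisation of k-Means is an assumption made so that the result is reproducible.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=50):
    """Lloyd's algorithm with farthest-first initialisation (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: mean of each cluster; empty clusters keep their center.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette width; points in singleton clusters score 0."""
    if len(set(labels)) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == other) / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical products as scaled (recency, frequency, monetary) triples.
products = [
    (0.10, 0.10, 0.10), (0.15, 0.10, 0.05), (0.05, 0.20, 0.10),
    (0.90, 0.85, 0.90), (0.85, 0.90, 0.95), (0.95, 0.90, 0.85),
    (0.10, 0.90, 0.50), (0.15, 0.85, 0.50), (0.05, 0.95, 0.55),
]

scores = {k: silhouette(products, kmeans(products, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("selected k:", best_k)
```

In practice several indexes rarely agree perfectly, which is presumably why the study combines eight of them rather than relying on a single score.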

Method for determining optimal number of clusters in K-means clustering algorithm

Journal of Computer Applications ◽ 10.3724/sp.j.1087.2010.01995 ◽ 2010 ◽ Vol 30 (8) ◽ pp. 1995-1998 ◽ Cited By ~ 18

Author(s): Shi-bing ZHOU ◽ Zhen-yuan XU ◽ Xu-qing TANG

Keyword(s): Clustering Algorithm ◽ Optimal Number ◽ Number Of Clusters ◽ Optimal Number Of Clusters

Clustering Count-based RNA Methylation Data Using a Nonparametric Generative Model

Current Bioinformatics ◽ 10.2174/1574893613666180601080008 ◽ 2018 ◽ Vol 14 (1) ◽ pp. 11-23 ◽ Cited By ~ 3

Author(s): Lin Zhang ◽ Yanling He ◽ Huaizhi Wang ◽ Hui Liu ◽ Yufei Huang ◽ ...

Keyword(s): Clustering Analysis ◽ Methylation Level ◽ Optimal Number ◽ Generative Model ◽ Methylation Data ◽ Sequencing Data ◽ Number Of Clusters ◽ RNA Methylation ◽ Clustering Effect ◽ Optimal Number Of Clusters

Background: The RNA methylome has been discovered as an important layer of gene regulation and can be profiled directly with count-based measurements from high-throughput sequencing data. Although the detailed regulatory circuit of the epitranscriptome remains uncharted, a clustering effect in methylation status among different RNA methylation sites can be identified from transcriptome-wide RNA methylation profiles and may reflect epitranscriptomic regulation. Count-based RNA methylation sequencing data has unique features, such as low read coverage, which call for novel clustering approaches. Objective: Besides handling the low read coverage, clustering analysis of count-based RNA methylation sequencing data also needs to preserve the integer property of the data. Method: We proposed a nonparametric generative model together with its Gibbs sampling solution for clustering analysis. The proposed approach implements a beta-binomial mixture model to capture the clustering effect in methylation level with the original count-based measurements rather than an estimated continuous methylation level. In addition, it adopts a nonparametric Dirichlet process to automatically determine an optimal number of clusters, so as to avoid the common model selection problem in clustering analysis. Results: When tested on the simulated system, the method demonstrated improved clustering performance over hierarchical clustering, K-means, MClust, NMF and EMclust. On a real dataset, it also revealed two novel RNA N6-methyladenosine (m6A) co-methylation patterns that may be induced directly by METTL14 and WTAP, which are two known regulatory components of the RNA m6A methyltransferase complex. Conclusion: Our proposed DPBBM method not only properly handles the count-based measurements of RNA methylation data from sites of very low read coverage, but also learns an optimal number of clusters adaptively from the data analyzed.
Availability: The source code and documents of the DPBBM R package are freely available through the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/DPBBM/.
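
To make the beta-binomial mixture idea concrete, the sketch below evaluates the beta-binomial log-probability of observing k methylated reads out of n and uses it to compute cluster membership probabilities for one site. It is an illustrative sketch only and not the DPBBM package: the number of components is fixed by hand, the parameter values and function names are assumptions, and the Dirichlet process and Gibbs sampler described in the abstract are not reproduced.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_choose(n, k):
    """Logarithm of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def beta_binomial_logpmf(k, n, alpha, beta):
    """log P(k methylated reads out of n | alpha, beta)."""
    return log_choose(n, k) + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def responsibilities(k, n, components, weights):
    """Posterior probability of each mixture component for a single site."""
    logs = [
        math.log(w) + beta_binomial_logpmf(k, n, a, b)
        for w, (a, b) in zip(weights, components)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Two assumed clusters: low methylation (2, 8) and high methylation (8, 2).
components = [(2.0, 8.0), (8.0, 2.0)]
weights = [0.5, 0.5]

# A low-coverage site: 1 methylated read out of only 3 reads.
resp = responsibilities(1, 3, components, weights)
print([round(r, 3) for r in resp])
```

Working on the raw counts keeps the uncertainty of low-coverage sites visible: with only three reads the site is assigned to the low-methylation cluster with moderate rather than near-certain probability.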

Determination of the optimal number of clusters in harmonic data classification

A differential evolution algorithm based automatic determination of optimal number of clusters validated by fuzzy intercluster hostility index

Determination of Optimal Number of Clusters in Wireless Sensor Networks

Determination of Optimal Number of Clusters in Cure Using Representative Points

[OA142] Validation framework for automated determination of the optimal number of clusters in [F-18]FET-PET brain images

Analytical Comparison of Clustering Techniques for the Recognition of Communication Patterns
