Determination of the optimal number of clusters using a spectral clustering optimization

2016 ◽  
Vol 65 ◽  
pp. 304-314 ◽  
Author(s):  
Angel Mur ◽  
Raquel Dormido ◽  
Natividad Duro ◽  
Sebastian Dormido-Canto ◽  
Jesús Vega
2018 ◽  
Vol 6 (2) ◽  
pp. 313-320 ◽  
Author(s):  
Khumukcham Robindro ◽  
Bisheshwar Khumukcham ◽  
Ksh. Nilakanta Singh ◽  
...  

2018 ◽  
Vol 52 ◽  
pp. 54
Author(s):  
Elham Yousefzadeh ◽  
Bedor Abualhaj ◽  
Karen-Anett Büsing ◽  
Peter Kletting ◽  
Gerhard Glatting

Author(s):  
Muhammed-Fatih Kaya ◽  
Mareike Schoop

Abstract: The systematic processing of unstructured communication data, together with pattern recognition to determine communication groups in negotiations, poses many challenges in Machine Learning. In particular, the so-called curse of dimensionality makes the pattern recognition process demanding and calls for further research in the negotiation environment. In this paper, selected well-known clustering approaches are evaluated with regard to their pattern recognition potential on high-dimensional negotiation communication data. A research approach is presented that evaluates the application potential of the selected methods via a holistic framework with three main evaluation milestones: the determination of the optimal number of clusters, the main clustering application, and the performance evaluation. To this end, quantified Term Document Matrices are first pre-processed and then used as the underlying databases to investigate the pattern recognition potential of the clustering techniques, taking into account the optimal number of clusters and measuring the respective internal and external performances. The overall results show that, under this holistic evaluation approach, certain cluster separations are recommended by both internal and external performance measures, whereas three of the cluster separations are eliminated based on the evaluation results.
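
As an illustration of the "internal performance" milestone above, the sketch below scores candidate partitions with the silhouette index and keeps the number of clusters with the highest score. The abstract does not name the measures the authors used, so the choice of the silhouette index is an assumption, and the points and label assignments are hypothetical toy data standing in for real clustering output.

```python
from math import dist

def silhouette_score(points, labels):
    """Mean silhouette width over all points, using Euclidean distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # common convention: a singleton contributes s = 0
        # a: mean distance to the other members of the point's own cluster
        # (the point itself adds 0 to the sum, hence the len(own) - 1)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)

# Hypothetical toy data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Hypothetical candidate partitions, keyed by their number of clusters.
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 0, 1, 1, 2],
}

scores = {k: silhouette_score(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

In practice the candidate label lists would come from running the clustering method once per value of k; an external measure would additionally need ground-truth group labels.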


Author(s):  
Rendra Gustriansyah ◽  
Nazori Suhandi ◽  
Fery Antony

RFM stands for Recency, Frequency, and Monetary. RFM is a simple but effective method that can be applied to market segmentation. RFM analysis examines customer behavior in terms of how recently customers have purchased (recency), how often they purchase (frequency), and how much money they spend (monetary). In this study, RFM analysis is applied to product segmentation: products are arrayed in terms of recent sales (R), frequent sales (F), and total money spent (M) using a data mining method. The study proposes a new procedure for RFM analysis in product segmentation that uses the k-Means method and eight validity indexes to determine the optimal number of clusters, namely the Elbow Method, Silhouette Index, Calinski-Harabasz Index, Davies-Bouldin Index, Ratkowsky Index, Hubert Index, Ball-Hall Index, and Krzanowski-Lai Index. This can improve the objectivity and similarity of the data in product segmentation and thereby the accuracy of the stock management process. The evaluation results showed that the optimal number of clusters for the k-Means method applied in the RFM analysis is three (segments), with a variance value of 0.19113.
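
The R, F, and M values described above can be derived from a plain sales log before any clustering takes place. The following is a minimal sketch under stated assumptions: the record layout `(product, sale_date, amount)`, the function name `rfm_table`, and the sample rows are all hypothetical and do not come from the study.

```python
from datetime import date

def rfm_table(sales, today):
    """Aggregate (product, sale_date, amount) records into per-product
    (recency_in_days, frequency, monetary) tuples."""
    rows = {}
    for product, sold_on, amount in sales:
        row = rows.setdefault(
            product, {"last": sold_on, "frequency": 0, "monetary": 0.0}
        )
        row["last"] = max(row["last"], sold_on)  # most recent sale date
        row["frequency"] += 1                    # number of sales
        row["monetary"] += amount                # total money spent
    return {
        product: ((today - r["last"]).days, r["frequency"], r["monetary"])
        for product, r in rows.items()
    }

# Hypothetical sales log.
sales = [
    ("A", date(2024, 1, 1), 10.0),
    ("A", date(2024, 1, 10), 5.0),
    ("B", date(2024, 1, 5), 20.0),
]
rfm = rfm_table(sales, today=date(2024, 1, 11))
print(rfm)
```

Each product's three-value tuple would then be scaled and passed to k-Means, with the validity indexes listed above used to compare different values of k.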


2018 ◽  
Vol 14 (1) ◽  
pp. 11-23 ◽  
Author(s):  
Lin Zhang ◽  
Yanling He ◽  
Huaizhi Wang ◽  
Hui Liu ◽  
Yufei Huang ◽  
...  

Background: The RNA methylome has been discovered as an important layer of gene regulation and can be profiled directly with count-based measurements from high-throughput sequencing data. Although the detailed regulatory circuit of the epitranscriptome remains uncharted, a clustering effect in methylation status among different RNA methylation sites can be identified from transcriptome-wide RNA methylation profiles and may reflect epitranscriptomic regulation. Count-based RNA methylation sequencing data has unique features, such as low read coverage, which call for novel clustering approaches.
Objective: Besides handling the low read coverage, it is also necessary to preserve the integer property of the data when clustering count-based RNA methylation sequencing data.
Method: We proposed a nonparametric generative model together with its Gibbs sampling solution for clustering analysis. The proposed approach implements a beta-binomial mixture model to capture the clustering effect in methylation level with the original count-based measurements rather than an estimated continuous methylation level. In addition, it adopts a nonparametric Dirichlet process to automatically determine an optimal number of clusters, so as to avoid the common model selection problem in clustering analysis.
Results: When tested on the simulated system, the method demonstrated improved clustering performance over hierarchical clustering, K-means, MClust, NMF and EMclust. On a real dataset, it also revealed two novel RNA N6-methyladenosine (m6A) co-methylation patterns that may be induced directly by METTL14 and WTAP, two known regulatory components of the RNA m6A methyltransferase complex.
Conclusion: Our proposed DPBBM method not only properly handles the count-based measurements of RNA methylation data from sites of very low read coverage, but also learns an optimal number of clusters adaptively from the data analyzed.
Availability: The source code and documents of the DPBBM R package are freely available through the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/DPBBM/.
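
To make the count-based likelihood in the Method paragraph concrete, the sketch below evaluates the standard beta-binomial probability of observing k methylated reads out of n total reads at a site. It is not code from the DPBBM package: the function names and the shape parameters `alpha` and `beta` are illustrative assumptions, and the Dirichlet process and Gibbs sampler are left out entirely.

```python
from math import exp, lgamma

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def betabinom_logpmf(k, n, alpha, beta):
    """Log-probability of k successes in n trials under a beta-binomial:
    C(n, k) * B(k + alpha, n - k + beta) / B(alpha, beta)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)

# Hypothetical site with n = 10 reads and illustrative shape parameters.
n = 10
probs = [exp(betabinom_logpmf(k, n, alpha=2.0, beta=5.0)) for k in range(n + 1)]
print(sum(probs))  # the probabilities over k = 0..n should sum to about 1
```

Working on the integer counts directly keeps the uncertainty of low-coverage sites in the likelihood, which is the motivation the abstract gives for not clustering an estimated continuous methylation level.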

