Method for determining optimal number of clusters in K-means clustering algorithm

Abstract: Clustering, a traditional machine learning method, plays a significant role in data analysis. Most clustering algorithms depend on a predetermined exact number of clusters, whereas in practice the number of clusters is usually unknown in advance. The Elbow method is one of the most commonly used methods for choosing the optimal cluster number, but it depends on manually identifying the elbow point on the plotted curve. As a result, even experienced analysts cannot clearly identify the elbow point when the plotted curve is fairly smooth. To solve this problem, a new elbow point discriminant method is proposed that yields a statistical metric for estimating the optimal cluster number of a dataset. First, the average degree of distortion obtained by the Elbow method is normalized to the range of 0 to 10. Second, the normalized results are used to calculate the cosines of the intersection angles between elbow points. Third, the arccosine of each calculated cosine gives the intersection angle between elbow points. Finally, the index of the minimal intersection angle is used as the estimated optimal cluster number. Experimental results on simulated datasets and a well-known public dataset (the Iris dataset) demonstrated that the optimal cluster number estimated by the proposed method is better than that obtained by the widely used Silhouette method.
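The four steps in the abstract can be sketched in a few lines of Python. This is only an illustrative reading of the abstract, not the authors' implementation: the function names are hypothetical, the angle at each candidate k is assumed to be the one formed with its two neighbouring points on the curve, the x-coordinate is assumed to be k itself, and the cosine is obtained from the law of cosines.

```python
import math


def normalize(values, lo=0.0, hi=10.0):
    """Linearly rescale values to [lo, hi] (step 1 of the abstract)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [lo for _ in values]
    scale = (hi - lo) / (vmax - vmin)
    return [lo + (v - vmin) * scale for v in values]


def elbow_by_min_angle(ks, distortions):
    """Return (k, angle in radians) for the sharpest bend of the curve.

    Assumption: ks is strictly increasing and the angle at point i is
    formed by the segments to points i-1 and i+1.
    """
    if len(ks) != len(distortions) or len(ks) < 3:
        raise ValueError("need at least three (k, distortion) pairs")
    points = list(zip(ks, normalize(distortions)))
    best_k, best_angle = None, math.inf
    for i in range(1, len(points) - 1):
        prev_pt, pt, next_pt = points[i - 1], points[i], points[i + 1]
        a = math.dist(prev_pt, pt)
        b = math.dist(pt, next_pt)
        c = math.dist(prev_pt, next_pt)
        # Step 2: cosine of the angle at pt via the law of cosines.
        cos_theta = (a * a + b * b - c * c) / (2 * a * b)
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
        # Step 3: arccosine gives the angle itself.
        angle = math.acos(cos_theta)
        # Step 4: keep the index of the smallest angle.
        if angle < best_angle:
            best_k, best_angle = ks[i], angle
    return best_k, best_angle


k, angle = elbow_by_min_angle([1, 2, 3, 4, 5, 6], [100, 40, 10, 8, 7, 6.5])
print(k, round(math.degrees(angle), 1))
```

In this toy curve the distortion drops steeply up to k = 3 and flattens afterwards, so the smallest angle is found at k = 3. The distortion values are made up for illustration.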


A clustering algorithm for solving the vehicle routing assignment problem in polynomial time

International Journal of Engineering & Technology ◽

10.14419/ijet.v9i1.22231 ◽

2020 ◽

Vol 9 (1) ◽

pp. 1

Author(s):

L. W. Rizkallah ◽

M. F. Ahmed ◽

N. M. Darwish

Keyword(s):

Vehicle Routing ◽

Polynomial Time ◽

Vehicle Routing Problem ◽

Assignment Problem ◽

Clustering Algorithm ◽

Optimal Number ◽

Number Of Clusters ◽

Routing Problem ◽

Major Increase ◽

Optimal Number Of Clusters

The Vehicle Routing Problem (VRP) consists of a group of customers that need to be served. Each customer has a certain demand for goods. A central depot with a fleet of vehicles is responsible for supplying the customers with their demands. The problem is composed of two sub-problems. The first is an assignment problem, in which both the vehicles to be used and the customers assigned to each vehicle are determined. The second is the routing problem, in which the order of customer visits is determined for each vehicle and its assigned customers. Both the optimal number of vehicles and the optimal total distance should be achieved. In this paper, an approach for solving the first sub-problem, the assignment problem, is presented. In this approach, a clustering algorithm is proposed for finding the optimal number of vehicles by grouping the customers into clusters, where each cluster is visited by one vehicle. This work presents a polynomial-time clustering algorithm for finding the optimal number of clusters, and a solution to the assignment problem is also provided. The proposed approach was evaluated using Solomon's C1 benchmarks, where it reached the optimal number of clusters for all the benchmarks in this category. The proposed approach succeeds in solving the assignment problem in VRP, achieving a solving time that surpasses the state-of-the-art approaches in the literature. It also provides a means of working with a varying number of customers without a major increase in solving time.
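To make the assignment sub-problem concrete, the sketch below groups customers into capacity-feasible clusters, one per vehicle. It does not reproduce the paper's algorithm, which the abstract does not describe; it swaps in the classic sweep heuristic (customers sorted by polar angle around the depot) and a simple demand-based lower bound on the number of vehicles. The function names, the `(x, y, demand)` tuple layout, and the single shared vehicle capacity are assumptions made for illustration.

```python
import math


def min_vehicles(demands, capacity):
    """Lower bound on the number of vehicles: total demand / capacity, rounded up."""
    return math.ceil(sum(demands) / capacity)


def sweep_assign(customers, depot, capacity):
    """Group customers into clusters whose total demand fits one vehicle.

    customers: list of (x, y, demand) tuples (hypothetical layout).
    Returns a list of clusters, each a list of customer indices.
    """
    # Sort customer indices by polar angle around the depot.
    order = sorted(
        range(len(customers)),
        key=lambda i: math.atan2(customers[i][1] - depot[1],
                                 customers[i][0] - depot[0]),
    )
    clusters, current, load = [], [], 0
    for i in order:
        demand = customers[i][2]
        if demand > capacity:
            raise ValueError("a single customer exceeds the vehicle capacity")
        if load + demand > capacity:
            # The current vehicle is full: close its cluster and open a new one.
            clusters.append(current)
            current, load = [], 0
        current.append(i)
        load += demand
    if current:
        clusters.append(current)
    return clusters


customers = [(1, 0, 4), (0, 1, 4), (-1, 0, 4), (0, -1, 4)]
print(min_vehicles([c[2] for c in customers], capacity=8))
print(sweep_assign(customers, depot=(0, 0), capacity=8))
```

The sweep heuristic runs in O(n log n) time because of the sort, but unlike the approach reported in the abstract it carries no guarantee of reaching the optimal number of clusters.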


The Genetic Structure of Slovak Spotted Cattle Based on Genome-wide Analysis

Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis ◽

10.11118/actaun202068010057 ◽

2020 ◽

Vol 68 (1) ◽

pp. 57-61

Author(s):

Kristína Lehocká ◽

Barbora Olšanská ◽

Radovan Kasarda ◽

Ondrej Kadlečík ◽

Anna Trakovická ◽

...

Keyword(s):

Bayesian Analysis ◽

Clustering Algorithm ◽

Information Criterion ◽

Optimal Number ◽

Number Of Clusters ◽

Membership Probability ◽

Production Type ◽

Genome Wide ◽

Genetic Clusters ◽

Optimal Number Of Clusters

The objective of the study was to determine the membership probability and level of admixture among Slovak Spotted cattle and historically related breeds (Ayrshire, Holstein, Swiss Simmental and Slovak Pinzgau). The analysis was based on a panel of 35,934 SNPs that were used for genotyping 423 individuals. The optimal number of clusters was estimated in two ways: by analysis of the Bayesian information criterion and by a Bayesian clustering algorithm. The optimal number of clusters ranged from 3 to 5, depending on the applied approach. Subsequently, the population structure was tested by discriminant analysis of principal components (DAPC) and unsupervised Bayesian analysis based on the correlated allele frequencies model. The first discriminant function revealed three genetic clusters in the population, resulting from the production type and origin of the analysed breeds. The unsupervised Bayesian analysis showed similar results, where the highest level of admixture was found between Slovak Pinzgau and Slovak Spotted cattle (0.6%). Despite that, the results of this study clearly showed that Slovak Spotted cattle are genetically separated from the other breeds that were involved in their grading-up process.


A Quantitative Discriminant Method of Elbow Point for the Optimal Number of Clusters in Clustering Algorithm

10.21203/rs.3.rs-58011/v1 ◽

2020 ◽

Author(s):

Congming Shi ◽

Bingtao Wei ◽

Shoulin Wei ◽

Wen Wang ◽

Hai Liu ◽

...

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Optimal Number ◽

Machine Learning Method ◽

Learning Method ◽

Cluster Number ◽

Number Of Clusters ◽

Optimal Cluster ◽

Better Than ◽

Optimal Number Of Clusters

Abstract: Clustering, as a traditional machine learning method, still plays a significant role in data analysis. Most clustering algorithms depend on a predetermined exact number of clusters, whereas in practice the number of clusters is usually unknown in advance. The Elbow method is one of the most commonly used methods for choosing the optimal cluster number, but it depends on manually identifying the elbow point on the plotted curve, so even experienced analysts cannot clearly identify the elbow point when the curve is fairly smooth. To solve this problem, a new elbow point discriminant method is proposed to work out a statistical metric for estimating the optimal cluster number of a dataset. First, the average degree of distortion obtained by the Elbow method is normalized to the range of 0 to 10. Second, the normalized results are used to calculate the cosines of the intersection angles between elbow points. Third, the arccosine of each calculated cosine gives the intersection angle between elbow points. Finally, the index of the minimal intersection angle is used as the estimated optimal cluster number. Experimental results on simulated datasets and a well-known public dataset demonstrated that the optimal cluster number estimated by the proposed method is better than that obtained by the widely used Silhouette method.


Improved FCM Algorithm Based on K-Means and Granular Computing

Journal of Intelligent Systems ◽

10.1515/jisys-2014-0119 ◽

2015 ◽

Vol 24 (2) ◽

pp. 215-222 ◽

Cited By ~ 2

Author(s):

Wei Jia Lu ◽

Zhuang Zhi Yan

Keyword(s):

Computational Complexity ◽

Granular Computing ◽

Clustering Algorithm ◽

Research Area ◽

Optimal Number ◽

Number Of Clusters ◽

Fcm Algorithm ◽

Fuzzy Algorithms ◽

Strong Representation ◽

Optimal Number Of Clusters

Abstract: Fuzzy clustering algorithms have been widely used in research and in practical applications. However, conventional fuzzy algorithms suffer from high computational complexity. This article proposes an improved fuzzy C-means (FCM) algorithm based on K-means and the principle of granularity. The algorithm aims to solve two problems of conventional FCM methods: finding the optimal number of clusters and sensitivity to data initialization. The initialization stage of the K-medoid cluster, which differs from other approaches, has a strong representation and is capable of detecting data of different sizes. Meanwhile, by combining granular computing with FCM, the optimal number of clusters is obtained by choosing accurate validity functions. Finally, the detailed clustering process of the proposed algorithm is presented, and its performance is validated by simulation tests. The test results show that, compared with existing FCM algorithms, the proposed improved FCM algorithm offers better computational complexity, running time, and cluster effectiveness.
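For readers unfamiliar with FCM, the sketch below shows the conventional algorithm that the abstract takes as its baseline: memberships are updated from distances to the current centers, and centers are updated as membership-weighted means. It is not the improved method of the article; the function names, the fuzzifier default m = 2, and the fixed iteration count are assumptions for illustration.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Membership of every point in every cluster; each row sums to 1."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point sits exactly on a center: give it full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            memberships.append([h / sum(hits) for h in hits])
            continue
        memberships.append(
            [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]
        )
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Centers as means of the points weighted by membership ** m.

    Assumes every cluster receives a nonzero total weight.
    """
    dim = len(points[0])
    centers = []
    for i in range(len(memberships[0])):
        weights = [row[i] ** m for row in memberships]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers


def fcm(points, centers, m=2.0, iterations=50):
    """Alternate the two updates for a fixed number of iterations."""
    for _ in range(iterations):
        memberships = fcm_memberships(points, centers, m)
        centers = fcm_centers(points, memberships, m)
    return centers, fcm_memberships(points, centers, m)


points = [(0.0,), (1.0,), (9.0,), (10.0,)]
centers, memberships = fcm(points, centers=[(0.0,), (10.0,)])
print(centers)
```

Each membership update costs time proportional to the number of points times the square of the number of clusters in this naive form, and the result depends on the initial centers passed in, which are the two weaknesses the abstract says the improved algorithm targets.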


Clustering algorithm based on optimal number of clusters for wireless multimedia sensor networks

International Conference on Cyberspace Technology (CCT 2014) ◽

10.1049/cp.2014.1346 ◽

2014 ◽

Author(s):

Cao Jian ◽

Han Zenghong ◽

Qin Shaohua ◽

Sun Yan

Keyword(s):

Sensor Networks ◽

Clustering Algorithm ◽

Optimal Number ◽

Wireless Multimedia Sensor Networks ◽

Wireless Multimedia ◽

Number Of Clusters ◽

Multimedia Sensor Networks ◽

Optimal Number Of Clusters


Improved Fuzzy C-Means Based on the Optimal Number of Clusters

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.392.803 ◽

2013 ◽

Vol 392 ◽

pp. 803-807 ◽

Cited By ~ 1

Author(s):

Xue Bo Feng ◽

Fang Yao ◽

Zhi Gang Li ◽

Xiao Jing Yang

Keyword(s):

Convergence Rate ◽

Clustering Algorithm ◽

Optimal Number ◽

Data Set ◽

Number Of Clusters ◽

Fuzzy C Means ◽

Initial Cluster ◽

Fuzzy C Means Clustering ◽

Fcm Clustering ◽

Optimal Number Of Clusters

The fuzzy C-means clustering algorithm (FCM) clusters a data set according to the number of cluster centers, the initial cluster centers, the fuzzy factor, the number of iterations, and a threshold. FCM therefore encounters the problem of initializing the clustering prototypes. First, the article combines the maximum and minimum distance algorithm with the K-means algorithm to determine the number of clusters and the initial cluster centers. Second, the article determines the optimal number of clusters with Silhouette indicators. Finally, the article improves the convergence rate of FCM by repeatedly revising the memberships. The improved FCM has a good clustering effect, enhances the optimization capability, and improves the efficiency and effectiveness of the clustering. Compared with the traditional FCM clustering method, it has better within-class tightness, between-class scatter, and cluster stability, and a faster convergence rate.
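The Silhouette indicator used in the second step can be illustrated with a small, dependency-free sketch. It computes the mean silhouette coefficient of a labelling with Euclidean distance; comparing the score across labellings produced for different cluster counts is one common way to pick the number of clusters. The function name is hypothetical, and how the article combines the indicator with FCM is not specified in the abstract.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for label, members in clusters.items()
            if label != labels[i]
        )
        denom = max(a, b)
        if denom > 0.0:
            total += (b - a) / denom
    return total / len(points)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(silhouette_score(points, [0, 0, 0, 1, 1, 1]))  # well-separated labelling
print(silhouette_score(points, [0, 1, 0, 1, 0, 1]))  # mixed-up labelling
```

Scores lie between -1 and 1; the labelling that matches the two visible groups scores close to 1, while the mixed-up labelling scores much lower.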


Load balancing for Software Defined Network using Machine learning

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i2.876 ◽

2021 ◽

Vol 12 (2) ◽

pp. 527-535

Author(s):

Aashish Kumar et al.

Keyword(s):

Machine Learning ◽

Load Balancing ◽

Clustering Algorithm ◽

Optimal Number ◽

Software Defined Network ◽

Huge Number ◽

Number Of Clusters ◽

Intelligent Technique ◽

Dbscan Algorithm ◽

Optimal Number Of Clusters

Software-Defined Networking is one of the most revolutionary and prominent technologies in the field of networking. It solves problems that traditional networks face. Still, it can suffer from bottlenecks and become overloaded. To overcome this issue, researchers have proposed various approaches, but these rely on only two or three parameters to perform load balancing and are either static or dynamic. We propose an intelligent technique that forwards TCP/UDP packet traffic based on several parameters (12 parameters, discussed in the latter part of this section). Based on these parameters, we trained models using the KMeans [1] and DBSCAN [2] clustering algorithms and also determined the optimal number of clusters. We tested the technique on large numbers of packets: 5000, 10000, 20000, 50000, 100000, and 10000000. We also compared the results of the KMeans and DBSCAN algorithms and discussed other researchers' views.


Grouping of Districts Based on Poverty Factors in Papua Province Uses The K-Medoids Algorithm

Enthusiastic : International Journal of Applied Statistics and Data Science ◽

10.20885/enthusiastic.vol1.iss2.art6 ◽

2021 ◽

Vol 1 (2) ◽

pp. 94-102

Author(s):

Afdelia Novianti ◽

Irsyifa Mayzela Afnan ◽

Rafi Ilmi Badri Utama ◽

Edy Widodo

Keyword(s):

Life Expectancy ◽

Clustering Algorithm ◽

Agricultural Sector ◽

Optimal Number ◽

Literacy Rate ◽

Number Of Clusters ◽

The Third ◽

Education And Employment ◽

Poor Population ◽

Optimal Number Of Clusters

Poverty is an essential issue for every country, including Indonesia. Poverty can be caused by the scarcity of basic necessities or the difficulty of accessing education and employment. In 2019, Papua Province was the province with the highest poverty percentage, at 27.53%. In light of this, districts in Papua Province are grouped by similar characteristics to describe poverty conditions, using the variables Percentage of Poor Population, Gross Regional Domestic Product, Open Unemployment Rate, Life Expectancy, Literacy Rate, and Population Working in the Agricultural Sector, and the K-medoids clustering algorithm. The results of this study indicate that the optimal number of clusters to describe poverty conditions in Papua Province is 4, with a variance of 0.012, where the first cluster consists of 10 districts, the second of 5 districts, the third of 12 districts, and the fourth of 2 districts.
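The K-medoids idea can be sketched as follows: each cluster is represented by one of its own data points (the medoid), and the algorithm alternates between assigning points to the nearest medoid and re-picking each medoid as the member with the smallest total distance to the rest of its cluster. This is a simple alternating heuristic rather than the full PAM procedure, and it is not necessarily what the study used; the function names and the naive "first k points" initialization are assumptions, and the toy points stand in for standardized district indicators.

```python
import math


def assign_to_medoids(points, medoids):
    """Map each medoid index to the indices of the points nearest to it."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        nearest = min(medoids, key=lambda m: math.dist(p, points[m]))
        clusters[nearest].append(i)
    return clusters


def k_medoids(points, k, max_iter=100):
    """Alternating k-medoids; returns {medoid index: member indices}."""
    medoids = list(range(k))  # naive initialization (assumption)
    for _ in range(max_iter):
        clusters = assign_to_medoids(points, medoids)
        new_medoids = []
        for m, members in clusters.items():
            if not members:
                new_medoids.append(m)  # keep a medoid that lost all its points
                continue
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(min(
                members,
                key=lambda c: sum(math.dist(points[c], points[j]) for j in members),
            ))
        if sorted(new_medoids) == sorted(medoids):
            break  # medoids stopped changing
        medoids = new_medoids
    return assign_to_medoids(points, medoids)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(k_medoids(points, k=2))
```

Because medoids are actual observations, each cluster in a study like this one can be summarized by a real representative district, and the method is less sensitive to outliers than K-means.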


A Quantitative Discriminant Method of Elbow Point for the Optimal Number of Clusters in Clustering Algorithm

10.21203/rs.3.rs-58011/v2 ◽

2020 ◽

Author(s):

Congming Shi ◽

Bingtao Wei ◽

Shoulin Wei ◽

Wen Wang ◽

Hai Liu ◽

...

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Optimal Number ◽

Machine Learning Method ◽

Learning Method ◽

Cluster Number ◽

Number Of Clusters ◽

Optimal Cluster ◽

Better Than ◽

Optimal Number Of Clusters

Abstract: Clustering, as a traditional machine learning method, still plays a significant role in data analysis. Most clustering algorithms depend on a predetermined exact number of clusters, whereas in practice the number of clusters is usually unknown in advance. The Elbow method is one of the most commonly used methods for choosing the optimal cluster number, but it depends on manually identifying the elbow point on the plotted curve, so even experienced analysts cannot clearly identify the elbow point when the curve is fairly smooth. To solve this problem, a new elbow point discriminant method is proposed to work out a statistical metric for estimating the optimal cluster number of a dataset. First, the average degree of distortion obtained by the Elbow method is normalized to the range of 0 to 10. Second, the normalized results are used to calculate the cosines of the intersection angles between elbow points. Third, the arccosine of each calculated cosine gives the intersection angle between elbow points. Finally, the index of the minimal intersection angle is used as the estimated optimal cluster number. Experimental results on simulated datasets and a well-known public dataset (the Iris dataset) demonstrated that the optimal cluster number estimated by the proposed method is better than that obtained by the widely used Silhouette method.

