GIVING FUZZINESS TO SPATIAL CLUSTERS: A NEW INDEX FOR CHOOSING THE OPTIMAL NUMBER OF CLUSTERS

2013 ◽  
Vol 22 (03) ◽  
pp. 1350009 ◽  
Author(s):  
GEORGE GREKOUSIS

Choosing the optimal number of clusters is a key issue in cluster analysis, and spatial clustering tends to complicate matters further. Cluster validation helps to determine the appropriate number of clusters present in a dataset and to evaluate the results of clustering algorithms. There are numerous methods and techniques for choosing the optimal number of clusters via crisp and fuzzy clustering. In this paper, we introduce a new index for fuzzy clustering to determine the optimal number of clusters. This index is not another metric for calculating compactness or separation among partitions. Instead, the index uses several existing indices to give a degree, or fuzziness, to the optimal number of clusters. In this way, not only do the objects in a fuzzy cluster get a membership value, but the number of clusters to be partitioned is given a value as well. The new index is used in the fuzzy c-means algorithm for the geodemographic segmentation of 285 postal codes.
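
As a rough illustration of the idea (not the index proposed in the paper, whose formula is not given in this abstract), the sketch below runs a plain fuzzy c-means for several candidate cluster counts, scores each result with the partition coefficient, and normalises the scores so that every candidate number of clusters receives a degree. The function names, the use of the partition coefficient as the only underlying index, the normalisation step, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=50, seed=0):
    """Plain fuzzy c-means on a list of tuples; returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    power = 2.0 / (m - 1.0)
    for _ in range(iters):
        # Centres are membership-weighted means of the points.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / w_sum
                for d in range(dim)
            ))
        # Memberships follow from relative distances to the centres.
        for i in range(n):
            dist = [max(math.dist(points[i], ctr), 1e-12) for ctr in centers]
            u[i] = [
                1.0 / sum((dist[j] / dist[k]) ** power for k in range(c))
                for j in range(c)
            ]
    return centers, u


def partition_coefficient(u):
    """Mean squared membership: 1/c for a fully fuzzy partition, 1 for a crisp one."""
    return sum(v * v for row in u for v in row) / len(u)


def degrees_for_cluster_counts(points, candidates):
    """Hypothetical aggregation: normalise the scores so they sum to 1."""
    scores = {c: partition_coefficient(fuzzy_c_means(points, c)[1])
              for c in candidates}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Three tight, well-separated synthetic groups (made-up data).
rng = random.Random(1)
data = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
        for cx, cy in [(0, 0), (5, 5), (10, 0)] for _ in range(20)]
degrees = degrees_for_cluster_counts(data, [2, 3, 4, 5])
for c, degree in degrees.items():
    print(c, round(degree, 3))
```

A combination of several indices, as the abstract describes, would replace the single score inside `degrees_for_cluster_counts`; how the paper weights or combines them is not stated here.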

2021 ◽  
pp. 1-16
Author(s):  
Aikaterini Karanikola ◽  
Charalampos M. Liapis ◽  
Sotiris Kotsiantis

In short, clustering is the process of partitioning a given set of objects into groups containing highly related instances. This relation is determined by a specific distance metric with which the intra-cluster similarity is estimated. Finding an optimal number of such partitions is usually the key step in the entire process, yet a rather difficult one. Selecting an unsuitable number of clusters might lead to incorrect conclusions and, consequently, to wrong decisions, and the term “optimal” is itself quite ambiguous. Furthermore, various inherent characteristics of the datasets, such as clusters that overlap or clusters containing subclusters, will most often increase the difficulty of the task. Thus, the methods used to detect similarities and the parameter selection of the partition algorithm have a major impact on the quality of the groups and the identification of their optimal number. Given that each dataset constitutes a rather distinct case, validity indices are indicators introduced to address the problem of selecting such an optimal number of clusters. In this work, an extensive set of well-known validity indices, based on the approach of the so-called relative criteria, is examined comparatively. A total of 26 cluster validation measures were investigated in two distinct case studies: one on real-world data and one on artificially generated data. To ensure a certain degree of difficulty, both real-world and generated data were selected to exhibit variations and inhomogeneity. Each index is deployed under 9 different clustering methods, which incorporate 5 different distance metrics. All results are presented in various explanatory forms.
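
To make the notion of a relative validity index concrete, here is a minimal sketch of the silhouette width, a widely used index of this kind. Whether it is among the 26 measures examined is not stated in the abstract, so treat its choice here as an assumption; the function name and the toy data are likewise illustrative only.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette width over all points, using Euclidean distance."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)


# Toy one-dimensional data with two obvious groups (made-up values).
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_score(points, [0, 0, 1, 1])  # natural grouping
bad = silhouette_score(points, [0, 1, 0, 1])   # deliberately mixed grouping
print(round(good, 3), round(bad, 3))
```

Under a relative criterion, the same index is computed for each candidate partition and the partition with the best value is preferred; here the natural grouping scores higher than the mixed one.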


2019 ◽  
Vol 2019 ◽  
pp. 1-9 ◽  
Author(s):  
Yongli Liu ◽  
Xiaoyang Zhang ◽  
Jingli Chen ◽  
Hao Chao

Because traditional fuzzy clustering validity indices need the number of clusters to be specified and are sensitive to noisy data, we propose a validity index for fuzzy clustering, named CSBM (compactness separateness bipartite modularity), based on bipartite modularity. CSBM improves robustness by combining intraclass compactness and interclass separateness, and it can automatically determine the optimal number of clusters. To assess the performance of CSBM, we carried out experiments on six real datasets and compared CSBM with six other prominent indices. Experimental results show that the CSBM index performs best in terms of robustness while accurately detecting the number of clusters.


2017 ◽  
Vol 65 (4) ◽  
pp. 359-365 ◽  
Author(s):  
Javier Senent-Aparicio ◽  
Jesús Soto ◽  
Julio Pérez-Sánchez ◽  
Jorge Garrido

One of the most important problems faced in hydrology is the estimation of flood magnitudes and frequencies in ungauged basins. Hydrological regionalisation is used to transfer information from gauged watersheds to ungauged watersheds. However, to obtain reliable results, the watersheds involved must have similar hydrological behaviour. In this study, two different clustering approaches are used and compared to identify hydrologically homogeneous regions. The Fuzzy C-Means algorithm (FCM), which is widely used in regionalisation studies, requires the calculation of cluster validity indices to determine the optimal number of clusters. The Fuzzy Minimals algorithm (FM) has an advantage over other fuzzy clustering algorithms: it does not need to know the number of clusters a priori, so cluster validity indices are not used. A regional homogeneity test based on the L-moments approach is used to check the homogeneity of the regions identified by both cluster analysis approaches. The validation of the FM algorithm in deriving homogeneous regions for flood frequency analysis is illustrated through its application to data from the watersheds in Alto Genil (South Spain). According to the results, the FM algorithm is recommended for identifying hydrologically homogeneous regions for regional frequency analysis.


2010 ◽  
Vol 13 (4) ◽  
pp. 652-660 ◽  
Author(s):  
M. J. Monem ◽  
S. M. Hashemy

Improving current operation and maintenance activities is one of the main steps in achieving higher performance of irrigation networks. Improving irrigation network management, which is influenced by different spatial and temporal parameters, faces particular difficulties. One of the controversial issues often faced by decision-makers is how to cope with the spatial diversity of irrigation systems. Detecting homogeneous areas within irrigation networks could improve the current management of networks. The idea behind this research is to present a quantitative benchmark for exploring homogeneous areas with similar physical attributes within the network region. Five physical attributes of each canal reach (length, capacity, number of intakes, number of conveyance structures and the covered irrigated area) are used for spatial clustering. Two fuzzy clustering algorithms, namely FCM and GK, are applied to the Ghazvin irrigation network. A clustering validity index, SC, shows that the GK algorithm is the more appropriate tool for clustering the considered dataset. According to the results, the optimal number of clusters for the Ghazvin irrigation project is nine, and the irrigated district is classified into nine homogeneous areas. Physically homogeneous regions provide a context for better and easier decision-making.


2021 ◽  
Author(s):  
Congming Shi ◽  
Bingtao Wei ◽  
Shoulin Wei ◽  
Wen Wang ◽  
Hai Liu ◽  
...  

Clustering, a traditional machine learning method, plays a significant role in data analysis. Most clustering algorithms depend on a predetermined exact number of clusters, whereas, in practice, the number of clusters is usually unpredictable. Although the Elbow method is one of the most commonly used methods to determine the optimal cluster number, it depends on the manual identification of the elbow points on the visualization curve. Thus, even experienced analysts cannot clearly identify the elbow point when the plotted curve is fairly smooth. To solve this problem, a new elbow point discriminant method is proposed to yield a statistical metric that estimates an optimal cluster number when clustering on a dataset. First, the average degree of distortion obtained by the Elbow method is normalized to the range of 0 to 10. Second, the normalized results are used to calculate the cosine of intersection angles between elbow points. Third, this calculated cosine of intersection angles and the arccosine theorem are used to compute the intersection angles between elbow points. Finally, the index of the above computed minimal intersection angles between elbow points is used as the estimated potential optimal cluster number. The experimental results based on simulated datasets and a well-known public dataset (Iris Dataset) demonstrated that the estimated optimal cluster number obtained by the newly proposed method is better than that obtained by the widely used Silhouette method.
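
The four steps above can be sketched as follows. This is one plausible reading of the abstract, not the authors' implementation: it assumes the angle is measured at each interior point of the distortion curve between the segments to its two neighbours, and that cluster counts are used unscaled on the horizontal axis. The function name and the distortion values are made up for illustration.

```python
import math


def elbow_by_min_angle(ks, distortions):
    """Return (k, angle in radians) for the sharpest bend of the elbow curve."""
    if len(ks) != len(distortions) or len(ks) < 3:
        raise ValueError("need at least three (k, distortion) pairs")
    lo, hi = min(distortions), max(distortions)
    if hi == lo:
        raise ValueError("distortions are constant; no elbow to find")
    # Step 1: normalise the distortions to the range 0 to 10.
    scaled = [10.0 * (d - lo) / (hi - lo) for d in distortions]
    best_k, best_angle = None, math.inf
    for i in range(1, len(ks) - 1):
        # Vectors from the current point to its left and right neighbours.
        ax, ay = ks[i - 1] - ks[i], scaled[i - 1] - scaled[i]
        bx, by = ks[i + 1] - ks[i], scaled[i + 1] - scaled[i]
        # Step 2: cosine of the angle between the two segments.
        cos_t = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
        # Step 3: arccosine gives the angle; clamp against rounding error.
        angle = math.acos(max(-1.0, min(1.0, cos_t)))
        # Step 4: keep the k with the smallest angle.
        if angle < best_angle:
            best_k, best_angle = ks[i], angle
    return best_k, best_angle


# Made-up distortion values for k = 1..6 with a visible bend at k = 3.
ks = [1, 2, 3, 4, 5, 6]
distortions = [100.0, 40.0, 10.0, 8.0, 7.0, 6.5]
k_est, angle = elbow_by_min_angle(ks, distortions)
print(k_est, round(math.degrees(angle), 1))
```

Because only interior points have two neighbours, the smallest and largest candidate k can never be selected under this reading.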


2021 ◽  
Vol 3 (4) ◽  
pp. 327-334
Author(s):  
Difa Lazuardi Aditya ◽  
Devi Fitrianah

Customers are stakeholders in a business. To maintain and increase customer enthusiasm and turn it to the benefit of the company's performance, customer segmentation is needed to identify potential customers. This study uses purchase transaction data from Brand Limback customers for 2020. RFM (Recency, Frequency, Monetary) analysis helps in determining the attributes used for customer segmentation. To determine the optimal number of clusters from the RFM dataset, the Elbow method is applied. The datasets generated from RFM are grouped using the Fuzzy C-Means and K-Means algorithms, and the cluster quality of the two algorithms is compared using the Silhouette Coefficient and the Davies-Bouldin Index. Customer segmentation of the clustered RFM dataset produces 7 optimal clusters: cluster 0 is a bronze customer, cluster 1 is a silver customer, cluster 2 is a gold customer, cluster 3 is a platinum customer, cluster 4 is a diamond customer, cluster 5 is a super customer, and cluster 6 is a superstar customer. Validation of K-Means yields a silhouette coefficient of 0.934 and a Davies-Bouldin index of 0.155, while validation of Fuzzy C-Means yields a silhouette coefficient of 0.921 and a Davies-Bouldin index of 0.145.
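
For readers unfamiliar with RFM, the sketch below shows how the three attributes might be derived from raw transactions before clustering. The record layout, the function name, the reference date, and the sample data are assumptions for illustration; the abstract does not describe the study's actual preprocessing.

```python
from datetime import date


def rfm_table(transactions, reference_date):
    """Return {customer: (recency_days, frequency, monetary)}."""
    last_purchase, frequency, monetary = {}, {}, {}
    for customer, day, amount in transactions:
        # Recency is based on the most recent purchase per customer.
        if customer not in last_purchase or day > last_purchase[customer]:
            last_purchase[customer] = day
        # Frequency counts transactions; monetary sums their amounts.
        frequency[customer] = frequency.get(customer, 0) + 1
        monetary[customer] = monetary.get(customer, 0.0) + amount
    return {
        c: ((reference_date - last_purchase[c]).days, frequency[c], monetary[c])
        for c in last_purchase
    }


# Made-up transactions: (customer id, purchase date, amount).
transactions = [
    ("A", date(2020, 1, 10), 50.0),
    ("A", date(2020, 12, 1), 25.0),
    ("B", date(2020, 6, 15), 200.0),
]
rfm = rfm_table(transactions, reference_date=date(2020, 12, 31))
for customer, values in sorted(rfm.items()):
    print(customer, values)
```

The three attributes are on very different scales, so in practice they are usually rescaled before a distance-based algorithm such as K-Means or Fuzzy C-Means is applied.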


2016 ◽  
Vol 25 (02) ◽  
pp. 1650003
Author(s):  
S. Revathy ◽  
B. Parvathavarthini ◽  
S. Shiny Caroline

Cluster validation is an essential technique in all cluster applications. Several validation methods measure the accuracy of cluster structure. Typical methods are geometric, where only distance and membership form the core of validation. Yao's decision theory is a novel approach to cluster validation that involves loss calculations and a probability-based measure for determining cluster quality. Conventional rough set algorithms have utilized this validity measure. This paper applies decision theory as a new validation scheme for Rough-Fuzzy clustering, resolving loss and probability calculations to predict the risk measure in clustering techniques. Experiments with synthetic and UCI datasets show that the scheme deduces the optimal number of clusters, overcoming the downsides of traditional validation frameworks. The proposed index can also be applied to other clustering algorithms, which extends its usefulness in business-oriented data mining.


2010 ◽  
Vol 40-41 ◽  
pp. 174-182
Author(s):  
Wei Jin Chen ◽  
Huai Lin Dong ◽  
Qing Feng Wu ◽  
Ling Lin

The evaluation of clustering validity is important for cluster analysis and is one of its most active research topics. A good evaluation is one for which the resulting optimal number of clusters is reasonable. For fuzzy clustering, the paper surveys the widely known validity evaluation methods based on fuzzy partition, geometric structure and statistics.

