cluster validation
Recently Published Documents



2021 ◽  
pp. 1-16
Author(s):  
Aikaterini Karanikola ◽  
Charalampos M. Liapis ◽  
Sotiris Kotsiantis

In short, clustering is the process of partitioning a given set of objects into groups containing highly related instances. This relation is determined by a specific distance metric with which the intra-cluster similarity is estimated. Finding an optimal number of such partitions is usually the key step in the entire process, yet a rather difficult one. Selecting an unsuitable number of clusters might lead to incorrect conclusions and, consequently, to wrong decisions, and the term “optimal” is itself quite ambiguous. Furthermore, various inherent characteristics of the datasets, such as clusters that overlap or clusters that contain subclusters, often make the task harder. Thus, the methods used to detect similarities and the parameter selection of the partitioning algorithm have a major impact on the quality of the groups and on the identification of their optimal number. Given that each dataset constitutes a rather distinct case, validity indices were introduced to address the problem of selecting such an optimal number of clusters. In this work, an extensive set of well-known validity indices, based on the approach of the so-called relative criteria, is examined comparatively. A total of 26 cluster validation measures were investigated in two distinct case studies: one on real-world data and one on artificially generated data. To ensure a certain degree of difficulty, both the real-world and the generated data were selected to exhibit variations and inhomogeneity. Each index is deployed under 9 different clustering methods, which incorporate 5 different distance metrics. All results are presented in various explanatory forms.
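To make the idea of a relative validity index concrete, here is a minimal sketch of one widely used index, the silhouette coefficient, applied to two candidate partitions of the same data. The toy points, the candidate labelings, and the helper names (`euclidean`, `silhouette_score`) are assumptions for illustration only; the abstract does not say which 26 indices were used or how they were implemented.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette_score(points, labels, dist=euclidean):
    """Mean silhouette width over all points (range -1..1, higher is better)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)

# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

# Candidate partitions for k = 2 and k = 3 (labels assumed, not computed here).
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 2, 1, 1, 1],
}

scores = {k: silhouette_score(points, labs) for k, labs in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, "-> chosen k:", best_k)
```

In a relative-criteria workflow the candidate labelings would come from running a clustering algorithm for each k; they are hard-coded here so that the sketch stays self-contained.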


2021 ◽  
Vol 3 (4) ◽  
pp. 327-334
Author(s):  
Difa Lazuardi Aditya ◽  
Devi Fitrianah

Customers are stakeholders in a business. To maintain and increase customer enthusiasm and turn it into company performance, customer segmentation is needed to identify potential customers. This study uses purchase transaction data from Brand Limback customers for the period 2020. RFM (Recency, Frequency, Monetary) analysis helps determine the attributes used for customer segmentation. To determine the optimal number of clusters in the RFM dataset, the Elbow method is applied. The RFM datasets are grouped using the Fuzzy C-Means and K-Means algorithms, and the quality of the clusters formed by the two algorithms is compared using the Silhouette Coefficient and the Davies-Bouldin Index. Clustering the RFM dataset yields 7 optimal clusters: Cluster 0 is a bronze customer, Cluster 1 a silver customer, Cluster 2 a gold customer, Cluster 3 a platinum customer, Cluster 4 a diamond customer, Cluster 5 a super customer, and Cluster 6 a superstar customer. For K-Means, cluster validation gives a silhouette coefficient of 0.934 and a Davies-Bouldin index of 0.155; for Fuzzy C-Means, it gives a silhouette coefficient of 0.921 and a Davies-Bouldin index of 0.145.
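When reading the two indices reported above, note that they point in opposite directions: a higher silhouette coefficient indicates better-separated clusters, whereas a lower Davies-Bouldin index does. The following is a minimal sketch of the Davies-Bouldin index using Euclidean distance and centroid-based scatter. The toy points and both labelings are hypothetical and are not the study's RFM data.

```python
import math

def centroid(pts):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(pts)
    return tuple(sum(p[d] for p in pts) / n for d in range(len(pts[0])))

def davies_bouldin(points, labels):
    """Davies-Bouldin index (>= 0, lower is better)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("Davies-Bouldin needs at least two clusters")
    cents = {lab: centroid(pts) for lab, pts in clusters.items()}
    # Scatter: mean distance of a cluster's points to its own centroid.
    scatter = {
        lab: sum(math.dist(p, cents[lab]) for p in pts) / len(pts)
        for lab, pts in clusters.items()
    }
    total = 0.0
    for i in clusters:
        # Worst-case similarity of cluster i to any other cluster j.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)

# Hypothetical, already-scaled 2-D points forming two compact groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
good_labels = [0, 0, 0, 1, 1, 1]   # matches the visible structure
bad_labels = [0, 1, 0, 1, 0, 1]    # deliberately mixes the two groups

db_good = davies_bouldin(points, good_labels)
db_bad = davies_bouldin(points, bad_labels)
print(f"good partition: {db_good:.3f}, bad partition: {db_bad:.3f}")
```

The sketch assumes Python 3.8 or later for `math.dist`. With real RFM attributes, the three columns are usually rescaled first, since monetary values would otherwise dominate the distance.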


2021 ◽  
Vol 4 (S3) ◽  
Author(s):  
Alexander Bogensperger ◽  
Yann Fabel

With increasing digitization, new opportunities emerge concerning the availability and use of data in the energy sector. A comprehensive literature review shows an abundance of available unsupervised clustering algorithms as well as internal, relative and external cluster validation indices (CVI) to evaluate the results. Yet comparing different clustering results on the same dataset, produced with different algorithms and with a specific practical goal in mind, still proves scientifically challenging. A large variety of CVI are described and consolidated in commonly used composite indices (e.g. Davies-Bouldin index, silhouette index, Dunn index). Previous works show the challenges surrounding these composite indices, since they serve a generalized cluster quality evaluation, which in many cases does not suit individual clustering goals. The presented paper introduces the current state of science and existing cluster validation indices, and proposes a practical method to combine them into an individual composite index using Multi-Criteria Decision Analysis (MCDA). The methodology is applied to two energy-economic use cases: clustering load profiles of bidirectional electric vehicles and of municipalities.
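The abstract does not state which MCDA technique the authors use. As an illustration of the general idea only, the sketch below combines several indices with simple additive weighting, one of the most basic MCDA schemes: each index is min-max normalized, flipped where lower is better, and then summed with goal-specific weights. All index values, weights, and algorithm names are hypothetical.

```python
def min_max(values, higher_is_better):
    """Scale values to 0..1 so that 1 is always the best score."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)  # index does not discriminate
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]

def composite_scores(results, directions, weights):
    """Weighted sum of normalized indices for each clustering result."""
    names = list(results)
    total_weight = sum(weights.values())
    combined = {name: 0.0 for name in names}
    for index, higher_is_better in directions.items():
        column = min_max([results[n][index] for n in names], higher_is_better)
        for name, score in zip(names, column):
            combined[name] += weights[index] / total_weight * score
    return combined

# Hypothetical index values for three clustering runs on the same dataset.
results = {
    "kmeans": {"silhouette": 0.62, "davies_bouldin": 0.55, "dunn": 0.30},
    "ward":   {"silhouette": 0.58, "davies_bouldin": 0.40, "dunn": 0.45},
    "dbscan": {"silhouette": 0.35, "davies_bouldin": 0.90, "dunn": 0.10},
}
# True means higher is better; Davies-Bouldin is the one lower-is-better index.
directions = {"silhouette": True, "davies_bouldin": False, "dunn": True}
# Assumed weights expressing one particular clustering goal.
weights = {"silhouette": 0.5, "davies_bouldin": 0.3, "dunn": 0.2}

combined = composite_scores(results, directions, weights)
best = max(combined, key=combined.get)
print(combined, "-> preferred:", best)
```

Changing the weights changes the ranking, which is the point of an individual composite index: the same set of clustering results can be ordered differently for different practical goals.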


2021 ◽  
pp. 108223
Author(s):  
Behnam Tavakkol ◽  
Jeongsub Choi ◽  
Myong Kee Jeong ◽  
Susan L. Albin

2021 ◽  
Vol 1863 (1) ◽  
pp. 012069
Author(s):  
Weksi Budiaji ◽  
Rifqi Ahmad Riyanto ◽  
Suherna

Author(s):  
David M. Benoit ◽  
Donald A. Jackson ◽  
Cindy Chu

A major strength of the guild approach is its ability to simplify community analysis by aggregating species with similar roles or functions into groups. These groups can be used to study a number of important ecological concepts, including functional diversity, community response to disturbance, and food-web dynamics. Despite its increased use, guild membership can be based on subjective, arbitrarily chosen criteria, leading to inconsistencies across studies. Additionally, studies using the guild approach generally ignore ontogenetic changes in diet and habitat use and therefore do not fully capture the complexity of aquatic communities. Although these issues have been discussed in the literature, much has changed since the last review was published a decade ago. In our examination, we discuss data requirements and the consequences of data availability and reliability for guild formation. We identify bootstrapping and permutation techniques developed to address these limitations through cluster validation and the identification of ontogenetic shifts prior to guild delineation. Lastly, we provide a step-by-step guide to guild analysis, accompanied by a decision tree, to facilitate objective and informed guild creation.

