A New Validity Index Based on Fuzzy Energy and Fuzzy Entropy Measures in Fuzzy Clustering Problems

Mapping Intimacies ◽

10.20944/preprints202009.0525.v1 ◽

2020 ◽

Author(s):

Ferdinando Di Martino ◽

Salvatore Sessa

Keyword(s):

Fuzzy Clustering ◽

Fuzzy Entropy ◽

Validity Index ◽

Number Of Clusters ◽

Machine Learning Classification ◽

Time Consumption ◽

Validity Indices ◽

Clustering Quality ◽

Fuzzy Energy

Two well-known drawbacks in fuzzy clustering are the requirement of assigning the number of clusters in advance and the random initialization of cluster centers. The quality of the final fuzzy clusters depends heavily on the initial choice of the number of clusters and the initialization of the clusters, so it is necessary to apply a validity index to measure the compactness and the separability of the final clusters and to run the clustering algorithm several times. We propose a new fuzzy C-means algorithm in which a validity index based on the concepts of maximum fuzzy energy and minimum fuzzy entropy is applied to initialize the cluster centers and to find the optimal number of clusters and initial cluster centers in order to obtain a good clustering quality, without increasing time consumption. We test our algorithm on UCI machine learning classification datasets, comparing the results with the ones obtained by using well-known validity indices and variations of FCM using optimization algorithms in the initialization phase. The comparison results show that our algorithm represents an optimal trade-off between the quality of clustering and the time consumption.

A New Validity Index Based on Fuzzy Energy and Fuzzy Entropy Measures in Fuzzy Clustering Problems

Entropy ◽

10.3390/e22111200 ◽

2020 ◽

Vol 22 (11) ◽

pp. 1200

Author(s):

Ferdinando Di Martino ◽

Salvatore Sessa

Keyword(s):

Fuzzy Clustering ◽

Fuzzy Entropy ◽

Validity Index ◽

Number Of Clusters ◽

Machine Learning Classification ◽

Time Consumption ◽

Fuzzy C Means ◽

Validity Indices ◽

Fuzzy Energy

Two well-known drawbacks in fuzzy clustering are the requirement of assigning the number of clusters in advance and the random initialization of cluster centers. The quality of the final fuzzy clusters depends heavily on the initial choice of the number of clusters and the initialization of the clusters, so it is necessary to apply a validity index to measure the compactness and the separability of the final clusters and to run the clustering algorithm several times. We propose a new fuzzy C-means algorithm in which a validity index based on the concepts of maximum fuzzy energy and minimum fuzzy entropy is applied to initialize the cluster centers and to find the optimal number of clusters and initial cluster centers in order to obtain a good clustering quality, without increasing time consumption. We test our algorithm on UCI (University of California at Irvine) machine learning classification datasets, comparing the results with the ones obtained by using well-known validity indices and variations of fuzzy C-means that use optimization algorithms in the initialization phase. The comparison results show that our algorithm represents an optimal trade-off between the quality of clustering and the time consumption.
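The abstract does not give the formula of the proposed index, so the following is only a minimal sketch of the two underlying measures as they are commonly defined for a fuzzy set (in the style of De Luca and Termini). The function names, the normalisation by the number of elements, and the default energy function e(u) = u are assumptions made for illustration, not the authors' definitions.

```python
import math


def fuzzy_entropy(memberships):
    """Mean fuzziness of a list of membership degrees in [0, 1].

    Uses h(u) = -u*log2(u) - (1-u)*log2(1-u), which is 0 for crisp
    degrees (0 or 1) and reaches its maximum of 1 at u = 0.5.
    Dividing by the number of elements is an assumed normalisation.
    """
    def h(u):
        if u <= 0.0 or u >= 1.0:
            return 0.0  # limit value at the crisp endpoints
        return -(u * math.log2(u) + (1.0 - u) * math.log2(1.0 - u))

    return sum(h(u) for u in memberships) / len(memberships)


def fuzzy_energy(memberships, e=lambda u: u):
    """Mean energy of a list of membership degrees in [0, 1].

    `e` should be increasing with e(0) = 0 and e(1) = 1; the identity
    used as default here is an assumed, simplest possible choice.
    """
    return sum(e(u) for u in memberships) / len(memberships)


# Hypothetical membership degrees of five points in one cluster.
cluster = [0.95, 0.9, 0.5, 0.1, 0.05]
print("entropy:", fuzzy_entropy(cluster))
print("energy:", fuzzy_energy(cluster))
```

Under these assumptions a cluster whose memberships are close to 0 or 1 has low entropy, and a cluster to which many points belong strongly has high energy, which is the intuition behind preferring maximum energy and minimum entropy.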

A Validity Index for Fuzzy Clustering Based on Bipartite Modularity

Journal of Electrical and Computer Engineering ◽

10.1155/2019/2719617 ◽

2019 ◽

Vol 2019 ◽

pp. 1-9 ◽

Cited By ~ 1

Author(s):

Yongli Liu ◽

Xiaoyang Zhang ◽

Jingli Chen ◽

Hao Chao

Keyword(s):

Fuzzy Clustering ◽

Optimal Number ◽

Experimental Results ◽

Validity Index ◽

Number Of Clusters ◽

Validity Indices ◽

Noise Data ◽

Clustering Validity ◽

Optimal Number Of Clusters

Because traditional fuzzy clustering validity indices need to specify the number of clusters and are sensitive to noise data, we propose a validity index for fuzzy clustering, named CSBM (compactness separateness bipartite modularity), based on bipartite modularity. CSBM enhances robustness by combining intraclass compactness and interclass separateness and can automatically determine the optimal number of clusters. To estimate the performance of CSBM, we carried out experiments on six real datasets and compared CSBM with six other prominent indices. Experimental results show that the CSBM index performs the best in terms of robustness while accurately detecting the number of clusters.

Number of Clusters and the Quality of Hybrid Predictive Models in Analytical CRM

Studies in Logic, Grammar and Rhetoric ◽

10.2478/slgr-2014-0022 ◽

2014 ◽

Vol 37 (1) ◽

pp. 141-157 ◽

Cited By ~ 1

Author(s):

Mariusz Łapczyński ◽

Bartłomiej Jefmański

Keyword(s):

Predictive Models ◽

Cluster Validity ◽

Number Of Clusters ◽

Model Combining ◽

Cluster Validity Indices ◽

Validity Indices ◽

And Cluster Analysis ◽

Analytical Tools ◽

F Measure

Making more accurate marketing decisions requires managers to build effective predictive models. Typically, these models specify the probability of a customer belonging to a particular category, group or segment. The analytical CRM categories refer to customers interested in starting cooperation with the company (acquisition models), customers who purchase additional products (cross- and up-sell models) or customers intending to resign from the cooperation (churn models). When building predictive models, researchers use analytical tools from various disciplines with an emphasis on their best performance. This article attempts to build a hybrid predictive model combining decision trees (C&RT algorithm) and cluster analysis (k-means). During the experiments, five different cluster validity indices and eight datasets were used. The performance of the models was evaluated using popular measures such as accuracy, precision, recall, G-mean, F-measure and lift in the first and in the second decile. The authors tried to find a connection between the number of clusters and the models' quality.
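For readers unfamiliar with the evaluation measures named in this abstract, the sketch below computes several of them from a binary confusion matrix. The function name and the example counts are hypothetical, and G-mean is assumed here to be the geometric mean of sensitivity and specificity, which is one common definition; the abstract does not say which variant the authors used. Lift is omitted because it needs ranked scores rather than counts.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Common classification measures from confusion-matrix counts."""
    def ratio(num, den):
        # Return 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)          # also called sensitivity
    specificity = ratio(tn, tn + fp)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        # F-measure with equal weight on precision and recall (F1).
        "f_measure": ratio(2 * tp, 2 * tp + fp + fn),
        # Assumed definition: sqrt(sensitivity * specificity).
        "g_mean": math.sqrt(recall * specificity),
    }


# Hypothetical churn model: 40 churners found, 20 missed,
# 10 false alarms, 30 loyal customers correctly left alone.
for name, value in binary_metrics(tp=40, fp=10, fn=20, tn=30).items():
    print(f"{name}: {value:.3f}")
```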

Investigating cluster validation metrics for optimal number of clusters determination

Intelligent Decision Technologies ◽

10.3233/idt-210187 ◽

2021 ◽

pp. 1-16

Author(s):

Aikaterini Karanikola ◽

Charalampos M. Liapis ◽

Sotiris Kotsiantis

Keyword(s):

Real World ◽

Optimal Number ◽

Cluster Validation ◽

Clustering Methods ◽

Number Of Clusters ◽

Validity Indices ◽

Selection Of ◽

Specific Distance ◽

Optimal Number Of Clusters

In short, clustering is the process of partitioning a given set of objects into groups containing highly related instances. This relation is determined by a specific distance metric with which the intra-cluster similarity is estimated. Finding an optimal number of such partitions is usually the key step in the entire process, yet a rather difficult one. Selecting an unsuitable number of clusters might lead to incorrect conclusions and, consequently, to wrong decisions: the term “optimal” is quite ambiguous. Furthermore, various inherent characteristics of the datasets, such as clusters that overlap or clusters containing subclusters, will most often increase the level of difficulty of the task. Thus, the methods used to detect similarities and the parameter selection of the partition algorithm have a major impact on the quality of the groups and the identification of their optimal number. Given that each dataset constitutes a rather distinct case, validity indices are indicators introduced to address the problem of selecting such an optimal number of clusters. In this work, an extensive set of well-known validity indices, based on the approach of the so-called relative criteria, are examined comparatively. A total of 26 cluster validation measures were investigated in two distinct case studies: one on real-world data and one on artificially generated data. To ensure a certain degree of difficulty, both real-world and generated data were selected to exhibit variations and inhomogeneity. Each of the indices is deployed under the schemes of 9 different clustering methods, which incorporate 5 different distance metrics. All results are presented in various explanatory forms.

A novel fuzzy clustering approach to regionalise watersheds with an automatic determination of optimal number of clusters

Journal of Hydrology and Hydromechanics ◽

10.1515/johh-2017-0024 ◽

2017 ◽

Vol 65 (4) ◽

pp. 359-365 ◽

Cited By ~ 1

Author(s):

Javier Senent-Aparicio ◽

Jesús Soto ◽

Julio Pérez-Sánchez ◽

Jorge Garrido

Keyword(s):

Frequency Analysis ◽

Fuzzy Clustering ◽

Optimal Number ◽

Regional Frequency Analysis ◽

Cluster Validity ◽

Number Of Clusters ◽

Cluster Validity Indices ◽

Validity Indices ◽

Homogeneous Regions ◽

Optimal Number Of Clusters

One of the most important problems faced in hydrology is the estimation of flood magnitudes and frequencies in ungauged basins. Hydrological regionalisation is used to transfer information from gauged watersheds to ungauged watersheds. However, to obtain reliable results, the watersheds involved must have similar hydrological behaviour. In this study, two different clustering approaches are used and compared to identify the hydrologically homogeneous regions. The Fuzzy C-Means algorithm (FCM), which is widely used for regionalisation studies, requires cluster validity indices to be calculated in order to determine the optimal number of clusters. The Fuzzy Minimals algorithm (FM) has an advantage over other fuzzy clustering algorithms: it does not need to know the number of clusters a priori, so cluster validity indices are not used. A regional homogeneity test based on the L-moments approach is used to check the homogeneity of the regions identified by both cluster analysis approaches. The validation of the FM algorithm in deriving homogeneous regions for flood frequency analysis is illustrated through its application to data from the watersheds in Alto Genil (South Spain). According to the results, the FM algorithm is recommended for identifying the hydrologically homogeneous regions for regional frequency analysis.

Cluster Validity Index to Determine the Optimal Number Clusters of Fuzzy Clustering for Classify Customer Buying Behavior

Journal of Development Research ◽

10.28926/jdr.v5i1.134 ◽

2021 ◽

Vol 5 (1) ◽

pp. 7-12

Author(s):

Salnan Ratih Asrriningtias

Keyword(s):

Fuzzy Clustering ◽

Optimal Number ◽

Buying Behavior ◽

Cluster Validity ◽

Cluster Validity Index ◽

Validity Index ◽

Number Of Clusters ◽

Best Value ◽

Fuzzy Clustering Method ◽

The Right

One of the strategies for competing among Batik MSMEs is to look at the characteristics of the customer. To make it easier to see the characteristics of customer buying behavior, it is necessary to classify customers based on similarity of characteristics using fuzzy clustering. One of the parameters that must be determined at the beginning of the fuzzy clustering method is the number of clusters. Increasing the number of clusters does not guarantee the best performance, but the right number of clusters greatly affects the performance of fuzzy clustering. So, to get the optimal number of clusters, we can measure the clustering result for each number of clusters using a cluster validity index. Of several types of cluster validity index, NPC gives the best value. The optimal number of clusters obtained by the validity index is 2, and this number of clusters gives a classification result with a small variance value.
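The procedure described here, running fuzzy clustering for each candidate number of clusters and scoring every result with a validity index, can be sketched as follows. This is a minimal illustration only: it uses plain fuzzy C-means with Bezdek's partition coefficient as the index (not the NPC index named in the abstract), a deterministic initialisation chosen for reproducibility, and made-up two-dimensional data. All function names are hypothetical.

```python
def fcm(points, c, m=2.0, iters=100, tol=1e-9):
    """Plain fuzzy C-means. Returns (centers, U), where U[i][j] is the
    membership of point j in cluster i."""
    n, dim = len(points), len(points[0])
    # Assumed initialisation: c evenly spaced data points as centers.
    centers = [list(points[(i * n) // c]) for i in range(c)]
    U = [[0.0] * n for _ in range(c)]
    for _ in range(iters):
        # Membership update from squared distances (small offset
        # avoids division by zero when a point sits on a center).
        for j, x in enumerate(points):
            d2 = [sum((a - b) ** 2 for a, b in zip(x, v)) + 1e-12
                  for v in centers]
            for i in range(c):
                U[i][j] = 1.0 / sum((d2[i] / d2[k]) ** (1.0 / (m - 1.0))
                                    for k in range(c))
        # Center update: mean of the points weighted by u^m.
        new_centers = []
        for i in range(c):
            w = [u ** m for u in U[i]]
            total = sum(w)
            new_centers.append([sum(wj * x[d] for wj, x in zip(w, points)) / total
                                for d in range(dim)])
        shift = max(abs(a - b)
                    for v, nv in zip(centers, new_centers)
                    for a, b in zip(v, nv))
        centers = new_centers
        if shift < tol:
            break
    return centers, U


def partition_coefficient(U):
    """Bezdek's partition coefficient: 1/c for a fully fuzzy partition,
    1 for a crisp one; higher is better."""
    n = len(U[0])
    return sum(u * u for row in U for u in row) / n


def select_number_of_clusters(points, candidates):
    """Score each candidate c and return (best_c, scores_by_c)."""
    scores = {c: partition_coefficient(fcm(points, c)[1]) for c in candidates}
    return max(scores, key=scores.get), scores


# Made-up data: two well-separated groups of four points each.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11)]
best_c, scores = select_number_of_clusters(data, [2, 3, 4])
print("scores:", scores)
print("selected number of clusters:", best_c)
```

The partition coefficient tends to favour small numbers of clusters, which is one reason studies such as this one compare several indices rather than relying on a single one.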

Enhancing clustering quality of geo-demographic analysis using context fuzzy clustering type-2 and particle swarm optimization

Applied Soft Computing ◽

10.1016/j.asoc.2014.04.025 ◽

2014 ◽

Vol 22 ◽

pp. 566-584 ◽

Cited By ~ 26

Author(s):

Le Hoang Son

Keyword(s):

Particle Swarm Optimization ◽

Fuzzy Clustering ◽

Particle Swarm ◽

Demographic Analysis ◽

Swarm Optimization ◽

Clustering Quality

A new cluster validity index using maximum cluster spread based compactness measure

International Journal of Intelligent Computing and Cybernetics ◽

10.1108/ijicc-02-2016-0006 ◽

2016 ◽

Vol 9 (2) ◽

pp. 179-204 ◽

Cited By ~ 10

Author(s):

M. Arif Wani ◽

Romana Riyaz

Keyword(s):

Optimal Number ◽

Data Sets ◽

Cluster Validity ◽

Cluster Validity Index ◽

Validity Index ◽

Data Set ◽

Number Of Clusters ◽

Content Type ◽

Validity Indices ◽

Optimal Number Of Clusters

Purpose – The most commonly used approaches for cluster validation are based on indices, but the majority of the existing cluster validity indices do not work well on data sets of different complexities. The purpose of this paper is to propose a new cluster validity index (ARSD index) that works well on all types of data sets. Design/methodology/approach – The authors introduce a new compactness measure that depicts the typical behaviour of a cluster, where more points are located around the centre and fewer points towards the outer edge of the cluster. A novel penalty function is proposed for determining the distinctness measure of clusters. A random linear search algorithm is employed to evaluate and compare the performance of five commonly known validity indices and the proposed validity index. The values of the six indices are computed for all nc in the range (nc min, nc max) to obtain the optimal number of clusters present in a data set. The data sets used in the experiments include shaped, Gaussian-like and real data sets. Findings – An extensive experimental study shows that the proposed validity index is more consistent and reliable in indicating the correct number of clusters than the other validity indices. This is experimentally demonstrated on 11 data sets where the proposed index has achieved better results. Originality/value – The originality of the research paper lies in proposing a novel cluster validity index which is used to determine the optimal number of clusters present in data sets of different complexities.

Automatic Genetic Fuzzy c-Means

Journal of Intelligent Systems ◽

10.1515/jisys-2018-0063 ◽

2018 ◽

Vol 29 (1) ◽

pp. 529-539

Author(s):

Khalid Jebari ◽

Abdelaziz Elmoujahid ◽

Aziz Ettouhami

Keyword(s):

Fitness Function ◽

Real Data ◽

Optimal Number ◽

Data Sets ◽

Number Of Clusters ◽

Fuzzy C Means ◽

Cluster Validity Indices ◽

Validity Indices ◽

Tournament Selection

Fuzzy c-means is an efficient algorithm that is widely used for data clustering. Nonetheless, when using this algorithm, the designer faces two crucial choices: choosing the optimal number of clusters and initializing the cluster centers. The two choices have a direct impact on the clustering outcome. This paper presents an improved algorithm called automatic genetic fuzzy c-means that evolves the number of clusters and provides the initial centroids. The proposed algorithm uses a genetic algorithm with a new crossover operator, a new mutation operator, and modified tournament selection; further, it defines a new fitness function based on three cluster validity indices. Real data sets are used to demonstrate the effectiveness, in terms of quality, of the proposed algorithm.

Research on Fuzzy Clustering Validity

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.40-41.174 ◽

2010 ◽

Vol 40-41 ◽

pp. 174-182

Author(s):

Wei Jin Chen ◽

Huai Lin Dong ◽

Qing Feng Wu ◽

Ling Lin

Keyword(s):

Cluster Analysis ◽

Fuzzy Clustering ◽

Clustering Analysis ◽

Optimal Number ◽

Number Of Clusters ◽

Fuzzy Partition ◽

Geometry Structure ◽

Clustering Validity ◽

Optimal Number Of Clusters

The evaluation of clustering validity is important for clustering analysis and is one of its most active research topics. A clustering evaluation is of good quality when the optimal number of clusters it indicates is reasonable. For fuzzy clustering, the paper surveys the widely known validity evaluation methods based on fuzzy partition, geometric structure and statistics.
