Decision Theory, an Unprecedented Validation Scheme for Rough-Fuzzy Clustering

Cluster validation is an essential technique in all clustering applications. Several validation methods measure the accuracy of a cluster structure. Typical methods are geometric: only distance and membership form the core of the validation. Yao's decision theory is a novel approach to cluster validation that uses loss calculations and a probability-based measure to determine cluster quality. Conventional rough set algorithms have used this validity measure. This paper proposes decision theory as an unprecedented validation scheme for rough-fuzzy clustering, resolving loss and probability calculations to predict the risk measure in clustering techniques. Experiments on synthetic and UCI datasets show that the scheme deduces the optimal number of clusters, overcoming the drawbacks of traditional validation frameworks. The proposed index can also be applied to other clustering algorithms and extends its usefulness to business-oriented data mining.
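To make the loss-and-probability idea concrete, here is a minimal Python sketch of a decision-theoretic risk for a fuzzy partition. It is not the index proposed in the paper. It assumes, hypothetically, that a membership u_kj can be read as the probability that object j belongs to cluster k, and it uses a made-up three-action loss table (accept into the positive region, defer to the boundary, reject into the negative region) in the spirit of Yao's decision-theoretic rough sets. The names `clustering_risk` and `DEFAULT_LOSS` are illustrative only.

```python
import numpy as np

# Hypothetical loss table in the style of decision-theoretic rough sets.
# Rows: actions (assign to POS region, defer to BND region, assign to NEG region).
# Columns: states (object is in the cluster, object is not in the cluster).
DEFAULT_LOSS = np.array([
    [0.0, 4.0],   # lambda_PP, lambda_PN
    [1.0, 1.0],   # lambda_BP, lambda_BN
    [4.0, 0.0],   # lambda_NP, lambda_NN
])


def clustering_risk(U, loss=DEFAULT_LOSS):
    """Sum of minimum expected losses over all (cluster, object) pairs.

    U is a c x n membership matrix; each entry u_kj is read (by assumption)
    as the probability that object j belongs to cluster k.
    """
    U = np.asarray(U, dtype=float)
    loss = np.asarray(loss, dtype=float)
    p = U[np.newaxis, :, :]                      # shape (1, c, n)
    # Expected loss of each action a: lambda_aP * p + lambda_aN * (1 - p)
    risks = loss[:, 0, None, None] * p + loss[:, 1, None, None] * (1.0 - p)
    # Bayesian decision: take the cheapest action for every pair, then add up
    return float(risks.min(axis=0).sum())


print(clustering_risk(np.eye(2)))              # crisp partition: 0.0
print(clustering_risk(np.full((2, 2), 0.5)))   # maximally fuzzy: 4.0
```

Computing such a risk for partitions obtained with different numbers of clusters and preferring the lowest value is one way a risk measure could guide the choice of c, with the caveat that this raw sum is not normalised across different c.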

GIVING FUZZINESS TO SPATIAL CLUSTERS: A NEW INDEX FOR CHOOSING THE OPTIMAL NUMBER OF CLUSTERS

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213013500097 ◽

2013 ◽

Vol 22 (03) ◽

pp. 1350009 ◽

Cited By ~ 2

Author(s):

GEORGE GREKOUSIS

Keyword(s):

Fuzzy Clustering ◽

Spatial Clustering ◽

Clustering Algorithms ◽

Optimal Number ◽

Fuzzy Cluster ◽

Cluster Validation ◽

Number Of Clusters ◽

A Value ◽

Membership Value ◽

Optimal Number Of Clusters

Choosing the optimal number of clusters is a key issue in cluster analysis, and things tend to be even more complicated in spatial clustering. Cluster validation helps to determine the appropriate number of clusters present in a dataset; it also evaluates and assesses the results of clustering algorithms. There are numerous methods and techniques for choosing the optimal number of clusters in both crisp and fuzzy clustering. In this paper, we introduce a new index for fuzzy clustering to determine the optimal number of clusters. This index is not another metric for calculating compactness or separation among partitions. Instead, it uses several existing indices to give a degree, or fuzziness, to the optimal number of clusters. In this way, not only do the objects in a fuzzy cluster get a membership value, but the number of clusters to be partitioned is given a value as well. The new index is used with the fuzzy c-means algorithm for the geodemographic segmentation of 285 postal codes.
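The abstract does not give the index formula. As a hedged illustration of "giving a degree to the number of clusters", the sketch below assumes one simple aggregation: each existing validity index votes for the cluster count it considers optimal, and a count's degree is the share of votes it receives. The function name and the index labels (PC, PE, XB, FS) with their optima are made up for the example; this is not the published index.

```python
from collections import Counter


def fuzzy_number_of_clusters(best_c_by_index):
    """Give each candidate number of clusters a degree in [0, 1].

    Hypothetical aggregation: the degree of c is the share of validity
    indices whose own optimum is c (degrees therefore sum to 1).
    """
    votes = Counter(best_c_by_index.values())
    total = len(best_c_by_index)
    return {c: votes[c] / total for c in sorted(votes)}


# Made-up optima for four familiar indices, only to show the call shape
degrees = fuzzy_number_of_clusters({"PC": 3, "PE": 3, "XB": 4, "FS": 3})
print(degrees)   # {3: 0.75, 4: 0.25}
```

Vote shares are only one possible choice; a weighted scheme that trusts some indices more than others would fit the same interface.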

Extracting physical homogeneous regions out of irrigation networks using fuzzy clustering method: a case study for the Ghazvin canal irrigation network

Journal of Hydroinformatics ◽

10.2166/hydro.2010.058 ◽

2010 ◽

Vol 13 (4) ◽

pp. 652-660 ◽

Cited By ~ 7

Author(s):

M. J. Monem ◽

S. M. Hashemy

Keyword(s):

Fuzzy Clustering ◽

Spatial Clustering ◽

Clustering Algorithms ◽

Spatial Diversity ◽

Optimal Number ◽

Irrigation Network ◽

Irrigation Networks ◽

Fuzzy Clustering Method ◽

Physical Attributes ◽

Homogeneous Regions

Improving current operation and maintenance activities is one of the main steps toward higher performance of irrigation networks. Improving irrigation network management, which is influenced by various spatial and temporal parameters, faces particular difficulties. One controversial issue that decision-makers often face is how to cope with the spatial diversity of irrigation systems. Detecting homogeneous areas within irrigation networks could improve their current management. The idea behind this research is to present a quantitative benchmark for identifying homogeneous areas with similar physical attributes within the network region. Five physical attributes of each canal reach, namely length, capacity, number of intakes, number of conveyance structures and irrigated area covered, are used for spatial clustering. Two fuzzy clustering algorithms, FCM and GK, are applied to the Ghazvin irrigation network. The clustering validity index SC shows that the GK algorithm is the more appropriate tool for clustering this dataset. According to the results, the optimal number of clusters for the Ghazvin irrigation project is nine, and the irrigated district is classified into nine homogeneous areas. Physical homogeneous regions provide a context for better and easier decision-making.
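For readers who want to reproduce this kind of comparison, the following Python sketch implements plain fuzzy c-means and a partition index. Two assumptions apply: SC is taken to be the partition index as commonly defined (Bensaid et al.), where lower values suggest a better partition, and the GK algorithm is omitted. The function names, parameters and toy data are hypothetical.

```python
import numpy as np


def fcm(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Plain fuzzy c-means: returns centres V (c x d) and memberships U (c x n)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)            # columns sum to 1
    for _ in range(max_iter):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)                  # weighted means
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)  # c x n distances
        inv = np.fmax(D, 1e-12) ** (-2.0 / (m - 1.0))              # guard d = 0
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    W = U ** m
    V = (W @ X) / W.sum(axis=1, keepdims=True)   # centres consistent with final U
    return V, U


def partition_index_sc(X, V, U, m=2.0):
    """Partition index SC (assumed definition, Bensaid et al.); lower is better."""
    X = np.asarray(X, dtype=float)
    D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)        # c x n
    compactness = ((U ** m) * D2).sum(axis=1)                      # per cluster
    fuzzy_card = U.sum(axis=1)                                     # N_i
    separation = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2).sum(axis=1)
    return float((compactness / (fuzzy_card * separation)).sum())


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
V, U = fcm(X, c=2)
print(np.round(V, 2), partition_index_sc(X, V, U))
```

Running `fcm` for a range of c and comparing `partition_index_sc` across the runs mirrors how a validity index is used to select the number of clusters.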

Tensor Decomposition for Multilayer Networks Clustering

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33013371 ◽

2019 ◽

Vol 33 ◽

pp. 3371-3378 ◽

Cited By ~ 2

Author(s):

Zitai Chen ◽

Chuan Chen ◽

Zibin Zheng ◽

Yi Zhu

Keyword(s):

Clustering Algorithms ◽

Cluster Structure ◽

Real Life ◽

Nonlinear Least Squares ◽

Tensor Decomposition ◽

Underlying Structure ◽

Network Clustering ◽

Multilayer Networks ◽

Novel Approach ◽

Real World Datasets

Clustering on multilayer networks has been shown to be a promising approach to enhancing clustering accuracy. Various multilayer network clustering algorithms assume that all networks derive from a latent clustering structure, and jointly learn the compatible and complementary information from different networks to extract one shared underlying structure. However, this assumption conflicts with many emerging real-life applications because of noisy or irrelevant networks. To address this issue, we propose Centroid-based Multilayer Network Clustering (CMNC), a novel approach that can divide irrelevant relationships into different network groups and simultaneously uncover the cluster structure in each group. The multilayer network is represented within a unified tensor framework that simultaneously captures multiple types of relationships between a set of entities. By imposing the rank-(L_r, L_r, 1) block term decomposition with nonnegativity, the multiple clustering results can be clearly interpreted on the basis of graph cut theory. Numerically, we transform this tensor decomposition problem into an unconstrained optimization, which can be solved efficiently under the nonlinear least squares (NLS) framework. Extensive experimental results on synthetic and real-world datasets show the effectiveness and robustness of our method against noise and irrelevant data.
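The block term model at the heart of this approach can be illustrated without the solver. The Python sketch below stacks the layers into an n x n x K tensor and evaluates a symmetric, nonnegative rank-(L_r, L_r, 1) block term model, the sum over r of (A_r A_r^T) ∘ c_r, together with the squared error an NLS method would minimise. The symmetric form (A_r = B_r), the function names and the toy factors are assumptions for illustration; the paper's unconstrained reparametrisation and NLS solver are not reproduced.

```python
import numpy as np


def multilayer_tensor(adjacencies):
    """Stack K symmetric n x n adjacency matrices into an n x n x K tensor."""
    return np.stack([np.asarray(A, dtype=float) for A in adjacencies], axis=2)


def btd_reconstruct(factors, weights):
    """Symmetric rank-(L_r, L_r, 1) block term model.

    factors: R nonnegative n x L_r matrices A_r (node-to-cluster affinities)
    weights: R nonnegative length-K vectors c_r (layer-to-group weights)
    Returns sum_r outer(A_r A_r^T, c_r) as an n x n x K tensor.
    """
    return sum(np.einsum("ij,k->ijk", A @ A.T, c) for A, c in zip(factors, weights))


def btd_residual(T, factors, weights):
    """Squared Frobenius error, the kind of objective an NLS solver minimises."""
    return float(((T - btd_reconstruct(factors, weights)) ** 2).sum())


# Toy example: group 1 (layers 0 and 2) splits nodes {0,1} | {2,3};
# group 2 (layer 1) splits nodes {0,2} | {1,3}.
A1 = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
A2 = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)
c1 = np.array([1.0, 0.0, 1.0])
c2 = np.array([0.0, 1.0, 0.0])
T = multilayer_tensor([A1 @ A1.T, A2 @ A2.T, A1 @ A1.T])
print(btd_residual(T, [A1, A2], [c1, c2]))   # 0.0 for these exact factors
```

Reading each c_r as layer-to-group weights and the rows of A_r as node-to-cluster affinities is one plausible interpretation of "dividing networks into groups"; it is an assumption of this sketch.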

Extended Single-Iteration Fuzzy C-Means, and Gustafson-Kessel Algorithms for Medium-Sized (10^6) Multisource Weber Problem

International Journal of Operations Research and Information Systems ◽

10.4018/ijoris.2019070101 ◽

2019 ◽

Vol 10 (3) ◽

pp. 1-15 ◽

Cited By ~ 1

Author(s):

Tarik Kucukdeniz ◽

Sakir Esnaf ◽

Engin Bayturk

Keyword(s):

Fuzzy Clustering ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Cluster Center ◽

Medium Size ◽

Weber Problem ◽

Solution Approach ◽

Fuzzy C Means ◽

Novel Approach ◽

New Facilities

An uncapacitated multisource Weber problem involves finding facility locations for known customers. When this problem is restated as finding locations for additional new facilities while keeping the current facilities, a new solution approach is needed. In this study, two new, cooperative fuzzy clustering algorithms are developed to solve a variant of the uncapacitated multisource Weber problem (MWP). The first proposed algorithm is an extended version of the single-iteration fuzzy c-means (SIFCM) algorithm. The SIFCM algorithm assigns customers to existing facilities. The new extended SIFCM (ESIFCM), first proposed in this study, simultaneously allocates discrete locations (coordinates) with the SIFCM and locates and allocates continuous locations (coordinates) with the original FCM. As long as the differences between successive cluster center values are still decreasing, the SIFCM and the FCM share customer points among the facilities. Put simply, it is single-iteration fuzzy c-means combined with fuzzy c-means. The second algorithm, also proposed here, works like the ESIFCM but uses the Gustafson-Kessel (GK) fuzzy clustering algorithm instead of the FCM within the same framework; it is based on the single-iteration GK (SIGK) and GK algorithms. Numerical results are reported for two MWP instances in the medium-size data class (10^6 bytes). Using clustering algorithms to locate and allocate new facilities while keeping current facilities is a novel approach. On large problems, the proposed algorithms are fast enough to find a solution where a mathematical programming solution is infeasible because of its high computational cost.
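The key twist, keeping current facilities fixed while new ones move, can be sketched with ordinary FCM updates. In this hypothetical Python simplification, existing and new facilities share customers through one membership matrix (a single FCM membership step), but only the new facilities receive the FCM centre update. Iteration stops when the largest centre shift falls below a tolerance rather than using the paper's decreasing-difference rule. `memberships` and `locate_new_facilities` are illustrative names, and the GK variant is not shown.

```python
import numpy as np


def memberships(X, centres, m=2.0):
    """One FCM membership step: u_ij = d_ij^(-2/(m-1)) / sum_k d_kj^(-2/(m-1))."""
    D = np.linalg.norm(X[None, :, :] - centres[:, None, :], axis=2)
    inv = np.fmax(D, 1e-12) ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)


def locate_new_facilities(X, existing, new_init, m=2.0, max_iter=100, tol=1e-6):
    """Hypothetical ESIFCM-style loop: existing facilities stay fixed,
    only the new facilities receive the FCM centre update."""
    X = np.asarray(X, dtype=float)
    k = len(existing)
    centres = np.vstack([existing, new_init]).astype(float)
    for _ in range(max_iter):
        U = memberships(X, centres, m)           # all facilities share customers
        W = U[k:] ** m                           # weights of the free centres only
        updated = (W @ X) / W.sum(axis=1, keepdims=True)
        shift = np.abs(updated - centres[k:]).max()
        centres[k:] = updated
        if shift < tol:
            break
    return centres[k:], memberships(X, centres, m)


customers = np.array([[0, 0], [0, 1], [1, 0], [10, 0], [10, 1], [11, 0]], dtype=float)
new_sites, U = locate_new_facilities(customers, existing=[[0, 0]], new_init=[[5, 5]])
print(np.round(new_sites, 2))   # the new site drifts towards the uncovered group
```

Because the existing facility already sits on the first group of customers, their memberships in the new facility stay near zero, and the new site settles on the group that is poorly served.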

A validity measure for fuzzy clustering and its use in selecting optimal number of clusters

Proceedings of IEEE 5th International Fuzzy Systems ◽

10.1109/fuzzy.1996.552318 ◽

2002 ◽

Cited By ~ 14

Author(s):

Hyun-Sook Rhee ◽

Kyung-Whan Oh

Keyword(s):

Fuzzy Clustering ◽

Optimal Number ◽

Number Of Clusters ◽

Validity Measure ◽

Optimal Number Of Clusters

Comparison of Fuzzy Clustering Methods and Their Applications to Geophysics Data

Applied Computational Intelligence and Soft Computing ◽

10.1155/2009/876361 ◽

2009 ◽

Vol 2009 ◽

pp. 1-16 ◽

Cited By ~ 4

Author(s):

David J. Miller ◽

Carl A. Nelson ◽

Molly Boeka Cannon ◽

Kenneth P. Cannon

Keyword(s):

Fuzzy Clustering ◽

Real World ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Optimal Number ◽

Optimum Number ◽

Clustering Methods ◽

Real World Data ◽

Data Set ◽

World Data

Fuzzy clustering algorithms are helpful when a dataset contains subgroups of points with indistinct boundaries and overlap between the clusters. Traditional methods have been extensively studied and used on real-world data, but they require users to have some a priori knowledge of the outcome in order to determine how many clusters to look for. Additionally, iterative algorithms choose the optimal number of clusters based on one of several performance measures. In this study, the authors compare the performance of three algorithms (fuzzy c-means, Gustafson-Kessel, and an iterative version of Gustafson-Kessel) when clustering a traditional data set as well as real-world geophysics data collected from an archaeological site in Wyoming. Areas of interest in the data were identified using a crisp cutoff value as well as a fuzzy α-cut to determine which provided better elimination of noise and non-relevant points. Results indicate that the α-cut method eliminates more noise than the crisp cutoff values and that the iterative version of the fuzzy clustering algorithm is able to select an optimum number of subclusters within a point set (in both the traditional and real-world data), leading to proper indication of regions of interest for further expert analysis.
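The difference between the two filters is easy to state in code. A fuzzy α-cut keeps every point whose membership in a cluster is at least α, whereas a crisp cutoff thresholds the raw measurement. The Python sketch below assumes memberships are stored as a c x n array; the function names and the toy values are hypothetical.

```python
import numpy as np


def crisp_cutoff(values, threshold):
    """Boolean mask of points whose raw value reaches a fixed threshold."""
    return np.asarray(values, dtype=float) >= threshold


def alpha_cut(U, cluster, alpha):
    """Boolean mask of points in the alpha-cut {x : u_cluster(x) >= alpha}."""
    return np.asarray(U, dtype=float)[cluster] >= alpha


U = np.array([[0.9, 0.4, 0.7],
              [0.1, 0.6, 0.3]])
mask = alpha_cut(U, cluster=0, alpha=0.5)   # array([ True, False,  True])
```

The α-cut works on memberships, so it reflects how clearly a point belongs to a cluster rather than how large its raw reading is.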

THRESHOLD ACCEPTING BASED FUZZY CLUSTERING ALGORITHMS

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽

10.1142/s0218488506004229 ◽

2006 ◽

Vol 14 (05) ◽

pp. 617-632 ◽

Cited By ~ 5

Author(s):

V. RAVI ◽

MA BIN ◽

P. RAVI KUMAR

Keyword(s):

Fuzzy Clustering ◽

Clustering Algorithms ◽

Large Data ◽

Optimal Number ◽

Data Sets ◽

Threshold Accepting ◽

Cluster Validity Index ◽

Data Set ◽

Fcm Algorithm ◽

Neighbourhood Search

In this paper, two new fuzzy clustering algorithms are proposed based on the global optimization metaheuristic Threshold Accepting. Their effectiveness is demonstrated on five well-known medium-sized data sets, namely Iris, Wine, Glass, E. coli and Olive oil, and on a large data set, Thyroid. In terms of the lowest objective function value, these algorithms, named TAFC-1 (Threshold Accepting based Fuzzy Clustering) and TAFC-2, outperformed the well-known Fuzzy C-Means (FCM) algorithm on 4 data sets, while on the remaining two data sets FCM marginally outperformed TAFC. The Xie-Beni cluster validity index is used to arrive at the 'optimal' number of clusters for all the algorithms. A novel strategy is also proposed whereby FCM is invoked to find alternative decision vectors whenever the neighbourhood search fails. This hybrid scheme has worked well. In conclusion, these new algorithms can be used as viable and efficient alternatives to the FCM algorithm.
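The quantities this abstract relies on can be written down directly. The Python sketch below gives the standard FCM objective J_m (assumed to be the "objective function value" used to compare TAFC and FCM), the Xie-Beni index with a general fuzzifier m (the original Xie-Beni definition uses m = 2), and the basic Threshold Accepting rule for a minimisation problem. Function names are illustrative; the TAFC neighbourhood moves and the FCM fallback are not reproduced.

```python
import numpy as np


def fcm_objective(X, V, U, m=2.0):
    """J_m = sum_i sum_j u_ij^m ||x_j - v_i||^2 (standard FCM objective)."""
    D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)   # c x n
    return float(((U ** m) * D2).sum())


def xie_beni(X, V, U, m=2.0):
    """Xie-Beni index: J_m / (n * min_{i != k} ||v_i - v_k||^2); lower is better."""
    C2 = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(C2, np.inf)                 # exclude i == k
    return fcm_objective(X, V, U, m) / (X.shape[0] * C2.min())


def ta_accept(f_new, f_old, threshold):
    """Threshold Accepting (minimisation): accept unless worse by threshold or more."""
    return f_new - f_old < threshold


X = np.array([[0.0], [1.0], [3.0], [4.0]])
V = np.array([[0.5], [3.5]])
U = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
print(fcm_objective(X, V, U), xie_beni(X, V, U))   # 1.0 and 1/36 (about 0.0278)
```

For this crisp split of the points 0, 1, 3, 4 into {0, 1} and {3, 4}, J_m = 1 and the Xie-Beni value is 1 / (4 · 9) = 1/36.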

A novel and fast MIMO fuzzy inference system based on a class of fuzzy clustering algorithms with interpretability and complexity analysis

Expert Systems with Applications ◽

10.1016/j.eswa.2017.04.045 ◽

2017 ◽

Vol 84 ◽

pp. 301-322 ◽

Cited By ~ 10

Author(s):

S. Askari

Keyword(s):

Fuzzy Clustering ◽

Fuzzy Inference System ◽

Fuzzy Inference ◽

Complexity Analysis ◽

Clustering Algorithms ◽

Inference System

AN EXTENDED FUZZY CLUSTERING ALGORITHM AND ITS APPLICATION

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126695000175 ◽

1995 ◽

Vol 05 (02) ◽

pp. 239-259

Author(s):

SU HWAN KIM ◽

SEON WOOK KIM ◽

TAE WON RHEE

Keyword(s):

Fuzzy Clustering ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Main Memory ◽

Color Image Segmentation ◽

Occurrence Rate ◽

Secondary Memory ◽

Worst Case ◽

Memory Space ◽

Fuzzy Clustering Algorithm

For data analysis, it is very important to combine data with similar attribute values into a categorically homogeneous subset, called a cluster; this technique is called clustering. Crisp clustering algorithms are generally sensitive to noise, because each datum must be assigned to exactly one cluster. To solve this problem, fuzzy c-means, fuzzy maximum likelihood estimation, and optimal fuzzy clustering algorithms based on fuzzy set theory have been proposed. However, they require a lot of processing time because of exhaustive iteration over large amounts of data and their memberships. In particular, large memory requirements degrade performance in real-time processing applications, because swapping between main memory and secondary memory takes too much time. To overcome these limitations, this paper proposes an extended fuzzy clustering algorithm based on an unsupervised optimal fuzzy clustering algorithm. This algorithm assigns a weight factor to each distinct datum according to its occurrence rate. The proposed extended fuzzy clustering algorithm also considers the degree of importance of each attribute, which determines the characteristics of the data. The worst case is when the whole data set has a uniformly normal distribution, which means that all attributes are equally important. In most cases, the proposed extended fuzzy clustering algorithm performs better than the unsupervised optimal fuzzy clustering algorithm in terms of memory space and execution time. For simulation, the proposed algorithm is applied to color image segmentation. Automatic target detection and multipeak detection are also considered as applications. These schemes can be applied to any other fuzzy clustering algorithm.
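The occurrence-rate weighting can be sketched as follows: repeated data are collapsed into distinct rows with counts, and the counts enter the FCM centre update as weights, so fewer rows need to be held in memory. This Python sketch assumes the standard FCM centre formula with per-datum weights; it does not reproduce the unsupervised optimal fuzzy clustering algorithm the paper extends or its attribute-importance factors, and the function names are hypothetical.

```python
import numpy as np


def compress(X):
    """Collapse repeated rows: return distinct rows and their occurrence counts."""
    Xu, counts = np.unique(np.asarray(X, dtype=float), axis=0, return_counts=True)
    return Xu, counts.astype(float)


def weighted_centres(Xu, w, U, m=2.0):
    """FCM centre update in which distinct datum j carries weight w_j."""
    W = (U ** m) * w[None, :]
    return (W @ Xu) / W.sum(axis=1, keepdims=True)


X = np.array([[0, 0], [0, 0], [0, 0], [4, 4], [4, 4], [10, 0]], dtype=float)
Xu, w = compress(X)                      # 3 distinct rows, weights [3, 2, 1]
U = np.array([[0.9, 0.5, 0.1],
              [0.1, 0.5, 0.9]])          # made-up memberships of the distinct rows
print(weighted_centres(Xu, w, U))
```

When every copy of a repeated datum has the same membership, the weighted centres on the distinct rows equal the ordinary centres on the full data, which is what makes this compression lossless for the centre update.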

Identification of domestic water consumption in a house based on fuzzy clustering algorithms

2009 IEEE International Conference on Systems, Man and Cybernetics ◽

10.1109/icsmc.2009.5346891 ◽

2009 ◽

Cited By ~ 1

Author(s):

M. A. Corona-Nakamura ◽

R. Ruelas ◽

B. Ojeda-Magana ◽

D. W. Carr Finch

Keyword(s):

Fuzzy Clustering ◽

Water Consumption ◽

Clustering Algorithms ◽

Domestic Water ◽

Domestic Water Consumption
