Decision Theory, an Unprecedented Validation Scheme for Rough-Fuzzy Clustering

2016 ◽  
Vol 25 (02) ◽  
pp. 1650003
Author(s):  
S. Revathy ◽  
B. Parvathavarthini ◽  
S. Shiny Caroline

Cluster validation is an essential technique in all cluster applications. Several validation methods measure the accuracy of cluster structure. Typical methods are geometric, where only distance and membership form the core of validation. Yao's decision theory is a novel approach to cluster validation that involves loss calculations and a probability-based measure for determining cluster quality. Conventional rough set algorithms have utilized this validity measure. This paper proposes decision theory as an unprecedented validation scheme for Rough-Fuzzy clustering, resolving loss and probability calculations to predict the risk measure in clustering techniques. Experiments with synthetic and UCI datasets show that the scheme deduces the optimal number of clusters, overcoming the downsides of traditional validation frameworks. The proposed index can also be applied to other clustering algorithms, which extends its usefulness in business-oriented data mining.

2013 ◽  
Vol 22 (03) ◽  
pp. 1350009 ◽  
Author(s):  
GEORGE GREKOUSIS

Choosing the optimal number of clusters is a key issue in cluster analysis, and it tends to be more complicated in spatial clustering. Cluster validation helps to determine the appropriate number of clusters present in a dataset. Furthermore, cluster validation evaluates and assesses the results of clustering algorithms. There are numerous methods and techniques for choosing the optimal number of clusters via crisp and fuzzy clustering. In this paper, we introduce a new index for fuzzy clustering to determine the optimal number of clusters. This index is not another metric for calculating compactness or separation among partitions. Instead, the index uses several existing indices to give a degree, or fuzziness, to the optimal number of clusters. In this way, not only do the objects in a fuzzy cluster get a membership value, but the number of clusters to be partitioned is given a value as well. The new index is used in the fuzzy c-means algorithm for the geodemographic segmentation of 285 postal codes.
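As a minimal sketch of the membership values the abstract refers to, the standard fuzzy c-means membership update for a single point can be written as below. The function name `fcm_memberships`, the fuzzifier default `m=2.0`, and the sample coordinates are assumptions made for illustration; this shows ordinary fuzzy c-means memberships, not the new index introduced in the paper.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Return the fuzzy c-means membership of one point in each cluster.

    Uses the standard update u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)),
    where d_i is the Euclidean distance from the point to center i.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers is shared among them only.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]


# Hypothetical data: two cluster centers and one point closer to the first.
centers = [(0.0, 0.0), (4.0, 0.0)]
u = fcm_memberships((1.0, 0.0), centers)
print(u)
```

The memberships of a point always sum to one, so a point three times closer to one center than to the other receives most, but not all, of its membership from the nearer cluster.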


2010 ◽  
Vol 13 (4) ◽  
pp. 652-660 ◽  
Author(s):  
M. J. Monem ◽  
S. M. Hashemy

Improving the current operation and maintenance activities is one of the main steps in achieving higher performance of irrigation networks. Improving irrigation network management, which is influenced by different spatial and temporal parameters, is confronted with special difficulties. One of the controversial issues often faced by decision-makers is how to cope with the spatial diversity of irrigation systems. Detecting homogeneous areas within irrigation networks could improve the current management of networks. The idea behind this research is to present a quantitative benchmark for exploring homogeneous areas with similar physical attributes within the network region. Five physical attributes of each canal reach (length, capacity, number of intakes, number of conveyance structures and covered irrigated area) are used for spatial clustering. Two fuzzy clustering algorithms, namely FCM and GK, are applied to the Ghazvin irrigation network. The clustering validity index SC shows that the GK algorithm is the more appropriate tool for clustering the considered dataset. According to the results, the optimal number of clusters for the Ghazvin irrigation project is nine, and the irrigated district is classified into nine homogeneous areas. Physically homogeneous regions provide a context for better and easier decision-making.


Author(s):  
Zitai Chen ◽  
Chuan Chen ◽  
Zibin Zheng ◽  
Yi Zhu

Clustering on multilayer networks has been shown to be a promising approach to enhancing accuracy. Various multilayer network clustering algorithms assume that all networks derive from a latent clustering structure, and jointly learn the compatible and complementary information from different networks to excavate one shared underlying structure. However, such an assumption conflicts with many emerging real-life applications because of the existence of noisy or irrelevant networks. To address this issue, we propose Centroid-based Multilayer Network Clustering (CMNC), a novel approach which can divide irrelevant relationships into different network groups and uncover the cluster structure in each group simultaneously. The multilayer network is represented within a unified tensor framework for simultaneously capturing multiple types of relationships between a set of entities. By imposing the rank-(Lr,Lr,1) block term decomposition with nonnegativity, we are able to obtain good interpretations of the multiple clustering results based on graph cut theory. Numerically, we transform this tensor decomposition problem into an unconstrained optimization problem and can thus solve it efficiently under the nonlinear least squares (NLS) framework. Extensive experimental results on synthetic and real-world datasets show the effectiveness and robustness of our method against noise and irrelevant data.


Author(s):  
Tarik Kucukdeniz ◽  
Sakir Esnaf ◽  
Engin Bayturk

An uncapacitated multisource Weber problem involves finding facility locations for known customers. When this problem is restated as finding locations for additional new facilities while keeping the current facilities, a new solution approach is needed. In this study, two new and cooperative fuzzy clustering algorithms are developed to solve a variant of the uncapacitated version of a multisource Weber problem (MWP). The first algorithm proposed is an extended version of the single iteration fuzzy c-means (SIFCM) algorithm. The SIFCM algorithm assigns customers to existing facilities. The new extended SIFCM (ESIFCM), which is first proposed in this study, allocates discrete locations (coordinates) with the SIFCM and simultaneously locates and allocates continuous locations (coordinates) with the original FCM. While the differences between successive cluster center values in the SIFCM and the FCM are still decreasing, customer points are shared among facilities. It can be summarized as single-iteration fuzzy c-means combined with fuzzy c-means. The second algorithm, also proposed here, runs like the ESIFCM, but a Gustafson-Kessel (GK) fuzzy clustering algorithm is used instead of the FCM under the same framework. This algorithm is based on the single-iteration GK (SIGK) and the GK algorithms. Numerical results are reported using two MWP problems in the medium-size data class (10^6 bytes). Using clustering algorithms to locate and allocate new facilities while keeping current facilities is a novel approach. When applied to big problems, the speed of the proposed algorithms makes it possible to find a solution where a mathematical programming solution is not feasible because of the high computational cost.


2009 ◽  
Vol 2009 ◽  
pp. 1-16 ◽  
Author(s):  
David J. Miller ◽  
Carl A. Nelson ◽  
Molly Boeka Cannon ◽  
Kenneth P. Cannon

Fuzzy clustering algorithms are helpful when there exists a dataset with subgroupings of points having indistinct boundaries and overlap between the clusters. Traditional methods have been extensively studied and used on real-world data, but require users to have some knowledge of the outcome a priori in order to determine how many clusters to look for. Additionally, iterative algorithms choose the optimal number of clusters based on one of several performance measures. In this study, the authors compare the performance of three algorithms (fuzzy c-means, Gustafson-Kessel, and an iterative version of Gustafson-Kessel) when clustering a traditional data set as well as real-world geophysics data that were collected from an archaeological site in Wyoming. Areas of interest were identified using a crisp cutoff value as well as a fuzzy α-cut to determine which provided better elimination of noise and non-relevant points. Results indicate that the α-cut method eliminates more noise than the crisp cutoff values and that the iterative version of the fuzzy clustering algorithm is able to select an optimum number of subclusters within a point set (in both the traditional and real-world data), leading to proper indication of regions of interest for further expert analysis.
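To illustrate what an α-cut does, the sketch below keeps, for each cluster, only the points whose membership is at least α and treats the rest as noise. The function name `alpha_cut`, the membership layout `U[k][i]` (point k, cluster i), the sample values and the threshold 0.7 are all assumptions made for illustration; the abstract does not specify how the authors applied the cut.

```python
def alpha_cut(U, alpha):
    """Return {cluster index: [point indices]} for memberships >= alpha.

    U[k][i] is the membership of point k in cluster i. Points whose
    membership is below alpha in every cluster appear in no list.
    """
    kept = {}
    for k, row in enumerate(U):
        for i, u in enumerate(row):
            if u >= alpha:
                kept.setdefault(i, []).append(k)
    return kept


# Hypothetical memberships for three points and two clusters.
U = [
    [0.90, 0.10],  # clearly in cluster 0
    [0.55, 0.45],  # ambiguous: dropped at alpha = 0.7
    [0.20, 0.80],  # clearly in cluster 1
]
regions = alpha_cut(U, 0.7)
print(regions)  # {0: [0], 1: [2]}
```

Raising α discards more of the ambiguous points that sit between clusters, which is one plausible reading of why such a cut can remove noise.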


Author(s):  
V. RAVI ◽  
MA BIN ◽  
P. RAVI KUMAR

In this paper, two new fuzzy clustering algorithms are proposed based on the global optimization metaheuristic Threshold Accepting. Their effectiveness is demonstrated on five well-known medium-sized data sets, viz. Iris, Wine, Glass, E.Coli and Olive oil, and one large data set, Thyroid. In terms of the least objective function value, these algorithms, named TAFC-1 (Threshold Accepting based Fuzzy Clustering) and TAFC-2, outperformed the well-known Fuzzy C-Means (FCM) algorithm on four data sets, while on the remaining two data sets FCM marginally outperformed the TAFC. The Xie-Beni cluster validity index is used to arrive at the 'optimal' number of clusters for all the algorithms. A novel strategy is proposed whereby the FCM is invoked to find alternative decision vectors whenever the neighbourhood search fails in its pursuit. This hybrid scheme has worked well. In conclusion, these new algorithms can be used as viable and efficient alternatives to the FCM algorithm.
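For readers unfamiliar with the Xie-Beni index mentioned above, the following is a minimal sketch of its usual definition: fuzzy compactness divided by the number of points times the smallest squared distance between two cluster centers, with lower values indicating a better partition. The function name `xie_beni`, the membership layout `U[k][i]`, the fuzzifier default `m=2.0` and the toy data are assumptions for illustration; the paper's exact variant is not given in the abstract.

```python
import math


def xie_beni(points, centers, U, m=2.0):
    """Xie-Beni validity index (lower is better).

    U[k][i] is the membership of point k in cluster i.
    Requires at least two cluster centers.
    """
    n = len(points)
    c = len(centers)
    # Fuzzy within-cluster scatter.
    compactness = sum(
        U[k][i] ** m * math.dist(points[k], centers[i]) ** 2
        for k in range(n)
        for i in range(c)
    )
    # Smallest squared distance between any two distinct centers.
    separation = min(
        math.dist(centers[i], centers[j]) ** 2
        for i in range(c)
        for j in range(i + 1, c)
    )
    return compactness / (n * separation)


# Hypothetical one-dimensional data with two well-separated groups.
points = [(0.0,), (1.0,), (9.0,), (10.0,)]
centers = [(0.5,), (9.5,)]
U = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
xb = xie_beni(points, centers, U)
print(xb)
```

In practice the index is evaluated for each candidate number of clusters and the value that minimizes it is taken as the 'optimal' choice.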


1995 ◽  
Vol 05 (02) ◽  
pp. 239-259
Author(s):  
SU HWAN KIM ◽  
SEON WOOK KIM ◽  
TAE WON RHEE

For data analyses, it is very important to combine data with similar attribute values into a categorically homogeneous subset, called a cluster; this technique is called clustering. Crisp clustering algorithms are generally sensitive to noise, because each datum must be assigned to exactly one cluster. To solve this problem, the fuzzy c-means, fuzzy maximum likelihood estimation, and optimal fuzzy clustering algorithms have been proposed within fuzzy set theory. They, however, require a lot of processing time because of exhaustive iteration over a large amount of data and their memberships. In particular, a large memory footprint degrades performance in real-time processing applications, because it takes too much time to swap between the main memory and the secondary memory. To overcome these limitations, an extended fuzzy clustering algorithm based on an unsupervised optimal fuzzy clustering algorithm is proposed in this paper. This algorithm assigns a weight factor to each distinct datum according to its occurrence rate. The proposed extended fuzzy clustering algorithm also considers the degree of importance of each attribute, which determines the characteristics of the data. The worst case is that the whole data set has a uniformly normal distribution, which means the importance of all attributes is the same. The proposed extended fuzzy clustering algorithm has better performance than the unsupervised optimal fuzzy clustering algorithm in terms of memory space and execution time in most cases. For simulation, the proposed algorithm is applied to color image segmentation. Automatic target detection and multipeak detection are also considered as applications. These schemes can be applied to any other fuzzy clustering algorithm.
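The idea of weighting each distinct datum by its occurrence rate can be sketched as follows: duplicates are collapsed into one entry with a count, and the count multiplies that entry's contribution to the cluster center. The function name `weighted_center`, the use of raw counts as weights, the fuzzifier default `m=2.0` and the sample gray levels are assumptions for illustration only; the abstract does not give the authors' actual weight formula.

```python
from collections import Counter


def weighted_center(distinct, weights, memberships, m=2.0):
    """Center of one cluster computed from distinct data points.

    Each distinct point contributes weight * membership ** m, so a value
    that occurs many times is stored and processed only once.
    """
    dim = len(distinct[0])
    denom = sum(w * u ** m for w, u in zip(weights, memberships))
    return tuple(
        sum(w * u ** m * x[d] for x, w, u in zip(distinct, weights, memberships))
        / denom
        for d in range(dim)
    )


# Hypothetical gray levels: the value 10 occurs three times, 200 once.
pixels = [(10,), (10,), (10,), (200,)]
counts = Counter(pixels)
distinct = list(counts)                   # [(10,), (200,)]
weights = [counts[x] for x in distinct]   # [3, 1]
memberships = [1.0 for _ in distinct]     # full membership, for simplicity
center = weighted_center(distinct, weights, memberships)
print(center)  # (57.5,)
```

With full memberships the result equals the plain mean of all four pixels, while only two distinct values are kept in memory, which is the kind of saving the abstract describes for image data with many repeated values.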

