On Cluster Extraction from Relational Data Using L1-Regularized Possibilistic Assignment Prototype Algorithm

Author(s):  
Yukihiro Hamasuna ◽  
Yasunori Endo ◽  

This paper proposes entropy-based L1-regularized possibilistic clustering and a method of sequential cluster extraction from relational data. Sequential cluster extraction means that the algorithm extracts clusters one by one. The assignment prototype algorithm is a typical clustering method for relational data. The membership degree of each object to each cluster is calculated directly from dissimilarities between objects. An entropy-based L1-regularized possibilistic assignment prototype algorithm is proposed first to induce belongingness for a membership grade. An algorithm of sequential cluster extraction based on the proposed method is then constructed, and the effectiveness of the proposed methods is shown through numerical examples.

Author(s):  
Yasunori Endo ◽  
Tomoyuki Suzuki ◽  
Naohiko Kinoshita ◽  
Yukihiro Hamasuna ◽  
...  

The fuzzy non-metric model (FNM) is a representative non-hierarchical clustering method. It is very useful because the belongingness, or membership degree, of each datum to each cluster can be calculated directly from the dissimilarities between data, and cluster centers are not used. However, the original FNM cannot handle data with uncertainty. In this study, we refer to data with uncertainty as "uncertain data," e.g., incomplete data or data that have errors. Previously, a method was proposed based on the concept of a tolerance vector for handling uncertain data, and some clustering methods were constructed according to this concept, e.g., fuzzy c-means for data with tolerance. These methods can handle uncertain data in the framework of optimization. Thus, in the present study, we apply the concept to FNM. First, we propose a new clustering algorithm based on FNM using the concept of tolerance, which we refer to as the fuzzy non-metric model for data with tolerance. Second, we show that the proposed algorithm can handle incomplete data sets. Third, we verify the effectiveness of the proposed algorithm based on comparisons with conventional methods for incomplete data sets in some numerical examples.


Author(s):  
Yasunori Endo ◽  

The fuzzy non metric model is a kind of clustering method in which belongingness or the membership grade of each datum to each cluster is calculated directly from dissimilarities between data, and cluster centers are not used. In this paper, we first construct a new fuzzy non metric model with entropy regularization. Second, we kernelize the proposed method by introducing kernel functions. Third, we consider pairwise constraints with the proposed method. We then confirm the above methods through some simple numerical examples.
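The idea that memberships are computed directly from dissimilarities, with no cluster centers, can be illustrated with a minimal sketch. It assumes one plausible form of an entropy-regularized objective, sum_i sum_k sum_t u[i][k] u[i][t] r[k][t] + (1/lam) sum_i sum_k u[i][k] log u[i][k], under the constraint that the memberships of each object sum to 1. That form, the softmax-style update it leads to, and the names `efnm_sweep`, `r`, `u`, and `lam` are assumptions made for illustration; the abstract does not state the exact objective or constants.

```python
import math


def efnm_sweep(r, u, lam):
    """One sweep of membership updates (illustrative sketch, not the paper's algorithm).

    r   : n x n symmetric dissimilarity matrix (list of lists)
    u   : c x n membership matrix; each column is assumed to sum to 1
    lam : positive regularization parameter (hypothetical name)
    """
    n = len(r)
    c = len(u)
    new_u = [row[:] for row in u]
    for k in range(n):
        # Dissimilarity of object k to cluster i, weighted by the current
        # memberships of the other objects; no cluster center is involved.
        d = [
            sum(new_u[i][t] * r[k][t] for t in range(n) if t != k)
            for i in range(c)
        ]
        # Softmax over clusters; subtracting the minimum avoids underflow.
        d_min = min(d)
        w = [math.exp(-lam * (d_i - d_min)) for d_i in d]
        total = sum(w)
        for i in range(c):
            new_u[i][k] = w[i] / total
    return new_u


# Usage: two tight pairs of objects that are far from each other.
r = [
    [0, 1, 9, 9],
    [1, 0, 9, 9],
    [9, 9, 0, 1],
    [9, 9, 1, 0],
]
u = [[0.6, 0.6, 0.4, 0.4], [0.4, 0.4, 0.6, 0.6]]
for _ in range(5):
    u = efnm_sweep(r, u, lam=1.0)
```

In this sketch a larger `lam` gives crisper memberships and a smaller `lam` gives fuzzier ones, which is the usual role of an entropy regularization weight.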


Author(s):  
Yasunori Endo ◽  
Ayako Heki ◽  
Yukihiro Hamasuna ◽  
...  

The non metric model is a kind of clustering method in which belongingness or the membership grade of each object in each cluster is calculated directly from dissimilarities between objects and in which cluster centers are not used. The clustering field has recently begun to focus on rough set representation instead of fuzzy set representation. Conventional clustering algorithms classify a set of objects into clusters with clear boundaries, that is, one object must belong to one cluster. Many objects in the real world, however, belong to more than one cluster because cluster boundaries overlap each other. Fuzzy set representation of clusters makes it possible for each object to belong to more than one cluster. The fuzzy degree of membership may, however, be too descriptive for interpreting clustering results. Rough set representation handles such cases. Clustering based on rough sets could provide a solution that is less restrictive than conventional clustering and more descriptive than fuzzy clustering. This paper covers two types of Rough-set-based Non Metric model (RNM). One algorithm is the Rough-set-based Hard Non Metric model (RHNM) and the other is the Rough-set-based Fuzzy Non Metric model (RFNM). In both algorithms, clusters are represented by rough sets and each cluster consists of a lower and an upper approximation. The effectiveness of the proposed algorithms is evaluated through numerical examples.


Author(s):  
Yuchi Kanzawa ◽  

In this paper, two types of fuzzy co-clustering algorithms are proposed. First, it is shown that the base of the objective function for the conventional fuzzy co-clustering method is very similar to the base for the entropy-regularized fuzzy nonmetric model. Next, it is shown that the non-sense clustering problem in the conventional fuzzy co-clustering algorithms is identical to that in fuzzy nonmetric model algorithms, in the case that all dissimilarities among rows and columns are zero. Based on this discussion, a method is proposed applying the entropy-regularized fuzzy nonmetric model after all dissimilarities among rows and columns are set to some values using a TIBA imputation technique. Furthermore, since relational fuzzy c-means is similar to the fuzzy nonmetric model, in the sense that both methods are designed for homogeneous relational data, a method is proposed applying entropy-regularized relational fuzzy c-means after imputing all dissimilarities among rows and columns with TIBA. Some numerical examples are presented for the proposed methods.


2016 ◽  
Vol 41 (1) ◽  
pp. 45-76 ◽  
Author(s):  
Dmitri A. Viattchenin

The paper deals with the problem of discovering fuzzy clusters with an optimal number of elements in heuristic possibilistic clustering. The relational clustering procedure using a parameter that controls cluster sizes is considered and a technique for detecting the optimal number of elements in fuzzy clusters is proposed. The effectiveness of the proposed technique is illustrated through numerical examples. Experimental results are discussed and some preliminary conclusions are formulated.


2019 ◽  
Vol 15 (1) ◽  
pp. 19-38
Author(s):  
Toshihiro Osaragi

It is necessary to classify numerical values of spatial data when representing them on a map so that, visually, they can be understood as clearly as possible. Inevitably, some loss of information from the original data occurs in the process of this classification. A great loss of information might lead to a misunderstanding of the nature of the original data. At the same time, when we try to understand the spatial distribution of attribute values, forming spatial clusters is regarded as an effective means, in which values can be regarded as statistically equivalent and are distributed continuously in the same patches. In this study, a classification method for organizing spatial data is proposed, in which any loss of information is minimized. Also, a spatial clustering method based on Akaike's Information Criterion is proposed. Some numerical examples of their applications are shown using actual spatial data for the Tokyo metropolitan area.


Author(s):  
Kei Kitajima ◽  
Yasunori Endo ◽  
Yukihiro Hamasuna ◽  
...  

Clustering is a method of data analysis without the use of supervised data. Even-sized clustering based on optimization (ECBO) is a clustering algorithm that focuses on cluster size, with the constraint that all cluster sizes must be the same. However, this constraint makes ECBO inconvenient to apply in cases where a certain margin of cluster size is allowed. It is believed that this issue can be overcome by applying a fuzzy clustering method, since fuzzy clustering can represent the membership of data to clusters more flexibly. In this paper, we propose a new even-sized clustering algorithm based on fuzzy clustering and verify its effectiveness through numerical examples.


2007 ◽  
Vol 6 (4) ◽  
pp. 541-546 ◽  
Author(s):  
Zhenping Xie ◽  
Shitong Wang ◽  
Dian You Zhang ◽  
F.L. Chung ◽  
Hanbin .

Author(s):  
Yukihiro Hamasuna ◽  
Yasunori Endo ◽  

Sequential cluster extraction algorithms are useful clustering methods that extract clusters one by one without the number of clusters having to be determined in advance. Typical examples of these algorithms are sequential hard c-means (SHCM) and possibilistic clustering (PCM) based algorithms. Two types of L1-regularized possibilistic clustering are proposed to induce crisp and possibilistic allocation rules and to construct a novel sequential cluster extraction algorithm. The relationship between the proposed method and SHCM is also discussed. The effectiveness of the proposed method is verified through numerical examples. Results show that the entropy-based method yields better results for the Rand Index and the number of extracted clusters.
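For readers unfamiliar with the Rand Index mentioned above, the following sketch computes it using its standard definition: the fraction of object pairs on which two labelings agree, where a pair agrees if it is placed together in both labelings or apart in both. The function name `rand_index` and the sample labels are illustrative and not taken from the paper.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two labelings agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("at least two objects are required")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Usage: cluster names do not matter, only which objects are grouped together.
same_partition = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
```

Because the index compares groupings rather than label values, it can be applied even when a sequential extraction method produces a different number of clusters than the reference labeling.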

