On Objective-Based Rough Hard and Fuzzy c-Means Clustering

Author(s):  
Naohiko Kinoshita
Yasunori Endo

Clustering is one of the most popular unsupervised classification methods. In this paper, we focus on rough clustering methods based on rough-set representation. Rough k-Means (RKM), proposed by Lingras et al., is one such rough clustering method. The outputs of many clustering algorithms, including RKM, depend strongly on initial values, so we must evaluate the validity of the outputs. In objective-based clustering algorithms, the objective function serves as the measure of validity. It is difficult, however, to evaluate the output of RKM, which is not objective-based. To solve this problem, we propose new objective-based rough clustering algorithms and verify their usefulness through numerical examples.
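
The role of the objective function as a validity measure can be illustrated with ordinary hard c-means (not RKM and not the algorithms proposed in the paper). The sketch below is a minimal pure-Python illustration under that assumption: it runs hard c-means from several random initializations and keeps the run with the smallest within-cluster sum of squared distances. The function names `hard_c_means` and `best_of_restarts` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_c_means(points, c, seed, max_iter=100):
    """Plain hard c-means (Lloyd iterations) from one random initialization.

    Returns (objective, labels, centers), where the objective is the sum of
    squared distances from each point to its assigned center.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, c)  # initial centers drawn from the data
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(c), key=lambda i: sq_dist(p, centers[i])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for i in range(c):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old center if a cluster is empty
                centers[i] = tuple(sum(col) / len(members) for col in zip(*members))
    objective = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return objective, labels, centers


def best_of_restarts(points, c, seeds):
    """Run from several initial values and keep the lowest-objective result."""
    return min((hard_c_means(points, c, s) for s in seeds), key=lambda r: r[0])


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    objective, labels, centers = best_of_restarts(data, c=2, seeds=range(10))
    print(objective, labels, centers)
```

Because every run reports a value of the same objective, runs started from different initial values can be ranked directly. A method without an objective function offers no such built-in comparison, which is the difficulty the abstract raises for RKM.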

Author(s):  
Yasunori Endo
Arisa Taniguchi
Yukihiro Hamasuna
...  

Clustering is an unsupervised classification technique for data analysis. In general, each datum in real space is transformed into a point in a pattern space so that clustering methods can be applied. Data often cannot be represented by a point, however, because of uncertainty such as measurement error margins and missing values. In this paper, we introduce quadratic penalty-vector regularization to handle such uncertain data using Hard c-Means (HCM), which is one of the most typical clustering algorithms. We first propose a new clustering algorithm called hard c-means using quadratic penalty-vector regularization for uncertain data (HCMP). Second, we propose sequential extraction hard c-means using quadratic penalty-vector regularization (SHCMP) to handle datasets whose cluster number is unknown. Furthermore, we verify the effectiveness of our proposed algorithms through numerical examples.


Author(s):  
Yasunori Endo
Ayako Heki
Yukihiro Hamasuna
...  

The non-metric model is a kind of clustering method in which the belongingness, or membership grade, of each object in each cluster is calculated directly from dissimilarities between objects, and in which cluster centers are not used. The clustering field has recently begun to focus on rough set representation instead of fuzzy set representation. Conventional clustering algorithms classify a set of objects into clusters with clear boundaries, that is, each object must belong to exactly one cluster. Many objects in the real world, however, belong to more than one cluster because cluster boundaries overlap each other. Fuzzy set representation of clusters makes it possible for each object to belong to more than one cluster. The fuzzy degree of membership may, however, be too descriptive for interpreting clustering results. Rough set representation handles such cases. Clustering based on rough sets could provide a solution that is less restrictive than conventional clustering and less descriptive than fuzzy clustering. This paper covers two types of Rough-set-based Non Metric model (RNM). One algorithm is the Rough-set-based Hard Non Metric model (RHNM) and the other is the Rough-set-based Fuzzy Non Metric model (RFNM). In both algorithms, clusters are represented by rough sets, and each cluster consists of a lower and an upper approximation. The effectiveness of the proposed algorithms is evaluated through numerical examples.
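
To make the lower and upper approximations concrete, here is a minimal sketch of the rough-set representation itself, not of RHNM or RFNM. It assumes that some algorithm has already decided, for each object, the set of clusters it may belong to; that `candidates` input and both function names are hypothetical. An object with exactly one candidate cluster enters that cluster's lower approximation, and every candidate cluster receives the object in its upper approximation. The second function checks the three properties commonly required of rough clusters, as stated by Lingras et al.

```python
def rough_clusters(candidates, c):
    """Build lower and upper approximations for c clusters.

    candidates[k] is the non-empty set of cluster indices that object k
    may belong to (assumed to be produced by some clustering algorithm).
    """
    lower = [set() for _ in range(c)]
    upper = [set() for _ in range(c)]
    for k, cand in enumerate(candidates):
        if not cand:
            raise ValueError(f"object {k} has no candidate cluster")
        for i in cand:
            upper[i].add(k)  # possible member of cluster i
        if len(cand) == 1:
            (i,) = cand
            lower[i].add(k)  # definite member of cluster i
    return lower, upper


def satisfies_rough_properties(lower, upper, n):
    """Check the three usual rough-cluster properties for objects 0..n-1."""
    for k in range(n):
        in_lower = [i for i, s in enumerate(lower) if k in s]
        in_upper = [i for i, s in enumerate(upper) if k in s]
        # 1. An object belongs to at most one lower approximation.
        if len(in_lower) > 1:
            return False
        # 2. Each lower approximation is contained in its upper approximation.
        if any(i not in in_upper for i in in_lower):
            return False
        # 3. An object in no lower approximation lies in two or more uppers.
        if not in_lower and len(in_upper) < 2:
            return False
    return True


if __name__ == "__main__":
    # Object 2 sits in the overlap between clusters 0 and 1.
    lower, upper = rough_clusters([{0}, {0}, {0, 1}, {1}], c=2)
    print(lower, upper, satisfies_rough_properties(lower, upper, n=4))
```

The representation records only whether membership is definite or possible, with no graded degree, which is why the abstract places it between hard and fuzzy clustering.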


Author(s):  
Yuchi Kanzawa

In this paper, two types of fuzzy co-clustering algorithms are proposed. First, it is shown that the base of the objective function for the conventional fuzzy co-clustering method is very similar to the base for the entropy-regularized fuzzy nonmetric model. Next, it is shown that the non-sense clustering problem in the conventional fuzzy co-clustering algorithms is identical to that in fuzzy nonmetric model algorithms in the case that all dissimilarities among rows and columns are zero. Based on this discussion, a method is proposed that applies the entropy-regularized fuzzy nonmetric model after all dissimilarities among rows and columns are set to some values using a TIBA imputation technique. Furthermore, since relational fuzzy c-means is similar to the fuzzy nonmetric model, in the sense that both methods are designed for homogeneous relational data, a method is proposed that applies entropy-regularized relational fuzzy c-means after imputing all dissimilarities among rows and columns with TIBA. Some numerical examples are presented for the proposed methods.


Author(s):  
Ming Cao
Qinke Peng
Ze-Gang Wei
Fei Liu
Yi-Fan Hou

The development of high-throughput technologies has produced increasing amounts of sequence data and an increasing need for efficient clustering algorithms that can process massive volumes of sequencing data for downstream analysis. Heuristic clustering methods are widely applied for sequence clustering because of their low computational complexity. Although numerous heuristic clustering methods have been developed, they suffer from two limitations: overestimation of inferred clusters and low clustering sensitivity. To address these issues, we present a new sequence clustering method (edClust) based on Edlib, a C/C++ library for fast, exact semi-global sequence alignment, to group similar sequences. The new method edClust was tested on three large-scale sequence databases, and we compared edClust to several classic heuristic clustering methods, such as UCLUST, CD-HIT, and VSEARCH. Evaluations based on the metrics of cluster number and seed sensitivity (SS) demonstrate that edClust can produce fewer clusters than other methods and that its SS is higher than that of other methods. The source code of edClust is available from https://github.com/zhang134/EdClust.git under the GNU GPL license.
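
The general shape of a greedy, seed-based heuristic for sequence clustering can be sketched as follows. This is not edClust and does not call Edlib: it assumes a plain global Levenshtein distance in place of semi-global alignment, and the identity formula, the threshold parameter `min_identity`, and the function names are all illustrative assumptions.

```python
def edit_distance(a, b):
    """Global Levenshtein distance using two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(
                min(
                    prev[j] + 1,               # delete ca
                    cur[j - 1] + 1,            # insert cb
                    prev[j - 1] + (ca != cb),  # match or substitute
                )
            )
        prev = cur
    return prev[-1]


def greedy_cluster(seqs, min_identity):
    """Greedy incremental clustering with longest-first seed selection.

    Each sequence joins the first seed whose identity reaches min_identity;
    otherwise it becomes the seed of a new cluster.
    Returns {seed_index: [member indices]}.
    """
    # Process longer sequences first (stable sort keeps input order on ties).
    order = sorted(range(len(seqs)), key=lambda k: -len(seqs[k]))
    seeds = []
    clusters = {}
    for k in order:
        s = seqs[k]
        for seed in seeds:
            ref = seqs[seed]
            # Assumed identity measure: 1 - distance / longer length.
            identity = 1 - edit_distance(s, ref) / max(len(s), len(ref), 1)
            if identity >= min_identity:
                clusters[seed].append(k)
                break
        else:
            seeds.append(k)
            clusters[k] = [k]
    return clusters


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "ACGTACG"]
    print(greedy_cluster(reads, min_identity=0.8))
```

Each sequence is compared only against the current seeds, never against all other sequences, which is what keeps heuristics of this kind cheap. The same shortcut makes the number of clusters sensitive to processing order and to the threshold.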


Author(s):  
B.K. Tripathy
Adhir Ghosh

Researchers have pursued the development of data clustering algorithms since the introduction of the k-means algorithm (MacQueen 1967; Lloyd 1982). These algorithms were subsequently modified to handle categorical data. To handle situations where objects can have memberships in multiple clusters, fuzzy clustering and rough clustering methods were introduced (Lingras et al 2003, 2004a). There are many extensions of these initial algorithms (Lingras et al 2004b; Lingras 2007; Mitra 2004; Peters 2006, 2007). The MMR algorithm (Parmar et al 2007), its extensions (Tripathy et al 2009, 2011a, 2011b) and the MADE algorithm (Herawan et al 2010) use rough set techniques for clustering. In this chapter, the authors focus on rough set based clustering algorithms and provide a comparative study of all the fuzzy set based and rough set based clustering algorithms in terms of their efficiency. They also present problems for future studies in the direction of the topics covered.


Author(s):  
Yukihiro Hamasuna
Yasunori Endo
Sadaaki Miyamoto

Detecting various kinds of cluster shapes is an important problem in the field of clustering. In general, it is difficult to obtain clusters with different sizes or shapes using a single objective function. For this reason, we have proposed the concept of clusterwise tolerance and constructed clustering algorithms based on it. In the field of data mining, regularization techniques are used in order to derive significant classifiers. In this paper, we propose another concept of clusterwise tolerance from the viewpoint of regularization. Moreover, we construct clustering algorithms for data with clusterwise tolerance based on L2- and L1-regularization. After that, we describe the fuzzy classification functions of the proposed algorithms. Finally, we show the effectiveness of the proposed algorithms through numerical examples.


Author(s):  
Naohiko Kinoshita
Yasunori Endo
Yukihiro Hamasuna
...  

Clustering, a highly useful unsupervised classification technique, has been applied in many fields. When clustering is used to classify a set of objects, for example, any uncertainty in the objects is generally ignored. This is because uncertainty is difficult to deal with and to model. It is desirable, however, to handle individual objects as they are so that we may classify them more precisely. In this paper, we propose new clustering algorithms that handle objects having uncertainty by introducing penalty vectors. We show the theoretical relationship between our proposal and conventional algorithms, and we verify the effectiveness of our proposed algorithms through numerical examples.


Author(s):  
Yasunori Endo
Tomoyuki Suzuki
Naohiko Kinoshita
Yukihiro Hamasuna
...  

The fuzzy non-metric model (FNM) is a representative non-hierarchical clustering method. It is very useful because the belongingness, or membership degree, of each datum to each cluster can be calculated directly from the dissimilarities between data, and cluster centers are not used. However, the original FNM cannot handle data with uncertainty. In this study, we refer to data with uncertainty as "uncertain data," e.g., incomplete data or data that have errors. Previously, a method was proposed based on the concept of a tolerance vector for handling uncertain data, and some clustering methods were constructed according to this concept, e.g., fuzzy c-means for data with tolerance. These methods can handle uncertain data in the framework of optimization. Thus, in the present study, we apply the concept to FNM. First, we propose a new clustering algorithm based on FNM using the concept of tolerance, which we refer to as the fuzzy non-metric model for data with tolerance. Second, we show that the proposed algorithm can handle incomplete data sets. Third, we verify the effectiveness of the proposed algorithm based on comparisons with conventional methods for incomplete data sets in some numerical examples.


Author(s):  
Naohiko Kinoshita
Yasunori Endo
Ken Onishi
...  

The rough clustering algorithm we proposed, which is based on the optimization of an objective function (RCM), has a problem because conventional rough clustering algorithm results do not ensure that solutions are optimal. To solve this problem, we propose rough clustering algorithms based on the optimization of an objective function with fuzzy-set representation. This yields more flexible results than RCM. We verify the effectiveness of the algorithms through numerical examples.


2005, Vol 277-279, pp. 343-348
Author(s):  
Mi Young Shin
Seon Hee Park

Clustering methods have often been used to find biologically relevant groups of genes or conditions based on their expression levels. Since many functionally related genes tend to be coexpressed, identifying groups of genes with similar expression profiles allows the functions of unknown genes to be inferred from those of known genes in the same group. In this paper we present a novel clustering approach, called seed-based clustering, in which seed genes are first systematically chosen by computational analysis of their expression profiles, and the clusters are then generated by using the seed genes as initial values for k-means clustering. The seed-based clustering method has strong mathematical foundations and requires only a few matrix computations for seed extraction. As a result, it provides stable clustering results by eliminating randomness in the selection of initial values for cluster generation. Our empirical results indicate that the entire clustering process can be systematically pursued using seed-based clustering, and that its performance is favorable compared to current approaches.

