Rough Set-Based Clustering Utilizing Probabilistic Memberships

Author(s):  
Seiki Ubukata ◽  
Hiroki Kato ◽  
Akira Notsu ◽  
Katsuhiro Honda

Representing the positive, possible, and boundary regions of clusters, rough set-based C-means clustering methods, such as generalized rough C-means (GRCM) and rough set C-means (RSCM), are promising for analyzing vague cluster shapes and realizing reliable classification. In this study, we consider rough set-based clustering approaches that utilize probabilistic memberships as variants of GRCM and RSCM, including π generalized rough C-means (πGRCM), π rough set C-means (πRSCM), and rough membership C-means (RMCM). πGRCM and πRSCM assign equal probabilities of cluster belonging according to Laplace’s principle of indifference, whereas RMCM assigns the probabilities according to rough memberships, which represent conditional probabilities based on the object’s neighborhood derived from a binary relation. In addition, we discuss the theoretical validity of our RMCM approach and compare it with other methods considered in this study. Furthermore, we conducted numerical experiments for evaluating the classification performances of the abovementioned methods. Based on our experimental results, the methods were found to be effective.
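The rough membership used by RMCM, i.e. the conditional probability of cluster belonging given an object's neighborhood under a binary relation, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation; the distance threshold `delta` inducing the binary relation and the crisp nearest-center assignment are assumptions for the example.

```python
import numpy as np

def rough_memberships(X, centers, delta):
    """Per-object, per-cluster rough memberships.

    A neighborhood N(x) is induced by a binary (distance) relation:
    all objects within `delta` of x. The rough membership of x in
    cluster c is |N(x) ∩ C_c| / |N(x)|, a conditional probability
    of belonging based on x's neighborhood.
    """
    # crisp (HCM-style) assignment of each object to its nearest center
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    # binary relation -> neighborhoods (each object is its own neighbor)
    pair = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = pair <= delta
    C = centers.shape[0]
    mu = np.empty((X.shape[0], C))
    for c in range(C):
        in_c = (nearest == c)
        mu[:, c] = (neighbors & in_c[None, :]).sum(axis=1) / neighbors.sum(axis=1)
    return mu
```

Because the crisp clusters partition the data, each object's memberships sum to one, which is what makes them usable as probabilities.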

Author(s):  
B.K. Tripathy ◽  
Adhir Ghosh

The development of data clustering algorithms has been pursued by researchers since the introduction of the k-means algorithm (MacQueen 1967; Lloyd 1982). These algorithms were subsequently modified to handle categorical data. To handle situations where objects can have memberships in multiple clusters, fuzzy clustering and rough clustering methods were introduced (Lingras et al 2003, 2004a). There are many extensions of these initial algorithms (Lingras et al 2004b; Lingras 2007; Mitra 2004; Peters 2006, 2007). The MMR algorithm (Parmar et al 2007), its extensions (Tripathy et al 2009, 2011a, 2011b), and the MADE algorithm (Herawan et al 2010) use rough set techniques for clustering. In this chapter, the authors focus on rough set based clustering algorithms and provide a comparative study of the fuzzy set based and rough set based clustering algorithms in terms of their efficiency. They also present problems for future study in the direction of the topics covered.


Author(s):  
Rashid Ali ◽  
M. M. Sufyan Beg

Rank aggregation is the process of generating a single aggregated ranking for a given set of rankings. In an industrial environment, there are many applications where rank aggregation must be applied. Rough set based rank aggregation is a user feedback based technique that mines ranking rules for rank aggregation using rough set theory. In this chapter, the authors discuss the rough set based rank aggregation technique in light of Web search evaluation. Since there are many search engines available that industrial houses can use to advertise their products, Web search evaluation is essential for deciding which search engines to rely on. Here, the authors discuss the limitations of rough set based rank aggregation and present an improved version, which is more suitable for aggregating different techniques for Web search evaluation. In the improved version, the authors incorporate the confidence of the rules in predicting a class for a given set of data. They validate the mined ranking rules by comparing the predicted user feedback based ranking with the actual user feedback based ranking. They present experimental results for the evaluation of seven public search engines using the improved version of rough set based aggregation on a set of 37 queries.


Author(s):  
Koichi Yamada

We propose a way to learn probabilistic causal models using conditional causal probabilities (CCPs) to represent the uncertainty of causalities. The CCP is a probability devised by Peng and Reggia representing the uncertainty that a cause actually causes an effect given the cause. The main advantages of using CCPs are that they represent exact probabilities of causalities that people recognize mentally, and that the number of probabilities used in the causal model is far smaller than the number of conditional probabilities over all combinations of possible causes. Peng and Reggia therefore assumed that CCPs are given by human experts as subjective probabilities, and did not discuss how to calculate them from data when a dataset is available. We address this problem, starting from a discussion of the properties of data frequently given in practical problems, and show that the prior probabilities that should be learned may differ from those derived by counting data. We then propose how to learn prior probabilities and CCPs from data, evaluate the proposed method through numerical experiments, and analyze the results to show that the precision of the learned models is satisfactory.
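The naive counting baseline that the abstract argues may misestimate priors can be sketched as follows. The function name and the data layout (sets of causes and effects per record) are illustrative assumptions, not from the paper, and P(effect | cause) here is only a co-occurrence stand-in for the true CCP.

```python
from collections import Counter

def naive_estimates(records):
    """Naive counting estimates of priors P(cause) and of P(effect | cause).

    `records` is a list of (causes_present, effects_present) set pairs.
    The paper argues this plain counting can misestimate the priors in
    practical data; this baseline is what its learning method improves on.
    """
    n = len(records)
    cause_count = Counter()
    joint = Counter()
    for causes, effects in records:
        for c in causes:
            cause_count[c] += 1
            for e in effects:
                joint[(c, e)] += 1
    priors = {c: cause_count[c] / n for c in cause_count}
    # P(effect | cause) by co-occurrence; the true CCP is the probability
    # that the cause *actually produced* the effect, which counting alone
    # can only approximate.
    ccps = {(c, e): joint[(c, e)] / cause_count[c] for (c, e) in joint}
    return priors, ccps
```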


2014 ◽  
Vol 989-994 ◽  
pp. 1551-1554
Author(s):  
Fa Chao Li ◽  
Ming Li ◽  
Shuo Liu

Currently, rough set theory is widely used in many fields. In rough set theory, measuring the significance of the attributes of data is a core problem. To address the problem that existing attribute significance measures usually ignore the interaction among attributes, this paper presents a measure based on the difference degree. Given a set, the proposed method first divides it into several subsets according to the values of the condition attributes, and then computes the difference degree within the subsets. The important attributes are then selected based on the value of the difference degree. The paper further discusses some properties of the difference degree, and the experimental results show the effectiveness of the method.
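The abstract does not define the difference degree, so the following is only one hypothetical reading of the scheme it describes: partition the universe by the values of a condition attribute, then average how mixed the decision values are inside each subset (lower mixing suggesting a more significant attribute). The function name and this interpretation are assumptions; the paper's exact definition may differ.

```python
def attribute_significance(rows, attr, decision):
    """Hypothetical difference-degree-style significance of `attr`.

    rows: list of dicts mapping attribute names to values.
    Partition rows by the value of the condition attribute `attr`,
    then average the fraction of non-majority decision values within
    each subset (0.0 = attr perfectly determines the decision).
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[decision])
    total = 0.0
    for values in groups.values():
        majority = max(values.count(v) for v in set(values))
        total += (len(values) - majority) / len(values)  # mixing within subset
    return total / len(groups)
```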


Author(s):  
DUN LIU ◽  
TIANRUI LI ◽  
DECUI LIANG

By considering the risks in the policy-making procedure, a three-way decision approach based on the decision-theoretic rough set model is applied to risky government decision-making. A three-way decision is made based on a pair of thresholds on conditional probabilities. A positive rule makes a decision of executing, a negative rule makes a decision of non-executing, and a boundary rule makes a decision of deferment. Loss functions are used, via the Bayesian decision procedure, to calculate the two thresholds that describe the decision risk. A case study of government petroleum risk investment demonstrates the proposed method.
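The pair of thresholds derived from the loss functions follows the standard decision-theoretic rough set formulation; the sketch below uses the usual six-loss convention and is illustrative rather than the authors' code.

```python
def thresholds(l_PP, l_BP, l_NP, l_PN, l_BN, l_NN):
    """Thresholds (alpha, beta) from the six loss functions of the
    decision-theoretic rough set model via the Bayesian decision procedure.

    l_XY = loss of taking action X when the true state is Y, with actions
    P (positive/executing), B (boundary/deferment), N (negative).
    Assumes l_PP <= l_BP < l_NP and l_NN <= l_BN < l_PN.
    """
    alpha = (l_PN - l_BN) / ((l_PN - l_BN) + (l_BP - l_PP))
    beta = (l_BN - l_NN) / ((l_BN - l_NN) + (l_NP - l_BP))
    return alpha, beta

def three_way(p, alpha, beta):
    """Decide from the conditional probability p = Pr(X | x)."""
    if p >= alpha:
        return "execute"       # positive rule
    if p <= beta:
        return "non-execute"   # negative rule
    return "defer"             # boundary rule
```

With reasonable losses the two thresholds satisfy 0 <= beta < alpha <= 1, so the three rules cover every probability exactly once.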


2014 ◽  
Vol 513-517 ◽  
pp. 973-977
Author(s):  
Zhi Li Pei ◽  
Jian Hong Qi ◽  
Li Sha Liu ◽  
Qing Hu Wang ◽  
Ming Yang Jiang ◽  
...  

In 2012, Wang Zuofei introduced the granularity function and applied it to the measurement of attribute importance and attribute reduction. On this basis, granularity functions based on the pessimistic and optimistic multi-granularity rough sets are constructed and applied to the calculation of attribute importance and attribute reduction. According to the experimental results, the method can reduce the dimension of features and markedly improve classification accuracy and efficiency.


Author(s):  
Seiki Ubukata ◽  
Sho Sekiya ◽  
Akira Notsu ◽  
Katsuhiro Honda

In the field of cluster analysis, rough set-based extensions of hard C-means (HCM; k-means), including rough C-means (RCM), rough set C-means (RSCM), and rough membership C-means (RMCM), are promising approaches for dealing with the certainty, possibility, and uncertainty of the belonging of objects to clusters. Since C-means-type methods are strongly affected by noise, noise clustering approaches have been proposed, in which noise objects, which are far from every cluster center, are rejected for robust estimation. In this paper, we introduce noise rejection approaches for rough set-based C-means based on probabilistic memberships and propose noise RCM with membership normalization (NRCM-MN), noise RSCM with membership normalization (NRSCM-MN), and noise RMCM (NRMCM). In addition, the cluster boundaries produced by the proposed methods are visualized on the two-dimensional plane to confirm the characteristics of each method. Furthermore, the clustering performance is verified by numerical experiments using real-world datasets.
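The noise-rejection idea described above, flagging objects far from every cluster center so they do not distort the next center update, can be sketched minimally. This is an assumption-laden illustration with a fixed distance threshold, not the authors' NRCM-MN/NRSCM-MN/NRMCM update rules.

```python
import numpy as np

def reject_noise(X, centers, threshold):
    """Flag noise objects for robust C-means estimation.

    An object is treated as noise when its distance to every cluster
    center exceeds `threshold`; flagged objects would be excluded from
    the next center update.
    """
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.min(axis=1) > threshold
```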


2018 ◽  
Vol 7 (3) ◽  
pp. 82-85
Author(s):  
A. George Louis Raja ◽  
F. Sagayaraj Francis ◽  
P. Sugumar

The existing semantic methods cluster documents based on unabridged or abridged term comparisons. After clustering, these terms are not preserved, requiring the cluster operation to be repeated in its entirety upon the arrival of new documents. Hence the semantic clustering methods can be considered "on the go" methods, and re-clustering becomes unavoidable in all circumstances in both iterative and incremental clustering methods. It would be more appropriate to build and evolve a lexicon from the derived keywords of the documents and to refer to it in further cluster operations. The rationale is to avoid re-clustering upon the arrival of new documents by referring to the lexicon to formulate clusters while the quality of the clusters is intact; only when the quality degrades beyond the threshold is the cluster operation repeated. Since re-clustering is delayed until a breakeven point, it becomes faster overall. This process may incur additional runtime complexity, but it greatly simplifies and speeds up re-clustering. This paper discusses the construction of lexicons and their applications in clustering. The Keyword based Lexicon Construction Algorithm (KBLCA) is demonstrated for building lexicons, and the breakeven point for re-clustering is proposed and described. The theory of deferring re-clustering is outlined, along with experimental results.


2011 ◽  
Vol 267 ◽  
pp. 46-49
Author(s):  
Ju Li ◽  
Wen Bin Xu ◽  
Wei Yuan Tu ◽  
Xing Wang ◽  
Wei Zhang ◽  
...  

Based on a study of customer relationship management, we first obtained data from the database and transformed it into the corresponding decision table, then further simplified the data in the decision table and generated the final decision rules, with good results. The experimental results showed that the method is of practical value.


Author(s):  
Guilong Liu ◽  
William Zhu

Rough set theory is an important technique for knowledge discovery in databases. Classical rough set theory as proposed by Pawlak is based on equivalence relations, but many interesting and meaningful extensions have been made based on binary relations and coverings, respectively. This paper makes a comparison between covering rough sets and rough sets based on binary relations. It also studies the conditions under which a covering rough set can be generated by a binary relation and a binary relation based rough set can be generated by a covering.
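The binary-relation-based approximations the paper compares against coverings can be sketched directly from the definitions; the function names below are illustrative.

```python
def approximations(universe, relation, X):
    """Lower/upper approximations of X ⊆ universe under a binary relation.

    `relation(x)` returns the successor neighborhood {y : x R y}.
    The lower approximation collects objects whose whole neighborhood
    lies in X; the upper those whose neighborhood meets X. When the
    relation is an equivalence relation, the neighborhoods are the
    equivalence classes and this reduces to Pawlak's classical model.
    """
    X = set(X)
    lower = {x for x in universe if relation(x) <= X}
    upper = {x for x in universe if relation(x) & X}
    return lower, upper
```

A covering-based model would instead start from a family of possibly overlapping subsets of the universe; the paper's question is when the two constructions yield the same approximation operators.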

