Entropy Based Attribute Reduction Algorithms for Rough Sets

2012 ◽  
Vol 268-270 ◽  
pp. 1859-1862
Author(s):  
Hua Yan

This paper presents a concept of knowledge entropy and uses it to define the importance of an attribute. Algorithms for attribute reduction in rough sets based on knowledge entropy are given, together with an example analysis. For the example, the result coincides with the one obtained by the traditional method of attribute reduction in rough set theory.
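
The abstract does not give the paper's definition of knowledge entropy, so the following is only a minimal sketch of the general idea. It assumes Shannon conditional entropy H(D | B) as a stand-in measure and a simple greedy forward selection; the function names and the toy decision table are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def partition(rows, attrs):
    """Group row indices into equivalence classes of the indiscernibility relation on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def conditional_entropy(rows, attrs, decision):
    """Shannon conditional entropy H(decision | attrs) in bits (an assumed stand-in measure)."""
    n = len(rows)
    h = 0.0
    for block in partition(rows, attrs):
        counts = Counter(rows[i][decision] for i in block)
        for c in counts.values():
            p = c / len(block)
            h -= (len(block) / n) * p * log2(p)
    return h


def greedy_reduct(rows, conditions, decision):
    """Add the attribute that lowers the entropy most until it matches the full attribute set."""
    target = conditional_entropy(rows, conditions, decision)
    chosen = []
    while conditional_entropy(rows, chosen, decision) > target + 1e-12:
        best = min(
            (a for a in conditions if a not in chosen),
            key=lambda a: conditional_entropy(rows, chosen + [a], decision),
        )
        chosen.append(best)
    return chosen


# Hypothetical toy decision table: "a" (and equally "c") determines "d"; "b" is irrelevant.
table = [
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "no"},
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 1, "d": "yes"},
]
reduct = greedy_reduct(table, ["a", "b", "c"], "d")
```

In this sketch the significance of an attribute is the drop in conditional entropy it produces when added to the current subset. A greedy search of this kind returns an attribute subset that preserves the entropy of the full set, but it does not guarantee that the subset is minimal.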

Author(s):  
ZHIMING ZHANG ◽  
JINGFENG TIAN

Intuitionistic fuzzy (IF) rough sets are a generalization of traditional rough sets obtained by combining IF set theory and rough set theory. The existing research on IF rough sets mainly concentrates on the establishment of lower and upper approximation operators using constructive and axiomatic approaches. Less effort has been devoted to the attribute reduction of databases based on IF rough sets. This paper systematically studies attribute reduction based on IF rough sets. First, attribute reduction with traditional rough sets and some concepts of IF rough sets are reviewed. Then, we introduce some concepts and theorems of attribute reduction with IF rough sets, and thoroughly investigate the structure of attribute reduction. Employing the discernibility matrix approach, an algorithm to find all attribute reductions is also presented. Finally, an example is given to illustrate our idea and method. Altogether, these findings lay a solid theoretical foundation for attribute reduction based on IF rough sets.


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-11 ◽  
Author(s):  
Tengfei Zhang ◽  
Fumin Ma ◽  
Jie Cao ◽  
Chen Peng ◽  
Dong Yue

Parallel attribute reduction is one of the most important topics in current research on rough set theory. Although some parallel algorithms are well documented, most of them still face challenges in dealing effectively with complex heterogeneous data that include both categorical and numerical attributes. To address this problem, a novel attribute reduction algorithm based on neighborhood multigranulation rough sets was developed to process massive heterogeneous data in parallel. The MapReduce-based parallelization method for attribute reduction was proposed in the framework of neighborhood multigranulation rough sets. To improve the reduction efficiency, hashing Map/Reduce functions were designed to speed up the positive region calculation. Thereafter, a quick parallel attribute reduction algorithm using MapReduce was developed. The effectiveness and superiority of this parallel algorithm were demonstrated by theoretical analysis and comparison experiments.
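
The idea of speeding up the positive region calculation with hashing can be illustrated on a single machine. The sketch below is an assumption-laden simplification: it uses the classical (equivalence-based) positive region rather than the neighborhood multigranulation model of the paper, plain Python generators instead of a MapReduce framework, and hypothetical function names and data.

```python
from collections import defaultdict


def map_phase(rows, attrs, decision):
    """Map step: emit (hash key, (row index, decision value)) for every row."""
    for i, row in enumerate(rows):
        yield tuple(row[a] for a in attrs), (i, row[decision])


def reduce_phase(pairs):
    """Reduce step: a group is in the positive region if all its decisions agree."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    positive = set()
    for members in groups.values():
        if len({d for _, d in members}) == 1:
            positive.update(i for i, _ in members)
    return positive


# Hypothetical toy data: rows 0 and 1 share condition values but disagree on "d".
records = [
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 0, "d": "yes"},
    {"a": 1, "b": 0, "d": "yes"},
    {"a": 1, "b": 1, "d": "yes"},
]
pos_ab = reduce_phase(map_phase(records, ["a", "b"], "d"))
pos_b = reduce_phase(map_phase(records, ["b"], "d"))
```

Keying each record by its tuple of condition values lets rows be grouped in one pass without pairwise comparison, which is the property a hashing Map/Reduce design relies on.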


Entropy ◽  
2019 ◽  
Vol 21 (2) ◽  
pp. 155 ◽  
Author(s):  
Lin Sun ◽  
Xiaoyu Zhang ◽  
Jiucheng Xu ◽  
Shiguang Zhang

Attribute reduction is an important preprocessing step for data mining and has become a hot research topic in rough set theory. Neighborhood rough set theory can overcome the shortcoming that classical rough set theory may lose some useful information when continuous-valued data sets are discretized. In this paper, to improve the classification performance of complex data, a novel attribute reduction method in neighborhood rough sets is proposed; it uses neighborhood entropy measures, combines the algebra view with the information view, and can deal with continuous data whilst maintaining the classification information of the original attributes. First, to efficiently analyze the uncertainty of knowledge in neighborhood rough sets, a new average neighborhood entropy is presented by combining neighborhood approximate precision with neighborhood entropy, based on the strong complementarity between the algebra definition of attribute significance and the definition in the information view. Then, a concept of decision neighborhood entropy is investigated for handling the uncertainty and noisiness of neighborhood decision systems; it integrates the credibility degree with the coverage degree of neighborhood decision systems to fully reflect the decision ability of attributes. Moreover, some of their properties are derived and the relationships among these measures are established, which helps to understand the essence of knowledge content and the uncertainty of neighborhood decision systems. Finally, a heuristic attribute reduction algorithm is proposed to improve the classification performance of complex data sets. The experimental results on an instance and several public data sets demonstrate that the proposed method is very effective for selecting the most relevant attributes with great classification performance.
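
The entropy measures of this paper are not defined in the abstract, so the sketch below illustrates only the underlying neighborhood relation that lets continuous data be used without discretization. It assumes Euclidean distance, a hypothetical radius parameter `delta`, and the commonly used neighborhood dependency (the share of samples whose whole neighborhood carries a single label); none of these choices is confirmed by the abstract.

```python
from math import dist


def neighborhood(points, i, delta):
    """Indices of all samples within distance delta of sample i (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= delta]


def neighborhood_dependency(points, labels, delta):
    """Fraction of samples whose neighborhood contains only their own decision label."""
    consistent = [
        i
        for i in range(len(points))
        if all(labels[j] == labels[i] for j in neighborhood(points, i, delta))
    ]
    return len(consistent) / len(points)


# Hypothetical one-dimensional continuous data with two decision classes.
samples = [(0.0,), (0.1,), (0.5,), (0.9,), (1.0,)]
classes = ["a", "a", "b", "b", "b"]
tight = neighborhood_dependency(samples, classes, 0.2)
loose = neighborhood_dependency(samples, classes, 0.45)
```

A larger radius merges samples of different classes into the same neighborhood, so the dependency drops; this is the kind of uncertainty that neighborhood-based measures are meant to quantify.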


Author(s):  
S. Arjun Raj ◽  
M. Vigneshwaran

In this article, we use rough set theory to generate the set of decision concepts in order to solve a medical problem. Based on data officially published by the International Diabetes Federation (IDF), rough sets have been used to diagnose diabetes. The lower and upper approximations of decision concepts and their boundary regions are formulated here.
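
The lower and upper approximations and the boundary region mentioned above follow the standard Pawlak definitions, which can be sketched as follows. The patient records and attribute names are entirely hypothetical and are not taken from the IDF data.

```python
from collections import defaultdict


def approximations(objects, attrs, target):
    """Return (lower, upper, boundary) of the set `target` under indiscernibility on attrs."""
    blocks = defaultdict(set)
    for name, description in objects.items():
        blocks[tuple(description[a] for a in attrs)].add(name)
    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target:  # block lies entirely inside the concept
            lower |= block
        if block & target:  # block overlaps the concept
            upper |= block
    return lower, upper, upper - lower


# Hypothetical records: symptoms as condition attributes, diagnosis as the decision concept.
patients = {
    "p1": {"thirst": "yes", "fatigue": "yes"},
    "p2": {"thirst": "yes", "fatigue": "yes"},
    "p3": {"thirst": "yes", "fatigue": "no"},
    "p4": {"thirst": "no", "fatigue": "no"},
    "p5": {"thirst": "yes", "fatigue": "no"},
}
diagnosed = {"p1", "p2", "p3"}
lower, upper, boundary = approximations(patients, ["thirst", "fatigue"], diagnosed)
```

Here p3 and p5 have identical symptoms but different diagnoses, so both fall into the boundary region: the available attributes cannot decide their membership in the concept.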


Author(s):  
Yanfang Liu ◽  
Hong Zhao ◽  
William Zhu

Rough set theory is mainly concerned with the approximation of objects through an equivalence relation on a universe. Matroids are a generalization of linear algebra and graph theory. Recently, a matroidal structure of rough sets was established and applied to the problem of attribute reduction, which is an important application of rough set theory. In this paper, we propose a new matroidal structure of rough sets and call it a parametric matroid. On the one hand, for an equivalence relation on a universe, a parametric set family, with any subset of the universe as its parameter, is defined through the lower approximation operator. This parametric set family is proved to satisfy the independent set axiom of matroids; therefore a matroid is generated, and we call it a parametric matroid of the rough set. Through the lower approximation operator, three equivalent representations of the parametric set family are obtained. Moreover, the parametric matroid of the rough set is proved to be the direct sum of a partition-circuit matroid and a free matroid. On the other hand, partition-circuit matroids are well studied through the lower approximation number, and we then use it to investigate the parametric matroid of the rough set. Several characteristics of the parametric matroid of the rough set, such as independent sets, bases, circuits, the rank function and the closure operator, are expressed by the lower approximation number.


Complexity ◽  
2017 ◽  
Vol 2017 ◽  
pp. 1-9 ◽  
Author(s):  
Jianchuan Bai ◽  
Kewen Xia ◽  
Yongliang Lin ◽  
Panpan Wu

As an important processing step in rough set theory, attribute reduction aims at eliminating data redundancy and extracting useful information. The covering rough set, as a generalization of classical rough set theory, has attracted wide attention in both theory and application. By using the covering rough set, the process of continuous attribute discretization can be avoided. First, this paper focuses on the consistent covering rough set and reviews some basic concepts in consistent covering rough set theory. Then, we establish the model of attribute reduction and elaborate the steps of attribute reduction based on the consistent covering rough set. Finally, we apply the studied method to actual lagging data. It can be proved that our method is feasible and the reduction results are recognized by Least Squares Support Vector Machine (LS-SVM) and Relevance Vector Machine (RVM). Furthermore, the recognition results are consistent with the actual test results of a gas well, which verifies the effectiveness and efficiency of the presented method.


Author(s):  
B. K. Tripathy

Granular Computing has emerged as a framework in which information granules are represented and manipulated by intelligent systems. Granular Computing forms a unified conceptual and computing platform. Rough set theory, put forth by Pawlak, is based upon a single equivalence relation taken at a time. Therefore, from a granular computing point of view, it is single granular computing. In 2006, Qiang et al. introduced multigranular computing using rough sets; this model came to be called optimistic multigranular rough sets after the same authors introduced another type, called pessimistic multigranular rough sets, in 2010. Since then, several properties of multigranulations have been studied. In addition to these basic notions, further multigranular rough set models have been introduced. Some of these, called the Neighborhood-Based Multigranular Rough Sets (NMGRS) and the Covering-Based Multigranular Rough Sets (CBMGRS), have been added recently. In this chapter, the authors discuss all these topics on multigranular computing and suggest some problems for further study.
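
The difference between the optimistic and pessimistic models can be shown on their lower approximations. The sketch below assumes the usual definitions: an object belongs to the optimistic lower approximation if its equivalence class under at least one granulation lies inside the target set, and to the pessimistic one if this holds under every granulation. The universe, granulations and function names are hypothetical.

```python
def eq_class(universe, key, x):
    """Equivalence class of x under the relation 'same value of key'."""
    return {y for y in universe if key(y) == key(x)}


def mg_lower(universe, keys, target, mode):
    """Multigranulation lower approximation; mode is 'optimistic' or 'pessimistic'."""
    combine = any if mode == "optimistic" else all
    return {
        x
        for x in universe
        if combine(eq_class(universe, key, x) <= target for key in keys)
    }


# Hypothetical universe with two granulations: parity, and membership in the first half.
universe = set(range(1, 7))
granulations = [lambda x: x % 2, lambda x: x <= 3]
concept = {1, 2, 3, 5}
optimistic = mg_lower(universe, granulations, concept, "optimistic")
pessimistic = mg_lower(universe, granulations, concept, "pessimistic")
```

The pessimistic lower approximation is always contained in the optimistic one, since requiring every granulation to agree is the stricter condition.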


Author(s):  
Benjamin Griffiths

Rough Set Theory (RST), since its introduction in Pawlak (1982), continues to develop as an effective tool in data mining. Within a set theoretical structure, its remit is closely concerned with the classification of objects to decision attribute values, based on their description by a number of condition attributes. With regards to RST, this classification is through the construction of ‘if .. then ..’ decision rules. The development of RST has been in many directions; amongst the earliest was the allowance for misclassification in the constructed decision rules, namely the Variable Precision Rough Sets model (VPRS) (Ziarko, 1993); recent references for this include Beynon (2001), Mi et al. (2004), and Slezak and Ziarko (2005). Further developments of RST have included its operation within a fuzzy environment (Greco et al., 2006) and the use of a dominance relation based approach (Greco et al., 2004). The regular major international conferences of ‘International Conference on Rough Sets and Current Trends in Computing’ (RSCTC, 2004) and ‘International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing’ (RSFDGrC, 2005) continue to include RST research covering the varying directions of its development. This is true also for the associated book series entitled ‘Transactions on Rough Sets’ (Peters and Skowron, 2005), which further includes doctoral theses on this subject. What is true is that RST is still evolving, with the eclectic attitude to its development meaning that the definitive concomitant RST data mining techniques are still to be realised. Grzymala-Busse and Ziarko (2000), in a defence of RST, discussed a number of points relevant to data mining, and also made comparisons between RST and other techniques.
Within the area of data mining and the desire to identify relationships between condition attributes, the effectiveness of RST is particularly pertinent due to the inherent intent within RST type methodologies for data reduction and feature selection (Jensen and Shen, 2005). That is, subsets of condition attributes are identified that perform the same role as all the condition attributes in a considered data set (termed ß-reducts in VPRS, see later). Chen (2001) addresses this when discussing the original RST, stating that it follows a reductionist approach and is lenient to inconsistent data (contradicting condition attributes - one aspect of underlying uncertainty). This encyclopaedia article describes and demonstrates the practical application of an RST type methodology in data mining, namely VPRS, using nascent software initially described in Griffiths and Beynon (2005). The use of VPRS, through its relatively simple structure, outlines many of the rudiments of RST based methodologies. The software utilised is oriented towards ‘hands on’ data mining, with graphs presented that clearly elucidate ‘veins’ of possible information identified from ß-reducts, over different allowed levels of misclassification associated with the constructed decision rules (Beynon and Griffiths, 2004). Further findings are briefly reported when undertaking VPRS in a resampling environment, with leave-one-out and bootstrapping approaches adopted (Wisnowski et al., 2003). The importance of these results is in the identification of the more influential condition attributes, pertinent to accruing the most effective data mining results.


Author(s):  
Malcolm J. Beynon

Rough set theory (RST), since its introduction in Pawlak (1982), continues to develop as an effective tool in classification problems and decision support. In the majority of applications using RST based methodologies, there is the construction of ‘if .. then ..’ decision rules that are used to describe the results from an analysis. The variation of applications in management and decision making, using RST, recently includes discovering the operating rules of a Sicilian irrigation purpose reservoir (Barbagallo, Consoli, Pappalardo, Greco, & Zimbone, 2006), feature selection in customer relationship management (Tseng & Huang, 2007) and decisions that insurance companies make to satisfy customers’ needs (Shyng, Wang, Tzeng, & Wu, 2007). As a nascent symbolic machine learning technique, the popularity of RST is a direct consequence of its set theoretical operational processes, mitigating inhibiting issues associated with traditional techniques, such as within-group probability distribution assumptions (Beynon & Peel, 2001). Instead, the rudiments of the original RST are based on an indiscernibility relation, whereby objects are grouped into certain equivalence classes and inference taken from these groups. Characteristics like this mean that decision support will be built upon the underlying RST philosophy of “Let the data speak for itself” (Dunstch & Gediga, 1997). Recently, RST was viewed as being of fundamental importance in artificial intelligence and cognitive sciences, including decision analysis and decision support systems (Tseng & Huang, 2007). One of the first developments on RST was through the variable precision rough sets model (VPRSß), which allows a level of mis-classification to exist in the classification of objects, resulting in probabilistic rules (see Ziarko, 1993; Beynon, 2001; Li and Wang, 2004). 
VPRSß has specifically been applied as a potential decision support system with the UK Monopolies and Mergers Commission (Beynon & Driffield, 2005), predicting bank credit ratings (Griffiths & Beynon, 2005) and diffusion of Medicaid home care programs (Kitchener, Beynon, & Harrington, 2004). Further developments of RST include extended variable precision rough sets (VPRSl,u), which infers asymmetric bounds on the possible classification and mis-classification of objects (Katzberg & Ziarko, 1996), dominance-based rough sets, which base their approach around a dominance relation (Greco, Matarazzo, & Slowinski, 2004), fuzzy rough sets, which allow the grade of membership of objects to constructed sets (Greco, Inuiguchi, & Slowinski, 2006), and the probabilistic Bayesian rough sets model, which considers an appropriate certainty gain function (Ziarko, 2005). A literal presentation of the diversity of work on RST can be viewed in the annual volumes of the Transactions on Rough Sets (most recent year 2006), and also in the annual conferences dedicated to RST and its developments (see for example, RSCTC, 2004). In this article, the theory underlying VPRSl,u is described, with its special case of VPRSß used in an example analysis. The utilisation of VPRSl,u and VPRSß is without loss of generality to other developments such as those referenced; its relative simplicity allows the non-proficient reader the opportunity to fully follow the details presented.
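
How asymmetric bounds change the classical regions can be sketched in a few lines. The code below is an illustrative reading of VPRSl,u, not the article's own formulation: it assumes that an equivalence class is assigned to the positive region when the proportion of its objects in the target decision class is at least `upper`, to the negative region when that proportion is at most `lower`, and to the boundary otherwise. The symmetric case with `lower = 1 - upper` is used here as a stand-in for VPRSß; the data and names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def vprs_regions(rows, attrs, decision, value, lower, upper):
    """Return (positive, negative, boundary) row indices for the concept decision == value."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    pos, neg, bnd = set(), set(), set()
    for members in blocks.values():
        hits = sum(1 for i in members if rows[i][decision] == value)
        proportion = Fraction(hits, len(members))  # exact, avoids float comparison issues
        if proportion >= upper:
            pos.update(members)
        elif proportion <= lower:
            neg.update(members)
        else:
            bnd.update(members)
    return pos, neg, bnd


# Hypothetical table: class c=0 is 4/5 "yes", class c=1 is 1/3 "yes", class c=2 is 0/2 "yes".
data = (
    [{"c": 0, "d": "yes"}] * 4
    + [{"c": 0, "d": "no"}]
    + [{"c": 1, "d": "yes"}]
    + [{"c": 1, "d": "no"}] * 2
    + [{"c": 2, "d": "no"}] * 2
)
relaxed = vprs_regions(data, ["c"], "d", "yes", Fraction(1, 5), Fraction(4, 5))
classical = vprs_regions(data, ["c"], "d", "yes", Fraction(0), Fraction(1))
```

With the bounds set to 0 and 1 the regions reduce to the classical rough set ones, where a single dissenting object pushes a whole equivalence class into the boundary; relaxing the bounds admits the first class to the positive region despite one misclassified object.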

