Continuous attribute discretization algorithm of Rough Set based on k-means

Author(s):  
Xing Xiaoxue ◽  
Guan Xiuli ◽  
Shang Weiwei

Symmetry ◽
2020 ◽  
Vol 12 (8) ◽  
pp. 1245
Author(s):  
Xiangyang Li ◽  
Yangyang Shen

Discretization based on rough sets is used to divide the space formed by continuous attribute values with as few breakpoints as possible, while maintaining the original indiscernibility relation of the decision system, so that related information can be accurately classified and identified. In this study, a discretization algorithm for incomplete economic information in rough sets based on big data is proposed. First, a deep-learning-based filling algorithm is used to complete the incomplete economic information. Then, based on breakpoint discrimination, a rough-set discretization algorithm is applied to the completed economic information. The performance of this algorithm was tested on multiple datasets and compared with other algorithms. Experimental results show that the algorithm is effective for rough-set discretization of incomplete economic information: as the number of rough candidate breakpoints grows, it still maintains high computational efficiency, effectively improves the completeness of the incomplete economic information, and delivers superior overall application performance.
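The breakpoint-discrimination step described above can be sketched in Python. This is a minimal illustration, not the authors' implementation: candidate cuts are taken as midpoints between adjacent distinct attribute values, and each cut is scored by how many differently-labelled object pairs it separates (all function names are ours).

```python
def candidate_cuts(values):
    """Midpoints between adjacent distinct sorted attribute values."""
    vs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(vs, vs[1:])]

def discrimination(cut, values, labels):
    """Count object pairs with different labels that the cut separates."""
    left = [l for v, l in zip(values, labels) if v < cut]
    right = [l for v, l in zip(values, labels) if v >= cut]
    return sum(1 for a in left for b in right if a != b)

values = [1.0, 2.0, 3.0, 4.0]
labels = ['n', 'n', 'y', 'y']
cuts = candidate_cuts(values)  # [1.5, 2.5, 3.5]
best = max(cuts, key=lambda c: discrimination(c, values, labels))
# best == 2.5: it separates all four differently-labelled pairs
```

A greedy discretizer would keep picking the highest-discrimination cut until the decision system's indiscernibility is preserved.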


Complexity ◽  
2017 ◽  
Vol 2017 ◽  
pp. 1-9 ◽  
Author(s):  
Jianchuan Bai ◽  
Kewen Xia ◽  
Yongliang Lin ◽  
Panpan Wu

As an important processing step in rough set theory, attribute reduction aims at eliminating data redundancy and extracting useful information. The covering rough set, as a generalization of classical rough set theory, has attracted wide attention in both theory and application. By using the covering rough set, the process of continuous attribute discretization can be avoided. Firstly, this paper focuses on the consistent covering rough set and reviews some basic concepts of its theory. Then, we establish a model of attribute reduction and elaborate the steps of attribute reduction based on the consistent covering rough set. Finally, we apply the studied method to actual logging data. Our method proves feasible, and the reduction results are recognized by the Least Squares Support Vector Machine (LS-SVM) and the Relevance Vector Machine (RVM). Furthermore, the recognition results are consistent with the actual test results of a gas well, which verifies the effectiveness and efficiency of the presented method.
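As a rough sketch of the classical positive-region view of attribute reduction that this line of work generalizes (the covering-based definitions differ in detail, and these names are illustrative), the following exhaustively searches for a minimal attribute subset that preserves the positive region:

```python
from itertools import combinations

def partition(rows, attrs):
    """Group object indices by their values on the given attributes."""
    blocks = {}
    for i, row in enumerate(rows):
        blocks.setdefault(tuple(row[a] for a in attrs), set()).add(i)
    return list(blocks.values())

def positive_region(rows, attrs, labels):
    """Objects whose equivalence class is consistent on the decision."""
    pos = set()
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            pos |= block
    return pos

def reduct(rows, labels):
    """Smallest attribute subset preserving the full positive region."""
    all_attrs = list(range(len(rows[0])))
    full = positive_region(rows, all_attrs, labels)
    for k in range(1, len(all_attrs) + 1):
        for subset in combinations(all_attrs, k):
            if positive_region(rows, list(subset), labels) == full:
                return set(subset)
    return set(all_attrs)

rows = [[0, 1], [0, 0], [1, 1], [1, 0]]
labels = [0, 0, 1, 1]
# attribute 0 alone determines the decision, so reduct == {0}
```

Exhaustive search is exponential; practical reduction algorithms use heuristics such as dependency-degree-guided greedy selection.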


2013 ◽  
Vol 416-417 ◽  
pp. 1399-1403 ◽  
Author(s):  
Zhi Cai Shi ◽  
Yong Xiang Xia ◽  
Chao Gang Yu ◽  
Jin Zu Zhou

Discretization is one of the most important steps in applying rough set theory. In this paper, we analyze the shortcomings of related work, propose a novel discretization algorithm based on information loss, and give its mathematical description. The algorithm uses information loss as its measure so as to reduce the loss of information entropy during discretization. It was applied to different samples with the same attributes from KDDcup99 and intrusion detection systems. The experimental results show that the algorithm is sensitive to the samples for only some of the attributes, but this does not compromise the effectiveness of intrusion detection, and it improves the response performance of intrusion detection remarkably.
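One plausible reading of an information-loss measure (our hedged sketch, not necessarily the paper's exact formula) is the increase in class entropy when two adjacent intervals are merged; a discretizer would prefer merges with the smallest loss:

```python
from math import log2

def entropy(labels):
    """Shannon entropy of the class-label distribution in an interval."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * log2(p) for p in probs)

def info_loss(left, right):
    """Entropy increase caused by merging two adjacent intervals."""
    merged = left + right
    n = len(merged)
    before = (len(left) * entropy(left) + len(right) * entropy(right)) / n
    return entropy(merged) - before

# Merging two pure but opposite-class intervals loses a full bit:
# info_loss(['a', 'a'], ['b', 'b']) == 1.0
# Merging identical pure intervals loses nothing:
# info_loss(['a', 'a'], ['a', 'a']) == 0.0
```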


Author(s):  
Qiong Chen ◽  
Mengxing Huang

Feature discretization is an important preprocessing technology for massive data in industrial control. It improves the efficiency of edge-cloud computing by transforming continuous features into discrete ones, so as to meet the requirements of high-quality cloud services. Compared with other discretization methods, discretization based on rough sets has achieved good results in many applications because it can make full use of the known knowledge base without any prior information. However, the equivalence class of a rough set is an ordinary set, which makes it difficult to describe the fuzzy components in the data, and its accuracy is low on some complex data types in a big data environment. Therefore, we propose a rough-fuzzy-model-based discretization algorithm (RFMD). Firstly, we use fuzzy c-means clustering to obtain the membership of each sample to each category. Then, we fuzzify the equivalence classes of the rough set with the obtained memberships and establish the fitness function of a genetic algorithm based on the rough fuzzy model to select the optimal discrete breakpoints on the continuous features. Finally, we compare the proposed method with discretization algorithms based on rough sets, on information entropy, and on the chi-square test, using remote sensing datasets. The experimental results verify the effectiveness of our method.
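The fuzzy c-means membership step can be sketched as follows (one-dimensional data, fixed cluster centers, standard FCM update rule; this is our illustration, not the RFMD code):

```python
def fcm_memberships(samples, centers, m=2.0):
    """Standard FCM membership: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))."""
    u = []
    for x in samples:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:  # sample coincides with a center
            u.append([1.0 if d == 0.0 else 0.0 for d in dists])
            continue
        u.append([1.0 / sum((di / dj) ** (2.0 / (m - 1.0)) for dj in dists)
                  for di in dists])
    return u

u = fcm_memberships([1.0, 2.0, 3.0], centers=[1.0, 3.0])
# each row sums to 1; the midpoint sample has membership 0.5 to each center
```

RFMD would then weight each object's contribution to an equivalence class by such memberships instead of crisp 0/1 membership.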


2013 ◽  
Vol 278-280 ◽  
pp. 1167-1173
Author(s):  
Guo Qiang Sun ◽  
Hong Li Wang ◽  
Jing Hui Lu ◽  
Xing He

Rough set theory is mainly used for analyzing and processing fuzzy and uncertain information and knowledge, but most of the data we usually obtain are continuous; rough set theory can pretreat these data and obtain satisfactory discretization results. Discretization of continuous attributes is therefore an important part of rough set theory. The Field Programmable Gate Array (FPGA) has become the main platform for realizing digital system designs. In order to improve the processing speed of discretization, this paper proposes an FPGA-based discretization algorithm for continuous attributes in rough sets that exploits the speed advantage of the FPGA and combines it with the attribute dependency degree. This method can save much of the pretreatment time in rough set analysis and improve operational efficiency.
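The attribute dependency degree such a design accelerates is the standard rough-set quantity gamma(C, D) = |POS_C(D)| / |U|. A software reference sketch (ours, with illustrative names), which a hardware pipeline would parallelize:

```python
def dependency_degree(rows, attrs, labels):
    """gamma(C, D): fraction of objects whose equivalence class
    under the condition attributes is consistent on the decision."""
    blocks = {}
    for i, row in enumerate(rows):
        blocks.setdefault(tuple(row[a] for a in attrs), []).append(i)
    pos = sum(len(b) for b in blocks.values()
              if len({labels[i] for i in b}) == 1)
    return pos / len(rows)

rows = [[0], [0], [1], [1]]
labels = [0, 1, 1, 1]
# the class {rows 0, 1} mixes decisions 0 and 1, so gamma == 0.5
```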


Author(s):  
Dong Xu ◽  
Xin Wang ◽  
Yulong Meng ◽  
Ziying Zhang

Discretization of multidimensional attributes can improve the training speed and accuracy of machine learning algorithms. At present, discretization algorithms perform at a low level, and most of them discretize a single attribute at a time, ignoring the potential associations between attributes. To address this, we propose a discretization algorithm based on forest optimization and rough sets (FORDA). To solve the problem of discretizing multidimensional attributes, the algorithm designs an appropriate value function according to variable precision rough set theory, then constructs a forest optimization network and iteratively searches for the optimal subset of breakpoints. The experimental results on the UCI datasets show that, compared with the current mainstream discretization algorithms, the algorithm can avoid local optima and significantly improve the classification accuracy of an SVM classifier; its discretization performance is better, which verifies the effectiveness of the algorithm.
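A hedged sketch of the variable-precision ingredient: under VPRS, an equivalence class counts toward the positive region when its majority decision reaches a threshold beta, and the resulting ratio could serve as the core of a fitness function over breakpoint subsets (our illustration, not FORDA's exact value function):

```python
def vprs_positive_ratio(rows, labels, beta=0.8):
    """Fraction of objects in the beta-positive region (VPRS)."""
    blocks = {}
    for i, row in enumerate(rows):
        blocks.setdefault(tuple(row), []).append(labels[i])
    pos = 0
    for labs in blocks.values():
        top = max(labs.count(c) for c in set(labs))
        if top / len(labs) >= beta:
            pos += len(labs)
    return pos / len(labels)

rows = [[0]] * 5 + [[1]]
labels = [0, 0, 0, 0, 1, 1]
# the [0]-class is 80% pure: admitted at beta=0.8, rejected at beta=0.9
```

Lowering beta tolerates noisy classes, which is what lets VPRS-based fitness reward coarser (fewer-breakpoint) discretizations than the classical positive region would.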


2008 ◽  
Vol 2008 ◽  
pp. 1-13 ◽  
Author(s):  
Aboul ella Hassanien ◽  
Mohamed E. Abdelhafez ◽  
Hala S. Own

The main goal of this study is to investigate the relationship between psychosocial variables and diabetic child patients and to obtain a classifier function with which patients can be classified on the basis of their assessed adherence level. Rough set theory is used to identify the most important attributes and to induce decision rules from 302 samples of Kuwaiti diabetic child patients aged 7–13 years. To increase the efficiency of the classification process, a rough sets with Boolean reasoning discretization algorithm is introduced to discretize the data; the rough set reduction technique is then applied to find all reducts of the data, which contain the minimal subsets of attributes associated with a class label for classification. Finally, the rough set dependency rules are generated directly from all generated reducts. A rough confusion matrix is used to evaluate the performance of the predicted reducts and classes. A comparison has been made between the obtained results using rough sets and those of decision tree, neural network, and statistical discriminant analysis classifier algorithms. Rough sets show higher overall accuracy rates and generate more compact rules.
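The rule-induction step from a reduct can be sketched as: group objects by their values on the reduct attributes and keep only the consistent classes as rules (a minimal sketch assuming symbolic data; the attribute values and names below are hypothetical):

```python
def induce_rules(rows, attrs, labels):
    """Map each consistent condition-value tuple on `attrs`
    to its unique decision label."""
    seen = {}
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        seen.setdefault(key, set()).add(label)
    return {k: next(iter(v)) for k, v in seen.items() if len(v) == 1}

rows = [['high', 'yes'], ['high', 'no'], ['low', 'no'], ['low', 'no']]
labels = ['adherent', 'adherent', 'non-adherent', 'non-adherent']
rules = induce_rules(rows, [0], labels)
# two rules, keyed by 1-tuples on attribute 0:
# ('high',) -> 'adherent', ('low',) -> 'non-adherent'
```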

