Using rough set to induce dependencies between attributes where there are a large amount of missing values

Author(s):  
Feng Honghai ◽  
Liu Baoyan ◽  
He Liyun
Author(s):  
Hemant Rana ◽  
Manohar Lal

Handling of missing attribute values is a big challenge for data analysis. For handling this type of problem, there are some well-known approaches, including Rough Set Theory (RST) and classification via clustering. In the work reported here, RSES (Rough Set Exploration System), a tool based on the RST approach, and WEKA (Waikato Environment for Knowledge Analysis), a data mining tool based on classification via clustering, are used for predicting learning styles from given data, which possibly has missing values. The results of the experiments using the tools show that the problem of missing attribute values is better handled by the RST approach than by the classification-via-clustering approach. Further, in respect of missing values, RSES yields better decision rules when the missing values are simply ignored than when some values are assigned in place of the missing attribute values.
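The contrast the abstract draws, ignoring missing values versus assigning replacements, can be illustrated with a small sketch. The data, rule form, and helper names below are hypothetical (they are not taken from RSES or WEKA); the sketch only shows how mode imputation can dilute a rule that is exact when missing values are skipped.

```python
# Toy illustration (hypothetical data): two ways to treat missing
# attribute values when scoring a simple "attribute = value -> class" rule.
from collections import Counter

data = [
    {"style": "visual", "pace": "fast", "cls": "A"},
    {"style": "visual", "pace": None,   "cls": "A"},
    {"style": "verbal", "pace": "slow", "cls": "B"},
    {"style": None,     "pace": "slow", "cls": "B"},
]

def rule_confidence(rows, attr, value, cls):
    """Confidence of the rule (attr = value) -> cls, skipping rows where
    attr is missing (the "ignore" strategy)."""
    covered = [r for r in rows if r[attr] == value]
    if not covered:
        return 0.0
    return sum(r["cls"] == cls for r in covered) / len(covered)

def impute_mode(rows, attr):
    """Replace missing values of attr with its most common observed value
    (the "assign a value" strategy)."""
    mode = Counter(r[attr] for r in rows if r[attr] is not None).most_common(1)[0][0]
    return [dict(r, **{attr: mode if r[attr] is None else r[attr]}) for r in rows]

# Ignoring the missing value keeps the rule (style = visual) -> A exact:
print(rule_confidence(data, "style", "visual", "A"))  # 1.0
# Imputing the mode ("visual") pulls a class-B object under the rule,
# lowering its confidence:
print(rule_confidence(impute_mode(data, "style"), "style", "visual", "A"))
```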


Complexity ◽  
2019 ◽  
Vol 2019 ◽  
pp. 1-17
Author(s):  
Zhaohao Wang ◽  
Xiaoping Zhang

How to effectively deal with missing values in incomplete information systems (IISs) according to the research target is still a key issue in the investigation of IISs. If the missing values in IISs are not handled properly, they destroy the internal connection of the data and reduce the efficiency of data usage. In this paper, in order to establish effective methods for filling missing values, we propose a new information system, namely, a fuzzy set-valued information system (FSvIS). By means of the similarity measures of fuzzy sets, we obtain several binary relations in FSvISs, and we investigate the relationships among them. This is a foundation for research on FSvISs in terms of the rough set approach. Then, we provide an algorithm to fill the missing values in IISs with fuzzy set values. In fact, this algorithm can transform an IIS into an FSvIS. Furthermore, we also construct an algorithm to fill the missing values in IISs with set values (or real values). The effectiveness of these algorithms is analyzed. The results show that the proposed algorithms achieve a higher correct rate than traditional algorithms and have good stability. Finally, we discuss the importance of these algorithms for investigating IISs from the viewpoint of rough set theory.
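The filling idea described above can be sketched minimally: replace a missing value not with a single guess but with a fuzzy set of candidate values, with memberships driven by how similar the incomplete object is to objects whose value is known. This is an illustrative sketch, not the authors' exact algorithm; the similarity measure and function names are assumptions.

```python
# Hypothetical sketch: fill a missing entry with a fuzzy set
# {value: membership} instead of a single imputed value.
def similarity(x, y, attrs):
    """Fraction of attributes on which x and y agree, counting only
    attributes where both values are known."""
    known = [a for a in attrs if x[a] is not None and y[a] is not None]
    if not known:
        return 0.0
    return sum(x[a] == y[a] for a in known) / len(known)

def fill_fuzzy(rows, i, attr, attrs):
    """Fuzzy set of candidates for the missing rows[i][attr]: each observed
    value gets the best similarity of an object carrying it."""
    others = [a for a in attrs if a != attr]
    fuzzy = {}
    for j, r in enumerate(rows):
        if j != i and r[attr] is not None:
            s = similarity(rows[i], r, others)
            fuzzy[r[attr]] = max(fuzzy.get(r[attr], 0.0), s)
    return fuzzy

rows = [
    {"a": 1, "b": 0, "c": None},
    {"a": 1, "b": 0, "c": "x"},
    {"a": 0, "b": 1, "c": "y"},
]
# Object 0 is identical to object 1 on the known attributes, so "x" gets
# full membership while "y" gets none:
print(fill_fuzzy(rows, 0, "c", ["a", "b", "c"]))  # {'x': 1.0, 'y': 0.0}
```

Keeping the whole membership distribution (rather than its argmax) is what turns the incomplete system into a set-valued one.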


Author(s):  
Yoshifumi Kusunoki ◽  
Masahiro Inuiguchi

In this paper, we study rough set models in information tables with missing values. The variable precision rough set model proposed by Ziarko tolerates misclassification error using a membership function in complete information tables. We generalize the variable precision rough set model to information tables with missing values. Because of incompleteness, the membership degree of each object becomes an interval value. We define six different approximate regions using the lower and upper bounds of the membership functions. The properties of the proposed rough set model are investigated. Moreover, we show that the proposed model is a generalization of rough set models based on similarity relations.
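Why the membership degree becomes an interval can be seen in a small sketch. With a missing value, an object's indiscernibility class is only bounded between the objects that certainly match it and those that possibly match it, so the rough membership |C ∩ X| / |C| ranges over an interval. The encoding below (tuples with None, one simple way of resolving the uncertain neighbours) is an illustration, not the paper's exact definitions.

```python
# Hypothetical sketch: interval-valued rough membership under missing values.
def matches(x, y, optimistic):
    """y is indiscernible from x; a missing value (None) matches anything
    only under the optimistic reading."""
    return all(a == b or (optimistic and (a is None or b is None))
               for a, b in zip(x, y))

def membership_interval(x, universe, X):
    """Bounds on |C ∩ X| / |C| over the classes C consistent with the data."""
    certain = [y for y in universe if matches(x, y, optimistic=False)]
    uncertain = [y for y in universe
                 if matches(x, y, optimistic=True) and y not in certain]
    a = sum(y in X for y in certain)            # certainly in the class, in X
    u_in = sum(y in X for y in uncertain)       # possibly in the class, in X
    u_out = len(uncertain) - u_in               # possibly in the class, outside X
    lower = a / (len(certain) + u_out)          # resolve uncertainty against X
    upper = (a + u_in) / (len(certain) + u_in)  # resolve uncertainty toward X
    return lower, upper

universe = [(1, None), (1, 0), (1, 1), (0, 0)]
X = {(1, None), (1, 1)}  # the concept, given as a set of objects
print(membership_interval((1, None), universe, X))  # (0.5, 1.0)
```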


2021 ◽  
Vol 11 (4) ◽  
pp. 307-318
Author(s):  
Robert K. Nowicki ◽  
Robert Seliga ◽  
Dariusz Żelasko ◽  
Yoichi Hayashi

Abstract The paper presents a performance analysis of a selected few rough set–based classification systems. They are hybrid solutions designed to process information with missing values. Rough set–based classification systems combine various classification methods, such as support vector machines, k-nearest neighbour, fuzzy systems, and neural networks, with rough set theory. When all input values take the form of real numbers and are all available, the structure of the classifier reduces to its non–rough set version. The performance of the four systems has been analysed based on the classification results obtained for benchmark databases downloaded from the machine learning repository of the University of California at Irvine.


2022 ◽  
pp. 016555152110695
Author(s):  
Ahmed Hamed ◽  
Mohamed Tahoun ◽  
Hamed Nassar

The original K-nearest neighbour (KNN) algorithm was meant to classify homogeneous complete data, that is, data with only numerical features whose values exist completely. Thus, it faces problems when used with heterogeneous incomplete (HI) data, which also has categorical features and is plagued with missing values. Many solutions have been proposed over the years, but most have pitfalls. For example, some solve heterogeneity by converting categorical features into numerical ones, inflicting structural damage. Others solve incompleteness by imputation or elimination, causing semantic disturbance. Almost all use the same K for all query objects, leading to misclassification. In the present work, we introduce KNNHI, a KNN-based algorithm for HI data classification that avoids all these pitfalls. Leveraging rough set theory, KNNHI preserves both categorical and numerical features, leaves missing values untouched and uses a different K for each query. The end result is an accurate classifier, as demonstrated by extensive experimentation on nine datasets, mostly from the University of California Irvine repository, using a 10-fold cross-validation technique. We show that KNNHI outperforms six recently published KNN-based algorithms in terms of precision, recall, accuracy and F-score. In addition to its function as a mighty classifier, KNNHI can also serve as a K calculator, helping KNN-based algorithms that use a single K value for all queries to find the best such value. Sure enough, we show how four such algorithms improve their performance using the K obtained by KNNHI. Finally, KNNHI exhibits impressive resilience to the degree of incompleteness, the degree of heterogeneity and the metric used to measure distance.
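Two of the ingredients the abstract highlights, handling categorical features natively and leaving missing values untouched, can be sketched with a heterogeneous per-feature distance. The sketch below is a generic HEOM-style distance, not the published KNNHI algorithm; the feature names and range values are hypothetical.

```python
# Hypothetical sketch of a heterogeneous distance for HI data:
# categorical features contribute a 0/1 overlap distance, numeric features
# a range-scaled difference, and a missing value contributes the maximum
# per-feature distance 1 instead of being imputed or dropped.
def heom(x, y, numeric, ranges):
    total = 0.0
    for f in x:
        a, b = x[f], y[f]
        if a is None or b is None:
            total += 1.0                      # missing value: worst case, no imputation
        elif f in numeric:
            total += abs(a - b) / ranges[f]   # numeric: scale by observed range
        else:
            total += 0.0 if a == b else 1.0   # categorical: simple overlap
    return total

x = {"age": 30, "colour": "red"}
y = {"age": 40, "colour": "red"}
z = {"age": None, "colour": "blue"}
ranges = {"age": 20}  # hypothetical observed range of the numeric feature
print(heom(x, y, {"age"}, ranges))  # 0.5
print(heom(x, z, {"age"}, ranges))  # 2.0
```

A distance of this shape is what lets a KNN-style classifier rank neighbours without converting categories to numbers or filling in missing entries.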


Author(s):  
Marcin Szeląg ◽  
Jerzy Błaszczyński ◽  
Roman Słowiński

2015 ◽  
Vol 58 ◽  
pp. 235-246 ◽  
Author(s):  
Wen-Yang Lin ◽  
Lin Lan ◽  
Feng-Hsiung Huang ◽  
Min-Hsien Wang

2007 ◽  
Vol 2007 ◽  
pp. 1-13 ◽  
Author(s):  
E. A. Rady ◽  
M. M. E. Abd El-Monsef ◽  
W. A. Abd El-Latif

The key point of the tolerance relation or similarity relation presented in the literature is to assign a “null” value to all missing attribute values. In other words, a “null” value may be equal to any value in the domain of the attribute. This may have a serious effect on data analysis and decision analysis, because the missing values are merely “missed”, yet they do exist and have an influence on the decision. In this paper, we introduce a modified similarity relation, denoted MSIM, that depends on the number of missing values relative to the number of defined attributes of each object. According to the definition of MSIM, many problems concerning the generalized decisions are solved. This point may find wide use in scaling in statistics. Also, a new definition of the discernibility matrix, the deduction of decision rules, and reducts in the presence of missing values are obtained.
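The idea of making the relation depend on how many values are missing can be sketched as follows. The threshold rule and names here are illustrative assumptions, not the paper's exact definition of MSIM: two objects are related only if they agree wherever both values are defined and each object has a sufficient fraction of defined attributes.

```python
# Hypothetical sketch of a similarity relation that, unlike the plain
# tolerance relation, penalises objects with too many missing values.
def msim(x, y, min_defined_ratio=0.5):
    n = len(x)
    def defined_ratio(obj):
        return sum(v is not None for v in obj) / n
    # Objects with too few defined attributes are related to nothing:
    if defined_ratio(x) < min_defined_ratio or defined_ratio(y) < min_defined_ratio:
        return False
    # Otherwise require agreement wherever both values are defined:
    return all(a == b for a, b in zip(x, y) if a is not None and b is not None)

print(msim((1, 0, None), (1, 0, 1)))     # True: agreement on all defined attributes
print(msim((1, None, None), (1, 0, 1)))  # False: too many missing values in x
```

Under the plain tolerance relation both pairs above would be similar; the defined-ratio condition is what breaks the second pair apart.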

