Feature Based Rule Learner in Noisy Environment Using Neighbourhood Rough Set Model

Author(s):  
Yang Liu ◽  
Luyang Jiao ◽  
Guohua Bai ◽  
Boqin Feng

From the perspective of cognitive informatics, cognition can be viewed as the acquisition of knowledge. In real-world applications, information systems usually contain some degree of noisy data. A new model is proposed to deal with the hybrid-feature selection problem by combining the neighbourhood approximation and variable precision rough set models. A rule induction algorithm can then learn from the selected features in order to reduce the complexity of the resulting rule sets. Through the proposed integration, the knowledge acquisition process becomes insensitive to the dimensionality of the data, given a pre-defined tolerance degree for noise and uncertainty of misclassification. When the authors apply the method to a Chinese diabetic diagnosis problem, the hybrid-attribute reduction method selects only five attributes out of thirty-four measurements, and the rule learner produces eight rules with an average of two attributes in the antecedent of each IF-THEN rule, a manageable rule set. The experiment demonstrates that the proposed approach is effective in handling real-world problems.
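
A minimal sketch of the kind of variable-precision neighbourhood dependency measure and greedy forward reduction the abstract describes is given below; the Chebyshev distance, the `delta` neighbourhood radius, the `beta` inclusion threshold, and the assumption of normalized numeric attributes are illustrative choices, not the authors' exact formulation.

```python
import numpy as np

def dependency(X, y, attrs, delta=0.2, beta=0.9):
    """Variable-precision neighbourhood dependency of decision y on the
    attribute subset `attrs`: a sample counts towards the positive
    region when at least a fraction `beta` of its delta-neighbours
    share its decision class."""
    sub = X[:, attrs]
    pos = 0
    for i in range(len(X)):
        dist = np.abs(sub - sub[i]).max(axis=1)   # Chebyshev distance
        nbrs = y[dist <= delta]                   # delta-neighbourhood
        if np.mean(nbrs == y[i]) >= beta:
            pos += 1
    return pos / len(X)

def greedy_reduct(X, y, delta=0.2, beta=0.9, eps=1e-3):
    """Forward greedy reduction: repeatedly add the attribute that most
    increases the dependency, until the gain drops below `eps`."""
    remaining = list(range(X.shape[1]))
    reduct, best = [], 0.0
    while remaining:
        gain, attr = max((dependency(X, y, reduct + [a], delta, beta), a)
                         for a in remaining)
        if gain - best < eps:
            break
        reduct.append(attr)
        remaining.remove(attr)
        best = gain
    return reduct
```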


Author(s):  
ROLLY INTAN ◽  
MASAO MUKAIDONO

In 1982, Pawlak proposed the concept of rough sets with the practical purpose of representing the indiscernibility of elements or objects in information systems. Although it is easy to analyze, the rough set theory built on a partition induced by an equivalence relation may not provide a realistic view of the relationships between elements in real-world applications. Instead of a partition, coverings of the universe, induced by non-equivalence relations, can be considered to give a more realistic model, and generalized models of rough sets have been proposed on this basis. In this paper, a weak fuzzy similarity relation is first introduced as a more realistic relation for representing the relationship between two elements of data in real-world applications. The fuzzy conditional probability relation is considered as a concrete example of a weak fuzzy similarity relation, and coverings of the universe are provided by fuzzy conditional probability relations. Generalized concepts of rough approximations and rough membership functions are proposed and defined based on coverings of the universe; such a generalization is considered a kind of fuzzy rough set. A more generalized fuzzy rough set approximation of a given fuzzy set is proposed and discussed as an alternative way to provide interval-valued fuzzy sets, and its properties are examined.
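
For orientation, the standard covering-based constructions this line of work builds on can be sketched as follows; the notation is illustrative and may differ from the authors' own symbols.

```latex
% Illustrative sketch; notation may differ from the paper's.
% Fuzzy conditional probability relation between elements x and y:
R(x,y) = \frac{\sum_a \min\bigl(\mu_x(a),\, \mu_y(a)\bigr)}{\sum_a \mu_y(a)}

% alpha-similarity class (the covering granule around x):
S_\alpha(x) = \{\, y \in U : R(x,y) \ge \alpha \,\}

% Covering-based lower/upper approximations and rough membership of X:
\underline{R}_\alpha(X) = \{\, x : S_\alpha(x) \subseteq X \,\}, \qquad
\overline{R}_\alpha(X)  = \{\, x : S_\alpha(x) \cap X \ne \emptyset \,\}, \qquad
\mu_X(x) = \frac{|S_\alpha(x) \cap X|}{|S_\alpha(x)|}
```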


2021 ◽  
Author(s):  
Andreas Christ Sølvsten Jørgensen ◽  
Atiyo Ghosh ◽  
Marc Sturrock ◽  
Vahid Shahrezaei

The modelling of many real-world problems relies on computationally heavy simulations. Since statistical inference rests on repeated simulations to sample the parameter space, the high computational expense of these simulations can become a stumbling block. In this paper, we compare two ways to mitigate this issue using machine learning methods. One approach is to construct lightweight surrogate models that substitute for the simulations used in inference. Alternatively, one might circumvent the need for Bayesian sampling schemes altogether and directly estimate the posterior distribution. We focus on stochastic simulations that track autonomous agents and present two case studies of real-world applications: tumour growth and the spread of infectious diseases. We demonstrate that good accuracy in inference can be achieved with a relatively small number of simulations, making our machine learning approaches orders of magnitude faster than classical simulation-based methods that rely on sampling the parameter space. However, we find that while some methods generally produce more robust results than others, no algorithm offers a one-size-fits-all solution when attempting to infer model parameters from observations. Instead, one must choose the inference technique with the specific real-world application in mind. The stochastic nature of the considered real-world phenomena poses an additional challenge that can become insurmountable for some approaches. Overall, we find machine learning approaches that create direct inference machines to be promising for real-world applications. We present our findings as general guidelines for modelling practitioners.

Author summary: Computer simulations play a vital role in modern science, as they are commonly used to compare theory with observations. One can thus infer the properties of an observed system by comparing the data to the predicted behaviour in different scenarios. Each of these scenarios corresponds to a simulation with slightly different settings. However, since real-world problems are highly complex, the simulations often require extensive computational resources, making direct comparisons with data challenging, if not infeasible. It is therefore necessary to resort to inference methods that mitigate this issue, but it is not clear-cut which path to choose for any specific research problem. In this paper, we provide general guidelines for making this choice. We do so by studying examples from oncology and epidemiology and by taking advantage of developments in machine learning. More specifically, we focus on simulations that track the behaviour of autonomous agents, such as single cells or individuals. We show that the best way forward is problem-dependent and highlight the methods that yield the most robust results across the different case studies. We demonstrate that these methods are highly promising and produce reliable results in a small fraction of the time required by classic approaches that rely on comparisons between data and individual simulations. Rather than relying on a single inference technique, we recommend employing several methods and selecting the most reliable based on predetermined criteria.
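
As a toy illustration of the first approach (a lightweight surrogate substituted into a rejection-sampling scheme), the sketch below emulates a hypothetical one-parameter simulator with a small neural network; the simulator, prior, observed value, and tolerance are all invented for illustration and bear no relation to the paper's tumour-growth or epidemic models.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def simulate(theta):
    """Stand-in for an expensive agent-based simulation returning a
    single summary statistic. Purely illustrative."""
    return theta * 2.0 + rng.normal(0.0, 0.1)

# Train a cheap emulator on a modest budget of "real" simulations...
thetas = rng.uniform(0, 1, size=(200, 1))
stats = np.array([simulate(t[0]) for t in thetas])
surrogate = MLPRegressor(hidden_layer_sizes=(32, 32),
                         max_iter=2000).fit(thetas, stats)

# ...then run rejection ABC against the surrogate instead of the
# simulator: 100,000 candidate parameters cost almost nothing.
observed = 1.0
candidates = rng.uniform(0, 1, size=(100_000, 1))
predicted = surrogate.predict(candidates)
posterior = candidates[np.abs(predicted - observed) < 0.05]
print(posterior.mean(), posterior.std())
```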


Kybernetes ◽  
2017 ◽  
Vol 46 (4) ◽  
pp. 693-705 ◽  
Author(s):  
Yasser F. Hassan

Purpose: This paper aims to utilize machine learning and soft computing to propose a new method of rough sets using a deep learning architecture for many real-world applications.

Design/methodology/approach: The objective of this work is to propose a model for deep rough set theory that uses more than one decision table and approximates these tables into a classification system; that is, the paper proposes a novel framework of deep learning based on multiple decision tables.

Findings: The paper coordinates the local properties of the individual decision tables to provide an appropriate global decision from the system.

Research limitations/implications: Rough set learning assumes the existence of a single decision table, whereas real-world decision problems involve several decisions with several different decision tables. The newly proposed model can handle multiple decision tables.

Practical implications: The proposed classification model is implemented on social networks, with preferred features that are freely distributed as social entities, achieving an accuracy of around 91 per cent.

Social implications: Deep learning using rough set theory simulates the way the brain thinks and can solve the problem of different information about the same problem existing in different decision systems.

Originality/value: This paper utilizes machine learning and soft computing to propose a new method of rough sets using a deep learning architecture for many real-world applications.
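
The abstract does not spell out the coordination rule, but one hypothetical way to combine local decisions from several decision tables is a simple majority vote, sketched below; the paper's actual aggregation mechanism may differ.

```python
from collections import Counter

def global_decision(local_decisions):
    """Combine the local decisions of several decision tables by
    majority vote; `local_decisions` holds one predicted label per
    table, or None when a table does not cover the sample. This is a
    hypothetical coordination rule, not the paper's."""
    votes = Counter(d for d in local_decisions if d is not None)
    return votes.most_common(1)[0][0] if votes else None

# e.g. three decision tables classify the same social-network entity
print(global_decision(["spam", "spam", "ham"]))   # -> "spam"
```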


Author(s):  
Marisa Mohr ◽  
Florian Wilhelm ◽  
Ralf Möller

The estimation of the qualitative behaviour of fractional Brownian motion is an important topic for modelling real-world applications. Permutation entropy is a well-known approach to quantifying the complexity of a univariate time series in a scalar-valued representation. As an extension often used for outlier detection, weighted permutation entropy takes the amplitudes within a time series into account. Since many real-world problems deal with multivariate time series, however, these measures need to be extended. First, we introduce multivariate weighted permutation entropy, which is consistent with standard multivariate extensions of permutation entropy. Second, we investigate the behaviour of weighted permutation entropy on both univariate and multivariate fractional Brownian motion and present revealing results.
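
A minimal sketch of the univariate weighted permutation entropy on which the multivariate extension builds: each ordinal pattern is weighted by its window's variance, so high-amplitude structures contribute more than in plain permutation entropy. The embedding dimension `m` and delay `tau` defaults are illustrative, and the authors' multivariate variant (e.g. pooling patterns across channels) is not reproduced here.

```python
import numpy as np
from collections import defaultdict

def weighted_permutation_entropy(x, m=3, tau=1):
    """Weighted permutation entropy of a univariate series: accumulate
    a variance weight for each ordinal pattern, normalize to a
    distribution, and return its Shannon entropy."""
    x = np.asarray(x, dtype=float)
    weights = defaultdict(float)
    for i in range(len(x) - (m - 1) * tau):
        window = x[i:i + m * tau:tau]          # delay-embedded window
        pattern = tuple(np.argsort(window))    # ordinal pattern
        weights[pattern] += np.var(window)     # amplitude weight
    p = np.array(list(weights.values()))
    p = p[p > 0] / p.sum()
    return float(-np.sum(p * np.log(p)))

# White noise should score near the maximum, log(m!) = log(6) here.
print(weighted_permutation_entropy(np.random.default_rng(1).normal(size=1000)))
```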


2010 ◽  
Vol 25 (4) ◽  
pp. 365-395 ◽  
Author(s):  
N. Mac Parthaláin ◽  
Q. Shen

Rough set theory (RST) has enjoyed an enormous amount of attention in recent years and has been applied to many real-world problems, including data mining, pattern recognition, and intelligent control. Much research has recently been carried out in respect of both the development of the underlying theory and the application to new problem domains. This paper attempts to summarize the advances in RST, its extensions, and their applications. It also identifies important areas which require further investigation. Typical example application domains are examined which demonstrate the success of the application of RST to a wide variety of areas and disciplines, and which also exhibit the strengths and limitations of the respective underlying approaches.


Author(s):  
Guoyin Wang ◽  
Jun Hu ◽  
Qinghua Zhang ◽  
Xianquan Liu ◽  
Jiaqing Zhou

Granular computing (GrC) is a label for theories, methodologies, techniques, and tools that make use of granules in the process of problem solving. The philosophy of granular computing has appeared in many fields, and it is likely to play an increasingly important role in data mining. Rough set theory and fuzzy set theory, as two very important paradigms of granular computing, are often used to process vague information in data mining. In this chapter, based on the view that data is also a form of knowledge representation, a new understanding of data mining, domain-oriented data-driven data mining (3DM), is first introduced. Its key idea is that data mining is a process of knowledge transformation. Then, the relationship between 3DM and GrC is discussed, especially from the viewpoints of rough sets and fuzzy sets. Finally, some examples are used to illustrate how to solve real problems in data mining using granular computing. First, combining rough set theory and fuzzy set theory, a flexible way of processing incomplete information systems is introduced. Then, the uncertainty measure of covering-based rough sets is studied by converting a covering into a partition using an equivalence domain relation. Thirdly, a highly efficient attribute reduction algorithm is developed by translating set operations on granules into logical operations on bit strings using bitmap technology. Finally, two rule generation algorithms are introduced, and experimental results show that the rule sets generated by these two algorithms are simpler than those of other similar algorithms.
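
As a toy illustration of the bitmap idea mentioned above (not the chapter's actual algorithm), granules can be stored as bit strings so that intersecting two granules becomes a single bitwise AND:

```python
def granule_bitmaps(values):
    """Represent the equivalence classes induced by one attribute as
    bitmaps: bit i of a class's bitmap is set iff object i takes that
    attribute value."""
    classes = {}
    for i, v in enumerate(values):
        classes[v] = classes.get(v, 0) | (1 << i)
    return list(classes.values())

def refine(partition, attr_bitmaps):
    """Intersect an existing partition with one attribute's classes --
    the core granule set operation, done with '&' on bit strings."""
    return [b & c for b in partition for c in attr_bitmaps if b & c]

# Toy table: 6 objects described by two condition attributes.
a1 = granule_bitmaps([0, 0, 1, 1, 2, 2])
a2 = granule_bitmaps([0, 1, 0, 1, 0, 1])
print([bin(b) for b in refine(a1, a2)])   # granules of {a1, a2}
```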


Author(s):  
Yasuo Kudo ◽  
Tetsuya Murai

In this paper, we propose a parallel computation framework for a heuristic attribute reduction method. Attribute reduction is a key technique for using rough set theory as a tool in data mining. The authors have previously proposed a heuristic attribute reduction method that computes as many relative reducts as possible from a given dataset with numerous attributes. We parallelize our method using OpenMP (open multiprocessing) and evaluate the performance of the parallelized attribute reduction method through experiments.
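
The authors' OpenMP parallelization implies a C/C++ implementation; purely as an illustration of the same idea of distributing candidate attribute subsets across workers, here is a hypothetical Python analogue using `multiprocessing`. The consistency test stands in for the actual relative-reduct check, and minimality of the returned subsets is not enforced.

```python
from itertools import combinations
from multiprocessing import Pool

# Toy decision table: (condition attribute values, decision).
ROWS = [((1, 0, 2), "yes"), ((1, 1, 2), "no"),
        ((0, 0, 1), "yes"), ((0, 1, 1), "no")]

def consistent(subset):
    """Return `subset` if equal projections onto it imply equal
    decisions -- a simple stand-in for a relative-reduct test."""
    seen = {}
    for cond, dec in ROWS:
        proj = tuple(cond[a] for a in subset)
        if seen.setdefault(proj, dec) != dec:
            return None
    return subset

if __name__ == "__main__":
    candidates = [s for k in (1, 2) for s in combinations(range(3), k)]
    with Pool() as pool:               # test candidate subsets in parallel
        kept = [s for s in pool.map(consistent, candidates) if s]
    print(kept)                        # consistent subsets, e.g. (1,)
```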

