Using Diversities to Model the Reliability of N-version Machine Learning System

Author(s):  
Fumio Machida

An N-version machine learning system (MLS) is an architectural approach to reducing error outputs from a system through a redundant configuration of multiple machine learning (ML) modules. The improved system reliability achieved by an N-version MLS inherently depends on how diverse the employed ML models are and how diverse the given input data sets are. However, neither the error input spaces of individual ML models nor the input data distributions are obtainable in practice, which is a fundamental barrier to understanding the reliability gain of N-version architectures. In this paper, we introduce two diversity measures quantifying the similarity of ML models' capabilities and the interdependence of input data sets, respectively. The defined measures are used to formulate the reliability of an elemental N-version MLS called a dependent double-modules double-inputs MLS. The system is assumed to fail when the two ML modules output errors simultaneously for the same classification task. The reliabilities of different architecture options for this MLS are comprehensively analyzed through a compact matrix representation of the proposed reliability model. Except for limiting cases, we observe that the architecture exploiting both diversities tends to achieve preferable reliability under reasonable assumptions. Intuitive relations between diversity parameters and architecture reliabilities are also demonstrated through numerical experiments with hypothetical settings.
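
To make the dependence on diversity concrete, here is a minimal, purely illustrative Python sketch of how the joint error probability of a two-module, two-input MLS could be interpolated between the fully dependent and fully independent cases using two diversity parameters. The parameterization (p1, p2, model_div, input_div) and the interpolation rule are assumptions for illustration only, not the paper's matrix-based reliability model.

```python
# Illustrative toy model only: NOT the paper's reliability formulation.
# p1, p2 are the error probabilities of the two ML modules; model_div and
# input_div (both in [0, 1]) are hypothetical diversity parameters, where
# 0 means fully dependent and 1 means fully independent.

def joint_error_probability(p1: float, p2: float,
                            model_div: float, input_div: float) -> float:
    """Probability that both modules err on the same task (toy model)."""
    worst = min(p1, p2)        # perfectly correlated errors
    best = p1 * p2             # statistically independent errors
    # Assumed interpolation: residual correlation shrinks with either diversity.
    residual_correlation = (1.0 - model_div) * (1.0 - input_div)
    return best + residual_correlation * (worst - best)

def system_reliability(p1: float, p2: float,
                       model_div: float, input_div: float) -> float:
    """The double-module system fails only when both modules err together."""
    return 1.0 - joint_error_probability(p1, p2, model_div, input_div)

if __name__ == "__main__":
    # Two modules with 5% error rates: no diversity vs. both diversities.
    print(system_reliability(0.05, 0.05, 0.0, 0.0))   # 0.95   (correlated errors)
    print(system_reliability(0.05, 0.05, 1.0, 1.0))   # 0.9975 (independent errors)
```

In this toy model, zero diversity collapses the system reliability to roughly 1 - min(p1, p2), while full diversity on both axes recovers the independent-failure bound 1 - p1*p2.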


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Samar Ali Shilbayeh ◽  
Sunil Vadera

Purpose This paper aims to describe the use of a meta-learning framework for recommending cost-sensitive classification methods, with the aim of answering an important question that arises in machine learning, namely, “Among all the available classification algorithms, and in considering a specific type of data and cost, which is the best algorithm for my problem?” Design/methodology/approach The framework is based on the idea of applying machine learning techniques to discover knowledge about the performance of different machine learning algorithms. It includes components that repeatedly apply different classification methods to data sets and measure their performance. The characteristics of the data sets, combined with the algorithms and their performance, provide the training examples. A decision tree algorithm is applied to these training examples to induce the knowledge, which can then be used to recommend algorithms for new data sets. The paper contributes to both meta-learning and cost-sensitive machine learning; neither field is new, but the contribution lies in building a recommender that suggests the optimal cost-sensitive approach for a given data problem. Findings The proposed solution is implemented in WEKA and evaluated by applying it to different data sets and comparing the results with existing studies available in the literature. The results show that the developed meta-learning solution produces better results than METAL, a well-known meta-learning system. The developed solution also takes the misclassification cost into consideration during the learning process, which the compared system does not. Originality/value Meta-learning work has been done before, but this paper presents a new meta-learning framework that is cost-sensitive.
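
As a rough illustration of the workflow described above (and not the authors' WEKA implementation), the following Python sketch builds meta-examples from synthetic data sets, labels each with the classifier that minimizes a hypothetical misclassification cost, and induces a decision tree that recommends an algorithm for a new data set. The meta-features, candidate algorithms, and cost values are assumptions chosen only to show the structure.

```python
# Sketch of a cost-sensitive meta-learning recommender (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_predict

CANDIDATES = {"logreg": LogisticRegression(max_iter=1000),
              "nb": GaussianNB(),
              "tree": DecisionTreeClassifier(max_depth=5)}
FN_COST, FP_COST = 5.0, 1.0   # hypothetical misclassification costs

def meta_features(X, y):
    # Simple data-set characteristics used as meta-level inputs.
    return [X.shape[0], X.shape[1], float(np.mean(y))]

def total_cost(y_true, y_pred):
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return FN_COST * fn + FP_COST * fp

meta_X, meta_y = [], []
for seed in range(20):                       # 20 synthetic "data sets"
    X, y = make_classification(n_samples=300, n_features=10,
                               weights=[0.8, 0.2], random_state=seed)
    costs = {name: total_cost(y, cross_val_predict(clf, X, y, cv=3))
             for name, clf in CANDIDATES.items()}
    meta_X.append(meta_features(X, y))
    meta_y.append(min(costs, key=costs.get))  # label = cheapest algorithm

# Meta-learner: a decision tree mapping data-set characteristics
# to the recommended cost-sensitive algorithm.
recommender = DecisionTreeClassifier(max_depth=3).fit(meta_X, meta_y)

# Recommend an algorithm for a new (synthetic) data set.
X_new, y_new = make_classification(n_samples=500, n_features=10,
                                   weights=[0.9, 0.1], random_state=99)
print(recommender.predict([meta_features(X_new, y_new)]))
```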


2021 ◽  
Author(s):  
Jessica Röhner ◽  
Philipp Thoss ◽  
Astrid Schütz

Research has shown that even experts cannot detect faking above chance, but recent studies have suggested that machine learning may help in this endeavor. However, faking differs between faking conditions, previous efforts have not taken these differences into account, and faking indices have yet to be integrated into such approaches. We reanalyzed seven data sets (N = 1,039) with various faking conditions (high and low scores, different constructs, naïve and informed faking, faking with and without practice, different measures [self-reports vs. implicit association tests; IATs]). We investigated the extent to which and how machine learning classifiers could detect faking under these conditions and compared different input data (response patterns, scores, faking indices) and different classifiers (logistic regression, random forest, XGBoost). We also explored the features that classifiers used for detection. Our results show that machine learning has the potential to detect faking, but detection success varies between conditions from chance levels to 100%. There were differences in detection (e.g., detecting low-score faking was better than detecting high-score faking). For self-reports, response patterns and scores were comparable with regard to faking detection, whereas for IATs, faking indices and response patterns were superior to scores. Logistic regression and random forest worked about equally well and outperformed XGBoost. In most cases, classifiers used more than one feature (faking occurred over different pathways), and the features varied in their relevance. Our research supports the assumption of different faking processes and explains why detecting faking is a complex endeavor.
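
The following Python sketch mirrors the shape of such a classifier comparison on simulated stand-in data; it does not use the reanalyzed data sets, and sklearn's GradientBoostingClassifier stands in for XGBoost to avoid an extra dependency. Several classifiers are scored by cross-validation on simulated honest versus faked response patterns, and feature importances indicate which items drive detection.

```python
# Schematic faking-detection comparison on simulated data (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, items = 400, 20
honest = rng.normal(0.0, 1.0, size=(n // 2, items))
faked = rng.normal(0.6, 1.2, size=(n // 2, items))     # shifted, noisier responses
X = np.vstack([honest, faked])
y = np.array([0] * (n // 2) + [1] * (n // 2))           # 1 = faking condition

classifiers = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
for name, clf in classifiers.items():
    acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {acc:.2f}")

# Feature relevance (which items drive detection), mirroring the
# feature-importance analysis described above.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(np.argsort(rf.feature_importances_)[::-1][:5])    # top 5 items
```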


2001 ◽  
Vol 40 (05) ◽  
pp. 380-385 ◽  
Author(s):  
S. Mani ◽  
W. R. Shankle ◽  
M. J. Pazzani

Summary Objectives: The aim was to evaluate the potential for monotonicity constraints to bias machine learning systems toward learning rules that are both accurate and meaningful. Methods: Two data sets, taken from problems as diverse as screening for dementia and assessing the risk of mental retardation, were collected, and a rule learning system was run on each, with and without monotonicity constraints. The rules were shown to experts, who were asked how willing they would be to use such rules in practice. The accuracy of the rules was also evaluated. Results: Rules learned with monotonicity constraints were at least as accurate as rules learned without such constraints. Experts were, on average, more willing to use the rules learned with the monotonicity constraints. Conclusions: The analysis of medical databases has the potential to improve patient outcomes and/or lower the cost of health care delivery. Various techniques from statistics, pattern recognition, machine learning, and neural networks have been proposed to “mine” these data by uncovering patterns that may be used to guide decision making. This study suggests that cognitive factors make learned models coherent and, therefore, credible to experts. One such factor, which influences the acceptance of learned models, is consistency with existing medical knowledge.
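
For readers unfamiliar with monotonicity constraints, the Python sketch below illustrates the core idea on a toy example: a candidate rule is kept only if its predicted risk never decreases as a clinically monotone feature (such as a severity score) increases, all else held fixed. This is a schematic check, not the rule learning system evaluated in the study.

```python
# Toy monotonicity check for a candidate rule or model (illustrative only).
import numpy as np

def is_monotone_in(predict, X, feature, grid):
    """Check that predict(X) is non-decreasing when `feature` is swept
    over `grid` while the other columns stay fixed."""
    prev = None
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v
        score = predict(Xv).mean()
        if prev is not None and score < prev - 1e-9:
            return False
        prev = score
    return True

# Toy usage with a hand-written "rule": risk increases with column 0.
rule = lambda X: (X[:, 0] > 2.0).astype(float)
X = np.random.default_rng(1).normal(size=(50, 3))
print(is_monotone_in(rule, X, feature=0, grid=np.linspace(-3, 3, 13)))  # True
```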


Geophysics ◽  
2017 ◽  
Vol 82 (3) ◽  
pp. V163-V177 ◽  
Author(s):  
Yongna Jia ◽  
Jianwei Ma

Machine learning (ML) systems can automatically mine data sets for hidden features or relationships. Recently, ML methods have become increasingly used within many scientific fields. We have evaluated common applications of ML and developed a novel method, based on the classic ML method of support vector regression (SVR), for reconstructing seismic data from under-sampled or missing traces. First, the SVR method mines a continuous regression hyperplane from training data that captures the hidden relationship between input data with missing traces and the corresponding complete output data; it then interpolates missing seismic traces for other input data using the learned hyperplane. The key idea of our new ML method is significantly different from that of many previous interpolation methods. Our method depends on the characteristics of the training data rather than on assumptions of linear events, sparsity, or low rank. Therefore, it can break free of these assumptions or constraints and generalizes across different data sets. In addition, our method dramatically reduces the manual workload; for example, it allows users to avoid selecting window size parameters, as required by methods based on the assumption of linear events. The ML method facilitates intelligent interpolation between data sets with similar geomorphological structures, which can significantly reduce costs in engineering applications. Furthermore, we combine a sparse transform called the data-driven tight frame (so-called compressed learning) with the SVR method to improve the training performance, in which the training is implemented in a sparse coefficient domain rather than in the data domain. Numerical experiments show the competitive performance of our method in comparison with the traditional [Formula: see text]-[Formula: see text] interpolation method.
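
The following Python sketch conveys the basic interpolation idea on a synthetic record: an SVR is trained to map a missing trace's two neighbours to the trace itself. It omits the data-driven tight frame (compressed learning) step and the authors' actual training setup; the kernel, regularization values, and the synthetic section are assumptions for illustration.

```python
# Schematic SVR-based trace interpolation on a toy synthetic record.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
nt, nx = 200, 40                                   # time samples, traces
t = np.linspace(0, 1, nt)[:, None]
x = np.arange(nx)[None, :]
record = np.sin(2 * np.pi * (8 * t + 0.05 * x))    # toy dipping event
record += 0.02 * rng.normal(size=record.shape)

# Training pairs from traces away from the "missing" one:
# features = (left neighbour, right neighbour), target = centre trace.
missing = 20                                       # trace to reconstruct
train_idx = [i for i in range(1, nx - 1) if abs(i - missing) > 1]
X_train = np.concatenate([np.stack([record[:, i-1], record[:, i+1]], axis=1)
                          for i in train_idx])
y_train = np.concatenate([record[:, i] for i in train_idx])

svr = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X_train, y_train)

# Interpolate the missing trace from its two neighbours.
X_missing = np.stack([record[:, missing-1], record[:, missing+1]], axis=1)
reconstructed = svr.predict(X_missing)
print(np.abs(reconstructed - record[:, missing]).mean())   # reconstruction error
```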


2019 ◽  
Author(s):  
David Lary

The human body exhibits a variety of autonomic responses. For example, changing light intensity provokes a change in pupil dilation. In the past, formulae for pupil size based on luminance have been derived using traditional empirical approaches. In this paper, we present a different approach to a similar task by using machine learning to examine the multivariate non-linear autonomic response of pupil dilation as a function of a comprehensive suite of more than four hundred environmental parameters, providing quantitative empirical models. The objectively optimized empirical machine learning models use a multivariate non-linear non-parametric supervised regression algorithm employing an ensemble of regression trees that receive input data from both spectral and biometric sources. The models for predicting the participant's pupil diameters from the input data had a fidelity of at least 96.9% for both the training and independent validation data sets. The most important inputs were the light levels (irradiance) at wavelengths near 562 nm. This coincides with the peak sensitivity of the long-wave photosensitive cones in the retina, which exhibit a maximum absorbance around λmax = 562.8 ± 4.7 nm.
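
A minimal Python sketch of this modelling approach is given below, using a random forest (an ensemble of regression trees) on synthetic stand-in spectral and biometric inputs; the simulated response is constructed so that a band near 560 nm dominates, simply to show how feature importances can recover the most relevant wavelength. None of the numbers correspond to the study's measurements.

```python
# Illustrative regression-tree-ensemble model of pupil diameter on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
wavelengths = np.arange(380, 781, 10)            # hypothetical spectral bands (nm)
irradiance = rng.lognormal(mean=0.0, sigma=1.0, size=(n, wavelengths.size))
biometrics = rng.normal(size=(n, 5))             # stand-ins for biometric inputs
X = np.hstack([irradiance, biometrics])

# Simulated response: pupil diameter shrinks with light near 560 nm.
key = np.argmin(np.abs(wavelengths - 560))
pupil = 6.0 - 1.5 * np.log1p(irradiance[:, key]) + 0.1 * rng.normal(size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, pupil, random_state=0)
model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("R^2 on held-out data:", round(model.score(X_te, y_te), 3))
print("most important band (nm):",
      wavelengths[int(np.argmax(model.feature_importances_[:wavelengths.size]))])
```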


2017 ◽  
Author(s):  
Askin Guler Yigitoglu ◽  
Thomas Harrison ◽  
Michael Scott Greenwood

2016 ◽  
Vol 3 (1) ◽  
Author(s):  
LAL SINGH ◽  
PARMEET SINGH ◽  
RAIHANA HABIB KANTH ◽  
PURUSHOTAM SINGH ◽  
SABIA AKHTER ◽  
...  

WOFOST version 7.1.3 is a computer model that simulates the growth and production of annual field crops. All the run options are operational through a graphical user interface named WOFOST Control Center version 1.8 (WCC). WCC facilitates selecting the production level, the input data sets on crop, soil, weather, crop calendar, hydrological field conditions and soil fertility parameters, and the output options. The files with crop, soil and weather data are explained, as well as the run files and the output files. A general overview is given of the development and the applications of the model. Its underlying concepts are discussed briefly.
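
As a purely hypothetical illustration of the kinds of inputs such a run requires (production level, crop, soil and weather data, crop calendar, and output options), one might group them as follows; this structure is an assumption for exposition, not the WCC interface or WOFOST's actual file formats.

```python
# Hypothetical grouping of WOFOST run inputs (not the real file formats or API).
from dataclasses import dataclass, field

@dataclass
class WofostRunConfig:
    production_level: str = "potential"        # or e.g. "water-limited"
    crop_file: str = "wheat.crop"              # hypothetical file names
    soil_file: str = "clay_loam.soil"
    weather_file: str = "station_1990.weather"
    crop_calendar: dict = field(default_factory=lambda: {
        "sowing_date": "1990-04-01", "harvest_date": "1990-09-15"})
    output_variables: list = field(default_factory=lambda: ["LAI", "TWSO", "SM"])

run = WofostRunConfig(production_level="water-limited")
print(run)
```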


2021 ◽  
Vol 34 (2) ◽  
pp. 541-549 ◽  
Author(s):  
Leihong Wu ◽  
Ruili Huang ◽  
Igor V. Tetko ◽  
Zhonghua Xia ◽  
Joshua Xu ◽  
...  
