Individual Prediction Reliability Estimates in Classification and Regression

Author(s):  
Darko Pevec ◽  
Zoran Bosnic ◽  
Igor Kononenko

Current machine learning algorithms perform well in many problem domains, but in risk-sensitive decision making – for example, in medicine and finance – experts do not rely on common evaluation methods that provide overall assessments of models, because such techniques give no information about single predictions. This chapter summarizes the research areas that have motivated the development of various approaches to individual prediction reliability. Based on these motivations, the authors describe six approaches to reliability estimation: inverse transduction, local sensitivity analysis, bagging variance, local cross-validation, local error modelling, and density-based estimation. Empirical evaluation on benchmark datasets provides promising results, especially for use with decision and regression trees. The testing results also reveal that the reliability estimators exhibit different performance levels when used with different models and in different domains. The authors show the usefulness of individual prediction reliability estimates in attempts to predict breast cancer recurrence. In this context, estimating the reliability of individual predictions is of crucial importance for physicians seeking to validate predictions derived from classification and regression models.
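
As a concrete illustration of one of the six approaches, the following is a minimal sketch of the bagging-variance idea, assuming an illustrative dataset and base model rather than the authors' experimental setup: the disagreement among bootstrap replicas serves as a per-instance unreliability score.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
preds = []
for _ in range(50):                                   # 50 bootstrap replicas
    idx = rng.integers(0, len(X_tr), size=len(X_tr))  # sample with replacement
    tree = DecisionTreeRegressor(max_depth=5).fit(X_tr[idx], y_tr[idx])
    preds.append(tree.predict(X_te))
preds = np.asarray(preds)

point_prediction = preds.mean(axis=0)
unreliability = preds.var(axis=0)  # larger spread -> less reliable prediction
```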

Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 656
Author(s):  
Xavier Larriva-Novo ◽  
Víctor A. Villagrá ◽  
Mario Vega-Barbas ◽  
Diego Rivera ◽  
Mario Sanz Rodrigo

Security in IoT networks is now essential because of the large volumes of data these systems handle. IoT systems are vulnerable to a range of cybersecurity attacks that are increasing in both number and sophistication, so new intrusion detection techniques, as accurate as possible, must be developed for these scenarios. Intrusion detection systems based on machine learning algorithms have already shown high accuracy. This research proposes the study and evaluation of several preprocessing techniques, based on traffic categorization, for a machine learning neural network algorithm. The evaluation uses two benchmark datasets, UGR16 and UNSW-NB15, together with one of the most widely used datasets, KDD99. The preprocessing techniques were evaluated with different scaling and normalization functions. All of these preprocessing models were applied to different sets of characteristics drawn from a categorization of four feature groups: basic connection features, content characteristics, statistical characteristics, and a group combining traffic-based features with connection-direction-based traffic characteristics. The objective of this research is to evaluate this categorization, using various data preprocessing techniques, to obtain the most accurate model. The proposal shows that applying the categorization of network traffic together with several preprocessing techniques can enhance accuracy by up to 45%. Preprocessing a specific group of characteristics yields greater accuracy, allowing the machine learning algorithm to correctly classify the parameters related to possible attacks.
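
To make the setup concrete, here is a minimal sketch of per-group preprocessing ahead of a neural-network classifier. The column indices and the scaler assigned to each group are placeholder assumptions, not the paper's exact configuration.

```python
# Hedged sketch: one scaler per feature group, feeding an MLP classifier.
from sklearn.compose import ColumnTransformer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

groups = {
    "basic":       [0, 1, 2],  # basic connection features (placeholder indices)
    "content":     [3, 4],     # content characteristics
    "statistical": [5, 6, 7],  # statistical characteristics
    "traffic":     [8, 9],     # traffic- and direction-based characteristics
}

pre = ColumnTransformer([
    ("basic",       StandardScaler(), groups["basic"]),
    ("content",     MinMaxScaler(),   groups["content"]),
    ("statistical", StandardScaler(), groups["statistical"]),
    ("traffic",     MinMaxScaler(),   groups["traffic"]),
])

model = Pipeline([
    ("pre", pre),
    ("mlp", MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300)),
])
# model.fit(X_train, y_train)  # X_train: traffic records, y_train: attack labels
```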


2021 ◽  
pp. 001316442110089
Author(s):  
Yuanshu Fu ◽  
Zhonglin Wen ◽  
Yang Wang

Composite reliability, or coefficient omega, can be estimated using structural equation modeling. Composite reliability is usually estimated under the basic independent clusters model of confirmatory factor analysis (ICM-CFA). However, due to the existence of cross-loadings, the model fit of the exploratory structural equation model (ESEM) is often substantially better than that of ICM-CFA. The present study first illustrated the method used to estimate composite reliability under ESEM and then compared ESEM and ICM-CFA in terms of composite reliability estimation under varying numbers of indicators per factor, target factor loadings, cross-loadings, and sample sizes. The results showed no apparent difference between ESEM and ICM-CFA in estimating composite reliability, and the rotation type did not affect the composite reliability estimates generated by ESEM. An empirical example was given as further support for the simulation results. Based on the present study, we suggest that if the model fit of ESEM (regardless of the rotation criterion used) is acceptable but that of ICM-CFA is not, the composite reliability estimates based on the two models should still be similar. If the target factor loadings are relatively small, researchers should increase the number of indicators per factor or increase the sample size.
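
For reference, a common formulation of composite reliability for a factor with standardized loadings λ_i and error variances θ_ii is shown below; the ESEM variant applies the same idea to the rotated loading matrix, following the study's derivation.

```latex
\omega \;=\; \frac{\left(\sum_{i=1}^{k}\lambda_{i}\right)^{2}}
                  {\left(\sum_{i=1}^{k}\lambda_{i}\right)^{2} \;+\; \sum_{i=1}^{k}\theta_{ii}}
```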


2021 ◽  
Vol 68 (4) ◽  
pp. 1-25
Author(s):  
Thodoris Lykouris ◽  
Sergei Vassilvitskii

Traditional online algorithms encapsulate decision making under uncertainty, and give ways to hedge against all possible future events, while guaranteeing a nearly optimal solution, as compared to an offline optimum. On the other hand, machine learning algorithms are in the business of extrapolating patterns found in the data to predict the future, and usually come with strong guarantees on the expected generalization error. In this work, we develop a framework for augmenting online algorithms with a machine-learned predictor to achieve competitive ratios that provably improve upon unconditional worst-case lower bounds when the predictor has low error. Our approach treats the predictor as a complete black box and is not dependent on its inner workings or the exact distribution of its errors. We apply this framework to the traditional caching problem: creating an eviction strategy for a cache of size k. We demonstrate that naively following the oracle's recommendations may lead to very poor performance, even when the average error is quite low. Instead, we show how to modify the Marker algorithm to take the predictions into account and prove that this combined approach achieves a competitive ratio that both (i) decreases as the predictor's error decreases and (ii) is always capped by O(log k), which can be achieved without any assistance from the predictor. We complement our results with an empirical evaluation of our algorithm on real-world datasets and show that it performs well empirically even when using simple off-the-shelf predictions.
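
A heavily simplified sketch of the prediction-augmented marking idea follows; `predict_next_use` stands in for the learned oracle, and the paper's full algorithm adds safeguards that this sketch omits but that are needed for the O(log k) cap.

```python
# Simplified sketch: classic marking, but evictions consult the predictor.
def serve(requests, k, predict_next_use):
    cache, marked = set(), set()
    misses = 0
    for t, page in enumerate(requests):
        if page not in cache:
            misses += 1
            if len(cache) >= k:
                unmarked = cache - marked
                if not unmarked:         # every cached page is marked: new phase
                    marked.clear()
                    unmarked = set(cache)
                # evict the unmarked page predicted to be requested latest
                victim = max(unmarked, key=lambda p: predict_next_use(p, t))
                cache.discard(victim)
                marked.discard(victim)
            cache.add(page)
        marked.add(page)                 # mark every accessed page
    return misses

# With a perfect oracle this mimics Belady-style eviction among unmarked pages:
reqs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3]
oracle = lambda p, t: next((i for i in range(t + 1, len(reqs)) if reqs[i] == p),
                           float("inf"))
print(serve(reqs, k=3, predict_next_use=oracle))
```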


2021 ◽  
Vol 13 (12) ◽  
pp. 2300
Author(s):  
Samy Elmahdy ◽  
Tarig Ali ◽  
Mohamed Mohamed

Mapping of groundwater potential in remote arid and semi-arid regions underneath sand sheets over a very regional scale is a challenge and requires an accurate classifier. The Classification and Regression Trees (CART) model is a robust machine learning classifier used in groundwater potential mapping over a very regional scale. Ten essential groundwater conditioning factors (GWCFs) were constructed using remote sensing data. The spatial relationship between these conditioning factors and the observed groundwater well locations was optimized and identified using the chi-square method. A total of 185 groundwater well locations were randomly divided into 129 (70%) for training the model and 56 (30%) for validation. The model was applied to groundwater potential mapping using optimal parameter values: 186 additive trees, a learning rate of 0.1, and a maximum tree size of five. The validation demonstrated that the area under the curve (AUC) of the CART model was 0.920, representing a predictive accuracy of 92%. The resulting map shows that the Mondafan, Khujaymah, and Wajid Mutaridah depressions and the southern gulf salt basin (SGSB) near the borders of Saudi Arabia, Oman, and the United Arab Emirates (UAE) hold fresh fossil groundwater, as indicated by the observed lakes and recovered paleolakes. The proposed model and the new maps are effective at enhancing groundwater potential mapping over a very regional scale using machine learning algorithms, which are rarely applied at this scale in the literature, and the approach can be extended to the Sahara and the Kalahari Desert.
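
An illustrative sketch of the reported configuration, using scikit-learn's gradient-boosted trees as a stand-in for the boosted CART implementation; the feature matrix and labels below are random placeholders for the raster-derived conditioning factors, and "maximum tree size of five" is read here as a depth cap.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(185, 10))     # placeholder: 10 GWCFs per well location
y = rng.integers(0, 2, size=185)   # placeholder: well productivity labels

# 70/30 split -> 129 training and 56 validation locations, as in the paper
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7, random_state=0)

model = GradientBoostingClassifier(
    n_estimators=186,    # 186 additive trees
    learning_rate=0.1,   # reported learning rate
    max_depth=5,         # "maximum tree size of five", read as a depth cap
).fit(X_tr, y_tr)

auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])  # paper: AUC = 0.920
```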


2021 ◽  
pp. 109442812199908
Author(s):  
Yin Lin

Forced-choice (FC) assessments of noncognitive psychological constructs (e.g., personality, behavioral tendencies) are popular in high-stakes organizational testing scenarios (e.g., informing hiring decisions) due to their enhanced resistance against response distortions (e.g., faking good, impression management). The measurement precision of FC assessment scores used to inform personnel decisions is of paramount importance in practice. Different types of reliability estimates are reported for FC assessment scores in current publications, while consensus on best practices appears to be lacking. To provide understanding and structure around the reporting of FC reliability, this study systematically examined different types of reliability estimation methods for Thurstonian IRT-based FC assessment scores: their theoretical differences were discussed, and their numerical differences were illustrated through a series of simulations and empirical studies. In doing so, this study provides a practical guide for appraising different reliability estimation methods for IRT-based FC assessment scores.
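
As one concrete point of reference, the empirical reliability estimate that such comparisons typically include relates the variance of the estimated trait scores to their average squared standard error; the study's Thurstonian IRT setting covers further variants beyond this one.

```latex
\hat{\rho} \;=\; \frac{\operatorname{Var}\!\left(\hat{\theta}\right)}
                      {\operatorname{Var}\!\left(\hat{\theta}\right) \;+\; \overline{\mathrm{SE}^{2}\!\left(\hat{\theta}\right)}}
```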


Author(s):  
Kristian Miok ◽  
Blaž Škrlj ◽  
Daniela Zaharie ◽  
Marko Robnik-Šikonja

Hate speech is an important problem in the management of user-generated content. To remove offensive content or ban misbehaving users, content moderators need reliable hate speech detectors. Recently, deep neural networks based on the transformer architecture, such as the (multilingual) BERT model, have achieved superior performance in many natural language classification tasks, including hate speech detection. So far, these methods have not been able to quantify their output in terms of reliability. We propose a Bayesian method using Monte Carlo dropout within the attention layers of the transformer models to provide well-calibrated reliability estimates. We evaluate and visualize the results of the proposed approach on hate speech detection problems in several languages. Additionally, we test whether affective dimensions can enhance the information extracted by the BERT model in hate speech classification. Our experiments show that Monte Carlo dropout provides a viable mechanism for reliability estimation in transformer networks. Used within the BERT model, it offers state-of-the-art classification performance and can detect less trusted predictions.
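
A minimal sketch of Monte Carlo dropout at inference time, assuming a Hugging Face BERT classifier; the checkpoint name and the number of stochastic passes are illustrative, and whereas the paper applies dropout within the attention layers specifically, this sketch simply leaves all dropout active.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "bert-base-multilingual-cased"  # illustrative checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

model.train()  # keep dropout active at prediction time (Monte Carlo dropout)
inputs = tok("an example post to score", return_tensors="pt")

with torch.no_grad():
    probs = torch.stack([
        torch.softmax(model(**inputs).logits, dim=-1)
        for _ in range(30)             # T = 30 stochastic forward passes
    ])

mean_prob = probs.mean(dim=0)   # averaged class probabilities
uncertainty = probs.std(dim=0)  # spread across passes -> reliability signal
```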


Author(s):  
Gediminas Adomavicius ◽  
Yaqiong Wang

Numerical predictive modeling is widely used in different application domains. Although many modeling techniques have been proposed, and a number of different aggregate accuracy metrics exist for evaluating the overall performance of predictive models, other important aspects, such as the reliability (or confidence and uncertainty) of individual predictions, have been underexplored. We propose to use estimated absolute prediction error as the indicator of individual prediction reliability, which has the benefits of being intuitive and providing highly interpretable information to decision makers, as well as allowing for more precise evaluation of reliability estimation quality. As importantly, the proposed reliability indicator allows the reframing of reliability estimation itself as a canonical numeric prediction problem, which makes the proposed approach general-purpose (i.e., it can work in conjunction with any outcome prediction model), alleviates the need for distributional assumptions, and enables the use of advanced, state-of-the-art machine learning techniques to learn individual prediction reliability patterns directly from data. Extensive experimental results on multiple real-world data sets show that the proposed machine learning-based approach can significantly improve individual prediction reliability estimation as compared with a number of baselines from prior work, especially in more complex predictive scenarios.
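
A minimal sketch of that reframing under illustrative data and model choices: fit an outcome model, then fit a second, independent model to predict the first model's absolute error.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=10, noise=10.0, random_state=0)
X_fit, X_rel, y_fit, y_rel = train_test_split(X, y, random_state=0)

outcome = RandomForestRegressor(n_estimators=100).fit(X_fit, y_fit)

# Reliability target: the observed absolute error of the outcome model
abs_err = np.abs(y_rel - outcome.predict(X_rel))
reliability = RandomForestRegressor(n_estimators=100).fit(X_rel, abs_err)

# reliability.predict(X_new) estimates how far off outcome.predict(X_new)
# is likely to be, in the units of the outcome variable itself.
```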


2021 ◽  
Author(s):  
Shreya Mishra ◽  
Raghav Awasthi ◽  
Frank Papay ◽  
Kamal Maheshawari ◽  
Jacek B Cywinski ◽  
...  

Question answering (QA) is one of the oldest research areas in AI and computational linguistics. QA has seen significant progress with the development of state-of-the-art models and benchmark datasets over the last few years. However, pre-trained QA models perform poorly on clinical QA tasks, presumably due to the complexity of electronic healthcare data. With the digitization of healthcare data and the increasing volume of unstructured data, it is extremely important for healthcare providers to have a mechanism to query the data and find appropriate answers. Since diagnosis is central to any decision-making for clinicians and patients, we have created a pipeline to develop diagnosis-specific QA datasets and curated a QA database for the Cerebrovascular Accident (CVA). CVA, commonly known as Stroke, is an important and frequently occurring diagnosis amongst critically ill patients. Compared against clinician validation, our method achieved an accuracy of 0.90 (90% CI [0.82, 0.99]). Using this method, we hope to overcome the key challenges of building and validating a highly accurate QA dataset in a semiautomated manner, which can help improve the performance of QA models.
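
For context on the reported interval, a hedged sketch of a normal-approximation confidence interval for a validation accuracy; the counts below are hypothetical, not the paper's.

```python
from math import sqrt

correct, n = 45, 50  # hypothetical clinician-validated answer counts
z = 1.645            # z-score for a two-sided 90% interval
p = correct / n
half = z * sqrt(p * (1 - p) / n)
print(f"accuracy {p:.2f}, 90% CI [{p - half:.2f}, {p + half:.2f}]")
```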


2021 ◽  
Author(s):  
Atiq Rehman ◽  
Samir Brahim Belhaouari

Detection and removal of outliers in a dataset is a fundamental preprocessing task without which the analysis of the data can be misleading. Furthermore, the existence of anomalies in the data can heavily degrade the performance of machine learning algorithms. In order to detect the anomalies in a dataset in an unsupervised manner, some novel statistical techniques are proposed in this paper. The proposed techniques are based on statistical methods that consider data compactness and other properties. The newly proposed ideas are efficient in terms of performance, ease of implementation, and computational complexity. Furthermore, two of the proposed techniques use only a one-dimensional distance vector to detect the outliers, so irrespective of the data's dimensionality, the techniques remain computationally inexpensive and feasible. A comprehensive performance analysis of the proposed anomaly detection schemes is presented in the paper, and the newly proposed schemes are found to outperform state-of-the-art methods when tested on several benchmark datasets.
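
A hedged sketch of the single-distance-vector idea: reduce each observation to its distance from the data centroid and flag extreme values of that one-dimensional vector. The median-plus-MAD threshold is an illustrative stand-in for the paper's statistical criteria.

```python
import numpy as np

def distance_outliers(X, k=6.0):
    """Flag points whose distance to the centroid is extreme."""
    d = np.linalg.norm(X - X.mean(axis=0), axis=1)  # 1-D distance vector
    med = np.median(d)
    mad = np.median(np.abs(d - med)) or 1e-12       # robust spread estimate
    return d > med + k * mad                        # boolean outlier mask

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 5)),           # inliers
               [[8.0] * 5, [-9.0] * 5]])            # two injected anomalies
print(np.nonzero(distance_outliers(X))[0])          # typically prints [200 201]
```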


Author(s):  
Johana Chylíková

The aim of this chapter is to illustrate the application of the quasi-simplex model (QSM) for reliability estimation in longitudinal data and to employ it to obtain information about the reliability of the European Union Statistics on Income and Living Conditions (EU-SILC) data collected between 2012 and 2017. The reliability of two survey questions is analysed: one asking respondents about the financial situation of their households, and one requesting information about respondents' health. Employing the QSM on the two items resulted in 80 reliability estimates, from 17 and 11 European countries, respectively. The results revealed statistically significant differences in reliability between post-communist Central and Eastern European (CEE) countries and the rest of Europe, and similar patterns in the size of the reliability estimates were observed for both items. The highest reliability (i.e. reliability over 0.8) was observed in CEE countries such as Bulgaria, Romania, Czechia, Poland, and Hungary. The lowest reliability (i.e. reliability lower than 0.7) was observed for data from Sweden, Slovenia, Norway, Spain, Portugal, Austria, Italy, and the Netherlands. The remarkable variation in longitudinal reliability across culturally and historically different European regions is discussed from both substantive and methodological perspectives.
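
For reference, the QSM is commonly specified, for three or more panel waves, by a measurement equation, a first-order autoregression on the true scores, and a wave-specific reliability defined as the true-score share of the observed variance:

```latex
y_{t} = \tau_{t} + \varepsilon_{t}, \qquad
\tau_{t} = \beta_{t}\,\tau_{t-1} + \zeta_{t} \quad (t = 2, \dots, T), \qquad
\rho_{t} = \frac{\operatorname{Var}(\tau_{t})}{\operatorname{Var}(\tau_{t}) + \operatorname{Var}(\varepsilon_{t})}
```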

