black box models
Recently Published Documents


TOTAL DOCUMENTS: 177 (five years: 71)
H-INDEX: 17 (five years: 6)

Author(s): Leopoldo Bertossi

Abstract We propose answer-set programs that specify and compute counterfactual interventions on entities that are input to a classification model. In relation to the outcome of the model, the resulting counterfactual entities serve as a basis for the definition and computation of causality-based explanation scores for the feature values in the entity under classification, namely responsibility scores. The approach and the programs can be applied with black-box models, and also with models that can be specified as logic programs, such as rule-based classifiers. The main focus of this study is on the specification and computation of best counterfactual entities, that is, those that lead to maximum responsibility scores. From them one can read off the explanations as maximum-responsibility feature values in the original entity. We also extend the programs to bring semantic or domain knowledge into the picture. We show how the approach could be extended by means of probabilistic methods, and how the underlying probability distributions could be modified through the use of constraints. Several examples of programs written in the syntax of the DLV ASP solver, and run with it, are shown.
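A minimal, brute-force sketch of the underlying idea, assuming a standard counterfactual notion of responsibility; it is not the paper's answer-set programs, and the toy classifier, feature domains, and values are hypothetical:

```python
from itertools import combinations, product

def toy_classifier(e):                      # stands in for any black-box model
    return 1 if e["income"] < 40 and e["debt"] > 20 else 0

domains = {"income": [30, 50, 70], "debt": [10, 25], "age": [25, 40, 60]}
entity = {"income": 30, "debt": 25, "age": 40}
original = toy_classifier(entity)

def responsibility(feature):
    """1/(1+k), where k is the size of the smallest contingency set of other
    features that must be changed so that changing `feature` flips the label;
    None if no such intervention exists."""
    others = [f for f in domains if f != feature]
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            for extra_vals in product(*(domains[f] for f in extra)):
                contingent = dict(entity, **dict(zip(extra, extra_vals)))
                if toy_classifier(contingent) != original:
                    continue                # the contingency alone already flips
                for v in domains[feature]:
                    if v == entity[feature]:
                        continue
                    if toy_classifier(dict(contingent, **{feature: v})) != original:
                        return 1.0 / (1 + k)
    return None

print({f: responsibility(f) for f in domains})
# e.g. {'income': 1.0, 'debt': 1.0, 'age': None}
```

The best counterfactual entities in the paper's sense correspond to the interventions found at the smallest k, i.e., those yielding the maximum responsibility scores.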


Sports, 2021, Vol 10 (1), pp. 5. Author(s): Alessio Rossi, Luca Pappalardo, Paolo Cintia

In the last decade, the number of studies applying machine learning algorithms to sports, e.g., injury forecasting and athlete performance prediction, has rapidly increased. Given the many works and experiments already present in the state of the art on machine-learning techniques in sport science, the aim of this narrative review is to provide a guideline describing a correct approach for training, validating, and testing machine learning models to predict events in sports science. The main contribution of this narrative review is to highlight possible strengths and limitations during all the stages of model development, i.e., training, validation, testing, and interpretation, in order to limit possible errors that could induce misleading results. In particular, this paper presents an injury forecasting example that describes the features that could be used to predict injuries, the possible pre-processing approaches for time-series analysis, how to correctly split the dataset to train and test the predictive models, and the importance of explaining the decision-making approach of white- and black-box models.
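As a minimal illustration of one of the points above, the sketch below uses a chronological split so that an injury-forecasting model is never trained on data that postdates its test sessions; the synthetic features, labels, and model choice are placeholders, not the review's pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))             # e.g. workload features per training session
y = (rng.random(300) < 0.15).astype(int)  # rare "injury in the next week" label

tscv = TimeSeriesSplit(n_splits=5)        # each fold trains on the past, tests on the future
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    clf = RandomForestClassifier(class_weight="balanced", random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    print(fold, f1_score(y[test_idx], clf.predict(X[test_idx]), zero_division=0))
```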


Risks, 2021, Vol 10 (1), pp. 3. Author(s): Spencer Matthews, Brian Hartman

Two-part models are important to, and used throughout, insurance and actuarial science. Since insurance is required for registering a car, obtaining a mortgage, and participating in certain businesses, it is especially important that the models that price insurance policies are fair and non-discriminatory. Black-box models can make it very difficult to know which covariates are influencing the results, creating model risk and bias. SHAP (SHapley Additive exPlanations) values enable interpretation of various black-box models, but little progress has been made in applying them to two-part models. In this paper, we propose mSHAP (or multiplicative SHAP), a method for computing SHAP values of two-part models using the SHAP values of the individual models. This method allows the predictions of two-part models to be explained at the level of individual observations. After developing mSHAP, we perform an in-depth simulation study. Although the kernelSHAP algorithm is also capable of computing approximate SHAP values for a two-part model, a comparison with our method demonstrates that mSHAP is exponentially faster. Finally, we apply mSHAP to a two-part ratemaking model for personal auto property damage insurance coverage. An R package (mshap) is available to easily implement the method in a wide variety of applications.
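A minimal sketch of the two-part structure that mSHAP targets, assuming the shap package is available: a frequency classifier and a severity regressor whose product gives the expected cost, with per-part SHAP values computed separately. The mSHAP combination of the two sets of SHAP values is left to the paper and the mshap R package; data and features below are synthetic.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))                      # e.g. driver age, vehicle age, territory
has_claim = (rng.random(500) < 0.2).astype(int)    # part 1: does a claim occur?
severity = np.exp(rng.normal(7, 1, size=500))      # part 2: claim cost, given a claim

freq = RandomForestClassifier(random_state=0).fit(X, has_claim)
sev = RandomForestRegressor(random_state=0).fit(X[has_claim == 1], severity[has_claim == 1])

# Expected cost of the two-part model for new observations:
X_new = X[:5]
expected_cost = freq.predict_proba(X_new)[:, 1] * sev.predict(X_new)

# Per-part SHAP values, i.e. the inputs that mSHAP combines multiplicatively:
freq_shap = shap.TreeExplainer(freq).shap_values(X_new)
sev_shap = shap.TreeExplainer(sev).shap_values(X_new)
```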


2021, Vol 2021 (12), pp. 124007. Author(s): Christoph Feinauer, Carlo Lucibello

Abstract Pairwise models like the Ising model or the generalized Potts model have found many successful applications in fields like physics, biology, and economics. Closely connected is the problem of inverse statistical mechanics, where the goal is to infer the parameters of such models given observed data. An open problem in this field is how to train these models when the data contain additional higher-order interactions that are not present in the pairwise model. In this work, we propose an approach based on energy-based models and pseudolikelihood maximization to address these complications: we show that hybrid models, which combine a pairwise model and a neural network, can lead to significant improvements in the reconstruction of pairwise interactions. We show these improvements to hold consistently when compared to a standard approach using only the pairwise model and to an approach using only a neural network. This is in line with the general idea that simple interpretable models and complex black-box models are not necessarily a dichotomy: interpolating between these two classes of models makes it possible to keep some advantages of both.
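For reference, a minimal numpy sketch of pseudolikelihood maximization for a plain pairwise Ising model, i.e., the baseline discussed above rather than the paper's hybrid pairwise-plus-neural-network model; the sampled spin configurations are random placeholders.

```python
import numpy as np

def neg_pseudologlik_grad(J, h, S):
    """S: (M, N) array of +/-1 spins. Uses p(s_i | s_-i) = sigmoid(2 s_i (J s + h)_i)."""
    M, N = S.shape
    field = S @ J.T + h                      # local fields, shape (M, N)
    p = 1.0 / (1.0 + np.exp(-2.0 * S * field))
    r = 2.0 * S * (1.0 - p)                  # derivative of log p w.r.t. the local field
    grad_h = -r.mean(axis=0)
    grad_J = -(r.T @ S) / M
    np.fill_diagonal(grad_J, 0.0)            # no self-couplings
    return -np.log(p).mean(), grad_J, grad_h

rng = np.random.default_rng(2)
S = rng.choice([-1, 1], size=(1000, 10))     # stand-in for observed configurations
J = np.zeros((10, 10))
h = np.zeros(10)
for _ in range(200):                         # plain gradient descent, keeping J symmetric
    loss, gJ, gh = neg_pseudologlik_grad(J, h, S)
    J -= 0.1 * (gJ + gJ.T) / 2.0
    h -= 0.1 * gh
```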


Energies, 2021, Vol 14 (23), pp. 7865. Author(s): Saeid Shahpouri, Armin Norouzi, Christopher Hayduk, Reza Rezaei, Mahdi Shahbakhti, ...

The standards for emissions from diesel engines are becoming more stringent, and accurate emission modeling is crucial in order to control the engine to meet them. Soot emissions are formed through a complex process and are challenging to model. A comprehensive analysis of diesel engine soot emissions modeling for control applications is presented in this paper. Physical, black-box, and gray-box models are developed for soot emissions prediction. Additionally, different feature sets based on the least absolute shrinkage and selection operator (LASSO) feature selection method and on physical knowledge are examined to develop computationally efficient soot models with good precision. The physical model is a virtual engine modeled in GT-Power software that is parameterized using a portion of the experimental data. Different machine learning methods, including Regression Tree (RT), Ensemble of Regression Trees (ERT), Support Vector Machines (SVM), Gaussian Process Regression (GPR), Artificial Neural Network (ANN), and Bayesian Neural Network (BNN), are used to develop the black-box models. The gray-box models combine the physical and black-box models. A total of five feature sets and eight different machine learning methods are tested. An analysis of the accuracy, training time, and test time of the models is performed using the K-means clustering algorithm, which provides a systematic way of categorizing the feature sets and methods based on their performance and selecting the best method for a specific application. According to this analysis, the black-box model consisting of GPR with LASSO feature selection shows the best performance, with a test R2 of 0.96. The best gray-box model consists of an SVM-based method with the physical-insight feature set along with LASSO for feature selection, with a test R2 of 0.97.
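A hedged sketch of the kind of black-box pipeline described above (LASSO-based feature selection followed by Gaussian Process Regression), using synthetic stand-in data rather than the paper's engine measurements:

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 12))                               # e.g. injection, rail pressure, EGR, ...
y = X[:, 0] ** 2 - 0.5 * X[:, 3] + rng.normal(0, 0.1, 400)   # stand-in soot target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = make_pipeline(
    StandardScaler(),
    SelectFromModel(Lasso(alpha=0.05)),   # LASSO-based feature selection
    GaussianProcessRegressor(),           # black-box soot model
)
model.fit(X_tr, y_tr)
print("test R2:", r2_score(y_te, model.predict(X_te)))
```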


SPE Journal, 2021, pp. 1-15. Author(s): Basma Alharbi, Zhenwen Liang, Jana M. Aljindan, Ammar K. Agnia, Xiangliang Zhang

Summary Trusting a machine-learning model is a critical factor that will speed the spread of the fourth industrial revolution. Trust can be achieved by understanding how a model makes its decisions. For white-box models, it is easy to “see” the model and examine its predictions. For black-box models, the explanation of the decision process is not straightforward. In this work, we compare the performance of several white- and black-box models on two production data sets in an anomaly detection task. The presence of anomalies in production data, if not identified, can significantly influence business decisions and misrepresent the results of the analysis. Therefore, identifying anomalies is a crucial and necessary step to maintain safety and ensure that the wells perform at full capacity. To achieve this, we compare the performance of K-nearest neighbor (KNN), logistic regression (Logit), support vector machines (SVMs), decision tree (DT), random forest (RF), and rule fit classifier (RFC). F1 and complexity are the two main metrics used to compare the prediction performance and interpretability of these models. In one data set, RFC outperformed the remaining models in both F1 and complexity, with F1 = 0.92 and complexity = 0.5. In the second data set, RF outperformed the rest in prediction performance with F1 = 0.84, yet it had the lowest complexity metric (0.04). We further analyzed the best-performing models by explaining their predictions using local interpretable model-agnostic explanations (LIME), which provide justification for the decision made for each instance. Additionally, we evaluated the global rules learned from the white-box models. Local and global analysis enable decision makers to understand how and why the models make certain decisions, which in turn allows trusting the models.
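A minimal sketch of the comparison step, on synthetic imbalanced data rather than the production data sets: the white- and black-box classifiers listed above (RuleFit omitted, as it is not part of scikit-learn) are fitted and scored with F1.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Imbalanced data standing in for production records with rare anomalies.
X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "KNN": KNeighborsClassifier(),
    "Logit": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    print(name, round(f1_score(y_te, clf.predict(X_te)), 3))
```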


Energies, 2021, Vol 14 (21), pp. 6899. Author(s): Fisnik Loku, Patrick Düllmann, Christina Brantl, Antonello Monti

A major challenge in the development of multi-vendor HVDC networks is converter control interactions. While recent publications have reported interoperability issues, such as persistent oscillations, for the first multi-vendor HVDC setups with AC-side coupling, multi-terminal HVDC networks are expected to face similar challenges. To investigate DC-side control interactions and mitigate possible interoperability issues, several methods based on the impedances of the converters and the DC network have been proposed in the literature. For DC network impedance modelling, most methods require detailed knowledge of the design and controls of all converters. However, in multi-vendor HVDC networks, converter control parameters are not expected to be shared, for proprietary reasons. Therefore, to facilitate impedance-based stability analyses in multi-vendor MTDC networks, methods that do not require disclosure of the existing converter controls are needed. Detailed impedance measurements can be applied here; however, they are time-consuming and require new measurements for even a single configuration change. This paper proposes an equivalent impedance calculation method suitable for multi-vendor DC networks which, given available black-box models or converter impedance characteristics, can be applied modularly for various network configurations, including different control settings and operating points, while significantly reducing the time required to obtain an equivalent DC network impedance.
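As a simplified illustration only (not the paper's calculation method): given frequency-domain impedance characteristics of individual converters, an equivalent DC-side impedance can be assembled modularly from series and parallel combinations, so that replacing one converter's black-box characteristic leaves the rest of the calculation untouched. The toy impedance models below are hypothetical.

```python
import numpy as np

f = np.logspace(0, 4, 400)                 # frequency sweep in Hz
w = 2 * np.pi * f

def cable(R=0.05, L=1e-3):                 # toy series R-L cable impedance
    return R + 1j * w * L

def converter(Rdc=2.0, C=5e-4):            # toy converter DC-side impedance (R parallel to C)
    return 1.0 / (1.0 / Rdc + 1j * w * C)

def parallel(*Z):                          # modular parallel combination
    return 1.0 / sum(1.0 / Zk for Zk in Z)

# Equivalent impedance at one bus: the local converter in parallel with a remote
# converter seen through its cable. Swapping one converter's characteristic
# (e.g. a different control setting) only changes that one term.
Z_eq = parallel(converter(), cable() + converter(Rdc=3.0))
print(abs(Z_eq[:3]))
```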


2021. Author(s): Najlaa Maaroof, Antonio Moreno, Mohammed Jabreel, Aida Valls

Despite the broad adoption of Machine Learning models in many domains, they remain mostly black boxes. There is a pressing need to make Machine Learning models interpretable, so that designers and users can understand the reasons behind their predictions. In this work, we propose a new method called C-LORE-F to explain the decisions of fuzzy-based black-box models. This new method uses contextual information about the attributes, as well as knowledge of the fuzzy sets associated with the linguistic labels of the fuzzy attributes, to provide actionable explanations. Experimental results on three datasets reveal the effectiveness of C-LORE-F when compared with the most relevant related works.
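A generic, hedged sketch of a LORE-style local explanation, which C-LORE-F builds on (the contextual and fuzzy-set ingredients of C-LORE-F are not reproduced here): perturb the instance, label the neighbourhood with the black box, fit a shallow surrogate tree, and read off a rule. The data and the black-box model are placeholders.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)
black_box = GradientBoostingClassifier(random_state=0).fit(X, y)   # opaque model

x = X[0]                                                   # instance to explain
neighbours = x + rng.normal(scale=0.3, size=(1000, 3))     # local neighbourhood around x
labels = black_box.predict(neighbours)                     # black-box labels

surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(neighbours, labels)
print(export_text(surrogate, feature_names=["f0", "f1", "f2"]))    # local rules
```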


Author(s): Pooja Thakkar

Abstract: The focus of this study is on drug classification using Machine Learning models, as well as on interpretability using LIME and SHAP to gain a thorough understanding of the ML models. To do this, the researchers used machine learning models such as random forest, decision tree, and logistic regression to classify drugs, and then used LIME and SHAP to determine whether these models were interpretable, which allowed them to better understand their results. The paper concludes that LIME and SHAP can be used to gain insight into a Machine Learning model and to determine which attribute is responsible for the divergence in the outcomes. According to the LIME and SHAP results, the Random Forest and Decision Tree models are the best models to employ for drug classification, with Na to K and BP being the most significant features. Keywords: Machine Learning, Black-box models, LIME, SHAP, Decision Tree
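A hedged sketch of the workflow described, on synthetic stand-in data (the study's drug dataset and exact columns are not reproduced): train a tree-based model and inspect per-feature attributions with the shap package, assumed installed.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)
df = pd.DataFrame({
    "Age": rng.integers(15, 75, 300),
    "BP": rng.integers(0, 3, 300),           # encoded LOW / NORMAL / HIGH
    "Na_to_K": rng.uniform(5, 40, 300),
})
drug = np.where(df["Na_to_K"] > 15, 1, 0)    # toy rule standing in for the drug label

clf = RandomForestClassifier(random_state=0).fit(df, drug)
shap_values = shap.TreeExplainer(clf).shap_values(df)   # per-feature attributions
```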


2021, Vol 22 (1). Author(s): Quentin Ferré, Jeanne Chèneby, Denis Puthier, Cécile Capponi, Benoît Ballester

Abstract Background Accurate identification of Transcriptional Regulator binding locations is essential for the analysis of genomic regions, including Cis-Regulatory Elements (CREs). The customary NGS approaches, predominantly ChIP-seq, can be obscured by data anomalies and biases that are difficult to detect without supervision. Results Here, we develop a method that leverages the usual combinations between many experimental series to mark such atypical peaks. We use deep learning to perform a lossy compression of the genomic regions’ representations with multiview convolutions. Using artificial data, we show that our method correctly identifies groups of correlating series and evaluates CREs according to group completeness. It is then applied to the large volume of curated ChIP-seq data in the ReMap database. We show that peaks lacking known biological correlators are singled out and less confirmed in real data. We propose normalization approaches useful for interpreting black-box models. Conclusion Our approach detects peaks that are less corroborated than average. It can be extended to other similar problems and can be interpreted to identify correlation groups. It is implemented in an open-source tool called atyPeak.
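A heavily simplified sketch of the underlying idea, not the atyPeak architecture itself: a small convolutional autoencoder performs a lossy compression of region representations, and regions that the compressed code cannot rebuild are candidates for being atypical. Tensor shapes and the synthetic data are hypothetical.

```python
import torch
from torch import nn

n_series, region_len = 8, 64
# Binary presence of a peak for each series along each region (synthetic stand-in).
x = (torch.rand(256, n_series, region_len) < 0.1).float()

model = nn.Sequential(                                        # small convolutional autoencoder
    nn.Conv1d(n_series, 4, kernel_size=5, padding=2), nn.ReLU(),
    nn.Conv1d(4, 2, kernel_size=5, padding=2), nn.ReLU(),     # bottleneck: lossy code
    nn.Conv1d(2, n_series, kernel_size=5, padding=2), nn.Sigmoid(),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.binary_cross_entropy(model(x), x)
    loss.backward()
    opt.step()

# Regions whose peaks the compressed representation cannot rebuild (high error)
# are the candidates for being atypical / lacking known correlators.
recon_error = ((model(x) - x) ** 2).mean(dim=(1, 2))
```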

