Explainable and Interpretable Anomaly Detection Models for Production Data

Summary: Trust in machine-learning models is a critical factor in the spread of the fourth industrial revolution, and it can be built by understanding how a model makes its decisions. For white-box models, it is easy to “see” the model and examine its predictions. For black-box models, explaining the decision process is not straightforward. In this work, we compare the performance of several white- and black-box models on two production data sets in an anomaly detection task. If not identified, anomalies in production data can significantly influence business decisions and misrepresent the results of the analysis. Identifying them is therefore a necessary step to maintain safety and ensure that the wells perform at full capacity. We compare K-nearest neighbor (KNN), logistic regression (Logit), support vector machines (SVMs), decision tree (DT), random forest (RF), and rule fit classifier (RFC). F1 and complexity are the two main metrics used to compare the prediction performance and interpretability of these models. In one data set, RFC outperformed the remaining models in both F1 and complexity, with F1 = 0.92 and complexity = 0.5. In the second data set, RF outperformed the rest in prediction performance with F1 = 0.84, yet it had the lowest complexity metric (0.04). We further analyzed the best-performing models by explaining their predictions using local interpretable model-agnostic explanations (LIME), which provide a justification for the decision made for each instance. Additionally, we evaluated the global rules learned by the white-box models. Local and global analyses enable decision makers to understand how and why models make certain decisions, which in turn allows them to trust the models.
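
The F1 metric named in the summary can be illustrated with a minimal sketch. It assumes binary labels where 1 marks an anomaly. The function name and the toy labels are illustrative and do not come from the paper, and the paper's complexity metric is not reproduced because the summary does not define it.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive (anomaly = 1) class, from raw label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical): 1 = anomalous production record, 0 = normal.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
score = f1_score(y_true, y_pred)
```

Because F1 ignores true negatives, it tends to be more informative than plain accuracy when anomalies are rare.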

Permutation-based Identification of Important Biomarkers for Complex Diseases via Black-box Models

10.1101/2020.04.27.064170 ◽

2020 ◽

Author(s):

Xinlei Mi ◽

Baiming Zou ◽

Fei Zou ◽

Jianhua Hu

Keyword(s):

Human Disease ◽

Molecular Mechanisms ◽

Black Box ◽

The Cancer Genome Atlas ◽

Human Diseases ◽

Support Vector ◽

Individual Feature ◽

Box Models ◽

Feature Importance ◽

Black Box Models

Abstract: The study of human disease remains challenging due to convoluted disease etiologies and complex molecular mechanisms at genetic, genomic, and proteomic levels. Many machine learning-based methods, including deep learning and random forest, have been developed and widely used to alleviate some analytic challenges in complex human disease studies. While enjoying modeling flexibility and robustness, these model frameworks suffer from non-transparency and difficulty in interpreting the role of each individual feature due to their intrinsic black-box nature. However, identifying important biomarkers associated with complex human diseases is a critical pursuit towards helping researchers establish novel hypotheses regarding the prevention, diagnosis, and treatment of complex human diseases. Herein, we propose a Permutation-based Feature Importance Test (PermFIT) for estimating and testing feature importance, and for assisting the interpretation of individual features in various black-box frameworks, including deep neural networks, random forests, and support vector machines. PermFIT (available at https://github.com/SkadiEye/deepTL) is implemented in a computationally efficient manner, without model refitting for each permuted data set. We conduct extensive numerical studies under various scenarios, and show that PermFIT not only yields valid statistical inference, but also helps to improve the prediction accuracy of black-box models with top selected features. Applied to The Cancer Genome Atlas (TCGA) kidney tumor data and the HITChip atlas BMI data, PermFIT clearly demonstrates its practical use in identifying important biomarkers and boosting the performance of black-box predictive models.
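
The general idea of permutation-based importance without refitting can be sketched as follows. This is a generic illustration, not the PermFIT procedure itself: the abstract does not give PermFIT's estimator or test statistic, so no statistical test is included. The names `permutation_importance`, `toy_model`, and the synthetic data are hypothetical.

```python
import random


def mse(predict, X, y):
    """Mean squared error of an already-fitted predictor on (X, y)."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Average increase in MSE when one feature column is shuffled.

    The model is never refitted: only its inputs are permuted.
    """
    rng = random.Random(seed)
    baseline = mse(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(predict, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical "black box": it uses feature 0 and ignores feature 1.
def toy_model(row):
    return 3.0 * row[0]


data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
y = [3.0 * row[0] for row in X]
scores = permutation_importance(toy_model, X, y)
```

In this toy setup, shuffling the ignored feature leaves the error unchanged, while shuffling the used feature raises it, which is the signal a permutation-based method relies on.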

Explainable Artificial Intelligence (xAI) Approaches and Deep Meta-Learning Models

Advances and Applications in Deep Learning ◽

10.5772/intechopen.92172 ◽

2020 ◽

Author(s):

Evren Dağlarli

Keyword(s):

Artificial Intelligence ◽

Deep Learning ◽

Black Box ◽

Learning Models ◽

Learning Methods ◽

Data Set ◽

Box Models ◽

Explainable Artificial Intelligence ◽

Artificial Neural ◽

Black Box Models

Explainable artificial intelligence (xAI) is one of the interesting issues to have emerged recently. Many researchers are approaching the subject from different angles, and interesting results have come out. However, we are still at the beginning of understanding these types of models. The coming years are expected to be ones in which the openness of deep learning models is debated. Deep learning methods are now frequently encountered alongside classical artificial intelligence approaches. These methods can yield highly effective results depending on the data set size, the data set quality, the methods used in feature extraction, the hyperparameters chosen, the activation functions, and the optimization algorithms. However, current deep learning models have important shortcomings. These artificial neural network-based models are black-box models that learn from the data passed to them and generalize from it. Therefore, the relational link between input and output is not observable. This is an important open point in artificial neural networks and deep learning models. For these reasons, serious effort is needed on the explainability and interpretability of black-box models.

Branch and Bound Algorithm Based on Prediction Error of Metamodel for Computational Electromagnetics

Energies ◽

10.3390/en13246749 ◽

2020 ◽

Vol 13 (24) ◽

pp. 6749

Author(s):

Reda El Bechari ◽

Stéphane Brisset ◽

Stéphane Clénet ◽

Frédéric Guyomarch ◽

Jean Claude Mipo

Keyword(s):

Branch And Bound ◽

Prediction Error ◽

Global Solutions ◽

Black Box ◽

High Fidelity ◽

Electromagnetic Devices ◽

Box Models ◽

Element Simulation ◽

The Cost ◽

Black Box Models

Metamodels have proved to be a very efficient strategy for optimizing expensive black-box models, e.g., finite element simulations of electromagnetic devices. They reduce the computational burden of optimization. However, the conventional approach to using metamodels has limitations, such as the cost of fitting the metamodel and of solving the infill criteria problem. This paper proposes a new algorithm that combines metamodels with a branch and bound (B&B) strategy. Because the efficiency of the B&B algorithm relies on the estimation of the bounds, we investigated the prediction error given by metamodels to predict the bounds. This combination leads to high-fidelity global solutions. We propose a comparison protocol to assess the approach’s performance with respect to that of other algorithms of different categories. Then, two electromagnetic optimization benchmarks are treated. This paper gives practical insights into algorithms that can be used when optimizing electromagnetic devices.

Soft-sensors based on Black-box Models for Bioreactors Monitoring and State Estimation

Proceedings of the 2020 12th International Conference on Bioinformatics and Biomedical Technology ◽

10.1145/3405758.3405780 ◽

2020 ◽

Author(s):

Vygandas Vaitkus ◽

Kęstutis Brazauskas ◽

Jolanta Repšytė

Keyword(s):

State Estimation ◽

Black Box ◽

Soft Sensors ◽

Box Models ◽

Black Box Models

Study on Individual Differences in Thermal Stress Using Black Box Models

10.4271/2004-01-2287 ◽

2004 ◽

Author(s):

Tai S. Jang ◽

Anthony Iyoho ◽

S. S. Nair

Keyword(s):

Thermal Stress ◽

Individual Differences ◽

Black Box ◽

Box Models ◽

Black Box Models

Creating Dynamic Pretrade Models: Beyond the Black Box

The Journal of Trading ◽

10.3905/jot.2018.13.4.041 ◽

2018 ◽

Keyword(s):

Functional Form ◽

Information Leakage ◽

Black Box ◽

Sensitivity Analyses ◽

Stock Selection ◽

Market Participants ◽

Box Models ◽

Portfolio Managers ◽

Black Box Models ◽

Shed Light

We provide a framework for investment managers to create dynamic pretrade models. The approach helps market participants shed light on vendor black-box models that often do not provide any transparency into the model’s functional form or working mechanics. In addition, this allows portfolio managers to create consensus estimates based on their own expectations, such as forecasted liquidity and volatility, and to incorporate firm proprietary alpha estimates into the solution. These techniques allow managers to reduce overdependency on any one black-box model, incorporate costs into the stock selection and portfolio optimization phase of the investment cycle, and perform “what-if” and sensitivity analyses without the risk of information leakage to any outside party or vendor.

Trajectory Optimization under Changing Conditions through Evolutionary Approach and Black-Box Models with Refining

Distributed Computing and Artificial Intelligence - Advances in Intelligent Systems and Computing ◽

10.1007/978-3-319-00551-5_33 ◽

2013 ◽

pp. 267-274

Author(s):

Karel Macek ◽

Jiří Rojíček ◽

Vladimír Bičík

Keyword(s):

Trajectory Optimization ◽

Black Box ◽

Evolutionary Approach ◽

Box Models ◽

Black Box Models

Ordinal classification for efficient plant stress prediction in hyperspectral data

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprsarchives-xl-7-29-2014 ◽

2014 ◽

Vol XL-7 ◽

pp. 29-36 ◽

Cited By ~ 5

Author(s):

J. Behmann ◽

P. Schmitter ◽

J. Steinrücken ◽

L. Plümer

Keyword(s):

Linear Models ◽

Plant Stress ◽

Crop Protection ◽

Local Stress ◽

Prediction Performance ◽

Hyperspectral Data ◽

Hyperspectral Images ◽

Support Vector ◽

Data Set ◽

High Prediction

Detection of crop stress from hyperspectral images is of high importance for breeding and precision crop protection. However, the continuous monitoring of stress in phenotyping facilities by hyperspectral imagers produces huge amounts of uninterpreted data. To derive a stress description from the images, interpreting algorithms with high prediction performance are required. Based on a static model, the local stress state of each pixel has to be predicted. Because of their low computational complexity, linear models are preferable. In this paper, we focus on drought-induced stress, which is represented by discrete stages of ordinal order. We present and compare five methods that can derive stress levels from hyperspectral images: one-vs.-one Support Vector Machine (SVM), one-vs.-all SVM, Support Vector Regression (SVR), Support Vector Ordinal Regression (SVORIM), and Linear Ordinal SVM classification. The methods are applied to two data sets: a real-world set of drought stress in single barley plants and a simulated data set. It is shown that Linear Ordinal SVM is a powerful tool for applications that require high prediction performance under limited resources. It is significantly more efficient than the one-vs.-one SVM and even more efficient than the less accurate one-vs.-all SVM. Compared to the very compact SVORIM model, it represents the senescence process much more accurately.
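
A common way to predict ordinal stages with a linear model is to compute one linear score per pixel and compare it against ordered thresholds. The sketch below shows only that prediction step, under the assumption of a single shared weight vector; it is not necessarily the formulation used in the paper, and the weights, thresholds, and pixel values are hypothetical rather than trained.

```python
from bisect import bisect_right


def predict_stage(pixel, weights, thresholds):
    """Map a pixel's spectral features to an ordinal stage 0..K-1.

    `thresholds` must be sorted ascending; the stage is the number of
    thresholds that the linear score has reached or passed.
    """
    score = sum(w * x for w, x in zip(weights, pixel))
    return bisect_right(thresholds, score)


# Hypothetical two-band model with four ordered stress stages (0 to 3).
weights = [1.0, -1.0]
thresholds = [0.0, 1.0, 2.0]

stage_low = predict_stage([0.5, 1.0], weights, thresholds)   # score -0.5
stage_mid = predict_stage([2.5, 1.0], weights, thresholds)   # score 1.5
stage_high = predict_stage([4.0, 1.0], weights, thresholds)  # score 3.0
```

Prediction costs one dot product per pixel regardless of the number of stages, which is consistent with the abstract's preference for linear models under limited resources.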

The Role of Textualisation and Argumentation in Understanding the Machine Learning Process

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/765 ◽

2017 ◽

Author(s):

Kacper Sokol ◽

Peter Flach

Keyword(s):

Machine Learning ◽

Predictive Accuracy ◽

Spatial Perception ◽

Black Box ◽

High Dimensional ◽

Box Models ◽

Machine Learning Applications ◽

Black Box Models ◽

Machine Learning Models

Understanding data, models and predictions is important for machine learning applications. Due to the limitations of our spatial perception and intuition, analysing high-dimensional data is inherently difficult. Furthermore, black-box models achieving high predictive accuracy are widely used, yet the logic behind their predictions is often opaque. Use of textualisation -- a natural language narrative of selected phenomena -- can tackle these shortcomings. When extended with argumentation theory we could envisage machine learning models and predictions arguing persuasively for their choices.

An Incremental Isomap Method for Hyperspectral Dimensionality Reduction and Classification

Photogrammetric Engineering & Remote Sensing ◽

10.14358/pers.87.7.445 ◽

2021 ◽

Vol 87 (6) ◽

pp. 445-455

Author(s):

Yi Ma ◽

Zezhong Zheng ◽

Yutang Ma ◽

Mingcang Zhu ◽

Ran Huang ◽

...

Keyword(s):

Manifold Learning ◽

Nearest Neighbor ◽

Hyperspectral Image ◽

Hyperspectral Data ◽

Training Data ◽

Support Vector ◽

Data Sets ◽

K Nearest Neighbor ◽

Data Set ◽

Data Points

Many manifold learning algorithms conduct an eigenvector analysis on a data-similarity matrix with a size of N×N, where N is the number of data points. Thus, the memory complexity of the analysis is no less than O(N^2). We present in this article an incremental manifold learning approach to handle large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained with the training data set. A local curvature variation algorithm is utilized to sample a subset of data points as landmarks. Then a manifold skeleton is identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k-nearest-neighbor classifier and achieving the second best performance with support vector machine.
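
The quadratic memory cost can be made concrete with a little arithmetic. The sketch below assumes a dense matrix of 8-byte floats. The point counts are hypothetical, and the landmark case only illustrates why working on a sampled subset helps; it does not describe how the article stores its data.

```python
def dense_matrix_bytes(n_points, bytes_per_entry=8):
    """Bytes needed to hold a dense n_points x n_points matrix."""
    return n_points * n_points * bytes_per_entry


# Hypothetical sizes: all pixels of a scene versus a sampled landmark subset.
full = dense_matrix_bytes(100_000)
landmarks = dense_matrix_bytes(2_000)

print(full / 1e9, "GB")       # 80.0 GB
print(landmarks / 1e6, "MB")  # 32.0 MB
```

Reducing the point count by a factor of 50 reduces the dense matrix by a factor of 2,500, which is why landmark sampling makes large scenes tractable.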
