Estimation and Interpretation of Machine Learning Models with Customized Surrogate Model

Mudabbir Ali; Asad Masood Khattak; Zain Ali; Bashir Hayat; Muhammad Idrees; Zeeshan Pervez; Kashif Rizwan; Tae-Eung Sung; Ki-Il Kim

doi:10.3390/electronics10233045

Electronics ◽

10.3390/electronics10233045 ◽

2021 ◽

Vol 10 (23) ◽

pp. 3045

Author(s):

Mudabbir Ali ◽

Asad Masood Khattak ◽

Zain Ali ◽

Bashir Hayat ◽

Muhammad Idrees ◽

...

Keyword(s):

Machine Learning ◽

Surrogate Model ◽

Data Science ◽

End Users ◽

Learning Models ◽

Daily Life Activities ◽

Unseen Data ◽

Complex Models ◽

Interpretable Models ◽

Machine Learning Models

Machine learning has the potential to predict unseen data and thus improve the productivity and processes of daily life activities. Notwithstanding its adaptiveness, several sensitive applications based on such technology cannot compromise on trust; thus, even highly accurate machine learning models need to give reasons for their predictions. Such models are black boxes for end-users. Therefore, the concept of interpretability plays a role in assisting users in a couple of ways. Interpretable models are models that can explain their predictions. Different strategies have been proposed for interpretability, but some of these require an excessive amount of effort, lack generalization, are not model-agnostic, or are computationally expensive. Thus, in this work, we propose a strategy that can tackle these issues. A surrogate model assisted us in building interpretable models. Moreover, it helped us achieve results with accuracy close to that of the black box model but with less processing time. Thus, the proposed technique is computationally cheaper than traditional methods. The significance of such a novel technique is that data science developers will not have to perform strenuous hands-on activities to undertake feature engineering tasks, and end-users will have a graphical explanation of complex models in a comprehensive way, consequently building trust in the machine.
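The general idea of a surrogate model can be illustrated with a minimal sketch. This is not the authors' customized method: the `black_box` function, the single-threshold "stump" surrogate, and the synthetic data are all hypothetical stand-ins chosen to keep the example self-contained. The surrogate is trained on the black box's own outputs, and its agreement with the black box (often called fidelity) is reported.

```python
import random

def black_box(x):
    # Hypothetical opaque classifier standing in for any trained model.
    return 1 if 0.8 * x[0] + 0.1 * x[1] ** 2 > 0.5 else 0

def fit_stump(X, y):
    """Find the rule 'predict 1 if x[j] > t' that best reproduces labels y."""
    best = (0, 0.0, -1.0)  # (feature index, threshold, agreement)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            agree = sum((row[j] > t) == bool(label)
                        for row, label in zip(X, y)) / len(X)
            if agree > best[2]:
                best = (j, t, agree)
    return best

random.seed(0)
X = [[random.random(), random.random()] for _ in range(200)]

# The surrogate's training targets are the black box's predictions,
# not ground-truth labels.
y_bb = [black_box(row) for row in X]

feature, threshold, fidelity = fit_stump(X, y_bb)
print(f"surrogate rule: predict 1 if x[{feature}] > {threshold:.3f}")
print(f"fidelity to black box on the sample: {fidelity:.2%}")
```

The printed rule is the explanation an end-user would see, and the fidelity figure indicates how far that explanation can be trusted to reflect the black box on this sample.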

Learning to Validate the Predictions of Black Box Machine Learning Models on Unseen Data

Proceedings of the Workshop on Human-In-the-Loop Data Analytics - HILDA'19 ◽

10.1145/3328519.3329126 ◽

2019 ◽

Author(s):

Sergey Redyuk ◽

Sebastian Schelter ◽

Tammo Rukat ◽

Volker Markl ◽

Felix Biessmann

Keyword(s):

Machine Learning ◽

Black Box ◽

Learning Models ◽

Unseen Data ◽

Machine Learning Models

Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead

Nature Machine Intelligence ◽

10.1038/s42256-019-0048-x ◽

2019 ◽

Vol 1 (5) ◽

pp. 206-215 ◽

Cited By ~ 296

Author(s):

Cynthia Rudin

Keyword(s):

Machine Learning ◽

Black Box ◽

Learning Models ◽

High Stakes ◽

Interpretable Models ◽

Machine Learning Models

Glycowork: A Python package for glycan data science and machine learning

10.1101/2021.04.22.440981 ◽

2021 ◽

Author(s):

Luc Thomès ◽

Rebekka Burkholz ◽

Daniel Bojar

Keyword(s):

Machine Learning ◽

Open Source ◽

Data Science ◽

Biological Processes ◽

Biological Sequence ◽

Learning Models ◽

Related Data ◽

Strong Focus ◽

Python Package ◽

Machine Learning Models

As biological sequences, glycans occur in every domain of life and comprise monosaccharides that are chained together to form oligo- or polysaccharides. While glycans are crucial for most biological processes, existing analysis modalities make it difficult for researchers with limited computational background to include information from these diverse and nonlinear sequences into standard workflows. Here, we present glycowork, an open-source Python package that was designed for the processing and analysis of glycan data by end users, with a strong focus on glycan-related data science and machine learning. Glycowork includes numerous functions to, for instance, automatically annotate glycan motifs and analyze their distributions via heatmaps and statistical enrichment. We also provide visualization methods, routines to interact with stored databases, trained machine learning models, and learned glycan representations. We envision that glycowork can extract further insights from any glycan dataset and demonstrate this with several workflows that analyze glycan motifs in various biological contexts. Glycowork can be freely accessed at https://github.com/BojarLab/glycowork/.

A Review on Use of Data Science for Visualization and Prediction of the COVID-19 Pandemic and Early Diagnosis of COVID-19 Using Machine Learning Models

Studies in Big Data - Internet of Medical Things for Smart Healthcare ◽

10.1007/978-981-15-8097-0_10 ◽

2020 ◽

pp. 241-265

Author(s):

Shiv Kumar Choubey ◽

Harshit Naman

Keyword(s):

Machine Learning ◽

Early Diagnosis ◽

Data Science ◽

Learning Models ◽

Use Of Data ◽

Machine Learning Models

Machine Learning in Assessing the Performance of Hydrological Models

Hydrology ◽

10.3390/hydrology9010005 ◽

2021 ◽

Vol 9 (1) ◽

pp. 5

Author(s):

Evangelos Rozos ◽

Panayiotis Dimitriadis ◽

Vasilis Bellos

Keyword(s):

Machine Learning ◽

Hydrological Models ◽

Learning Models ◽

Technological Field ◽

Physically Based ◽

Complex Models ◽

Short Term Forecasting ◽

Applications Of Machine Learning ◽

Significant Effort ◽

Machine Learning Models

Machine learning has been employed successfully as a tool in virtually every scientific and technological field. In hydrology, machine learning models first appeared as simple feed-forward networks that were used for short-term forecasting, and have evolved into complex models that can take into account even the static features of catchments, imitating the hydrological experience. Recent studies have found machine learning models to be robust and efficient, frequently outperforming the standard hydrological models (both conceptual and physically based). However, despite some recent efforts, the results of the machine learning models require significant effort to interpret and derive inferences. Furthermore, all successful applications of machine learning in hydrology are based on networks of fairly complex topology that require significant computational power and CPU time to train. For these reasons, the value of the standard hydrological models remains indisputable. In this study, we suggest employing machine learning models not as a substitute for hydrological models, but as an independent tool to assess their performance. We argue that this approach can help to unveil the anomalies in catchment data that do not fit in the employed hydrological model structure or configuration, and to deal with them without compromising the understanding of the underlying physical processes.

An Efficient Data Imputation Technique for Human Activity Recognition

10.21203/rs.3.rs-40843/v1 ◽

2020 ◽

Author(s):

Ivan Miguel Pires ◽

Faisal Hussain ◽

Nuno M. Garcia ◽

Eftim Zdravevski

Keyword(s):

Machine Learning ◽

Activity Recognition ◽

Human Activity ◽

Daily Living ◽

Human Activity Recognition ◽

Learning Models ◽

K Nearest Neighbors ◽

Daily Living Activities ◽

Daily Life Activities ◽

Machine Learning Models

The applications of human activity recognition are expanding rapidly, from health monitoring systems to virtual reality applications. Thus, the automatic recognition of daily life activities has become significant for numerous applications. In recent years, many datasets have been proposed to train machine learning models for efficient monitoring and recognition of human daily living activities. However, the performance of machine learning models in activity recognition is severely affected when there are incomplete activities in a dataset, i.e., missing samples in dataset captures. Therefore, in this work, we propose a methodology for extrapolating the missing samples of a dataset to better recognize human daily living activities. The proposed method efficiently pre-processes the data captures and utilizes the k-Nearest Neighbors (KNN) imputation technique to extrapolate the missing samples in dataset captures. The proposed methodology extrapolated activity patterns similar to those in the real dataset.
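KNN imputation in general can be sketched as follows. This is a generic illustration, not the pre-processing pipeline of the paper: the function name `knn_impute`, the use of `None` for missing values, the distance measure, and the toy rows are assumptions made for the example. Each missing entry is filled with the mean of that column over the k rows that are closest on the columns both rows observe.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the column mean over the k nearest rows."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A neighbor must be a different row that observes column j.
                if m == i or other[j] is None:
                    continue
                shared = [c for c in range(len(row))
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                # Root mean squared difference over the shared columns.
                dist = math.sqrt(
                    sum((row[c] - other[c]) ** 2 for c in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

# Hypothetical sensor captures; the last row is missing its second value.
samples = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [5.0, 6.0, 7.0],
    [1.05, None, 3.1],
]
completed = knn_impute(samples, k=2)
print(completed[3])
```

In this toy input the incomplete row is closest to the first two rows, so its missing value is filled from them rather than from the distant third row.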

Causality in digital medicine

Nature Communications ◽

10.1038/s41467-021-25743-9 ◽

2021 ◽

Vol 12 (1) ◽

Keyword(s):

Machine Learning ◽

Data Science ◽

Digital Health ◽

Imperial College ◽

Learning Models ◽

Digital Medicine ◽

Health Expert ◽

Causality Inference ◽

University College London ◽

Machine Learning Models

Ben Glocker (an expert in machine learning for medical imaging, Imperial College London), Mirco Musolesi (a data science and digital health expert, University College London), Jonathan Richens (an expert in diagnostic machine learning models, Babylon Health) and Caroline Uhler (a computational biology expert, MIT) talked to Nature Communications about their research interests in causality inference and how this can provide a robust framework for digital medicine studies and their implementation, across different fields of application.

Machine Learning Driven Contouring of High-Frequency Four-Dimensional Cardiac Ultrasound Data

Applied Sciences ◽

10.3390/app11041690 ◽

2021 ◽

Vol 11 (4) ◽

pp. 1690

Author(s):

Frederick W. Damen ◽

David T. Newton ◽

Guang Lin ◽

Craig J. Goergen

Keyword(s):

Machine Learning ◽

Monte Carlo ◽

Radial Distance ◽

Ground Truth ◽

Left Ventricular ◽

Short Axis ◽

Learning Models ◽

Versus Model ◽

Unseen Data ◽

Machine Learning Models

Automatic boundary detection of 4D ultrasound (4DUS) cardiac data is a promising yet challenging application at the intersection of machine learning and medicine. Using recently developed murine 4DUS cardiac imaging data, we demonstrate here a set of three machine learning models that predict left ventricular wall kinematics along both the endo- and epi-cardial boundaries. Each model is fundamentally built on three key features: (1) the projection of raw US data to a lower dimensional subspace, (2) a smoothing spline basis across time, and (3) a strategic parameterization of the left ventricular boundaries. Model 1 is constructed such that boundary predictions are based on individual short-axis images, regardless of their relative position in the ventricle. Model 2 simultaneously incorporates parallel short-axis image data into its predictions. Model 3 builds on the multi-slice approach of model 2, but assists predictions with a single ground-truth position at end-diastole. Monte Carlo cross validation was used to assess the performance of each model on unseen data. For predicting the radial distance of the endocardium, models 1, 2, and 3 yielded average R² values of 0.41, 0.49, and 0.71, respectively. Monte Carlo simulations of the endocardial wall showed significantly closer predictions when using model 2 versus model 1 at a rate of 48.67%, and using model 3 versus model 2 at a rate of 83.50%. These findings suggest that a machine learning approach where multi-slice data are simultaneously used as input and predictions are aided by a single user input yields the most robust performance. Subsequently, we explore how metrics of cardiac kinematics compare between ground-truth contours and predicted boundaries. We observed negligible deviations from ground-truth when using predicted boundaries alone, except in the case of early diastolic strain rate, providing confidence for the use of such machine learning models for rapid and reliable assessments of murine cardiac function. To our knowledge, this is the first application of machine learning to murine left ventricular 4DUS data. Future work will be needed to strengthen both model performance and applicability to different cardiac disease models.
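Monte Carlo cross validation, as a general technique, repeatedly draws a random train/test split, fits the model on the training part, and averages the held-out score over all splits. The sketch below illustrates this with a one-variable least-squares line on synthetic data. The model, the data, the number of splits, and the test fraction are assumptions for illustration only and are unrelated to the models in the abstract.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def r_squared(ys, preds):
    """Coefficient of determination of preds against observed ys."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def monte_carlo_cv(xs, ys, n_splits=100, test_fraction=0.3, seed=0):
    """Average held-out R^2 over repeated random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_test = max(2, int(len(xs) * test_fraction))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(indices)  # a fresh random split on every repetition
        test, train = indices[:n_test], indices[n_test:]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a * xs[i] + b for i in test]
        scores.append(r_squared([ys[i] for i in test], preds))
    return sum(scores) / len(scores)

# Synthetic data: a noisy straight line.
data_rng = random.Random(1)
xs = [i / 10 for i in range(60)]
ys = [2.0 * x + 1.0 + data_rng.gauss(0, 0.5) for x in xs]

mean_r2 = monte_carlo_cv(xs, ys)
print(f"mean held-out R^2 over random splits: {mean_r2:.3f}")
```

Unlike k-fold cross validation, the random splits here may overlap, so the number of repetitions can be chosen independently of the test-set size.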

Development of an Optimal Model for Rate of Penetration (ROP) Using Deep Neural Networks (DNN)

10.2118/207161-ms ◽

2021 ◽

Author(s):

_ _

Keyword(s):

Machine Learning ◽

Empirical Model ◽

Empirical Models ◽

The Other ◽

Gradient Boosting ◽

Past Century ◽

Learning Models ◽

Continuous Increase ◽

Unseen Data ◽

Machine Learning Models

For the past century, optimization of drilling has caught the eyes of many researchers. The main areas center on ROP, fluid treatment, and bit selection. They all share the same goal of maximizing ROP and reducing NPT. In order to develop an optimal control system, ROP must be predicted accurately; unfortunately, it is a complex parameter that is affected by multiple drilling parameters, rock properties, fluid properties, and bit selection. Models used for prediction have developed from empirical models like Bourgoyne and Young's to more intelligent models such as SVM and ANN. With the continuous increase in data obtained from sensors while drilling, there is still much work to be done in this field. In this research, the improvement of an empirical model and the development of an intelligent model are presented. The Bourgoyne and Young's model uses multiple linear regression to estimate coefficients which it then inserts into an empirical formula to predict ROP. This model was modified using non-linear curve-fitting to estimate the coefficients, with the aim of reducing bias and generalizing better. Machine learning models such as Gradient Boosting, Random Forest, ANN, and DNN were used in the development of a predictive model for the ROP. These models were easier to develop compared to the empirical model since they rely more on data rather than statistical formulas. The data used in this research include drilling data from 3 wells drilled in 2 fields within the Niger Delta region in Nigeria. The models were developed and trained on one of the wells, while the remaining two were used for testing the performance of the models. The modified empirical model improved the efficiency of the base model by 14% during validation but performed poorly on unseen data from the other two wells. The machine learning models outperformed the empirical models and performed accurately on unseen data from the other wells. DNN was the best performing model, achieving an average accuracy of 0.987 for the 3 wells.

Towards Machine Learning Models as a Key Mean to Train and Optimize Multi-view Web Services Proxy Security Layer

International Journal of Recent Contributions from Engineering Science & IT (iJES) ◽

10.3991/ijes.v6i4.9883 ◽

2018 ◽

Vol 6 (4) ◽

pp. 65

Author(s):

Anass Misbah ◽

Ahmed Ettalbi

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Internet Of Things ◽

Web Services ◽

End Users ◽

Human Being ◽

Learning Models ◽

Cloud Infrastructure ◽

Machine Learning Models

Multi-view Web services have brought many advantages regarding the early abstraction of end users' needs and constraints. Thus, security has been positively impacted by this paradigm, particularly within the Web services applications area, and then Multi-view Web services. In our previous work, we introduced the concept of Multi-view Web services to Internet of Things architecture within a Cloud infrastructure by proposing a Proxy Security Layer which consists of Multi-view Web services allowing the identification and categorizing of all interacting IoT objects and applications so as to increase the level of security and improve the control of transactions. Besides, Artificial Intelligence and especially Machine Learning are growing fast and are making it possible to simulate human intelligence in many domains; consequently, it is more and more possible to automatically process a large amount of data in order to make decisions, bring new insights, or even detect new threats and opportunities that we were not able to detect before by simple human means. In this work, we bring together the power of Machine Learning models and the Multi-view Web services Proxy Security Layer so as to continuously verify the consistency of the access rules, detect suspicious intrusions, update the policy, and also optimize the Multi-view Web services for a better performance of the whole Internet of Things architecture.