Estimation and Interpretation of Machine Learning Models with Customized Surrogate Model

Electronics ◽  
2021 ◽  
Vol 10 (23) ◽  
pp. 3045
Author(s):  
Mudabbir Ali ◽  
Asad Masood Khattak ◽  
Zain Ali ◽  
Bashir Hayat ◽  
Muhammad Idrees ◽  
...  

Machine learning has the potential to predict unseen data and thus improve the productivity and processes of daily life activities. Despite this adaptability, sensitive applications built on such technology cannot afford to compromise user trust; thus, even highly accurate machine learning models must give reasons for their predictions. Such models are black boxes for end-users. Therefore, the concept of interpretability plays the role of assisting users in a couple of ways. Interpretable models are models that possess the quality of explaining predictions. Different strategies have been proposed for interpretability, but some of these require an excessive amount of effort, lack generalization, are not model-agnostic, and are computationally expensive. Thus, in this work, we propose a strategy that can tackle these issues. A surrogate model assisted us in building interpretable models. Moreover, it helped us achieve results with accuracy close to that of the black box model but with less processing time. Thus, the proposed technique is computationally cheaper than traditional methods. The significance of this technique is that data science developers will not have to perform strenuous hands-on feature engineering, and end-users will have a graphical explanation of complex models in a comprehensive way, consequently building trust in a machine.
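The general surrogate idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's customized surrogate: the `black_box` function, the one-split decision stump, and the synthetic data are all assumptions chosen to keep the example self-contained. The sketch fits a simple, readable model to the black box's own predictions and reports fidelity, i.e. how often the two agree.

```python
import random

def black_box(x):
    """Hypothetical opaque model: returns 1 when a nonlinear score is positive."""
    return 1 if (x[0] * x[0] + 0.1 * x[1] - 0.5) > 0 else 0

def fit_stump(X, y):
    """Exhaustively search for the single split that best reproduces labels y.

    Returns (accuracy, feature_index, threshold, label_on_left_side).
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for left_label in (0, 1):
                preds = [left_label if row[j] <= t else 1 - left_label for row in X]
                acc = sum(p == yi for p, yi in zip(preds, y)) / len(y)
                if best is None or acc > best[0]:
                    best = (acc, j, t, left_label)
    return best

def stump_predict(stump, x):
    """Apply a fitted stump to one input row."""
    _, j, t, left_label = stump
    return left_label if x[j] <= t else 1 - left_label

# Synthetic inputs (assumption): two features drawn uniformly from [0, 1].
rng = random.Random(0)
X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(200)]

# The surrogate is trained on the black box's outputs, not on ground truth.
y_bb = [black_box(x) for x in X]
stump = fit_stump(X, y_bb)
fidelity = stump[0]  # share of inputs where surrogate and black box agree

print(f"split on feature {stump[1]} at {stump[2]:.3f}, fidelity {fidelity:.2%}")
```

The fitted split is the explanation: it states which feature and threshold account for most of the black box's behaviour, and the fidelity score indicates how far that explanation can be trusted.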

2021 ◽  
Author(s):  
Luc Thomès ◽  
Rebekka Burkholz ◽  
Daniel Bojar

Abstract: As biological sequences, glycans occur in every domain of life and comprise monosaccharides that are chained together to form oligo- or polysaccharides. While glycans are crucial for most biological processes, existing analysis modalities make it difficult for researchers with limited computational background to include information from these diverse and nonlinear sequences into standard workflows. Here, we present glycowork, an open-source Python package that was designed for the processing and analysis of glycan data by end users, with a strong focus on glycan-related data science and machine learning. Glycowork includes numerous functions to, for instance, automatically annotate glycan motifs and analyze their distributions via heatmaps and statistical enrichment. We also provide visualization methods, routines to interact with stored databases, trained machine learning models, and learned glycan representations. We envision that glycowork can extract further insights from any glycan dataset and demonstrate this with several workflows that analyze glycan motifs in various biological contexts. Glycowork can be freely accessed at https://github.com/BojarLab/glycowork/.


Hydrology ◽  
2021 ◽  
Vol 9 (1) ◽  
pp. 5
Author(s):  
Evangelos Rozos ◽  
Panayiotis Dimitriadis ◽  
Vasilis Bellos

Machine learning has been employed successfully as a tool in virtually every scientific and technological field. In hydrology, machine learning models first appeared as simple feed-forward networks that were used for short-term forecasting, and have evolved into complex models that can take into account even the static features of catchments, imitating the hydrological experience. Recent studies have found machine learning models to be robust and efficient, frequently outperforming the standard hydrological models (both conceptual and physically based). However, despite some recent efforts, the results of machine learning models still require significant effort to interpret and to draw inferences from. Furthermore, all successful applications of machine learning in hydrology are based on networks of fairly complex topology that require significant computational power and CPU time to train. For these reasons, the value of the standard hydrological models remains indisputable. In this study, we suggest employing machine learning models not as a substitute for hydrological models, but as an independent tool to assess their performance. We argue that this approach can help to unveil the anomalies in catchment data that do not fit in the employed hydrological model structure or configuration, and to deal with them without compromising the understanding of the underlying physical processes.


2020 ◽  
Author(s):  
Ivan Miguel Pires ◽  
Faisal Hussain ◽  
Nuno M. Garcia ◽  
Eftim Zdravevski

Abstract: Applications of human activity recognition are expanding rapidly, from health monitoring systems to virtual reality. Thus, the automatic recognition of daily life activities has become significant for numerous applications. In recent years, many datasets have been proposed to train machine learning models for efficient monitoring and recognition of human daily living activities. However, the performance of machine learning models in activity recognition is severely affected when there are incomplete activities in a dataset, i.e., missing samples in dataset captures. Therefore, in this work, we propose a methodology for extrapolating the missing samples of a dataset to better recognize human daily living activities. The proposed method pre-processes the data captures and uses the k-Nearest Neighbors (KNN) imputation technique to extrapolate the missing samples in dataset captures. The extrapolated samples followed a pattern of activities similar to that in the real dataset.
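KNN imputation, as named in this abstract, can be sketched in a few lines. This is a generic illustration rather than the authors' pipeline: the function name `knn_impute`, the distance computed only over columns observed in both rows, and the toy `samples` values are all assumptions.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Distance is the root mean squared difference over the columns that both
    rows have observed (a simplifying assumption for this sketch).
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A neighbour is only useful if it has the missing column.
                if m == i or other[j] is None:
                    continue
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum(shared) / len(shared))
                    candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

# Hypothetical sensor captures; the third row is missing its second reading.
samples = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [0.9, None, 2.8],
    [8.0, 9.0, 10.0],
]
completed = knn_impute(samples, k=2)
print(completed[2])
```

The missing reading is filled from the two rows that resemble it most, so the distant fourth row has no influence on the result.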


2021 ◽  
Vol 12 (1) ◽  

Abstract: Ben Glocker (an expert in machine learning for medical imaging, Imperial College London), Mirco Musolesi (a data science and digital health expert, University College London), Jonathan Richens (an expert in diagnostic machine learning models, Babylon Health) and Caroline Uhler (a computational biology expert, MIT) talked to Nature Communications about their research interests in causality inference and how this can provide a robust framework for digital medicine studies and their implementation, across different fields of application.


2021 ◽  
Vol 11 (4) ◽  
pp. 1690
Author(s):  
Frederick W. Damen ◽  
David T. Newton ◽  
Guang Lin ◽  
Craig J. Goergen

Automatic boundary detection of 4D ultrasound (4DUS) cardiac data is a promising yet challenging application at the intersection of machine learning and medicine. Using recently developed murine 4DUS cardiac imaging data, we demonstrate here a set of three machine learning models that predict left ventricular wall kinematics along both the endo- and epi-cardial boundaries. Each model is fundamentally built on three key features: (1) the projection of raw US data to a lower dimensional subspace, (2) a smoothing spline basis across time, and (3) a strategic parameterization of the left ventricular boundaries. Model 1 is constructed such that boundary predictions are based on individual short-axis images, regardless of their relative position in the ventricle. Model 2 simultaneously incorporates parallel short-axis image data into its predictions. Model 3 builds on the multi-slice approach of model 2, but assists predictions with a single ground-truth position at end-diastole. Monte Carlo cross validation was used to assess the performance of each model on unseen data. For predicting the radial distance of the endocardium, models 1, 2, and 3 yielded average R² values of 0.41, 0.49, and 0.71, respectively. Monte Carlo simulations of the endocardial wall showed significantly closer predictions when using model 2 versus model 1 at a rate of 48.67%, and using model 3 versus model 2 at a rate of 83.50%. These findings suggest that a machine learning approach where multi-slice data are simultaneously used as input and predictions are aided by a single user input yields the most robust performance. Subsequently, we explore how metrics of cardiac kinematics compare between ground-truth contours and predicted boundaries. We observed negligible deviations from ground-truth when using predicted boundaries alone, except in the case of early diastolic strain rate, providing confidence for the use of such machine learning models for rapid and reliable assessments of murine cardiac function. To our knowledge, this is the first application of machine learning to murine left ventricular 4DUS data. Future work will be needed to strengthen both model performance and applicability to different cardiac disease models.
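Monte Carlo cross validation, the evaluation scheme named in this abstract, repeatedly draws a random train/test split, refits the model, and averages the held-out score. The sketch below shows the mechanics only; the one-variable least-squares model, the synthetic data, and the names `fit_line`, `r_squared`, and `monte_carlo_cv` are assumptions and bear no relation to the paper's spline-based models.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def r_squared(ys, preds):
    """Coefficient of determination on a held-out set."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def monte_carlo_cv(xs, ys, n_splits=100, test_fraction=0.25, seed=0):
    """Average held-out R^2 over repeated random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_test = max(2, int(len(xs) * test_fraction))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(indices)  # a fresh random split on every repetition
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        preds = [a + b * xs[i] for i in test_idx]
        scores.append(r_squared([ys[i] for i in test_idx], preds))
    return sum(scores) / len(scores), scores

# Synthetic data (assumption): a noisy straight line.
data_rng = random.Random(1)
xs = [i / 10 for i in range(40)]
ys = [2.0 * x + 1.0 + data_rng.gauss(0, 0.3) for x in xs]

mean_r2, scores = monte_carlo_cv(xs, ys)
print(f"mean held-out R^2 over {len(scores)} splits: {mean_r2:.3f}")
```

Unlike k-fold cross validation, the splits here are drawn independently, so a sample may appear in several test sets or in none; the number of repetitions is chosen freely.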


2021 ◽  

Abstract: For the past century, optimization of drilling has caught the eyes of many researchers. The main areas center on rate of penetration (ROP), fluid treatment, and bit selection. They all share the same goal of maximizing ROP and reducing non-productive time (NPT). In order to develop an optimal control system, ROP must be predicted accurately. Unfortunately, it is a complex parameter that is affected by multiple drilling parameters, rock properties, fluid properties, and bit selection. Models used for prediction have developed from empirical models like Bourgoyne and Young's to more intelligent models such as SVM and ANN. With the continuous increase in data obtained from sensors while drilling, there is still much work to be done in this field. In this research, the improvement of an empirical model and the development of an intelligent model are presented. The Bourgoyne and Young model uses multiple linear regression to estimate coefficients, which it then inserts into an empirical formula to predict ROP. This model was modified to estimate the coefficients by non-linear curve fitting, with the aim of reducing bias and generalizing better. Machine learning models such as Gradient Boosting, Random Forest, ANN, and DNN were used in the development of a predictive model for the ROP. These models were easier to develop compared to the empirical model since they rely more on data than on statistical formulas. The data used in this research include drilling data from 3 wells drilled in 2 fields within the Niger Delta region in Nigeria. The models were developed and trained on one of the wells, while the remaining two were used for testing the performance of the models. The modified empirical model improved the efficiency of the base model by 14% during validation but performed poorly on unseen data from the other two wells. The machine learning models outperformed the empirical models and performed accurately on unseen data from the other wells. DNN was the best performing model, achieving an average accuracy of 0.987 for the 3 wells.
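As general background that is not stated in this abstract, the Bourgoyne and Young model is commonly written in exponential form, which shows why multiple linear regression can estimate its coefficients: taking the logarithm makes the model linear in the coefficients. In the sketch below, the a_j are the fitted coefficients and the x_j are the drilling-parameter functions, whose individual definitions are deliberately left out.

```latex
\mathrm{ROP} = \exp\!\Big(a_1 + \sum_{j=2}^{8} a_j x_j\Big)
\quad\Longrightarrow\quad
\ln(\mathrm{ROP}) = a_1 + \sum_{j=2}^{8} a_j x_j
```

Fitting the left-hand form directly by non-linear curve fitting, as the abstract describes for the modified model, minimizes error in ROP itself rather than in its logarithm.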


Author(s):  
Anass Misbah ◽  
Ahmed Ettalbi

Multi-view Web services have brought many advantages regarding the early abstraction of end users' needs and constraints. Security has thus benefited from this paradigm, particularly in Web services applications and, by extension, in Multi-view Web services. In our previous work, we introduced the concept of Multi-view Web services to the Internet of Things architecture within a Cloud infrastructure by proposing a Proxy Security Layer, which consists of Multi-view Web services that identify and categorize all interacting IoT objects and applications so as to increase the level of security and improve the control of transactions. Besides, Artificial Intelligence, and especially Machine Learning, is growing fast and is making it possible to simulate human intelligence in many domains; consequently, it is increasingly possible to automatically process large amounts of data in order to make decisions, bring new insights, or even detect new threats and opportunities that could not be detected before by simple human means. In this work, we bring together the power of Machine Learning models and the Multi-view Web services Proxy Security Layer so as to continuously verify the consistency of the access rules, detect suspicious intrusions, update the policy, and also optimize the Multi-view Web services for better performance of the whole Internet of Things architecture.

