Interpretable Machine Learning for COVID-19 Diagnosis Through Clinical Variables

2020 ◽  
Author(s):  
Lucas M. Thimoteo ◽  
Marley M. Vellasco ◽  
Jorge M. do Amaral ◽  
Karla Figueiredo ◽  
Cátia Lie Yokoyama ◽  
...  

This work proposes an interpretable machine learning approach to diagnose suspected COVID-19 cases based on clinical variables. The proposed models achieve an F-2 measure above 0.80 and accuracy above 0.85. Interpreting the feature importance of the linear model brought insights about the most relevant features. Shapley Additive Explanations were used for the non-linear models; they showed the difference between positive and negative patients and offered a global view of the models' interpretability.
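For readers unfamiliar with the F-2 measure quoted above, the following is a minimal sketch of how an F-beta score and accuracy can be computed from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not taken from the paper, and the function names are assumptions of this sketch.

```python
def fbeta_score(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from confusion-matrix counts.

    With beta = 2, recall is weighted more heavily than precision, which
    suits screening tasks where a missed positive case is the costlier error.
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all cases that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical counts, for illustration only (not results from the paper).
tp, fp, fn, tn = 80, 20, 10, 90
print(f"F-2 = {fbeta_score(tp, fp, fn):.4f}")
print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
```

Setting `beta=1.0` in the same function gives the ordinary F-1 score, which makes it easy to see how much the recall weighting changes the result.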

Molecules ◽  
2020 ◽  
Vol 25 (20) ◽  
pp. 4696
Author(s):  
Ștefan-Mihai Petrea ◽  
Mioara Costache ◽  
Dragoș Cristea ◽  
Ștefan-Adrian Strungaru ◽  
Ira-Adeline Simionov ◽  
...  

Metals are considered to be among the most hazardous substances due to their potential for accumulation, magnification, persistence, and wide distribution in water, sediments, and aquatic organisms. Demersal fish species, such as turbot (Psetta maxima maeotica), are accepted by the scientific community as suitable bioindicators of heavy metal pollution in the aquatic environment. The present study uses a machine learning approach, based on multiple linear and non-linear models, to estimate the concentrations of heavy metals in both turbot muscle and liver tissues. For the multiple linear regression (MLR) models, the stepwise method was used, while the non-linear models were developed by applying the random forest (RF) algorithm. The models were based on data taken from the scientific literature for 11 heavy metals (As, Ca, Cd, Cu, Fe, K, Mg, Mn, Na, Ni, Zn) in both muscle and liver tissues of turbot specimens. Significant MLR models were recorded for Ca, Fe, Mg, and Na in muscle tissue and for K, Cu, Zn, and Na in turbot liver tissue. Non-linear tree-based RF prediction models (over 70% prediction accuracy) were identified for As, Cd, Cu, K, Mg, and Zn in muscle tissue and for As, Ca, Cd, Mg, and Fe in turbot liver tissue. Both the MLR and the non-linear tree-based RF prediction models were found to be suitable for predicting heavy metal concentrations in both turbot muscle and liver tissues. The models can be used to improve the knowledge base and economic efficiency of studies linking heavy metals, food safety, and environmental pollution.


Author(s):  
Vidyullatha P ◽  
D. Rajeswara Rao

Curve fitting is a data-analysis procedure that is useful for prediction, showing graphically how data points are related to one another under a linear or non-linear model. Typically, a curve fit either finds the points along the curve or is used simply to smooth the data and improve the appearance of the plot. Curve fitting examines the relationship between independent and dependent variables with the objective of defining a good-fit model; that is, it finds the mathematical equation that best fits the given data. In this paper, 150 unorganized data points of environmental variables are used to develop linear and non-linear data models, which are evaluated using the 3-dimensional ‘Sftool’ and ‘Labfit’ machine learning techniques. In the linear model, the best estimates of the coefficients are obtained when the R-square value approaches one; in the non-linear models, the criterion is the least Chi-square.
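The two goodness-of-fit criteria named above can be illustrated with a minimal sketch. It assumes a one-dimensional straight-line fit by ordinary least squares, which is simpler than the three-dimensional surface fits the paper evaluates; the data, the function names, and the uniform measurement uncertainty `sigma` are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, preds):
    """Coefficient of determination; 1.0 means the fit explains all variance."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def chi_square(ys, preds, sigma=1.0):
    """Sum of squared residuals scaled by an assumed uniform uncertainty."""
    return sum(((y - p) / sigma) ** 2 for y, p in zip(ys, preds))


# Hypothetical, slightly noisy data for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
slope, intercept = fit_line(xs, ys)
preds = [slope * x + intercept for x in xs]
print(f"R-square = {r_squared(ys, preds):.4f}")
print(f"Chi-square = {chi_square(ys, preds):.4f}")
```

A fit is preferred when R-square is close to one (linear case) or when Chi-square is smallest among the candidate models (non-linear case), matching the selection criteria the abstract describes.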


Author(s):  
Harshal D. Akolekar ◽  
Yaomin Zhao ◽  
Richard D. Sandberg ◽  
Roberto Pacciani

This paper presents the development of accurate turbulence closures for wake mixing prediction by integrating a machine-learning approach with Reynolds-Averaged Navier-Stokes (RANS)-based computational fluid dynamics (CFD). The data-driven modeling framework is based on the gene expression programming (GEP) approach previously shown to generate non-linear RANS models with good accuracy. To further improve the performance and robustness of the data-driven closures, here we exploit the fact that GEP produces tangible models to integrate RANS in the closure development process. Specifically, rather than using as the cost function a comparison of the GEP-based closure terms with a frozen high-fidelity dataset, each GEP model is instead automatically implemented into a RANS solver and the subsequent calculation results are compared with reference data. By first using a canonical turbine wake with inlet conditions prescribed based on high-fidelity data, we demonstrate that the CFD-driven machine-learning approach produces non-linear turbulence closures that are physically correct, i.e., they predict the right downstream wake development and maintain an accurate peak wake loss throughout the domain. We then extend our analysis to full turbine-blade cases and show that the model development is sensitive to the training region due to the presence of deterministic unsteadiness in the near-wake region. Models developed including this region have artificially large diffusion coefficients to over-compensate for the vortex shedding that steady RANS cannot capture. In contrast, excluding the near-wake region in the model development produces the correct physical model behavior, but predictive accuracy in the near wake remains unsatisfactory. We show that this can be remedied by using the physically consistent models in unsteady RANS, implying that the non-linear closure producing the best predictive accuracy depends on whether it will be deployed in RANS or unsteady RANS calculations. Overall, the models developed with the CFD-assisted machine learning approach were found to be robust and to capture the correct physical behavior across different operating conditions.

