Accuracy of Machine Learning Models to Predict Mortality in COVID-19 Infection Using the Clinical and Laboratory Data at the Time of Admission

Data consists of baseline clinical assessments of gait, mobility, and fall risk at the time of admission of 54 adults with dementia. Furthermore, it includes the participants' daily medication intake in three medication categories, and frequent assessments of gait performed via a computer vision-based ambient monitoring system.

Download Full-text

Effectiveness, Explainability and Reliability of Machine Meta-Learning Methods for Predicting Mortality in Patients with COVID-19: Results of the Brazilian COVID-19 Registry

10.1101/2021.11.01.21265527 ◽

2021 ◽

Author(s):

Bruno Barbosa Miranda de Paiva ◽

Polianna Delfino Pereira ◽

Claudio Moises Valiense de Andrade ◽

Virginia Mara Reis Gomes ◽

Maria Clara Pontello Barbosa Lima ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

State Of The Art ◽

Laboratory Data ◽

Machine Learning Algorithms ◽

Training Data ◽

Learning Models ◽

Learning Methods ◽

Meta Learning ◽

Machine Learning Models

Objective: To provide a thorough comparative study among state ofthe art machine learning methods and statistical methods for determining in-hospital mortality in COVID 19 patients using data upon hospital admission; to study the reliability of the predictions of the most effective methods by correlating the probability of the outcome and the accuracy of the methods; to investigate how explainable are the predictions produced by the most effective methods. Materials and Methods: De-identified data were obtained from COVID 19 positive patients in 36 participating hospitals, from March 1 to September 30, 2020. Demographic, comorbidity, clinical presentation and laboratory data were used as training data to develop COVID 19 mortality prediction models. Multiple machine learning and traditional statistics models were trained on this prediction task using a folded cross validation procedure, from which we assessed performance and interpretability metrics. Results: The Stacking of machine learning models improved over the previous state of the art results by more than 26% in predicting the class of interest (death), achieving 87.1% of AUROC and macroF1 of 73.9%. We also show that some machine learning models can be very interpretable and reliable, yielding more accurate predictions while providing a good explanation for the why. Conclusion: The best results were obtained using the meta learning ensemble model Stacking. State of the art explainability techniques such as SHAP values can be used to draw useful insights into the patterns learned by machine-learning algorithms. Machine learning models can be more explainable than traditional statistics models while also yielding highly reliable predictions. Key words: COVID-19; prognosis; prediction model; machine learning

Download Full-text

Benchmarking of Machine Learning Models to Assist the Prognosis of Tuberculosis

10.20944/preprints202103.0284.v1 ◽

2021 ◽

Author(s):

Maicon Herverton Lino Ferreira da Silva Barros ◽

Geovanne Oliveira Alves ◽

Lubnnia Morais Florêncio Souza ◽

Élisson da Silva Rocha ◽

João Fausto Lorenzato de Oliveira ◽

...

Keyword(s):

Machine Learning ◽

Laboratory Data ◽

Gradient Boosting ◽

Learning Models ◽

Middle Income ◽

Middle Income Countries ◽

Probability Of Death ◽

Health Database ◽

Incorrect Data ◽

Machine Learning Models

Tuberculosis (TB) is an airborne infectious disease caused by organisms in the Mycobacterium tuberculosis (Mtb) complex. In many low and middle-income countries, TB remains a major cause of morbidity and mortality. This work performs a benchmarking of machine learning models using a Brazilian health database related to TB confirmed cases and deaths, named SINAN-TB. The goal is to predict the probability of death by TB, assisting the TB prognosis and decision taking process. The database originally has 130 features, and many of these features had missing data, or incorrect data regarding the notification dates or birth dates, or were not related to the clinical and laboratory data. These data are treated, and after the preprocessing step, a new database with 38 features and 24,015 records is generated, having 22,876 TB cases and 1,139 deaths by TB. We design two experiments to investigated how the data unbalancing impacts on the models performance. With the evaluation of the f1-macro metric, we verify that the best result is achieved when using the imbalanced database, with the ensemble model that is composed of gradient boosting (GB), random forest (RF) and multi-layer perceptron (MLP) models.

Download Full-text

Ambient Monitoring of Gait and Machine Learning Models for Dynamic and Short-Term Falls Risk Assessment in People With Dementia

10.36227/techrxiv.16943395.v1 ◽

2021 ◽

Author(s):

Navid Korhani ◽

Babak Taati ◽

Andrea Iaboni ◽

Andrea Sabo ◽

Sina Mehdizadeh ◽

...

Keyword(s):

Machine Learning ◽

Risk Assessment ◽

Fall Risk ◽

Learning Models ◽

Short Term ◽

Ambient Monitoring ◽

Clinical Assessments ◽

People With Dementia ◽

Time Of Admission ◽

Machine Learning Models

Data consists of baseline clinical assessments of gait, mobility, and fall risk at the time of admission of 54 adults with dementia. Furthermore, it includes the participants' daily medication intake in three medication categories, and frequent assessments of gait performed via a computer vision-based ambient monitoring system.

Download Full-text

Toward the accurate estimation of elliptical side orifice discharge coefficient applying two rigorous kernel-based data-intelligence paradigms

Scientific Reports ◽

10.1038/s41598-021-99166-3 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Masoud Karbasi ◽

Mehdi Jamei ◽

Iman Ahmadianfar ◽

Amin Asadi

Keyword(s):

Machine Learning ◽

Discharge Coefficient ◽

Laboratory Data ◽

Gaussian Process Regression ◽

Accurate Estimation ◽

Generalized Regression Neural Network ◽

Learning Models ◽

Learning Machine ◽

Rsm Model ◽

Machine Learning Models

AbstractIn the present study, two kernel-based data-intelligence paradigms, namely, Gaussian Process Regression (GPR) and Kernel Extreme Learning Machine (KELM) along with Generalized Regression Neural Network (GRNN) and Response Surface Methodology (RSM), as the validated schemes, employed to precisely estimate the elliptical side orifice discharge coefficient in rectangular channels. A total of 588 laboratory data in various geometric and hydraulic conditions were used to develop the models. The discharge coefficient was considered as a function of five dimensionless hydraulically and geometrical variables. The results showed that the machine learning models used in this study had shown good performance compared to the regression-based relationships. Comparison between machine learning models showed that GPR (RMSE = 0.0081, R = 0.958, MAPE = 1.3242) and KELM (RMSE = 0.0082, R = 0.9564, MAPE = 1.3499) models provide higher accuracy. Base on the RSM model, a new practical equation was developed to predict the discharge coefficient. Also, the sensitivity analysis of the input parameters showed that the main channel width to orifice height ratio (B/b) has the most significant effect on determining the discharge coefficient. The leveraged approach was applied to identify outlier data and applicability domain.

Download Full-text

Transparent Machine Learning models for Rapid Risk Stratification in the Emergency Department: A multi-center evaluation

10.1101/2020.11.25.20238386 ◽

2020 ◽

Author(s):

William P.T.M. van Doorn ◽

Floris Helmich ◽

Paul M.E.L. van Dam ◽

Leo H.J. Jacobs ◽

Patricia M. Stassen ◽

...

Keyword(s):

Machine Learning ◽

Emergency Department ◽

Risk Stratification ◽

Diagnostic Performance ◽

Medical Center ◽

Laboratory Data ◽

Patient Specific ◽

Learning Models ◽

Laboratory Results ◽

Machine Learning Models

AbstractIntroductionRisk stratification of patients presenting to the emergency department (ED) is important for appropriate triage. Using machine learning technology, we can integrate laboratory data from a modern emergency department and present these in relation to clinically relevant endpoints for risk stratification. In this study, we developed and evaluated transparent machine learning models in four large hospitals in the Netherlands.MethodsHistorical laboratory data (2013-2018) available within the first two hours after presentation to the ED of Maastricht University Medical Centre+ (Maastricht), Meander Medical Center (Amersfoort), and Zuyderland (locations Sittard and Heerlen) were used. We used the first five years of data to develop the model and the sixth year to evaluate model performance in each hospital separately. Performance was assessed using area under the receiver-operating-characteristic curve (AUROC), brier scores and calibration curves. The SHapley Additive exPlanations (SHAP) algorithm was used to obtain transparent machine learning models.ResultsWe included 266,327 patients with more than 7 million laboratory results available for analysis. Models possessed high diagnostic performance with AUROCs of 0.94 [0.94-0.95], 0.98 [0.97-0.98], 0.88 [0.87-0.89] and 0.90 [0.89-0.91] for Maastricht, Amersfoort, Sittard and Heerlen, respectively. Using the SHAP algorithm, we visualized patient characteristics and laboratory results that drive patient-specific RISKINDEX predictions. As an illustrative example, we applied our models in a triage system for risk stratification that categorized 94.7% of the patients as low risk with a corresponding NPV of ≥99%.DiscussionDeveloped machine learning models are transparent with excellent diagnostic performance in predicting 31-day mortality in ED patients across four hospitals. Follow up studies will assess whether implementation of these algorithm can improve clinically relevant endpoints.

Download Full-text

Machine learning is the key to diagnose COVID-19: a proof-of-concept study

Scientific Reports ◽

10.1038/s41598-021-86735-9 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Cedric Gangloff ◽

Sonia Rafi ◽

Guillaume Bouzillé ◽

Louis Soulat ◽

Marc Cuggia

Keyword(s):

Machine Learning ◽

False Negative ◽

Laboratory Data ◽

Chest Ct ◽

Potential Contribution ◽

Imaging Data ◽

Learning Models ◽

Rt Pcr ◽

Negative Results ◽

Machine Learning Models

AbstractThe reverse transcription-polymerase chain reaction (RT-PCR) assay is the accepted standard for coronavirus disease 2019 (COVID-19) diagnosis. As any test, RT-PCR provides false negative results that can be rectified by clinicians by confronting clinical, biological and imaging data. The combination of RT-PCR and chest-CT could improve diagnosis performance, but this would requires considerable resources for its rapid use in all patients with suspected COVID-19. The potential contribution of machine learning in this situation has not been fully evaluated. The objective of this study was to develop and evaluate machine learning models using routine clinical and laboratory data to improve the performance of RT-PCR and chest-CT for COVID-19 diagnosis among post-emergency hospitalized patients. All adults admitted to the ED for suspected COVID-19, and then hospitalized at Rennes academic hospital, France, between March 20, 2020 and May 5, 2020 were included in the study. Three model types were created: logistic regression, random forest, and neural network. Each model was trained to diagnose COVID-19 using different sets of variables. Area under the receiving operator characteristics curve (AUC) was the primary outcome to evaluate model’s performances. 536 patients were included in the study: 106 in the COVID group, 430 in the NOT-COVID group. The AUC values of chest-CT and RT-PCR increased from 0.778 to 0.892 and from 0.852 to 0.930, respectively, with the contribution of machine learning. After generalization, machine learning models will allow increasing chest-CT and RT-PCR performances for COVID-19 diagnosis.

Download Full-text

A data-driven approach to predicting diabetes and cardiovascular disease with machine learning

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-019-0918-5 ◽

2019 ◽

Vol 19 (1) ◽

Cited By ~ 12

Author(s):

An Dinh ◽

Stacey Miertschin ◽

Amber Young ◽

Somya D. Mohanty

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

At Risk ◽

Laboratory Data ◽

Ensemble Model ◽

Learning Models ◽

Laboratory Results ◽

Risk Patients ◽

Machine Learning Models ◽

Roc Score

Abstract Background Diabetes and cardiovascular disease are two of the main causes of death in the United States. Identifying and predicting these diseases in patients is the first step towards stopping their progression. We evaluate the capabilities of machine learning models in detecting at-risk patients using survey data (and laboratory results), and identify key variables within the data contributing to these diseases among the patients. Methods Our research explores data-driven approaches which utilize supervised machine learning models to identify patients with such diseases. Using the National Health and Nutrition Examination Survey (NHANES) dataset, we conduct an exhaustive search of all available feature variables within the data to develop models for cardiovascular, prediabetes, and diabetes detection. Using different time-frames and feature sets for the data (based on laboratory data), multiple machine learning models (logistic regression, support vector machines, random forest, and gradient boosting) were evaluated on their classification performance. The models were then combined to develop a weighted ensemble model, capable of leveraging the performance of the disparate models to improve detection accuracy. Information gain of tree-based models was used to identify the key variables within the patient data that contributed to the detection of at-risk patients in each of the diseases classes by the data-learned models. Results The developed ensemble model for cardiovascular disease (based on 131 variables) achieved an Area Under - Receiver Operating Characteristics (AU-ROC) score of 83.1% using no laboratory results, and 83.9% accuracy with laboratory results. In diabetes classification (based on 123 variables), eXtreme Gradient Boost (XGBoost) model achieved an AU-ROC score of 86.2% (without laboratory data) and 95.7% (with laboratory data). For pre-diabetic patients, the ensemble model had the top AU-ROC score of 73.7% (without laboratory data), and for laboratory based data XGBoost performed the best at 84.4%. Top five predictors in diabetes patients were 1) waist size, 2) age, 3) self-reported weight, 4) leg length, and 5) sodium intake. For cardiovascular diseases the models identified 1) age, 2) systolic blood pressure, 3) self-reported weight, 4) occurrence of chest pain, and 5) diastolic blood pressure as key contributors. Conclusion We conclude machine learned models based on survey questionnaire can provide an automated identification mechanism for patients at risk of diabetes and cardiovascular diseases. We also identify key contributors to the prediction, which can be further explored for their implications on electronic health records.

Download Full-text

Improving XGBoost with Imagination Sampling

Communications of the Blyth Institute ◽

10.33014/issn.2640-5652.2.1.holloway.1 ◽

2020 ◽

Vol 2 (1) ◽

pp. 3-6

Author(s):

Eric Holloway

Keyword(s):

Machine Learning ◽

General System ◽

Learning Models ◽

Starting Point ◽

Machine Learning Models

Imagination Sampling is the usage of a person as an oracle for generating or improving machine learning models. Previous work demonstrated a general system for using Imagination Sampling for obtaining multibox models. Here, the possibility of importing such models as the starting point for further automatic enhancement is explored.

Download Full-text

Development of Machine Learning Models to Predict Student Performance in Computer Literacy Courses

International Review on Computers and Software (IRECOS) ◽

10.15866/irecos.v13i1.16863 ◽

2018 ◽

Vol 13 (1) ◽

pp. 21

Author(s):

George Anderson ◽

Oduronke T. Eyitayo

Keyword(s):

Machine Learning ◽

Student Performance ◽

Computer Literacy ◽

Learning Models ◽

Machine Learning Models

Download Full-text