Development, evaluation, and validation of machine learning models for COVID-19 detection based on routine blood tests

Federico Cabitza; Andrea Campagner; Davide Ferrari; Chiara Di Resta; Daniele Ceriotti; Eleonora Sabetta; Alessandra Colombini; Elena De Vecchi; Giuseppe Banfi; Massimo Locatelli; Anna Carobene

doi:10.1515/cclm-2020-1294

Clinical Chemistry and Laboratory Medicine (CCLM), doi:10.1515/cclm-2020-1294, 2020, Vol 0 (0)

Author(s): Federico Cabitza; Andrea Campagner; Davide Ferrari; Chiara Di Resta; Daniele Ceriotti; ...

Keyword(s): Machine Learning; Complete Blood Count; Characteristic Curve; External Validation; False Negative; Turnaround Time; Training Data; Data Set; Blood Tests; Routine Blood

Abstract. Objectives: The rRT-PCR test, the current gold standard for the detection of coronavirus disease (COVID-19), has known shortcomings, such as long turnaround time, potential shortage of reagents, false-negative rates around 15–20%, and expensive equipment. The hematochemical values of routine blood exams could represent a faster and less expensive alternative. Methods: Three different training data sets of hematochemical values from 1,624 patients (52% COVID-19 positive), admitted at San Raphael Hospital (OSR) from February to May 2020, were used for developing machine learning (ML) models: the complete OSR dataset (72 features: complete blood count (CBC), biochemical, coagulation, hemogasanalysis and CO-oximetry values, age, sex and specific symptoms at triage) and two sub-datasets (COVID-specific and CBC dataset, 32 and 21 features respectively). 58 cases (50% COVID-19 positive) from another hospital, and 54 negative patients collected in 2018 at OSR, were used for internal-external and external validation. Results: We developed five ML models: for the complete OSR dataset, the area under the receiver operating characteristic curve (AUC) for the algorithms ranged from 0.83 to 0.90; for the COVID-specific dataset from 0.83 to 0.87; and for the CBC dataset from 0.74 to 0.86. The validations also achieved good results: AUC from 0.75 to 0.78 on the internal-external validation, and specificity from 0.92 to 0.96 on the external validation. Conclusions: ML can be applied to blood tests as both an adjunct and alternative method to rRT-PCR for the fast and cost-effective identification of COVID-19-positive patients. This is especially useful in developing countries, or in countries facing an increase in contagions.
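
The AUC values quoted in this abstract can be read as the probability that a randomly chosen positive patient receives a higher model score than a randomly chosen negative one. The following is a minimal sketch of that definition, not the authors' evaluation code; the function name `auc_from_scores` and the toy labels and scores are made up for illustration.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up scores (hypothetical; not data from the study).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_auc = auc_from_scores(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a sketch; a rank-based formulation would be preferable for large cohorts.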

Development, evaluation, and validation of machine learning models for COVID-19 detection based on routine blood tests

doi:10.1101/2020.10.02.20205070, 2020

Author(s): Cabitza Federico; Campagner Andrea; Ferrari Davide; Di Resta Chiara; Ceriotti Daniele; ...

Keyword(s): Machine Learning; Complete Blood Count; Characteristic Curve; External Validation; False Negative; Turnaround Time; Training Data; Data Set; Blood Tests; Routine Blood

Abstract. Background: The rRT-PCR test, the current gold standard for the detection of coronavirus disease (COVID-19), has known shortcomings, such as long turnaround time, potential shortage of reagents, false-negative rates around 15–20%, and expensive equipment. The hematochemical values of routine blood exams could represent a faster and less expensive alternative. Methods: Three different training data sets of hematochemical values from 1,624 patients (52% COVID-19 positive), admitted at San Raphael Hospital (OSR) from February to May 2020, were used for developing machine learning (ML) models: the complete OSR dataset (72 features: complete blood count (CBC), biochemical, coagulation, hemogasanalysis and CO-oximetry values, age, sex and specific symptoms at triage) and two sub-datasets (COVID-specific and CBC dataset, 32 and 21 features respectively). 58 cases (50% COVID-19 positive) from another hospital, and 54 negative patients collected in 2018 at OSR, were used for internal-external and external validation. Results: We developed five ML models: for the complete OSR dataset, the area under the receiver operating characteristic curve (AUC) for the algorithms ranged from 0.83 to 0.90; for the COVID-specific dataset from 0.83 to 0.87; and for the CBC dataset from 0.74 to 0.86. The validations also achieved good results: AUC from 0.75 to 0.78 on the internal-external validation, and specificity from 0.92 to 0.96 on the external validation. Conclusions: ML can be applied to blood tests as both an adjunct and alternative method to rRT-PCR for the fast and cost-effective identification of COVID-19-positive patients. This is especially useful in developing countries, or in countries facing an increase in contagions.

Machine Learning Prediction of SARS-CoV-2 Polymerase Chain Reaction Results with Routine Blood Tests

Laboratory Medicine, doi:10.1093/labmed/lmaa111, 2020

Author(s): Thomas Tschoellitsch; Martin Dünser; Carl Böck; Karin Schwarzbauer; Jens Meier

Keyword(s): Machine Learning; Polymerase Chain Reaction; Characteristic Curve; Cohort Analysis; Rt Pcr; Chain Reaction; Blood Tests; Routine Blood; Machine Learning Model; Polymerase Chain

Abstract. Objective: The diagnosis of COVID-19 is based on the detection of SARS-CoV-2 in respiratory secretions, blood, or stool. Currently, reverse transcription polymerase chain reaction (RT-PCR) is the most commonly used method to test for SARS-CoV-2. Methods: In this retrospective cohort analysis, we evaluated whether machine learning could exclude SARS-CoV-2 infection using routinely available laboratory values. A Random Forests algorithm with 1353 unique features was trained to predict the RT-PCR results. Results: Out of 12,848 patients undergoing SARS-CoV-2 testing, routine blood tests were simultaneously performed in 1528 patients. The machine learning model could predict SARS-CoV-2 test results with an accuracy of 86% and an area under the receiver operating characteristic curve of 0.90. Conclusion: Machine learning methods can reliably predict a negative SARS-CoV-2 RT-PCR test result using standard blood tests.

One- year mortality in patients with advanced hepatocellular carcinoma on immunotherapy: Prediction using machine learning models (Preprint)

doi:10.2196/preprints.32281, 2021

Author(s): Thomas Ka-Luen Lui; Ka Shing, Michael Cheung; Wai Keung Leung

Keyword(s): Machine Learning; Hepatocellular Carcinoma; Characteristic Curve; False Negative; False Negative Rate; Absolute Error; Advanced Hepatocellular Carcinoma; Data Set; One Year; Related Mortality

BACKGROUND Immunotherapy is a promising new treatment for patients with advanced hepatocellular carcinoma (HCC), but it is costly and potentially associated with considerable side effects. OBJECTIVE This study aimed to evaluate the role of machine learning (ML) models in predicting one-year cancer-related mortality in advanced HCC patients treated with immunotherapy. METHODS A total of 395 HCC patients who had received immunotherapy (including nivolumab, pembrolizumab or ipilimumab) in 2014-2019 in Hong Kong were included. The whole data set was randomly divided into training (n=316) and validation (n=79) sets. The data set, including 45 clinical variables, was used to construct six different ML models for predicting the risk of one-year mortality. The performance of the ML models was measured by the area under the receiver operating characteristic curve (AUC) and the mean absolute error (MAE) using calibration analysis. RESULTS The overall one-year cancer-related mortality was 51.1%. Of the six ML models, the random forest (RF) had the highest AUC of 0.93 (95% CI: 0.86-0.98), which was better than logistic regression (0.82, p=0.01) and XGBoost (0.86, p=0.04). RF also had the lowest false positive (6.7%) and false negative (2.8%) rates. High baseline AFP, bilirubin and alkaline phosphatase were three common risk factors identified by all ML models. CONCLUSIONS ML models could predict one-year cancer-related mortality of HCC patients treated with immunotherapy, which may help to select patients who would most benefit from this new treatment option.
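
The random division of 395 patients into 316 training and 79 validation cases corresponds to an 80/20 split. The sketch below shows one common way to produce such a split; it is an illustration only, since the abstract does not say how the split was implemented. The function name `split_indices` and the fixed seed are assumptions.

```python
import random


def split_indices(n_total, n_train, seed=0):
    """Shuffle patient indices and cut them into training and validation parts.

    The seed is a hypothetical choice that makes the split reproducible.
    """
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


# Cohort sizes taken from the abstract: 395 patients, 316 for training.
train_idx, valid_idx = split_indices(395, 316)
print(len(train_idx), len(valid_idx), len(train_idx) / 395)
```

Splitting on shuffled indices rather than on the records themselves keeps the example independent of how the 45 clinical variables are stored.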

Development and External Validation of a Machine Learning Tool to Rule Out COVID-19 Among Adults in the Emergency Department Using Routine Blood Tests: A Large, Multicenter, Real-World Study (Preprint)

doi:10.2196/preprints.24048, 2020

Author(s): Timothy B Plante; Aaron M Blau; Adrian N Berg; Aaron S Weinberg; Ik C Jun; ...

Keyword(s): Machine Learning; Emergency Department; Clinical Data; External Validation; Blood Tests; Routine Blood; Machine Learning Model; Laboratory Results; Negative Controls; Rule Out

BACKGROUND Conventional diagnosis of COVID-19 with reverse transcription polymerase chain reaction (RT-PCR) testing (hereafter, PCR) is associated with prolonged time to diagnosis and significant costs to run the test. The SARS-CoV-2 virus might lead to characteristic patterns in the results of widely available, routine blood tests that could be identified with machine learning methodologies. Machine learning modalities integrating findings from these common laboratory test results might accelerate ruling out COVID-19 in emergency department patients. OBJECTIVE We sought to develop (ie, train and internally validate with cross-validation techniques) and externally validate a machine learning model to rule out COVID-19 using only routine blood tests among adults in emergency departments. METHODS Using clinical data from emergency departments (EDs) from 66 US hospitals before the pandemic (before the end of December 2019) or during the pandemic (March-July 2020), we included patients aged ≥20 years in the study time frame. We excluded those with missing laboratory results. Model training used 2183 PCR-confirmed cases from 43 hospitals during the pandemic; negative controls were 10,000 prepandemic patients from the same hospitals. External validation used 23 hospitals with 1020 PCR-confirmed cases and 171,734 prepandemic negative controls. The main outcome was COVID-19 status predicted using same-day routine laboratory results. Model performance was assessed with area under the receiver operating characteristic (AUROC) curve as well as sensitivity, specificity, and negative predictive value (NPV). RESULTS Of 192,779 patients included in the training, external validation, and sensitivity data sets (median age decile 50 [IQR 30-60] years, 40.5% male [78,249/192,779]), AUROC for training and external validation was 0.91 (95% CI 0.90-0.92). Using a risk score cutoff of 1.0 (out of 100) in the external validation data set, the model achieved sensitivity of 95.9% and specificity of 41.7%; with a cutoff of 2.0, sensitivity was 92.6% and specificity was 59.9%. At the cutoff of 2.0, the NPVs at a prevalence of 1%, 10%, and 20% were 99.9%, 98.6%, and 97%, respectively. CONCLUSIONS A machine learning model developed with multicenter clinical data integrating commonly collected ED laboratory data demonstrated high rule-out accuracy for COVID-19 status, and might inform selective use of PCR-based testing.

Detection of COVID-19 Infection from Routine Blood Exams with Machine Learning: a Feasibility Study

doi:10.1101/2020.04.22.20075143, 2020, Cited By ~ 2

Author(s): Davide Brinati; Andrea Campagner; Davide Ferrari; Massimo Locatelli; Giuseppe Banfi; ...

Keyword(s): Machine Learning; Gold Standard; False Negative; White Blood Cells; Standard Test; Tree Model; Web Based; Machine Learning Classification; Blood Tests; Routine Blood

Abstract. Background: The COVID-19 pandemic due to the SARS-CoV-2 coronavirus, in the first 4 months since its outbreak, has to date reached more than 200 countries worldwide, with more than 2 million confirmed cases (probably a much higher number of infected) and almost 200,000 deaths. Amplification of viral RNA by (real time) reverse transcription polymerase chain reaction (rRT-PCR) is the current gold standard test for confirmation of infection, although it presents known shortcomings: long turnaround times (3-4 hours to generate results), potential shortage of reagents, false-negative rates as large as 15-20%, and the need for certified laboratories, expensive equipment and trained personnel. Thus there is a need for alternative, faster, less expensive and more accessible tests. Material and methods: We developed two machine learning classification models using hematochemical values from routine blood exams (namely: white blood cell counts, and the platelets, CRP, AST, ALT, GGT, ALP, LDH plasma levels) drawn from 279 patients who, after being admitted to the San Raffaele Hospital (Milan, Italy) emergency room with COVID-19 symptoms, were screened with the rRT-PCR test performed on respiratory tract specimens. Of these patients, 177 tested positive, whereas 102 tested negative. Results: We developed two machine learning models to discriminate between patients who are either positive or negative for SARS-CoV-2: their accuracy ranges between 82% and 86%, and their sensitivity between 92% and 95%, which compares well with the gold standard. We also developed an interpretable Decision Tree model as a simple decision aid for clinicians interpreting blood tests (even off-line) for suspected COVID-19 cases. Discussion: This study demonstrated the feasibility and clinical soundness of using blood test analysis and machine learning as an alternative to rRT-PCR for identifying COVID-19-positive patients. This is especially useful in those countries, like developing ones, suffering from shortages of rRT-PCR reagents and specialized laboratories. We made available a Web-based tool for clinical reference and evaluation.

Development and External Validation of a Machine Learning Tool to Rule Out COVID-19 Among Adults in the Emergency Department Using Routine Blood Tests: A Large, Multicenter, Real-World Study

Journal of Medical Internet Research, doi:10.2196/24048, 2020, Vol 22 (12), pp. e24048

Author(s): Timothy B Plante; Aaron M Blau; Adrian N Berg; Aaron S Weinberg; Ik C Jun; ...

Keyword(s): Machine Learning; Emergency Department; Clinical Data; External Validation; Blood Tests; Routine Blood; Machine Learning Model; Laboratory Results; Negative Controls; Rule Out

Background: Conventional diagnosis of COVID-19 with reverse transcription polymerase chain reaction (RT-PCR) testing (hereafter, PCR) is associated with prolonged time to diagnosis and significant costs to run the test. The SARS-CoV-2 virus might lead to characteristic patterns in the results of widely available, routine blood tests that could be identified with machine learning methodologies. Machine learning modalities integrating findings from these common laboratory test results might accelerate ruling out COVID-19 in emergency department patients. Objective: We sought to develop (ie, train and internally validate with cross-validation techniques) and externally validate a machine learning model to rule out COVID-19 using only routine blood tests among adults in emergency departments. Methods: Using clinical data from emergency departments (EDs) from 66 US hospitals before the pandemic (before the end of December 2019) or during the pandemic (March-July 2020), we included patients aged ≥20 years in the study time frame. We excluded those with missing laboratory results. Model training used 2183 PCR-confirmed cases from 43 hospitals during the pandemic; negative controls were 10,000 prepandemic patients from the same hospitals. External validation used 23 hospitals with 1020 PCR-confirmed cases and 171,734 prepandemic negative controls. The main outcome was COVID-19 status predicted using same-day routine laboratory results. Model performance was assessed with area under the receiver operating characteristic (AUROC) curve as well as sensitivity, specificity, and negative predictive value (NPV). Results: Of 192,779 patients included in the training, external validation, and sensitivity data sets (median age decile 50 [IQR 30-60] years, 40.5% male [78,249/192,779]), AUROC for training and external validation was 0.91 (95% CI 0.90-0.92). Using a risk score cutoff of 1.0 (out of 100) in the external validation data set, the model achieved sensitivity of 95.9% and specificity of 41.7%; with a cutoff of 2.0, sensitivity was 92.6% and specificity was 59.9%. At the cutoff of 2.0, the NPVs at a prevalence of 1%, 10%, and 20% were 99.9%, 98.6%, and 97%, respectively. Conclusions: A machine learning model developed with multicenter clinical data integrating commonly collected ED laboratory data demonstrated high rule-out accuracy for COVID-19 status, and might inform selective use of PCR-based testing.
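
The NPV figures quoted for the cutoff of 2.0 follow from the reported sensitivity and specificity through the standard Bayes relation NPV = spec × (1 − prev) / (spec × (1 − prev) + (1 − sens) × prev). The sketch below reproduces that arithmetic; it is not the authors' code, and the function name `npv` is a label chosen here for illustration.

```python
def npv(sensitivity, specificity, prevalence):
    """Negative predictive value from sensitivity, specificity and prevalence."""
    true_negatives = specificity * (1.0 - prevalence)
    false_negatives = (1.0 - sensitivity) * prevalence
    return true_negatives / (true_negatives + false_negatives)


# Sensitivity and specificity at the cutoff of 2.0, as reported in the abstract.
SENS, SPEC = 0.926, 0.599
npv_by_prevalence = {p: npv(SENS, SPEC, p) for p in (0.01, 0.10, 0.20)}
for p, value in npv_by_prevalence.items():
    print(f"prevalence {p:.0%}: NPV {value:.1%}")
```

The computed values round to 99.9%, 98.6%, and 97.0%, in line with the abstract, and show how the rule-out value of a negative result declines as prevalence rises.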

Comparative Analysis of Machine Learning Techniques Using Predictive Modeling

Recent Advances in Computer Science and Communications, doi:10.2174/2666255813999200904164539, 2020, Vol 13

Author(s): Ritu Khandelwal; Hemlata Goyal; Rajveer Singh Shekhawat

Keyword(s): Machine Learning; Comparative Analysis; Data Science; Training Data; Machine Learning Techniques; Future Trends; Data Set; Learning Stage; Learning Techniques; Different Types

Introduction: Machine learning is an intelligent technology that works as a bridge between businesses and data science. With the involvement of data science, the business goal is to extract valuable insights from the available data. A large part of Indian cinema is Bollywood, a multi-million dollar industry. This paper attempts to predict whether an upcoming Bollywood movie will be a Blockbuster, Superhit, Hit, Average or Flop. For this, machine learning techniques (classification and prediction) will be applied. To build a classifier or prediction model, the first step is the learning stage, in which a training data set is used to train the model by applying some technique or algorithm; the rules generated from it then help to build a model and predict future trends in different types of organizations. Methods: All the techniques related to classification and prediction, such as Support Vector Machine (SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, AdaBoost, and KNN, will be applied to find efficient and effective results. All these functionalities can be applied with GUI-based workflows, available in categories such as Data, Visualize, Model, and Evaluate. Result: To build a classifier or prediction model, the first step is the learning stage, in which a training data set is used to train the model by applying some technique or algorithm; the rules generated from it then help to build a model and predict future trends in different types of organizations. Conclusion: This paper focuses on a comparative analysis based on parameters such as accuracy and the confusion matrix to identify the best possible model for predicting movie success. Using the predicted success rate, production houses can plan their advertising and choose the best time to release the movie to gain higher benefits.
Discussion: Data mining is the process of discovering patterns in large data sets; from these patterns, relationships are also discovered that help to solve business problems and to predict forthcoming trends. This prediction can help production houses with advertising and with planning their costs, and by securing these factors they can make the movie more profitable.

Building Damage Detection from Post-Event Aerial Imagery Using Single Shot Multibox Detector

Applied Sciences, doi:10.3390/app9061128, 2019, Vol 9 (6), pp. 1128, Cited By ~ 12

Author(s): Yundong Li; Wei Hu; Han Dong; Xueyan Zhang

Keyword(s): Machine Learning; Data Augmentation; Hurricane Sandy; Training Data; Aerial Images; Detection Methods; Single Shot; Data Set; Augmentation Strategies; Post Disaster

Aerial cameras, satellite remote sensing, and unmanned aerial vehicles (UAVs) equipped with cameras can facilitate search and rescue tasks after disasters. The traditional manual interpretation of huge aerial images is inefficient and could be replaced by machine learning-based methods combined with image processing techniques. Given the development of machine learning, researchers find that convolutional neural networks can effectively extract features from images. Some target detection methods based on deep learning, such as the single-shot multibox detector (SSD) algorithm, can achieve better results than traditional methods. However, the impressive performance of machine learning-based methods depends on large numbers of labeled samples. Given the complexity of post-disaster scenarios, obtaining many samples in the aftermath of disasters is difficult. To address this issue, the current study proposes a damaged building assessment method using SSD with pretraining and data augmentation, with the following highlights. (1) Objects can be detected and classified into undamaged buildings, damaged buildings, and ruins. (2) A convolution auto-encoder (CAE) that consists of VGG16 is constructed and trained using unlabeled post-disaster images. As a transfer learning strategy, the weights of the SSD model are initialized using the weights of the CAE counterpart. (3) Data augmentation strategies, such as image mirroring, rotation, Gaussian blur, and Gaussian noise processing, are utilized to augment the training data set. As a case study, aerial images of Hurricane Sandy in 2012 were used to validate the proposed method's effectiveness. Experiments show that the pretraining strategy improves overall accuracy by 10% compared with the SSD trained from scratch. These experiments also demonstrate that using data augmentation strategies can improve mAP and mF1 by 72% and 20%, respectively. Finally, the method is further verified on another dataset from Hurricane Irma, and it is concluded that the proposed method is feasible.
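
The four augmentation strategies named in this abstract (mirroring, rotation, Gaussian blur, Gaussian noise) can be sketched on a small grayscale image stored as nested lists. This is an illustration under stated assumptions, not the paper's pipeline: pixel values are assumed to lie in [0, 1], rotation is limited to 90 degrees, and the kernel radius, sigma values, and seed are hypothetical defaults.

```python
import math
import random


def mirror(img):
    """Flip the image left to right."""
    return [list(reversed(row)) for row in img]


def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian weights of length 2 * radius + 1."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(img, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    out = []
    for row in img:
        n = len(row)
        out.append([
            sum(w * row[min(max(x + k - radius, 0), n - 1)]
                for k, w in enumerate(kernel))
            for x in range(n)
        ])
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def gaussian_blur(img, sigma=1.0, radius=2):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma, radius)
    horizontal = blur_rows(img, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))


def add_gaussian_noise(img, sigma=0.05, seed=0):
    """Add zero-mean Gaussian noise and clip the result to [0, 1]."""
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row]
            for row in img]


def augment(img):
    """Return one augmented copy per strategy."""
    return [mirror(img), rotate90(img), gaussian_blur(img), add_gaussian_noise(img)]


# Toy 2x3 image with made-up pixel values.
toy = [[0.0, 0.2, 0.4], [0.6, 0.8, 1.0]]
augmented = augment(toy)
```

For object detection, the bounding-box annotations would need the same geometric transform as the image under mirroring and rotation; that step is omitted here.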

Routine blood tests as a potential diagnostic tool for COVID-19

Clinical Chemistry and Laboratory Medicine (CCLM), doi:10.1515/cclm-2020-0398, 2020, Vol 58 (7), pp. 1095-1099, Cited By ~ 34

Author(s): Davide Ferrari; Andrea Motta; Marta Strollo; Giuseppe Banfi; Massimo Locatelli

Keyword(s): Blood Test; Hematological Parameters; False Negative; White Blood Cells; Turnaround Time; Test Analysis; Infected People; Reactive Protein; Routine Blood; Glutamyl Transpeptidase

Abstract. Objectives: Since the outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the epidemic has to date gradually spread to 209 countries worldwide, with more than 1.5 million infected people and 100,000 deaths. Amplification of viral RNA by rRT-PCR serves as the gold standard for confirmation of infection, yet it needs a long turnaround time (3–4 h to generate results) and shows false-negative rates as large as 15%–20%. In addition, the need for certified laboratories, expensive equipment and trained personnel led many countries to limit the rRT-PCR tests only to individuals with pronounced respiratory syndrome symptoms. Thus, there is a need for alternative, less expensive and more accessible tests. Methods: We analyzed the plasma levels of white blood cells (WBCs), platelets, C-reactive protein (CRP), aspartate aminotransferase (AST), alanine aminotransferase (ALT), γ-glutamyl transpeptidase (GGT), alkaline phosphatase and lactate dehydrogenase (LDH) of 207 patients who, after being admitted to the emergency room of the San Raffaele Hospital (Milan, Italy) with COVID-19 symptoms, were rRT-PCR tested. Of them, 105 tested positive, whereas 102 tested negative. Results: Statistically significant differences were observed for WBC, CRP, AST, ALT and LDH. Empirical thresholds for AST and LDH allowed the identification of 70% of either COVID-19-positive or -negative patients on the basis of routine blood test results. Conclusions: Combining appropriate cutoffs for certain hematological parameters could help in identifying false-positive/negative rRT-PCR tests. Blood test analysis might be used as an alternative to rRT-PCR for identifying COVID-19-positive patients in those countries which suffer from a large shortage of rRT-PCR reagents and/or specialized laboratories.

Nowcasting heavy precipitation over the Netherlands using a 13-year radar archive: a machine learning approach

doi:10.5194/egusphere-egu21-12814, 2021

Author(s): Eva van der Kooij; Marc Schleiss; Riccardo Taormina; Francesco Fioranelli; Dorien Lugt; ...

Keyword(s): Machine Learning; The Netherlands; Heavy Rainfall; Predictive Performance; Heavy Precipitation; Early Warning Systems; Training Data; Short Term; Data Set; Radar Images

Accurate short-term forecasts, also known as nowcasts, of heavy precipitation are desirable for creating early warning systems for extreme weather and its consequences, e.g. urban flooding. In this research, we explore the use of machine learning for short-term prediction of heavy rainfall showers in the Netherlands. We assess the performance of a recurrent, convolutional neural network (TrajGRU) with lead times of 0 to 2 hours. The network is trained on a 13-year archive of radar images with 5-min temporal and 1-km spatial resolution from the precipitation radars of the Royal Netherlands Meteorological Institute (KNMI). We aim to train the model to predict the formation and dissipation of dynamic, heavy, localized rain events, a task for which traditional Lagrangian nowcasting methods still come up short. We report on different ways to optimize predictive performance for heavy rainfall intensities through several experiments. The large dataset available provides many possible configurations for training. To focus on heavy rainfall intensities, we use different subsets of this dataset by applying different conditions for event selection, varying the ratio of light and heavy precipitation events present in the training data set, and changing the loss function used to train the model. To assess the performance of the model, we compare our method to current state-of-the-art Lagrangian nowcasting systems from the pySTEPS library, such as S-PROG, a deterministic approximation of an ensemble mean forecast. The results of the experiments are used to discuss the pros and cons of machine learning-based methods for precipitation nowcasting and possible ways to further increase performance.
