COVID-19 diagnosis by routine blood tests using machine learning

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Matjaž Kukar ◽  
Gregor Gunčar ◽  
Tomaž Vovko ◽  
Simon Podnar ◽  
Peter Černelč ◽  
...  

Abstract Physicians taking care of patients with COVID-19 have described different changes in routine blood parameters. However, these changes alone do not allow them to make a COVID-19 diagnosis. We constructed a machine learning model for COVID-19 diagnosis that was trained and cross-validated on the routine blood tests of 5333 patients with various bacterial and viral infections, and 160 COVID-19-positive patients. We selected the operational ROC point at a sensitivity of 81.9% and a specificity of 97.9%. The cross-validated AUC was 0.97. The five most useful routine blood parameters for COVID-19 diagnosis according to the feature importance scoring of the XGBoost algorithm were: MCHC, eosinophil count, albumin, INR, and prothrombin activity percentage. t-SNE visualization showed that the blood parameters of patients with a severe COVID-19 course are more similar to those of a bacterial infection than of a viral one. The reported diagnostic accuracy is at least comparable and probably complementary to RT-PCR and chest CT studies. Patients with fever, cough, myalgia, and other symptoms can now have initial routine blood tests assessed by our diagnostic tool. All patients with a positive COVID-19 prediction would then undergo standard RT-PCR studies to confirm the diagnosis. We believe that our results represent a significant contribution to improvements in COVID-19 diagnosis.
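The operating-point selection described in this abstract (fixing a minimum specificity and reading off the best achievable sensitivity on the ROC curve) can be sketched as follows. This is an illustrative sketch only: the toy labels and scores are made up, not the study's cross-validated XGBoost outputs.

```python
import numpy as np

def pick_operating_point(y_true, scores, min_specificity):
    """Return (threshold, sensitivity, specificity) for the highest-sensitivity
    threshold whose specificity meets the target. Illustrative sketch only."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    best = None
    for t in np.unique(scores):
        pred = scores >= t                      # classify as COVID-19 positive
        tp = np.sum(pred & (y_true == 1))
        fn = np.sum(~pred & (y_true == 1))
        tn = np.sum(~pred & (y_true == 0))
        fp = np.sum(pred & (y_true == 0))
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        if spec >= min_specificity and (best is None or sens > best[1]):
            best = (float(t), float(sens), float(spec))
    return best

# Toy example: 4 negatives, 4 positives, with one hard-to-separate negative.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
s = np.array([0.1, 0.2, 0.3, 0.9, 0.4, 0.6, 0.7, 0.8])
threshold, sens, spec = pick_operating_point(y, s, min_specificity=0.75)
```

In practice the specificity target would be chosen on cross-validated scores, as the authors did when fixing their 97.9% specificity point.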

2020 ◽  
Author(s):  
Thomas Tschoellitsch ◽  
Martin Dünser ◽  
Carl Böck ◽  
Karin Schwarzbauer ◽  
Jens Meier

Abstract Objective The diagnosis of COVID-19 is based on the detection of SARS-CoV-2 in respiratory secretions, blood, or stool. Currently, reverse transcription polymerase chain reaction (RT-PCR) is the most commonly used method to test for SARS-CoV-2. Methods In this retrospective cohort analysis, we evaluated whether machine learning could exclude SARS-CoV-2 infection using routinely available laboratory values. A Random Forests algorithm with 1353 unique features was trained to predict the RT-PCR results. Results Out of 12,848 patients undergoing SARS-CoV-2 testing, routine blood tests were simultaneously performed in 1528 patients. The machine learning model could predict SARS-CoV-2 test results with an accuracy of 86% and an area under the receiver operating characteristic curve of 0.90. Conclusion Machine learning methods can reliably predict a negative SARS-CoV-2 RT-PCR test result using standard blood tests.
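A minimal version of the workflow in this abstract, a Random Forests classifier trained on laboratory values and scored by accuracy and AUC, might look like the following. The synthetic data, feature count, and hyperparameters are placeholders, not the study's 1353-feature setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for routine laboratory values (placeholder features).
X, y = make_classification(n_samples=1500, n_features=30, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Accuracy on hard predictions; AUROC on predicted probabilities.
acc = accuracy_score(y_test, model.predict(X_test))
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
```

For a rule-out application such as this one, the decision threshold would then be tuned for a high negative predictive value rather than raw accuracy.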


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Richard Du ◽  
Efstratios D. Tsougenis ◽  
Joshua W. K. Ho ◽  
Joyce K. Y. Chan ◽  
Keith W. H. Chiu ◽  
...  

Abstract Triaging and prioritising patients for RT-PCR testing has been essential in the management of COVID-19 in resource-scarce countries. In this study, we applied machine learning (ML) to the task of detecting SARS-CoV-2 infection using basic laboratory markers. We performed the statistical analysis and trained an ML model on a retrospective cohort of 5148 patients from 24 hospitals in Hong Kong to distinguish COVID-19 from other aetiologies of pneumonia. We validated the model on three temporal validation sets from different waves of infection in Hong Kong. For predicting SARS-CoV-2 infection, the ML model achieved high AUCs and specificity but low sensitivity in all three validation sets (AUC: 89.9–95.8%; sensitivity: 55.5–77.8%; specificity: 91.5–98.3%). When used in conjunction with radiologist interpretations of chest radiographs, the sensitivity was over 90% while maintaining moderate specificity. Our study showed that a machine learning model based on readily available laboratory markers can achieve high accuracy in predicting SARS-CoV-2 infection.
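The gain from combining a high-specificity/low-sensitivity model with radiologist chest-radiograph reads can be reasoned about with a simple "either test positive" rule. The numbers below are hypothetical, and the independence assumption is a deliberate simplification (real model and radiologist errors are usually correlated).

```python
def combine_or(sens_a, spec_a, sens_b, spec_b):
    """Sensitivity/specificity of calling a case positive when EITHER test is
    positive, assuming (unrealistically) independent errors."""
    sens = 1 - (1 - sens_a) * (1 - sens_b)   # a case is missed only if both miss
    spec = spec_a * spec_b                   # both must correctly rule out
    return sens, spec

# Hypothetical: ML model (sens 0.70, spec 0.95) + radiologist (sens 0.80, spec 0.85).
sens, spec = combine_or(0.70, 0.95, 0.80, 0.85)
```

Even under this crude assumption, the OR rule trades specificity for a large sensitivity gain, which matches the pattern reported in the abstract.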


10.2196/24048 ◽  
2020 ◽  
Vol 22 (12) ◽  
pp. e24048
Author(s):  
Timothy B Plante ◽  
Aaron M Blau ◽  
Adrian N Berg ◽  
Aaron S Weinberg ◽  
Ik C Jun ◽  
...  

Background Conventional diagnosis of COVID-19 with reverse transcription polymerase chain reaction (RT-PCR) testing (hereafter, PCR) is associated with prolonged time to diagnosis and significant costs to run the test. The SARS-CoV-2 virus might lead to characteristic patterns in the results of widely available, routine blood tests that could be identified with machine learning methodologies. Machine learning modalities integrating findings from these common laboratory test results might accelerate ruling out COVID-19 in emergency department patients. Objective We sought to develop (ie, train and internally validate with cross-validation techniques) and externally validate a machine learning model to rule out COVID-19 using only routine blood tests among adults in emergency departments. Methods Using clinical data from emergency departments (EDs) from 66 US hospitals before the pandemic (before the end of December 2019) or during the pandemic (March-July 2020), we included patients aged ≥20 years in the study time frame. We excluded those with missing laboratory results. Model training used 2183 PCR-confirmed cases from 43 hospitals during the pandemic; negative controls were 10,000 prepandemic patients from the same hospitals. External validation used 23 hospitals with 1020 PCR-confirmed cases and 171,734 prepandemic negative controls. The main outcome was COVID-19 status predicted using same-day routine laboratory results. Model performance was assessed with area under the receiver operating characteristic (AUROC) curve as well as sensitivity, specificity, and negative predictive value (NPV). Results Of 192,779 patients included in the training, external validation, and sensitivity data sets (median age decile 50 [IQR 30-60] years, 40.5% male [78,249/192,779]), AUROC for training and external validation was 0.91 (95% CI 0.90-0.92). 
Using a risk score cutoff of 1.0 (out of 100) in the external validation data set, the model achieved sensitivity of 95.9% and specificity of 41.7%; with a cutoff of 2.0, sensitivity was 92.6% and specificity was 59.9%. At the cutoff of 2.0, the NPVs at a prevalence of 1%, 10%, and 20% were 99.9%, 98.6%, and 97%, respectively. Conclusions A machine learning model developed with multicenter clinical data integrating commonly collected ED laboratory data demonstrated high rule-out accuracy for COVID-19 status, and might inform selective use of PCR-based testing.
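The prevalence-dependent NPVs quoted above follow directly from sensitivity and specificity. As a check, plugging the reported cutoff-2.0 operating point (92.6% sensitivity, 59.9% specificity) into the standard NPV formula reproduces the stated values.

```python
def npv(sensitivity, specificity, prevalence):
    """Negative predictive value from test characteristics and disease prevalence."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)

# Reported cutoff-2.0 operating point: sens 92.6%, spec 59.9%.
npv_1 = npv(0.926, 0.599, 0.01)   # NPV at 1% prevalence
npv_10 = npv(0.926, 0.599, 0.10)  # NPV at 10% prevalence
npv_20 = npv(0.926, 0.599, 0.20)  # NPV at 20% prevalence
```

This is why rule-out tools report NPV at several assumed prevalences: the same operating point behaves very differently as community prevalence rises.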


Author(s):  
Deepali R Deshpande ◽  
Raj L Shah ◽  
Anish N Shaha

The motive behind this project is to build a machine learning model for the detection of COVID-19. Using this model, it is possible to classify chest X-ray images into normal patients, pneumonia patients, and COVID-19-positive patients. This CNN-based model can drastically reduce the time to diagnosis for patients. Instead of relying on limited RT-PCR kits, a simple chest X-ray can help determine the health of the patient. Not only do we get immediate results, but social distancing norms can also be practised more effectively.
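Whatever CNN architecture is chosen, a three-class model like this is typically judged with a per-class confusion matrix and recall (sensitivity) for each class. The labels and predictions below are synthetic placeholders, not results from the project.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

classes = ["normal", "pneumonia", "covid19"]   # the three target classes

# Hypothetical ground-truth labels and CNN predictions (class indices).
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 0, 1, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0, 0, 1, 2])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
per_class_recall = cm.diagonal() / cm.sum(axis=1)
```

Per-class recall matters here because missing a COVID-19 case (low recall on that class) is far costlier than misclassifying a normal X-ray.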


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Simon Podnar ◽  
Matjaž Kukar ◽  
Gregor Gunčar ◽  
Mateja Notar ◽  
Nina Gošnjak ◽  
...  

Abstract Routine blood test results are assumed to contain much more information than is usually recognised even by the most experienced clinicians. Using routine blood tests from 15,176 neurological patients we built a machine learning predictive model for the diagnosis of brain tumours. We validated the model by retrospective analysis of 68 consecutive brain tumour and 215 control patients presenting to the neurological emergency service. Only patients with head imaging and routine blood test data were included in the validation sample. The sensitivity and specificity of the adapted tumour model in the validation group were 96% and 74%, respectively. Our data demonstrate the feasibility of brain tumour diagnosis from routine blood tests using machine learning. The reported diagnostic accuracy is comparable and possibly complementary to that of imaging studies. The presented machine learning approach opens a completely new avenue in the diagnosis of these grave neurological diseases and demonstrates the utility of valuable information obtained from routine blood tests.
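With a validation sample of this size (68 tumour and 215 control patients), the reported 96% sensitivity and 74% specificity carry non-trivial statistical uncertainty; a normal-approximation confidence interval makes that explicit. The counts below are hypothetical round numbers, not the study's actual confusion matrix.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Point estimate and normal-approximation 95% CI for a proportion."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, (max(0.0, p - half), min(1.0, p + half))

# Hypothetical counts roughly matching 96% sensitivity on 50 tumour cases.
sens, sens_ci = proportion_ci(48, 50)
```

For proportions near 0 or 1 on small samples, a Wilson or exact binomial interval would be a better choice than this simple normal approximation.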


2021 ◽  
Author(s):  
Tarik Abdelfattah ◽  
Ehsaan Nasir ◽  
Junjie Yang ◽  
Jamar Bynum ◽  
Alexander Klebanov ◽  
...  

Abstract Unconventional reservoir development is a multidisciplinary challenge due to a complicated physical system, including but not limited to complex flow mechanisms, multiple porosity systems, heterogeneous subsurface rock and minerals, well interference, and fluid-rock interaction. With enough well data, physics-based models can be supplemented with data-driven methods to describe a reservoir system and accurately predict well performance. This study uses a data-driven approach to tackle the field development problem in the Eagle Ford Shale. A large amount of data spanning major oil and gas disciplines was collected and interrogated from around 300 wells in the area of interest. The data-driven workflow consists of: a descriptive model to regress on existing wells with the selected well features and provide insight on feature importance; a predictive model to forecast well performance; and a subject-matter-expert-driven prescriptive model to optimize future well design for improved well economics. To evaluate initial well economics, 365 consecutive days of oil production per CAPEX dollar spent (bbl/$) was set up as the objective function. After careful model selection, Random Forest (RF) showed the best accuracy with the given dataset, and Differential Evolution (DE) was used for optimization. Using recursive feature elimination (RFE), the final master dataset was reduced to 50 parameters to feed into the machine learning model. After hyperparameter tuning, reasonable regression accuracy was achieved by the Random Forest algorithm, where the correlation coefficient (R2) for the training and test datasets was 0.83, and the mean absolute error percentage (MAEP) was less than 20%. The model also reveals that well performance is highly dependent on a good combination of variables spanning geology, drilling, completions, production and reservoir. 
Completion year has one of the highest feature importances, indicating improvements in operation and design efficiency and the fluctuation of service costs. Moreover, lateral rate of penetration (ROP) was always among the top two important parameters, most likely because it impacts the drilling cost significantly. With subject matter experts' (SME) input, optimization using the regression model was performed in an iterative manner with the chosen parameters and reasonable upper and lower bounds. Compared to the best existing wells in the vicinity, the optimized well design shows a potential improvement in bbl/$ of approximately 38%. This paper introduces an integrated data-driven solution to optimize unconventional development strategy. Compared to conventional analytical and numerical methods, a machine learning model is able to handle large multidimensional datasets and provide actionable recommendations with a much faster turnaround. In the course of field development, the model accuracy can be dynamically improved by including more data collected from new wells.
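The recursive feature elimination step described above can be sketched with scikit-learn's RFE wrapped around a random-forest regressor. The synthetic data and the target of 5 features are stand-ins for the study's 50-parameter master dataset.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

# Synthetic stand-in for the well dataset: 20 candidate features, 5 informative.
X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       noise=0.1, random_state=0)

# Recursively drop the least important features, two at a time, until 5 remain.
selector = RFE(estimator=RandomForestRegressor(n_estimators=50, random_state=0),
               n_features_to_select=5, step=2)
selector.fit(X, y)

n_selected = int(selector.support_.sum())   # boolean mask of kept features
```

The selected subset would then feed the final regression model, and an optimizer such as Differential Evolution could search over the controllable features within SME-chosen bounds.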


Author(s):  
Davide Brinati ◽  
Andrea Campagner ◽  
Davide Ferrari ◽  
Massimo Locatelli ◽  
Giuseppe Banfi ◽  
...  

Abstract Background The COVID-19 pandemic due to the SARS-CoV-2 coronavirus, in its first 4 months since its outbreak, has to date reached more than 200 countries worldwide, with more than 2 million confirmed cases (probably a much higher number of infected) and almost 200,000 deaths. Amplification of viral RNA by (real time) reverse transcription polymerase chain reaction (rRT-PCR) is the current gold standard test for confirmation of infection, although it presents known shortcomings: long turnaround times (3-4 hours to generate results), potential shortage of reagents, false-negative rates as large as 15-20%, the need for certified laboratories, expensive equipment and trained personnel. Thus there is a need for alternative, faster, less expensive and more accessible tests. Material and methods We developed two machine learning classification models using hematochemical values from routine blood exams (namely: white blood cell counts, and the platelets, CRP, AST, ALT, GGT, ALP, LDH plasma levels) drawn from 279 patients who, after being admitted to the San Raffaele Hospital (Milan, Italy) emergency room with COVID-19 symptoms, were screened with the rRT-PCR test performed on respiratory tract specimens. Of these patients, 177 resulted positive, whereas 102 received a negative response. Results We have developed two machine learning models to discriminate between patients who are either positive or negative for SARS-CoV-2: their accuracy ranges between 82% and 86%, and sensitivity between 92% and 95%, thus comparing well with the gold standard. We also developed an interpretable Decision Tree model as a simple decision aid for clinicians interpreting blood tests (even off-line) for COVID-19 suspect cases. Discussion This study demonstrated the feasibility and clinical soundness of using blood test analysis and machine learning as an alternative to rRT-PCR for identifying COVID-19-positive patients. 
This is especially useful in countries, such as developing ones, suffering from shortages of rRT-PCR reagents and specialized laboratories. We made available a Web-based tool for clinical reference and evaluation.
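The interpretable Decision Tree aid mentioned in the Discussion can be prototyped with a shallow scikit-learn tree. The synthetic "blood test" features and the depth limit below are illustrative choices, not the authors' published tree.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for hematochemical features (CRP, LDH, WBC, ...).
X, y = make_classification(n_samples=279, n_features=8, n_informative=4,
                           random_state=0)

# A shallow tree stays readable for clinicians, even printed on paper.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# Human-readable if/else rules, one line per split.
rules = export_text(tree, feature_names=[f"feat_{i}" for i in range(8)])
```

The depth cap is the key design choice: it sacrifices a little accuracy for a rule set a clinician can follow off-line, which is exactly the trade-off the abstract describes.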


Antioxidants ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1751
Author(s):  
Taiki Fujimoto ◽  
Hiroaki Gotoh

A chemically explainable machine learning model was constructed with a small dataset to quantitatively predict singlet-oxygen-scavenging ability. In this model, ensemble learning based on decision trees resulted in high accuracy. As explanatory variables, molecular descriptors from computational chemistry and Morgan fingerprints were used to achieve both high accuracy and simple prediction. The singlet-oxygen-scavenging mechanism was explained by the feature importance obtained from the machine learning outputs. The results are consistent with conventional chemical knowledge. The use of machine learning, together with a reduction in the number of measurements needed for screening high-antioxidant-capacity compounds, can considerably improve prediction accuracy and efficiency.
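Feature-importance explanations like the one described here can come either from a tree ensemble's built-in importances or from model-agnostic permutation importance; the latter is sketched below on synthetic data standing in for real molecular descriptors and fingerprint bits.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for molecular descriptors / Morgan fingerprint bits.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Decision-tree ensemble regressor, as in the abstract.
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

# Shuffle each feature on held-out data and measure the drop in score.
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
```

Permutation importance on held-out data is often preferred for chemical interpretation because built-in tree importances can overstate high-cardinality or correlated features.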


2020 ◽  
Author(s):  
Cabitza Federico ◽  
Campagner Andrea ◽  
Ferrari Davide ◽  
Di Resta Chiara ◽  
Ceriotti Daniele ◽  
...  

Abstract Background The rRT-PCR test, the current gold standard for the detection of coronavirus disease (COVID-19), presents known shortcomings, such as long turnaround time, potential shortage of reagents, false-negative rates around 15–20%, and expensive equipment. The hematochemical values of routine blood exams could represent a faster and less expensive alternative. Methods Three different training data sets of hematochemical values from 1,624 patients (52% COVID-19 positive), admitted to San Raffaele Hospital (OSR) from February to May 2020, were used for developing machine learning (ML) models: the complete OSR dataset (72 features: complete blood count (CBC), biochemical, coagulation, hemogasanalysis and CO-Oxymetry values, age, sex and specific symptoms at triage) and two sub-datasets (the COVID-specific and CBC datasets, with 32 and 21 features respectively). 58 cases (50% COVID-19 positive) from another hospital, and 54 negative patients collected in 2018 at OSR, were used for internal-external and external validation. Results We developed five ML models: for the complete OSR dataset, the area under the receiver operating characteristic curve (AUC) for the algorithms ranged from 0.83 to 0.90; for the COVID-specific dataset, from 0.83 to 0.87; and for the CBC dataset, from 0.74 to 0.86. The validations also achieved good results: respectively, AUC from 0.75 to 0.78, and specificity from 0.92 to 0.96. Conclusions ML can be applied to blood tests as both an adjunct to and an alternative for rRT-PCR for the fast and cost-effective identification of COVID-19-positive patients. This is especially useful in developing countries, or in countries facing an increase in contagions.
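Comparing models trained on nested feature subsets, as done here with the complete, COVID-specific, and CBC datasets, can be sketched with cross-validated AUC on column subsets. The synthetic data and the subset sizes below are placeholders, not the study's actual feature groups.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: 40 "features"; the first 15 columns mimic a CBC-style subset.
X, y = make_classification(n_samples=800, n_features=40, n_informative=12,
                           random_state=0)

def cv_auc(X_subset):
    """Mean 5-fold cross-validated AUROC for one feature subset."""
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(clf, X_subset, y, cv=5, scoring="roc_auc").mean()

auc_full = cv_auc(X)           # all 40 features ("complete dataset")
auc_cbc = cv_auc(X[:, :15])    # reduced "CBC-like" subset
```

Reporting AUC per subset, as the authors do, shows how much predictive power survives when only the cheapest, most widely available tests are used.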

