Computing SARS-CoV-2 Infection Risk From Symptoms, Imaging, and Test Data: Diagnostic Model Development (Preprint)

2020 ◽  
Author(s):  
Christopher D'Ambrosia ◽  
Henrik Christensen ◽  
Eliah Aronoff-Spencer

BACKGROUND Assigning meaningful probabilities of SARS-CoV-2 infection risk presents a diagnostic challenge across the continuum of care. OBJECTIVE The aim of this study was to develop and clinically validate an adaptable, personalized diagnostic model to assist clinicians in ruling in and ruling out COVID-19 in potential patients. We compared the diagnostic performance of probabilistic, graphical, and machine learning models against a previously published benchmark model. METHODS We integrated patient symptoms and test data using machine learning and Bayesian inference to quantify individual patient risk of SARS-CoV-2 infection. We trained models with 100,000 simulated patient profiles based on 13 symptoms and estimated local prevalence, imaging, and molecular diagnostic performance from published reports. We tested these models with consecutive patients who presented with a COVID-19–compatible illness at the University of California San Diego Medical Center over the course of 14 days starting in March 2020. RESULTS We included 55 consecutive patients with fever (n=43, 78%) or cough (n=42, 77%) presenting for ambulatory (n=11, 20%) or hospital care (n=44, 80%). In total, 51% (n=28) were female and 49% (n=27) were aged <60 years. Common comorbidities included diabetes (n=12, 22%), hypertension (n=15, 27%), cancer (n=9, 16%), and cardiovascular disease (n=7, 13%). Of these, 69% (n=38) were confirmed via reverse transcription-polymerase chain reaction (RT-PCR) to be positive for SARS-CoV-2 infection, and 20% (n=11) had repeated negative nucleic acid testing and an alternate diagnosis. Bayesian inference network, distance metric learning, and ensemble models discriminated between patients with SARS-CoV-2 infection and alternate diagnoses with sensitivities of 81.6%-84.2%, specificities of 58.8%-70.6%, and accuracies of 61.4%-71.8%. 
After integrating imaging and laboratory test statistics with the predictions of the Bayesian inference network, changes in diagnostic uncertainty at each step in the simulated clinical evaluation process were highly sensitive to location, symptom, and diagnostic test choices. CONCLUSIONS Decision support models that incorporate symptoms and available test results can help providers diagnose SARS-CoV-2 infection in real-world settings.

10.2196/24478 ◽  
2020 ◽  
Vol 22 (12) ◽  
pp. e24478
Author(s):  
Christopher D'Ambrosia ◽  
Henrik Christensen ◽  
Eliah Aronoff-Spencer

Background Assigning meaningful probabilities of SARS-CoV-2 infection risk presents a diagnostic challenge across the continuum of care. Objective The aim of this study was to develop and clinically validate an adaptable, personalized diagnostic model to assist clinicians in ruling in and ruling out COVID-19 in potential patients. We compared the diagnostic performance of probabilistic, graphical, and machine learning models against a previously published benchmark model. Methods We integrated patient symptoms and test data using machine learning and Bayesian inference to quantify individual patient risk of SARS-CoV-2 infection. We trained models with 100,000 simulated patient profiles based on 13 symptoms and estimated local prevalence, imaging, and molecular diagnostic performance from published reports. We tested these models with consecutive patients who presented with a COVID-19–compatible illness at the University of California San Diego Medical Center over the course of 14 days starting in March 2020. Results We included 55 consecutive patients with fever (n=43, 78%) or cough (n=42, 77%) presenting for ambulatory (n=11, 20%) or hospital care (n=44, 80%). In total, 51% (n=28) were female and 49% (n=27) were aged <60 years. Common comorbidities included diabetes (n=12, 22%), hypertension (n=15, 27%), cancer (n=9, 16%), and cardiovascular disease (n=7, 13%). Of these, 69% (n=38) were confirmed via reverse transcription-polymerase chain reaction (RT-PCR) to be positive for SARS-CoV-2 infection, and 20% (n=11) had repeated negative nucleic acid testing and an alternate diagnosis. Bayesian inference network, distance metric learning, and ensemble models discriminated between patients with SARS-CoV-2 infection and alternate diagnoses with sensitivities of 81.6%-84.2%, specificities of 58.8%-70.6%, and accuracies of 61.4%-71.8%. 
After integrating imaging and laboratory test statistics with the predictions of the Bayesian inference network, changes in diagnostic uncertainty at each step in the simulated clinical evaluation process were highly sensitive to location, symptom, and diagnostic test choices. Conclusions Decision support models that incorporate symptoms and available test results can help providers diagnose SARS-CoV-2 infection in real-world settings.
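The step of integrating test statistics with a model's prediction can be illustrated with a plain Bayes' rule update. The sketch below is not the authors' Bayesian inference network; the function name `post_test_probability` and all numeric values are hypothetical, and chaining two updates assumes the tests are conditionally independent given infection status.

```python
def post_test_probability(prior, sensitivity, specificity, positive):
    """Update a pre-test probability with one binary test result via Bayes' rule."""
    if positive:
        p_result_if_infected = sensitivity
        p_result_if_not_infected = 1.0 - specificity
    else:
        p_result_if_infected = 1.0 - sensitivity
        p_result_if_not_infected = specificity
    numerator = p_result_if_infected * prior
    denominator = numerator + p_result_if_not_infected * (1.0 - prior)
    return numerator / denominator


# Illustrative numbers only; none of these values come from the study.
prior = 0.10  # assumed pre-test probability (e.g., symptoms plus local prevalence)

# Step 1: a positive imaging finding raises the probability.
after_imaging = post_test_probability(prior, sensitivity=0.80, specificity=0.70, positive=True)

# Step 2: a negative molecular test lowers it again.
after_molecular = post_test_probability(after_imaging, sensitivity=0.70, specificity=0.95, positive=False)

print(round(after_imaging, 3), round(after_molecular, 3))
```

Because each posterior becomes the next prior, the size of each change depends on the starting prevalence and on which test is applied, which is consistent with the sensitivity to location and test choice described in the abstract.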


2020 ◽  
Author(s):  
Christopher D'Ambrosia ◽  
Henrik Christensen ◽  
Eliah Aronoff-Spencer

Background: Assigning meaningful probabilities of SARS-CoV-2 infection risk presents a diagnostic challenge across the continuum of care. Methods: We integrated patient symptom and test data using machine learning and Bayesian inference to quantify individual patient risk of SARS-CoV-2 infection. We trained models with 100,000 simulated patient profiles based on thirteen symptoms, estimated local prevalence, imaging, and molecular diagnostic performance from published reports. We tested these models with consecutive patients who presented with a COVID-19-compatible illness at the University of California San Diego Medical Center over 14 days starting in March 2020. Results: We included 55 consecutive patients with fever (78%) or cough (77%) presenting for ambulatory (n=11) or hospital care (n=44). In total, 51% (n=28) were female and 49% were aged <60 years. Common comorbidities included diabetes (22%), hypertension (27%), cancer (16%), and cardiovascular disease (13%). Of these, 69% (n=38) were confirmed positive for SARS-CoV-2 infection by RT-PCR, and 11 had repeated negative nucleic acid testing and an alternate diagnosis. Bayesian inference network, distance metric learning, and ensemble models discriminated between patients with SARS-CoV-2 infection and alternate diagnoses with sensitivities of 81.6% to 84.2%, specificities of 58.8% to 70.6%, and accuracies of 61.4% to 71.8%. After integrating imaging and laboratory test statistics with the predictions of the Bayesian inference network, changes in diagnostic uncertainty at each step in the simulated clinical evaluation process were highly sensitive to location, symptom, and diagnostic test choices. Conclusions: Decision support models that incorporate symptoms and available test results can help providers diagnose SARS-CoV-2 infection in real-world settings.


2020 ◽  
Author(s):  
William P.T.M. van Doorn ◽  
Floris Helmich ◽  
Paul M.E.L. van Dam ◽  
Leo H.J. Jacobs ◽  
Patricia M. Stassen ◽  
...  

Abstract Introduction Risk stratification of patients presenting to the emergency department (ED) is important for appropriate triage. Using machine learning technology, we can integrate laboratory data from a modern emergency department and present these in relation to clinically relevant endpoints for risk stratification. In this study, we developed and evaluated transparent machine learning models in four large hospitals in the Netherlands. Methods Historical laboratory data (2013-2018) available within the first two hours after presentation to the ED of Maastricht University Medical Centre+ (Maastricht), Meander Medical Center (Amersfoort), and Zuyderland (locations Sittard and Heerlen) were used. We used the first five years of data to develop the model and the sixth year to evaluate model performance in each hospital separately. Performance was assessed using area under the receiver-operating-characteristic curve (AUROC), Brier scores, and calibration curves. The SHapley Additive exPlanations (SHAP) algorithm was used to obtain transparent machine learning models. Results We included 266,327 patients with more than 7 million laboratory results available for analysis. Models possessed high diagnostic performance with AUROCs of 0.94 [0.94-0.95], 0.98 [0.97-0.98], 0.88 [0.87-0.89], and 0.90 [0.89-0.91] for Maastricht, Amersfoort, Sittard, and Heerlen, respectively. Using the SHAP algorithm, we visualized patient characteristics and laboratory results that drive patient-specific RISKINDEX predictions. As an illustrative example, we applied our models in a triage system for risk stratification that categorized 94.7% of the patients as low risk with a corresponding NPV of ≥99%. Discussion The developed machine learning models are transparent, with excellent diagnostic performance in predicting 31-day mortality in ED patients across four hospitals. Follow-up studies will assess whether implementation of these algorithms can improve clinically relevant endpoints.
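The two headline metrics named above, AUROC and the Brier score, can be computed directly from predicted probabilities and observed outcomes. The following is a minimal sketch on made-up toy data, not the study's evaluation code; the function names `auroc` and `brier_score` are illustrative.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true, y_prob):
    """Probability that a random positive case is scored above a random negative case.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg), which is
    fine for a small illustration but slow for large cohorts.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos > p_neg:
                wins += 1.0
            elif p_pos == p_neg:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data for illustration only (1 = event occurred, 0 = no event).
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

toy_auroc = auroc(y_true, y_prob)
toy_brier = brier_score(y_true, y_prob)
print(toy_auroc, toy_brier)
```

AUROC measures discrimination (ranking), while the Brier score also penalizes poorly calibrated probabilities, which is one reason to report both alongside calibration curves.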


2021 ◽  
Author(s):  
Yixuan Duan ◽  
Enrui Xie ◽  
Chang Liu ◽  
Jingjing Sun ◽  
Jie Deng

Abstract Background Abdominal aortic aneurysm (AAA), a disease with high mortality, is limited by the current diagnostic methods in the early screening. This study aimed to construct a diagnostic model for AAA by using a novel machine learning method, i.e., an ensemble of the random forest (RF) algorithm and an artificial neural network (ANN) (RF-ANN), to identify potential AAA-associated genetic biomarkers. Methods Through a search of the Gene Expression Omnibus (GEO) database, two large-sample gene expression datasets (GSE57691 and GSE47472) were identified and downloaded. The differentially expressed genes (DEGs) between the AAA and normal control samples were identified, followed by Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis using the Database for Annotation, Visualization, and Integrated Discovery (DAVID). Then, RF-ANN was used to identify the key genes from the DEGs, and an AAA diagnostic model was established. Finally, the diagnostic performance of the model was assessed using the area under the receiver operating characteristic curve (AUC) with GSE47472 as a test dataset. Results Using GSE57691, we obtained 2486 DEGs, 52 biological process annotations, 17 cellular component annotations, 17 molecular function annotations, and 13 significantly enriched KEGG pathways. Out of these DEGs, we further identified 74 key candidate feature genes by using the RF machine learning algorithm. The weight of each key gene was calculated by the ANN with GSE57691 as a training dataset to construct an AAA diagnostic model. A transcription factor (TF) regulatory network of key genes was constructed. Finally, GSE47472 was used to validate the model. The AUC value was 0.786, indicating that the model had a highly satisfactory diagnostic performance. Conclusion Potential AAA-associated gene biomarkers were identified, and a diagnostic model of AAA was established. 
This study may provide a valuable reference for early clinical diagnosis and the search for therapeutic targets of AAA.


2021 ◽  
Vol 11 ◽  
Author(s):  
Hyo-jae Lee ◽  
Anh-Tien Nguyen ◽  
So Yeon Ki ◽  
Jong Eun Lee ◽  
Luu-Ngoc Do ◽  
...  

Objective This study was conducted to investigate the feasibility of using radiomics analysis (RA) with machine learning algorithms based on breast magnetic resonance (MR) images for discriminating malignant from benign MR-detected additional lesions in patients with primary breast cancer. Materials and Methods One hundred seventy-four MR-detected additional lesions (benign, n = 86; malignancy, n = 88) from 158 patients with ipsilateral primary breast cancer from a tertiary medical center were included in this retrospective study. The entire dataset was randomly split into training (80%) and independent test (20%) sets. In addition, 25 patients (benign, n = 21; malignancy, n = 15) from another tertiary medical center were included for the external test. Radiomics features that were extracted from three regions-of-interest (ROIs; intratumor, peritumor, combined) using fat-saturated T1-weighted images obtained by subtracting pre- from postcontrast images (SUB) and T2-weighted images (T2) were used to train the support vector machine for the binary classification. A decision tree method was used to build a classifier model using clinical imaging interpretation (CII) features assessed by radiologists. Area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, and specificity were used to compare the diagnostic performance. Results The RA models trained using radiomics features from the intratumor-ROI showed comparable performance to the CII model (accuracy, AUROC: 73.3%, 69.6% for the SUB RA model; 70.0%, 75.1% for the T2 RA model; 73.3%, 72.0% for the CII model). The diagnostic performance increased when the radiomics and CII features were combined to build a fusion model. The fusion model that combines the CII features and radiomics features from multiparametric MRI data demonstrated the highest performance with an accuracy of 86.7% and an AUROC of 91.1%. The external test showed a similar pattern where the fusion models demonstrated higher levels of performance compared with the RA- or CII-only models. The accuracy and AUROC of the SUB+T2 RA+CII model in the external test were 80.6% and 91.4%, respectively. Conclusion Our study demonstrated the feasibility of using RA with a machine learning approach based on multiparametric MRI for quantitatively characterizing MR-detected additional lesions. The fusion model demonstrated an improved diagnostic performance over the models trained with either RA or CII alone.


2020 ◽  
Vol 15 ◽  
Author(s):  
Elham Shamsara ◽  
Sara Saffar Soflaei ◽  
Mohammad Tajfard ◽  
Ivan Yamshchikov ◽  
Habibollah Esmaili ◽  
...  

Background: Coronary artery disease (CAD) is an important cause of mortality and morbidity globally. Objective: Early prediction of CAD would be valuable in identifying individuals at risk and in focusing resources on its prevention. In this paper, we aimed to establish a diagnostic model to predict CAD by using three approaches of ANN (pattern recognition-ANN, LVQ-ANN, and competitive ANN). Methods: One promising method for early prediction of disease based on risk factors is machine learning. Among different machine learning algorithms, the artificial neural network (ANN) algorithms have been applied widely in medicine and a variety of real-world classifications. ANN is a nonlinear computational model, inspired by the human brain, that analyzes and processes complex datasets. Results: The ANN methods investigated in this paper indicate that, in both the pattern recognition-ANN and LVQ-ANN methods, predictions of the Angiography+ class have high accuracy. Moreover, in CNN, the correlation between the individuals in cluster "c" and the Angiography+ class is very high. This accuracy indicates a significant difference in some of the input features between the Angiography+ class and the other two output classes. A comparison of the chosen weights in these three methods in separating the control class and Angiography+ shows that hs-CRP, FSG, and WBC are the most substantial excitatory weights in recognizing the Angiography+ individuals, whereas HDL-C and MCH are identified as inhibitory weights. Furthermore, the effect of decomposing a multi-class problem into a set of binary classes and of random sampling on the accuracy of the diagnostic model is investigated. Conclusion: This study confirms that pattern recognition-ANN was the most accurate of the ANN methods examined. This is due to the back-propagation procedure, in which the network classifies input variables based on labeled classes. The results of binarization show that decomposing the multi-class set into binary sets could achieve higher accuracy.


2021 ◽  
Vol 09 (02) ◽  
pp. E233-E238
Author(s):  
Rajesh N. Keswani ◽  
Daniel Byrd ◽  
Florencia Garcia Vicente ◽  
J. Alex Heller ◽  
Matthew Klug ◽  
...  

Abstract Background and study aims Storage of full-length endoscopic procedures is becoming increasingly popular. To facilitate large-scale machine learning (ML) focused on clinical outcomes, these videos must be merged with the patient-level data in the electronic health record (EHR). Our aim was to present a method of accurately linking patient-level EHR data with cloud stored colonoscopy videos. Methods This study was conducted at a single academic medical center. Most procedure videos are automatically uploaded to the cloud server but are identified only by procedure time and procedure room. We developed and then tested an algorithm to match recorded videos with corresponding exams in the EHR based upon procedure time and room and subsequently extract frames of interest. Results Among 28,611 total colonoscopies performed over the study period, 21,170 colonoscopy videos in 20,420 unique patients (54.2 % male, median age 58) were matched to EHR data. Of 100 randomly sampled videos, appropriate matching was manually confirmed in all. In total, these videos represented 489,721 minutes of colonoscopy performed by 50 endoscopists (median 214 colonoscopies per endoscopist). The most common procedure indications were polyp screening (47.3 %), surveillance (28.9 %) and inflammatory bowel disease (9.4 %). From these videos, we extracted procedure highlights (identified by image capture; mean 8.5 per colonoscopy) and surrounding frames. Conclusions We report the successful merging of a large database of endoscopy videos stored with limited identifiers to rich patient-level data in a highly accurate manner. This technique facilitates the development of ML algorithms based upon relevant patient outcomes.
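A matching step of this kind, pairing a video with an EHR exam using only procedure room and time, might look like the sketch below. This is an assumption-laden illustration, not the authors' algorithm: the field names (`video_id`, `exam_id`, `room`, `start`), the 15-minute tolerance, and the nearest-in-time rule are all hypothetical.

```python
from datetime import datetime, timedelta


def match_videos_to_exams(videos, exams, tolerance=timedelta(minutes=15)):
    """Pair each video with the closest-in-time exam in the same room.

    A video is left unmatched when no exam in its room starts within
    `tolerance` of the video start. One-to-one matching is not enforced here.
    """
    matches = {}
    for video in videos:
        candidates = [
            exam for exam in exams
            if exam["room"] == video["room"]
            and abs(exam["start"] - video["start"]) <= tolerance
        ]
        if candidates:
            best = min(candidates, key=lambda exam: abs(exam["start"] - video["start"]))
            matches[video["video_id"]] = best["exam_id"]
    return matches


# Hypothetical records for illustration only.
videos = [
    {"video_id": "v1", "room": "A", "start": datetime(2020, 1, 6, 9, 2)},
    {"video_id": "v2", "room": "B", "start": datetime(2020, 1, 6, 9, 5)},
    {"video_id": "v3", "room": "A", "start": datetime(2020, 1, 6, 13, 0)},
]
exams = [
    {"exam_id": "e1", "room": "A", "start": datetime(2020, 1, 6, 9, 0)},
    {"exam_id": "e2", "room": "B", "start": datetime(2020, 1, 6, 9, 0)},
    {"exam_id": "e3", "room": "A", "start": datetime(2020, 1, 6, 10, 0)},
]

matches = match_videos_to_exams(videos, exams)
print(matches)
```

Leaving ambiguous or out-of-tolerance videos unmatched rather than forcing a pairing favors accuracy over coverage, which fits the abstract's emphasis on manually confirmed matching.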


2020 ◽  
Vol 41 (S1) ◽  
pp. s521-s522
Author(s):  
Debarka Sengupta ◽  
Vaibhav Singh ◽  
Seema Singh ◽  
Dinesh Tewari ◽  
Mudit Kapoor ◽  
...  

Background: The rising trend of antibiotic resistance imposes a heavy burden on healthcare both clinically and economically (US$55 billion), with 23,000 estimated annual deaths in the United States as well as increased length of stay and morbidity. Machine-learning–based methods have, of late, been used to leverage patients' clinical history and demographic information to predict antimicrobial resistance. We developed a machine-learning model ensemble that maximizes the accuracy of such a drug-sensitivity versus resistivity classification system compared to the existing best-practice methods. Methods: We first performed a comprehensive analysis of the association between infecting bacterial species and patient factors, including patient demographics, comorbidities, and certain healthcare-specific features. We leveraged the predictable nature of these complex associations to infer patient-specific antibiotic sensitivities. Various base learners, including k-NN (k-nearest neighbors) and gradient boosting machine (GBM), were used to train an ensemble model for confident prediction of antimicrobial susceptibilities. Base learner selection and model performance evaluation were performed using a variety of standard metrics, namely accuracy, precision, recall, F1 score, and Cohen κ. Results: We validated performance on the MIMIC-III database, which contains deidentified clinical data from 53,423 distinct patient admissions between 2001 and 2012 to the intensive care units (ICUs) of the Beth Israel Deaconess Medical Center in Boston, Massachusetts. From ~11,000 positive cultures, we used 4 major specimen types, namely urine, sputum, blood, and pus swab, to evaluate model performance. Figure 1 shows the receiver operating characteristic (ROC) curves obtained for bloodstream infection cases upon model building and prediction on a 70:30 split of the data. We obtained area under the curve (AUC) values of 0.88, 0.92, 0.92, and 0.94 for urine, sputum, blood, and pus swab samples, respectively. Figure 2 shows the comparative performance of our proposed method as well as some off-the-shelf classification algorithms. Conclusions: Highly accurate, patient-specific predictive antibiogram (PSPA) data can aid clinicians significantly in antibiotic recommendation in the ICU, thereby accelerating patient recovery and curbing antimicrobial resistance. Funding: This study was supported by Circle of Life Healthcare Pvt. Ltd. Disclosures: None
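Of the metrics listed above, Cohen κ is the least familiar: it corrects raw agreement between predicted and observed labels for the agreement expected by chance. The sketch below shows the standard formula on invented labels ("S" for sensitive, "R" for resistant); it is not the authors' evaluation code, and the function name `cohen_kappa` is illustrative.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    # Chance agreement: product of the two marginal frequencies, summed over classes.
    classes = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in classes) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Invented labels for illustration only.
observed_labels = ["S", "S", "R", "R", "S", "R", "S", "S"]
predicted_labels = ["S", "S", "R", "S", "S", "R", "R", "S"]

kappa = cohen_kappa(observed_labels, predicted_labels)
print(round(kappa, 3))
```

Here raw agreement is 6 of 8 (0.75), but because both label sets are mostly "S", chance agreement is already 0.53, so κ is noticeably lower than accuracy. The function is undefined when chance agreement equals 1 (all labels identical in both lists).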


Author(s):  
Cheng-Chien Lai ◽  
Wei-Hsin Huang ◽  
Betty Chia-Chen Chang ◽  
Lee-Ching Hwang

Predictors for success in smoking cessation have been studied, but a prediction model capable of providing a success rate for each patient attempting to quit smoking is still lacking. The aim of this study is to develop prediction models using machine learning algorithms to predict the outcome of smoking cessation. Data were acquired from patients who underwent a smoking cessation program at one medical center in northern Taiwan. A total of 4875 enrollments fulfilled our inclusion criteria. Models with artificial neural network (ANN), support vector machine (SVM), random forest (RF), logistic regression (LoR), k-nearest neighbor (KNN), classification and regression tree (CART), and naïve Bayes (NB) were trained to predict the final smoking status of the patients in a six-month period. Sensitivity, specificity, accuracy, and area under the receiver operating characteristic (ROC) curve (AUC or ROC value) were used to determine the performance of the models. We adopted the ANN model, which reached a slightly better performance, with a sensitivity of 0.704, a specificity of 0.567, an accuracy of 0.640, and an ROC value of 0.660 (95% confidence interval (CI): 0.617–0.702) for predicting smoking cessation outcome. A predictive model for smoking cessation was constructed. The model could aid in providing the predicted success rate for all smokers. It also has the potential to support personalized and precision medicine for the treatment of smoking cessation.

