An automated EHR-based tool for identification of patients (pts) with metastatic disease to facilitate clinical trial pt ascertainment.

2020 ◽  
Vol 38 (15_suppl) ◽  
pp. 2051-2051
Author(s):  
Jeffrey J. Kirshner ◽  
Kelly Cohn ◽  
Steven Dunder ◽  
Karri Donahue ◽  
Madeline Richey ◽  
...  

2051 Background: Efforts to facilitate patient identification for clinical trials in routine practice, such as automating electronic health record (EHR) data reviews, are hindered by the lack of information on metastatic status in structured format. We developed a machine learning tool that infers metastatic status from unstructured EHR data, and we describe its real-world implementation. Methods: This machine learning model scans EHR documents, extracting features from text snippets surrounding key words (ie, ‘Metastatic’, ‘Progression’, ‘Local’). A regularized logistic regression model was trained and used to classify patients across 5 metastatic status inference categories: highly-likely and likely positive, highly-likely and likely negative, and unknown. The model accuracy was characterized using the Flatiron Health EHR-derived de-identified database of patients with solid tumors, where manually abstracted information served as the reference standard. We assessed model accuracy using sensitivity and specificity (patients in the ‘unknown’ category omitted from numerator), negative and positive predictive values (NPV, PPV; patients ‘unknown’ included in denominator), and its performance in a real-world dataset. In a separate validation, we evaluated the accuracy gained upon additional user review of the model outputs after integration of this tool into workflows. Results: This metastatic status inference model was characterized using a sample of 66,532 patients. The model sensitivity and specificity (95% CI) were 82% (82, 83) and 95% (95, 96), respectively; PPV was 89% (89, 90) and NPV was 94% (94, 94). In the validation sample (N = 200, drawn from 5 distinct care sites), and after user review of model outputs, values increased to 97% (85, 100) for sensitivity, 98% (95, 100) for specificity, 92% (78, 98) for PPV and 99% (97, 100) for NPV. 
The model assigned 163/200 patients to the highly-likely categories, which were deemed not to require further EHR review by users. The prevalence of errors was 4% without user review, and 2% after user review. Conclusions: This machine learning model infers metastatic status from unstructured EHR data with high accuracy. The tool assigns metastatic status with high confidence in more than 75% of cases without requiring additional manual review, allowing more efficient identification of clinical trial candidates and clinical trial matching, thus mitigating a key barrier for clinical trial participation in community clinics.
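
As a rough illustration of the five-category scheme and of sensitivity and specificity with 'unknown' patients left out of the numerator, a minimal sketch follows. The probability cut-offs, function names, and toy data are assumptions made for illustration; the abstract does not report the thresholds or the implementation. Treating 'unknown' patients as neither true positives nor true negatives while keeping them in the denominator is one possible reading of the Methods.

```python
# Hypothetical cut-offs on the model's predicted probability of metastatic disease.
# These values are illustrative only; the abstract does not state them.
CUTS = (0.1, 0.4, 0.6, 0.9)


def infer_category(p_metastatic):
    """Map a predicted probability to one of the five inference categories."""
    if p_metastatic <= CUTS[0]:
        return "highly-likely negative"
    if p_metastatic <= CUTS[1]:
        return "likely negative"
    if p_metastatic < CUTS[2]:
        return "unknown"
    if p_metastatic < CUTS[3]:
        return "likely positive"
    return "highly-likely positive"


def sensitivity_specificity(truth, categories):
    """Sensitivity and specificity against an abstracted reference standard.

    'unknown' predictions never count as correct (omitted from the numerator)
    but the patients stay in the denominator.
    """
    tp = sum(1 for t, c in zip(truth, categories) if t and c.endswith("positive"))
    tn = sum(1 for t, c in zip(truth, categories) if not t and c.endswith("negative"))
    n_pos = sum(1 for t in truth if t)
    n_neg = len(truth) - n_pos
    return tp / n_pos, tn / n_neg


# Toy data: model probabilities and manually abstracted metastatic status.
probabilities = [0.95, 0.70, 0.50, 0.05, 0.30, 0.50]
truth = [True, True, True, False, False, False]

categories = [infer_category(p) for p in probabilities]
sensitivity, specificity = sensitivity_specificity(truth, categories)

# Share of validation patients placed in the highly-likely categories (from the abstract).
high_confidence_share = 163 / 200
```

With the reported counts, 163 of 200 patients is 81.5%, which is consistent with the "more than 75% of cases" stated in the Conclusions.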

Author(s):  
Xianping Du ◽  
Onur Bilgen ◽  
Hongyi Xu

Abstract Machine learning for classification has been used widely in engineering design, for example, in feasible domain recognition and hidden pattern discovery. Training an accurate machine learning model requires a large dataset; however, high computational or experimental costs are major obstacles to obtaining a large dataset for real-world problems. One possible solution is to generate a large pseudo dataset with surrogate models, which are established with a smaller set of real training data. However, it is not well understood whether the pseudo dataset benefits the classification model by providing more information or degrades the machine learning performance due to the prediction errors and uncertainties introduced by the surrogate model. This paper presents a preliminary investigation of this research question. A classification-and-regression-tree model is employed to recognize the design subspaces to support design decision-making. It is implemented on the geometric design of a vehicle energy-absorbing structure based on finite element simulations. Based on a small set of real-world data obtained by simulations, a surrogate model based on Gaussian process regression is employed to generate pseudo datasets for training. The results showed that the tree-based method could help recognize feasible design domains efficiently. Furthermore, the additional information provided by the surrogate model enhances the accuracy of classification. One important conclusion is that the accuracy of the surrogate model determines the quality of the pseudo dataset and, hence, the improvements in the machine learning model.
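
The workflow described above (a few expensive evaluations, a Gaussian process surrogate, a large pseudo dataset, then a tree classifier) can be sketched in miniature as follows. Everything here is an assumption made for illustration: the one-dimensional design variable, the `expensive_simulation` stand-in for a finite element run, the feasibility limit of 0.3, the kernel length scale, and the use of a depth-one tree (a single CART split) in place of a full classification-and-regression tree. Only the posterior mean of the Gaussian process is computed.

```python
import math


def rbf(a, b, length=0.3):
    """Squared-exponential kernel between two scalar design values."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_gp_mean(xs, ys, noise=1e-6):
    """Return the posterior-mean predictor of a GP with a constant prior mean."""
    mean = sum(ys) / len(ys)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    alpha = solve(K, [y - mean for y in ys])
    return lambda x: mean + sum(w * rbf(x, xi) for w, xi in zip(alpha, xs))


def fit_stump(xs, labels):
    """Depth-one tree: pick the split that minimizes weighted Gini impurity."""
    def gini(group):
        if not group:
            return 0.0
        p = sum(group) / len(group)
        return 2 * p * (1 - p)

    pairs = sorted(zip(xs, labels))
    best = None
    for (x0, _), (x1, _) in zip(pairs, pairs[1:]):
        t = (x0 + x1) / 2
        left = [l for x, l in pairs if x <= t]
        right = [l for x, l in pairs if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or score < best[0]:
            best = (score, t)
    return best[1]


def expensive_simulation(x):
    """Hypothetical stand-in for a costly finite element simulation."""
    return x ** 2


# Small "real" dataset: 11 expensive evaluations.
real_x = [i / 10 for i in range(11)]
real_y = [expensive_simulation(x) for x in real_x]
surrogate = fit_gp_mean(real_x, real_y)

# Large pseudo dataset: 101 cheap surrogate predictions, labelled feasible (1)
# when the predicted response reaches the assumed limit of 0.3.
pseudo_x = [i / 100 for i in range(101)]
pseudo_labels = [1 if surrogate(x) >= 0.3 else 0 for x in pseudo_x]

# Train the tree on the pseudo dataset to locate the feasible subspace.
threshold = fit_stump(pseudo_x, pseudo_labels)
```

In this toy setting the recovered split can only be as accurate as the surrogate's predictions near the feasibility boundary, which mirrors the abstract's conclusion that surrogate accuracy determines the quality of the pseudo dataset.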


2020 ◽  
Author(s):  
Yingjian Liang ◽  
Chengrui Zhu ◽  
Cong Tian ◽  
Qizhong Lin ◽  
Zhiliang Li ◽  
...  

Abstract Background: This study was performed to develop and validate machine learning models for the early detection of ventilator-associated pneumonia (VAP) 24 h before diagnosis, so that VAP patients can receive early intervention and the occurrence of complications can be reduced. Patients and Methods: This study was based on the MIMIC-III dataset, which was a retrospective cohort. The random forest algorithm was applied to construct a base classifier, and the area under the receiver operating characteristic (ROC) curve (AUC), sensitivity and specificity of the prediction model were evaluated. Meanwhile, a Clinical Pulmonary Infection Score (CPIS)-based model (threshold value ≥ 3) using the same training and test data set was used as the control model. Results: A total of 38,515 ventilation durations occurred in 61,532 ICU admissions. VAP occurred in 212 of these durations. We incorporated 42 VAP risk factors on admission and routinely measured vital characteristics and laboratory results. Five-fold cross-validation was performed to evaluate the model performance, and the model achieved an AUC of 84.4%±1.7% on validation, 74.3%±2.5% sensitivity and 70.7.6%±1.2% specificity 24 h before the gold standard time (at least 48 h after ventilation). Our VAP machine learning model improved the AUC of the CPIS-based model by almost 25%, and the sensitivity and specificity were also improved by almost 14% and 15%, respectively. Conclusions: We developed and internally validated an automated model of VAP prediction in the MIMIC-III cohort. The VAP prediction model achieved high performance for AUC, sensitivity and specificity, and its performance was superior to that of the CPIS model. External validation and prospective interventional or outcome studies using this prediction model are envisioned as future work.
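
To make the evaluation procedure concrete, the sketch below shows how a five-fold cross-validated AUC can be reported as mean ± standard deviation. It is not the authors' pipeline: a one-feature nearest-centroid scorer stands in for the random forest, the data are synthetic, and all function names are hypothetical.

```python
import random
import statistics


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid_scorer(xs, ys):
    """Stand-in model (not a random forest): higher score = closer to the positive centroid."""
    c_pos = statistics.mean(x for x, y in zip(xs, ys) if y == 1)
    c_neg = statistics.mean(x for x, y in zip(xs, ys) if y == 0)
    return lambda x: abs(x - c_neg) - abs(x - c_pos)


def cross_validated_auc(xs, ys, k=5):
    """Train on k-1 folds, score the held-out fold, and summarize the k AUCs."""
    fold_aucs = []
    for fold in range(k):
        test_idx = [i for i in range(len(xs)) if i % k == fold]
        train_idx = [i for i in range(len(xs)) if i % k != fold]
        score = fit_centroid_scorer([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        fold_aucs.append(auc([ys[i] for i in test_idx],
                             [score(xs[i]) for i in test_idx]))
    return statistics.mean(fold_aucs), statistics.stdev(fold_aucs)


# Synthetic single-feature data: positives are shifted upward by 2 units.
rng = random.Random(0)
ys = [i % 2 for i in range(200)]
xs = [rng.gauss(2.0 if y else 0.0, 1.0) for y in ys]

mean_auc, sd_auc = cross_validated_auc(xs, ys)
```

Each fold's model is fitted only on the other four folds, so the reported spread reflects variation across held-out data rather than training fit.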


2018 ◽  
Author(s):  
Steen Lysgaard ◽  
Paul C. Jennings ◽  
Jens Strabo Hummelshøj ◽  
Thomas Bligaard ◽  
Tejs Vegge

A machine learning model is used as a surrogate fitness evaluator in a genetic algorithm (GA) optimization of the atomic distribution of Pt-Au nanoparticles. The machine learning accelerated genetic algorithm (MLaGA) yields a 50-fold reduction of required energy calculations compared to a traditional GA.
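
The idea of using a learned model to screen candidates before spending an expensive energy calculation can be sketched as below. This is a toy under stated assumptions, not the MLaGA implementation: `true_energy` is a hypothetical stand-in for an energy calculation on a 12-site Pt/Au occupancy string, a 1-nearest-neighbour lookup replaces whatever machine learning model the authors used, and the GA operators and parameters are chosen only for illustration.

```python
import random


def true_energy(bits):
    """Hypothetical expensive calculation: lower is better, rewards mixed neighbours."""
    return -sum(1 for a, b in zip(bits, bits[1:]) if a != b)


def surrogate_predict(bits, archive):
    """Stand-in surrogate: energy of the nearest evaluated structure (Hamming distance)."""
    nearest = min(archive, key=lambda item: sum(x != y for x, y in zip(bits, item[0])))
    return nearest[1]


def mlaga(n_sites=12, pop_size=8, generations=15, offspring=24, keep=2, seed=0):
    rng = random.Random(seed)
    archive = []  # every (structure, energy) pair evaluated with true_energy
    calls = 0

    def evaluate(bits):
        nonlocal calls
        calls += 1
        energy = true_energy(bits)
        archive.append((bits, energy))
        return energy

    population = [tuple(rng.randint(0, 1) for _ in range(n_sites))
                  for _ in range(pop_size)]
    scored = [(evaluate(b), b) for b in population]
    initial_best = min(scored)[0]

    for _ in range(generations):
        candidates = []
        for _ in range(offspring):
            (_, p1), (_, p2) = rng.sample(scored, 2)
            cut = rng.randrange(1, n_sites)       # one-point crossover
            child = list(p1[:cut] + p2[cut:])
            i = rng.randrange(n_sites)            # single-site mutation
            child[i] = 1 - child[i]
            candidates.append(tuple(child))
        # The surrogate ranks all offspring; only the most promising few
        # are passed to the expensive evaluator.
        candidates.sort(key=lambda b: surrogate_predict(b, archive))
        for b in candidates[:keep]:
            scored.append((evaluate(b), b))
        scored = sorted(scored)[:pop_size]

    return scored[0], initial_best, calls


best, initial_best, calls = mlaga()
```

With these settings the loop makes 8 + 15 × 2 = 38 expensive evaluations, whereas evaluating every offspring directly would take 8 + 15 × 24 = 368. The ratio depends entirely on the assumed parameters and says nothing about the 50-fold figure reported above.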


Author(s):  
Dhilsath Fathima.M ◽  
S. Justin Samuel ◽  
R. Hari Haran

Aim: This work develops an improved and robust machine learning model for predicting myocardial infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine learning based computer-aided analysis system for early and accurate prediction of MI, using the Framingham Heart Study dataset for validation and evaluation. The proposed computer-aided analysis model will support medical professionals in predicting myocardial infarction proficiently. Methods: The proposed model uses mean imputation to fill in missing values in the dataset, then applies principal component analysis (PCA) to extract the optimal features and enhance classifier performance. After PCA, the reduced features are partitioned into a training dataset and a testing dataset: 70% of the data is used to train four widely used classifiers (support vector machine, k-nearest neighbor, logistic regression, and decision tree), and the remaining 30% is used to evaluate the models with performance metrics including the confusion matrix, classifier accuracy, precision, sensitivity, F1-score, and AUC-ROC curve. Results: The classifier outputs were evaluated using these performance measures. Logistic regression provided higher accuracy than the k-NN, SVM, and decision tree classifiers, and PCA performed well as a feature extraction method for enhancing the performance of the proposed model. From these analyses, we conclude that logistic regression has a good mean accuracy and standard deviation of accuracy compared with the other three algorithms. The AUC-ROC curves of the classifiers (Figures 4 and 5) show that logistic regression achieves a good AUC-ROC score, around 70%, compared with the k-NN and decision tree algorithms. 
Conclusion: From the result analysis, we infer that the proposed machine learning model can act as an optimal decision-making system for predicting acute myocardial infarction at an earlier stage than existing machine learning based prediction models. It can predict the presence of acute myocardial infarction in humans from heart disease risk factors, helping to decide when to start lifestyle modification and medical treatment to prevent heart disease.
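
A reduced version of the Methods pipeline is sketched below: mean imputation, a 70/30 split, one of the four classifiers (k-nearest neighbor), and confusion-matrix metrics. The PCA step and the other three classifiers are omitted for brevity, the data are synthetic rather than the Framingham dataset, and all names and parameters (for example k = 5) are assumptions made for illustration.

```python
import random


def impute_means(rows):
    """Replace each missing value (None) with the mean of its column."""
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        observed = [r[c] for r in rows if r[c] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[c] if r[c] is None else r[c] for c in range(n_cols)] for r in rows]


def knn_predict(train_x, train_y, x, k=5):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if votes * 2 > k else 0


def classification_metrics(truth, predicted):
    """Accuracy, precision, sensitivity and F1 from the confusion matrix counts."""
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"accuracy": (tp + tn) / len(truth), "precision": precision,
            "sensitivity": sensitivity, "f1": f1}


# Synthetic two-feature data with roughly 10% of rows missing one value.
rng = random.Random(1)
labels = [i % 2 for i in range(100)]
rows = [[rng.gauss(3.0 * y, 1.0), rng.gauss(3.0 * y, 1.0)] for y in labels]
for r in rows:
    if rng.random() < 0.1:
        r[rng.randrange(2)] = None

features = impute_means(rows)

# 70/30 partition after shuffling.
order = list(range(len(features)))
rng.shuffle(order)
train_idx, test_idx = order[:70], order[70:]
train_x = [features[i] for i in train_idx]
train_y = [labels[i] for i in train_idx]
test_y = [labels[i] for i in test_idx]

predictions = [knn_predict(train_x, train_y, features[i]) for i in test_idx]
metrics = classification_metrics(test_y, predictions)
```

The sketch imputes before splitting because that is the order the abstract describes. Fitting the imputation (and PCA) on the training portion only is a common alternative that avoids leaking test-set information into preprocessing.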


Author(s):  
Dhaval Patel ◽  
Shrey Shrivastava ◽  
Wesley Gifford ◽  
Stuart Siegel ◽  
Jayant Kalagnanam ◽  
...  
