Systematic Review of Approaches to Preserve Machine Learning Performance in the Presence of Temporal Dataset Shift in Clinical Medicine

2021 ◽  
Vol 12 (04) ◽  
pp. 808-815
Author(s):  
Lin Lawrence Guo ◽  
Stephen R. Pfohl ◽  
Jason Fries ◽  
Jose Posada ◽  
Scott Lanyon Fleming ◽  
...  

Objective: The change in performance of machine learning models over time as a result of temporal dataset shift is a barrier to machine learning-derived models facilitating decision-making in clinical practice. Our aim was to describe technical procedures used to preserve the performance of machine learning models in the presence of temporal dataset shifts. Methods: Studies were included if they were fully published articles that used machine learning and implemented a procedure to mitigate the effects of temporal dataset shift in a clinical setting. We described how dataset shift was measured, the procedures used to preserve model performance, and their effects. Results: Of 4,457 potentially relevant publications identified, 15 were included. The impact of temporal dataset shift was primarily quantified using changes, usually deterioration, in calibration or discrimination. Calibration deterioration was more common (n = 11) than discrimination deterioration (n = 3). Mitigation strategies were categorized as model level or feature level. Model-level approaches (n = 15) were more common than feature-level approaches (n = 2), with the most common approaches being model refitting (n = 12), probability calibration (n = 7), model updating (n = 6), and model selection (n = 6). In general, all mitigation strategies were successful at preserving calibration but not uniformly successful in preserving discrimination. Conclusion: There was limited research in preserving the performance of machine learning models in the presence of temporal dataset shift in clinical medicine. Future research could focus on the impact of dataset shift on clinical decision making, benchmark the mitigation strategies on a wider range of datasets and tasks, and identify optimal strategies for specific settings.
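
The abstract above describes temporal dataset shift as being quantified through changes in calibration or discrimination. As a rough illustration only, the sketch below computes one discrimination measure (AUC) and one calibration measure (calibration-in-the-large, meaning mean predicted risk minus observed event rate) per time period. The function names, the record layout, and the toy numbers are assumptions made for this example and are not taken from any of the reviewed studies.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Discrimination: chance that a random event case is ranked above a random non-event case."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    # Ties between an event and a non-event count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def calibration_gap(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Calibration-in-the-large: mean predicted risk minus observed event rate."""
    return sum(y_prob) / len(y_prob) - sum(y_true) / len(y_true)


def performance_by_period(
    records: Iterable[Tuple[int, int, float]]
) -> Dict[int, Dict[str, float]]:
    """Group (period, outcome, predicted probability) records and score each period."""
    grouped: Dict[int, Tuple[List[int], List[float]]] = {}
    for period, outcome, prob in records:
        ys, ps = grouped.setdefault(period, ([], []))
        ys.append(outcome)
        ps.append(prob)
    return {
        period: {"auc": auc(ys, ps), "calibration_gap": calibration_gap(ys, ps)}
        for period, (ys, ps) in sorted(grouped.items())
    }


# Hypothetical toy data: the event rate drops in the later period while the
# model keeps producing the same risk scores.
toy_records = [
    (2018, 1, 0.8), (2018, 0, 0.2), (2018, 1, 0.7), (2018, 0, 0.3),
    (2020, 1, 0.8), (2020, 0, 0.2), (2020, 0, 0.7), (2020, 0, 0.3),
]
toy_report = performance_by_period(toy_records)
```

In this toy setup the ranking of patients stays intact in both periods while the later period is over-predicted, which is one way calibration can deteriorate without any loss of discrimination.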

2019 ◽  
Vol 3 (s1) ◽  
pp. 60-61
Author(s):  
Kadie Clancy ◽  
Esmaeel Dadashzadeh ◽  
Christof Kaltenmeier ◽  
JB Moses ◽  
Shandong Wu

OBJECTIVES/SPECIFIC AIMS: This retrospective study aims to create and train machine learning models using a radiomic-based feature extraction method for two classification tasks: benign vs. pathologic PI and operation of benefit vs. operation not needed. The long-term goal of our study is to build a computerized model that incorporates both radiomic features and critical non-imaging clinical factors to improve current surgical decision-making when managing PI patients. METHODS/STUDY POPULATION: Searched radiology reports from 2010-2012 via the UPMC MARS Database for reports containing the term “pneumatosis” (subsequently accounting for negations and age restrictions). Our inclusion criteria included: patient age 18 or older, clinical data available at time of CT diagnosis, and PI visualized on manual review of imaging. Cases with intra-abdominal free air were excluded. Collected CT imaging data and an additional 149 clinical data elements per patient for a total of 75 PI cases. Data collection of an additional 225 patients is ongoing. We trained models for two clinically-relevant prediction tasks. The first (referred to as prediction task 1) classifies between benign and pathologic PI. Benign PI is defined as either lack of intraoperative visualization of transmural intestinal necrosis or successful non-operative management until discharge. Pathologic PI is defined as either intraoperative visualization of transmural PI or withdrawal of care and subsequent death during hospitalization. The distribution of data samples for prediction task 1 is 47 benign cases and 38 pathologic cases. The second (referred to as prediction task 2) classifies between whether the patient benefitted from an operation or not. “Operation of benefit” is defined as patients with PI, be it transmural or simply mucosal, who benefited from an operation. 
“Operation not needed” is defined as patients who were safely discharged without an operation or patients who had an operation, but nothing was found. The distribution of data samples for prediction task 2 is 37 operation not needed cases and 38 operation of benefit cases. An experienced surgical resident from UPMC manually segmented 3D PI ROIs from the CT scans (5 mm Axial cut) for each case. The most concerning ~10-15 cm segment of bowel for necrosis with a 1 cm margin was selected. A total of 7 slices per patient were segmented for consistency. For both prediction task 1 and prediction task 2, we independently completed the following procedure for testing and training: 1.) Extracted radiomic features from the 3D PI ROIs that resulted in 99 total features. 2.) Used LASSO feature selection to determine the subset of the original 99 features that are most significant for performance of the prediction task. 3.) Used leave-one-out cross-validation for testing and training to account for the small dataset size in our preliminary analysis. Implemented and trained several machine learning models (AdaBoost, SVM, and Naive Bayes). 4.) Evaluated the trained models in terms of AUC and Accuracy and determined the ideal model structure based on these performance metrics. RESULTS/ANTICIPATED RESULTS: Prediction Task 1: The top-performing model for this task was an SVM model trained using 19 features. This model had an AUC of 0.79 and an accuracy of 75%. Prediction Task 2: The top-performing model for this task was an SVM model trained using 28 features. This model had an AUC of 0.74 and an accuracy of 64%. DISCUSSION/SIGNIFICANCE OF IMPACT: To the best of our knowledge, this is the first study to use radiomic-based machine learning models for the prediction of tissue ischemia, specifically intestinal ischemia in the setting of PI. 
In this preliminary study, which serves as a proof of concept, the performance of our models has demonstrated the potential of machine learning based only on radiomic imaging features to have discriminative power for surgical decision-making problems. While many non-imaging-related clinical factors play a role in the gestalt of clinical decision making when PI presents, we have presented radiomic-based models that may augment this decision-making process, especially for more difficult cases when clinical features indicating acute abdomen are absent. It should be noted that prediction task 2, whether or not a patient presenting with PI would benefit from an operation, has lower performance than prediction task 1 and is also a more challenging task for physicians in real clinical environments. While our results are promising and demonstrate potential, we are currently working to increase our dataset to 300 patients to further train and assess our models.
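
The procedure in the abstract above relies on leave-one-out cross-validation because the dataset is small. The sketch below shows the general shape of that loop. It is only an illustration: a nearest-centroid classifier stands in for the SVM, AdaBoost, and Naive Bayes models the authors name, and the feature vectors and the names `fit_nearest_centroid` and `leave_one_out_accuracy` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def fit_nearest_centroid(X: List[Vector], y: List[int]) -> Callable[[Vector], int]:
    """Stand-in model (not the authors' SVM): predict the class with the closest mean vector."""
    centroids: Dict[int, List[float]] = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Vector) -> int:
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )

    return predict


def leave_one_out_accuracy(
    X: List[Vector],
    y: List[int],
    fit: Callable[[List[Vector], List[int]], Callable[[Vector], int]] = fit_nearest_centroid,
) -> float:
    """Hold out each case once, train on the rest, and score the held-out case."""
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        predict = fit(X_train, y_train)
        correct += int(predict(X[i]) == y[i])
    return correct / len(X)


# Hypothetical two-feature toy data, not radiomic features from the study.
toy_X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
toy_y = [0, 0, 0, 1, 1, 1]
toy_accuracy = leave_one_out_accuracy(toy_X, toy_y)
```

A common design choice, which the abstract does not confirm either way, is to repeat any feature selection step inside `fit` for each fold so that the held-out case never influences which features are kept.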


2021 ◽  
Vol 39 (28_suppl) ◽  
pp. 330-330
Author(s):  
Teja Ganta ◽  
Stephanie Lehrman ◽  
Rachel Pappalardo ◽  
Madalene Crow ◽  
Meagan Will ◽  
...  

Background: Machine learning models are well-positioned to transform cancer care delivery by providing oncologists with more accurate or accessible information to augment clinical decisions. Many machine learning projects, however, focus on model accuracy without considering the impact of using the model in real-world settings and rarely carry forward to clinical implementation. We present a human-centered systems engineering approach to address clinical problems with workflow interventions utilizing machine learning algorithms. Methods: We aimed to develop a mortality predictive tool, using a Random Forest algorithm, to identify oncology patients at high risk of death within 30 days to move advance care planning (ACP) discussions earlier in the illness trajectory. First, a project sponsor defined the clinical need and requirements of an intervention. The data scientists developed the predictive algorithm using data available in the electronic health record (EHR). A multidisciplinary workgroup was assembled including oncology physicians, advanced practice providers, nurses, social workers, chaplain, clinical informaticists, and data scientists. Meeting bi-monthly, the group utilized human-centered design (HCD) methods to understand clinical workflows and identify points of intervention. The workgroup completed a workflow redesign workshop, a 90-minute facilitated group discussion, to integrate the model in a future state workflow. An EHR (Epic) analyst built the user interface to support the intervention per the group’s requirements. The workflow was piloted in thoracic oncology and bone marrow transplant with plans to scale to other cancer clinics. Results: Our predictive model performance on test data was acceptable (sensitivity 75%, specificity 75%, F-1 score 0.71, AUC 0.82). 
The workgroup identified a “quality of life coordinator” who: reviews an EHR report of patients scheduled in the upcoming 7 days who have a high risk of 30-day mortality; works with the oncology team to determine ACP clinical appropriateness; documents the need for ACP; identifies potential referrals to supportive oncology, social work, or chaplain; and coordinates the oncology appointment. The oncologist receives a reminder on the day of the patient’s scheduled visit. Conclusions: This workgroup is a viable approach that can be replicated at institutions to address clinical needs and realize the full potential of machine learning models in healthcare. The next steps for this project are to address end-user feedback from the pilot, expand the intervention to other cancer disease groups, and track clinical metrics.
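
The abstract above reports sensitivity, specificity, and F-1 score. For readers less familiar with how these relate, the sketch below derives them from confusion-matrix counts. The counts are hypothetical: they were chosen for this example so that the resulting rates match the reported ones, and they are not the study's test data. AUC is left out because it depends on the full set of risk scores and cannot be recovered from four counts.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Derive common threshold-based metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true events that were flagged
    specificity = tn / (tn + fp)   # share of non-events that were not flagged
    precision = tp / (tp + fp)     # share of flagged cases that were true events
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for 100 patients with 40 events (assumed, not from the study).
toy_metrics = classification_metrics(tp=30, fp=15, tn=45, fn=10)
```

With these assumed counts, sensitivity and specificity are both 0.75 and the F-1 score rounds to 0.71, which shows that the reported combination is arithmetically consistent for at least one plausible event rate.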


Author(s):  
Chenxi Huang ◽  
Shu-Xia Li ◽  
César Caraballo ◽  
Frederick A. Masoudi ◽  
John S. Rumsfeld ◽  
...  

Background: New methods such as machine learning techniques have been increasingly used to enhance the performance of risk predictions for clinical decision-making. However, commonly reported performance metrics may not be sufficient to capture the advantages of these newly proposed models for their adoption by health care professionals to improve care. Machine learning models often improve risk estimation for certain subpopulations that may be missed by these metrics. Methods and Results: This article addresses the limitations of commonly reported metrics for performance comparison and proposes additional metrics. Our discussions cover metrics related to overall performance, discrimination, calibration, resolution, reclassification, and model implementation. Models for predicting acute kidney injury after percutaneous coronary intervention are used to illustrate the use of these metrics. Conclusions: We demonstrate that commonly reported metrics may not have sufficient sensitivity to identify improvement of machine learning models and propose the use of a comprehensive list of performance metrics for reporting and comparing clinical risk prediction models.
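
Two of the metric families named in the abstract above, overall performance and reclassification, can be illustrated with short formulas. The sketch below computes the Brier score and a two-category net reclassification improvement (NRI) at a single risk threshold. This is a minimal illustration under assumed inputs; the function names and toy probabilities are hypothetical, and the article may define or apply these metrics differently.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Overall performance: mean squared difference between predicted risk and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_reclassification_improvement(
    y_true: Sequence[int],
    old_prob: Sequence[float],
    new_prob: Sequence[float],
    threshold: float,
) -> float:
    """Two-category NRI: net correct movement across one risk threshold.

    Events should move up (low to high risk); non-events should move down.
    """
    up_event = down_event = up_nonevent = down_nonevent = 0
    for y, p_old, p_new in zip(y_true, old_prob, new_prob):
        old_high = p_old >= threshold
        new_high = p_new >= threshold
        if new_high and not old_high:
            if y == 1:
                up_event += 1
            else:
                up_nonevent += 1
        elif old_high and not new_high:
            if y == 1:
                down_event += 1
            else:
                down_nonevent += 1
    n_events = sum(y_true)
    n_nonevents = len(y_true) - n_events
    return (up_event - down_event) / n_events + (down_nonevent - up_nonevent) / n_nonevents


# Hypothetical predictions from an "old" and a "new" model for four patients.
toy_y = [1, 1, 0, 0]
toy_old = [0.4, 0.6, 0.4, 0.6]
toy_new = [0.7, 0.6, 0.3, 0.4]
toy_nri = net_reclassification_improvement(toy_y, toy_old, toy_new, threshold=0.5)
```

In the toy data the new model moves one event above the threshold and one non-event below it, so the NRI is positive and the Brier score falls.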


2021 ◽  
Vol 28 (1) ◽  
pp. e100439
Author(s):  
Lukasz S Wylezinski ◽  
Coleman R Harris ◽  
Cody N Heiser ◽  
Jamieson D Gray ◽  
Charles F Spurlock

Introduction: The SARS-CoV-2 (COVID-19) pandemic has exposed health disparities throughout the USA, particularly among racial and ethnic minorities. As a result, there is a need for data-driven approaches to pinpoint the unique constellation of clinical and social determinants of health (SDOH) risk factors that give rise to poor patient outcomes following infection in US communities. Methods: We combined county-level COVID-19 testing data, COVID-19 vaccination rates and SDOH information in Tennessee. Between February and May 2021, we trained machine learning models on a semimonthly basis using these datasets to predict COVID-19 incidence in Tennessee counties. We then analyzed SDOH data features at each time point to rank the impact of each feature on model performance. Results: Our results indicate that COVID-19 vaccination rates play a crucial role in determining future COVID-19 disease risk. Beginning in mid-March 2021, higher vaccination rates significantly correlated with lower COVID-19 case growth predictions. Further, as the relative importance of COVID-19 vaccination data features grew, the importance of demographic SDOH features such as age, race and ethnicity decreased while the impact of socioeconomic and environmental factors, including access to healthcare and transportation, increased. Conclusion: Incorporating a data framework to track the evolving patterns of community-level SDOH risk factors could provide policy-makers with additional data resources to improve health equity and resilience to future public health emergencies.


Sensors ◽  
2019 ◽  
Vol 19 (16) ◽  
pp. 3491 ◽  
Author(s):  
Issam Hammad ◽  
Kamal El-Sankary

Accuracy evaluation in machine learning is based on splitting the data into a training set and a test set. This critical step is applied when developing machine learning models, including models based on sensor data. For sensor-based problems, comparing the accuracy of machine learning models using the train/test split provides only a baseline comparison under ideal conditions. Such comparisons do not account for practical production problems that can affect inference accuracy, such as the sensors’ thermal noise, performance with lower inference quantization, and tolerance to sensor failure. Therefore, this paper proposes a set of practical tests that can be applied when comparing the accuracy of machine learning models for sensor-based problems. First, the impact of the sensors’ thermal noise on the models’ inference accuracy was simulated. Machine learning algorithms have different levels of error resilience to thermal noise, as will be presented. Second, the models’ accuracy using lower inference quantization was compared. Lowering inference quantization allows a lower analog-to-digital converter (ADC) resolution, which is cost-effective in embedded designs. Moreover, in custom designs, the ADC’s effective number of bits (ENOB) is usually lower than the ideal number of bits due to various design factors. Therefore, it is practical to compare models’ accuracy using lower inference quantization. Third, the models’ accuracy tolerance to sensor failure was evaluated and compared. For this study, the University of California Irvine (UCI) ‘Daily and Sports Activities’ dataset was used to present these practical tests and their impact on model selection.
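
The three practical tests described above (thermal noise, lower inference quantization, and sensor failure) can each be pictured as a transformation applied to test inputs before a trained model scores them. The sketch below is one way to set that up, under assumptions made for this example: noise is modelled as zero-mean Gaussian, quantization as uniform rounding over a fixed input range, and a failed sensor as a channel stuck at a fill value. The function names and the threshold model are hypothetical and are not the paper's implementation.

```python
import random
from typing import Callable, List, Sequence

Sample = Sequence[float]


def add_thermal_noise(sample: Sample, sigma: float, rng: random.Random) -> List[float]:
    """Assumed noise model: independent zero-mean Gaussian noise on every channel."""
    return [x + rng.gauss(0.0, sigma) for x in sample]


def quantize(sample: Sample, bits: int, lo: float, hi: float) -> List[float]:
    """Round each reading to one of 2**bits evenly spaced levels in [lo, hi]."""
    step = (hi - lo) / (2 ** bits - 1)
    out = []
    for x in sample:
        clipped = min(max(x, lo), hi)  # readings outside the range saturate
        out.append(lo + round((clipped - lo) / step) * step)
    return out


def fail_sensor(sample: Sample, index: int, fill: float = 0.0) -> List[float]:
    """Assumed failure model: one channel reads a constant fill value."""
    return [fill if i == index else x for i, x in enumerate(sample)]


def accuracy(
    predict: Callable[[Sample], int],
    X: Sequence[Sample],
    y: Sequence[int],
    transform: Callable[[Sample], Sample] = lambda s: s,
) -> float:
    """Score a trained model on test inputs after applying a degradation."""
    return sum(predict(transform(x)) == label for x, label in zip(X, y)) / len(X)


def toy_model(sample: Sample) -> int:
    """Hypothetical stand-in for a trained model: threshold on the mean reading."""
    return int(sum(sample) / len(sample) > 0.5)


toy_X = [[0.1, 0.2], [0.9, 0.8]]
toy_y = [0, 1]
clean_acc = accuracy(toy_model, toy_X, toy_y)
quantized_acc = accuracy(toy_model, toy_X, toy_y, lambda s: quantize(s, 1, 0.0, 1.0))
failed_acc = accuracy(toy_model, toy_X, toy_y, lambda s: fail_sensor(s, 0))
noisy_acc = accuracy(
    toy_model, toy_X, toy_y, lambda s: add_thermal_noise(s, 0.05, random.Random(0))
)
```

Keeping each degradation as a separate transform makes it straightforward to run the same comparison across several candidate models and to sweep the noise level, bit width, or failed channel.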


mSystems ◽  
2019 ◽  
Vol 4 (4) ◽  
Author(s):  
Finlay Maguire ◽  
Muhammad Attiq Rehman ◽  
Catherine Carrillo ◽  
Moussa S. Diarra ◽  
Robert G. Beiko

ABSTRACT Nontyphoidal Salmonella (NTS) is a leading global cause of bacterial foodborne morbidity and mortality. Our ability to treat severe NTS infections has been impaired by increasing antimicrobial resistance (AMR). To understand and mitigate the global health crisis AMR represents, we need to link the observed resistance phenotypes with their underlying genomic mechanisms. Broiler chickens represent a key reservoir and vector for NTS infections, but isolates from this setting have been characterized in only very low numbers relative to clinical isolates. In this study, we sequenced and assembled 97 genomes encompassing 7 serotypes isolated from broiler chickens on farms in British Columbia between 2005 and 2008. Through application of machine learning (ML) models to predict the observed AMR phenotype from this genomic data, we were able to generate highly precise (0.92 to 0.99) logistic regression models using known AMR gene annotations as features for 7 antibiotics (amoxicillin-clavulanic acid, ampicillin, cefoxitin, ceftiofur, ceftriaxone, streptomycin, and tetracycline). Similarly, we also trained “reference-free” k-mer-based set-covering machine phenotypic prediction models (0.91 to 1.0 precision) for these antibiotics. By combining the inferred k-mers and logistic regression weights, we identified the primary drivers of AMR for the 7 studied antibiotics in these isolates. With our research representing one of the largest studies of a diverse set of NTS isolates from broiler chickens, we can thus confirm that the AmpC-like CMY-2 β-lactamase is a primary driver of β-lactam resistance and that the phosphotransferases APH(6)-Id and APH(3″)-Ib are the principal drivers of streptomycin resistance in this important ecosystem. IMPORTANCE Antimicrobial resistance (AMR) represents an existential threat to the function of modern medicine. Genomics and machine learning methods are being increasingly used to analyze and predict AMR. 
This type of surveillance is important for reducing the impact of AMR. Machine learning models are typically trained using genomic data, but the aspects of the genomes that they use to make predictions are rarely analyzed. In this work, we showed how, by using different types of machine learning models and performing this analysis, it is possible to identify the key genes underlying AMR in nontyphoidal Salmonella (NTS). NTS is among the leading causes of foodborne illness globally; however, AMR in NTS has not been heavily studied within the food chain itself. Therefore, in this work we performed a broad-scale analysis of the AMR in NTS isolates from commercial chicken farms and identified some priority AMR genes for surveillance.
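
The "reference-free" models mentioned above take k-mers, rather than annotated genes, as input features. The sketch below shows only that featurization step: counting k-mers in a sequence and turning a set of genomes into a presence/absence matrix. It does not implement the set-covering machine itself, and the function names and the short toy sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in a sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def presence_matrix(
    genomes: Dict[str, str], k: int
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Build one presence/absence row per genome over the shared k-mer vocabulary."""
    per_genome = {name: set(kmer_counts(seq, k)) for name, seq in genomes.items()}
    vocabulary = sorted(set().union(*per_genome.values()))
    matrix = {
        name: [int(kmer in kmers) for kmer in vocabulary]
        for name, kmers in per_genome.items()
    }
    return vocabulary, matrix


# Hypothetical toy sequences, far shorter than assembled genomes.
toy_genomes = {"isolate_a": "ACGT", "isolate_b": "ACGA"}
toy_vocabulary, toy_matrix = presence_matrix(toy_genomes, k=3)
```

Each column of the resulting matrix corresponds to one k-mer, so a model that selects a small number of columns can be traced back to specific sequence fragments, which is the kind of interpretability the abstract emphasizes.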


This study aims to predict rainfall with machine learning models using a minimal set of features: temperature, vapour pressure and relative humidity. Many earlier studies used more features than this one. A 75-25 training-test split was used. The best results were obtained by combining the best of the candidate models into an ensemble model. In the ensemble, the predictor importance of vapour pressure was 0.89 and that of relative humidity was 0.11. Temperature was not identified as a significant predictor of rainfall, although the high correlation of temperature (°C) with vapour pressure (Torr) and relative humidity (percentage) suggests that these two predictors subsume the impact of temperature.
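
The abstract above mentions a 75-25 training-test split and an ensemble of candidate models. The sketch below illustrates both steps in a minimal form. The abstract does not say how the split was drawn or how the ensemble combined its members, so the seeded shuffle and the simple averaging used here are assumptions, as are the function names and the constant toy models.

```python
import random
from typing import Callable, List, Sequence, Tuple


def train_test_split(
    rows: Sequence, test_fraction: float = 0.25, seed: int = 0
) -> Tuple[List, List]:
    """Shuffle the rows reproducibly and hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def ensemble_predict(
    models: Sequence[Callable[[Sequence[float]], float]],
    features: Sequence[float],
) -> float:
    """Assumed combination rule: average the predictions of the member models."""
    predictions = [model(features) for model in models]
    return sum(predictions) / len(predictions)


# Hypothetical usage: 20 row indices and two constant stand-in models.
toy_train, toy_test = train_test_split(list(range(20)))
toy_prediction = ensemble_predict([lambda f: 1.0, lambda f: 3.0], [0.0, 0.0, 0.0])
```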

