Implementing clinical decision support for oncology advance care planning: A systems engineering framework to optimize the usability and utility of a machine learning predictive model in clinical practice.

2021 ◽  
Vol 39 (28_suppl) ◽  
pp. 330-330
Author(s):  
Teja Ganta ◽  
Stephanie Lehrman ◽  
Rachel Pappalardo ◽  
Madalene Crow ◽  
Meagan Will ◽  
...  

Background: Machine learning models are well-positioned to transform cancer care delivery by providing oncologists with more accurate or accessible information to augment clinical decisions. Many machine learning projects, however, focus on model accuracy without considering the impact of using the model in real-world settings and rarely carry forward to clinical implementation. We present a human-centered systems engineering approach to address clinical problems with workflow interventions utilizing machine learning algorithms. Methods: We aimed to develop a mortality prediction tool, using a Random Forest algorithm, to identify oncology patients at high risk of death within 30 days so that advance care planning (ACP) discussions could be moved earlier in the illness trajectory. First, a project sponsor defined the clinical need and requirements of an intervention. The data scientists developed the predictive algorithm using data available in the electronic health record (EHR). A multidisciplinary workgroup was assembled including oncology physicians, advanced practice providers, nurses, social workers, a chaplain, clinical informaticists, and data scientists. Meeting bi-monthly, the group utilized human-centered design (HCD) methods to understand clinical workflows and identify points of intervention. The workgroup completed a workflow redesign workshop, a 90-minute facilitated group discussion, to integrate the model into a future-state workflow. An EHR (Epic) analyst built the user interface to support the intervention per the group’s requirements. The workflow was piloted in thoracic oncology and bone marrow transplant with plans to scale to other cancer clinics. Results: Our predictive model performance on test data was acceptable (sensitivity 75%, specificity 75%, F1 score 0.71, AUC 0.82). The workgroup identified a “quality of life coordinator” who: reviews an EHR report of patients scheduled in the upcoming 7 days who have a high risk of 30-day mortality; works with the oncology team to determine ACP clinical appropriateness; documents the need for ACP; identifies potential referrals to supportive oncology, social work, or chaplain; and coordinates the oncology appointment. The oncologist receives a reminder on the day of the patient’s scheduled visit. Conclusions: This workgroup-driven approach is viable and can be replicated at other institutions to address clinical needs and realize the full potential of machine learning models in healthcare. The next steps for this project are to address end-user feedback from the pilot, expand the intervention to other cancer disease groups, and track clinical metrics.
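As a hedged illustration of the modeling step described above (not the authors' actual pipeline), the sketch below trains a scikit-learn Random Forest on EHR-derived features and reports the same test metrics; the file names, feature set, and 0.5 decision threshold are assumptions.

```python
# Hypothetical sketch: 30-day mortality Random Forest evaluated with the
# metrics reported in the abstract. Data files and features are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# X: EHR-derived features (labs, vitals, utilization); y: death within 30 days
X, y = np.load("ehr_features.npy"), np.load("labels_30d_mortality.npy")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = RandomForestClassifier(n_estimators=500, random_state=42)
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]
pred = (proba >= 0.5).astype(int)          # threshold would be tuned clinically
tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("F1 score:", f1_score(y_test, pred))
print("AUC:", roc_auc_score(y_test, proba))
```

In a deployment like the one described, the score would feed the weekly EHR report reviewed by the quality of life coordinator rather than being shown raw to clinicians.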

2021 ◽  
Vol 36 (Supplement_1) ◽  
Author(s):  
J A Ortiz ◽  
R Morales ◽  
B Lledo ◽  
E Garcia-Hernandez ◽  
A Cascales ◽  
...  

Abstract Study question Is it possible to predict the likelihood of an IVF embryo being aneuploid and/or mosaic using a machine learning algorithm? Summary answer There are paternal, maternal, embryonic and IVF-cycle factors that are associated with embryonic chromosomal status and that can be used as predictors in machine learning models. What is known already The factors associated with embryonic aneuploidy have been extensively studied. Mostly maternal age, and to a lesser extent male factor and ovarian stimulation, have been related to the occurrence of chromosomal alterations in the embryo. On the other hand, the main factors that may increase the incidence of embryo mosaicism have not yet been established. The models obtained using classical statistical methods to predict embryonic aneuploidy and mosaicism are not highly reliable. As an alternative to traditional methods, different machine and deep learning algorithms are being used to generate predictive models in different areas of medicine, including human reproduction. Study design, size, duration The study design is observational and retrospective. A total of 4654 embryos from 1558 PGT-A cycles were included (January 2017 to December 2020). The trophectoderm biopsies of D5, D6 or D7 blastocysts were analysed by NGS. Embryos with ≤25% aneuploid cells were considered euploid, those with 25–50% were classified as mosaic, and those with >50% as aneuploid. The variables of the PGT-A cycles were recorded in a database from which predictive models of embryonic aneuploidy and mosaicism were developed. Participants/materials, setting, methods The main indications for PGT-A were advanced maternal age, abnormal sperm FISH and recurrent miscarriage or implantation failure. Embryo analyses were performed using Veriseq-NGS (Illumina). All analyses were carried out in R (RStudio), with the caret library used to implement the different algorithms. In the machine learning models, 22 predictor variables were introduced, which can be classified into 4 categories: maternal, paternal, embryonic and those specific to the IVF cycle. Main results and the role of chance The different couple, embryo and stimulation-cycle variables were recorded in a database (22 predictor variables). Two predictive models were built, one for aneuploidy and the other for mosaicism. The outcome variable was multi-class, since it included the segmental and whole-chromosome alteration categories. The data frame was first preprocessed and the classes to be predicted were balanced. 80% of the data were used for training the models and 20% were reserved for testing. The classification algorithms applied include multinomial regression, neural networks, support vector machines, neighborhood-based methods, classification trees, gradient boosting, ensemble methods, and Bayesian and discriminant-analysis-based methods. The algorithms were optimized by minimizing the log loss (Log_Loss), which rewards accurate probability estimates and penalizes confident misclassifications. The best predictive models were achieved with the XG-Boost and random forest algorithms. The AUC of the predictive model for aneuploidy was 80.8% (Log_Loss: 1.028) and for mosaicism 84.1% (Log_Loss: 0.929). The best predictor variables of the models were maternal age, embryo quality, day of biopsy and whether or not the couple had a history of pregnancies with chromosomopathies. The male factor played a relevant role only in the mosaicism model, not in the aneuploidy model.
Limitations, reasons for caution Although the predictive models obtained can be very useful for estimating the probability of achieving euploid embryos in an IVF cycle, increasing the sample size and including additional variables could improve the models and thus increase their predictive capacity. Wider implications of the findings Machine learning can be a very useful tool in reproductive medicine, since it allows the factors associated with embryonic aneuploidy and mosaicism to be determined and predictive models for both to be established. Identifying couples at risk of embryo aneuploidy/mosaicism could help them benefit from the use of PGT-A. Trial registration number Not Applicable
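The study implemented its models in R with caret; as a rough Python analogue of the described pipeline (80/20 split, class balancing, multi-class models scored by log loss), consider the sketch below. The CSV file, column names, and the use of class weights in place of resampling are assumptions.

```python
# Hypothetical analogue of the PGT-A modeling pipeline described above.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss, roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("pgta_cycles.csv")              # 22 predictor variables + outcome
X = df.drop(columns=["chromosomal_status"])
y = df["chromosomal_status"]                     # euploid / mosaic / aneuploid classes

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

# class_weight="balanced" stands in for the class-balancing step in the paper
clf = RandomForestClassifier(n_estimators=500, class_weight="balanced",
                             random_state=1).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)
print("log loss:", log_loss(y_te, proba))
print("AUC (one-vs-rest):", roc_auc_score(y_te, proba, multi_class="ovr"))
```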


2021 ◽  
Vol 12 (02) ◽  
pp. 372-382
Author(s):  
Christine Xia Wu ◽  
Ernest Suresh ◽  
Francis Wei Loong Phng ◽  
Kai Pik Tai ◽  
Janthorn Pakdeethai ◽  
...  

Abstract Objective To develop a risk score for the real-time prediction of readmissions for patients, using patient-specific information captured in electronic medical records (EMR) in Singapore, to enable the prospective identification of high-risk patients for enrolment in timely interventions. Methods Machine-learning models were built to estimate the probability of a patient being readmitted within 30 days of discharge. EMR of 25,472 patients discharged from the medicine department at Ng Teng Fong General Hospital between January 2016 and December 2016 were extracted retrospectively for training and internal validation of the models. We developed and implemented real-time 30-day readmission risk score generation in the EMR system, which enabled the flagging of high-risk patients to care providers in the hospital. Based on the daily high-risk patient list, the various interfaces and flow sheets in the EMR were configured according to the information needs of the various stakeholders, such as the inpatient medical, nursing, case management, emergency department, and postdischarge care teams. Results Overall, the machine-learning models achieved good performance, with area under the receiver operating characteristic curve (AUROC) ranging from 0.77 to 0.81. The models were used to proactively identify and attend to patients at risk of readmission before an actual readmission occurred. This approach successfully reduced the 30-day readmission rate for patients admitted to the medicine department from 11.7% in 2017 to 10.1% in 2019 (p < 0.01) after risk adjustment. Conclusion Machine-learning models can be deployed in the EMR system to provide real-time forecasts that give care teams a more comprehensive outlook for decision-making and care provision.
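A minimal sketch of the daily scoring step such a deployment implies: apply a trained 30-day readmission model to the latest discharges and flag patients above a threshold for the high-risk list. The model file, feature columns, and threshold are assumptions, not the hospital's actual configuration.

```python
# Hypothetical daily batch job producing the high-risk patient list.
import joblib
import pandas as pd

model = joblib.load("readmission_model.joblib")    # trained offline on 2016 EMR data
discharges = pd.read_csv("discharges_today.csv")   # one row per discharged patient

features = discharges.drop(columns=["patient_id"])
discharges["risk_30d"] = model.predict_proba(features)[:, 1]

THRESHOLD = 0.30                                   # tuned to the desired alert volume
high_risk = discharges.loc[discharges["risk_30d"] >= THRESHOLD,
                           ["patient_id", "risk_30d"]]
high_risk.sort_values("risk_30d", ascending=False).to_csv(
    "high_risk_list.csv", index=False)             # surfaced to care teams in the EMR
```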


2021 ◽  
Vol 28 (1) ◽  
pp. e100439
Author(s):  
Lukasz S Wylezinski ◽  
Coleman R Harris ◽  
Cody N Heiser ◽  
Jamieson D Gray ◽  
Charles F Spurlock

Introduction: The SARS-CoV-2 (COVID-19) pandemic has exposed health disparities throughout the USA, particularly among racial and ethnic minorities. As a result, there is a need for data-driven approaches to pinpoint the unique constellation of clinical and social determinants of health (SDOH) risk factors that give rise to poor patient outcomes following infection in US communities. Methods: We combined county-level COVID-19 testing data, COVID-19 vaccination rates and SDOH information in Tennessee. Between February and May 2021, we trained machine learning models on a semimonthly basis using these datasets to predict COVID-19 incidence in Tennessee counties. We then analyzed SDOH data features at each time point to rank the impact of each feature on model performance. Results: Our results indicate that COVID-19 vaccination rates play a crucial role in determining future COVID-19 disease risk. Beginning in mid-March 2021, higher vaccination rates significantly correlated with lower COVID-19 case growth predictions. Further, as the relative importance of COVID-19 vaccination data features grew, the importance of demographic SDOH features such as age, race and ethnicity decreased, while the impact of socioeconomic and environmental factors, including access to healthcare and transportation, increased. Conclusion: Incorporating a data framework to track the evolving patterns of community-level SDOH risk factors could provide policy-makers with additional data resources to improve health equity and resilience to future public health emergencies.
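A minimal sketch of the semimonthly feature-ranking step, assuming a county-level snapshot file and permutation importance as the ranking method (the abstract does not specify the exact importance measure); file and column names are hypothetical.

```python
# Hypothetical semimonthly retraining + SDOH feature ranking.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

snap = pd.read_csv("tn_counties_2021-03-15.csv")   # one row per county
X = snap.drop(columns=["county", "future_case_growth"])
y = snap["future_case_growth"]                     # target: near-term case growth

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

imp = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=0)
ranking = pd.Series(imp.importances_mean, index=X.columns)
print(ranking.sort_values(ascending=False).head(10))
# Tracking this ranking across snapshots reveals shifts such as
# vaccination_rate rising while demographic features fall.
```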


Author(s):  
Maicon Herverton Lino Ferreira da Silva Barros ◽  
Geovanne Oliveira Alves ◽  
Lubnnia Morais Florêncio Souza ◽  
Élisson da Silva Rocha ◽  
João Fausto Lorenzato de Oliveira ◽  
...  

Tuberculosis (TB) is an airborne infectious disease caused by organisms in the Mycobacterium tuberculosis (Mtb) complex. In many low- and middle-income countries, TB remains a major cause of morbidity and mortality. Once a patient has been diagnosed with TB, it is critical that healthcare workers make the most appropriate treatment decision given the individual conditions of the patient and the likely course of the disease based on medical experience. Depending on the prognosis, delayed or inappropriate treatment can result in unsatisfactory outcomes, including the exacerbation of clinical symptoms, poor quality of life, and increased risk of death. This work benchmarks machine learning models to aid TB prognosis using a Brazilian health database of confirmed cases and deaths related to TB in the State of Amazonas. The goal is to predict the probability of death by TB, thus aiding the prognosis of TB and the associated treatment decision-making process. In its original form, the data set comprised 36,228 records and 130 fields but suffered from missing, incomplete, or incorrect data. Following data cleaning and preprocessing, a revised data set was generated comprising 24,015 records and 38 fields, including 22,876 reported cured TB patients and 1,139 deaths by TB. To explore how the data imbalance impacts model performance, two controlled experiments were designed using (1) imbalanced and (2) balanced data sets. The best result for predicting TB mortality was achieved by the Gradient Boosting (GB) model using the balanced data set, while the ensemble model composed of the Random Forest (RF), GB, and Multi-layer Perceptron (MLP) models was the best at predicting the cure class.
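A hedged sketch of the two controlled experiments: train the same Gradient Boosting model on (1) the imbalanced data and (2) a balanced version built by oversampling the minority (death) class. The file name, column names, and oversampling strategy are assumptions; fields are assumed numeric and preprocessed.

```python
# Hypothetical imbalanced-vs-balanced comparison for TB prognosis.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

df = pd.read_csv("tb_amazonas_clean.csv")            # 24,015 records, 38 fields
X, y = df.drop(columns=["outcome"]), df["outcome"]   # "cured" vs "death"
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=7)

def fit_and_report(X_train, y_train, label):
    model = GradientBoostingClassifier(random_state=7).fit(X_train, y_train)
    print(label, classification_report(y_te, model.predict(X_te)), sep="\n")

# Experiment 1: imbalanced training data
fit_and_report(X_tr, y_tr, "imbalanced")

# Experiment 2: balanced by oversampling deaths to match cures
train = pd.concat([X_tr, y_tr], axis=1)
death = train[train["outcome"] == "death"]
cured = train[train["outcome"] == "cured"]
death_up = resample(death, replace=True, n_samples=len(cured), random_state=7)
balanced = pd.concat([cured, death_up])
fit_and_report(balanced.drop(columns=["outcome"]), balanced["outcome"], "balanced")
```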


Sensors ◽  
2019 ◽  
Vol 19 (16) ◽  
pp. 3491 ◽  
Author(s):  
Issam Hammad ◽  
Kamal El-Sankary

Accuracy evaluation in machine learning is based on the split of data into a training set and a test set. This critical step is applied when developing machine learning models, including models based on sensor data. For sensor-based problems, comparing the accuracy of machine learning models using the train/test split provides only a baseline comparison under ideal conditions. Such comparisons do not consider practical production problems that can impact the inference accuracy, such as the sensors’ thermal noise, performance with lower inference quantization, and tolerance to sensor failure. Therefore, this paper proposes a set of practical tests that can be applied when comparing the accuracy of machine learning models for sensor-based problems. First, the impact of the sensors’ thermal noise on the models’ inference accuracy was simulated; machine learning algorithms have different levels of error resilience to thermal noise, as will be presented. Second, the models’ accuracy using lower inference quantization was compared. Lowering inference quantization lowers the required analog-to-digital converter (ADC) resolution, which is cost-effective in embedded designs. Moreover, in custom designs, ADCs’ effective number of bits (ENOB) is usually lower than the ideal number of bits due to various design factors. Therefore, it is practical to compare models’ accuracy using lower inference quantization. Third, the models’ accuracy tolerance to sensor failure was evaluated and compared. For this study, the University of California Irvine (UCI) ‘Daily and Sports Activities’ dataset was used to present these practical tests and their impact on model selection.
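The three proposed tests are straightforward to express in code. Below is a minimal sketch, assuming a trained classifier `clf` and a NumPy test set `X_test` of shape (samples, sensor channels); the noise level, bit width, and failure mode (channel stuck at zero) are illustrative parameters, not the paper's exact settings.

```python
# Hypothetical implementations of the three practical robustness tests.
import numpy as np
from sklearn.metrics import accuracy_score

def test_thermal_noise(clf, X_test, y_test, sigma=0.05):
    """Accuracy after adding zero-mean Gaussian noise to every sensor reading."""
    X_noisy = X_test + np.random.normal(0.0, sigma, X_test.shape)
    return accuracy_score(y_test, clf.predict(X_noisy))

def test_quantization(clf, X_test, y_test, bits=8):
    """Accuracy after re-quantizing inputs to a lower ADC resolution."""
    lo, hi = X_test.min(), X_test.max()
    levels = 2 ** bits - 1
    X_q = np.round((X_test - lo) / (hi - lo) * levels) / levels * (hi - lo) + lo
    return accuracy_score(y_test, clf.predict(X_q))

def test_sensor_failure(clf, X_test, y_test, failed_channel=0):
    """Accuracy with one sensor channel stuck at zero (simulated failure)."""
    X_failed = X_test.copy()
    X_failed[:, failed_channel] = 0.0
    return accuracy_score(y_test, clf.predict(X_failed))
```

Sweeping `sigma`, `bits`, or the failed channel then yields the resilience curves used to compare candidate models.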


mSystems ◽  
2019 ◽  
Vol 4 (4) ◽  
Author(s):  
Finlay Maguire ◽  
Muhammad Attiq Rehman ◽  
Catherine Carrillo ◽  
Moussa S. Diarra ◽  
Robert G. Beiko

ABSTRACT Nontyphoidal Salmonella (NTS) is a leading global cause of bacterial foodborne morbidity and mortality. Our ability to treat severe NTS infections has been impaired by increasing antimicrobial resistance (AMR). To understand and mitigate the global health crisis AMR represents, we need to link the observed resistance phenotypes with their underlying genomic mechanisms. Broiler chickens represent a key reservoir and vector for NTS infections, but isolates from this setting have been characterized in only very low numbers relative to clinical isolates. In this study, we sequenced and assembled 97 genomes encompassing 7 serotypes isolated from broiler chickens on farms in British Columbia between 2005 and 2008. Through application of machine learning (ML) models to predict the observed AMR phenotype from these genomic data, we were able to generate highly precise (0.92 to 0.99) logistic regression models using known AMR gene annotations as features for 7 antibiotics (amoxicillin-clavulanic acid, ampicillin, cefoxitin, ceftiofur, ceftriaxone, streptomycin, and tetracycline). Similarly, we also trained “reference-free” k-mer-based set-covering machine phenotypic prediction models (0.91 to 1.0 precision) for these antibiotics. By combining the inferred k-mers and logistic regression weights, we identified the primary drivers of AMR for the 7 studied antibiotics in these isolates. With our research representing one of the largest studies of a diverse set of NTS isolates from broiler chickens, we can thus confirm that the AmpC-like CMY-2 β-lactamase is a primary driver of β-lactam resistance and that the phosphotransferases APH(6)-Id and APH(3″-Ib) are the principal drivers of streptomycin resistance in this important ecosystem. IMPORTANCE Antimicrobial resistance (AMR) represents an existential threat to the function of modern medicine. Genomics and machine learning methods are being increasingly used to analyze and predict AMR. This type of surveillance is important for reducing the impact of AMR. Machine learning models are typically trained using genomic data, but the aspects of the genomes that they use to make predictions are rarely analyzed. In this work, we showed how, by using different types of machine learning models and performing this analysis, it is possible to identify the key genes underlying AMR in nontyphoidal Salmonella (NTS). NTS is among the leading causes of foodborne illness globally; however, AMR in NTS has not been heavily studied within the food chain itself. Therefore, in this work we performed a broad-scale analysis of AMR in NTS isolates from commercial chicken farms and identified priority AMR genes for surveillance.
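A hedged sketch of the annotation-based half of this approach: fit a sparse logistic regression on binary AMR-gene presence/absence features for one antibiotic, then inspect the weights for candidate resistance drivers. Input files, column names, and the L1 penalty are assumptions, not the authors' exact setup.

```python
# Hypothetical gene-annotation AMR model for a single antibiotic.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split

genes = pd.read_csv("amr_gene_presence.csv", index_col="genome")        # 0/1 matrix
pheno = pd.read_csv("phenotypes.csv", index_col="genome")["ampicillin"] # "R"/"S"

X_tr, X_te, y_tr, y_te = train_test_split(genes, pheno, stratify=pheno,
                                          random_state=0)
clf = LogisticRegression(penalty="l1", solver="liblinear").fit(X_tr, y_tr)
print("precision:", precision_score(y_te, clf.predict(X_te), pos_label="R"))

# Large positive weights point to candidate drivers (e.g., a CMY-2-like gene)
weights = pd.Series(clf.coef_[0], index=genes.columns).sort_values()
print(weights.tail(5))
```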


Elements ◽  
2016 ◽  
Vol 12 (2) ◽  
Author(s):  
James LeDoux

The new NFL extra point rule, first implemented in the 2015 season, requires a kicker to attempt his extra point with the ball snapped from the 15-yard line. This stretches an extra point to the equivalent of a 32-yard field goal attempt, 13 yards longer than under the previous rule. Though a 32-yard attempt is still a chip shot for any professional kicker, many NFL analysts were surprised by the number of extra points that were missed. Should this really have been a surprise, though? Beginning with a replication of a study by Clark et al., this study aims to explore the world of NFL kicking from a statistical perspective, applying econometric and machine learning models to offer a deeper perspective on what exactly makes some field goal attempts more difficult than others. Ultimately, the goal is to go beyond the previous research on this topic, providing an improved predictive model of field goal success and a better metric for evaluating placekicker ability.
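As a flavor of the baseline such a study builds on, here is a minimal sketch of a logistic regression of field goal success on attempt distance, evaluated at the new 32-yard extra point distance; the data file and column names are assumptions, and the study's actual models include many more covariates.

```python
# Hypothetical distance-only field goal model.
import pandas as pd
import statsmodels.formula.api as smf

kicks = pd.read_csv("nfl_field_goals.csv")   # columns: distance (yards), made (0/1)
model = smf.logit("made ~ distance", data=kicks).fit()
print(model.params)                          # negative distance coefficient expected

new_xp = pd.DataFrame({"distance": [32]})
print("P(make) at 32 yards:", model.predict(new_xp)[0])
```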


2021 ◽  
Author(s):  
Hua-Liang Wei ◽  
Stephen A Billings

Since the outbreak of COVID-19, an astronomical number of publications on the pandemic dynamics have appeared in the literature, many of which use the susceptible-infected-removed (SIR) and susceptible-exposed-infected-removed (SEIR) models, or their variants, to simulate and study the spread of the coronavirus. SIR and SEIR are continuous-time models, a class of initial value problems (IVPs) of ordinary differential equations (ODEs). Discrete-time models such as regression and machine learning have also been applied to analyze COVID-19 pandemic data (e.g. predicting infection cases), but most of these methods either use simplified models involving a small number of input variables pre-selected based on a priori knowledge, or use very complicated models (e.g. deep learning) that focus purely on certain prediction purposes and pay little attention to model interpretability. Relatively few studies have investigated the inherent time-lagged or time-delayed relationships, e.g. between the reproduction number (R number), infection cases, and deaths, or analyzed the pandemic spread from a systems-thinking, dynamic perspective. The present study, for the first time, proposes using a systems engineering and system identification approach to build transparent, interpretable, parsimonious and simulatable (TIPS) dynamic machine learning models, establishing links between the R number, the infection cases and the deaths caused by COVID-19. The TIPS models are developed based on the well-known NARMAX (Nonlinear AutoRegressive Moving Average with eXogenous inputs) model, which can help better understand the COVID-19 pandemic dynamics. A case study on the UK COVID-19 data is carried out, and new findings are detailed. The proposed method and the associated new findings are useful for better understanding the spread dynamics of the COVID-19 pandemic.
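Full NARMAX modeling involves model structure selection and a moving-average noise model; as a rough, hedged approximation of the idea, the sketch below fits a linear-in-parameters NARX-style model that regresses deaths on lagged deaths, cases, and R number. The file name, column names, and lag choices are illustrative.

```python
# Hypothetical NARX-style lagged model linking R number, cases, and deaths.
import numpy as np
import pandas as pd

uk = pd.read_csv("uk_covid_daily.csv")       # columns: date, cases, deaths, r_number
for k in (7, 14):                            # illustrative lags, in days
    uk[f"deaths_lag{k}"] = uk["deaths"].shift(k)
    uk[f"cases_lag{k}"] = uk["cases"].shift(k)
    uk[f"r_lag{k}"] = uk["r_number"].shift(k)
uk = uk.dropna()

lag_cols = [c for c in uk.columns if "lag" in c]
X = np.column_stack([np.ones(len(uk)), uk[lag_cols].to_numpy()])  # intercept + lags
y = uk["deaths"].to_numpy()
theta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares parameter estimates
print(dict(zip(["const"] + lag_cols, theta.round(3))))
# Each coefficient is directly readable, which is the "transparent,
# interpretable, parsimonious" property that TIPS models aim for.
```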


2018 ◽  
Author(s):  
Jaram Park ◽  
Jeong-Whun Kim ◽  
Borim Ryu ◽  
Eunyoung Heo ◽  
Se Young Jung ◽  
...  

BACKGROUND Prevention and management of chronic diseases are the main goals of national health maintenance programs. Previously widely used screening tools, such as Health Risk Appraisal, are restricted in achieving this goal by limitations such as their static characteristics, accessibility, and generalizability. Hypertension is one of the most important chronic diseases requiring management via the nationwide health maintenance program, and health care providers should inform patients about their risk of complications caused by hypertension. OBJECTIVE Our goal was to develop and compare machine learning models that predict high-risk vascular diseases for hypertensive patients so that they can manage their blood pressure based on their risk level. METHODS We used a 12-year longitudinal dataset of the nationwide sample cohort, which contains the data of 514,866 patients and allows tracking of patients’ medical history across all health care providers in Korea (N=51,920). To ensure the generalizability of our models, we conducted an external validation using another national sample cohort dataset, comprising one million different patients, published by the National Health Insurance Service. From each dataset, we obtained the data of 74,535 and 59,738 patients with essential hypertension and developed machine learning models for predicting cardiovascular and cerebrovascular events. Six machine learning models were developed and their performance compared using validation metrics. RESULTS Machine learning algorithms enabled us to detect high-risk patients based on their medical history. The long short-term memory (LSTM)-based algorithm performed best in the within test (F1-score=.772; external test F1-score=.613), while the random forest-based risk prediction algorithm generalized better than the other machine learning algorithms (within test F1-score=.757, external test F1-score=.705). Concerning the number of features, the LSTM-based algorithm outperformed the others in the within test regardless of the number of features, whereas in the external test the random forest-based algorithm was the best irrespective of the number of features it encountered. CONCLUSIONS We developed and compared machine learning models that predict high-risk vascular diseases in hypertensive patients so that they may manage their blood pressure based on their risk level. By relying on the prediction model, a government can identify high-risk patients at the nationwide level and establish health care policies in advance.
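A minimal sketch of the within/external evaluation harness described above, using F1 score; the random forest stands in for one of the six models (the LSTM would require a sequence model and is omitted), and the file and column names are assumptions.

```python
# Hypothetical internal/external validation comparison by F1 score.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

train = pd.read_csv("cohort_train.csv")        # internal cohort, training split
within = pd.read_csv("cohort_test.csv")        # held-out internal test split
external = pd.read_csv("external_cohort.csv")  # different national sample cohort

features = [c for c in train.columns if c not in ("patient_id", "event")]
clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(train[features], train["event"])       # event: vascular complication (0/1)

for name, df in [("within", within), ("external", external)]:
    print(name, "F1:", round(f1_score(df["event"], clf.predict(df[features])), 3))
```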

