Comparison of machine learning methods for predicting viral failure: a case study using electronic health record data

AbstractBackgroundHuman immunodeficiency virus (HIV) viral failure occurs when antiretroviral therapy fails to suppress and sustain a person’s viral load count below 1,000 copies of viral ribonucleic acid per milliliter. For those newly diagnosed with HIV and living in a setting where healthcare resources are limited, such as a low- and middle-income country, the World Health Organization recommends viral load monitoring six months after initiation of antiretroviral treatment and yearly thereafter. Deviations from this schedule are made in cases where viral failure occurs or at the discretion of the clinician. Failure to detect viral failure in a timely fashion can lead to delayed administration of essential interventions. Clinical prediction models based on information available in the patient medical record are increasingly being developed and deployed for decision support in clinical medicine and public health. This raises the possibility that prediction models can be used to detect potential for viral failure in advance of viral measurements, particularly when those measurements occur infrequently.ObjectiveOur goal is to use electronic health record data from a large HIV care program in Kenya to characterize and compare the predictive accuracy of several statistical machine learning methods for predicting viral failure at the first and second measurements following initiation of antiretroviral therapy. Predictive accuracy is measured in terms of sensitivity, specificity and area under the receiver-operator characteristic curve.MethodsWe trained and cross-validated 10 statistical machine learning models and algorithms on data from over 10,000 patients in the Academic Model Providing Access to Healthcare care program in western Kenya. These included parametric, non-parametric, ensemble, and Bayesian methods. The input variables included 50 items from the clinical record, hand picked in consultation with clinician experts. Predictive accuracy measures were calculated using 10-fold cross validation.ResultsViral load failure rate is about 20% in this patient cohort at both the first and second measurements. Ensemble techniques generally outperformed other methods. For predicting viral failure at the first follow up measure, specificity was over 90% for these methods, but sensitivity was typically in the 50–60% range. Predictive accuracy was greater for the second follow up measure, with sensitivities over 80%. Super Learner, gradient boosting and Bayesian additive regression trees consistently outperformed other methods. For a viral failure rate of 20%, the positive predictive value for the top-performing methods is between 75 and 85%, while the negative predictive value is over 95%.ConclusionEvidence from this study suggests that machine learning techniques have potential to identify patients at risk for viral failure prior to their scheduled measurements. Ultimately, prognostic virologic assessment can help guide the administration of earlier targeted intervention such as enhanced drug resistance monitoring, rigorous adherence counseling, or appropriate next-line therapy switching. External validation studies should be used to confirm the results found here.

Download Full-text

1EMF Prediction of Acute Kidney Injury in the Emergency Department Using Electronic Health Record Data and Machine Learning Methods

Annals of Emergency Medicine ◽

10.1016/j.annemergmed.2018.08.398 ◽

2018 ◽

Vol 72 (4) ◽

pp. S154 ◽

Cited By ~ 1

Author(s):

J.S. Hinson ◽

D.A. Martinez ◽

M.S. Grams ◽

S. Levin

Keyword(s):

Machine Learning ◽

Emergency Department ◽

Acute Kidney Injury ◽

Electronic Health Record ◽

Kidney Injury ◽

Health Record ◽

Electronic Health Record Data ◽

Learning Methods ◽

Machine Learning Methods ◽

Record Data

Download Full-text

Early Prediction of Acute Kidney Injury in the Emergency Department With Machine-Learning Methods Applied to Electronic Health Record Data

Annals of Emergency Medicine ◽

10.1016/j.annemergmed.2020.05.026 ◽

2020 ◽

Vol 76 (4) ◽

pp. 501-514

Author(s):

Diego A. Martinez ◽

Scott R. Levin ◽

Eili Y. Klein ◽

Chirag R. Parikh ◽

Steven Menez ◽

...

Keyword(s):

Machine Learning ◽

Emergency Department ◽

Acute Kidney Injury ◽

Electronic Health Record ◽

Kidney Injury ◽

Health Record ◽

Early Prediction ◽

Electronic Health Record Data ◽

Machine Learning Methods ◽

Record Data

Download Full-text

A novel multi-stage ensemble model with multiple K-means-based selective undersampling: An application in credit scoring

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-201954 ◽

2021 ◽

Vol 40 (5) ◽

pp. 9471-9484

Author(s):

Yilun Jin ◽

Yanan Liu ◽

Wenyu Zhang ◽

Shuai Zhang ◽

Yu Lou

Keyword(s):

Machine Learning ◽

Predictive Accuracy ◽

Credit Scoring ◽

Imbalanced Data ◽

Ensemble Model ◽

Selective Sampling ◽

Machine Learning Methods ◽

Multi Stage ◽

Proposed Model ◽

New Feature

With the advancement of machine learning, credit scoring can be performed better. As one of the widely recognized machine learning methods, ensemble learning has demonstrated significant improvements in the predictive accuracy over individual machine learning models for credit scoring. This study proposes a novel multi-stage ensemble model with multiple K-means-based selective undersampling for credit scoring. First, a new multiple K-means-based undersampling method is proposed to deal with the imbalanced data. Then, a new selective sampling mechanism is proposed to select the better-performing base classifiers adaptively. Finally, a new feature-enhanced stacking method is proposed to construct an effective ensemble model by composing the shortlisted base classifiers. In the experiments, four datasets with four evaluation indicators are used to evaluate the performance of the proposed model, and the experimental results prove the superiority of the proposed model over other benchmark models.

Download Full-text

Machine Learning Methods Applied to the Prediction of Pseudo-nitzschia spp. Blooms in the Galician Rias Baixas (NW Spain)

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10040199 ◽

2021 ◽

Vol 10 (4) ◽

pp. 199

Author(s):

Francisco M. Bellas Aláez ◽

Jesus M. Torres Palenzuela ◽

Evangelos Spyrakos ◽

Luis González Vilas

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Prediction Models ◽

Support Vector ◽

False Alarms ◽

Learning Approaches ◽

Learning Methods ◽

Machine Learning Methods ◽

Rías Baixas ◽

New Algorithms

This work presents new prediction models based on recent developments in machine learning methods, such as Random Forest (RF) and AdaBoost, and compares them with more classical approaches, i.e., support vector machines (SVMs) and neural networks (NNs). The models predict Pseudo-nitzschia spp. blooms in the Galician Rias Baixas. This work builds on a previous study by the authors (doi.org/10.1016/j.pocean.2014.03.003) but uses an extended database (from 2002 to 2012) and new algorithms. Our results show that RF and AdaBoost provide better prediction results compared to SVMs and NNs, as they show improved performance metrics and a better balance between sensitivity and specificity. Classical machine learning approaches show higher sensitivities, but at a cost of lower specificity and higher percentages of false alarms (lower precision). These results seem to indicate a greater adaptation of new algorithms (RF and AdaBoost) to unbalanced datasets. Our models could be operationally implemented to establish a short-term prediction system.

Download Full-text

Machine Learning in Aging: An Example of Developing Prediction Models for Serious Fall Injury in Older Adults

Innovation in Aging ◽

10.1093/geroni/igaa057.859 ◽

2020 ◽

Vol 4 (Supplement_1) ◽

pp. 268-269

Author(s):

Jaime Speiser ◽

Kathryn Callahan ◽

Jason Fanning ◽

Thomas Gill ◽

Anne Newman ◽

...

Keyword(s):

Machine Learning ◽

Older Adults ◽

Random Forest ◽

Decision Tree ◽

Prediction Models ◽

Receiver Operating Curve ◽

Learning Methods ◽

Life Study ◽

Fall Injury ◽

Machine Learning Methods

Abstract Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to lack of reproducibility and difficulty understanding the complex algorithms behind models. We aim to provide an overview of two common machine learning methods: decision tree and random forest. We focus on these methods because they provide a high degree of interpretability. We discuss the underlying algorithms of decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. Decision tree is a machine learning method that produces a model resembling a flow chart. Random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. Illustrated in data from the LIFE study, prediction models for serious fall injury were moderate at best (area under the receiver operating curve of 0.54 for decision tree and 0.66 for random forest). Machine learning methods may offer improved performance compared to traditional models for modeling outcomes in aging, but their use should be justified and output should be carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.

Download Full-text

FRI0046 PHARMACOGENOMICS-DRIVEN INDIVIDUALIZED PREDICTION OF TREATMENT RESPONSE TO METHOTREXATE IN PATIENTS WITH RHEUMATOID ARTHRITIS: A MACHINE LEARNING APPROACH

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2020-eular.4993 ◽

2020 ◽

Vol 79 (Suppl 1) ◽

pp. 598.2-598

Author(s):

E. Myasoedova ◽

A. Athreya ◽

C. S. Crowson ◽

R. Weinshilboum ◽

L. Wang ◽

...

Keyword(s):

Rheumatoid Arthritis ◽

Machine Learning ◽

Supervised Machine Learning ◽

Research Support ◽

Eular Response ◽

Learning Methods ◽

Machine Learning Methods ◽

Early Ra ◽

Genome Wide

Background:Methotrexate (MTX) is the most common anchor drug for rheumatoid arthritis (RA), but the risk of missing the opportunity for early effective treatment with alternative medications is substantial given the delayed onset of MTX action and 30-40% inadequate response rate. There is a compelling need to accurately predicting MTX response prior to treatment initiation, which allows for effectively identifying patients at RA onset who are likely to respond to MTX.Objectives:To test the ability of machine learning approaches with clinical and genomic biomarkers to predict MTX response with replications in independent samples.Methods:Age, sex, clinical, serological and genome-wide association study (GWAS) data on patients with early RA of European ancestry from 647 patients (336 recruited in United Kingdom [UK]; 307 recruited across Europe; 70% female; 72% rheumatoid factor [RF] positive; mean age 54 years; mean baseline Disease Activity Score with 28-joint count [DAS28] 5.65) of the PhArmacogenetics of Methotrexate in RA (PAMERA) consortium was used in this study. The genomics data comprised 160 genome-wide significant single nucleotide polymorphisms (SNPs) with p<1×10-5 associated with risk of RA and MTX metabolism. DAS28 score was available at baseline and 3-month follow-up visit. Response to MTX monotherapy at the dose of ≥15 mg/week was defined as good or moderate by the EULAR response criteria at 3 months’ follow up visit. Supervised machine-learning methods were trained with 5-repeats and 10-fold cross-validation using data from PAMERA’s 336 UK patients. Class imbalance (higher % of MTX responders) in training was accounted by using simulated minority oversampling technique. Prediction performance was validated in PAMERA’s 307 European patients (not used in training).Results:Age, sex, RF positivity and baseline DAS28 data predicted MTX response with 58% accuracy of UK and European patients (p = 0.7). However, supervised machine-learning methods that combined demographics, RF positivity, baseline DAS28 and genomic SNPs predicted EULAR response at 3 months with area under the receiver operating curve (AUC) of 0.83 (p = 0.051) in UK patients, and achieved prediction accuracies (fraction of correctly predicted outcomes) of 76.2% (p = 0.054) in the European patients, with sensitivity of 72% and specificity of 77%. The addition of genomic data improved the predictive accuracies of MTX response by 19% and achieved cross-site replication. Baseline DAS28 scores and following SNPs rs12446816, rs13385025, rs113798271, and rs2372536 were among the top predictors of MTX response.Conclusion:Pharmacogenomic biomarkers combined with DAS28 scores predicted MTX response in patients with early RA more reliably than using demographics and DAS28 scores alone. Using pharmacogenomics biomarkers for identification of MTX responders at early stages of RA may help to guide effective RA treatment choices, including timely escalation of RA therapies. Further studies on personalized prediction of response to MTX and other anti-rheumatic treatments are warranted to optimize control of RA disease and improve outcomes in patients with RA.Disclosure of Interests:Elena Myasoedova: None declared, Arjun Athreya: None declared, Cynthia S. Crowson Grant/research support from: Pfizer research grant, Richard Weinshilboum Shareholder of: co-founder and stockholder in OneOme, Liewei Wang: None declared, Eric Matteson Grant/research support from: Pfizer, Consultant of: Boehringer Ingelheim, Gilead, TympoBio, Arena Pharmaceuticals, Speakers bureau: Simply Speaking

Download Full-text

Predicting plaque vulnerability change using intravascular ultrasound + optical coherence tomography image-based fluid–structure interaction models and machine learning methods with patient follow-up data: a feasibility study

BioMedical Engineering OnLine ◽

10.1186/s12938-021-00868-6 ◽

2021 ◽

Vol 20 (1) ◽

Author(s):

Xiaoya Guo ◽

Akiko Maehara ◽

Mitsuaki Matsumura ◽

Liang Wang ◽

Jie Zheng ◽

...

Keyword(s):

Machine Learning ◽

Coronary Plaque ◽

Support Vector ◽

Single Factor ◽

Plaque Vulnerability ◽

Learning Methods ◽

Machine Learning Methods ◽

Vulnerability Prediction ◽

Biomechanical Factors

Abstract Background Coronary plaque vulnerability prediction is difficult because plaque vulnerability is non-trivial to quantify, clinically available medical image modality is not enough to quantify thin cap thickness, prediction methods with high accuracies still need to be developed, and gold-standard data to validate vulnerability prediction are often not available. Patient follow-up intravascular ultrasound (IVUS), optical coherence tomography (OCT) and angiography data were acquired to construct 3D fluid–structure interaction (FSI) coronary models and four machine-learning methods were compared to identify optimal method to predict future plaque vulnerability. Methods Baseline and 10-month follow-up in vivo IVUS and OCT coronary plaque data were acquired from two arteries of one patient using IRB approved protocols with informed consent obtained. IVUS and OCT-based FSI models were constructed to obtain plaque wall stress/strain and wall shear stress. Forty-five slices were selected as machine learning sample database for vulnerability prediction study. Thirteen key morphological factors from IVUS and OCT images and biomechanical factors from FSI model were extracted from 45 slices at baseline for analysis. Lipid percentage index (LPI), cap thickness index (CTI) and morphological plaque vulnerability index (MPVI) were quantified to measure plaque vulnerability. Four machine learning methods (least square support vector machine, discriminant analysis, random forest and ensemble learning) were employed to predict the changes of three indices using all combinations of 13 factors. A standard fivefold cross-validation procedure was used to evaluate prediction results. Results For LPI change prediction using support vector machine, wall thickness was the optimal single-factor predictor with area under curve (AUC) 0.883 and the AUC of optimal combinational-factor predictor achieved 0.963. For CTI change prediction using discriminant analysis, minimum cap thickness was the optimal single-factor predictor with AUC 0.818 while optimal combinational-factor predictor achieved an AUC 0.836. Using random forest for predicting MPVI change, minimum cap thickness was the optimal single-factor predictor with AUC 0.785 and the AUC of optimal combinational-factor predictor achieved 0.847. Conclusion This feasibility study demonstrated that machine learning methods could be used to accurately predict plaque vulnerability change based on morphological and biomechanical factors from multi-modality image-based FSI models. Large-scale studies are needed to verify our findings.

Download Full-text

Advances in Blast-Induced Impact Prediction—A Review of Machine Learning Applications

Minerals ◽

10.3390/min11060601 ◽

2021 ◽

Vol 11 (6) ◽

pp. 601

Author(s):

Nelson K. Dumakor-Dupey ◽

Sampurna Arya ◽

Ankit Jha

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Academic Research ◽

Empirical Models ◽

Rock Breakage ◽

Environmental Implications ◽

Learning Methods ◽

Factors Affecting ◽

Impact Prediction ◽

Machine Learning Methods

Rock fragmentation in mining and construction industries is widely achieved using drilling and blasting technique. The technique remains the most effective and efficient means of breaking down rock mass into smaller pieces. However, apart from its intended purpose of rock breakage, throw, and heave, blasting operations generate adverse impacts, such as ground vibration, airblast, flyrock, fumes, and noise, that have significant operational and environmental implications on mining activities. Consequently, blast impact studies are conducted to determine an optimum blast design that can maximize the desirable impacts and minimize the undesirable ones. To achieve this objective, several blast impact estimation empirical models have been developed. However, despite being the industry benchmark, empirical model results are based on a limited number of factors affecting the outcomes of a blast. As a result, modern-day researchers are employing machine learning (ML) techniques for blast impact prediction. The ML approach can incorporate several factors affecting the outcomes of a blast, and therefore, it is preferred over empirical and other statistical methods. This paper reviews the various blast impacts and their prediction models with a focus on empirical and machine learning methods. The details of the prediction methods for various blast impacts—including their applications, advantages, and limitations—are discussed. The literature reveals that the machine learning methods are better predictors compared to the empirical models. However, we observed that presently these ML models are mainly applied in academic research.

Download Full-text

Development and Validation of Unplanned Extubation Prediction Models Using Intensive Care Unit Data: Comparative Machine Learning Study (Preprint)

10.2196/preprints.23508 ◽

2020 ◽

Author(s):

Sujeong Hur ◽

Ji Young Min ◽

Junsang Yoo ◽

Kyunga Kim ◽

Chi Ryang Chung ◽

...

Keyword(s):

Machine Learning ◽

Intensive Care Unit ◽

Patient Safety ◽

Intensive Care ◽

Predictive Value ◽

Prediction Models ◽

Support Vector ◽

Unplanned Extubation ◽

Electronic Health Record Data ◽

Icu Patients

BACKGROUND Patient safety in the intensive care unit (ICU) is one of the most critical issues, and unplanned extubation (UE) is considered as the most adverse event for patient safety. Prevention and early detection of such an event is an essential but difficult component of quality care. OBJECTIVE This study aimed to develop and validate prediction models for UE in ICU patients using machine learning. METHODS This study was conducted an academic tertiary hospital in Seoul. The hospital had approximately 2,000 inpatient beds and 120 intensive care unit (ICU) beds. The number of patients, on daily basis, was approximately 9,000 for the out-patient. The number of annual ICU admission was approximately 10,000. We conducted a retrospective study between January 1, 2010 and December 31, 2018. A total of 6,914 extubation cases were included. We developed an unplanned extubation prediction model using machine learning algorithms, which included random forest (RF), logistic regression (LR), artificial neural network (ANN), and support vector machine (SVM). For evaluating the model’s performance, we used area under the receiver operator characteristic curve (AUROC). Sensitivity, specificity, positive predictive value negative predictive value, and F1-score were also determined for each model. For performance evaluation, we also used calibration curve, the Brier score, and the Hosmer-Lemeshow goodness-of-fit statistic. RESULTS Among the 6,914 extubation cases, 248 underwent UE. In the UE group, there were more males than females, higher use of physical restraints, and fewer surgeries. The incidence of UE was more likely to occur during the night shift compared to the planned extubation group. The rate of reintubation within 24 hours and hospital mortality was higher in the UE group. The UE prediction algorithm was developed, and the AUROC for RF was 0.787, for LR was 0.762, for ANN was 0.762, and for SVM was 0.740. CONCLUSIONS We successfully developed and validated machine learning-based prediction models to predict UE in ICU patients using electronic health record data. The best AUROC was 0.787, which was obtained using RF. CLINICALTRIAL N/A

Download Full-text