Role for machine learning in sex-specific prediction of successful electrical cardioversion in atrial fibrillation?

Objective: Electrical cardioversion is frequently performed to restore sinus rhythm in patients with persistent atrial fibrillation (AF). However, AF recurs in many patients and identifying the patients who benefit from electrical cardioversion is difficult. The objective was to develop sex-specific prediction models for successful electrical cardioversion and assess the potential of machine learning methods in comparison with traditional logistic regression. Methods: In a retrospective cohort study, we examined several candidate predictors, including comorbidities, biochemistry, echocardiographic data, and medication. The outcome was successful cardioversion, defined as normal sinus rhythm immediately after the electrical cardioversion and no documented recurrence of AF within 3 months after. We used random forest and logistic regression models for sex-specific prediction. Results: The cohort comprised 332 female and 790 male patients with persistent AF who underwent electrical cardioversion. Cardioversion was successful in 44.9% of the women and 49.9% of the men. The prediction errors of the models were high for both women (41.0% for machine learning and 48.8% for logistic regression) and men (46.0% for machine learning and 44.8% for logistic regression). Discrimination was modest for both machine learning (0.59 for women and 0.56 for men) and logistic regression models (0.60 for women and 0.59 for men), although the models were well calibrated. Conclusions: Sex-specific machine learning and logistic regression models showed modest predictive performance for successful electrical cardioversion. Identifying patients who will benefit from cardioversion remains challenging in clinical practice. The high recurrence rate calls for thoroughly informed shared decision-making for electrical cardioversion.
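
The abstract above reports a prediction error and a discrimination value for each model. A minimal sketch of how two such quantities can be computed is given here, assuming that discrimination refers to the area under the ROC curve (the abstract does not name the metric) and that the error is the misclassification rate at a 0.5 threshold. The labels and scores are invented for illustration and are not study data.

```python
def auc(labels, scores):
    """Chance that a random positive case scores above a random negative one.

    Ties count as half a win. This is the pairwise (Mann-Whitney) form of
    the area under the ROC curve.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def error_rate(labels, scores, threshold=0.5):
    """Share of cases misclassified when scores >= threshold count as positive."""
    wrong = sum(1 for y, s in zip(labels, scores) if (s >= threshold) != (y == 1))
    return wrong / len(labels)


# Hypothetical outcomes (1 = successful cardioversion) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

model_auc = auc(labels, scores)
model_error = error_rate(labels, scores)
```

An AUC of 0.5 corresponds to chance-level ranking, which is why values of 0.56 to 0.60 are described as modest.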

45 Application of Machine Learning Models to Thermal Burn Patient Outcome Predictions in the Aftermath of a Nuclear Event

Journal of Burn Care & Research ◽

10.1093/jbcr/irab032.049 ◽

2021 ◽

Vol 42 (Supplement_1) ◽

pp. S33-S34

Author(s):

Morgan A Taylor ◽

Randy D Kearns ◽

Jeffrey E Carter ◽

Mark H Ebell ◽

Curt A Harris

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Length Of Stay ◽

Regression Models ◽

Large Scale ◽

Prediction Models ◽

Burn Patients ◽

Thermal Burn ◽

Logistic Regression Models ◽

Burn Patient

Introduction: A nuclear disaster would generate an unprecedented volume of thermal burn patients from the explosion and subsequent mass fires (Figure 1). Prediction models characterizing outcomes for these patients may better equip healthcare providers and other responders to manage large-scale nuclear events. Logistic regression models have traditionally been employed to develop prediction scores for mortality of all burn patients. However, other healthcare disciplines have increasingly transitioned to machine learning (ML) models, which are automatically generated and continually improved, potentially increasing predictive accuracy. Preliminary research suggests ML models can predict burn patient mortality more accurately than commonly used prediction scores. The purpose of this study is to examine the efficacy of various ML methods in assessing thermal burn patient mortality and length of stay in burn centers. Methods: This retrospective study identified patients with fire/flame burn etiologies in the National Burn Repository between 2009 and 2018. Patients were randomly partitioned into a 67%/33% split for training and validation. A random forest (RF) model and an artificial neural network (ANN) were then constructed for each outcome, mortality and length of stay. These models were then compared to logistic regression models and previously developed prediction tools with similar outcomes using a combination of classification and regression metrics. Results: During the study period, 82,404 burn patients with a thermal etiology were identified. The ANN models are likely to overfit the data, which can be addressed by stopping model training early or adding regularization parameters. Further exploration of the advantages and limitations of these models is forthcoming as metric analyses become available. Conclusions: In this proof-of-concept study, we anticipate that at least one ML model will predict the targeted outcomes of thermal burn patient mortality and length of stay as judged by the fidelity with which it matches the logistic regression analysis. These advancements can then help disaster preparedness programs consider resource limitations during catastrophic incidents resulting in burn injuries.
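
The random 67%/33% partition mentioned in the Methods can be sketched as follows. This is only an illustration: the function name, the fixed seed, and the integer patient identifiers are assumptions, and the abstract does not say how the authors performed the split.

```python
import random


def split_train_validation(records, train_fraction=0.67, seed=0):
    """Shuffle a copy of the records and cut it into training and validation parts."""
    rng = random.Random(seed)  # fixed seed so the partition is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical patient identifiers standing in for registry records.
patient_ids = list(range(100))
train, validation = split_train_validation(patient_ids)
```

Every record lands in exactly one of the two parts, so validation metrics are computed on patients the model never saw during training.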

Clinical Classifiers to Identify Ascending Aortic Dilatation in Patients With Bicuspid Versus Tricuspid Aortic Valves

10.21203/rs.3.rs-957446/v1 ◽

2021 ◽

Author(s):

Bamba Gaye ◽

Maxime Vignac ◽

Jesper R. Gådin ◽

Magalie Ladouceur ◽

Kenneth Caidahl ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Aortic Valve ◽

Regression Models ◽

Prediction Models ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Multidimensional Data ◽

Aortic Dilatation ◽

Logistic Regression Models

Objective: We aimed to develop clinical classifiers to identify prevalent ascending aortic dilatation in patients with bicuspid aortic valve (BAV) and tricuspid aortic valve (TAV). Methods: This study included BAV (n=543) and TAV (n=491) patients with aortic valve disease and/or ascending aortic dilatation but devoid of coronary artery disease undergoing cardiothoracic surgery. We applied machine learning algorithms and classic logistic regression models, using multiple variable selection methodologies to identify predictors of high risk of ascending aortic dilatation (ascending aorta with a diameter above 40 mm). Analyses included comprehensive multidimensional data (i.e., valve morphology, clinical data, family history of cardiovascular diseases, prevalent diseases, demographic, lifestyle and medication). Results: BAV patients were younger (60.4±12.4 years) than TAV patients (70.4±9.1 years) and had a higher frequency of aortic dilatation (45.3% vs. 28.9% for BAV and TAV, respectively; P<0.001). The aneurysm prediction models showed mean AUC values above 0.8 for TAV patients, with the absence of aortic stenosis being the main predictor, followed by diabetes and high-sensitivity C-reactive protein. Using the same clinical measures in BAV patients, our prediction model resulted in AUC values between 0.5 and 0.55, which is not useful for prediction of aortic dilatation. The classification results were consistent for all machine learning algorithms and classic logistic regression models. Conclusions: Cardiovascular risk profiles appear to be more predictive of aortopathy in TAV patients than in patients with BAV. This adds evidence that BAV- and TAV-associated aortopathy involve different pathways to aneurysm formation and highlights the need for specific aneurysm preventions in these patients. Further, our results highlight that machine learning approaches do not outperform classical prediction methods in addressing complex interactions and non-linear relations between variables.

Machine Learning Models Have Better Performance than Traditional Logistic Regression Models in Predicting the Risk of Diabetes

SSRN Electronic Journal ◽

10.2139/ssrn.3854672 ◽

2021 ◽

Author(s):

Yaqian Mao ◽

Shuyao Pan ◽

Zheng Zhu ◽

Wei Lin ◽

Junping Wen ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Regression Models ◽

Learning Models ◽

Logistic Regression Models ◽

Machine Learning Models

Ambient air pollution exposure and risk and progression of interstitial lung abnormalities: the Framingham Heart Study

Thorax ◽

10.1136/thoraxjnl-2018-212877 ◽

2019 ◽

Vol 74 (11) ◽

pp. 1063-1069 ◽

Cited By ~ 8

Author(s):

Mary B Rice ◽

Wenyuan Li ◽

Joel Schwartz ◽

Qian Di ◽

Itai Kloog ◽

...

Keyword(s):

Air Pollution ◽

Logistic Regression ◽

Regression Models ◽

Cardiac Ct ◽

Prediction Models ◽

Ambient Air ◽

Chest Ct ◽

Ambient Air Pollution ◽

Logistic Regression Models ◽

Lung Abnormalities

Background: Ambient air pollution accelerates lung function decline among adults; however, there are limited data about its role in the development and progression of early stages of interstitial lung disease. Aims: To evaluate associations of long-term exposure to traffic and ambient pollutants with odds of interstitial lung abnormalities (ILA) and progression of ILA on repeated imaging. Methods: We ascertained ILA on chest CT obtained from 2618 Framingham participants from 2008 to 2011. Among 1846 participants who also completed a cardiac CT from 2002 to 2005, we determined interval ILA progression. We assigned distance from home address to major roadway, and the 5-year average of fine particulate matter (PM2.5), elemental carbon (EC, a traffic-related PM2.5 constituent) and ozone using spatio-temporal prediction models. Logistic regression models were adjusted for age, sex, body mass index, smoking status, pack-years of smoking, household tobacco exposure, neighbourhood household value, primary occupation, cohort and date. Results: Among 2618 participants with a chest CT, 176 (6.7%) had ILA, 1361 (52.0%) had no ILA, and the remainder were indeterminate. Among 1846 with a preceding cardiac CT, 118 (6.4%) had ILA with interval progression. In adjusted logistic regression models, an IQR difference in 5-year EC exposure of 0.14 µg/m3 was associated with a 1.27 (95% CI 1.04 to 1.55) times greater odds of ILA, and a 1.33 (95% CI 1.00 to 1.76) times greater odds of ILA progression. PM2.5 and O3 were not associated with ILA or ILA progression. Conclusions: Exposure to EC may increase risk of progressive ILA; however, associations with other measures of ambient pollution were inconclusive.
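
The odds ratio for elemental carbon is reported per interquartile-range (IQR) increment of 0.14 µg/m3. The sketch below shows how such an odds ratio relates to a logistic regression coefficient and how it could be rescaled to another increment. It assumes the exposure entered the model as a linear term, which the abstract does not state, and the helper names are hypothetical.

```python
import math


def log_odds_per_unit(odds_ratio, increment):
    """Logistic coefficient per 1 unit of exposure implied by an OR per `increment`."""
    return math.log(odds_ratio) / increment


def odds_ratio_for(beta_per_unit, increment):
    """Odds ratio implied by a per-unit coefficient over a chosen increment."""
    return math.exp(beta_per_unit * increment)


IQR_EC = 0.14  # µg/m3, IQR of 5-year EC exposure reported in the abstract
OR_ILA = 1.27  # odds ratio for ILA per IQR reported in the abstract

beta = log_odds_per_unit(OR_ILA, IQR_EC)
or_two_iqr = odds_ratio_for(beta, 2 * IQR_EC)  # equals OR_ILA squared
```

Under the linearity assumption, doubling the increment squares the odds ratio, because the increments add on the log-odds scale.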

Interpretability and Class Imbalance in Prediction Models for Pain Volatility in Manage My Pain App Users: Analysis Using Feature Selection and Majority Voting Methods

JMIR Medical Informatics ◽

10.2196/15601 ◽

2019 ◽

Vol 7 (4) ◽

pp. e15601 ◽

Cited By ~ 1

Author(s):

Quazi Abidur Rahman ◽

Tahir Janmohamed ◽

Hance Clarke ◽

Paul Ritvo ◽

Jane Heffernan ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Feature Selection ◽

Random Forests ◽

Prediction Models ◽

Class Imbalance ◽

Majority Voting ◽

Selection Methods ◽

Logistic Regression Models ◽

High Volatility

Background: Pain volatility is an important factor in chronic pain experience and adaptation. Previously, we employed machine-learning methods to define and predict pain volatility levels from users of the Manage My Pain app. Reducing the number of features is important to help increase interpretability of such prediction models. Prediction results also need to be consolidated from multiple random subsamples to address the class imbalance issue. Objective: This study aimed to: (1) increase the interpretability of previously developed pain volatility models by identifying the most important features that distinguish high from low volatility users; and (2) consolidate prediction results from models derived from multiple random subsamples while addressing the class imbalance issue. Methods: A total of 132 features were extracted from the first month of app use to develop machine learning–based models for predicting pain volatility at the sixth month of app use. Three feature selection methods were applied to identify features that were significantly better predictors than other members of the large features set used for developing the prediction models: (1) Gini impurity criterion; (2) information gain criterion; and (3) Boruta. We then combined the three groups of important features determined by these algorithms to produce the final list of important features. Three machine learning methods were then employed to conduct prediction experiments using the selected important features: (1) logistic regression with ridge estimators; (2) logistic regression with least absolute shrinkage and selection operator; and (3) random forests. Multiple random under-sampling of the majority class was conducted to address class imbalance in the dataset. Subsequently, a majority voting approach was employed to consolidate prediction results from these multiple subsamples. The total number of users included in this study was 879, with a total number of 391,255 pain records. Results: A threshold of 1.6 was established using clustering methods to differentiate between 2 classes: low volatility (n=694) and high volatility (n=185). The overall prediction accuracy was approximately 70% for both random forests and logistic regression models when using 132 features. Overall, 9 important features were identified using 3 feature selection methods. Of these 9 features, 2 were from the app use category and the other 7 were related to pain statistics. After consolidating models that were developed using random subsamples by majority voting, logistic regression models performed equally well using 132 or 9 features. Random forests performed better than logistic regression methods in predicting the high volatility class. The consolidated accuracy of random forests did not drop significantly (601/879; 68.4% vs 618/879; 70.3%) when only 9 important features were included in the prediction model. Conclusions: We employed feature selection methods to identify important features in predicting future pain volatility. To address class imbalance, we consolidated models that were developed using multiple random subsamples by majority voting. Reducing the number of features did not result in a significant decrease in the consolidated prediction accuracy.
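
The combination of repeated random under-sampling and majority voting described in the Methods can be sketched as below. This is not the authors' pipeline: a one-feature nearest-centroid rule stands in for the logistic regression and random forest models, the data are synthetic, and the choice of 11 subsamples is an assumption made only to avoid tied votes.

```python
import random
from collections import Counter


def undersample(majority, minority, rng):
    """Keep every minority row plus an equally sized random draw of majority rows."""
    return rng.sample(majority, len(minority)) + list(minority)


def fit_centroids(rows):
    """Toy stand-in model: the mean feature value of each class."""
    sums, counts = Counter(), Counter()
    for x, y in rows:
        sums[y] += x
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in counts}


def predict(centroids, x):
    """Assign the class whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def majority_vote(models, x):
    """Consolidate the predictions of all subsample models by simple voting."""
    votes = Counter(predict(m, x) for m in models)
    return votes.most_common(1)[0][0]


rng = random.Random(0)
# Synthetic, imbalanced data: (feature value, class label).
low = [(rng.gauss(1.0, 0.3), "low") for _ in range(200)]
high = [(rng.gauss(2.5, 0.3), "high") for _ in range(50)]

# One model per balanced subsample; an odd count prevents ties between two classes.
models = [fit_centroids(undersample(low, high, rng)) for _ in range(11)]
```

Each model sees a balanced sample, so none is dominated by the majority class, while voting across subsamples makes use of majority rows that any single subsample discards.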

Machine Learning Diagnostic Modeling for Classifying Fibromyalgia Using B-mode Ultrasound Images

Ultrasonic Imaging ◽

10.1177/0161734620908789 ◽

2020 ◽

Vol 42 (3) ◽

pp. 135-147 ◽

Cited By ~ 1

Author(s):

Michael Behr ◽

Saba Saiel ◽

Valerie Evans ◽

Dinesh Kumbhare

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Regression Models ◽

Trapezius Muscle ◽

Image Texture ◽

Learning Models ◽

Test Set ◽

Logistic Regression Models ◽

Performance Accuracy ◽

Machine Learning Models

Fibromyalgia (FM) diagnosis remains a challenge for clinicians due to a lack of objective diagnostic tools. One proposed solution is the use of quantitative ultrasound (US) techniques, such as image texture analysis, which has demonstrated discriminatory capabilities with other chronic pain conditions. From this, we propose the use of image texture variables to construct and compare two machine learning models (support vector machine [SVM] and logistic regression) for differentiating between the trapezius muscle in healthy and FM patients. US videos of the right and left trapezius muscle were acquired from healthy participants (n = 51) and those with FM (n = 57). The videos were converted into 64,800 skeletal muscle regions of interest (ROIs) using MATLAB. The ROIs were filtered by an algorithm using the complex wavelet structural similarity index (CW-SSIM), which removed ROIs that were similar. Thirty-one texture variables were extracted from the ROIs, which were then used in nested cross-validation to construct SVM and elastic net regularized logistic regression models. The generalized performance accuracy of both models was estimated and confirmed with a final validation on a holdout test set. The predicted generalized performance accuracy of the SVM and logistic regression models was computed to be 83.9 ± 2.6% and 65.8 ± 1.7%, respectively. The models achieved accuracies of 84.1% and 66.0% on the final holdout test set, validating performance estimates. Although both machine learning models differentiate between healthy trapezius muscle and that of patients with FM, only the SVM model demonstrated clinically relevant performance levels.

Machine learning techniques to derive bioclimatic classifications for Colombia

10.1101/2021.09.05.459033 ◽

2021 ◽

Author(s):

Richard Rios ◽

Elkin A. Noguera-Urbano ◽

Jairo Espinosa ◽

Jose Manuael Ochoa

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Logistic Regression ◽

Regression Models ◽

Principal Component ◽

Component Analysis ◽

Machine Learning Techniques ◽

Study Region ◽

Logistic Regression Models ◽

Learning Techniques

Bioclimatic classifications seek to divide a study region into geographic areas with similar bioclimatic characteristics. In this study we proposed two bioclimatic classifications for Colombia using machine learning techniques. We first characterized the precipitation space of Colombia using principal component analysis. Based on the Lang classification, we then projected all background sites in the precipitation space with their corresponding categories. We sequentially fit logistic regression models to re-classify all background sites in the precipitation space with six redefined Lang categories. The new categories were then used to define new modified Lang and Caldas-Lang classifications.

Using self-supervised feature learning to improve the use of pulse oximeter signals to predict paediatric hospitalization

Wellcome Open Research ◽

10.12688/wellcomeopenres.17148.1 ◽

2021 ◽

Vol 6 ◽

pp. 248

Author(s):

Paul Mwaniki ◽

Timothy Kamanu ◽

Samuel Akech ◽

Dustin Dunsmuir ◽

J. Mark Ansermino ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Deep Learning ◽

Pulse Oximeter ◽

Regression Models ◽

Data Sets ◽

Learning Models ◽

Data Set ◽

Logistic Regression Models ◽

End To End

Background: The success of many machine learning applications depends on knowledge about the relationship between the input data and the task of interest (output), hindering the application of machine learning to novel tasks. End-to-end deep learning, which does not require intermediate feature engineering, has been recommended to overcome this challenge, but end-to-end deep learning models require large labelled training data sets often unavailable in many medical applications. In this study, we trained machine learning models to predict paediatric hospitalization given raw photoplethysmography (PPG) signals obtained from a pulse oximeter. We trained self-supervised learning (SSL) for automatic feature extraction from PPG signals and assessed the utility of SSL in initializing end-to-end deep learning models trained on a small labelled data set with the aim of predicting paediatric hospitalization. Methods: We compared logistic regression models fitted using features extracted using SSL with end-to-end deep learning models initialized either randomly or using weights from the SSL model. We also compared the performance of SSL models trained on labelled data alone (n=1,031) with SSL trained using both labelled and unlabelled signals (n=7,578). Results: The SSL model trained on both labelled and unlabelled PPG signals produced features that were more predictive of hospitalization compared to the SSL model trained on labelled PPG only (AUC of logistic regression model: 0.78 vs 0.74). The end-to-end deep learning model had an AUC of 0.80 when initialized using the SSL model trained on all PPG signals, 0.77 when initialized using SSL trained on labelled data only, and 0.73 when initialized randomly. Conclusions: This study shows that SSL can improve the classification of PPG signals by either extracting features required by logistic regression models or initializing end-to-end deep learning models. Furthermore, SSL can leverage larger unlabelled data sets to improve performance of models fitted using small labelled data sets.

Analyzing injury severity of motorcycle at-fault crashes using machine learning techniques, decision tree and logistic regression models

International Journal of Transportation Science and Technology ◽

10.1016/j.ijtst.2019.10.002 ◽

2020 ◽

Vol 9 (2) ◽

pp. 89-99 ◽

Cited By ~ 7

Author(s):

Mahdi Rezapour ◽

Amirarsalan Mehrara Molan ◽

Khaled Ksaibati

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Decision Tree ◽

Regression Models ◽

Injury Severity ◽

Machine Learning Techniques ◽

Logistic Regression Models ◽

Learning Techniques

Comparison of Machine Learning Methods and Conventional Logistic Regressions for Predicting Gestational Diabetes Using Routine Clinical Data: A Retrospective Cohort Study

Journal of Diabetes Research ◽

10.1155/2020/4168340 ◽

2020 ◽

Vol 2020 ◽

pp. 1-10 ◽

Cited By ~ 1

Author(s):

Yunzhen Ye ◽

Yu Xiong ◽

Qiongjie Zhou ◽

Jiangnan Wu ◽

Xiaotian Li ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Gestational Diabetes ◽

Early Pregnancy ◽

Predictive Value ◽

Regression Models ◽

Learning Methods ◽

Logistic Regression Models ◽

Machine Learning Methods ◽

Logistic Regressions

Background. Gestational diabetes mellitus (GDM) contributes to adverse pregnancy and birth outcomes. In recent decades, extensive research has been devoted to the early prediction of GDM by various methods. Machine learning methods are flexible prediction algorithms with potential advantages over conventional regression. Objective. The purpose of this study was to use machine learning methods to predict GDM and compare their performance with that of logistic regressions. Methods. We performed a retrospective, observational study including women who attended their routine first hospital visits during early pregnancy and had Down’s syndrome screening at 16-20 gestational weeks in a tertiary maternity hospital in China from 1 January 2013 to 31 December 2017. A total of 22,242 singleton pregnancies were included, and 3182 (14.31%) women developed GDM. Candidate predictors included maternal demographic characteristics and medical history (maternal factors) and laboratory values at early pregnancy. The models were derived from the first 70% of the data and then validated with the next 30%. The candidate predictors were used to train different machine learning models and traditional logistic regression models. Eight common machine learning methods (GBDT, AdaBoost, LGB, Logistic, Vote, XGB, Decision Tree, and Random Forest) and two common regressions (stepwise logistic regression and logistic regression with RCS) were implemented to predict the occurrence of GDM. Models were compared on discrimination and calibration metrics. Results. In the validation dataset, the machine learning and logistic regression models performed moderately (AUC 0.59-0.74). Overall, the GBDT model performed best (AUC 0.74, 95% CI 0.71-0.76) among the machine learning methods, with negligible differences between them. Fasting blood glucose, HbA1c, triglycerides, and BMI strongly contributed to GDM. A cutoff point for the predictive value at 0.3 in the GBDT model had a negative predictive value of 74.1% (95% CI 69.5%-78.2%) and a sensitivity of 90% (95% CI 88.0%-91.7%), and the cutoff point at 0.7 had a positive predictive value of 93.2% (95% CI 88.2%-96.1%) and a specificity of 99% (95% CI 98.2%-99.4%). Conclusion. In this study, we found that several machine learning methods did not outperform logistic regression in predicting GDM. We developed a model with cutoff points for risk stratification of GDM.
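
The cutoff-based measures quoted in the Results (sensitivity, specificity, positive and negative predictive value) can all be derived from a confusion matrix at a chosen risk cutoff. The sketch below illustrates this with invented labels and predicted risks; it does not reproduce the study's data or its GBDT model, and the function name is hypothetical.

```python
def cutoff_metrics(labels, risks, cutoff):
    """Confusion-matrix summaries when risk >= cutoff is treated as a positive call."""
    tp = fp = tn = fn = 0
    for y, r in zip(labels, risks):
        predicted_positive = r >= cutoff
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical outcomes (1 = developed GDM) and predicted risks.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
risks = [0.8, 0.5, 0.2, 0.75, 0.4, 0.25, 0.1, 0.05]

low_cutoff = cutoff_metrics(labels, risks, 0.3)
high_cutoff = cutoff_metrics(labels, risks, 0.7)
```

Lowering the cutoff raises sensitivity at the cost of specificity, which is why two cutoffs are often paired for risk stratification: a low one to rule out and a high one to rule in.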
