Comparison of Machine Learning Methods and Conventional Logistic Regressions for Predicting Gestational Diabetes Using Routine Clinical Data: A Retrospective Cohort Study

Background. Gestational diabetes mellitus (GDM) contributes to adverse pregnancy and birth outcomes. In recent decades, extensive research has been devoted to the early prediction of GDM by various methods. Machine learning methods are flexible prediction algorithms with potential advantages over conventional regression. Objective. The purpose of this study was to use machine learning methods to predict GDM and compare their performance with that of logistic regressions. Methods. We performed a retrospective, observational study including women who attended their routine first hospital visits during early pregnancy and had Down’s syndrome screening at 16-20 gestational weeks in a tertiary maternity hospital in China between January 1, 2013, and December 31, 2017. A total of 22,242 singleton pregnancies were included, and 3182 (14.31%) women developed GDM. Candidate predictors included maternal demographic characteristics and medical history (maternal factors) and laboratory values in early pregnancy. The models were derived from the first 70% of the data and then validated with the next 30%. The candidate variables were used to train different machine learning models and traditional logistic regression models. Eight common machine learning methods (GBDT, AdaBoost, LGB, Logistic, Vote, XGB, Decision Tree, and Random Forest) and two common regression approaches (stepwise logistic regression and logistic regression with restricted cubic splines [RCS]) were implemented to predict the occurrence of GDM. Models were compared on discrimination and calibration metrics. Results. In the validation dataset, the machine learning and logistic regression models performed moderately (AUC 0.59-0.74). Overall, the GBDT model performed best among the machine learning methods (AUC 0.74, 95% CI 0.71-0.76), with negligible differences between the methods. Fasting blood glucose, HbA1c, triglycerides, and BMI contributed strongly to GDM prediction.
In the GBDT model, a cutoff of 0.3 for the predicted value yielded a negative predictive value of 74.1% (95% CI 69.5%-78.2%) and a sensitivity of 90% (95% CI 88.0%-91.7%), and a cutoff of 0.7 yielded a positive predictive value of 93.2% (95% CI 88.2%-96.1%) and a specificity of 99% (95% CI 98.2%-99.4%). Conclusion. In this study, we found that several machine learning methods did not outperform logistic regression in predicting GDM. We developed a model with cutoff points for risk stratification of GDM.
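
The cutoff results above come from cross-tabulating dichotomized predicted values against observed outcomes. A minimal sketch of that tabulation follows, assuming 0/1 outcome labels and the convention that a predicted value at or above the cutoff counts as a positive prediction. The function name `metrics_at_cutoff` and the toy data are hypothetical, not taken from the study, and the sketch omits the confidence intervals the authors report.

```python
from dataclasses import dataclass


@dataclass
class CutoffMetrics:
    sensitivity: float  # TP / (TP + FN)
    specificity: float  # TN / (TN + FP)
    ppv: float          # positive predictive value, TP / (TP + FP)
    npv: float          # negative predictive value, TN / (TN + FN)


def _ratio(num: int, den: int) -> float:
    # Return NaN rather than raising when a row or column of the table is empty.
    return num / den if den else float("nan")


def metrics_at_cutoff(probs, labels, cutoff):
    """Dichotomize predicted values at `cutoff` and summarize the 2x2 table.

    Hypothetical helper: assumes labels are 0/1 and p >= cutoff means positive.
    """
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        if p >= cutoff:          # predicted positive
            if y == 1:
                tp += 1
            else:
                fp += 1
        else:                    # predicted negative
            if y == 1:
                fn += 1
            else:
                tn += 1
    return CutoffMetrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
    )


# Hypothetical predicted values and observed outcomes (1 = GDM).
probs = [0.9, 0.8, 0.4, 0.2, 0.1, 0.35]
labels = [1, 1, 0, 1, 0, 0]
low = metrics_at_cutoff(probs, labels, 0.3)   # low cutoff: read sensitivity / NPV
high = metrics_at_cutoff(probs, labels, 0.7)  # high cutoff: read specificity / PPV
print(low)
print(high)
```

Raising the cutoff can only lower (or keep) sensitivity and can only raise (or keep) specificity, which is why a low threshold is paired with sensitivity and NPV and a high threshold with specificity and PPV.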

45 Application of Machine Learning Models to Thermal Burn Patient Outcome Predictions in the Aftermath of a Nuclear Event

Journal of Burn Care & Research ◽

10.1093/jbcr/irab032.049 ◽

2021 ◽

Vol 42 (Supplement_1) ◽

pp. S33-S34

Author(s):

Morgan A Taylor ◽

Randy D Kearns ◽

Jeffrey E Carter ◽

Mark H Ebell ◽

Curt A Harris

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Length Of Stay ◽

Regression Models ◽

Large Scale ◽

Prediction Models ◽

Burn Patients ◽

Thermal Burn ◽

Logistic Regression Models ◽

Burn Patient

Abstract Introduction A nuclear disaster would generate an unprecedented volume of thermal burn patients from the explosion and subsequent mass fires (Figure 1). Prediction models characterizing outcomes for these patients may better equip healthcare providers and other responders to manage large-scale nuclear events. Logistic regression models have traditionally been used to develop mortality prediction scores for burn patients. However, other healthcare disciplines have increasingly transitioned to machine learning (ML) models, which are generated automatically from data and can be continually improved, potentially increasing predictive accuracy. Preliminary research suggests that ML models can predict burn patient mortality more accurately than commonly used prediction scores. The purpose of this study is to examine the efficacy of various ML methods in predicting thermal burn patient mortality and length of stay in burn centers. Methods This retrospective study identified patients with fire/flame burn etiologies in the National Burn Repository from 2009 to 2018. Patients were randomly partitioned into training (67%) and validation (33%) sets. A random forest (RF) model and an artificial neural network (ANN) were then constructed for each outcome (mortality and length of stay). These models were then compared with logistic regression models and with previously developed prediction tools for similar outcomes, using a combination of classification and regression metrics. Results During the study period, 82,404 burn patients with a thermal etiology were identified for analysis. The ANN models are likely to overfit the data, which can be addressed by stopping training early or adding regularization. Further exploration of the advantages and limitations of these models is forthcoming as metric analyses become available.
Conclusions In this proof-of-concept study, we anticipate that at least one ML model will predict the targeted outcomes (thermal burn patient mortality and length of stay), as judged by how closely its performance matches that of the logistic regression analysis. These advances can then help disaster preparedness programs account for resource limitations during catastrophic incidents involving burn injuries.
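
The abstract mentions early stopping as a remedy for ANN overfitting. It gives no training details, so the following is only a generic sketch of a patience-based stopping rule, assuming the network's validation loss is recorded after every epoch. The names `early_stopping_epoch`, `patience`, and `min_delta` are illustrative; deep-learning frameworks ship equivalent callbacks.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Apply a patience-based early-stopping rule to per-epoch validation losses.

    Hypothetical helper. Returns (best_epoch, stop_epoch): training would halt
    at `stop_epoch`, and the weights saved at `best_epoch` would be restored.
    """
    best_loss = float("inf")
    best_epoch = -1
    stale_epochs = 0  # consecutive epochs without sufficient improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, stale_epochs = loss, epoch, 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                return best_epoch, epoch
    # The loss never stalled long enough: training ran to the last epoch.
    return best_epoch, len(val_losses) - 1


# Hypothetical validation-loss curve: improves, then rises as the network overfits.
losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70, 0.80]
print(early_stopping_epoch(losses, patience=3))  # (2, 5)
```

Restoring the weights from `best_epoch` rather than `stop_epoch` is the design choice that actually limits overfitting; the extra `patience` epochs only guard against stopping on a noisy blip.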

Discovery of metabolic biomarkers for gestational diabetes mellitus in a Chinese population

10.21203/rs.3.rs-109665/v2 ◽

2021 ◽

Author(s):

Wenqian Lu ◽

Mingjuan Luo ◽

Xiangnan Fang ◽

Rong Zhang ◽

Mengyang Tang ◽

...

Keyword(s):

Diabetes Mellitus ◽

Logistic Regression ◽

Gestational Diabetes ◽

Pregnant Women ◽

Regression Models ◽

Second Trimester ◽

Third Trimester ◽

Logistic Regression Models ◽

The Third ◽

Clinical Indices

Abstract Background: Gestational diabetes mellitus (GDM), one of the most common pregnancy complications, can lead to morbidity and mortality in both the mother and the infant. Metabolomics has provided new insights into the pathology of GDM, and a systematic analysis of GDM-related metabolites is required to provide further clues for GDM diagnosis and mechanistic research. This study aims to reveal metabolic differences between normal pregnant women and GDM patients in the second- and third-trimester stages and to confirm the clinical relevance of these new findings. Methods: Metabolites were quantified in serum samples from 200 healthy pregnant women and 200 women with GDM in the second trimester, and from 199 normal controls and 199 GDM patients in the third trimester. Functional and pathway analyses were applied to explore the biological roles of the two sets of metabolites. Trimester-specific GDM metabolite biomarkers were then identified by combining machine learning approaches, and logistic regression models were constructed to evaluate their predictive efficiency. Finally, weighted gene co-expression network analysis (WGCNA) was used to further capture associations among metabolite modules, biomarkers, and clinical indices. Results: This study identified 57 differentially expressed metabolites (DEMs) in the second-trimester group, the most significant of which was 3-methyl-2-oxovaleric acid. Similarly, 72 DEMs were found in the third-trimester group, in which the most significant metabolites were ketoleucine and alpha-ketoisovaleric acid. These DEMs were mainly involved in the metabolic pathways of amino acids, fatty acids, and bile acids. The logistic regression models for the selected metabolite biomarkers achieved area under the curve (AUC) values of 0.807 and 0.81 for the second- and third-trimester groups, respectively. Furthermore, significant associations were found between DEMs/biomarkers and GDM-related indices.
Conclusions: Metabolic differences were found between healthy pregnant women and GDM patients. Associations between biomarkers and clinical indices were also investigated and may provide insights into the pathology of GDM.
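
The AUC values reported for these logistic regression models (and in the GDM prediction study above) measure discrimination. One standard way to compute the AUC is as the normalized Mann-Whitney statistic: the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The sketch below assumes 0/1 labels; the function name and the scores are hypothetical.

```python
def auc_mann_whitney(scores, labels):
    """ROC AUC as the probability that a random positive case outscores a
    random negative case, ties counting one half (normalized Mann-Whitney U).

    Hypothetical helper; assumes labels are 0/1.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC requires at least one positive and one negative case")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Hypothetical model scores and case/control labels (1 = GDM).
scores = [0.9, 0.8, 0.4, 0.2, 0.1, 0.35]
labels = [1, 1, 0, 1, 0, 0]
print(auc_mann_whitney(scores, labels))  # 0.7777777777777778, i.e. 7/9
```

The pairwise loop costs O(n_pos × n_neg) and is meant only to make the definition concrete; rank-based formulas or library routines such as scikit-learn's `roc_auc_score` scale better.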

Machine Learning Models Have Better Performance than Traditional Logistic Regression Models in Predicting the Risk of Diabetes

SSRN Electronic Journal ◽

10.2139/ssrn.3854672 ◽

2021 ◽

Author(s):

Yaqian Mao ◽

Shuyao Pan ◽

Zheng Zhu ◽

Wei Lin ◽

Junping Wen ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Regression Models ◽

Learning Models ◽

Logistic Regression Models ◽

Machine Learning Models

Machine learning methods for soil moisture prediction in vineyards using digital images

E3S Web of Conferences ◽

10.1051/e3sconf/202016702004 ◽

2020 ◽

Vol 167 ◽

pp. 02004

Author(s):

Chantal Saad Hajjar ◽

Celine Hajjar ◽

Michel Esta ◽

Yolla Ghorra Chamoun

Keyword(s):

Machine Learning ◽

Soil Moisture ◽

Regression Models ◽

Digital Images ◽

Pearson Correlation ◽

Support Vector ◽

Learning Methods ◽

Nonlinear Regression Models ◽

Machine Learning Methods ◽

The One

In this paper, we propose to estimate the moisture of vineyard soils from digital photographs using machine learning methods. Two nonlinear regression models are implemented: a multilayer perceptron (MLP) and a support vector regression (SVR). Pixels coded in the RGB colour model and extracted from digital soil images are used, together with the associated known soil moisture levels, to train both models to predict moisture content from newly acquired images. The study is conducted on samples of six soil types collected from Chateau Kefraya terroirs in Lebanon. Both methods succeeded in predicting moisture, yielding high correlations between measured and predicted moisture when tested on unseen data. However, the SVR-based method outperformed the MLP-based method, yielding Pearson correlation coefficients ranging from 0.89 to 0.99. Moreover, it is a simple, noninvasive method that can be readily adopted to detect vineyard soil moisture.
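
The SVR and MLP results above are summarized by Pearson's correlation between measured and predicted moisture. A minimal sketch of that coefficient follows; the moisture values are made up for illustration, the helper name is hypothetical, and on Python 3.10+ `statistics.correlation` computes the same quantity.

```python
import math


def pearson_r(measured, predicted):
    """Pearson correlation between measured and predicted values (hypothetical helper)."""
    n = len(measured)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)


measured = [12.0, 18.5, 22.1, 30.4]   # hypothetical moisture (%)
predicted = [13.1, 17.9, 23.0, 29.2]  # hypothetical model output (%)
print(round(pearson_r(measured, predicted), 3))

# r ignores systematic bias: a constant +10 offset still gives r = 1.0.
print(pearson_r([1, 2, 3], [11, 12, 13]))  # 1.0
```

As the last line shows, a high r indicates that predictions track the measured values, not that they are free of systematic offset; an absolute-error measure would be needed to check that.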

Discovery of metabolic biomarkers for gestational diabetes mellitus in a Chinese population

Nutrition & Metabolism ◽

10.1186/s12986-021-00606-8 ◽

2021 ◽

Vol 18 (1) ◽

Author(s):

Wenqian Lu ◽

Mingjuan Luo ◽

Xiangnan Fang ◽

Rong Zhang ◽

Shanshan Li ◽

...

Keyword(s):

Diabetes Mellitus ◽

Logistic Regression ◽

Gestational Diabetes ◽

Pregnant Women ◽

Regression Models ◽

Second Trimester ◽

Third Trimester ◽

Logistic Regression Models ◽

The Third ◽

Clinical Indices

Abstract Background Gestational diabetes mellitus (GDM), one of the most common pregnancy complications, can lead to morbidity and mortality in both the mother and the infant. Metabolomics has provided new insights into the pathology of GDM, and a systematic analysis of GDM-related metabolites is required to provide further clues for GDM diagnosis and mechanistic research. This study aims to reveal metabolic differences between normal pregnant women and GDM patients in the second- and third-trimester stages and to confirm the clinical relevance of these new findings. Methods Metabolites were quantified in serum samples from 200 healthy pregnant women and 200 women with GDM in the second trimester, and from 199 normal controls and 199 GDM patients in the third trimester. Functional and pathway analyses were applied to explore the biological roles of the two sets of metabolites. Trimester-specific GDM metabolite biomarkers were then identified by combining machine learning approaches, and logistic regression models were constructed to evaluate their predictive efficiency. Finally, weighted gene co-expression network analysis (WGCNA) was used to further capture associations among metabolite modules, biomarkers, and clinical indices. Results This study identified 57 differentially expressed metabolites (DEMs) in the second-trimester group, the most significant of which was 3-methyl-2-oxovaleric acid. Similarly, 72 DEMs were found in the third-trimester group, in which the most significant metabolites were ketoleucine and alpha-ketoisovaleric acid. These DEMs were mainly involved in the metabolic pathways of amino acids, fatty acids, and bile acids. The logistic regression models for the selected metabolite biomarkers achieved area under the curve (AUC) values of 0.807 and 0.81 for the second- and third-trimester groups, respectively. Furthermore, significant associations were found between DEMs/biomarkers and GDM-related indices.
Conclusions Metabolic differences were found between healthy pregnant women and GDM patients. Associations between biomarkers and clinical indices were also investigated and may provide insights into the pathology of GDM.

Defensible inferences from a nested sequence of logistic regressions: a guide for the perplexed

Large-scale Assessments in Education ◽

10.1186/s40536-021-00111-7 ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Gulsah Gurkan ◽

Yoav Benjamini ◽

Henry Braun

Keyword(s):

Logistic Regression ◽

Cognitive Skills ◽

Regression Models ◽

Large Scale ◽

Family Background ◽

National Income ◽

Logistic Models ◽

Logistic Regression Models ◽

Logistic Regressions ◽

The Impact

Abstract Employing nested sequences of models is a common practice when exploring the extent to which one set of variables mediates the impact of another set. Such an analysis in the context of logistic regression models confronts two challenges: (i) direct comparisons of coefficients across models are generally biased due to the changes in scale that accompany changes in the set of explanatory variables; and (ii) conducting a large number of tests induces a multiplicity problem that, if not heeded, can lead to spurious findings of significance. This article illustrates a practical strategy for conducting analyses in the face of these challenges. The challenges, and how to address them, are illustrated using a subset of the findings reported by Braun (Large-scale Assess Educ 6(4):1–52, 2018. 10.1186/s40536-018-0058-x), drawn from the Programme for the International Assessment of Adult Competencies (PIAAC), an international, large-scale assessment of adults. For each country in the dataset, a nested pair of logistic regression models was fit to investigate the role of Educational Attainment and Cognitive Skills in mediating the impact of family background and demographic characteristics on the location of an individual’s annual income in the national income distribution. A modified version of the Karlson–Holm–Breen (KHB) method was employed to obtain an unbiased estimate of the true differences in the coefficients between nested logistic models. To address the issue of multiplicity, a recent generalization of the Benjamini–Hochberg (BH) False Discovery Rate (FDR)-controlling procedure to hierarchically structured hypotheses was employed and compared with two conventional methods. The differences between the changes in coefficients calculated conventionally and with the KHB adjustment varied from negligible to very substantial.
Taking into account the actual magnitudes of the coefficients, we concluded that the more proximal factors indeed act as strong mediators for the background factors, but less so for Age, and hardly at all for Gender. With respect to multiplicity, applying the FDR-controlling procedure yielded results very similar to those obtained with a standard per-comparison procedure, but quite a few more discoveries than the Bonferroni procedure. The KHB methodology illustrated here can be applied wherever there is interest in comparing nested logistic regressions. Modifications to account for probability sampling are practicable. The categorization of variables and the order of entry should be determined by substantive considerations. The BH procedure, by contrast, is fully general and can be implemented to address multiplicity issues in a broad range of settings.
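
The authors used a recent generalization of BH to hierarchically structured hypotheses, which is not reproduced here. The sketch below shows only the original flat Benjamini–Hochberg step-up procedure, alongside the Bonferroni and unadjusted per-comparison rules, on made-up p-values. It illustrates why BH typically sits between the two, as the abstract reports. All function names are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Flat Benjamini-Hochberg step-up procedure (hypothetical helper).

    Returns booleans, in the order of `p_values`, marking the hypotheses
    rejected while controlling the false discovery rate at level q.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up: find the LARGEST rank k whose sorted p-value is <= k * q / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:  # reject every hypothesis ranked 1..k_max
        rejected[i] = True
    return rejected


def bonferroni(p_values, alpha=0.05):
    """Reject when p <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def per_comparison(p_values, alpha=0.05):
    """Unadjusted testing: each hypothesis judged at level alpha on its own."""
    return [p <= alpha for p in p_values]


p = [0.01, 0.04, 0.03, 0.005]  # hypothetical p-values
print(benjamini_hochberg(p))   # [True, True, True, True]
print(bonferroni(p))           # [True, False, False, True]
print(per_comparison(p))       # [True, True, True, True]
```

Because BH is a step-up rule, a hypothesis whose own threshold fails can still be rejected when a larger p-value further down the ranking passes its threshold; `[0.04, 0.045]` at q = 0.05 rejects both.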

A comparison of logistic regression models with alternative machine learning methods to predict the risk of in-hospital mortality in emergency medical admissions via external validation

Health Informatics Journal ◽

10.1177/1460458218813600 ◽

2018 ◽

Vol 26 (1) ◽

pp. 34-44 ◽

Cited By ~ 1

Author(s):

Muhammad Faisal ◽

Andy Scally ◽

Robin Howes ◽

Kevin Beatson ◽

Donald Richardson ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Hospital Mortality ◽

Receiver Operating Characteristic Curve ◽

Operating Characteristic ◽

Characteristic Curve ◽

External Validation ◽

Learning Methods ◽

Machine Learning Methods ◽

Operating Characteristic Curve

We compare the performance of logistic regression with that of several alternative machine learning methods for estimating the risk of death in patients following an emergency admission to hospital, based on the patients’ first blood test results and physiological measurements, using an external validation approach. We trained and tested each model using data from one hospital (n = 24,696) and compared the performance of these models in data from another hospital (n = 13,477). We used two performance measures: the calibration slope and the area under the receiver operating characteristic curve. Compared with the other machine learning methods, the logistic model performed reasonably well (calibration slope: 0.90; area under the receiver operating characteristic curve: 0.847). Given the complexity of choosing tuning parameters for these methods, logistic regression with transformations was competitive with the best-performing alternative machine learning methods for in-hospital mortality prediction, with no evidence of overfitting.
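
The calibration slope reported above is conventionally obtained by logistic recalibration: regressing the observed outcome on the logit of the model's predicted risk in the validation data. A slope below 1 indicates predictions that are too extreme (a common sign of overfitting); a slope above 1 indicates predictions that are too modest. The pure-Python sketch below assumes predicted probabilities strictly between 0 and 1 and 0/1 outcomes. The function names and the simulated cohort are hypothetical, and in practice a GLM routine from a statistics package would replace the hand-written Newton-Raphson loop.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def calibration_slope(pred_probs, outcomes, max_iter=50, tol=1e-10):
    """Logistic recalibration (hypothetical helper): fit
    logit P(y=1) = a + b * logit(p_hat) by Newton-Raphson and return (a, b).
    b is the calibration slope. Requires 0 < p_hat < 1."""
    x = [math.log(p / (1.0 - p)) for p in pred_probs]
    a, b = 0.0, 1.0
    for _ in range(max_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, outcomes):
            mu = sigmoid(a + b * xi)
            w = mu * (1.0 - mu)
            g_a += yi - mu          # score (gradient) for the intercept
            g_b += (yi - mu) * xi   # score (gradient) for the slope
            h_aa += w               # Fisher information entries
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system (information) * step = score.
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + step_a, b + step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b


if __name__ == "__main__":
    # Simulated validation cohort: true risks on the logit scale, plus a
    # hypothetical overfitted model whose logits are twice as extreme.
    random.seed(0)
    true_logits = [random.uniform(-3, 3) for _ in range(20000)]
    outcomes = [1 if random.random() < sigmoid(z) else 0 for z in true_logits]
    overconfident = [sigmoid(2 * z) for z in true_logits]
    print(calibration_slope(overconfident, outcomes))  # slope close to 0.5
```

A well-calibrated model yields an intercept near 0 and a slope near 1; shrinking an overconfident model's logits by the estimated slope is the usual recalibration step.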

Machine Learning Diagnostic Modeling for Classifying Fibromyalgia Using B-mode Ultrasound Images

Ultrasonic Imaging ◽

10.1177/0161734620908789 ◽

2020 ◽

Vol 42 (3) ◽

pp. 135-147 ◽

Cited By ~ 1

Author(s):

Michael Behr ◽

Saba Saiel ◽

Valerie Evans ◽

Dinesh Kumbhare

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Regression Models ◽

Trapezius Muscle ◽

Image Texture ◽

Learning Models ◽

Test Set ◽

Logistic Regression Models ◽

Performance Accuracy ◽

Machine Learning Models

Fibromyalgia (FM) diagnosis remains a challenge for clinicians due to a lack of objective diagnostic tools. One proposed solution is the use of quantitative ultrasound (US) techniques, such as image texture analysis, which has demonstrated discriminatory capability in other chronic pain conditions. Building on this, we propose using image texture variables to construct and compare two machine learning models (support vector machine [SVM] and logistic regression) for differentiating between the trapezius muscle of healthy participants and that of FM patients. US videos of the right and left trapezius muscles were acquired from healthy participants (n = 51) and participants with FM (n = 57). The videos were converted into 64,800 skeletal muscle regions of interest (ROIs) using MATLAB. The ROIs were filtered by an algorithm based on the complex wavelet structural similarity index (CW-SSIM), which removed similar ROIs. Thirty-one texture variables were extracted from the ROIs and then used in nested cross-validation to construct SVM and elastic net regularized logistic regression models. The generalized performance accuracy of both models was estimated and then confirmed with a final validation on a holdout test set. The predicted generalized performance accuracies of the SVM and logistic regression models were 83.9 ± 2.6% and 65.8 ± 1.7%, respectively. On the final holdout test set, the models achieved accuracies of 84.1% and 66.0%, respectively, validating the performance estimates. Although both machine learning models differentiate between healthy trapezius muscle and that of patients with FM, only the SVM model demonstrated clinically relevant performance levels.
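
The nested cross-validation described above separates hyperparameter tuning (inner loop) from performance estimation (outer loop), so the reported accuracy is not inflated by tuning on the test data. The library-agnostic sketch below shows only the fold bookkeeping. `fit_and_score` is a hypothetical user-supplied callable standing in for fitting an SVM or elastic net model, and all names, fold counts, and the toy scorer are assumptions. With scikit-learn, the same structure is usually obtained by passing a `GridSearchCV` estimator to `cross_val_score`.

```python
import random
from statistics import mean, stdev


def k_fold_indices(n, k, seed):
    """Shuffle positions 0..n-1 reproducibly and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nested_cv(n, candidates, fit_and_score, outer_k=5, inner_k=3, seed=0):
    """Nested cross-validation over sample indices 0..n-1 (hypothetical helper).

    For each outer fold, inner folds built only from the outer-training samples
    choose the hyperparameter in `candidates` with the best mean score; that
    choice is refit on the outer-training samples and scored once on the
    untouched outer-test fold. `fit_and_score(train, test, param)` returns a
    score where higher is better.
    """
    outer_scores = []
    for f, outer_test in enumerate(k_fold_indices(n, outer_k, seed)):
        held_out = set(outer_test)
        outer_train = [j for j in range(n) if j not in held_out]

        def inner_score(param):
            scores = []
            for fold in k_fold_indices(len(outer_train), inner_k, seed + f + 1):
                val_pos = set(fold)
                val = [outer_train[j] for j in fold]
                train = [outer_train[j] for j in range(len(outer_train)) if j not in val_pos]
                scores.append(fit_and_score(train, val, param))
            return mean(scores)

        best_param = max(candidates, key=inner_score)
        outer_scores.append(fit_and_score(outer_train, outer_test, best_param))
    return outer_scores


def toy_fit_and_score(train, test, param):
    # Hypothetical scorer: checks for train/test leakage and prefers param == 3.
    assert not set(train) & set(test)
    return 1.0 - abs(param - 3) / 10


scores = nested_cv(40, [1, 2, 3, 4], toy_fit_and_score)
print(f"{mean(scores):.3f} ± {stdev(scores):.3f}")  # mean ± SD across outer folds
```

Summarizing the outer-fold scores as mean ± SD mirrors how the abstract reports its generalized accuracies.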

Statistical and Machine Learning Methods for Software Fault Prediction Using CK Metric Suite: A Comparative Analysis

ISRN Software Engineering ◽

10.1155/2014/251083 ◽

2014 ◽

Vol 2014 ◽

pp. 1-15 ◽

Cited By ~ 10

Author(s):

Yeresime Suresh ◽

Lov Kumar ◽

Santanu Ku. Rath

Keyword(s):

Neural Network ◽

Machine Learning ◽

Logistic Regression ◽

Linear Regression ◽

Object Oriented ◽

Fault Prediction ◽

Learning Methods ◽

Software Fault Prediction ◽

Machine Learning Methods ◽

Software Fault

Experimental validation of software metrics for fault prediction in object-oriented software using statistical and machine learning methods is necessary. Validation helps ensure the quality of software products within a software organization. Object-oriented metrics play a crucial role in predicting faults. This paper examines the application of linear regression, logistic regression, and artificial neural network methods to software fault prediction using Chidamber and Kemerer (CK) metrics. Here, fault is treated as the dependent variable and the CK metric suite as the independent variables. Statistical methods such as linear regression and logistic regression, and machine learning methods such as neural networks (in their different forms), are applied to detect faults associated with classes. The comparison was applied to a case study, the Apache integration framework (AIF) version 1.6. The analysis highlights the significance of the weighted methods per class (WMC) metric for fault classification, and it shows that the hybrid radial basis function network approach achieved a better fault prediction rate than the other three neural network models.
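
The abstract singles out the weighted methods per class (WMC) metric. In the CK suite, WMC is the sum of the complexities of a class's methods; the definition leaves the complexity measure open, and a common simplification assigns every method a complexity of 1, so that WMC becomes a method count. Purely for illustration, the sketch below applies that unit-weight simplification to Python source via the standard `ast` module. The study analyzed the AIF 1.6 code base, not Python, and its metric-extraction tooling is not described here; `wmc_unit_weight` is a hypothetical helper.

```python
import ast


def wmc_unit_weight(source):
    """Weighted Methods per Class with every method's complexity set to 1
    (hypothetical helper), so WMC is the number of methods a class defines
    directly. Returns {class_name: wmc}; nested classes are reported separately.
    """
    counts = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            counts[node.name] = sum(
                isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
                for item in node.body
            )
    return counts


example = '''
class Account:
    def deposit(self, amount): ...
    def withdraw(self, amount): ...
    async def sync(self): ...
    rate = 0.02

class Empty:
    pass
'''
print(wmc_unit_weight(example))  # {'Account': 3, 'Empty': 0}
```

Replacing the unit weight with a per-method complexity measure (for example, cyclomatic complexity) gives the general form of WMC; class attributes such as `rate` do not count.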
