Prediction of delayed graft function after kidney transplantation: comparison between logistic regression and machine learning methods

We compare the performance of logistic regression with several alternative machine learning methods for estimating the risk of death for patients following an emergency admission to hospital, based on the patients’ first blood test results and physiological measurements, using an external validation approach. We trained and tested each model using data from one hospital (n = 24,696) and compared the performance of these models in data from another hospital (n = 13,477). We used two performance measures: the calibration slope and the area under the receiver operating characteristic curve. The logistic model performed reasonably well compared to the other machine learning methods (calibration slope: 0.90; area under the receiver operating characteristic curve: 0.847). Given the complexity of choosing tuning parameters for these methods, the performance of logistic regression with transformations for in-hospital mortality prediction was competitive with the best performing alternative machine learning methods, with no evidence of overfitting.
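
The two performance measures named in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the Newton-Raphson fitting loop, and the toy data are all assumptions made for the example. The calibration slope is taken here as the coefficient b in the recalibration model logit(P(y = 1)) = a + b * logit(predicted probability); a slope below 1 suggests predictions that are too extreme.

```python
import math

def auc(y_true, y_prob):
    """Area under the ROC curve: the probability that a random positive
    case is scored above a random negative case (ties count one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def calibration_slope(y_true, y_prob, iters=25):
    """Slope b of logit(P(y=1)) = a + b * logit(y_prob), fitted by
    Newton-Raphson. Assumes every probability lies strictly in (0, 1)."""
    x = [math.log(p / (1.0 - p)) for p in y_prob]
    a, b = 0.0, 1.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_true):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # score for the intercept
            g1 += (yi - mu) * xi     # score for the slope
            h00 += w                 # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return b

# Toy external-validation data (hypothetical outcomes and predicted risks).
y = [0, 0, 1, 0, 1, 1, 0, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
print(auc(y, p))                # 0.75
print(calibration_slope(y, p))  # below 1 for this toy data
```

In an external validation such as the one described, both functions would be applied to the second hospital's outcomes and the risks predicted by the model trained on the first hospital.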

Statistical and Machine Learning Methods for Software Fault Prediction Using CK Metric Suite: A Comparative Analysis

ISRN Software Engineering ◽

10.1155/2014/251083 ◽

2014 ◽

Vol 2014 ◽

pp. 1-15 ◽

Cited By ~ 10

Author(s):

Yeresime Suresh ◽

Lov Kumar ◽

Santanu Ku. Rath

Keyword(s):

Neural Network ◽

Machine Learning ◽

Logistic Regression ◽

Linear Regression ◽

Object Oriented ◽

Fault Prediction ◽

Learning Methods ◽

Software Fault Prediction ◽

Machine Learning Methods ◽

Software Fault

Experimental validation of software metrics in fault prediction for object-oriented methods, using statistical and machine learning methods, is necessary because this validation helps ensure the quality of the software product in a software organization. Object-oriented metrics play a crucial role in predicting faults. This paper examines the application of linear regression, logistic regression, and artificial neural network methods for software fault prediction using Chidamber and Kemerer (CK) metrics. Here, fault is treated as the dependent variable and the CK metric suite as the independent variables. Statistical methods such as linear regression and logistic regression, and machine learning methods such as neural networks (in their different forms), are applied to detect faults associated with the classes. The comparison approach was applied to a case study, namely Apache integration framework (AIF) version 1.6. The analysis highlights the significance of the weighted method per class (WMC) metric for fault classification, and also shows that the hybrid approach of radial basis function network obtained a better fault prediction rate than the other three neural network models.

ID: 3520519 LOGISTIC REGRESSION AND MACHINE LEARNING METHODS PREDICT CHOLEDOCHOLITHIASIS MORE ACCURATELY COMPARED TO CURRENT CRITERIA

Gastrointestinal Endoscopy ◽

10.1016/j.gie.2021.03.972 ◽

2021 ◽

Vol 93 (6) ◽

pp. AB145-AB146

Author(s):

John M. Azizian ◽

Camellia Dalai ◽

Harry Trieu ◽

Anand Rajan ◽

James H. Tabibian

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Learning Methods ◽

Machine Learning Methods

Can valid and practical risk-prediction or casemix adjustment models, including adjustment for comorbidity, be generated from English hospital administrative data (Hospital Episode Statistics)? A national observational study

Health Services and Delivery Research ◽

10.3310/hsdr02400 ◽

2014 ◽

Vol 2 (40) ◽

pp. 1-48 ◽

Cited By ~ 12

Author(s):

Alex Bottle ◽

Rene Gaudoin ◽

Rosalind Goudie ◽

Simon Jones ◽

Paul Aylin

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Risk Prediction ◽

Administrative Data ◽

Hospital Episode Statistics ◽

Readmission Rates ◽

Learning Methods ◽

Machine Learning Methods ◽

Outpatient Appointments ◽

Casemix Adjustment

Background: NHS hospitals collect a wealth of administrative data covering accident and emergency (A&E) department attendances, inpatient and day case activity, and outpatient appointments. Such data are increasingly being used to compare units and services, but adjusting for risk is difficult.

Objectives: To derive robust risk-adjustment models for various patient groups, including those admitted for heart failure (HF), acute myocardial infarction, colorectal and orthopaedic surgery, and outcomes adjusting for available patient factors such as comorbidity, using England’s Hospital Episode Statistics (HES) data. To assess if more sophisticated statistical methods based on machine learning such as artificial neural networks (ANNs) outperform traditional logistic regression (LR) for risk prediction. To update and assess for the NHS the Charlson index for comorbidity. To assess the usefulness of outpatient data for these models.

Main outcome measures: Mortality, readmission, return to theatre, outpatient non-attendance. For HF patients we considered various readmission measures such as diagnosis-specific and total within a year.

Methods: We systematically reviewed studies comparing two or more comorbidity indices. Logistic regression, ANNs, support vector machines and random forests were compared for mortality and readmission. Models were assessed using discrimination and calibration statistics. Competing risks proportional hazards regression and various count models were used for future admissions and bed-days.

Results: Our systematic review and empirical analysis suggested that for general purposes comorbidity is currently best described by the set of 30 Elixhauser comorbidities plus dementia. Model discrimination was often high for mortality and poor, or at best moderate, for other outcomes, for example c = 0.62 for readmission and c = 0.73 for death following stroke. Calibration was often good for procedure groups but poorer for diagnosis groups, with overprediction of low risk a common cause. The machine learning methods we investigated offered little beyond LR for their greater complexity and implementation difficulties. For HF, some patient-level predictors differed by primary diagnosis of readmission but not by length of follow-up. Prior non-attendance at outpatient appointments was a useful, strong predictor of readmission. Hospital-level readmission rates for HF did not correlate with readmission rates for non-HF; hospital performance on national audit process measures largely correlated only with HF readmission rates.

Conclusions: Many practical risk-prediction or casemix adjustment models can be generated from HES data using LR, though an extra step is often required for accurate calibration. Including outpatient data in readmission models is useful. The three machine learning methods we assessed added little with these data. Readmission rates for HF patients should be divided by diagnosis on readmission when used for quality improvement.

Future work: As HES data continue to develop and improve in scope and accuracy, they can be used more, for instance A&E records. The return to theatre metric appears promising and could be extended to other index procedures and specialties. While our data did not warrant the testing of a larger number of machine learning methods, databases augmented with physiological and pathology information, for example, might benefit from methods such as boosted trees. Finally, one could apply the HF readmissions analysis to other chronic conditions.

Funding: The National Institute for Health Research Health Services and Delivery Research programme.

Comparison between Statistical Models and Machine Learning Methods on Classification for Highly Imbalanced Multiclass Kidney Data

Diagnostics ◽

10.3390/diagnostics10060415 ◽

2020 ◽

Vol 10 (6) ◽

pp. 415 ◽

Cited By ~ 2

Author(s):

Bomi Jeong ◽

Hyunjeong Cho ◽

Jieun Kim ◽

Soon Kil Kwon ◽

SeungWoo Hong ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Statistical Models ◽

Classification Performance ◽

Health Examination ◽

Learning Methods ◽

Response Variable ◽

Machine Learning Methods ◽

Sensitivity Specificity ◽

Weighted Values

This study aims to compare the classification performance of statistical models on highly imbalanced kidney data. The health examination cohort database provided by the National Health Insurance Service in Korea is utilized to build models with various machine learning methods. The glomerular filtration rate (GFR) is used to diagnose chronic kidney disease (CKD). It is calculated using the Modification of Diet in Renal Disease method and classified into five stages (1, 2, 3A and 3B, 4, and 5). Different CKD stages based on the estimated GFR are considered as six classes of the response variable. This study utilizes two representative generalized linear models for classification, namely, multinomial logistic regression (multinomial LR) and ordinal logistic regression (ordinal LR), as well as two machine learning models, namely, random forest (RF) and autoencoder (AE). The classification performance of the four models is compared in terms of accuracy, sensitivity, specificity, precision, and F1 measure. To find the best model that classifies CKD stages correctly, the data are divided into a 10-fold dataset with the same rate for each CKD stage. Results indicate that RF and AE show better performance in accuracy than the multinomial and ordinal LR models when classifying the response variable. However, when a highly imbalanced dataset is modeled, accuracy can misrepresent the actual performance of a model. This occurs because accuracy remains high even if a statistical model classifies a minority class into a majority class. To solve this problem in performance interpretation, we consider not only accuracy from the confusion matrix but also sensitivity, specificity, precision, and F1 measure for each class. To present classification performance with a single value for each model, we calculate the macro-average and micro-weighted values for each model. We conclude that AE is the best model classifying CKD stages correctly for all performance indices.
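
The point about accuracy on imbalanced data can be made concrete with a small confusion matrix. The sketch below is illustrative only: the helper names and the three-class counts are hypothetical and are not the study's six CKD classes or results. It compares overall accuracy with macro-averaged and micro-averaged sensitivity (recall).

```python
def per_class_counts(conf, k):
    """TP, FP, FN, TN for class k from a square confusion matrix
    (rows = true class, columns = predicted class)."""
    n = len(conf)
    total = sum(sum(row) for row in conf)
    tp = conf[k][k]
    fn = sum(conf[k]) - tp
    fp = sum(conf[r][k] for r in range(n)) - tp
    tn = total - tp - fn - fp
    return tp, fp, fn, tn

def accuracy(conf):
    """Share of all cases on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in conf)
    return sum(conf[i][i] for i in range(len(conf))) / total

def macro_recall(conf):
    """Unweighted mean of per-class sensitivity: every class counts equally."""
    recalls = []
    for k in range(len(conf)):
        tp, fp, fn, tn = per_class_counts(conf, k)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(recalls) / len(recalls)

def micro_recall(conf):
    """Pooled sensitivity: sum of TP over sum of (TP + FN) across classes."""
    tps = fns = 0
    for k in range(len(conf)):
        tp, fp, fn, tn = per_class_counts(conf, k)
        tps += tp
        fns += fn
    return tps / (tps + fns)

# Hypothetical imbalanced data: 90 majority cases, 8 and 2 minority cases.
# The model sends most minority cases to the majority class.
conf = [[90, 0, 0],
        [6, 2, 0],
        [2, 0, 0]]
print(accuracy(conf))      # 0.92
print(macro_recall(conf))  # about 0.417
print(micro_recall(conf))  # 0.92, same as accuracy for single-label data
```

Accuracy looks strong here even though two of the three classes are mostly missed; the macro average exposes this because each class contributes equally regardless of its size.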

Network context matters: graph convolutional network model over social networks improves the detection of unknown HIV infections among young men who have sex with men

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocz070 ◽

2019 ◽

Vol 26 (11) ◽

pp. 1263-1271 ◽

Cited By ~ 3

Author(s):

Yang Xiang ◽

Kayo Fujimoto ◽

John Schneider ◽

Yuxi Jia ◽

Degui Zhi ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Social Network ◽

Random Forest ◽

Hiv Infections ◽

Hiv Status ◽

Learning Methods ◽

Network Information ◽

Machine Learning Methods ◽

Sex With Men

Objective: HIV infection risk can be estimated based on not only individual features but also social network information. However, there have been insufficient studies using machine learning methods that can maximize the utility of such information. Leveraging a state-of-the-art network topology modeling method, graph convolutional networks (GCN), our main objective was to include network information for the task of detecting previously unknown HIV infections.

Materials and Methods: We used multiple social network data (peer referral, social, sex partners, and affiliation with social and health venues) that include 378 young men who had sex with men in Houston, TX, collected between 2014 and 2016. Due to the limited sample size, an ensemble approach was used that integrates GCN for modeling information flow with statistical machine learning methods, including random forest and logistic regression, to efficiently model sparse features in individual nodes.

Results: Modeling network information using GCN effectively improved the prediction of HIV status in the social network. The ensemble approach achieved 96.6% accuracy and a 94.6% F1 measure, which outperformed the baseline methods (GCN, logistic regression, and random forest: 79.0%, 90.5%, 94.4% on accuracy, respectively; and 57.7%, 80.2%, 90.4% on F1). In the networks with missing HIV status, the ensemble also produced promising results.

Conclusion: Network context is a necessary component in modeling infectious disease transmissions such as HIV. GCN, when combined with traditional machine learning approaches, achieved promising performance in detecting previously unknown HIV infections, which may provide a useful tool for combatting the HIV epidemic.
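
To show how a graph convolutional layer lets network information flow between connected individuals, here is a minimal sketch of one layer. The abstract does not state the propagation rule used; the symmetric-normalised rule below (add self-loops, scale by node degree, apply weights, then ReLU) is a commonly used formulation and is an assumption, as are the function names and the toy graph.

```python
import math

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W).
    adj is a symmetric 0/1 adjacency matrix, feats holds one feature
    row per node, weight is the layer's weight matrix."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalisation.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]

# Two connected nodes with one feature each and an identity weight.
adj = [[0, 1], [1, 0]]
feats = [[1.0], [3.0]]
weight = [[1.0]]
print(gcn_layer(adj, feats, weight))  # [[2.0], [2.0]]
```

Each node's output mixes its own features with those of its neighbours, which is the sense in which network context enters the prediction. In an ensemble like the one described, such node representations would be combined with models fitted to individual-level features.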

Machine learning methods are comparable to logistic regression techniques in predicting severe walking limitation following total knee arthroplasty

Knee Surgery Sports Traumatology Arthroscopy ◽

10.1007/s00167-019-05822-7 ◽

2019 ◽

Vol 28 (10) ◽

pp. 3207-3216 ◽

Cited By ~ 2

Author(s):

Yong-Hao Pua ◽

Hakmook Kang ◽

Julian Thumboo ◽

Ross Allan Clark ◽

Eleanor Shu-Xian Chew ◽

...

Keyword(s):

Machine Learning ◽

Total Knee Arthroplasty ◽

Logistic Regression ◽

Knee Arthroplasty ◽

Learning Methods ◽

Machine Learning Methods ◽

Regression Techniques ◽

Total Knee

ClinicNet: machine learning for personalized clinical order set recommendations

JAMIA Open ◽

10.1093/jamiaopen/ooaa021 ◽

2020 ◽

Vol 3 (2) ◽

pp. 216-224

Author(s):

Jonathan X Wang ◽

Delaney K Sullivan ◽

Alex C Wells ◽

Jonathan H Chen

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Decision Support ◽

Clinical Decision ◽

Clinical Event ◽

Learning Methods ◽

Machine Learning Methods ◽

Order Sets ◽

Institutional Order ◽

Order Set

Objective: This study assesses whether neural networks trained on electronic health record (EHR) data can anticipate what individual clinical orders and existing institutional order set templates clinicians will use more accurately than existing decision support tools.

Materials and Methods: We process 57 624 patients’ worth of clinical event EHR data from 2008 to 2014. We train a feed-forward neural network (ClinicNet) and logistic regression applied to the traditional problem structure of predicting individual clinical items as well as our proposed workflow of predicting existing institutional order set template usage.

Results: ClinicNet predicts individual clinical orders (precision = 0.32, recall = 0.47) better than existing institutional order sets (precision = 0.15, recall = 0.46). The ClinicNet model predicts clinician usage of existing institutional order sets (avg. precision = 0.31) with higher average precision than a baseline of order set usage frequencies (avg. precision = 0.20) or a logistic regression model (avg. precision = 0.12).

Discussion: Machine learning methods can predict clinical decision-making patterns with greater accuracy and less manual effort than existing static order set templates. This can streamline existing clinical workflows, but may not fit if historical clinical ordering practices are incorrect. For this reason, manually authored content such as order set templates remains valuable for the purposeful design of care pathways. ClinicNet’s capability of predicting such personalized order set templates illustrates the potential of combining both top-down and bottom-up approaches to delivering clinical decision support content.

Conclusion: ClinicNet illustrates the capability for machine learning methods applied to the EHR to anticipate both individual clinical orders and existing order set templates, which has the potential to improve upon current standards of practice in clinical order entry.
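
The precision and recall figures quoted for order recommendations can be read as set overlaps between what a system suggests and what the clinician actually orders. The following is a minimal sketch under that reading; the function name and the order names are hypothetical and do not come from the study's data.

```python
def precision_recall(recommended, actual):
    """Precision: share of recommended orders that were actually placed.
    Recall: share of placed orders that had been recommended."""
    rec, act = set(recommended), set(actual)
    hits = len(rec & act)
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(act) if act else 0.0
    return precision, recall

# Hypothetical encounter: four suggested orders, three orders placed.
recommended = ["cbc", "bmp", "ecg", "troponin"]
actual = ["cbc", "troponin", "chest_xray"]
precision, recall = precision_recall(recommended, actual)
print(precision)  # 0.5
print(recall)     # two of the three placed orders were suggested
```

A static order set that lists many items can reach a similar recall to a learned model while having much lower precision, which is the pattern in the reported comparison (recall 0.46 versus 0.47, precision 0.15 versus 0.32).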

Application of Electrical Tomography Imaging Using Machine Learning Methods for the Monitoring of Flood Embankments Leaks

Energies ◽

10.3390/en14238081 ◽

2021 ◽

Vol 14 (23) ◽

pp. 8081

Author(s):

Tomasz Rymarczyk ◽

Krzysztof Król ◽

Edward Kozłowski ◽

Tomasz Wołowiec ◽

Marta Cholewa-Wiktor ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Single Point ◽

Learning System ◽

Reliable Measurement ◽

Electrical Tomography ◽

Learning Methods ◽

Conductivity Distribution ◽

Tomography Imaging ◽

Machine Learning Methods

This paper presents an application for the monitoring of leaks in flood embankments by reconstructing images in electrical tomography using logistic regression machine learning methods with elastic net regularisation, PCA and wave preprocessing. The main advantage of this solution is to obtain a more accurate spatial conductivity distribution inside the studied object. The described method assumes a learning system consisting of multiple equations working in parallel, where each equation creates a single point in the output image. This enables the efficient reconstruction of spatial images. The research focused on preparing, developing, and comparing algorithms and models for data analysis and reconstruction using a proprietary electrical tomography solution. A reliable measurement solution with sensors and machine learning methods makes it possible to analyse damage and leaks, leading to effective information and the eventual prevention of risks. The applied methods enable the improved resolution of the reconstructed images and the possibility to obtain them in real-time, which is their distinguishing feature compared to other methods. The use of electrical tomography in combination with specific methods for image reconstruction allows for an accurate spatial assessment of leaks and damage to dikes.
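
The idea of a learning system made of many equations, each producing a single point of the output image, can be sketched as one small logistic model per pixel. Everything below is an assumption made for illustration: binary pixels, plain (sub)gradient descent with an elastic-net penalty, the hyperparameter values, and the toy measurements. The PCA and wave preprocessing steps mentioned in the abstract are not shown, and this is not the authors' proprietary solution.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_pixel_model(X, y, alpha=0.01, l1_ratio=0.5, lr=0.5, epochs=500):
    """Fit one logistic model (weights w, bias b) for a single pixel by
    (sub)gradient descent on log-loss plus an elastic-net penalty."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err / n
            for j in range(d):
                gw[j] += err * xi[j] / n
        for j in range(d):
            sign = (w[j] > 0) - (w[j] < 0)
            # Elastic net: mix of L1 (sign) and L2 (weight) terms.
            gw[j] += alpha * (l1_ratio * sign + (1.0 - l1_ratio) * w[j])
            w[j] -= lr * gw[j]
        b -= lr * gb
    return w, b

def fit_image_models(X, Y):
    """Train an independent model per output pixel; Y[i][p] is pixel p
    of the i-th training image."""
    return [fit_pixel_model(X, [img[p] for img in Y])
            for p in range(len(Y[0]))]

def reconstruct(models, x):
    """Each model contributes exactly one point of the output image."""
    return [1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5
            else 0 for w, b in models]

# Toy data: two measurements, two-pixel images (hypothetical values).
X = [[1, 0], [0, 1], [1, 1], [0, 0]]
Y = [[1, 0], [0, 1], [1, 1], [0, 0]]
models = fit_image_models(X, Y)
print(reconstruct(models, [1, 0]))  # [1, 0]
```

Because the per-pixel models do not depend on each other, they can be trained and evaluated in parallel, which is consistent with the abstract's emphasis on efficient, real-time reconstruction.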
