Development of prediction models of spontaneous ureteral stone passage through machine learning: Comparison with conventional statistical analysis

PLoS ONE ◽  
2021 ◽  
Vol 16 (12) ◽  
pp. e0260517
Author(s):  
Jee Soo Park ◽  
Dong Wook Kim ◽  
Dongu Lee ◽  
Taeju Lee ◽  
Kyo Chul Koo ◽  
...  

Objectives To develop a prediction model of spontaneous ureteral stone passage (SSP) using machine learning and logistic regression and compare the performance of the two models. Indications for management of ureteral stones are unclear, and the clinician determines whether to wait for SSP or perform active treatment, especially in well-controlled patients, to avoid unwanted complications. Therefore, estimating the likelihood of SSP would help in making clinical decisions regarding ureteral stones. Methods Patients diagnosed with unilateral ureteral stones at our emergency department between August 2014 and September 2018 were included and underwent non-contrast-enhanced computed tomography 4 weeks from the first stone episode. Predictors of SSP were applied to build and validate the prediction model using a multilayer perceptron (MLP) with the Keras framework. Results Of 833 patients, SSP was observed in 606 (72.7%). SSP rates were 68.2% and 75.6% for stone sizes 5–10 mm and <5 mm, respectively. Stone opacity, location, and whether it was the first ureteral stone episode were significant predictors of SSP. Areas under the curve (AUCs) for receiver operating characteristic (ROC) curves for MLP and logistic regression were 0.859 and 0.847, respectively, for stones <5 mm, and 0.881 and 0.817, respectively, for 5–10 mm stones. Conclusion SSP prediction models were developed in patients with well-controlled unilateral ureteral stones; the performance of the models was good, especially in identifying SSP for 5–10 mm ureteral stones, for which there are no definite treatment guidelines. To further improve the performance of these models, future studies should focus on using machine learning techniques in image analysis.
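The AUC comparison reported above can be illustrated with a minimal sketch. It is not the authors' code: the function name `roc_auc` and the toy labels and scores are hypothetical, and the AUC is computed with the pairwise (Mann-Whitney) formulation rather than with Keras or any particular library.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 0/1 outcomes (1 = event, e.g. the stone passed spontaneously).
    scores: predicted probabilities or scores, same length as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Count positive/negative pairs that are ranked correctly; ties count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
y_true = [1, 1, 1, 0, 1, 0, 0, 1]
model_a = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
model_b = [0.9, 0.8, 0.4, 0.7, 0.35, 0.6, 0.3, 0.2]

auc_a = roc_auc(y_true, model_a)
auc_b = roc_auc(y_true, model_b)
print(f"AUC model A: {auc_a:.3f}, AUC model B: {auc_b:.3f}")
```

An AUC of 0.5 corresponds to ranking cases no better than chance and 1.0 to perfect ranking, which is the scale on which the reported values for the MLP and logistic regression models are compared.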

2018 ◽  
Vol 2018 ◽  
pp. 1-11 ◽  
Author(s):  
Changhyun Choi ◽  
Jeonghwan Kim ◽  
Jongsung Kim ◽  
Donghyun Kim ◽  
Younghye Bae ◽  
...  

Prediction models of heavy rain damage using machine learning based on big data were developed for the Seoul Capital Area in the Republic of Korea. We used data on the occurrence of heavy rain damage from 1994 to 2015 as dependent variables and weather big data as explanatory variables. The model was developed by applying machine learning techniques such as decision trees, bagging, random forests, and boosting. As a result of evaluating the prediction performance of each model, the AUC value of the boosting model using meteorological data from the past 1 to 4 days was the highest at 95.87% and was selected as the final model. By using the prediction model developed in this study to predict the occurrence of heavy rain damage for each administrative region, we can greatly reduce the damage through proactive disaster management.


2020 ◽  
Author(s):  
Young Min Park ◽  
Byung-Joo Lee

Abstract Background: This study analyzed the prognostic significance of nodal factors, including the number of metastatic LNs and LNR, in patients with PTC, and attempted to construct a disease recurrence prediction model using machine learning techniques. Methods: We retrospectively analyzed clinico-pathologic data from 1040 patients diagnosed with papillary thyroid cancer between 2003 and 2009. Results: We analyzed clinico-pathologic factors related to recurrence through logistic regression analysis. Among the factors that we included, only sex and tumor size were significantly correlated with disease recurrence. Parameters such as age, sex, tumor size, tumor multiplicity, ETE, ENE, pT, pN, ipsilateral central LN metastasis, contralateral central LN metastasis, number of metastatic LNs, and LNR were used as inputs for constructing a machine learning prediction model. The performance of five machine learning models for recurrence prediction was compared based on accuracy. The Decision Tree model showed the best accuracy at 95%, and the lightGBM and stacking models each showed 93% accuracy. Conclusions: We confirmed that all machine learning prediction models showed an accuracy of 90% or more for predicting disease recurrence in PTC. Large-scale multicenter clinical studies should be performed to improve the performance of our prediction models and verify their clinical effectiveness.


2020 ◽  
Vol 10 (21) ◽  
pp. 7741
Author(s):  
Sang Yeob Kim ◽  
Gyeong Hee Nam ◽  
Byeong Mun Heo

Metabolic syndrome (MS) is an aggregation of coexisting conditions that can indicate an individual’s high risk of major diseases, including cardiovascular disease, stroke, cancer, and type 2 diabetes. We conducted a cross-sectional survey to evaluate potential risk factor indicators by identifying relationships between MS and anthropometric and spirometric factors along with blood parameters among Korean adults. A total of 13,978 subjects were enrolled from the Korea National Health and Nutrition Examination Survey. Statistical analysis was performed using a complex sampling design to represent the entire Korean population. We conducted binary logistic regression analysis to evaluate and compare potential associations of all included factors. We constructed prediction models based on Naïve Bayes and logistic regression algorithms. The performance of the prediction models was evaluated using the area under the curve (AUC) and calibration curves. Among all factors, triglyceride exhibited a strong association with MS in both men (odds ratio (OR) = 2.711, 95% confidence interval (CI) [2.328–3.158]) and women (OR = 3.515 [3.042–4.062]). Regarding anthropometric factors, the waist-to-height ratio demonstrated a strong association in men (OR = 1.511 [1.311–1.742]), whereas waist circumference was the strongest indicator in women (OR = 2.847 [2.447–3.313]). Forced expiratory volume in 6 s and forced expiratory flow 25–75% were strongly associated with MS in both men (OR = 0.822 [0.749–0.903]) and women (OR = 1.150 [1.060–1.246]). The wrapper-based logistic regression prediction model showed the highest predictive power in both men and women (AUC = 0.868 and 0.932, respectively). Our findings revealed that several factors were associated with MS and suggested the potential of employing machine learning models to support the diagnosis of MS.
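The odds ratios and confidence intervals above relate to logistic regression coefficients through OR = exp(β). The sketch below shows that relationship under a stated assumption: a Wald-type interval, exp(β ± z·SE), which the abstract does not confirm (the study used a complex sampling design, so its standard errors may have been estimated differently). The function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def coef_from_odds_ratio(odds_ratio, ci_low, ci_high):
    """Back out the log-odds coefficient and its standard error.

    Assumes a Wald interval, i.e. one that is symmetric on the log scale.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return beta, se


def odds_ratio_with_ci(beta, se):
    """Convert a coefficient and standard error to an OR with a 95% CI."""
    return (
        math.exp(beta),
        math.exp(beta - Z_95 * se),
        math.exp(beta + Z_95 * se),
    )


# Triglyceride in men, as reported in the abstract: OR = 2.711 [2.328-3.158].
beta, se = coef_from_odds_ratio(2.711, 2.328, 3.158)
or_, low, high = odds_ratio_with_ci(beta, se)
print(f"beta = {beta:.3f}, SE = {se:.3f}")
print(f"OR = {or_:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

An OR above 1 (positive β) indicates higher odds of MS as the factor increases, while an OR below 1, such as the 0.822 reported for the spirometric measure in men, indicates lower odds.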


2020 ◽  
Author(s):  
Victoria Garcia-Montemayor ◽  
Alejandro Martin-Malo ◽  
Carlo Barbieri ◽  
Francesco Bellocchio ◽  
Sagrario Soriano ◽  
...  

Abstract Background Besides the classic logistic regression analysis, non-parametric methods based on machine learning techniques such as random forest are presently used to generate predictive models. The aim of this study was to evaluate random forest mortality prediction models in haemodialysis patients. Methods Data were acquired from incident haemodialysis patients between 1995 and 2015. Prediction of mortality at 6 months, 1 year and 2 years of haemodialysis was calculated using random forest and the accuracy was compared with logistic regression. Baseline data were constructed with the information obtained during the initial period of regular haemodialysis. Aiming to increase accuracy concerning baseline information of each patient, the period of time used to collect data was set at 30, 60 and 90 days after the first haemodialysis session. Results There were 1571 incident haemodialysis patients included. The mean age was 62.3 years and the average Charlson comorbidity index was 5.99. The mortality prediction models obtained by random forest appear to be adequate in terms of accuracy [area under the curve (AUC) 0.68–0.73] and superior to logistic regression models (ΔAUC 0.007–0.046). Results indicate that both random forest and logistic regression develop mortality prediction models using different variables. Conclusions Random forest is an adequate method, and superior to logistic regression, to generate mortality prediction models in haemodialysis patients.


The Bank Marketing data set at Kaggle is mostly used to predict whether bank clients will subscribe to a long-term deposit. We believe that this data set could provide more useful information, such as predicting whether a bank client could be approved for a loan. This is a critical choice that has to be made by decision makers at the bank. Building a prediction model for such a high-stakes decision requires not only high prediction accuracy but also a reasonable interpretation of the predictions. In this research, different ensemble machine learning techniques, such as Bagging and Boosting, have been deployed. Our results showed that the loan approval prediction model has an accuracy of 83.97%, which is approximately 25% better than most other state-of-the-art loan prediction models found in the literature. In addition, the model interpretation efforts in this research were able to explain a few critical cases that the bank decision makers may encounter; therefore, the high accuracy of the designed models was accompanied by trust in the predictions. We believe that the achieved model accuracy, together with the provided interpretation information, is vitally needed for decision makers to understand how to maintain a balance between the security and reliability of their financial lending system while providing fair credit opportunities to their clients.


2021 ◽  
Vol 9 ◽  
Author(s):  
Jie Liu ◽  
Jian Zhang ◽  
Haodong Huang ◽  
Yunting Wang ◽  
Zuyue Zhang ◽  
...  

Objective: We explored the risk factors for intravenous immunoglobulin (IVIG) resistance in children with Kawasaki disease (KD) and constructed a prediction model based on machine learning algorithms. Methods: A retrospective study including 1,398 KD patients hospitalized in 7 affiliated hospitals of Chongqing Medical University from January 2015 to August 2020 was conducted. All patients were divided into IVIG-responsive and IVIG-resistant groups, which were randomly divided into training and validation sets. The independent risk factors were determined using logistic regression analysis. Logistic regression nomograms, support vector machine (SVM), XGBoost and LightGBM prediction models were constructed and compared with the previous models. Results: In total, 1,240 out of 1,398 patients were IVIG responders, while 158 were resistant to IVIG. According to the results of logistic regression analysis of the training set, four independent risk factors were identified, including total bilirubin (TBIL) (OR = 1.115, 95% CI 1.067–1.165), procalcitonin (PCT) (OR = 1.511, 95% CI 1.270–1.798), alanine aminotransferase (ALT) (OR = 1.013, 95% CI 1.008–1.018) and platelet count (PLT) (OR = 0.998, 95% CI 0.996–1). Logistic regression nomogram, SVM, XGBoost, and LightGBM prediction models were constructed based on the above independent risk factors. The sensitivity was 0.617, 0.681, 0.638, and 0.702, the specificity was 0.712, 0.841, 0.967, and 0.903, and the area under curve (AUC) was 0.731, 0.814, 0.804, and 0.874, respectively. Among the prediction models, the LightGBM model displayed the best ability for comprehensive prediction, with an AUC of 0.874, which surpassed the previous classic models of Egami (AUC = 0.581), Kobayashi (AUC = 0.524), Sano (AUC = 0.519), Fu (AUC = 0.578), and Formosa (AUC = 0.575). Conclusion: The machine learning LightGBM prediction model for IVIG-resistant KD patients was superior to previous models. Our findings may help to accomplish early identification of the risk of IVIG resistance and improve patient outcomes.
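Sensitivity and specificity, as reported for the four models above, are derived from the confusion matrix of predicted versus observed classes. The following is a minimal sketch with hypothetical function names and toy data, not the study's data or code; here 1 stands for the event of interest (for example, IVIG resistance) and 0 for its absence.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 3 events among 10 cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

Because only 158 of the 1,398 patients were IVIG-resistant, a model can reach high specificity while still missing a sizeable share of resistant cases, which is why the two measures are reported side by side.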


2021 ◽  
Vol 1 (4) ◽  
pp. 268-280
Author(s):  
Bamanga Mahmud Ahmad ◽  
Ahmadu Asabe Sandra ◽  
Musa Yusuf Malgwi ◽  
Dahiru I. Sajoh

Machine learning techniques are commonly used in clinical decision support systems for the identification and prediction of different diseases. Heart disease is the leading cause of death for both men and women around the world, and the heart is one of the essential parts of the human body; heart disease is therefore one of the most critical concerns in the medical domain, and several researchers have developed intelligent medical devices to support these systems and to enhance the ability to diagnose and predict heart diseases. However, few studies have looked at the capabilities of ensemble methods in developing a heart disease detection and prediction model. In this study, the researchers assessed how to use an ensemble model, which offers more stable performance than a single base learning algorithm and leads to better results than other heart disease prediction models. Patient heart disease data records were extracted from the University of California, Irvine (UCI) Machine Learning Repository. To achieve the aim of this study, the researchers developed a meta-algorithm. According to the results of the experiments, the ensemble model is a superior solution in terms of high predictive accuracy and reliability of the diagnostic output. An ensemble heart disease prediction model is also presented in this work as a valuable, cost-effective, and timely predictive option with a user-friendly graphical user interface that is scalable and expandable. Based on the findings, the researchers suggest that Bagging, which had the highest prediction probability score, is the best ensemble classifier to adopt as the extended algorithm for implementing heart disease prediction.


2020 ◽  
Author(s):  
Kelly Yvonne Roger Stevens ◽  
Liesbet Lagaert ◽  
Tom Bakkes ◽  
Malou Evi Gelderblom ◽  
Saskia Houterman ◽  
...  

Abstract Background: Five percent of premenopausal women experience abnormal uterine bleeding. Endometrial ablation (EA) is one of the treatment options for this common problem. However, this technique shows a decrease in patient satisfaction and treatment efficacy in the long term. Study objective: To develop a prediction model to predict surgical re-intervention (for example re-ablation or hysterectomy) within two years after EA by using Machine Learning (ML). The performance of the developed prediction model was compared with a previously published multivariate logistic regression model (LR). Design: This retrospective cohort study, with a minimal follow-up time of two years, included 446 pre-menopausal women (18+) who underwent an EA for complaints of heavy menstrual bleeding. The performance of the ML and LR models was compared using the area under the Receiver Operating Characteristic (ROC) curve. Results: The ML model (AUC of 0.65 (95% CI 0.56-0.74)) was not superior to the LR model (AUC of 0.71 (95% CI 0.64-0.78)) in predicting the outcome of surgical re-intervention within two years after EA. Conclusion: Although Machine Learning techniques are gaining popularity in the development of clinical prediction tools, this study shows that ML is not necessarily superior to the traditional statistical LR techniques. The performance of a prediction model is influenced by the sample size, the number of features of a dataset, hyperparameter tuning and the linearity of associations. Both techniques should be considered when developing a clinical prediction model.


Author(s):  
Pier Paolo Mattogno ◽  
Valerio M. Caccavella ◽  
Martina Giordano ◽  
Quintino G. D'Alessandris ◽  
Sabrina Chiloiro ◽  
...  

Abstract Purpose Transsphenoidal surgery (TSS) for pituitary adenomas can be complicated by the occurrence of intraoperative cerebrospinal fluid (CSF) leakage (IOL). IOL significantly affects the course of surgery, predisposing to the development of postoperative CSF leakage, a major source of morbidity and mortality in the postoperative period. The authors trained and internally validated a Random Forest (RF) prediction model to preoperatively identify patients at high risk for IOL. A local interpretable model-agnostic explanations (LIME) algorithm was employed to elucidate the main drivers behind each machine learning (ML) model prediction. Methods The data of 210 patients who underwent TSS were collected; first, risk factors for IOL were identified via conventional statistical methods (multivariable logistic regression). Then, the authors trained, optimized, and audited an RF prediction model. Results IOL was reported in 45 patients (21.5%). The recursive feature selection algorithm identified the following variables as the most significant determinants of IOL: Knosp's grade, sellar Hardy's grade, suprasellar Hardy's grade, tumor diameter (on X, Y, and Z axes), intercarotid distance, and secreting status (nonfunctioning and growth hormone [GH] secreting). Leveraging the predictive values of these variables, the RF prediction model achieved an area under the curve (AUC) of 0.83 (95% confidence interval [CI]: 0.78; 0.86), significantly outperforming the multivariable logistic regression model (AUC = 0.63). Conclusion An RF model that reliably identifies patients at risk for IOL was successfully trained and internally validated. ML-based prediction models can predict events that were previously judged nearly unpredictable; their deployment in clinical practice may result in improved patient care and reduced postoperative morbidity and healthcare costs.


2021 ◽  
Vol 44 (4) ◽  
pp. 1-12
Author(s):  
Ratchainant Thammasudjarit ◽  
Punnathorn Ingsathit ◽  
Sigit Ari Saputro ◽  
Atiporn Ingsathit ◽  
Ammarin Thakkinstian

Background: Chronic kidney disease (CKD) requires huge amounts of resources for treatment. Early detection with a risk prediction model should be useful in identifying at-risk patients and providing early treatment. Objective: To compare the performance of traditional logistic regression with machine learning (ML) in predicting the risk of CKD in the Thai population. Methods: This study used Thai Screening and Early Evaluation of Kidney Disease (SEEK) data. Seventeen features were first considered in constructing prediction models using logistic regression and 4 ML methods (Random Forest, Naïve Bayes, Decision Tree, and Neural Network). Data were split into train and test data with a ratio of 70:30. Model performance was assessed by estimating recall, C statistics, accuracy, F1, and precision. Results: Seven out of 17 features were included in the prediction models. The logistic regression model discriminated CKD from non-CKD patients well, with C statistics of 0.79 and 0.78 in the train and test data. The Neural Network performed best among the ML models, followed by Random Forest, Naïve Bayes, and Decision Tree, with corresponding C statistics of 0.82, 0.80, 0.78, and 0.77 in the training data set. In the test data, the performance of these models decreased by about 5%, 3%, 1%, and 2%, respectively, compared with about 2% for the logistic model. Conclusions: A CKD risk prediction model constructed with the logit equation may yield better discrimination and a lower tendency to overfit than ML models, including the Neural Network and Random Forest.
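The 70:30 train/test split and the precision, recall, F1, and accuracy measures named above can be sketched as follows. This is an illustrative sketch only: the function names, the fixed random seed, and the toy labels are assumptions, and the abstract does not say how the study's split was randomized or which software was used.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle rows reproducibly and split them into (train, test)."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def classification_metrics(y_true, y_pred):
    """Precision, recall, F1 and accuracy for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical records: 100 patient identifiers split 70:30.
train_ids, test_ids = train_test_split(range(100))

# Hypothetical predictions on a 10-case test set.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(len(train_ids), len(test_ids), metrics)
```

Computing the same metrics on both the training and test portions is what exposes the kind of gap described in the Results, where a larger drop from training to test suggests a greater tendency to overfit.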

