Validating machine learning models for the prediction of labour induction intervention using routine data: a registry-based retrospective cohort study at a tertiary hospital in northern Tanzania

BMJ Open ◽  
2021 ◽  
Vol 11 (12) ◽  
pp. e051925
Author(s):  
Clifford Silver Tarimo ◽  
Soumitra S Bhuyan ◽  
Quanman Li ◽  
Michael Johnson J Mahande ◽  
Jian Wu ◽  
...  

Objectives: We aimed to identify the important variables for labour induction intervention and to assess the predictive performance of machine learning algorithms. Setting: We analysed birth registry data from a referral hospital in northern Tanzania. Since July 2000, every birth at this facility has been recorded in a dedicated database. Participants: 21 578 deliveries between 2000 and 2015 were included. Deliveries that lacked information on labour induction status were excluded. Primary outcome: Deliveries involving labour induction intervention. Results: Parity, maternal age, body mass index, gestational age and birth weight were all found to be important predictors of labour induction. The boosting method demonstrated the best discriminative performance (area under the curve, AUC=0.75; 95% CI 0.73 to 0.76), while logistic regression showed the lowest (AUC=0.71; 95% CI 0.70 to 0.73). Random forest and boosting algorithms showed the highest net benefits in the decision curve analysis. Conclusion: All of the machine learning algorithms performed well in predicting the likelihood of labour induction intervention. Further optimisation of these classifiers through hyperparameter tuning may improve performance. Extensive research into the performance of other classifier algorithms is warranted.
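
The decision curve analysis mentioned in this abstract ranks models by net benefit across risk thresholds. The sketch below is not the authors' code. It shows the standard net-benefit formula, NB = TP/n − (FP/n) × pt/(1 − pt), on made-up toy data. The function names and the toy labels and probabilities are illustrative assumptions.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on everyone whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    # True positives: flagged by the model and actually positive.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    # False positives: flagged by the model but actually negative.
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # Odds of the threshold: how many false positives one true positive is worth.
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight


def treat_all_net_benefit(y_true, threshold):
    """Reference strategy: intervene on every case regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (hypothetical, not from the study).
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.3, 0.8]

for pt in (0.25, 0.5):
    print(pt, net_benefit(y_true, y_prob, pt), treat_all_net_benefit(y_true, pt))
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all value and zero (the treat-none strategy).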

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Chengmao Zhou ◽  
Junhong Hu ◽  
Ying Wang ◽  
Mu-Huo Ji ◽  
Jianhua Tong ◽  
...  

Abstract: This study explores the predictive performance of machine learning for postoperative recurrence in patients with gastric cancer. The available data were divided into two parts: the first part was used as a training set (for example, 80% of the original data) and the second part as a test set (the remaining 20% of the data). Fivefold cross-validation was used. The weights of the recurrence factors show that the top four factors are BMI, operation time, WGT and age, in that order. In the training group, among the five machine learning models, the accuracy of gbm was 0.891, followed by the gbm algorithm at 0.876. The AUC values of the five machine learning algorithms, from high to low, were forest (0.962), gbm (0.922), GradientBoosting (0.898), DecisionTree (0.790) and Logistic (0.748). The precision of forest was the highest (0.957), followed by GradientBoosting (0.878). In the test group, Logistic had the highest accuracy (0.801), followed by forest and gbm; the AUC values of the five algorithms, from high to low, were forest (0.795), GradientBoosting (0.774), DecisionTree (0.773), Logistic (0.771) and gbm (0.771). Among the five machine learning algorithms, Logistic had the highest precision (1.000), followed by gbm (0.487). Machine learning can predict the recurrence of gastric cancer after an operation. The top four factors affecting postoperative recurrence of gastric cancer were BMI, operation time, WGT and age.
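
The 80/20 split and fivefold cross-validation described in this abstract can be sketched with index lists alone. The abstract does not say whether cross-validation was run inside the training portion. The sketch below assumes it was. The function names, the sample size of 100 and the random seed are all illustrative.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle the indices 0..n-1 and hold out a fraction as the test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]  # (train, test)


def k_fold_indices(indices, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    # Deal the indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, validation


# Hypothetical dataset of 100 records.
train_idx, test_idx = train_test_split_indices(100)
splits = list(k_fold_indices(train_idx, k=5))

for fold_no, (tr, va) in enumerate(splits, start=1):
    print(f"fold {fold_no}: {len(tr)} training rows, {len(va)} validation rows")
```

Under this assumption the held-out 20% is never touched during cross-validation, so the test-group figures stay independent of any model selection.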


Author(s):  
Michael McCartney ◽  
Matthias Haeringer ◽  
Wolfgang Polifke

Abstract: This paper examines and compares the performance of commonly used machine learning algorithms in the interpolation and extrapolation of FDFs, based on experimental and simulation data. Algorithm performance is evaluated by interpolating and extrapolating FDFs, and the impact of the resulting errors on limit cycle amplitudes is then evaluated using the xFDF framework. The best algorithms for interpolation and extrapolation were found to be the widely used cubic spline interpolation and the Gaussian Processes regressor. The data themselves were found to be an important factor in determining the predictive performance of a model; therefore, a method of optimally selecting data points at test time using Gaussian Processes was demonstrated. The aim is to allow a minimal number of data points to be collected while still providing enough information to model the FDF accurately. Extrapolation performance was shown to decay very quickly with distance from the domain, so emphasis should be put on selecting measurement points that expand the covered domain. Gaussian Processes also give an indication of confidence in their predictions and are used to carry out uncertainty quantification, in order to understand model sensitivities. This was demonstrated through application to the xFDF framework.


Mathematics ◽  
2021 ◽  
Vol 9 (20) ◽  
pp. 2537
Author(s):  
Luis Rolando Guarneros-Nolasco ◽  
Nancy Aracely Cruz-Ramos ◽  
Giner Alor-Hernández ◽  
Lisbeth Rodríguez-Mazahua ◽  
José Luis Sánchez-Cervantes

Cardiovascular diseases (CVDs) are a leading cause of death globally. In CVDs, the heart is unable to deliver enough blood to other body regions. As an effective and accurate diagnosis of CVDs is essential for CVD prevention and treatment, machine learning (ML) techniques can be effectively and reliably used to distinguish patients suffering from a CVD from those who do not suffer from any heart condition. Namely, machine learning algorithms (MLAs) play a key role in the diagnosis of CVDs through predictive models that allow us to identify the main risk factors influencing CVD development. In this study, we analyze the performance of ten MLAs on two datasets for CVD prediction and two for CVD diagnosis. Algorithm performance is analyzed on the top-two and top-four dataset attributes/features with respect to five performance metrics (accuracy, precision, recall, F1-score, and ROC-AUC) using the train-test split technique and k-fold cross-validation. Our study identifies the top-two and top-four attributes of the CVD datasets by analyzing performance on the accuracy metric, to determine which attributes are best for predicting and diagnosing CVD. Our main finding is that the ten ML classifiers exhibited appropriate diagnostic and predictive performance on the accuracy metric with the top-two attributes, identifying three main attributes for the diagnosis and prediction of CVDs such as arrhythmia and tachycardia; hence, they can be successfully implemented to improve current CVD diagnosis efforts and help patients around the world, especially in regions where medical staff is lacking.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
André F. M. Batista ◽  
Carmen S. G. Diniz ◽  
Eliana A. Bonilha ◽  
Ichiro Kawachi ◽  
Alexandre D. P. Chiavegatto Filho

Abstract. Background: Recent decreases in neonatal mortality have been slower than expected for most countries. This study aims to predict the risk of neonatal mortality using only data routinely available from birth records in the largest city of the Americas. Methods: A probabilistic linkage of every birth record in the municipality of São Paulo, Brazil, between 2012 and 2017 was performed with the death records from 2012 to 2018 (1,202,843 births and 447,687 deaths), and a total of 7282 neonatal deaths were identified (a neonatal mortality rate of 6.46 per 1000 live births). Births from 2012 to 2016 (N = 941,308; or 83.44% of the total) were used to train five different machine learning algorithms, while births occurring in 2017 (N = 186,854; or 16.56% of the total) were used to test their predictive performance on new unseen data. Results: The best performance was obtained by the extreme gradient boosting trees (XGBoost) algorithm, with a very high AUC of 0.97 and an F1-score of 0.55. The 5% of births with the highest predicted risk of neonatal death included more than 90% of the actual neonatal deaths. On the other hand, there were no deaths among the 5% of births with the lowest predicted risk. There were no significant differences in predictive performance for vulnerable subgroups. The use of a smaller number of variables (WHO’s five minimum perinatal indicators) decreased overall performance, but the results remained high (AUC of 0.91). With the addition of only three more variables, we achieved the same predictive performance (AUC of 0.97) as with all 23 variables originally available from the Brazilian birth records. Conclusion: Machine learning algorithms were able to identify, with very high predictive performance, the neonatal mortality risk of newborns using only routinely collected data.
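
The statement that the highest-risk 5% of births contained more than 90% of the neonatal deaths is a capture-rate calculation: rank records by predicted risk, take the top fraction, and count the share of all events that fall inside it. The sketch below illustrates this on toy data. The function name, labels and scores are hypothetical and are not taken from the study.

```python
def capture_rate(y_true, y_score, top_fraction=0.05):
    """Share of all positive cases found among the highest-scored fraction."""
    n = len(y_true)
    total_events = sum(y_true)
    if total_events == 0:
        raise ValueError("capture rate is undefined when there are no events")
    # Number of records in the top slice (at least one).
    k = max(1, int(n * top_fraction))
    # Indices sorted from highest to lowest predicted risk.
    order = sorted(range(n), key=lambda i: y_score[i], reverse=True)
    captured = sum(y_true[i] for i in order[:k])
    return captured / total_events


# Toy example: 20 records, 2 events, both given the highest scores.
toy_labels = [0] * 18 + [1, 1]
toy_scores = [i / 20 for i in range(20)]
print(capture_rate(toy_labels, toy_scores, top_fraction=0.1))
```

The same function applied to the lowest-scored slice (by negating the scores) gives the share of events among the lowest-risk records.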


Author(s):  
Luis Rolando Guarneros-Nolasco ◽  
Nancy Aracely Cruz-Ramos ◽  
Giner Alor-Hernández ◽  
Lisbeth Rodríguez-Mazahua ◽  
José Luis Sánchez-Cervantes

CVDs are a leading cause of death globally. In CVDs, the heart is unable to deliver enough blood to other body regions. Since effective and accurate diagnosis of CVDs is essential for CVD prevention and treatment, machine learning (ML) techniques can be effectively and reliably used to distinguish patients suffering from a CVD from those who do not suffer from any heart condition. Namely, machine learning algorithms (MLAs) play a key role in the diagnosis of CVDs through predictive models that allow us to identify the main risk factors influencing CVD development. In this study, we analyze the performance of ten MLAs on two datasets for CVD prediction and two for CVD diagnosis. Algorithm performance is analyzed on the top-two and top-four dataset attributes/features with respect to five performance metrics (accuracy, precision, recall, F1-score, and ROC-AUC) using the train-test split technique and k-fold cross-validation. Our study identifies the top two and top four attributes from each CVD diagnosis/prediction dataset. Our main finding is that the ten MLAs exhibited appropriate diagnostic and predictive performance; hence, they can be successfully implemented to improve current CVD diagnosis efforts and help patients around the world, especially in regions where medical staff is lacking.


2021 ◽  
Author(s):  
Nuno Moniz ◽  
Susana Barbosa

The Dansgaard-Oeschger (DO) events are one of the most striking examples of abrupt climate change in the Earth's history, representing temperature oscillations of about 8 to 16 degrees Celsius within a few decades. DO events have been studied extensively in paleoclimatic records, particularly in ice core proxies. Examples include the Greenland NGRIP record of oxygen isotopic composition. This work addresses the anticipation of DO events using machine learning algorithms. We consider the NGRIP time series from 20 to 60 kyr b2k with the GICC05 timescale and 20-year temporal resolution. Forecasting horizons range from 0 (nowcasting) to 400 years. We adopt three different machine learning algorithms (random forests, support vector machines, and logistic regression) in training windows of 5 kyr. We perform validation on subsequent test windows of 5 kyr, based on the timestamps of previous DO events as classified in Greenland by Rasmussen et al. (2014). We perform experiments with both sliding and growing windows. Results show that predictions on sliding windows are better overall, indicating that modelling is affected by non-stationary characteristics of the time series. The three algorithms' predictive performance is similar, with slightly better performance from random forest models at shorter forecast horizons. The models' predictive capability decreases as the forecasting horizon grows but remains reasonable up to 120 years. The degradation in model performance is mostly related to imprecision in determining the start and end times of events and to identifying some periods as DO events when they are not.
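
The difference between the sliding and growing training windows described in this abstract can be sketched as follows. This is an illustration only. Positions are abstract offsets in kyr along the 40 kyr series, ordered from oldest to youngest. The 5 kyr step between successive windows is an assumption, since the abstract does not state it, and the function name is hypothetical.

```python
def window_splits(start, end, train_len, test_len, step, growing=False):
    """Return a list of ((train_start, train_end), (test_start, test_end)).

    Sliding windows keep a fixed training length and move forward.
    Growing windows keep the original start and extend the training span.
    """
    splits = []
    train_start = start
    train_end = start + train_len
    # Stop once the test window would run past the end of the series.
    while train_end + test_len <= end:
        splits.append(((train_start, train_end), (train_end, train_end + test_len)))
        train_end += step
        if not growing:
            train_start += step
    return splits


# 40 kyr of data, 5 kyr training and test windows, assumed 5 kyr step.
sliding = window_splits(0, 40, train_len=5, test_len=5, step=5)
growing = window_splits(0, 40, train_len=5, test_len=5, step=5, growing=True)

for s, g in zip(sliding, growing):
    print("sliding", s, "| growing", g)
```

With sliding windows each model sees only the most recent 5 kyr, which matters when the series is non-stationary. With growing windows older data accumulate in the training span.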


2019 ◽  
Vol 31 (4) ◽  
pp. 568-578 ◽  
Author(s):  
Anshit Goyal ◽  
Che Ngufor ◽  
Panagiotis Kerezoudis ◽  
Brandon McCutcheon ◽  
Curtis Storlie ◽  
...  

OBJECTIVE: Nonhome discharge and unplanned readmissions represent important cost drivers following spinal fusion. The authors sought to utilize different machine learning algorithms to predict discharge to rehabilitation and unplanned readmissions in patients receiving spinal fusion. METHODS: The authors queried the 2012–2013 American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) for patients undergoing cervical or lumbar spinal fusion. Outcomes assessed included discharge to nonhome facility and unplanned readmissions within 30 days after surgery. A total of 7 machine learning algorithms were evaluated. Predictive hierarchical clustering of procedure codes was used to increase model performance. Model performance was evaluated using overall accuracy and area under the receiver operating characteristic curve (AUC), as well as sensitivity, specificity, and positive and negative predictive values. These performance metrics were computed for both the imputed and unimputed (missing values dropped) datasets. RESULTS: A total of 59,145 spinal fusion cases were analyzed. The incidence rates of discharge to nonhome facility and 30-day unplanned readmission were 12.6% and 4.5%, respectively. All classification algorithms showed excellent discrimination (AUC > 0.80, range 0.85–0.87) for predicting nonhome discharge. The generalized linear model showed comparable performance to other machine learning algorithms. By comparison, all models showed poorer predictive performance for unplanned readmission, with AUC ranging between 0.63 and 0.66. Better predictive performance was noted with models using imputed data. CONCLUSIONS: In an analysis of patients undergoing spinal fusion, multiple machine learning algorithms were found to reliably predict nonhome discharge, with modest performance noted for unplanned readmissions. These results provide early evidence regarding the feasibility of modern machine learning classifiers in predicting these outcomes and serve as possible clinical decision support tools to facilitate shared decision making.
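
The threshold-based metrics listed in this abstract (accuracy, sensitivity, specificity, and positive and negative predictive values) all derive from the four cells of a confusion matrix. The sketch below shows the standard definitions on toy data. The function name and the example labels are illustrative and are not drawn from the ACS-NSQIP analysis.

```python
def classification_metrics(y_true, y_pred):
    """Compute confusion-matrix metrics for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)

    def ratio(numerator, denominator):
        # Return NaN instead of raising when a metric is undefined.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # recall among actual positives
        "specificity": ratio(tn, tn + fp),  # recall among actual negatives
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }


# Toy labels and predictions (hypothetical).
toy_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
toy_pred = [1, 0, 0, 0, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(toy_true, toy_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike AUC, the predictive values depend on outcome prevalence, which is relevant when comparing an outcome at 12.6% incidence with one at 4.5%.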


2020 ◽  
pp. 1420326X2092707 ◽  
Author(s):  
Junseok Park ◽  
Bongchan Jeong ◽  
Young-Tae Chae ◽  
Jae-Weon Jeong

The manual control of windows is one of the common adaptive behaviours by which occupants adjust the indoor environment in homes. Cross-ventilation through window opening provides a useful tool for controlling thermal comfort and indoor air quality in homes. The objective of this study was to develop a modelling methodology for predicting an individual occupant's behaviour relating to the manual control of windows by using machine learning algorithms. The six proposed machine learning algorithms were trained on field monitoring data from 23 sample homes. The predictive performance of the machine learning algorithms was analysed. The algorithms predicted the occupant's behaviour more precisely than the logistic model. Among the algorithms, K-Nearest Neighbours (KNN) showed the best fit to the monitored data set. The driving parameters of the manual control of windows in each sample home can be clearly identified by the algorithms. The proposed machine learning algorithms can help to understand the influence of the occupant's behaviour on the indoor environment in buildings.

