Exploiting Machine Learning Algorithms and Methods for the Prediction of Agitated Delirium After Cardiac Surgery: Models Development and Validation Study

10.2196/14993 ◽  
2019 ◽  
Vol 7 (4) ◽  
pp. e14993
Author(s):  
Hani Nabeel Mufti ◽  
Gregory Marshal Hirsch ◽  
Samina Raza Abidi ◽  
Syed Sibte Raza Abidi

Background Delirium is a temporary mental disorder that occasionally affects patients undergoing surgery, especially cardiac surgery. It is strongly associated with major adverse events, which in turn lead to increased costs and poor outcomes (eg, need for nursing home care due to cognitive impairment, stroke, and death). The ability to foresee patients at risk of delirium will guide the timely initiation of multimodal preventive interventions, which will aid in reducing the burden and negative consequences associated with delirium. Several studies have focused on the prediction of delirium. However, the number of studies in cardiac surgical patients that have used machine learning methods is very limited. Objective This study aimed to explore the application of several machine learning predictive models that can preemptively predict delirium in patients undergoing cardiac surgery and to compare their performance. Methods We investigated a number of machine learning methods to develop models that can predict delirium after cardiac surgery. A clinical dataset comprising over 5000 actual patients who underwent cardiac surgery in a single center was used to develop the models using logistic regression, artificial neural networks (ANN), support vector machines (SVM), Bayesian belief networks (BBN), naïve Bayes, random forests, and decision trees. Results Only 507 out of 5584 patients (11.4%) developed delirium. We addressed the underlying class imbalance using random undersampling of the training dataset. The final prediction performance was validated on a separate test dataset. Owing to the target class imbalance, several measures were used to evaluate each algorithm's performance for the delirium class on the test dataset. Of the selected algorithms, the SVM had the best F1 score for positive cases, kappa, and positive predictive value (40.2%, 29.3%, and 29.7%, respectively), with P=.01, .03, and .02, respectively. The ANN had the best area under the receiver operating characteristic curve (78.2%; P=.03). The BBN had the best area under the precision-recall curve for detecting positive cases (30.4%; P=.03). Conclusions Although delirium is inherently complex, preventive measures to mitigate its negative effects can be applied proactively if patients at risk are prospectively identified. Our results highlight 2 important points: (1) addressing class imbalance in the training dataset will augment machine learning models' performance in identifying patients likely to develop postoperative delirium, and (2) because postoperative delirium is multifactorial and has complex pathophysiology, applying machine learning methods (complex or simple) may improve its prediction by revealing hidden patterns, which will lead to cost reduction through prevention of complications and will optimize patients' outcomes.
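As an illustration of the workflow this abstract describes (random undersampling of the training set, classifier training, and imbalance-aware evaluation on a held-out test set), the following is a minimal sketch using scikit-learn on synthetic data; the feature matrix, class ratio, and SVM hyperparameters are placeholders, not the authors' cohort or settings.

```python
# Minimal sketch: random undersampling of the majority class before training an SVM,
# then evaluating delirium-class F1, Cohen's kappa, PPV (precision), and ROC AUC on a
# held-out test set. Data and hyperparameters are illustrative only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import f1_score, cohen_kappa_score, precision_score, roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(5584, 20))              # placeholder clinical features
y = (rng.random(5584) < 0.114).astype(int)   # minority (delirium) class ~11%

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Random undersampling: keep all positives, sample an equal number of negatives.
pos = np.flatnonzero(y_train == 1)
neg = rng.choice(np.flatnonzero(y_train == 0), size=len(pos), replace=False)
idx = np.concatenate([pos, neg])

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True, random_state=0))
model.fit(X_train[idx], y_train[idx])

y_pred = model.predict(X_test)
print("F1 (delirium):", f1_score(y_test, y_pred))
print("kappa:", cohen_kappa_score(y_test, y_pred))
print("PPV:", precision_score(y_test, y_pred))
print("ROC AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```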


2020 ◽  
Author(s):  
Abdur Rahman M. A. Basher ◽  
Steven J. Hallam

Abstract Machine learning methods show great promise in predicting metabolic pathways at different levels of biological organization. However, several complications remain that can degrade prediction performance, including inadequately labeled training data, missing feature information, and inherent imbalances in the distribution of enzymes and pathways within a dataset. This class imbalance problem is commonly encountered by the machine learning community when the proportion of instances over class labels within a dataset is uneven, resulting in poor predictive performance for underrepresented classes. Here, we present leADS, multi-label learning based on active dataset subsampling, which leverages the idea of subsampling points from a pool of data to reduce the negative impact of training loss due to class imbalance. Specifically, leADS performs an iterative process to: (i) construct an acquisition model in an ensemble framework; (ii) select informative points using an appropriate acquisition function; and (iii) train on selected samples. Multiple base learners are implemented in parallel, where each is assigned a portion of labeled training data to learn pathways. We benchmark leADS using a corpus of 10 experimental datasets manifesting diverse multi-label properties used in previous pathway prediction studies, including manually curated organismal genomes, synthetic microbial communities, and low-complexity microbial communities. Resulting performance metrics equaled or exceeded those of previously reported machine learning methods for both organismal and multi-organismal genomes while establishing an extensible framework for navigating class imbalances across diverse real-world datasets. Availability and implementation: The software package and installation instructions are published on github.com/[email protected]
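The sketch below illustrates the general iterative idea described in steps (i)-(iii) of this abstract, not the leADS implementation itself: an ensemble acquisition model scores a multi-label training pool with an uncertainty-style acquisition function, and the most informative subset is kept for the next round of training. The ensemble size, acquisition function, data, and subset size are all hypothetical choices for demonstration.

```python
# Illustrative active dataset subsampling loop (not the leADS code): bootstrap an
# ensemble of multi-label base learners on the current subset, score the whole pool
# by mean per-label entropy, and retain the most uncertain points for retraining.
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, Y = make_multilabel_classification(n_samples=2000, n_features=40,
                                      n_classes=15, random_state=0)
n_members, n_rounds, subset_size = 5, 3, 600
rng = np.random.default_rng(0)
selected = rng.choice(len(X), size=subset_size, replace=False)

for _ in range(n_rounds):
    # (i) acquisition model: an ensemble trained on bootstrap resamples of the subset
    ensemble = []
    for _ in range(n_members):
        part = rng.choice(selected, size=len(selected), replace=True)
        clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
        ensemble.append(clf.fit(X[part], Y[part]))
    # (ii) acquisition function: mean per-label binary entropy over the whole pool
    probs = np.mean([clf.predict_proba(X) for clf in ensemble], axis=0)
    entropy = -(probs * np.log(probs + 1e-12)
                + (1 - probs) * np.log(1 - probs + 1e-12)).mean(axis=1)
    # (iii) keep the most uncertain (informative) points for the next training round
    selected = np.argsort(entropy)[-subset_size:]
```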


2021 ◽  
Author(s):  
Jill M Westcott ◽  
Francine Hughes ◽  
Wenke Liu ◽  
Mark Grivainis ◽  
Iffath Hoskins ◽  
...  

BACKGROUND Postpartum hemorrhage remains one of the largest causes of maternal morbidity and mortality in the United States. OBJECTIVE To utilize machine learning techniques to identify patients at risk for postpartum hemorrhage at obstetric delivery. METHODS Women aged 18 to 55 delivering at a major academic center from July 2013 to October 2018 were included for analysis (n = 30,867). A total of 497 variables were collected from the electronic medical record, including demographic information; obstetric, medical, surgical, and family history; vital signs; laboratory results; labor medication exposures; and delivery outcomes. Postpartum hemorrhage was defined as a blood loss of ≥ 1000 mL at the time of delivery, regardless of delivery method, with 2179 positive cases observed (7.06%). Supervised learning with regression-, tree-, and kernel-based machine learning methods was used to create classification models based upon training (n = 21,606) and validation (n = 4,630) cohorts. Models were tuned using feature selection algorithms and domain knowledge. An independent test cohort (n = 4,631) determined final performance by assessing accuracy, area under the receiver operating characteristic curve (AUC), and sensitivity for proper classification of postpartum hemorrhage. Separate models were created using all collected data versus only data available prior to the second stage of labor (or at the time of the decision to proceed with cesarean delivery). Additional models examined patients by mode of delivery. RESULTS Gradient boosted decision trees achieved the best discrimination in the overall model. The model including all data modestly outperformed the second-stage model (AUC 0.979, 95% CI 0.971-0.986 vs. AUC 0.955, 95% CI 0.939-0.970). Optimal model accuracy was 98.1% with a sensitivity of 0.763 for positive prediction of postpartum hemorrhage. The second-stage model achieved an accuracy of 98.0% with a sensitivity of 0.737. Other selected algorithms returned models with lower discrimination. Models stratified by mode of delivery achieved good to excellent discrimination but lacked the sensitivity necessary for clinical applicability. CONCLUSIONS Machine learning methods can be used to identify women at risk for postpartum hemorrhage who may benefit from individualized preventative measures. Models limited to data available prior to delivery perform nearly as well as those with more complete datasets, supporting their potential utility in the clinical setting. Further work is necessary to create successful models based upon mode of delivery. An unbiased approach to hemorrhage risk prediction may be superior to human risk assessment and represents an area for future research.
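A minimal sketch of the winning model family described in this abstract follows: a gradient-boosted tree classifier fit on a training split, tuned against a validation split, and reported on an untouched test split with AUC, accuracy, and sensitivity. The data are simulated and the split sizes only roughly mirror the cohort sizes quoted above; feature content and hyperparameters are not the authors'.

```python
# Sketch: gradient boosted decision trees with train/validation/test evaluation.
# Synthetic data; the ~7% positive rate loosely mimics the reported hemorrhage rate.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, recall_score, accuracy_score

X, y = make_classification(n_samples=30000, n_features=50, weights=[0.93, 0.07],
                           random_state=0)
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.176, stratify=y_trainval, random_state=0)

best_auc, best_model = -np.inf, None
for n_estimators in (100, 300):          # toy tuning loop over one hyperparameter
    model = GradientBoostingClassifier(n_estimators=n_estimators, random_state=0)
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    if auc > best_auc:
        best_auc, best_model = auc, model

y_pred = best_model.predict(X_test)
print("test AUC:", roc_auc_score(y_test, best_model.predict_proba(X_test)[:, 1]))
print("accuracy:", accuracy_score(y_test, y_pred))
print("sensitivity:", recall_score(y_test, y_pred))
```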


2018 ◽  
Author(s):  
Quazi Abidur Rahman ◽  
Tahir Janmohamed ◽  
Meysam Pirbaglou ◽  
Hance Clarke ◽  
Paul Ritvo ◽  
...  

BACKGROUND Measuring and predicting pain volatility (fluctuation or variability in pain scores over time) can help improve pain management. Perceptions of pain and its consequent disabling effects are often heightened under the conditions of greater uncertainty and unpredictability associated with pain volatility. OBJECTIVE This study aimed to use data mining and machine learning methods to (1) define a new measure of pain volatility and (2) predict future pain volatility levels for users of the pain management app, Manage My Pain, based on demographic, clinical, and app use features. METHODS Pain volatility was defined as the mean of absolute changes between 2 consecutive self-reported pain severity scores within the observation periods. The k-means clustering algorithm was applied to users’ pain volatility scores at the first and sixth month of app use to establish a threshold discriminating low from high volatility classes. Subsequently, we extracted 130 demographic, clinical, and app usage features from the first month of app use to predict these 2 volatility classes at the sixth month of app use. Prediction models were developed using 4 methods: (1) logistic regression with ridge estimators; (2) logistic regression with Least Absolute Shrinkage and Selection Operator; (3) Random Forests; and (4) Support Vector Machines. Overall prediction accuracy and accuracy for both classes were calculated to compare the performance of the prediction models. Training and testing were conducted using 5-fold cross validation. A class imbalance issue was addressed using random subsampling of the training dataset. Users with at least five pain records in both the predictor and outcome periods (N=782 users) were included in the analysis. RESULTS The k-means clustering algorithm was applied to pain volatility scores to establish a threshold of 1.6 to differentiate between low and high volatility classes. After validating the threshold using random subsamples, 2 classes were created: low volatility (n=611) and high volatility (n=171). In this class-imbalanced dataset, all 4 prediction models achieved an overall accuracy of 78.1% (611/782) to 79.0% (618/782). However, all models had a prediction accuracy of less than 18.7% (32/171) for the high volatility class. After addressing the class imbalance issue using random subsampling, accuracy for the high volatility class improved across all models to greater than 59.6% (102/171). The prediction model based on Random Forests performed the best, consistently achieving approximately 70% accuracy for both classes across 3 random subsamples. CONCLUSIONS We propose a novel method for measuring pain volatility. Cluster analysis was applied to divide users into subsets of low and high volatility classes. These classes were then predicted at the sixth month of app use with an acceptable degree of accuracy using machine learning methods based on features extracted from demographic, clinical, and app use information from the first month.
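The sketch below illustrates the two definitional steps in this abstract: computing pain volatility as the mean absolute change between consecutive pain scores, and applying k-means with two clusters to derive a cut-off separating low- from high-volatility users. The pain records are synthetic, and taking the midpoint between the two cluster centres is one simple way to derive a threshold, not necessarily the authors' exact rule.

```python
# Sketch: pain volatility measure plus a 1-D k-means thresholding step on synthetic data.
import numpy as np
from sklearn.cluster import KMeans

def pain_volatility(scores):
    """Mean of absolute changes between consecutive self-reported pain scores."""
    scores = np.asarray(scores, dtype=float)
    return np.abs(np.diff(scores)).mean()

rng = np.random.default_rng(0)
# Synthetic users: each has a series of 0-10 pain severity scores of varying length.
users = [rng.integers(0, 11, size=rng.integers(5, 30)) for _ in range(782)]
vol = np.array([pain_volatility(u) for u in users]).reshape(-1, 1)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vol)
centers = np.sort(km.cluster_centers_.ravel())
threshold = centers.mean()                       # midpoint between the two cluster centres
labels = (vol.ravel() > threshold).astype(int)   # 1 = high volatility
print("threshold:", round(threshold, 2), "high-volatility users:", labels.sum())
```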


2020 ◽  
Vol 10 (9) ◽  
pp. 3307 ◽  
Author(s):  
Khishigsuren Davagdorj ◽  
Jong Seol Lee ◽  
Van Huy Pham ◽  
Keun Ho Ryu

Smoking is a major public health issue with a significant impact on premature death. In recent years, numerous decision support systems have been developed to deal with smoking cessation based on machine learning methods. However, the inevitable class imbalance is considered a major challenge in deploying such systems. In this paper, we present an empirical comparison of machine learning techniques to deal with the class imbalance problem in the prediction of smoking cessation intervention among the Korean population. For the class imbalance problem, the objective of this paper is to improve prediction performance using synthetic oversampling techniques, namely the synthetic minority oversampling technique (SMOTE) and adaptive synthetic sampling (ADASYN). This is achieved through an experimental design comprising three components. First, the selection of the best representative features is performed in two phases: the lasso method and multicollinearity analysis. Second, balanced data are generated using the SMOTE and ADASYN techniques. Third, machine learning classifiers are applied to construct the prediction models for all subjects and for each gender. To assess the effectiveness of the prediction models, the F-score, type I error, type II error, balanced accuracy, and geometric mean indices are used. Comprehensive analysis demonstrates that Gradient Boosting Trees (GBT), Random Forest (RF), and multilayer perceptron neural network (MLP) classifiers achieved the best performance for all subjects and for each gender when SMOTE and ADASYN were utilized. The SMOTE with GBT and RF models also provide feature importance scores that enhance the interpretability of the decision-support system. In addition, it is shown that the presented synthetic oversampling techniques combined with machine learning models outperformed baseline models in smoking cessation prediction.
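A minimal sketch of the oversampling-plus-classifier pipeline described here follows, using imbalanced-learn's SMOTE and ADASYN together with the three scikit-learn classifier families named in the abstract and the same kinds of imbalance-aware metrics. The data are synthetic, and the lasso and multicollinearity feature-selection phases are omitted.

```python
# Sketch: oversample the training split with SMOTE/ADASYN, fit GBT/RF/MLP, and report
# F1, type I/II error rates, balanced accuracy, and geometric mean on the test split.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import f1_score, balanced_accuracy_score, confusion_matrix
from imblearn.over_sampling import SMOTE, ADASYN
from imblearn.metrics import geometric_mean_score

X, y = make_classification(n_samples=5000, n_features=25, weights=[0.9, 0.1],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

classifiers = {"GBT": GradientBoostingClassifier(random_state=0),
               "RF": RandomForestClassifier(random_state=0),
               "MLP": MLPClassifier(max_iter=500, random_state=0)}

for samp_name, sampler in (("SMOTE", SMOTE(random_state=0)),
                           ("ADASYN", ADASYN(random_state=0))):
    # Oversample only the training split, never the test split.
    X_res, y_res = sampler.fit_resample(X_train, y_train)
    for clf_name, clf in classifiers.items():
        y_pred = clf.fit(X_res, y_res).predict(X_test)
        tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
        print(samp_name, clf_name,
              "F1=%.3f" % f1_score(y_test, y_pred),
              "typeI=%.3f" % (fp / (fp + tn)),    # false positive rate
              "typeII=%.3f" % (fn / (fn + tp)),   # false negative rate
              "bal_acc=%.3f" % balanced_accuracy_score(y_test, y_pred),
              "g_mean=%.3f" % geometric_mean_score(y_test, y_pred))
```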


2021 ◽  
Vol 0 (0) ◽  
pp. 0-0
Author(s):  
Santino R. Rellum ◽  
Jaap Schuurmans ◽  
Ward H. van der Ven ◽  
Susanne Eberl ◽  
Antoine H. G. Driessen ◽  
...  

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Kai-Yao Huang ◽  
Justin Bo-Kai Hsu ◽  
Tzong-Yi Lee

Abstract Succinylation is a type of protein post-translational modification (PTM), which can play important roles in a variety of cellular processes. Due to an increasing number of site-specific succinylated peptides obtained from high-throughput mass spectrometry (MS), various tools have been developed for computationally identifying succinylated sites on proteins. However, most of these tools predict succinylation sites based on traditional machine learning methods. Hence, this work aimed to carry out succinylation site prediction using a deep learning model. The abundance of MS-verified succinylated peptides enabled the investigation of the substrate site specificity of succinylation sites through sequence-based attributes, such as position-specific amino acid composition, the composition of k-spaced amino acid pairs (CKSAAP), and the position-specific scoring matrix (PSSM). Additionally, maximal dependence decomposition (MDD) was adopted to detect the substrate signatures of lysine succinylation sites by dividing all succinylated sequences into several groups with conserved substrate motifs. According to the results of ten-fold cross-validation, the deep learning model trained using PSSM and informative CKSAAP attributes achieved the best predictive performance and also performed better than traditional machine learning methods. Moreover, an independent testing dataset that did not overlap with the training dataset was used to compare the proposed method with six existing prediction tools. The testing dataset comprised 218 positive and 2621 negative instances, and the proposed model yielded promising performance, with 84.40% sensitivity, 86.99% specificity, 86.79% accuracy, and an MCC value of 0.489. Finally, the proposed method has been implemented as a web-based prediction tool (CNN-SuccSite), which is now freely accessible at http://csb.cse.yzu.edu.tw/CNN-SuccSite/.
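To make one of the named sequence attributes concrete, the sketch below encodes a peptide window with CKSAAP (composition of k-spaced amino acid pairs): for each gap size k, every ordered amino-acid pair separated by exactly k residues is counted and normalised. The 31-residue window, the k range of 0-3, and the example sequence are arbitrary demonstration choices; PSSM features, MDD grouping, and the CNN itself are not shown.

```python
# Illustrative CKSAAP feature encoding for a lysine-centred peptide window.
from itertools import product
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs

def cksaap(window, k_max=3):
    """Return the CKSAAP vector (400 * (k_max + 1) values) for a peptide window."""
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_pairs = 0
        for i in range(len(window) - k - 1):
            pair = window[i] + window[i + k + 1]
            if pair in counts:          # skip positions holding gap characters such as 'X'
                counts[pair] += 1
                n_pairs += 1
        features.extend(counts[p] / max(n_pairs, 1) for p in PAIRS)
    return np.array(features)

# Example: a 31-residue window centred on a candidate succinylation lysine (K).
window = "MKVLAARTGQKLSDEKPQRSTVAYWGHIKLM"
vec = cksaap(window)
print(vec.shape)   # (1600,) for k = 0..3
```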


2021 ◽  
Vol 11 (2) ◽  
pp. 150
Author(s):  
Hasan Aykut Karaboga ◽  
Aslihan Gunel ◽  
Senay Vural Korkut ◽  
Ibrahim Demir ◽  
Resit Celik

Clinical diagnosis of amyotrophic lateral sclerosis (ALS) is difficult in the early period, but blood tests are less time-consuming and lower-cost than other diagnostic methods. ALS researchers have used machine learning methods to predict the genetic architecture of the disease. In this study, we take advantage of Bayesian networks and machine learning methods to identify ALS patients from blood plasma protein levels and independent personal features. According to the comparison results, Bayesian networks produced the best results, with an accuracy of 0.887, an area under the curve (AUC) of 0.970, and the best values on the other comparison metrics. We confirmed that sex and age are effective variables for ALS. In addition, we found that the probability of onset involvement in ALS patients is very high. A person's other chronic or neurological diseases are also associated with ALS. Finally, we confirmed that the Parkin level may also have an effect on ALS: while this protein is at very low levels in Parkinson's patients, it is higher in ALS patients than in all control groups.
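As a simplified stand-in for the Bayesian approach described here, the sketch below uses a Gaussian naïve Bayes classifier (a restricted Bayesian network that assumes conditional independence of the features), not the full Bayesian network structure used in the study. The age, sex, and Parkin-level features and the simulated outcome are purely hypothetical.

```python
# Simplified sketch: Gaussian naive Bayes on hypothetical plasma-protein and
# demographic features, evaluated with accuracy and AUC. Not the study's model or data.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
n = 400
age = rng.normal(60, 10, n)
sex = rng.integers(0, 2, n)
parkin = rng.normal(1.0, 0.3, n)                 # hypothetical Parkin plasma level
y = (0.04 * age + 0.8 * sex + 1.5 * parkin + rng.normal(0, 1, n) > 4.5).astype(int)
X = np.column_stack([age, sex, parkin])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
model = GaussianNB().fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, model.predict(X_te)))
print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```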


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sam Andersson ◽  
Deepti R. Bathula ◽  
Stavros I. Iliadis ◽  
Martin Walter ◽  
Alkistis Skalkidou

Abstract Postpartum depression (PPD) is a detrimental health condition that affects 12% of new mothers. Despite negative effects on mothers’ and children’s health, many women do not receive adequate care. Preventive interventions are cost-efficient among high-risk women, but our ability to identify these women is poor. We leveraged the power of clinical, demographic, and psychometric data to assess whether machine learning methods can make accurate predictions of postpartum depression. Data were obtained from a population-based prospective cohort study in Uppsala, Sweden, collected between 2009 and 2018 (BASIC study, n = 4313). Sub-analyses among women without previous depression were performed. The extremely randomized trees method provided robust performance with the highest accuracy and well-balanced sensitivity and specificity (accuracy 73%, sensitivity 72%, specificity 75%, positive predictive value 33%, negative predictive value 94%, area under the curve 81%). Among women without earlier mental health issues, the accuracy was 64%. The variables placing women at most risk for PPD were depression and anxiety during pregnancy, as well as variables related to resilience and personality. Future clinical models that could be implemented directly after delivery might consider including these variables in order to identify women at high risk for postpartum depression and facilitate individualized follow-up and cost-effective care.
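The following is a minimal sketch of the best-performing model family reported here, an extremely randomized trees (ExtraTrees) classifier, evaluated with the same kinds of metrics quoted above. The cohort is simulated with a roughly 12% positive rate; the features, class weighting, and tree count are illustrative and are not the BASIC study variables or settings.

```python
# Sketch: extremely randomized trees with accuracy, sensitivity, specificity, PPV,
# NPV, and AUC computed from the test-set confusion matrix. Synthetic data only.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, roc_auc_score

X, y = make_classification(n_samples=4313, n_features=60, weights=[0.88, 0.12],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = ExtraTreesClassifier(n_estimators=500, class_weight="balanced",
                             random_state=0).fit(X_tr, y_tr)
y_pred = model.predict(X_te)
tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()
print("accuracy:", (tp + tn) / (tp + tn + fp + fn))
print("sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp))
print("PPV:", tp / (tp + fp), "NPV:", tn / (tn + fn))
print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```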


Cancers ◽  
2021 ◽  
Vol 13 (20) ◽  
pp. 5140
Author(s):  
Gun Oh Chong ◽  
Shin-Hyung Park ◽  
Nora Jee-Young Park ◽  
Bong Kyung Bae ◽  
Yoon Hee Lee ◽  
...  

Background: Our previous study demonstrated that tumor budding (TB) status was associated with inferior overall survival in cervical cancer. The purpose of this study was to evaluate whether radiomic features can predict TB status in cervical cancer patients. Methods: Seventy-four patients with cervical cancer who underwent preoperative MRI and radical hysterectomy from 2011 to 2015 at our institution were enrolled. The patients were randomly allocated to a training dataset (n = 48) and a test dataset (n = 26). Tumors were segmented on axial gadolinium-enhanced T1- and T2-weighted images. A total of 2074 radiomic features were extracted. Four machine learning classifiers were used: logistic regression (LR), random forest (RF), support vector machine (SVM), and neural network (NN). The trained models were validated on the test dataset. Results: Twenty radiomic features were selected; all were derived from filtered images, and 85% were texture-related features. In the test dataset, the area under the curve and accuracy were 0.742 and 0.769 for LR, 0.782 and 0.731 for RF, 0.849 and 0.885 for SVM, and 0.891 and 0.731 for NN, respectively. Conclusion: MRI-based radiomic features could predict TB status in patients with cervical cancer.
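An illustrative version of the modelling step in this abstract is sketched below: reduce a large radiomic feature matrix to a small subset on the training split, then compare LR, RF, SVM, and NN classifiers by test-set AUC and accuracy. The 74 x 2074 matrix is simulated, the univariate SelectKBest filter is an assumed selection method rather than the authors', and MRI segmentation and radiomic extraction (for example with a package such as pyradiomics) are outside this sketch.

```python
# Sketch: feature selection on the training split followed by a comparison of four
# classifier families on the held-out test split. Simulated radiomic features only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score, accuracy_score

X, y = make_classification(n_samples=74, n_features=2074, n_informative=20,
                           random_state=0)  # stand-in for 2074 radiomic features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=48, stratify=y,
                                          random_state=0)

models = {"LR": LogisticRegression(max_iter=1000),
          "RF": RandomForestClassifier(random_state=0),
          "SVM": SVC(probability=True, random_state=0),
          "NN": MLPClassifier(max_iter=1000, random_state=0)}

for name, clf in models.items():
    # Select 20 features on the training set only, then scale and fit the classifier.
    pipe = make_pipeline(SelectKBest(f_classif, k=20), StandardScaler(), clf)
    pipe.fit(X_tr, y_tr)
    proba = pipe.predict_proba(X_te)[:, 1]
    print(name, "AUC=%.3f" % roc_auc_score(y_te, proba),
          "accuracy=%.3f" % accuracy_score(y_te, pipe.predict(X_te)))
```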

