Predicting Hospital Readmission in Heart Failure Patients in Iran: A Comparison of Various Machine Learning Methods

Roya Najafi-Vosough; Javad Faradmal; Seyed Kianoosh Hosseini; Abbas Moghimbeigi; Hossein Mahjub

doi:10.4258/hir.2021.27.4.307

Predicting Hospital Readmission in Heart Failure Patients in Iran: A Comparison of Various Machine Learning Methods

Healthcare Informatics Research ◽

10.4258/hir.2021.27.4.307 ◽

2021 ◽

Vol 27 (4) ◽

pp. 307-314

Author(s):

Roya Najafi-Vosough ◽

Javad Faradmal ◽

Seyed Kianoosh Hosseini ◽

Abbas Moghimbeigi ◽

Hossein Mahjub

Keyword(s):

Machine Learning ◽

Heart Failure ◽

Predictive Value ◽

Hospital Readmission ◽

Class Imbalance ◽

Least Square ◽

Support Vector ◽

Common Disease ◽

Hospital Readmission Rate ◽

Sensitivity Specificity

Objectives: Heart failure (HF) is a common disease with a high hospital readmission rate. This study considered class imbalance and missing data, which are two common issues in medical data. The current study’s main goal was to compare the performance of six machine learning (ML) methods for predicting hospital readmission in HF patients. Methods: In this retrospective cohort study, information of 1,856 HF patients was analyzed. These patients were hospitalized in Farshchian Heart Center in Hamadan Province in Western Iran, from October 2015 to July 2019. The support vector machine (SVM), least-square SVM (LS-SVM), bagging, random forest (RF), AdaBoost, and naïve Bayes (NB) methods were used to predict hospital readmission. These methods’ performance was evaluated using sensitivity, specificity, positive predictive value, negative predictive value, and accuracy. Two imputation methods were also used to deal with missing data. Results: Of the 1,856 HF patients, 29.9% had at least one hospital readmission. Among the ML methods, LS-SVM performed the worst, with accuracy in the range of 0.57–0.60, while RF performed the best, with the highest accuracy (range, 0.90–0.91). Other ML methods showed relatively good performance, with accuracy exceeding 0.84 in the test datasets. Furthermore, the performance of the SVM and LS-SVM methods in terms of accuracy was higher with the multiple imputation method than with the median imputation method. Conclusions: This study showed that RF performed better, in terms of accuracy, than other methods for predicting hospital readmission in HF patients.
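The five evaluation measures named in this abstract all derive from a binary confusion matrix. A minimal sketch in Python, assuming labels coded 1 for readmitted and 0 otherwise; the function name `binary_metrics` and the toy labels are illustrative and are not taken from the study:

```python
def binary_metrics(y_true, y_pred):
    """Return sensitivity, specificity, PPV, NPV and accuracy for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # An empty denominator leaves the measure undefined.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Toy example (hypothetical labels): 3 readmitted, 7 not readmitted.
example = binary_metrics(
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 1],
)
```

With a readmission prevalence of 29.9%, a classifier that always predicts "no readmission" would already reach about 0.70 accuracy, which is one reason to read accuracy alongside sensitivity and specificity.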


Application of Machine Learning for Predicting Anastomotic Leakage in Patients with Gastric Adenocarcinoma Who Received Total or Proximal Gastrectomy

Journal of Personalized Medicine ◽

10.3390/jpm11080748 ◽

2021 ◽

Vol 11 (8) ◽

pp. 748

Author(s):

Shengli Shao ◽

Lu Liu ◽

Yufeng Zhao ◽

Lei Mu ◽

Qiyi Lu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Anastomotic Leakage ◽

Gastric Adenocarcinoma ◽

Predictive Value ◽

High Performance ◽

Characteristic Curve ◽

Proximal Gastrectomy ◽

Support Vector ◽

Sensitivity Specificity

Anastomotic leakage is a life-threatening complication in patients with gastric adenocarcinoma who received total or proximal gastrectomy, and there is still no model that accurately predicts anastomotic leakage. In this study, we aim to develop a high-performance machine learning tool to predict anastomotic leakage in patients with gastric adenocarcinoma who received total or proximal gastrectomy. A total of 1660 cases of gastric adenocarcinoma patients who received total or proximal gastrectomy in a large academic hospital from 1 January 2010 to 31 December 2019 were investigated, and these patients were randomly divided into training and testing sets at a ratio of 8:2. Four machine learning models (logistic regression, random forest, support vector machine, and XGBoost) were employed, and 24 clinical preoperative and intraoperative variables were included to develop the predictive model. Regarding the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy, random forest had a favorable performance with an AUC of 0.89, a sensitivity of 81.8%, and a specificity of 82.2% in the testing set. Moreover, we built a web app based on the random forest model to provide real-time predictions for guiding surgeons’ intraoperative decision making.


Natural Language Processing for the Identification of Silent Brain Infarcts From Neuroimaging Reports (Preprint)

10.2196/preprints.12109 ◽

2018 ◽

Author(s):

Sunyang Fu ◽

Lester Y Leung ◽

Yanshan Wang ◽

Anne-Olivia Raulli ◽

David F Kallmes ◽

...

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Mayo Clinic ◽

Predictive Value ◽

Support Vector ◽

Rule Based ◽

Rule Based System ◽

Sensitivity Specificity

BACKGROUND Silent brain infarction (SBI) is defined as the presence of 1 or more brain lesions, presumed to be because of vascular occlusion, found by neuroimaging (magnetic resonance imaging or computed tomography) in patients without clinical manifestations of stroke. It is more common than stroke and can be detected in 20% of healthy elderly people. Early detection of SBI may mitigate the risk of stroke by offering preventative treatment plans. Natural language processing (NLP) techniques offer an opportunity to systematically identify SBI cases from electronic health records (EHRs) by extracting, normalizing, and classifying SBI-related incidental findings interpreted by radiologists from neuroimaging reports. OBJECTIVE This study aimed to develop NLP systems to determine individuals with incidentally discovered SBIs from neuroimaging reports at 2 sites: Mayo Clinic and Tufts Medical Center. METHODS Both rule-based and machine learning approaches were adopted in developing the NLP system. The rule-based system was implemented using the open source NLP pipeline MedTagger, developed by Mayo Clinic. Features for rule-based systems, including significant words and patterns related to SBI, were generated using pointwise mutual information. The machine learning models adopted were convolutional neural network (CNN), random forest, support vector machine, and logistic regression. The performance of the NLP algorithm was compared with a manually created gold standard. RESULTS A total of 5 reports were removed due to invalid scan types. The interannotator agreements across Mayo and Tufts neuroimaging reports were 0.87 and 0.91, respectively. The rule-based system yielded the best performance in predicting SBI with an accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of 0.991, 0.925, 1.000, 1.000, and 0.990, respectively. The CNN achieved the best score on predicting white matter disease (WMD) with an accuracy, sensitivity, specificity, PPV, and NPV of 0.994, 0.994, 0.994, 0.994, and 0.994, respectively. CONCLUSIONS We adopted a standardized data abstraction and modeling process to develop NLP techniques (rule-based and machine learning) to detect incidental SBIs and WMDs from annotated neuroimaging reports. Validation statistics suggested a high feasibility of detecting SBIs and WMDs from EHRs using NLP.
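The pointwise mutual information (PMI) mentioned in the methods scores how much more often a term and a label co-occur than independence would predict: PMI(x, y) = log2(p(x, y) / (p(x) p(y))). A rough sketch from document counts follows. The function name `pmi` and the counts are made up for illustration, and the abstract does not say how the authors tokenized reports or thresholded the scores:

```python
import math


def pmi(n_docs, n_term, n_label, n_both):
    """Pointwise mutual information (base 2) from document counts.

    n_docs  : total number of reports
    n_term  : reports containing the term
    n_label : reports carrying the label (e.g. SBI-positive)
    n_both  : reports containing the term AND carrying the label
    """
    if min(n_docs, n_term, n_label, n_both) <= 0:
        raise ValueError("all counts must be positive")
    # p(x, y) / (p(x) * p(y)) simplifies to n_both * n_docs / (n_term * n_label).
    return math.log2((n_both * n_docs) / (n_term * n_label))


# Hypothetical counts: the term appears in 20 of 100 reports,
# 25 reports are positive, and 10 reports are both.
score = pmi(n_docs=100, n_term=20, n_label=25, n_both=10)
```

A positive score means the term co-occurs with the label more often than chance would suggest; a score of zero corresponds to independence.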


Monitoring the Foliar Nutrients Status of Mango Using Spectroscopy-Based Spectral Indices and PLSR-Combined Machine Learning Models

Remote Sensing ◽

10.3390/rs13040641 ◽

2021 ◽

Vol 13 (4) ◽

pp. 641

Author(s):

Gopal Ramdas Mahajan ◽

Bappa Das ◽

Dayesh Murgaokar ◽

Ittai Herrmann ◽

Katja Berger ◽

...

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Partial Least Square ◽

Least Square ◽

Partial Least Square Regression ◽

Support Vector ◽

Spectral Indices ◽

Learning Models ◽

Leaf Nutrients ◽

Machine Learning Models

Conventional methods of plant nutrient estimation for nutrient management need a huge number of leaf or tissue samples and extensive chemical analysis, which is time-consuming and expensive. Remote sensing is a viable tool to estimate the plant’s nutritional status to determine the appropriate amounts of fertilizer inputs. The aim of the study was to use remote sensing to characterize the foliar nutrient status of mango through the development of spectral indices, multivariate analysis, chemometrics, and machine learning modeling of the spectral data. A spectral database of the leaf samples within the 350–1050 nm wavelength range and the leaf nutrients were analyzed for the development of spectral indices and multivariate models. The normalized difference and ratio spectral indices and the multivariate models, namely partial least square regression (PLSR), principal component regression, and support vector regression (SVR), were ineffective in predicting any of the leaf nutrients. An approach using PLSR-combined machine learning models was found to be the best for predicting most of the nutrients. Based on the independent validation performance and summed ranks, the best performing models were cubist (R² ≥ 0.91, ratio of performance to deviation (RPD) ≥ 3.3, and ratio of performance to interquartile distance (RPIQ) ≥ 3.71) for nitrogen, phosphorus, potassium, and zinc; SVR (R² ≥ 0.88, RPD ≥ 2.73, RPIQ ≥ 3.31) for calcium, iron, copper, and boron; and elastic net (R² ≥ 0.95, RPD ≥ 4.47, RPIQ ≥ 6.11) for magnesium and sulfur. The results of the study revealed the potential of using hyperspectral remote sensing data for non-destructive estimation of mango leaf macro- and micro-nutrients. The developed approach is suggested for use within operational retrieval workflows for precision management of mango orchard nutrients.
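The three validation statistics quoted above can be computed from observed and predicted nutrient values. The sketch below uses common chemometrics definitions, which are an assumption here since the abstract does not spell them out: R² as 1 − SS_res/SS_tot, RPD as the sample standard deviation of the observed values divided by the RMSE, and RPIQ as the interquartile range of the observed values divided by the RMSE. The function name and data are illustrative only:

```python
import math
import statistics


def regression_scores(observed, predicted):
    """Return R2, RPD and RPIQ for paired observed/predicted values.

    Assumes the predictions are not perfect (RMSE > 0) and the
    observed values are not all identical.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with n >= 2")
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    # Quartiles use the default 'exclusive' method of statistics.quantiles;
    # other quartile conventions give slightly different RPIQ values.
    q1, _, q3 = statistics.quantiles(observed, n=4)
    return {
        "r2": 1 - ss_res / ss_tot,
        "rpd": statistics.stdev(observed) / rmse,
        "rpiq": (q3 - q1) / rmse,
    }


# Hypothetical leaf-nutrient values, for illustration only.
scores = regression_scores([1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.2, 3.8, 5.0])
```

RPD and RPIQ both express prediction error relative to the spread of the reference data, so higher values indicate better models; RPIQ is less sensitive to skewed distributions because it relies on quartiles rather than the standard deviation.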


Kernel Based Data-Adaptive Support Vector Machines for Multi-Class Classification

Mathematics ◽

10.3390/math9090936 ◽

2021 ◽

Vol 9 (9) ◽

pp. 936

Author(s):

Jianli Shao ◽

Xin Liu ◽

Wenqing He

Keyword(s):

Machine Learning ◽

Spatial Association ◽

Class Imbalance ◽

Imbalanced Data ◽

Real Data ◽

Kernel Functions ◽

Support Vector ◽

Classification Problems ◽

Rare Class ◽

Data Adaptive

Imbalanced data exist in many classification problems, and their classification poses remarkable challenges in machine learning. The support vector machine (SVM) and its variants are popular classifiers in machine learning thanks to their flexibility and interpretability. However, the performance of SVMs is impaired when the data are imbalanced, which is a typical data structure in the multi-category classification problem. In this paper, we employ the data-adaptive SVM with scaled kernel functions to classify instances for a multi-class population. We propose a multi-class data-dependent kernel function for the SVM that considers class imbalance and the spatial association among instances so that the classification accuracy is enhanced. Simulation studies demonstrate the superb performance of the proposed method, and a real multi-class prostate cancer image dataset is employed as an illustration. The proposed method not only outperforms the competitor methods in terms of commonly used accuracy measures such as the F-score and G-means, but also successfully detects more than 60% of instances from the rare class in the real data, while the competitors detect less than 20% of the rare class instances. The proposed method will benefit other scientific research fields, such as multiple region boundary detection.
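The G-means measure cited above rewards classifiers that do well on every class, including the rare one. One common multi-class definition, assumed here because the abstract does not give a formula, is the geometric mean of the per-class recalls. A minimal sketch with hypothetical labels:

```python
import math


def per_class_recall(y_true, y_pred):
    """Map each class in y_true to the fraction of its instances predicted correctly."""
    recalls = {}
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls[cls] = hits / total
    return recalls


def g_mean(y_true, y_pred):
    """Geometric mean of the per-class recalls (0 if any class is never detected)."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    return math.prod(recalls) ** (1 / len(recalls))


# Hypothetical three-class example in which class 2 is the rare class.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
score = g_mean(y_true, y_pred)
```

Because the recalls are multiplied, missing the rare class entirely drives the score to zero regardless of how well the majority classes are classified, which plain accuracy would not reveal.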


Risky Driver Recognition with Class Imbalance Data and Automated Machine Learning Framework

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18147534 ◽

2021 ◽

Vol 18 (14) ◽

pp. 7534

Author(s):

Ke Wang ◽

Qingwen Xue ◽

Jian John Lu

Keyword(s):

Machine Learning ◽

High Risk ◽

Loss Function ◽

Class Imbalance ◽

Support Vector ◽

Trajectory Data ◽

Recognition Model ◽

Learning Framework ◽

Sampling Cost ◽

Automated Machine Learning

Identifying high-risk drivers before an accident happens is necessary for traffic accident control and prevention. Due to the class-imbalanced nature of driving data, high-risk samples, as the minority class, are usually handled poorly by standard classification algorithms. Instead of applying preset sampling or cost-sensitive learning, this paper proposes a novel automated machine learning framework that simultaneously and automatically searches for the optimal sampling, cost-sensitive loss function, and probability calibration to handle the class-imbalance problem in the recognition of risky drivers. The hyperparameters that control the sampling ratio and class weight, along with other hyperparameters, are optimized by Bayesian optimization. To demonstrate the performance of the proposed automated learning framework, we establish a risky driver recognition model as a case study, using video-extracted vehicle trajectory data of 2427 private cars on a German highway. Based on rear-end collision risk evaluation, only 4.29% of all drivers are labeled as risky drivers. The inputs of the recognition model are the discrete Fourier transform coefficients of the target vehicle’s longitudinal speed, lateral speed, and the gap between the target vehicle and its preceding vehicle. Among 12 sampling methods, 2 cost-sensitive loss functions, and 2 probability calibration methods, the result of automated machine learning is consistent with manual searching but much more computationally efficient. We find that the combination of Support Vector Machine-based Synthetic Minority Oversampling TEchnique (SVMSMOTE) sampling, a cost-sensitive cross-entropy loss function, and isotonic regression can significantly improve the recognition ability and reduce the error of the predicted probability.
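A cost-sensitive cross-entropy loss of the kind named above can be illustrated by weighting the positive (risky) and negative terms differently. The sketch below is one common formulation and is not necessarily the exact loss used in the paper; the weights `w_pos` and `w_neg` are hypothetical parameters, and the loss is averaged over samples rather than over total weight:

```python
import math


def weighted_cross_entropy(y_true, p_pred, w_pos=1.0, w_neg=1.0, eps=1e-12):
    """Mean binary cross-entropy with separate weights for the two classes.

    y_true : 0/1 labels (1 = risky driver in this illustration)
    p_pred : predicted probabilities of the positive class
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Doubling the weight on the minority class makes its errors cost twice as much.
loss = weighted_cross_entropy([1, 0], [0.5, 0.5], w_pos=2.0, w_neg=1.0)
```

With only 4.29% of drivers labeled risky, negatives outnumber positives by roughly 22 to 1. An inverse-frequency weight of that order is a familiar heuristic starting point, whereas the framework described in the abstract searches the class weight by Bayesian optimization instead.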


Civil Aeroengine Fault Diagnosis Based on Fuzzy Least Square Support Vector Machine

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.130-134.2047 ◽

2011 ◽

Vol 130-134 ◽

pp. 2047-2050 ◽

Cited By ~ 1

Author(s):

Hong Chun Qu ◽

Xie Bin Ding

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Support Vector Machine ◽

Fault Diagnosis ◽

Coefficient Matrix ◽

Least Square ◽

Support Vector ◽

Influence Coefficient ◽

Structural Risk ◽

Better Than

SVM (support vector machine) is a new artificial intelligence methodology based on the structural risk minimization principle; it generalizes better than traditional machine learning methods and shows a powerful ability to learn from limited samples. To address the lack of engine fault samples, fuzzy least square SVM (FLS-SVM) theory, an improved SVM method, is applied. Ten common engine faults are trained and recognized in the paper. The simulated data are generated from the PW4000-94 engine influence coefficient matrix at cruise, and the results show that the diagnostic accuracy of FLS-SVM is better than that of LS-SVM.


Machine learning for identification of surgeries with high risks of cancellation

Health Informatics Journal ◽

10.1177/1460458218813602 ◽

2018 ◽

Vol 26 (1) ◽

pp. 141-155 ◽

Cited By ~ 2

Author(s):

Li Luo ◽

Fengyi Zhang ◽

Yao Yao ◽

RenRong Gong ◽

Martina Fu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Predictive Value ◽

Operating Characteristic ◽

Sampling Methods ◽

Characteristic Curve ◽

Support Vector ◽

Chi Square ◽

Stable Performance ◽

Operating Characteristic Curve

Surgery cancellations waste scarce operative resources and hinder patients’ access to operative services. In this study, the Wilcoxon and chi-square tests were used for predictor selection, and three machine learning models (random forest, support vector machine, and XGBoost) were used for the identification of surgeries with high risks of cancellation. The optimal performances of the identification models were as follows: sensitivity, 0.615; specificity, 0.957; positive predictive value, 0.454; negative predictive value, 0.904; accuracy, 0.647; and area under the receiver operating characteristic curve, 0.682. Of the three models, the random forest model achieved the best performance. Thus, the effective identification of surgeries with high risks of cancellation is feasible with stable performance. Models and sampling methods significantly affect the performance of identification. This study is a new application of machine learning for the identification of surgeries with high risks of cancellation and facilitation of surgery resource management.


Integration of synthetic minority oversampling technique for imbalanced class

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v13.i1.pp102-108 ◽

2019 ◽

Vol 13 (1) ◽

pp. 102

Author(s):

Noviyanti Santoso ◽

Wahyu Wibowo ◽

Hilda Hikmawati

Keyword(s):

Machine Learning ◽

Data Mining ◽

Support Vector Machine ◽

Class Imbalance ◽

Original Data ◽

Support Vector ◽

Classification Methods ◽

Problematic Issue ◽

Imbalanced Class ◽

F Measure

In data mining, class imbalance is a problematic issue that calls for solutions. This is probably because machine learning algorithms are constructed on the assumption that the number of instances in each class is balanced, so when the classes are imbalanced, the prediction results may not be appropriate. Solutions offered for class imbalance include oversampling, undersampling, and the synthetic minority oversampling technique (SMOTE). Both oversampling and undersampling have disadvantages, so SMOTE is an alternative to overcome them. Integrating SMOTE into data mining classification methods such as Naive Bayes, Support Vector Machine (SVM), and Random Forest (RF) is expected to improve accuracy. In this research, it was found that the SMOTE data gave better accuracy than the original data. Among the three classification methods used, RF gave the highest average AUC, F-measure, and G-means score.
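The core idea of SMOTE is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The following is a simplified sketch of that idea in plain Python, not the implementation used in the paper; the function name `smote_sketch`, the default `k`, and the toy points are assumptions for illustration:

```python
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by nearest-neighbour interpolation.

    minority    : list of equal-length numeric tuples (minority-class points)
    n_synthetic : number of synthetic points to create
    k           : number of nearest minority neighbours to choose from
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical 2-D minority class with four points.
new_points = smote_sketch([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)], 6, k=2)
```

Every synthetic point lies on a line segment between two existing minority points, so the method adds variety without simply duplicating records, which is the usual argument for preferring it over plain random oversampling.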


Impacts of multicollinearity on CAPT modalities: An heterogeneous machine learning framework for computer-assisted French phoneme pronunciation training

PLoS ONE ◽

10.1371/journal.pone.0257901 ◽

2021 ◽

Vol 16 (10) ◽

pp. e0257901

Author(s):

Yanjing Bi ◽

Chao Li ◽

Yannick Benezeth ◽

Fan Yang

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Support Vector Machines ◽

Partial Least Square ◽

Least Square ◽

Support Vector ◽

Computer Assisted ◽

Long Distance ◽

Relationship Analysis ◽

Vector Machines

Phoneme pronunciation is usually considered a basic skill for learning a foreign language. Practicing pronunciation in a computer-assisted way is helpful in a self-directed or long-distance learning environment. Recent research indicates that machine learning is a promising method for building high-performance computer-assisted pronunciation training modalities. Many data-driven classification models, such as support vector machines, back-propagation networks, deep neural networks, and convolutional neural networks, are increasingly widely used for this purpose. Yet the acoustic waveforms of phonemes are essentially modulated from the base vibrations of the vocal cords, and this makes the predictors collinear, distorting the classification models. A commonly used solution is to suppress the collinearity of the predictors via the partial least square regression algorithm, which yields high-quality predictor weighting results via predictor relationship analysis. However, as linear regressors, classifiers of this type have very simple topological structures, which constrains their universality. To address this issue, this paper presents a heterogeneous phoneme recognition framework that can further benefit phoneme pronunciation diagnostic tasks by combining partial least square with support vector machines. A French phoneme data set containing 4830 samples is established for the evaluation experiments. The experiments demonstrate that the new method improves the accuracy of the phoneme classifiers by 0.21% to 8.47% compared with state-of-the-art methods at different training data densities.


Development and Validation of Unplanned Extubation Prediction Models Using Intensive Care Unit Data: Comparative Machine Learning Study (Preprint)

10.2196/preprints.23508 ◽

2020 ◽

Author(s):

Sujeong Hur ◽

Ji Young Min ◽

Junsang Yoo ◽

Kyunga Kim ◽

Chi Ryang Chung ◽

...

Keyword(s):

Machine Learning ◽

Intensive Care Unit ◽

Patient Safety ◽

Intensive Care ◽

Predictive Value ◽

Prediction Models ◽

Support Vector ◽

Unplanned Extubation ◽

Electronic Health Record Data ◽

Icu Patients

BACKGROUND Patient safety in the intensive care unit (ICU) is one of the most critical issues, and unplanned extubation (UE) is considered the most adverse event for patient safety. Prevention and early detection of such an event is an essential but difficult component of quality care. OBJECTIVE This study aimed to develop and validate prediction models for UE in ICU patients using machine learning. METHODS This study was conducted at an academic tertiary hospital in Seoul. The hospital had approximately 2,000 inpatient beds and 120 ICU beds. The outpatient department saw approximately 9,000 patients per day, and there were approximately 10,000 ICU admissions per year. We conducted a retrospective study between January 1, 2010 and December 31, 2018. A total of 6,914 extubation cases were included. We developed an unplanned extubation prediction model using machine learning algorithms, which included random forest (RF), logistic regression (LR), artificial neural network (ANN), and support vector machine (SVM). For evaluating the model’s performance, we used the area under the receiver operator characteristic curve (AUROC). Sensitivity, specificity, positive predictive value, negative predictive value, and F1-score were also determined for each model. For performance evaluation, we also used the calibration curve, the Brier score, and the Hosmer-Lemeshow goodness-of-fit statistic. RESULTS Among the 6,914 extubation cases, 248 underwent UE. In the UE group, there were more males than females, higher use of physical restraints, and fewer surgeries. UE was more likely than planned extubation to occur during the night shift. The rate of reintubation within 24 hours and hospital mortality was higher in the UE group. The UE prediction algorithm was developed, and the AUROC for RF was 0.787, for LR was 0.762, for ANN was 0.762, and for SVM was 0.740. 
CONCLUSIONS We successfully developed and validated machine learning-based prediction models to predict UE in ICU patients using electronic health record data. The best AUROC was 0.787, which was obtained using RF. CLINICALTRIAL N/A
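The Brier score mentioned in the methods is the mean squared difference between predicted probabilities and observed 0/1 outcomes, with lower values indicating better calibrated and sharper predictions. A minimal sketch with made-up predictions; the baseline uses only the counts reported in the abstract (248 UE cases among 6,914 extubations):

```python
def brier_score(y_true, p_pred):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(p_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


# Hypothetical predictions for four extubation cases.
example = brier_score([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3])

# Reference point: always predicting the overall UE rate gives a Brier score
# of prevalence * (1 - prevalence), so a useful model should score below it.
prevalence = 248 / 6914
baseline = prevalence * (1 - prevalence)
```

Because UE is rare (about 3.6% of cases), even an uninformative constant prediction yields a small Brier score, so the score is best interpreted against this kind of baseline rather than in isolation.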
