Artificial intelligence predicts clinically relevant atrial high-rate episodes in patients with cardiac implantable electronic devices

Abstract This study assessed the utility of machine learning (ML) algorithms in predicting clinically relevant atrial high-rate episodes (AHREs), which can be recorded by a pacemaker. We aimed to develop ML-based models to predict clinically relevant AHREs based on the clinical parameters of patients with implanted pacemakers in comparison to logistic regression (LR). We included 721 patients without known atrial fibrillation or atrial flutter from a prospective multicenter (11 tertiary hospitals) registry comprising all geographical regions of Korea from September 2017 to July 2020. Predictive models of clinically relevant AHREs were developed using the random forest (RF) algorithm, support vector machine (SVM) algorithm, and extreme gradient boosting (XGB) algorithm. The models were trained on data from seven hospitals, and their performance was evaluated on data from the remaining four hospitals. During a median follow-up of 18 months, clinically relevant AHREs were noted in 104 patients (14.4%). The three ML-based models improved the discrimination of the AHREs (area under the receiver operating characteristic curve: RF: 0.742, SVM: 0.675, and XGB: 0.745 vs. LR: 0.669). The XGB model had a greater resolution in the Brier score (RF: 0.008, SVM: 0.008, and XGB: 0.021 vs. LR: 0.013) than the other models. The use of the ML-based models in patient classification was associated with improved prediction of clinically relevant AHREs after pacemaker implantation.
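The two kinds of metric quoted in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names and the toy labels and probabilities are assumptions made for illustration. It computes the area under the ROC curve as a pairwise ranking probability, and the overall Brier score. The abstract reports the resolution component of the Brier score, which this sketch does not decompose.

```python
def auroc(y_true, y_score):
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data: 1 = event occurred, 0 = no event.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

print(auroc(y_true, y_prob))                   # 0.75
print(round(brier_score(y_true, y_prob), 4))   # 0.1581
```

A higher AUROC means better discrimination, while a lower overall Brier score means better-calibrated and sharper probabilities.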

Hierarchical attention networks for information extraction from cancer pathology reports

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocx131 ◽

2017 ◽

Vol 25 (3) ◽

pp. 321-330 ◽

Cited By ~ 29

Author(s):

Shang Gao ◽

Michael T Young ◽

John X Qiu ◽

Hong-Jun Yoon ◽

James B Christian ◽

...

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Information Extraction ◽

Model Performance ◽

Gradient Boosting ◽

Support Vector ◽

Attention Networks ◽

Cancer Pathology ◽

Extreme Gradient Boosting ◽

Pathology Reports

Abstract Objective We explored how a deep learning (DL) approach based on hierarchical attention networks (HANs) can improve model performance for multiple information extraction tasks from unstructured cancer pathology reports compared to conventional methods that do not sufficiently capture syntactic and semantic contexts from free-text documents. Materials and Methods Data for our analyses were obtained from 942 deidentified pathology reports collected by the National Cancer Institute Surveillance, Epidemiology, and End Results program. The HAN was implemented for 2 information extraction tasks: (1) primary site, matched to 12 International Classification of Diseases for Oncology topography codes (7 breast, 5 lung primary sites), and (2) histological grade classification, matched to G1–G4. Model performance metrics were compared to conventional machine learning (ML) approaches including naive Bayes, logistic regression, support vector machine, random forest, and extreme gradient boosting, and other DL models, including a recurrent neural network (RNN), a recurrent neural network with attention (RNN w/A), and a convolutional neural network. Results Our results demonstrate that for both information tasks, HAN performed significantly better compared to the conventional ML and DL techniques. In particular, across the 2 tasks, the mean micro and macro F-scores for the HAN with pretraining were (0.852, 0.708), compared to naive Bayes (0.518, 0.213), logistic regression (0.682, 0.453), support vector machine (0.634, 0.434), random forest (0.698, 0.508), extreme gradient boosting (0.696, 0.522), RNN (0.505, 0.301), RNN w/A (0.637, 0.471), and convolutional neural network (0.714, 0.460). Conclusions HAN-based DL models show promise in information abstraction tasks within unstructured clinical pathology reports.
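The gap between the micro and macro F-scores reported here is typical of imbalanced multi-class problems. The sketch below is illustrative only: the function name and the toy grade labels are assumptions, not data from the study. It shows how the two averages are computed and why rare classes pull the macro average down.

```python
def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical toy labels: G1 is common, G2 and G3 are rare.
y_true = ["G1", "G1", "G1", "G1", "G2", "G3"]
y_pred = ["G1", "G1", "G1", "G1", "G1", "G3"]
micro, macro = f1_scores(y_true, y_pred)
print(round(micro, 3), round(macro, 3))
```

In this toy case the single missed G2 report costs the micro average little (about 0.83) but gives G2 an F1 of zero, so the macro average drops to about 0.63.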

A Comparative Analysis of Machine Learning Models for Prediction of Insurance Uptake in Kenya

10.20944/preprints202010.0186.v1 ◽

2020 ◽

Author(s):

Nelson Yego ◽

Juma Kasozi ◽

Joseph Nkrunziza

Keyword(s):

Machine Learning ◽

Random Forest ◽

Characteristic Curve ◽

Confusion Matrix ◽

Gradient Boosting ◽

Support Vector ◽

Sampled Data ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models

The role of insurance in financial inclusion as well as in economic growth is immense. However, low uptake seems to impede the growth of the sector, hence the need for a model that robustly predicts uptake of insurance among potential clients. In this research, we compared the performances of eight (8) machine learning models in predicting the uptake of insurance. The classifiers considered were Logistic Regression, Gaussian Naive Bayes, Support Vector Machines, K Nearest Neighbors, Decision Tree, Random Forest, Gradient Boosting Machines and Extreme Gradient Boosting. The data used in the classification was from the 2016 Kenya FinAccess Household Survey. Comparison of performance was done for both upsampled and downsampled data due to data imbalance. For upsampled data, the Random Forest classifier showed the highest accuracy and precision compared to other classifiers, but for downsampled data, gradient boosting was optimal. It is noteworthy that for both upsampled and downsampled data, tree-based classifiers were more robust than others in insurance uptake prediction. However, in spite of hyper-parameter optimization, the area under the receiver operating characteristic curve remained highest for Random Forest as compared to other tree-based models. Also, the confusion matrix for Random Forest showed the fewest false positives and the most true positives, and hence it could be construed as the most robust model for predicting insurance uptake. Finally, the most important feature in predicting uptake was having a bank product, hence bancassurance could be said to be a plausible channel of distribution of insurance products.
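The upsampling and downsampling mentioned in this abstract can be sketched as random resampling per class. This is a minimal illustration under assumed names (`resample`, `label_of`) and toy rows; the abstract does not say which resampling procedure the authors used.

```python
import random


def resample(rows, label_of, mode, seed=0):
    """Balance classes by random up- or downsampling.

    mode="up":   every class is grown to the size of the largest class
                 by drawing extra rows with replacement.
    mode="down": every class is shrunk to the size of the smallest class
                 by drawing rows without replacement.
    """
    if mode not in ("up", "down"):
        raise ValueError("mode must be 'up' or 'down'")
    rng = random.Random(seed)
    groups = {}
    for row in rows:
        groups.setdefault(label_of(row), []).append(row)
    sizes = [len(g) for g in groups.values()]
    target = max(sizes) if mode == "up" else min(sizes)

    balanced = []
    for g in groups.values():
        if len(g) >= target:
            balanced.extend(rng.sample(g, target))
        else:
            balanced.extend(g + rng.choices(g, k=target - len(g)))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: (row_id, has_insurance) with a 3:10 imbalance.
rows = [(i, 1) for i in range(3)] + [(i, 0) for i in range(3, 13)]
print(len(resample(rows, lambda r: r[1], "up")))    # 20
print(len(resample(rows, lambda r: r[1], "down")))  # 6
```

Resampling is normally applied to the training split only, so that duplicated rows cannot leak into the data used for evaluation.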

Predicting CO2 Trapping Efficiency In Saline Aquifers By Machine Learning System: Implication To Carbon Sequestration

10.21203/rs.3.rs-841564/v1 ◽

2021 ◽

Author(s):

Hung Vo-Thanh ◽

Kang-Kun Lee

Keyword(s):

Machine Learning ◽

Model Performance ◽

Learning System ◽

Trapping Efficiency ◽

Saline Aquifers ◽

Gradient Boosting ◽

Support Vector ◽

Extreme Gradient Boosting ◽

Data Points ◽

Saline Formation

Abstract Carbon dioxide (CO2) storage in saline formations has been identified as a practical approach to reducing CO2 levels in the atmosphere. Residual and solubility trapping of CO2 in deep saline aquifers are essential mechanisms for enhancing the security of CO2 storage. In this research, CO2 residual and solubility trapping in saline formations was predicted using three machine learning models: Random Forest (RF), extreme gradient boosting (XGboost), and Support Vector Regression (SVR). To this end, a diverse field-scale simulation database of 1509 data samples retrieved from reliable studies was used to train and test the proposed models. Graphical and statistical indicators were used to evaluate and compare the predictive performance of the ML models. The predicted results denoted that the proposed ML models are ranked from high to low as follows: XGboost>RF>SVR. Additionally, the performance analyses revealed that the XGboost model demonstrates higher accuracy in predicting CO2 trapping efficiency in saline formation than previous ML models. The XGboost model yields very low root mean square error (RMSE) and R2 for both residual and solubility trapping efficiency. Finally, the applicable domain of the XGboost model was validated, and only 24 suspected data points were recognized from the entire databank.
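The two regression indicators named in this abstract, RMSE and R2, follow standard definitions, sketched below. The function names and the toy values are assumptions made for illustration and are not taken from the study's database.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error: typical size of a prediction error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values standing in for trapping-efficiency targets.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(round(rmse(y_true, y_pred), 3))  # 0.158
print(round(r2(y_true, y_pred), 3))    # 0.98
```

Under these definitions a well-fitting regression model shows an RMSE close to 0 and an R2 close to 1.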

Machine learning to predict distal caries in mandibular second molars associated with impacted third molars

Scientific Reports ◽

10.1038/s41598-021-95024-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Sung-Hwi Hur ◽

Eun-Young Lee ◽

Min-Kyung Kim ◽

Somi Kim ◽

Ji-Yeon Kang ◽

...

Keyword(s):

Machine Learning ◽

Decision Making ◽

Clinical Decision Making ◽

Prediction Models ◽

Contact Point ◽

Characteristic Curve ◽

Gradient Boosting ◽

Support Vector ◽

Third Molars ◽

Extreme Gradient Boosting

Abstract Impacted mandibular third molars (M3M) are associated with the occurrence of distal caries on the adjacent mandibular second molars (DCM2M). In this study, we aimed to develop and validate five machine learning (ML) models designed to predict the occurrence of DCM2Ms due to the proximity with M3Ms and determine the relative importance of predictive variables for DCM2Ms that are important for clinical decision making. A total of 2642 mandibular second molars adjacent to M3Ms were analyzed and DCM2Ms were identified in 322 cases (12.2%). The models were trained using logistic regression, random forest, support vector machine, artificial neural network, and extreme gradient boosting ML methods and were subsequently validated using testing datasets. The performance of the ML models was significantly superior to that of single predictors. The area under the receiver operating characteristic curve of the machine learning models ranged from 0.88 to 0.89. Six features (sex, age, contact point at the cementoenamel junction, angulation of M3Ms, Winter's classification, and Pell and Gregory classification) were identified as relevant predictors. These prediction models could be used to detect patients at a high risk of developing DCM2M and ultimately contribute to caries prevention and treatment decision-making for impacted M3Ms.

Prediction of Post-Intubation Tachycardia Using Machine-Learning Models

Applied Sciences ◽

10.3390/app10031151 ◽

2020 ◽

Vol 10 (3) ◽

pp. 1151

Author(s):

Hanna Kim ◽

Young-Seob Jeong ◽

Ah Reum Kang ◽

Woohyun Jung ◽

Yang Hoon Chung ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Characteristic Curve ◽

Gradient Boosting ◽

Support Vector ◽

Learning Models ◽

Feature Sets ◽

Extreme Gradient Boosting ◽

Post Intubation ◽

Machine Learning Models

Tachycardia is defined as a heart rate greater than 100 bpm for more than 1 min. Tachycardia often occurs after endotracheal intubation and can cause serious complications in patients with cardiovascular disease. The ability to predict post-intubation tachycardia would help clinicians by alerting them to a potential event so that they can pre-treat. In this paper, we predict potential post-intubation tachycardia. Given electronic medical records and vital signs collected before tracheal intubation, we predict whether post-intubation tachycardia will occur within 10 min. Of 1931 available patient datasets, 257 remained after filtering out those with inappropriate data such as outliers and inappropriate annotations. Three feature sets were designed using feature selection algorithms, and two additional feature sets were defined by statistical inspection or manual examination. The five feature sets were compared with various machine learning models such as naïve Bayes classifiers, logistic regression, random forest, support vector machines, extreme gradient boosting, and artificial neural networks. Parameters of the models were optimized for each feature set. By 10-fold cross validation, we found that a logistic regression model with eight-dimensional hand-crafted features achieved an accuracy of 80.5%, recall of 85.1%, precision of 79.9%, an F1 score of 79.9%, and an area under the receiver operating characteristic curve of 0.85.

Machine Learning Based Prediction of Insufficient Herbage Allowance with Automated Feeding Behaviour and Activity Data

Sensors ◽

10.3390/s19204479 ◽

2019 ◽

Vol 19 (20) ◽

pp. 4479 ◽

Cited By ~ 1

Author(s):

Abu Zar Shafiullah ◽

Jessica Werner ◽

Emer Kennedy ◽

Lorenzo Leso ◽

Bernadette O’Brien ◽

...

Keyword(s):

Machine Learning ◽

Binary Classification ◽

Characteristic Curve ◽

Training Data ◽

Sensor Data ◽

Gradient Boosting ◽

Support Vector ◽

Pasture Management ◽

Activity Data ◽

Extreme Gradient Boosting

Sensor technologies that measure grazing and ruminating behaviour as well as physical activities of individual cows are intended to be included in precision pasture management. One of the advantages of sensor data is that they can be analysed to support farmers in many decision-making processes. This article thus considers the performance of a set of RumiWatchSystem recorded variables in the prediction of insufficient herbage allowance for spring calving dairy cows. Several commonly used models in machine learning (ML) were applied to the binary classification problem, i.e., sufficient or insufficient herbage allowance, and the predictive performance was compared based on the classification evaluation metrics. Most of the ML models and the generalised linear model (GLM) performed similarly in the leave-out-one-animal (LOOA) validation studies. However, cross validation (CV) studies, where a portion of features in the test and training data resulted from the same cows, revealed that support vector machine (SVM), random forest (RF) and extreme gradient boosting (XGBoost) performed relatively better than other candidate models. In general, these ML models attained 88% AUC (area under receiver operating characteristic curve) and around 80% sensitivity, specificity, accuracy, precision and F-score. This study further identified that the number of rumination chews per day and grazing bites per minute were the most important predictors and examined the marginal effects of the variables on model prediction towards a decision support system.
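The difference between the leave-out-one-animal validation and ordinary cross validation described here lies in how rows are split. A minimal sketch of the grouped split follows; the function name and the cow identifiers are hypothetical and the abstract does not describe the authors' implementation.

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) for each group.

    All rows that share a group label (here, one animal) are placed in the
    test set together, so no animal contributes to both sides of a split.
    """
    for g in sorted(set(groups)):
        test = [i for i, x in enumerate(groups) if x == g]
        train = [i for i, x in enumerate(groups) if x != g]
        yield g, train, test


# Hypothetical toy data: one cow identifier per recorded observation.
cow_ids = ["c1", "c1", "c2", "c3", "c3", "c3"]
for held_out, train, test in leave_one_group_out(cow_ids):
    print(held_out, train, test)
```

A row-wise split can place observations from the same cow in both training and test data, which is the situation the abstract describes for its CV studies.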

Machine Learning Model to Identify Sepsis Patients in the Emergency Department: Algorithm Development and Validation

Journal of Personalized Medicine ◽

10.3390/jpm11111055 ◽

2021 ◽

Vol 11 (11) ◽

pp. 1055

Author(s):

Pei-Chen Lin ◽

Kuo-Tai Chen ◽

Huan-Chieh Chen ◽

Md. Mohaimenul Islam ◽

Ming-Chin Lin

Keyword(s):

Machine Learning ◽

Emergency Department ◽

Characteristic Curve ◽

External Validation ◽

Model Performance ◽

Learning Model ◽

Gradient Boosting ◽

Machine Learning Model ◽

Extreme Gradient Boosting ◽

Development And Validation

Accurate stratification of sepsis can effectively guide the triage of patient care and shared decision making in the emergency department (ED). However, previous research on sepsis identification models focused mainly on ICU patients, and discrepancies in model performance between the development and external validation datasets are rarely evaluated. The aim of our study was to develop and externally validate a machine learning model to stratify sepsis patients in the ED. We retrospectively collected clinical data from two geographically separate institutes that provided a different level of care at different time periods. The Sepsis-3 criteria were used as the reference standard in both datasets for identifying true sepsis cases. An eXtreme Gradient Boosting (XGBoost) algorithm was developed to stratify sepsis patients and the performance of the model was compared with traditional clinical sepsis tools: quick Sequential Organ Failure Assessment (qSOFA) and Systemic Inflammatory Response Syndrome (SIRS). There were 8296 patients (1752 (21%) being septic) in the development and 1744 patients (506 (29%) being septic) in the external validation datasets. The mortality of septic patients in the development and validation datasets was 13.5% and 17%, respectively. In the internal validation, XGBoost achieved an area under the receiver operating characteristic curve (AUROC) of 0.86, exceeding SIRS (0.68) and qSOFA (0.56). The performance of XGBoost deteriorated in the external validation (the AUROC of XGBoost, SIRS and qSOFA was 0.75, 0.57 and 0.66, respectively). Heterogeneity in patient characteristics, such as sepsis prevalence, severity, age, comorbidity and infection focus, could reduce model performance. Our model showed good discriminative capabilities for the identification of sepsis patients and outperformed the existing sepsis identification tools. Implementation of the ML model in the ED can facilitate timely sepsis identification and treatment. However, dataset discrepancies should be carefully evaluated before implementing the ML approach in clinical practice. This finding reinforces the necessity for future studies to perform external validation to ensure the generalisability of any developed ML approaches.

Machine-Learning-Based Prediction of Corrosion Behavior in Additively Manufactured Inconel 718

Data ◽

10.3390/data6080080 ◽

2021 ◽

Vol 6 (8) ◽

pp. 80

Author(s):

O. V. Mythreyi ◽

M. Rohith Srinivaas ◽

Tigga Amit Kumar ◽

R. Jayaganthan

Keyword(s):

Machine Learning ◽

Corrosion Behavior ◽

Inconel 718 ◽

Polynomial Regression ◽

Research Work ◽

Model Performance ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Extreme Gradient Boosting

This research work focuses on machine-learning-assisted prediction of the corrosion behavior of laser-powder-bed-fused (LPBF) and postprocessed Inconel 718. Corrosion testing data of these specimens were collected and fitted with the following machine learning algorithms: polynomial regression, support vector regression, decision tree, and extreme gradient boosting. The model performance, after hyperparameter optimization, was evaluated using a set of established metrics: R2, mean absolute error, and root mean square error. Among the algorithms, the extreme gradient boosting algorithm performed best in predicting the corrosion behavior, closely followed by the other algorithms. Feature importance analysis was carried out to determine the postprocessing parameters that most influenced the corrosion behavior of Inconel 718 manufactured by LPBF.

An Interpretable Aid Decision-Making Model for Flag State Control Ship Detention Based on SMOTE and XGBoost

Journal of Marine Science and Engineering ◽

10.3390/jmse9020156 ◽

2021 ◽

Vol 9 (2) ◽

pp. 156

Author(s):

Jian He ◽

Yong Hao ◽

Xiaoqiong Wang

Keyword(s):

Machine Learning ◽

Decision Making ◽

Model Performance ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

State Control ◽

Extreme Gradient Boosting ◽

Decision Making Model ◽

Flag State

Reasonable ship detention decisions play a vital role in flag state control (FSC). Machine learning algorithms can be applied as aid tools for identifying ship detention. In this study, we propose a novel interpretable ship detention decision-making model based on machine learning, termed the SMOTE-XGBoost-Ship detention model (SMO-XGB-SD), using the extreme gradient boosting (XGBoost) algorithm and the synthetic minority oversampling technique (SMOTE) algorithm to identify whether a ship should be detained. Our verification results show that the SMO-XGB-SD algorithm outperforms the random forest (RF), support vector machine (SVM), and logistic regression (LR) algorithms. In addition, the new algorithm also provides a reasonable interpretation of model performance and highlights the most important features for identifying ship detention using the Shapley additive explanations (SHAP) algorithm. The SMO-XGB-SD model provides an effective basis for aiding decisions on ship detention by inland flag state control officers (FSCOs) and the ship safety management of ship operating companies, as well as training services for new FSCOs in maritime organizations.
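The core idea of SMOTE, which this abstract pairs with XGBoost, is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below is a simplified illustration of that idea only: the function name, parameters and toy points are assumptions, and it is not the authors' SMO-XGB-SD pipeline or a library implementation.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class points.

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    Requires at least two minority points.
    """
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist2(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy minority class with two numeric features.
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
for point in smote_sketch(minority, n_new=3):
    print(point)
```

In practice the synthetic points are added to the training data only, and a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would normally be preferred over a hand-written version.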

Machine learning models to identify low adherence to influenza vaccination among Korean adults with cardiovascular disease

BMC Cardiovascular Disorders ◽

10.1186/s12872-021-01925-7 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Moojung Kim ◽

Young Jae Kim ◽

Sung Jin Park ◽

Kwang Gi Kim ◽

Pyung Chun Oh ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Influenza Vaccination ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Age Group ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models

Abstract Background Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially in the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination. Methods Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared; for the ≥ 65 age group, XGB (84.7%) and RF (84.7%) have the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM has the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions The machine learning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.
