A Study of Machine-Learning Classifiers for Hypertension Based on Radial Pulse Wave

Objective. In this study, machine learning was utilized to classify and predict pulse wave of hypertensive group and healthy group and assess the risk of hypertension by observing the dynamic change of the pulse wave and provide an objective reference for clinical application of pulse diagnosis in traditional Chinese medicine (TCM). Method. The basic information from 450 hypertensive cases and 479 healthy cases was collected by self-developed H20 questionnaires and pulse wave information was acquired by self-developed pulse diagnostic instrument (PDA-1). H20 questionnaires and pulse wave information were used as input variables to obtain different machine learning classification models of hypertension. This method was aimed at analyzing the influence of pulse wave on the accuracy and stability of machine learning model, as well as the feature contribution of hypertension model after removing noise by K-means. Result. Compared with the classification results before removing noise, the accuracy and the area under the curve (AUC) had been improved. The accuracy rates of AdaBoost, Gradient Boosting, and Random Forest (RF) were 86.41%, 86.41%, and 85.33%, respectively. AUC were 0.86, 0.86, and 0.85, respectively. The maximum accuracy of SVM increased from 79.57% to 83.15%, and the AUC stability increased from 0.79 to 0.83. In addition, the features of importance on traditional statistics and machine learning were consistent. After removing noise, the features with large changes were h1/t1, w1/t, t, w2, h2, t1, and t5 in AdaBoost and Gradient Boosting (top10). The common variables for machine learning and traditional statistics were h1/t1, h5, t, Ad, BMI, and t2. Conclusion. Pulse wave-based diagnostic method of hypertension has significant value in reference. In view of the feasibility of digital-pulse-wave diagnosis and dynamically evaluating hypertension, it provides the research direction and foundation for Chinese medicine in the dynamic evaluation of modern disease diagnosis and curative effect.

Download Full-text

Gully Erosion Susceptibility Mapping in Highly Complex Terrain Using Machine Learning Models

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10100680 ◽

2021 ◽

Vol 10 (10) ◽

pp. 680

Author(s):

Annan Yang ◽

Chunmei Wang ◽

Guowei Pang ◽

Yongqing Long ◽

Lei Wang ◽

...

Keyword(s):

Machine Learning ◽

Complex Terrain ◽

Large Scale ◽

Area Under The Curve ◽

Gully Erosion ◽

Susceptibility Mapping ◽

Weight Of Evidence ◽

Gradient Boosting ◽

Machine Learning Classification ◽

Extreme Gradient Boosting

Gully erosion is the most severe type of water erosion and is a major land degradation process. Gully erosion susceptibility mapping (GESM)’s efficiency and interpretability remains a challenge, especially in complex terrain areas. In this study, a WoE-MLC model was used to solve the above problem, which combines machine learning classification algorithms and the statistical weight of evidence (WoE) model in the Loess Plateau. The three machine learning (ML) algorithms utilized in this research were random forest (RF), gradient boosted decision trees (GBDT), and extreme gradient boosting (XGBoost). The results showed that: (1) GESM were well predicted by combining both machine learning regression models and WoE-MLC models, with the area under the curve (AUC) values both greater than 0.92, and the latter was more computationally efficient and interpretable; (2) The XGBoost algorithm was more efficient in GESM than the other two algorithms, with the strongest generalization ability and best performance in avoiding overfitting (averaged AUC = 0.947), followed by the RF algorithm (averaged AUC = 0.944), and GBDT algorithm (averaged AUC = 0.938); and (3) slope gradient, land use, and altitude were the main factors for GESM. This study may provide a possible method for gully erosion susceptibility mapping at large scale.

Download Full-text

Development of Machine Learning Strategy for Predicting the Risk Range of Ship’s Berthing Velocity

Journal of Marine Science and Engineering ◽

10.3390/jmse8050376 ◽

2020 ◽

Vol 8 (5) ◽

pp. 376

Author(s):

Hyeong-Tak Lee ◽

Jeong-Seok Lee ◽

Woo-Ju Son ◽

Ik-Soon Cho

Keyword(s):

Machine Learning ◽

Learning Strategy ◽

Characteristic Curve ◽

Confusion Matrix ◽

Area Under The Curve ◽

Gradient Boosting ◽

Classification Algorithms ◽

Factors Affecting ◽

Machine Learning Classification ◽

The Republic

Ships are prone to accidents when approaching in a berthing velocity greater than that allowed when determining the risk range corresponding to a port. Therefore, this study develops a machine learning strategy to predict the risk range of an unsafe berthing velocity when the ship approaches in port. To perform analysis, the input parameters were based on the factors affecting the berthing velocity, and the output parameter, i.e., the berthing velocity, was measured at a tanker terminal in the Republic of Korea. Nine machine learning classification algorithms were used to analyze each model, and the top four optimal models were selected through evaluation methods based on the confusion matrix. As a result of the analysis, extra trees, random forest, bagging, and gradient boosting classifiers were identified as good models. As a result of testing using the receiving operator characteristic curve, it was confirmed that the area under the curve of the most dangerous range of berthing velocity was the highest, thus, the risk range was appropriately classified. As such, the derived models can classify and predict the risk range of unsafe berthing velocity before approaching a port; therefore, it is possible to safely berth a ship.

Download Full-text

A Machine Learning Framework Based on Extreme Gradient Boosting for Intelligent Alzheimer’s Disease Diagnosis Using Structure MRI

10.1007/978-3-030-75506-5_66 ◽

2021 ◽

pp. 815-827

Author(s):

Hong Ong ◽

Hoang Le ◽

Hoang Nguyen ◽

Dong Nguyen ◽

Huong Ha ◽

...

Keyword(s):

Machine Learning ◽

Alzheimer’S Disease ◽

Alzheimer's Disease ◽

Disease Diagnosis ◽

Gradient Boosting ◽

Learning Framework ◽

Extreme Gradient Boosting ◽

Alzheimer’S Disease Diagnosis

Download Full-text

Prediction of Adverse Events in Stable Non-Variceal Gastrointestinal Bleeding Using Machine Learning

Journal of Clinical Medicine ◽

10.3390/jcm9082603 ◽

2020 ◽

Vol 9 (8) ◽

pp. 2603 ◽

Cited By ~ 1

Author(s):

Dong-Woo Seo ◽

Hahn Yi ◽

Beomhee Park ◽

Youn-Jung Kim ◽

Dae Ho Jung ◽

...

Keyword(s):

Machine Learning ◽

High Risk ◽

Adverse Events ◽

Gastrointestinal Bleeding ◽

Area Under The Curve ◽

Scoring Systems ◽

Hemodynamic Instability ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Importance Analysis

Clinical risk-scoring systems are important for identifying patients with upper gastrointestinal bleeding (UGIB) who are at a high risk of hemodynamic instability. We developed an algorithm that predicts adverse events in patients with initially stable non-variceal UGIB using machine learning (ML). Using prospective observational registry, 1439 out of 3363 consecutive patients were enrolled. Primary outcomes included adverse events such as mortality, hypotension, and rebleeding within 7 days. Four machine learning algorithms, namely, logistic regression with regularization (LR), random forest classifier (RF), gradient boosting classifier (GB), and voting classifier (VC), were compared with the Glasgow–Blatchford score (GBS) and Rockall scores. The RF model showed the highest accuracies and significant improvement over conventional methods for predicting mortality (area under the curve: RF 0.917 vs. GBS 0.710), but the performance of the VC model was best in hypotension (VC 0.757 vs. GBS 0.668) and rebleeding within 7 days (VC 0.733 vs. GBS 0.694). Clinically significant variables including blood urea nitrogen, albumin, hemoglobin, platelet, prothrombin time, age, and lactate were identified by the global feature importance analysis. These results suggest that ML models will be useful early predictive tools for identifying high-risk patients with initially stable non-variceal UGIB admitted at an emergency department.

Download Full-text

SAT-LB121 Development of a Machine-Learning Method for Predicting New Onset of Diabetes Mellitus: A Retrospective Analysis of 509,153 Annual Specific Health Checkup Records

Journal of the Endocrine Society ◽

10.1210/jendso/bvaa046.2194 ◽

2020 ◽

Vol 4 (Supplement_1) ◽

Author(s):

Akihiro Nomura ◽

Sho Yamamoto ◽

Yuta Hayakawa ◽

Kouki Taniguchi ◽

Takuya Higashitani ◽

...

Keyword(s):

Diabetes Mellitus ◽

Machine Learning ◽

Prediction Model ◽

Performance Test ◽

Bootstrap Method ◽

Area Under The Curve ◽

Training Dataset ◽

Gradient Boosting ◽

Health Checkup ◽

Specific Health

Abstract Diabetes mellitus (DM) is a chronic disorder, characterized by impaired glucose metabolism. It is linked to increased risks of several diseases such as atrial fibrillation, cancer, and cardiovascular diseases. Therefore, DM prevention is essential. However, the traditional regression-based DM-onset prediction methods are incapable of investigating future DM for generally healthy individuals without DM. Employing gradient-boosting decision trees, we developed a machine learning-based prediction model to identify the DM signatures, prior to the onset of DM. We employed the nationwide annual specific health checkup records, collected during the years 2008 to 2018, from Kanazawa city, Ishikawa, Japan. The data included the physical examinations, blood and urine tests, and participant questionnaires. Individuals without DM (at baseline), who underwent more than two annual health checkups during the said period, were included. The new cases of DM onset were recorded when the participants were diagnosed with DM in the annual check-ups. The dataset was divided into three subsets in a 6:2:2 ratio to constitute the training, tuning (internal validation), and testing datasets. Employing the testing dataset, the ability of our trained prediction model to calculate the area under the curve (AUC), precision, recall, F1 score, and overall accuracy was evaluated. Using a 1,000-iteration bootstrap method, every performance test resulted in a two-sided 95% confidence interval (CI). We included 509,153 annual health checkup records of 139,225 participants. Among them, 65,505 participants without DM were included, which constituted36,303 participants in the training dataset and 13,101 participants in each of the tuning and testing datasets. We identified a total of 4,696 new DM-onset patients (7.2%) in the study period. Our trained model predicted the future incidence of DM with the AUC, precision, recall, F1 score, and overall accuracy of 0.71 (0.69-0.72 with 95% CI), 75.3% (71.6-78.8), 42.2% (39.3-45.2), 54.1% (51.2-56.7), and 94.9% (94.5-95.2), respectively. In conclusion, the machine learning-based prediction model satisfactorily identified the DM onset prior to the actual incidence.

Download Full-text

Prediction of hepatocellular carcinoma patient survival using machine learning classification rules.

Journal of Clinical Oncology ◽

10.1200/jco.2019.37.15_suppl.e15649 ◽

2019 ◽

Vol 37 (15_suppl) ◽

pp. e15649-e15649

Author(s):

Wei Zhou ◽

Huan Chen ◽

Wenbo Han ◽

Ji He ◽

Henghui Zhang

Keyword(s):

Machine Learning ◽

Hepatocellular Carcinoma ◽

Gradient Boosting ◽

Great Promise ◽

Survival Prediction ◽

Surgical Removal ◽

Features Selection ◽

Machine Learning Classification ◽

Learning Techniques ◽

Pathologic Features

e15649 Background: The outcome prediction of hepatocellular carcinoma (HCC) is conventionally determined by evaluating tissue samples obtained during surgical removal of the primary tumor focusing on their clinical and pathologic features. Recently, accumulating evidence suggests that cancer development is comprehensively modulated by the host’s immune system underlying the importance of immunological biomarkers for the prediction of HCC prognosis. However, an integrated predictive algorism incorporating clinical characteristic and immune features still remain to be established. Methods: We obtained respectable stage II HCC specimens, along with adjacent para-tumor tissues from 221 patients who underwent surgical resection at Eastern Hepatobiliary Surgery Hospital, (Shanghai, China) from 2015 through April 2018. Characteristics such as CD8+, CD163+, tumor-infiltrating lymphocytes (TILs) were obtained for further model construction used to predict the status of 3 survival indexes: Overall Survival (OS ,≤ 24 or > 24 month), Progression Free Survival (PFS, ≤ 6 or > 6 month), and Recurrence/Death (RD). Mutual information and coefficient between each feature and the survival indexes were tested to remove low scoring features after data cleaning and standardization. Furthermore, recursive features selection was preformed to obtain the optimal features combination. Finally, supervised learning techniques include either boosting or bagging strategy were used to fit and predict model with a grid-search method optimizing the parameters. Meanwhile, a cross validation procedure with 0.2 proportion of test cohort was randomly carried out for 10 times to evaluate the model. Results: We finally confirmed 15 biomarkers from the 46 candidates as features for the survival status prediction by using a 221 patients cohort. Among them, the top 10 most important biomarkers, included both clinical and immune attributes. The AUC of our model for survival indexes (OS, PFS, RD) was ranged from 0.76 (RD) to 0.8 (PFS), and the accuracy was above 0.85. Conclusions: We describe the integrative analysis of the clinical and immune features which collectively contribute to the survival index of HCC. Machine learning techniques, such as Gradient Boosting and random forest classifier , have a great promise for using in HCC cancer survival prediction.

Download Full-text

Machine Learning–Based Signal Quality Evaluation of Single-Period Radial Artery Pulse Waves: Model Development and Validation (Preprint)

10.2196/preprints.18134 ◽

2020 ◽

Author(s):

Xiaodong Ding ◽

Feng Cheng ◽

Robert Morris ◽

Cong Chen ◽

Yiqin Wang

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Radial Artery ◽

Pulse Wave ◽

Disease Diagnosis ◽

Recursive Feature Elimination ◽

Physiological Parameter ◽

Support Vector ◽

External Interference ◽

Pulse Waves

BACKGROUND The radial artery pulse wave is a widely used physiological signal for disease diagnosis and personal health monitoring because it provides insight into the overall health of the heart and blood vessels. Periodic radial artery pulse signals are subsequently decomposed into single pulse wave periods (segments) for physiological parameter evaluations. However, abnormal periods frequently arise due to external interference, the inherent imperfections of current segmentation methods, and the quality of the pulse wave signals. OBJECTIVE The objective of this paper was to develop a machine learning model to detect abnormal pulse periods in real clinical data. METHODS Various machine learning models, such as k-nearest neighbor, logistic regression, and support vector machines, were applied to classify the normal and abnormal periods in 8561 segments extracted from the radial pulse waves of 390 outpatients. The recursive feature elimination method was used to simplify the classifier. RESULTS It was found that a logistic regression model with only four input features can achieve a satisfactory result. The area under the receiver operating characteristic curve from the test set was 0.9920. In addition, these classifiers can be easily interpreted. CONCLUSIONS We expect that this model can be applied in smart sport watches and watchbands to accurately evaluate human health status.

Download Full-text

Count Vectorized Spam and Ham Discernment of Short Message Service using Machine Learning Classification

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d7287.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 557-561

Keyword(s):

Machine Learning ◽

Naive Bayes ◽

Naïve Bayes ◽

Gradient Boosting ◽

Svm Classifier ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Machine Learning Classification ◽

Tree Classifier

With the growing volume and the amount of spam message, the demand for identifying the effective method for spam detection is in claim. The growth of mobile phone and Smartphone has led to the drastic increase in the SMS spam messages. The advancement and the clean process of mobile message servicing channel have attracted the hackers to perform their hacking through SMS messages. This leads to the fraud usage of other accounts and transaction that result in the loss of service and profit to the owners. With this background, this paper focuses on predicting the Spam SMS messages. The SMS Spam Message Detection dataset from KAGGLE machine learning Repository is used for prediction analysis. The analysis of Spam message detection is achieved in four ways. Firstly, the distribution of the target variable Spam Type the dataset is identified and represented by the graphical notations. Secondly, the top word features for the Spam and Ham messages in the SMS messages is extracted using Count Vectorizer and it is displayed using spam and Ham word cloud. Thirdly, the extracted Counter vectorized feature importance SMS Spam Message detection dataset is fitted to various classifiers like KNN classifier, Random Forest classifier, Linear SVM classifier, Ada Boost classifier, Kernel SVM classifier, Logistic Regression classifier, Gaussian Naive Bayes classifier, Decision Tree classifier, Extra Tree classifier, Gradient Boosting classifier and Multinomial Naive Bayes classifier. Performance analysis is done by analyzing the performance metrics like Accuracy, FScore, Precision and Recall. The implementation is done by python in Anaconda Spyder Navigator. Experimental Results shows that the Multinomial Naive Bayes classifier have achieved the effective prediction with the precision of 0.98, recall of 0.98, FScore of 0.98 , and Accuracy of 98.20%..

Download Full-text

Machine Learning–Based Signal Quality Evaluation of Single-Period Radial Artery Pulse Waves: Model Development and Validation

JMIR Medical Informatics ◽

10.2196/18134 ◽

2020 ◽

Vol 8 (6) ◽

pp. e18134

Author(s):

Xiaodong Ding ◽

Feng Cheng ◽

Robert Morris ◽

Cong Chen ◽

Yiqin Wang

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Radial Artery ◽

Pulse Wave ◽

Disease Diagnosis ◽

Recursive Feature Elimination ◽

Physiological Parameter ◽

Support Vector ◽

External Interference ◽

Pulse Waves

Background The radial artery pulse wave is a widely used physiological signal for disease diagnosis and personal health monitoring because it provides insight into the overall health of the heart and blood vessels. Periodic radial artery pulse signals are subsequently decomposed into single pulse wave periods (segments) for physiological parameter evaluations. However, abnormal periods frequently arise due to external interference, the inherent imperfections of current segmentation methods, and the quality of the pulse wave signals. Objective The objective of this paper was to develop a machine learning model to detect abnormal pulse periods in real clinical data. Methods Various machine learning models, such as k-nearest neighbor, logistic regression, and support vector machines, were applied to classify the normal and abnormal periods in 8561 segments extracted from the radial pulse waves of 390 outpatients. The recursive feature elimination method was used to simplify the classifier. Results It was found that a logistic regression model with only four input features can achieve a satisfactory result. The area under the receiver operating characteristic curve from the test set was 0.9920. In addition, these classifiers can be easily interpreted. Conclusions We expect that this model can be applied in smart sport watches and watchbands to accurately evaluate human health status.

Download Full-text

Predicting 1-Hour Thrombolysis Effect of r-tPA in Patients With Acute Ischemic Stroke Using Machine Learning Algorithm

Frontiers in Pharmacology ◽

10.3389/fphar.2021.759782 ◽

2022 ◽

Vol 12 ◽

Author(s):

Bin Zhu ◽

Jianlei Zhao ◽

Mingnan Cao ◽

Wanliang Du ◽

Liuqing Yang ◽

...

Keyword(s):

Machine Learning ◽

Ischemic Stroke ◽

Acute Ischemic Stroke ◽

Learning Algorithm ◽

Degradation Products ◽

Early Stage ◽

Area Under The Curve ◽

Recursive Feature Elimination ◽

Gradient Boosting ◽

Data Sets

Background: Thrombolysis with r-tPA is recommended for patients after acute ischemic stroke (AIS) within 4.5 h of symptom onset. However, only a few patients benefit from this therapeutic regimen. Thus, we aimed to develop an interpretable machine learning (ML)–based model to predict the thrombolysis effect of r-tPA at the super-early stage.Methods: A total of 353 patients with AIS were divided into training and test data sets. We then used six ML algorithms and a recursive feature elimination (RFE) method to explore the relationship among the clinical variables along with the NIH stroke scale score 1 h after thrombolysis treatment. Shapley additive explanations and local interpretable model–agnostic explanation algorithms were applied to interpret the ML models and determine the importance of the selected features.Results: Altogether, 353 patients with an average age of 63.0 (56.0–71.0) years were enrolled in the study. Of these patients, 156 showed a favorable thrombolysis effect and 197 showed an unfavorable effect. A total of 14 variables were enrolled in the modeling, and 6 ML algorithms were used to predict the thrombolysis effect. After RFE screening, seven variables under the gradient boosting decision tree (GBDT) model (area under the curve = 0.81, specificity = 0.61, sensitivity = 0.9, and F1 score = 0.79) demonstrated the best performance. Of the seven variables, activated partial thromboplastin clotting time (time), B-type natriuretic peptide, and fibrin degradation products were the three most important clinical characteristics that might influence r-tPA efficiency.Conclusion: This study demonstrated that the GBDT model with the seven variables could better predict the early thrombolysis effect of r-tPA.

Download Full-text