Comparing Performance of Ensemble-Based Machine Learning Algorithms to Identify Potential Obesity Risk Factors from Public Health Datasets

Author(s):  
Ayan Chatterjee ◽  
Martin W. Gerdes ◽  
Andreas Prinz ◽  
Santiago G. Martinez
2021 ◽  
Vol 9 ◽  
Author(s):  
Huanhuan Zhao ◽  
Xiaoyu Zhang ◽  
Yang Xu ◽  
Lisheng Gao ◽  
Zuchang Ma ◽  
...  

Hypertension is a widespread chronic disease, and predicting an individual's risk of hypertension supports its early prevention and management. Such an intervention requires a risk prediction model that is both effective and easy to implement. This study evaluated and compared the performance of four machine learning algorithms in predicting the risk of hypertension from easy-to-collect risk factors. A dataset of 29,700 samples collected through physical examination was used for model training and testing. First, we identified easy-to-collect risk factors of hypertension through univariate logistic regression analysis. Then, based on the selected features, 10-fold cross-validation on the training set was used to find the best hyper-parameters for four models: random forest (RF), CatBoost, MLP neural network and logistic regression (LR). Finally, the performance of the models was evaluated on the test set by AUC, accuracy, sensitivity and specificity. The RF model outperformed the other three models, achieving an AUC of 0.92, an accuracy of 0.82, a sensitivity of 0.83 and a specificity of 0.81. In addition, body mass index (BMI), age, family history and waist circumference (WC) were the four primary risk factors of hypertension. These findings indicate that it is feasible to use machine learning algorithms, especially RF, to predict hypertension risk without clinical or genetic data. The technique can provide a non-invasive and economical way to prevent and manage hypertension in a large population.
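The threshold-based metrics named in the abstract above (accuracy, sensitivity and specificity) can be illustrated with a minimal sketch. The function name `classification_metrics` and the toy labels are assumptions made for illustration only; they are not the study's code or data.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Return NaN when a class is absent, rather than dividing by zero.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


# Hypothetical toy labels, not taken from the study's dataset.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Sensitivity and specificity are reported alongside accuracy because accuracy alone can look good on an imbalanced dataset even when one class is poorly detected.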


2013 ◽  
Vol 99 (3) ◽  
pp. S4 ◽  
Author(s):  
Joseph Lee ◽  
Jennifer Cohen ◽  
Hrishikesh Karvir ◽  
Piraye Yurttas Beim ◽  
Jason Barritt ◽  
...  

2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Bum-Joo Cho ◽  
Kyoung Min Kim ◽  
Sanchir-Erdene Bilegsaikhan ◽  
Yong Joon Suh

Febrile neutropenia (FN) is one of the most concerning complications of chemotherapy, and its prediction remains difficult. This study aimed to identify the risk factors for FN and to build prediction models of FN using machine learning algorithms. Medical records of hospitalized patients who underwent chemotherapy after surgery for breast cancer between May 2002 and September 2018 were selectively reviewed for model development. Demographic, clinical, pathological, and therapeutic data were analyzed to identify risk factors for FN. Using machine learning algorithms, prediction models were developed and their performance was evaluated. Of 933 selected inpatients with a mean age of 51.8 ± 10.7 years, FN developed in 409 (43.8%) patients. FN incidence differed significantly according to age, staging, taxane-based regimen, and blood count 5 days after chemotherapy. A logistic regression model built on these findings achieved an area under the curve (AUC) of 0.870, and machine learning improved the AUC to 0.908. Machine learning improves the prediction of FN in patients undergoing chemotherapy for breast cancer compared to the conventional statistical model. In these high-risk patients, primary prophylaxis with granulocyte colony-stimulating factor could be considered.
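As a rough illustration of the AUC figures quoted above, the area under the ROC curve can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below assumes this pairwise definition; the function name `roc_auc` and the toy scores are hypothetical and unrelated to the study's models or data.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks for six patients (1 = event occurred).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(y_true, scores)
print(round(auc, 3))
```

The double loop is quadratic in the number of cases, which is fine for an illustration; a rank-based formulation is the usual choice for large datasets.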


2020 ◽  
Vol 134 ◽  
pp. e325-e338 ◽  
Author(s):  
Farrokh Farrokhi ◽  
Quinlan D. Buchlak ◽  
Matt Sikora ◽  
Nazanin Esmaili ◽  
Maria Marsans ◽  
...  

Author(s):  
Junggu Choi ◽  
Seoyoung Cho ◽  
Inhwan Ko ◽  
Sanghoon Han

Investigating suicide risk factors is critical for socioeconomic and public health, and many researchers have tried to identify factors associated with suicide. In this study, the risk factors for suicidal ideation and attempt were compared, and the contributions of different factors to each were investigated. To reflect the diverse characteristics of the population, the large-scale, longitudinal dataset used in this study included both socioeconomic and clinical variables collected from the Korean public. Three machine learning algorithms (XGBoost classifier, support vector classifier, and logistic regression) were used to detect the risk factors for both suicidal ideation and attempt. The importance of the variables was determined using the model with the best classification performance. In addition, a novel risk-factor score, calculated from the rank and importance scores of each variable, was proposed. Socioeconomic and sociodemographic factors showed a high correlation with risks for both ideation and attempt. Mental health variables ranked higher than other factors in suicidal attempts, posing a relatively higher suicide risk than ideation. These trends were further validated using the conditions from the integrated and yearly dataset. This study provides novel insights into risk factors for suicidal ideation and attempts.

