scholarly journals Development of Machine Learning Models for Prediction of Smoking Cessation Outcome

Author(s):  
Cheng-Chien Lai ◽  
Wei-Hsin Huang ◽  
Betty Chia-Chen Chang ◽  
Lee-Ching Hwang

Predictors for success in smoking cessation have been studied, but a prediction model capable of providing a success rate for each patient attempting to quit smoking is still lacking. The aim of this study is to develop prediction models using machine learning algorithms to predict the outcome of smoking cessation. Data was acquired from patients underwent smoking cessation program at one medical center in Northern Taiwan. A total of 4875 enrollments fulfilled our inclusion criteria. Models with artificial neural network (ANN), support vector machine (SVM), random forest (RF), logistic regression (LoR), k-nearest neighbor (KNN), classification and regression tree (CART), and naïve Bayes (NB) were trained to predict the final smoking status of the patients in a six-month period. Sensitivity, specificity, accuracy, and area under receiver operating characteristic (ROC) curve (AUC or ROC value) were used to determine the performance of the models. We adopted the ANN model which reached a slightly better performance, with a sensitivity of 0.704, a specificity of 0.567, an accuracy of 0.640, and an ROC value of 0.660 (95% confidence interval (CI): 0.617–0.702) for prediction in smoking cessation outcome. A predictive model for smoking cessation was constructed. The model could aid in providing the predicted success rate for all smokers. It also had the potential to achieve personalized and precision medicine for treatment of smoking cessation.

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Susan Idicula-Thomas ◽  
Ulka Gawde ◽  
Prabhat Jha

Abstract Background Machine learning (ML) algorithms have been successfully employed for prediction of outcomes in clinical research. In this study, we have explored the application of ML-based algorithms to predict cause of death (CoD) from verbal autopsy records available through the Million Death Study (MDS). Methods From MDS, 18826 unique childhood deaths at ages 1–59 months during the time period 2004–13 were selected for generating the prediction models of which over 70% of deaths were caused by six infectious diseases (pneumonia, diarrhoeal diseases, malaria, fever of unknown origin, meningitis/encephalitis, and measles). Six popular ML-based algorithms such as support vector machine, gradient boosting modeling, C5.0, artificial neural network, k-nearest neighbor, classification and regression tree were used for building the CoD prediction models. Results SVM algorithm was the best performer with a prediction accuracy of over 0.8. The highest accuracy was found for diarrhoeal diseases (accuracy = 0.97) and the lowest was for meningitis/encephalitis (accuracy = 0.80). The top signs/symptoms for classification of these CoDs were also extracted for each of the diseases. A combination of signs/symptoms presented by the deceased individual can effectively lead to the CoD diagnosis. Conclusions Overall, this study affirms that verbal autopsy tools are efficient in CoD diagnosis and that automated classification parameters captured through ML could be added to verbal autopsies to improve classification of causes of death.


Symmetry ◽  
2020 ◽  
Vol 12 (5) ◽  
pp. 728 ◽  
Author(s):  
Lijuan Yan ◽  
Yanshen Liu

Student performance prediction has become a hot research topic. Most of the existing prediction models are built by a machine learning method. They are interested in prediction accuracy but pay less attention to interpretability. We propose a stacking ensemble model to predict and analyze student performance in academic competition. In this model, student performance is classified into two symmetrical categorical classes. To improve accuracy, three machine learning algorithms, including support vector machine (SVM), random forest, and AdaBoost are established in the first level and then integrated by logistic regression via stacking. A feature importance analysis was applied to identify important variables. The experimental data were collected from four academic years in Hankou University. According to comparative studies on five evaluation metrics (precision, recall, F1, error, and area   under   the   receiver   operating   characteristic   curve ( AUC ) in this analysis, the proposed model generally performs better than compared models. The important variables identified from the analysis are interpretable, they can be used as guidance to select potential students.


Author(s):  
Nabil Mohamed Eldakhly ◽  
Magdy Aboul-Ela ◽  
Areeg Abdalla

The particulate matter air pollutant of diameter less than 10 micrometers (PM[Formula: see text]), a category of pollutants including solid and liquid particles, can be a health hazard for several reasons: it can harm lung tissues and throat, aggravate asthma and increase respiratory illness. Accurate prediction models of PM[Formula: see text] concentrations are essential for proper management, control, and making public warning strategies. Therefore, machine learning techniques have the capability to develop methods or tools that can be used to discover unseen patterns from given data to solve a particular task or problem. The chance theory has advanced concepts pertinent to treat cases where both randomness and fuzziness play simultaneous roles at one time. The main objective is to study the modification of a single machine learning algorithm — support vector machine (SVM) — applying the chance weight of the target variable, based on the chance theory, to the corresponding dataset point to be superior to the ensemble machine learning algorithms. The results of this study are outperforming of the SVM algorithms when modifying and combining with the right theory/technique, especially the chance theory over other modern ensemble learning algorithms.


Author(s):  
Henock M. Deberneh ◽  
Intaek Kim

Prediction of type 2 diabetes (T2D) occurrence allows a person at risk to take actions that can prevent onset or delay the progression of the disease. In this study, we developed a machine learning (ML) model to predict T2D occurrence in the following year (Y + 1) using variables in the current year (Y). The dataset for this study was collected at a private medical institute as electronic health records from 2013 to 2018. To construct the prediction model, key features were first selected using ANOVA tests, chi-squared tests, and recursive feature elimination methods. The resultant features were fasting plasma glucose (FPG), HbA1c, triglycerides, BMI, gamma-GTP, age, uric acid, sex, smoking, drinking, physical activity, and family history. We then employed logistic regression, random forest, support vector machine, XGBoost, and ensemble machine learning algorithms based on these variables to predict the outcome as normal (non-diabetic), prediabetes, or diabetes. Based on the experimental results, the performance of the prediction model proved to be reasonably good at forecasting the occurrence of T2D in the Korean population. The model can provide clinicians and patients with valuable predictive information on the likelihood of developing T2D. The cross-validation (CV) results showed that the ensemble models had a superior performance to that of the single models. The CV performance of the prediction models was improved by incorporating more medical history from the dataset.


2021 ◽  
Author(s):  
Rumana Rois ◽  
Manik Ray ◽  
Atikur Rahman ◽  
Swapan. K. Roy

Abstract Background: Stress-related mental health problems are one of the most common causes of the burden in university students worldwide. Many studies have been conducted to predict the prevalence of stress among university students, however most of these analyses were predominantly performed using the basic logistic regression model. As an alternative, we used the advanced machine learning approaches for detecting significant risk factors and to predict the prevalence of stress among Bangladeshi university students.Methods: This prevalence study surveyed 355 students from twenty-eight different Bangladeshi universities using questions concerning anthropometric measurements, academic, lifestyles, and health-related information, which referred to the perceived stress status of the respondents (yes or no). Boruta algorithm was used in determining the significant prognostic factors of the prevalence of stress. Prediction models were built using decision tree (DT), random forest (RF), support vector machine (SVM), and logistic regression (LR), and their performances were evaluated using parameters of confusion matrix, ROC curves, and k-fold cross-validation techniques. Results: One-third of university students reported stress within the last 12 months. Students’ pulse rate, systolic and diastolic blood pressures, sleep status, smoking status, and academic background were selected as the important features for predicting the prevalence of stress. Evaluated performance revealed that the highest performance observed from RF (accuracy=0.8972, precision=0.9241, sensitivity=0.9250, specificity=0.8148, AUC=0.8715, k-fold accuracy=0.8983) and the lowest from LR (accuracy=0.7476, precision=0.8354, sensitivity=0.8250, specificity=0.5185, AUC=0.7822, k-fold accuracy=07713) and SVM with polynomial kernel of degree 2 (accuracy=0.7570, precision=0.7975, sensitivity=0.8630, specificity=0.5294, AUC=0.7717, k-fold accuracy=0.7855). The RF model perfectly predicted stress including individual and interaction effects of predictors.Conclusion: The machine learning framework can be detected the significant prognostic factors and predicted this psychological problem more accurately, thereby helping the policy-makers, stakeholders, and families to understand and prevent this serious crisis by improving policy-making strategies, mental health promotion, and establishing effective university counseling services.


2021 ◽  
Author(s):  
Nuno Moniz ◽  
Susana Barbosa

<p>The Dansgaard-Oeschger (DO) events are one of the most striking examples of abrupt climate change in the Earth's history, representing temperature oscillations of about 8 to 16 degrees Celsius within a few decades. DO events have been studied extensively in paleoclimatic records, particularly in ice core proxies. Examples include the Greenland NGRIP record of oxygen isotopic composition.<br>This work addresses the anticipation of DO events using machine learning algorithms. We consider the NGRIP time series from 20 to 60 kyr b2k with the GICC05 timescale and 20-year temporal resolution. Forecasting horizons range from 0 (nowcasting) to 400 years. We adopt three different machine learning algorithms (random forests, support vector machines, and logistic regression) in training windows of 5 kyr. We perform validation on subsequent test windows of 5 kyr, based on timestamps of previous DO events' classification in Greenland by Rasmussen et al. (2014). We perform experiments with both sliding and growing windows.<br>Results show that predictions on sliding windows are better overall, indicating that modelling is affected by non-stationary characteristics of the time series. The three algorithms' predictive performance is similar, with a slightly better performance of random forest models for shorter forecast horizons. The prediction models' predictive capability decreases as the forecasting horizon grows more extensive but remains reasonable up to 120 years. Model performance deprecation is mostly related to imprecision in accurately determining the start and end time of events and identifying some periods as DO events when such is not valid.</p>


2018 ◽  
Vol 2018 ◽  
pp. 1-12 ◽  
Author(s):  
Wenlong Jing ◽  
Xia Zhou ◽  
Chen Zhang ◽  
Chongyang Wang ◽  
Hao Jiang

Hyperspectral sensors provide detailed information for dust retention content (DRC) estimation. However, rich hyperspectral data are not fully utilized by traditional image analysis techniques. We integrated several recently developed machine learning algorithms to estimate DRC on plant leaves using the spectra measured by the ASD FieldSpec 3. The experiments were carried out on three common green plants of southern China. The important hyperspectral variables were first identified by applying the random forest (RF) algorithm. Three estimation models were then developed using the support vector machine (SVM), classification and regression tree (CART), and RF algorithms. The results showed that the increase in dust retention contents on plant leaves enhanced their reflectance in the visible wavelength but weakened their reflectance in the infrared wavelength. Wavelengths in the ranges of 450–500 nm, 550–600 nm, 750–1000 nm, and 1100–1300 nm were identified as important variables using the RF algorithm and were used to estimate the DRC. The comparison of the three machine learning techniques for DRC estimation confirmed that the SVM and RF models performed well because their estimations were similar to the measured DRC. Specifically, the average R2 for SVM and RF model are 0.85 and 0.88. The technical approach of this study proved to be a successful illustration of using hyperspectral measurements to estimate the DRC on plant leaves. The findings of this study can be applied to monitor the DRC on leaves of other plants and can also be integrated with other types of spectral data to measure the DRC at a regional scale.


2021 ◽  
Vol 309 ◽  
pp. 01042
Author(s):  
L. Chandrika ◽  
K. Madhavi ◽  
B. Sindhuja ◽  
M. Arshi

Prediction of a cardiovascular diseases has always a tedious challenge for doctors and medical practitioners. Most of the practitioners and hospital staff offers expensive medication, care and surgeries to treat the cardiovascular patients. At early-stage of prediction of heart-oriented problems will be giving a chance of survival by taking necessary precautions. Over the years there are different types of methodologies were proposed to predict the cardiovascular diseases one of the best methodologies is a Machine learning approach. These years many scientific advancements take place in the Artificial Intelligence, Machine learning, and Deep learning which gives an extra push up to help and implement the path in the field of medical image processing and medical data analysis. By using the enormous dataset from various medical experts used to help the researchers to predict the coronary problems prior to happening. Many researchers have tried and implemented different machine learning algorithms to automate the prediction analysis using the enormous number of datasets. There are numerous algorithms and procedures to predict the cardiovascular diseases and accessible to be specific Classification methods including Artificial Neural Networks (AI), Decision tree (DT), Support vector machine (SVM), Genetic algorithm (GA), Neural network (NN), Naive Bayes (NB) and Clustering algorithms like K-NN. A few examinations have been done for creating expectation models utilizing singular procedures and additionally concatenating at least two strategies. This paper gives a speedy and simple survey and knowledge of approachable prediction models using different researchers work from 2004 to 2019. The examination indicates the precision of individual experiments done by various researchers.


2021 ◽  
Vol 40 (1) ◽  
Author(s):  
Rumana Rois ◽  
Manik Ray ◽  
Atikur Rahman ◽  
Swapan K. Roy

Abstract Background Stress-related mental health problems are one of the most common causes of the burden in university students worldwide. Many studies have been conducted to predict the prevalence of stress among university students, however most of these analyses were predominantly performed using the basic logistic regression (LR) model. As an alternative, we used the advanced machine learning (ML) approaches for detecting significant risk factors and to predict the prevalence of stress among Bangladeshi university students. Methods This prevalence study surveyed 355 students from twenty-eight different Bangladeshi universities using questions concerning anthropometric measurements, academic, lifestyles, and health-related information, which referred to the perceived stress status of the respondents (yes or no). Boruta algorithm was used in determining the significant prognostic factors of the prevalence of stress. Prediction models were built using decision tree (DT), random forest (RF), support vector machine (SVM), and LR, and their performances were evaluated using parameters of confusion matrix, receiver operating characteristics (ROC) curves, and k-fold cross-validation techniques. Results One-third of university students reported stress within the last 12 months. Students’ pulse rate, systolic and diastolic blood pressures, sleep status, smoking status, and academic background were selected as the important features for predicting the prevalence of stress. Evaluated performance revealed that the highest performance observed from RF (accuracy = 0.8972, precision = 0.9241, sensitivity = 0.9250, specificity = 0.8148, area under the ROC curve (AUC) = 0.8715, k-fold accuracy = 0.8983) and the lowest from LR (accuracy = 0.7476, precision = 0.8354, sensitivity = 0.8250, specificity = 0.5185, AUC = 0.7822, k-fold accuracy = 07713) and SVM with polynomial kernel of degree 2 (accuracy = 0.7570, precision = 0.7975, sensitivity = 0.8630, specificity = 0.5294, AUC = 0.7717, k-fold accuracy = 0.7855). Overall, the RF model performs better and authentically predicted stress compared with other ML techniques, including individual and interaction effects of predictors. Conclusion The machine learning framework can be detected the significant prognostic factors and predicted this psychological problem more accurately, thereby helping the policy-makers, stakeholders, and families to understand and prevent this serious crisis by improving policy-making strategies, mental health promotion, and establishing effective university counseling services.


2021 ◽  
Vol 18 (4(Suppl.)) ◽  
pp. 1406
Author(s):  
Fadratul Hafinaz Hassan ◽  
Mohd Adib Omar

Recurrent strokes can be devastating, often resulting in severe disability or death. However, nearly 90% of the causes of recurrent stroke are modifiable, which means recurrent strokes can be averted by controlling risk factors, which are mainly behavioral and metabolic in nature. Thus, it shows that from the previous works that recurrent stroke prediction model could help in minimizing the possibility of getting recurrent stroke. Previous works have shown promising results in predicting first-time stroke cases with machine learning approaches. However, there are limited works on recurrent stroke prediction using machine learning methods. Hence, this work is proposed to perform an empirical analysis and to investigate machine learning algorithms implementation in the recurrent stroke prediction models. This research aims to investigate and compare the performance of machine learning algorithms using recurrent stroke clinical public datasets. In this study, Artificial Neural Network (ANN), Support Vector Machine (SVM) and Bayesian Rule List (BRL) are used and compared their performance in the domain of recurrent stroke prediction model. The result of the empirical experiments shows that ANN scores the highest accuracy at 80.00%, follows by BRL with 75.91% and SVM with 60.45%.


Sign in / Sign up

Export Citation Format

Share Document