Machine Learning-Based Nicotine Addiction Prediction Models for Youth E-Cigarette and Waterpipe (Hookah) Users

2021 ◽  
Vol 10 (5) ◽  
pp. 972
Author(s):  
Jeeyae Choi ◽  
Hee-Tae Jung ◽  
Anastasiya Ferrell ◽  
Seoyoon Woo ◽  
Linda Haddad

Despite the harmful effect on health, e-cigarette and hookah smoking in youth in the U.S. has increased. Developing tailored e-cigarette and hookah cessation programs for youth is imperative. The aim of this study was to identify predictor variables such as social, mental, and environmental determinants that cause nicotine addiction in youth e-cigarette or hookah users and build nicotine addiction prediction models using machine learning algorithms. A total of 6511 participants were identified as ever having used e-cigarettes or hookah from the National Youth Tobacco Survey (2019) datasets. Prediction models were built by Random Forest with ReliefF and Least Absolute Shrinkage and Selection Operator (LASSO). ReliefF identified important predictor variables, and the Davies–Bouldin clustering evaluation index selected the optimal number of predictors for Random Forest. A total of 193 predictor variables were included in the final analysis. Performance of prediction models was measured by Root Mean Square Error (RMSE) and Confusion Matrix. The results suggested high performance of prediction. Identified predictor variables were aligned with previous research. The novel predictors found, such as ‘witnessed e-cigarette use in their household’ and ‘perception of their tobacco use’, could be used in public awareness or targeted e-cigarette and hookah youth education and for policymakers.
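
As an illustration of the confusion matrix named above as a performance measure, the following is a minimal sketch of how one could be tallied for a binary outcome. The function name `confusion_matrix`, the 0/1 label coding, and the toy labels are assumptions made for this example; they are not taken from the study.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs for a binary classifier.

    Labels are assumed to be coded 1 (positive) and 0 (negative).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter(zip(actual, predicted))
    return {
        "tp": counts[(1, 1)],  # positive, predicted positive
        "fn": counts[(1, 0)],  # positive, predicted negative
        "fp": counts[(0, 1)],  # negative, predicted positive
        "tn": counts[(0, 0)],  # negative, predicted negative
    }


# Toy labels invented for illustration only (not study data).
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
cm = confusion_matrix(actual, predicted)
```

The four counts are the raw material for rates such as accuracy, sensitivity, and specificity.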

2018 ◽  
Author(s):  
Liyan Pan ◽  
Guangjian Liu ◽  
Xiaojian Mao ◽  
Huixian Li ◽  
Jiexin Zhang ◽  
...  

BACKGROUND Central precocious puberty (CPP) in girls seriously affects their physical and mental development in childhood. The method of diagnosis—gonadotropin-releasing hormone (GnRH)–stimulation test or GnRH analogue (GnRHa)–stimulation test—is expensive and makes patients uncomfortable due to the need for repeated blood sampling. OBJECTIVE We aimed to combine multiple CPP–related features and construct machine learning models to predict response to the GnRHa-stimulation test. METHODS In this retrospective study, we analyzed clinical and laboratory data of 1757 girls who underwent a GnRHa test in order to develop XGBoost and random forest classifiers for prediction of response to the GnRHa test. The local interpretable model-agnostic explanations (LIME) algorithm was used with the black-box classifiers to increase their interpretability. We measured sensitivity, specificity, and area under receiver operating characteristic (AUC) of the models. RESULTS Both the XGBoost and random forest models achieved good performance in distinguishing between positive and negative responses, with the AUC ranging from 0.88 to 0.90, sensitivity ranging from 77.91% to 77.94%, and specificity ranging from 84.32% to 87.66%. Basal serum luteinizing hormone, follicle-stimulating hormone, and insulin-like growth factor-I levels were found to be the three most important factors. In the interpretable models of LIME, the abovementioned variables made high contributions to the prediction probability. CONCLUSIONS The prediction models we developed can help diagnose CPP and may be used as a prescreening tool before the GnRHa-stimulation test.
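
For readers less familiar with the reported metrics, here is a small sketch of how sensitivity and specificity follow from confusion-matrix counts. The function names and the counts are hypothetical and chosen only to make the arithmetic easy to check; they are not the study's data.

```python
def sensitivity(tp, fn):
    """Share of truly positive cases that the model flags as positive."""
    if tp + fn == 0:
        raise ValueError("no positive cases to evaluate")
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of truly negative cases that the model flags as negative."""
    if tn + fp == 0:
        raise ValueError("no negative cases to evaluate")
    return tn / (tn + fp)


# Hypothetical counts for illustration only.
sens = sensitivity(tp=78, fn=22)  # 78 of 100 positives detected
spec = specificity(tn=85, fp=15)  # 85 of 100 negatives correctly cleared
```

A prescreening tool typically trades these two rates against each other by moving the decision threshold on the predicted probability.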


2020 ◽  
Vol 12 (15) ◽  
pp. 5972
Author(s):  
Nicholas Fiorentini ◽  
Massimo Losa

Screening procedures in road blackspot detection are essential tools for road authorities for quickly gathering insights on the safety level of each road site they manage. This paper suggests a road blackspot screening procedure for two-lane rural roads, relying on five different machine learning algorithms (MLAs) and real long-term traffic data. The network analyzed is the one managed by the Tuscany Region Road Administration, mainly composed of two-lane rural roads. A total of 995 road sites, where at least one accident occurred in 2012–2016, have been labeled as “Accident Case”. Accordingly, an equal number of sites where no accident occurred in the same period have been randomly selected and labeled as “Non-Accident Case”. Five different MLAs, namely Logistic Regression, Classification and Regression Tree, Random Forest, K-Nearest Neighbor, and Naïve Bayes, have been trained and validated. The output response of the MLAs, i.e., crash occurrence susceptibility, is a binary categorical variable. Therefore, such algorithms aim to classify a road site as likely safe (“Non-Accident Case”) or potentially susceptible to an accident occurrence (“Accident Case”) over five years. Finally, algorithms have been compared by a set of performance metrics, including precision, recall, F1-score, overall accuracy, confusion matrix, and the Area Under the Receiver Operating Characteristic curve. Outcomes show that the Random Forest outperforms the other MLAs with an overall accuracy of 73.53%. Furthermore, none of the MLAs shows overfitting issues. Road authorities could consider MLAs to draw up a priority list of on-site inspections and maintenance interventions.
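
The balanced labeling step described above (every accident site plus an equal number of randomly chosen accident-free sites) could be sketched roughly as follows. The function name `balanced_sample`, the integer site IDs, the network size of 5000, and the fixed seed are all assumptions for illustration; the abstract does not say how the selection was implemented.

```python
import random


def balanced_sample(accident_sites, candidate_sites, seed=0):
    """Pair each accident site with one randomly drawn accident-free site."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    accident_set = set(accident_sites)
    pool = [s for s in candidate_sites if s not in accident_set]
    if len(pool) < len(accident_sites):
        raise ValueError("not enough accident-free sites to balance the classes")
    negatives = rng.sample(pool, k=len(accident_sites))
    labeled = [(s, "Accident Case") for s in accident_sites]
    labeled += [(s, "Non-Accident Case") for s in negatives]
    rng.shuffle(labeled)  # avoid any ordering by class
    return labeled


# Hypothetical site IDs: 995 accident sites in an assumed network of 5000 sites.
accident_sites = list(range(995))
all_sites = list(range(5000))
dataset = balanced_sample(accident_sites, all_sites)
```

Balancing the classes this way means that overall accuracy is measured against a 50% baseline rather than against the rarity of accident sites.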


Author(s):  
You-Hyun Park ◽  
Sung-Hwa Kim ◽  
Yoon-Young Choi

In this study, we developed machine learning-based prediction models for early childhood caries and compared their performances with the traditional regression model. We analyzed the data of 4195 children aged 1–5 years from the Korea National Health and Nutrition Examination Survey data (2007–2018). Moreover, we developed prediction models using the XGBoost (version 1.3.1), random forest, and LightGBM (version 3.1.1) algorithms in addition to logistic regression. Two different methods were applied for variable selection, including a regression-based backward elimination and a random forest-based permutation importance classifier. We compared the area under the receiver operating characteristic (AUROC) values and misclassification rates of the different models and observed that all four prediction models had AUROC values ranging between 0.774 and 0.785. Furthermore, no significant difference was observed between the AUROC values of the four models. Based on the results, we can confirm that both traditional logistic regression and ML-based models can show favorable performance and can be used to predict early childhood caries, identify ECC high-risk groups, and implement active preventive treatments. However, further research is essential to improving the performance of the prediction model using recent methods, such as deep learning.
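
To make the AUROC comparison concrete, the sketch below computes AUROC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. This pairwise formulation is a standard equivalent of the area under the ROC curve, but it is not necessarily how the authors computed it; the function name and toy scores are invented for the example.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy labels and predicted risks, invented for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
auc = auroc(labels, scores)  # 8 of 9 positive/negative pairs are ordered correctly
```

The double loop is quadratic in the number of cases, which is fine for a sketch but would normally be replaced by a rank-based computation on large samples.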


2021 ◽  
Vol 12 (10) ◽  
pp. 7488-7496
Author(s):  
Yusuf Aliyu Adamu et al.

Measures have been taken to protect individuals from the burden of vector-borne disease, yet it still causes more deaths than any other disease in Africa. Despite these efforts, many lives are lost, particularly among children under five. The effect of malaria is most challenging in developing countries. In 2019, 51% of malaria fatalities occurred in Africa, and this increased by 20% in 2020 due to the COVID-19 pandemic. The majority of African countries lack a sound health care system and proper environmental settlement, and they face economic hardship, limited funding in the health sector, and an absence of good policies to ensure the safety of individuals. Information on the effects of malaria has to be made available to the public through awareness programs, so that people become acquainted with the disease and preventive measures can be maintained. A prediction model can help policymakers anticipate when malaria is likely to occur based on existing features, so that people are informed about the disease in time and the government can, through its policy, make health equipment and medication available. In this research, weather conditions, non-climatic features, and malaria cases are considered in designing the prediction model. The performance of six machine learning classifiers (Support Vector Machine, K-Nearest Neighbour, Random Forest, Decision Tree, Logistic Regression, and Naïve Bayes) is compared, and Random Forest is found to be the best, with an accuracy of 97.72%, an AUC of 98%, and a precision of 100% on the data set used in the analysis.


Author(s):  
Nicolás Amigo ◽  
Alvaro Valencia ◽  
Wei Wu ◽  
Sourav Patnaik ◽  
Ender Finol

Morphological characterization and fluid dynamics simulations were carried out to classify the rupture status of 71 (36 unruptured, 35 ruptured) patient-specific cerebral aneurysms using a machine learning approach together with statistical techniques. Eleven morphological and six hemodynamic parameters were evaluated individually and collectively for significance as rupture status predictors. The performance of each parameter was inspected using hypothesis testing, accuracy, confusion matrix, and the area under the receiver operating characteristic curve. Overall, the size ratio exhibited the best performance, followed by the diastolic wall shear stress and the systolic wall shear stress. The prediction capability of all 17 parameters together was evaluated using eight different machine learning algorithms. The logistic regression achieved the highest accuracy (0.75), whereas the random forest had the highest area under curve value among all the classifiers (0.82), surpassing the performance exhibited by the size ratio. Hence, we propose the random forest model as a tool that can help improve the rupture status prediction of cerebral aneurysms.


2019 ◽  
Author(s):  
Ruilin Li ◽  
Xinyin Han ◽  
Liping Sun ◽  
Yannan Feng ◽  
Xiaolin Sun ◽  
...  

Precisely predicting the required pre-surgery blood volume (PBV) in surgical patients is a formidable challenge in China. Inaccurate estimation is associated with excessive costs, postponed surgeries, and adverse outcomes after surgery due to insufficient supply or inventory. This study aimed to predict the required PBV using machine learning techniques. A total of 181,027 medical documents covering 6 years were cleaned, yielding 92,057 blood transfusion records. The blood transfusion and surgery-related factors of perioperative patients, surgeons' experience volumes, and the actual volumes of transfused RBCs were extracted. Six machine learning algorithms were used to build prediction models. Surgical patients who received allogenic RBCs or no transfusion, had a total volume of less than 10 units, or had their latest pre-surgery laboratory examinations within 7 days were included, providing 118,823 data points. Thirty-nine predictive factors related to RBC transfusion were identified. The random forest model was selected to predict the required PBV of RBCs with 72.9% accuracy, improving accuracy by 30.4% compared with surgeons' experience, with 90% of the data used for training. We tested and demonstrated that both the data-driven models and the random forest model achieved higher accuracy than surgeons' experience. Furthermore, we developed a computational tool, PTRBC, to precisely estimate the required PBV in surgical patients, and we believe this tool will find further applications in assisting clinicians' decisions beyond accurate prediction of pre-surgery blood requirements.


2021 ◽  
pp. 60-66
Author(s):  
Sarun Duangsuwan ◽  
Myo Myint Maw

Path loss models for unmanned aerial vehicle (UAV) and Internet of Things (IoT) air-to-ground communication systems were compared for rural precision farming. Because the propagation channel in a rural precision farming environment is uncertain, path loss prediction was compared across conventional particle swarm optimization (PSO) algorithms, namely PSO (exponential, or Exp) and PSO (polynomial, or Poly), and machine learning algorithms, namely k-nearest neighbor (k-NN) and random forest, which were used to fit path loss models on the basis of the measured dataset. An empirical model for rural precision farming was also considered. Evaluated by the coefficient of determination (R-squared, R2) and root mean squared error (RMSE), the machine learning-based algorithms were more accurate and precise than the conventional PSO algorithms. According to the results, the random forest method outperformed the other methods and had the smallest prediction errors.
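
As a reminder of what the coefficient of determination measures, the sketch below computes R-squared as one minus the ratio of the residual sum of squares to the total sum of squares. The function name and the path loss values (in dB) are made up for illustration and are not measurements from the paper.

```python
def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    if ss_tot == 0:
        raise ValueError("measured values are constant; R-squared is undefined")
    return 1 - ss_res / ss_tot


# Invented path loss values in dB, for illustration only.
measured = [100.0, 110.0, 120.0, 130.0]
predicted = [102.0, 108.0, 121.0, 129.0]
r2 = r_squared(measured, predicted)  # SS_res = 10, SS_tot = 500
```

A value near 1 means the model explains most of the variance in the measured path loss, which complements RMSE as an absolute error measure.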


2018 ◽  
Vol 2018 ◽  
pp. 1-9 ◽  
Author(s):  
Yan Zhang ◽  
Jinxiao Wen ◽  
Guanshu Yang ◽  
Zunwen He ◽  
Xinran Luo

Recently, unmanned aerial vehicles (UAVs) have come to play an important role in many applications because of their high flexibility and low cost. To realize reliable UAV communications, a fundamental task is to investigate the propagation characteristics of the channels. In this paper, we propose path loss models for the UAV air-to-air (AA) scenario based on machine learning. Ray-tracing software is employed to generate samples for multiple routes in a typical urban environment, and different altitudes of Tx and Rx UAVs are taken into consideration. Two machine-learning algorithms, Random Forest and KNN, are exploited to build prediction models on the basis of the training data. The prediction performance of the trained models is assessed on the test set according to metrics including the mean absolute error (MAE) and root mean square error (RMSE). Meanwhile, two empirical models are presented for comparison. It is shown that the machine-learning-based models are able to provide high prediction accuracy and acceptable computational efficiency in the AA scenario. Moreover, Random Forest outperforms the other models and has the smallest prediction errors. Further investigation is made to evaluate the impacts of five different parameters on the path loss. It is demonstrated that the path visibility is crucial for the path loss.
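
The workflow described above (fit on training samples, predict on a test set, score with MAE and RMSE) can be sketched with a plain k-nearest-neighbor regressor. This is an illustrative toy, not the authors' implementation: the single distance feature, the value of k, and all numbers are assumptions invented for the example.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points nearest to query."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Invented samples: one feature (link distance, m) and path loss (dB).
train_x = [(10.0,), (20.0,), (30.0,), (40.0,)]
train_y = [60.0, 66.0, 70.0, 72.0]
test_x = [(12.0,), (38.0,)]
test_y = [62.0, 72.0]

preds = [knn_predict(train_x, train_y, q, k=2) for q in test_x]
test_mae = mae(test_y, preds)
test_rmse = rmse(test_y, preds)
```

With several features on different scales (for example distance and altitude), the inputs would normally be standardized first, since k-NN relies on raw Euclidean distance.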


2020 ◽  
Vol 13 (1) ◽  
pp. 10
Author(s):  
Andrea Sulova ◽  
Jamal Jokar Arsanjani

Recent studies have suggested that, due to climate change, the number of wildfires across the globe has been increasing and will continue to grow. The recent massive wildfires, which hit Australia during the 2019–2020 summer season, raised questions about the extent to which the risk of wildfires can be linked to various climate, environmental, topographical, and social factors and how to predict fire occurrences in order to take preventive measures. Hence, the main objective of this study was to develop an automated, cloud-based workflow for generating a training dataset of fire events at a continental level using freely available remote sensing data with a reasonable computational expense for feeding into machine learning models. As a result, a data-driven model was set up in the Google Earth Engine platform, which is publicly accessible and open for further adjustments. The training dataset was applied to different machine learning algorithms, i.e., Random Forest, Naïve Bayes, and Classification and Regression Tree. The findings show that Random Forest outperformed the other algorithms, and hence it was used further to explore the driving factors using variable importance analysis. The study indicates the probability of fire occurrences across Australia and identifies the potential driving factors of Australian wildfires for the 2019–2020 summer season. The methodological approach, results, and conclusions can be of great importance to policymakers, environmentalists, and climate change researchers, among others.

