Machine learning models of tobacco susceptibility and current use among adolescents from 97 countries in the Global Youth Tobacco Survey, 2013-2017

2021 ◽  
Vol 1 (12) ◽  
pp. e0000060
Author(s):  
Nayoung Kim ◽  
Wei-Yin Loh ◽  
Danielle E. McCarthy

Adolescents are particularly vulnerable to tobacco initiation and escalation. Identifying factors associated with adolescent tobacco susceptibility and use can guide tobacco prevention efforts. Novel machine learning (ML) approaches efficiently identify interactive relations among tobacco risk factors and flag high-risk subpopulations that may benefit from targeted prevention interventions. Nationally representative cross-sectional 2013–2017 Global Youth Tobacco Survey (GYTS) data from 97 countries (28 high-income and 69 low- and middle-income countries) and 342,481 adolescents aged 13–15 years (weighted N = 52,817,455) were analyzed using ML regression tree models, accounting for sampling weights. Predictors included demographics (sex, age), geography (region, country income), and self-reported exposure to tobacco marketing, secondhand smoke, and tobacco control policies. Overall, 11.9% (95% CI 11.1%–12.6%) of tobacco-naïve adolescents were susceptible to tobacco use, and 11.7% (11.0%–12.5%) of adolescents reported using any tobacco product (cigarettes, other smoked tobacco, smokeless tobacco) in the past 30 days. Regression tree models found that exposure or receptivity to tobacco industry promotions and secondhand smoke exposure predicted increased risks of susceptibility and use, while support for smoke-free air policies predicted decreased risks of both. Anti-tobacco school education and health warning messages on product packs predicted reduced susceptibility or use, but these protective effects were not evident across all adolescent subgroups. Sex, region, and country income moderated the effects of tobacco promotion and control factors on susceptibility or use, with higher rates of susceptibility and use among males and in high-income countries, in Africa and the Americas (susceptibility), and in Europe and Southeast Asia (use).
Tobacco policy-related factors robustly predicted both tobacco susceptibility and use among adolescents globally, and interacted with adolescent characteristics and other environmental factors in complex ways that stratified adolescents by tobacco risk. These findings emphasize the value of efficient ML modeling of interactions in tobacco risk prediction and suggest a role for prevention strategies targeted at high-risk adolescents.
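The survey-weighted regression tree analysis described above can be sketched with scikit-learn's decision trees, which accept per-observation sampling weights. Everything below (predictors, effect sizes, weights) is a synthetic stand-in, not the GYTS data or the authors' software.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 2000
# Hypothetical binary predictors: tobacco-promotion exposure, secondhand
# smoke exposure, support for smoke-free policies; plus age (13-15 years).
X = np.column_stack([
    rng.integers(0, 2, n),
    rng.integers(0, 2, n),
    rng.integers(0, 2, n),
    rng.integers(13, 16, n),
])
# Assumed effects: exposures raise tobacco-use risk, policy support lowers it.
p = 0.05 + 0.10 * X[:, 0] + 0.08 * X[:, 1] - 0.04 * X[:, 2]
y = rng.random(n) < p
weights = rng.uniform(0.5, 2.0, n)  # stand-in for survey sampling weights

# sample_weight feeds the weights into every split-impurity calculation,
# which is how sampling weights enter a tree-based survey analysis.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=100, random_state=0)
tree.fit(X, y, sample_weight=weights)
```

Each leaf of the fitted tree then corresponds to an adolescent subgroup with its own weighted risk estimate, which is how tree models surface interactions between exposures and demographics.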

2021 ◽  
Author(s):  
Damià Valero-Bover ◽  
Pedro González ◽  
Gerard Carot-Sans ◽  
Isaac Cano ◽  
Pilar Saura ◽  
...  

Objective: To develop and validate an algorithm for predicting non-attendance at outpatient appointments. Results: We developed two decision tree models for the dermatology and pneumology services (trained with 33,329 and 21,050 appointments, respectively). Prospective validation showed a specificity of 78.34% (95% CI 71.07–84.51) with a balanced accuracy of 70.45% for dermatology, and a specificity of 69.83% (95% CI 60.61–78.00) with a balanced accuracy of 65.53% for pneumology. When the algorithm was used to identify patients at high risk of non-attendance in the context of a phone-call reminder program, the non-attendance rate decreased by 50.61% (P<.001) in dermatology and by 39.33% (P=.048) in pneumology. Conclusions: A machine learning model can effectively identify patients at high risk of non-attendance from information stored in electronic medical records. Using this model to prioritize phone-call reminders to patients at high risk of non-attendance significantly reduced the non-attendance rate.
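The specificity and balanced-accuracy metrics reported above can be computed directly from a binary confusion matrix; a minimal helper (not the authors' code) with "no-show" coded as the positive class:

```python
import numpy as np

def specificity_and_balanced_accuracy(y_true, y_pred):
    """Return (specificity, balanced accuracy) for binary attendance labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tn = np.sum(~y_true & ~y_pred)   # attended, predicted to attend
    fp = np.sum(~y_true & y_pred)    # attended, flagged as no-show
    tp = np.sum(y_true & y_pred)     # no-show, flagged as no-show
    fn = np.sum(y_true & ~y_pred)    # no-show, predicted to attend
    specificity = tn / (tn + fp)
    sensitivity = tp / (tp + fn)
    return specificity, (specificity + sensitivity) / 2

spec, bal = specificity_and_balanced_accuracy([1, 1, 0, 0], [1, 0, 0, 1])
print(spec, bal)  # 0.5 0.5
```

Balanced accuracy averages sensitivity and specificity, which is why it is the natural summary when no-shows are a minority class and raw accuracy would be misleading.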



2019 ◽  
Vol 21 (9) ◽  
pp. 662-669 ◽  
Author(s):  
Junnan Zhao ◽  
Lu Zhu ◽  
Weineng Zhou ◽  
Lingfeng Yin ◽  
Yuchen Wang ◽  
...  

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade and is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study predicted Ki values of thrombin inhibitors from a large data set using machine learning methods. Because machine learning can uncover non-intuitive regularities in high-dimensional datasets, it is well suited to building effective predictive models. A total of 6554 descriptors were collected for each compound, and an efficient descriptor selection method was applied to find the most appropriate ones. Four methods, multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM), were implemented to build prediction models with the selected descriptors. Results: The SVM model performed best, with R2=0.84 and MSE=0.55 on the training set and R2=0.83 and MSE=0.56 on the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
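The SVM regression workflow and the y-randomization check can be sketched as follows. The data, descriptor count, and hyperparameters are synthetic placeholders, not the paper's 6554-descriptor set; y-randomization refits the model on shuffled targets, and a collapse in test R2 confirms the real model is not a chance correlation.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
n, d = 300, 10                      # hypothetical: compounds x selected descriptors
X = rng.normal(size=(n, d))
y = X[:, :4].sum(axis=1) + 0.3 * rng.normal(size=n)  # synthetic Ki-like target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)
model = SVR(kernel="rbf", C=10.0).fit(X_tr, y_tr)
r2_real = r2_score(y_te, model.predict(X_te))

# y-randomization: shuffle the training targets and refit; test R2 should
# collapse toward (or below) zero if the original fit was genuine.
y_shuffled = rng.permutation(y_tr)
model_rand = SVR(kernel="rbf", C=10.0).fit(X_tr, y_shuffled)
r2_rand = r2_score(y_te, model_rand.predict(X_te))
print(r2_real, r2_rand)
```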


2020 ◽  
Author(s):  
Carson Lam ◽  
Jacob Calvert ◽  
Gina Barnes ◽  
Emily Pellegrini ◽  
Anna Lynn-Palevsky ◽  
...  

BACKGROUND In the wake of COVID-19, the United States developed a three-stage plan outlining the parameters for determining when states may reopen businesses and ease travel restrictions. The guidelines also identify subpopulations of Americans who should continue to stay at home because they are at high risk for severe disease should they contract COVID-19. These guidelines were based on population-level demographics rather than individual-level risk factors. As such, they may misidentify individuals at high risk for severe illness who should therefore not return to work until vaccination or widespread serological testing is available. OBJECTIVE This study evaluated a machine learning algorithm for predicting serious illness due to COVID-19 using inpatient data collected from electronic health records. METHODS The algorithm was trained to identify patients for whom a diagnosis of COVID-19 was likely to result in hospitalization, and was compared against four U.S. policy-based criteria: age over 65; having a serious underlying health condition; age over 65 or having a serious underlying health condition; and age over 65 and having a serious underlying health condition. RESULTS The algorithm identified 80% of patients at risk for hospitalization due to COVID-19, versus at most 62% identified by the government guidelines. The algorithm also achieved a high specificity of 95%, outperforming the government guidelines. CONCLUSIONS This algorithm may help enable a broad reopening of the American economy while ensuring that patients at high risk for serious disease remain home until vaccination and testing become available.
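The comparison between a learned model and a fixed policy rule can be illustrated by matching the model's threshold to the rule's specificity and comparing sensitivities, mirroring the 80% vs 62% result above. Everything below (features, effect sizes, thresholds) is a synthetic illustration, not the study's EHR data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 5000
age = rng.integers(20, 90, n)
comorbid = rng.integers(0, 2, n)        # serious underlying condition
severity = rng.normal(size=n)           # stand-in for inpatient vitals/labs
logit = -4 + 0.03 * age + 0.8 * comorbid + 1.5 * severity
hospitalized = rng.random(n) < 1 / (1 + np.exp(-logit))

# Policy rule: age over 65 OR a serious underlying condition.
rule = (age > 65) | (comorbid == 1)
rule_sens = (rule & hospitalized).sum() / hospitalized.sum()
rule_spec = (~rule & ~hospitalized).sum() / (~hospitalized).sum()

# The model also sees the severity signal; compare sensitivity at the
# rule's specificity by thresholding at the matching score quantile.
X = np.column_stack([age, comorbid, severity])
scores = LogisticRegression(max_iter=1000).fit(X, hospitalized).predict_proba(X)[:, 1]
thr = np.quantile(scores[~hospitalized], rule_spec)
model_sens = ((scores > thr) & hospitalized).sum() / hospitalized.sum()
print(round(rule_sens, 2), round(model_sens, 2))
```

At matched specificity, a model with access to individual-level clinical signal should capture more of the truly high-risk patients than a demographics-only rule, which is the core of the abstract's argument.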


RMD Open ◽  
2021 ◽  
Vol 7 (2) ◽  
pp. e001524
Author(s):  
Nina Marijn van Leeuwen ◽  
Marc Maurits ◽  
Sophie Liem ◽  
Jacopo Ciaffi ◽  
Nina Ajmone Marsan ◽  
...  

Objectives: To develop a prediction model to guide annual assessment of systemic sclerosis (SSc) patients, tailored in accordance with disease activity. Methods: A machine learning approach was used to develop a model that can identify patients without disease progression. SSc patients included in the prospective Leiden SSc cohort and fulfilling the ACR/EULAR 2013 criteria were included. Disease progression was defined as progression in ≥1 organ system, and/or start of immunosuppression, or death. Using elastic-net regularisation and including 90 independent clinical variables (100% complete), we trained the model on 75% of the patients and validated it on the remaining 25%, optimising on negative predictive value (NPV) to minimise the likelihood of missing progression. Probability cutoffs for low and high risk of disease progression were identified by expert assessment. Results: Of the 492 SSc patients (follow-up range: 2–10 years), disease progression during follow-up was observed in 52% (median time 4.9 years). Performance of the model in the test set showed an AUC-ROC of 0.66. Probability score cutoffs were defined: low risk for disease progression (<0.197, NPV: 1.0; 29% of patients), intermediate risk (0.197–0.223, NPV: 0.82; 27%) and high risk (>0.223, NPV: 0.78; 44%). The relevant variables for the model were: previous use of cyclophosphamide or corticosteroids, start of immunosuppressive drugs, previous gastrointestinal progression, previous cardiovascular event, pulmonary arterial hypertension, modified Rodnan Skin Score, creatine kinase and diffusing capacity for carbon monoxide. Conclusion: Our machine-learning-assisted model for progression enabled us to classify 29% of SSc patients as 'low risk'. In this group, annual assessment programmes could be less extensive than indicated by international guidelines.
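The NPV-oriented cutoff selection described above can be sketched with an elastic-net logistic model. The data below are simulated (mirroring only the cohort size and variable count), and the cutoff is illustrative, not the published 0.197/0.223 values.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n, d = 492, 90                       # mirrors the cohort size and variable count
X = rng.normal(size=(n, d))
beta = np.zeros(d)
beta[:8] = 0.7                       # assume a handful of truly informative variables
progressed = rng.random(n) < 1 / (1 + np.exp(-(X @ beta)))

X_tr, X_te, y_tr, y_te = train_test_split(X, progressed, test_size=0.25, random_state=3)
# Elastic-net regularisation: the saga solver supports the mixed L1/L2 penalty.
model = LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.5,
                           C=0.5, max_iter=5000).fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]

def npv_at_cutoff(scores, y_true, cutoff):
    """NPV: fraction of below-cutoff (labelled low-risk) patients who truly did not progress."""
    below = np.asarray(scores) < cutoff
    if below.sum() == 0:
        return float("nan")
    return (below & ~np.asarray(y_true)).sum() / below.sum()

print(npv_at_cutoff(scores, y_te, 0.2))
```

Scanning `npv_at_cutoff` over a grid of cutoffs and picking the lowest cutoff that still achieves the desired NPV reproduces the "optimise on NPV to avoid missing progression" logic of the abstract.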


Soil Systems ◽  
2021 ◽  
Vol 5 (3) ◽  
pp. 41
Author(s):  
Tulsi P. Kharel ◽  
Amanda J. Ashworth ◽  
Phillip R. Owens ◽  
Dirk Philipp ◽  
Andrew L. Thomas ◽  
...  

Silvopasture systems combine tree and livestock production to minimize market risk and enhance ecological services. Our objective was to explore and develop a method for identifying driving factors linked to productivity in a silvopastoral system using machine learning. A multi-variable approach was used to detect factors that affect system-level output (i.e., plant production (tree and forage), soil factors, and animal response based on grazing preference). Variables from a three-year (2017–2019) grazing study, including forage, tree, soil, and terrain attribute parameters, were analyzed. Hierarchical variable clustering and a random forest model selected 10 important variables for each of four major clusters. A stepwise multiple linear regression and regression tree approach was used to predict cattle grazing hours per animal unit (h ha−1 AU−1) using 40 variables (10 per cluster) selected from 130 total variables. Overall, the variable ranking method selected more weighted variables for systems-level analysis. The regression tree performed better than stepwise linear regression for interpreting factor-level effects on animal grazing preference. Cattle were more likely to graze forage on soils with Cd levels <0.04 mg kg−1 (126% greater grazing hours per AU), soil Cr <0.098 mg kg−1 (108%), and a SAGA wetness index of <2.7 (57%). Cattle also preferred grazing (88%) native grasses compared to orchardgrass (Dactylis glomerata L.). These results show that water flow within the landscape (wetness index) and the associated distribution of metals may be used as indicators of animal grazing preference. Overall, soil nutrient distribution patterns drove grazing response, although animal grazing preference was also influenced by aboveground (forage and tree), soil, and landscape attributes. Machine learning approaches helped explain pasture use and overall drivers of grazing preference in a multifunctional system.
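The variable-clustering-plus-importance pipeline (cluster correlated variables, then keep the top-ranked variable from each cluster) can be sketched as below. The grouped features and the grazing-hours target are simulated stand-ins for the forage, tree, soil, and terrain measurements.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)
n = 200
base = rng.normal(size=(n, 4))                  # four latent variable groups
# Three noisy copies per group: 12 correlated measured variables.
X = np.repeat(base, 3, axis=1) + 0.2 * rng.normal(size=(n, 12))
y = base[:, 0] + 0.5 * base[:, 1] + 0.1 * rng.normal(size=n)  # grazing-hours stand-in

# Hierarchical clustering of variables on correlation distance.
dist = squareform(1 - np.abs(np.corrcoef(X, rowvar=False)), checks=False)
clusters = fcluster(linkage(dist, method="average"), t=4, criterion="maxclust")

# From each cluster, keep the variable the random forest ranks most important.
imp = RandomForestRegressor(n_estimators=100, random_state=4).fit(X, y).feature_importances_
keep = [int(np.flatnonzero(clusters == c)[np.argmax(imp[clusters == c])])
        for c in np.unique(clusters)]
print(sorted(keep))
```

Selecting within clusters rather than globally prevents one block of correlated measurements from crowding out other variable families, which is the point of the study's per-cluster selection.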


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
José Castela Forte ◽  
Galiya Yeshmagambetova ◽  
Maureen L. van der Grinten ◽  
Bart Hiemstra ◽  
Thomas Kaufmann ◽  
...  

Critically ill patients constitute a highly heterogeneous population, with seemingly distinct patients having similar outcomes, and patients with the same admission diagnosis having opposite clinical trajectories. We aimed to develop a machine learning methodology that identifies and better characterizes patient clusters at high risk of mortality and kidney injury. We analysed prospectively collected data, including co-morbidities, clinical examination, and laboratory parameters, from a minimally selected population of 743 patients admitted to the ICU of a Dutch hospital between 2015 and 2017. We compared four clustering methodologies and trained a classifier to predict and validate cluster membership. The contribution of different variables to the predicted cluster membership was assessed using SHapley Additive exPlanations (SHAP) values. We found that deep embedded clustering yielded better results than the traditional clustering algorithms. The best cluster configuration was achieved with 6 clusters. All clusters were clinically recognizable and differed in in-ICU, 30-day, and 90-day mortality, as well as in the incidence of acute kidney injury. We identified two high-risk clusters with at least 60%, 40%, and 30% increases in in-ICU, 30-day, and 90-day mortality, respectively, and a low-risk cluster with 25–56% lower mortality risk. This machine learning methodology combining deep embedded clustering and variable importance analysis, which we have made publicly available, is a possible solution to challenges previously encountered by clustering analyses in heterogeneous patient populations and may help improve the characterization of risk groups in critical care.


2021 ◽  
Vol 22 (3) ◽  
pp. 1075
Author(s):  
Luca Bedon ◽  
Michele Dal Bo ◽  
Monica Mossenta ◽  
Davide Busato ◽  
Giuseppe Toffoli ◽  
...  

Although extensive advancements have been made in the treatment of hepatocellular carcinoma (HCC), the prognosis of HCC patients remains unsatisfactory. It is now clearly established that extensive epigenetic changes act as a driver in human tumors. This study exploits HCC epigenetic deregulation to define a novel prognostic model for monitoring the progression of HCC. We analyzed the genome-wide DNA methylation profiles of 374 primary tumor specimens using Illumina 450K array data from The Cancer Genome Atlas. We initially used a novel combination of machine learning algorithms (Recursive Feature Selection, Boruta) to capture early tumor progression features. The subsets of probes obtained were used to train and validate Random Forest models predicting a progression-free survival greater or less than 6 months. The model based on 34 epigenetic probes showed the best performance, scoring 0.80 accuracy and a 0.51 Matthews Correlation Coefficient on the test set. We then generated and validated a progression signature based on 4 methylation probes capable of stratifying HCC patients into high and low risk of progression. Survival analysis showed that high-risk patients are characterized by poorer progression-free survival than low-risk patients. Moreover, decision curve analysis confirmed the strength of this predictive tool over conventional clinical parameters. Functional enrichment analysis highlighted that high-risk patients are distinguished by the upregulation of proliferative pathways. Ultimately, we propose the oncogenic MCM2 gene as a methylation-driven gene whose representative epigenetic markers could serve as both predictive and prognostic markers. Briefly, our work provides several potential epigenetic biomarkers of HCC progression as well as a new signature that may enhance patient surveillance and advance personalized treatment.
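The probe-selection step can be sketched with recursive feature elimination over a random forest (the abstract's Boruta step is omitted); the methylation matrix and the 6-month progression-free-survival labels are simulated, with far fewer probes than the 450K array.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(6)
n, d = 374, 200                           # mirrors the sample count; probes reduced
X = rng.random(size=(n, d))               # methylation beta values lie in [0, 1]
# Synthetic outcome: PFS > 6 months driven by the first five probes.
y = (X[:, :5].mean(axis=1) + 0.1 * rng.normal(size=n)) > 0.5

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=6, stratify=y)

# Recursive feature elimination down to a small probe panel.
selector = RFE(RandomForestClassifier(n_estimators=100, random_state=6),
               n_features_to_select=10, step=0.2).fit(X_tr, y_tr)
clf = RandomForestClassifier(n_estimators=200, random_state=6)
clf.fit(selector.transform(X_tr), y_tr)
mcc = matthews_corrcoef(y_te, clf.predict(selector.transform(X_te)))
print(round(mcc, 2))
```

The Matthews Correlation Coefficient is the same summary statistic the abstract reports; it stays near 0 for a chance-level classifier even under class imbalance, which makes it a stricter check than accuracy alone.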

