Predicting obesity and smoking using medication data: a machine-learning approach

Abstract Background: Administrative health datasets are widely used in public health research but often lack information about common confounders. We aimed to develop and validate machine learning (ML)-based models using medication data from Australia’s Pharmaceutical Benefits Scheme (PBS) database to predict obesity and smoking. Methods: We used data from the D-Health Trial (N = 18,000) and the QSkin Study (N = 43,794). Smoking history, height, and weight were self-reported at study entry. Linkage to the PBS dataset captured 5 years of medication data after cohort entry. We used age, sex, and medication use, classified using Anatomical Therapeutic Chemical (ATC) classification codes, as potential predictors of smoking and obesity. We trained gradient-boosted machine learning models using data for the first 80% of participants enrolled; models were validated using the remaining 20%. We assessed model performance overall and by sex and age, and compared models generated using 3 and 5 years of PBS data. Results: Based on the validation dataset using 3 years of PBS data, the area under the receiver operating characteristic curve (AUC) was 0.70 (95% confidence interval (CI) 0.68–0.71) for predicting obesity and 0.71 (95% CI 0.70–0.72) for predicting smoking. Models performed better in women than in men. Using 5 years of PBS data resulted in marginal improvement. Conclusions: Medication data in combination with age and sex can be used to predict obesity and smoking. These models may be of value to researchers using data collected for administrative purposes.
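The evaluation design described in this abstract (train on the first 80% of participants enrolled, validate on the rest, report AUC) can be illustrated with a small plain-Python sketch. This is not the authors' code: the helper names `temporal_split` and `auc` are hypothetical, the labels and scores are made up, and the gradient-boosted model itself is left out. The AUC is computed by pairwise comparison, i.e. the probability that a randomly chosen positive case scores higher than a randomly chosen negative case.

```python
def temporal_split(records, train_fraction=0.8):
    """Split records (already sorted by enrolment order) into train/validation."""
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 0/1 outcomes; scores: predicted probabilities.
    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical participant IDs in enrolment order (not study data).
participants = list(range(10))
train_ids, valid_ids = temporal_split(participants)

# Hypothetical validation labels and model scores (not study data).
y_true = [1, 0, 1, 0, 0, 1]
y_score = [0.9, 0.4, 0.6, 0.7, 0.2, 0.8]
print(train_ids, valid_ids)
print(auc(y_true, y_score))
```

Splitting by enrolment order rather than at random means the validation participants are all enrolled later than the training participants, which is a stricter test than a shuffled split.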


A machine learning approach to predict extreme inactivity in COPD patients using non-activity-related clinical data

PLoS ONE ◽

10.1371/journal.pone.0255977 ◽

2021 ◽

Vol 16 (8) ◽

pp. e0255977

Author(s):

Bernard Aguilaniu ◽

David Hess ◽

Eric Kelkel ◽

Amandine Briault ◽

Marie Destors ◽

...

Keyword(s):

Machine Learning ◽

Operating Characteristic ◽

Learning Algorithm ◽

Characteristic Curve ◽

Real Life ◽

Learning Approach ◽

Copd Patients ◽

Machine Learning Approach ◽

Using Data ◽

Walking Time

Facilitating the identification of extreme inactivity (EI) has the potential to improve morbidity and mortality in COPD patients. Apart from patients with obvious EI, the identification of such behavior during a real-life consultation is unreliable. We therefore describe a machine learning algorithm to screen for EI, as actimetry measurements are difficult to implement. Complete datasets for 1409 COPD patients were obtained from COLIBRI-COPD, a database of clinicopathological data submitted by French pulmonologists. Patient- and pulmonologist-reported estimates of PA quantity (daily walking time) and intensity (domestic, recreational, or fitness-directed) were first used to assign patients to one of four PA groups (extremely inactive [EI], overtly active [OA], intermediate [INT], inconclusive [INC]). The algorithm was developed by (i) using data from 80% of patients in the EI and OA groups to identify ‘phenotype signatures’ of non-PA-related clinical variables most closely associated with EI or OA; (ii) testing its predictive validity using data from the remaining 20% of EI and OA patients; and (iii) applying the algorithm to identify EI patients in the INT and INC groups. The algorithm’s overall error for predicting EI status among EI and OA patients was 13.7%, with an area under the receiver operating characteristic curve of 0.84 (95% confidence interval: 0.75–0.92). Of the 577 patients in the INT/INC groups, 306 (53%) were reclassified as EI by the algorithm. Patient- and physician-reported estimates may underestimate EI in a large proportion of COPD patients. This algorithm may assist physicians in identifying patients in urgent need of interventions to promote PA.


Discovery of Highly Polymorphic Organic Materials: A New Machine Learning Approach

10.26434/chemrxiv.9524219 ◽

2019 ◽

Author(s):

Zied Hosni ◽

Annalisa Riccardi ◽

Stephanie Yerdelen ◽

Alan R. G. Martin ◽

Deborah Bowering ◽

...

Keyword(s):

Machine Learning ◽

Structure Prediction ◽

External Validation ◽

New Drugs ◽

Training Dataset ◽

Validation Dataset ◽

Machine Learning Classification ◽

Novel Approach ◽

Physical Form ◽

Machine Learning Approach

Polymorphism is the capacity of a molecule to adopt different conformations or molecular packing arrangements in the solid state. This is a key property to control during pharmaceutical manufacturing because it can impact a range of properties including stability and solubility. In this study, a novel approach based on machine learning classification methods is used to predict the likelihood for an organic compound to crystallise in multiple forms. A training dataset of drug-like molecules was curated from the Cambridge Structural Database (CSD) and filtered according to entries in the Drug Bank database. The number of separate forms in the CSD for each molecule was recorded. A metaclassifier was trained using this dataset to predict the expected number of crystalline forms from the compound descriptors. This approach was used to estimate the number of crystallographic forms for an external validation dataset. These results suggest this novel methodology can be used to predict the extent of polymorphism of new drugs or not-yet experimentally screened molecules. This promising method complements expensive ab initio methods for crystal structure prediction and, as an integral part of experimental physical form screening, may identify systems with unexplored potential.


Quantifying changes in bicycle volumes using crowdsourced data

Environment and Planning B Urban Analytics and City Science ◽

10.1177/23998083211066103 ◽

2022 ◽

pp. 239980832110661

Author(s):

Ali Al-Ramini ◽

Mohammad A Takallou ◽

Daniel P Piatkowski ◽

Fadi Alsaleem

Keyword(s):

Machine Learning ◽

The United States ◽

Crowdsourced Data ◽

Machine Learning Approach ◽

Bicycle Infrastructure ◽

The Difference ◽

Infrastructure Investments ◽

Using Data ◽

The Impact ◽

The City

Most cities in the United States lack comprehensive or connected bicycle infrastructure; therefore, inexpensive and easy-to-implement solutions for connecting existing bicycle infrastructure are increasingly being employed. Signage is one of the promising solutions. However, the necessary data for evaluating its effect on cycling ridership is lacking. To overcome this challenge, this study tests the potential of using readily-available crowdsourced data in concert with machine-learning methods to provide insight into signage intervention effectiveness. We do this by assessing a natural experiment to identify the potential effects of adding or replacing signage within existing bicycle infrastructure in 2019 in the city of Omaha, Nebraska. Specifically, we first visually compare cycling traffic changes in 2019 to those from the previous two years (2017–2018) using data extracted from the Strava fitness app. Then, we use a new three-step machine-learning approach to quantify the impact of signage while controlling for weather, demographics, and street characteristics. The steps are as follows: Step 1 (modeling and validation) build and train a model from the available 2017 crowdsourced data (i.e., Strava, Census, and weather) that accurately predicts the cycling traffic data for any street within the study area in 2018; Step 2 (prediction) use the model from Step 1 to predict bicycle traffic in 2019 while assuming new signage was not added; Step 3 (impact evaluation) use the difference in prediction from actual traffic in 2019 as evidence of the likely impact of signage. While our work does not demonstrate causality, it does demonstrate an inexpensive method, using readily-available data, to identify changing trends in bicycling over the same time that new infrastructure investments are being added.
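The three-step logic in this abstract (fit on pre-intervention data, predict the no-signage counterfactual, compare with observed traffic) can be sketched in plain Python. The sketch below is a deliberately simplified stand-in: it swaps the study's machine-learning model for a one-predictor least-squares line, folds the validation part of Step 1 away, and uses made-up numbers. The names `fit_line` and `predict` and the temperature predictor are assumptions for illustration, not taken from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, xs):
    """Apply a fitted (intercept, slope) pair to new predictor values."""
    intercept, slope = model
    return [intercept + slope * x for x in xs]


# Step 1 (modeling): fit on pre-intervention data.
# Hypothetical temperatures and trip counts, not Strava data.
temp_pre = [10.0, 15.0, 20.0, 25.0]
trips_pre = [30.0, 40.0, 50.0, 60.0]
model = fit_line(temp_pre, trips_pre)

# Step 2 (prediction): expected traffic had no signage been added.
temp_post = [12.0, 18.0, 24.0]
expected_post = predict(model, temp_post)

# Step 3 (impact evaluation): observed minus expected traffic.
observed_post = [39.0, 50.0, 65.0]
differences = [o - e for o, e in zip(observed_post, expected_post)]
mean_difference = sum(differences) / len(differences)
print(expected_post, differences, mean_difference)
```

As the abstract itself notes, a positive difference of this kind is evidence of a changing trend, not proof that the signage caused it.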


A Comparative Study to analyze crime threats using data mining and machine learning approach

10.1109/icscan53069.2021.9526489 ◽

2021 ◽

Author(s):

Puninder Kaur ◽

Geeta Rani ◽

Taruna Sharma ◽

Avinash Sharma

Keyword(s):

Machine Learning ◽

Data Mining ◽

Comparative Study ◽

Learning Approach ◽

Machine Learning Approach ◽

Using Data


Forecasting Internally Displaced Population Migration Patterns in Syria and Yemen

Disaster Medicine and Public Health Preparedness ◽

10.1017/dmp.2019.73 ◽

2019 ◽

Vol 14 (3) ◽

pp. 302-307

Author(s):

Benjamin Q. Huynh ◽

Sanjay Basu

Keyword(s):

Machine Learning ◽

Food Prices ◽

Internally Displaced ◽

Fuel Prices ◽

Machine Learning Model ◽

Machine Learning Approach ◽

Using Data ◽

Diverse Data ◽

Internally Displaced Population ◽

Persistence Model

ABSTRACT Objectives: Armed conflict has contributed to an unprecedented number of internally displaced persons (IDPs), individuals who are forced out of their homes but remain within their country. IDPs often urgently require shelter, food, and healthcare, yet prediction of when IDPs will migrate to an area remains a major challenge for aid delivery organizations. We sought to develop an IDP migration forecasting framework that could empower humanitarian aid groups to more effectively allocate resources during conflicts. Methods: We modeled monthly IDP migration between provinces within Syria and within Yemen using data on food prices, fuel prices, wages, location, time, and conflict reports. We compared machine learning methods with baseline persistence methods of forecasting. Results: We found a machine learning approach that more accurately forecast migration trends than baseline persistence methods. A random forest model outperformed the best persistence model in terms of root mean square error of log migration by 26% and 17% for the Syria and Yemen datasets, respectively. Conclusions: Integrating diverse data sources into a machine learning model appears to improve IDP migration prediction. Further work should examine whether implementation of such models can enable proactive aid allocation for IDPs in anticipation of forecast arrivals.
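The comparison metric in this abstract, root mean square error of log migration against a persistence baseline, can be sketched as follows. The abstract does not specify the exact log transform or which persistence variants were tried, so this sketch assumes a natural log of strictly positive counts and the simplest "last month's value" baseline. All numbers and the names `rmse_log` and `persistence_forecast` are hypothetical.

```python
import math


def rmse_log(actual, predicted):
    """Root mean square error on natural-log-transformed positive counts."""
    squared = [(math.log(a) - math.log(p)) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def persistence_forecast(series):
    """Naive baseline: forecast each month as the previous month's value."""
    return series[:-1]


# Hypothetical monthly arrivals into one province (not Syria/Yemen data).
monthly = [100.0, 120.0, 90.0, 150.0, 130.0]
actual = monthly[1:]                       # months 2..5, the forecast targets
baseline = persistence_forecast(monthly)   # months 1..4 reused as forecasts

# Hypothetical output of some fitted model for the same four months.
model_forecast = [110.0, 100.0, 140.0, 125.0]

baseline_rmse = rmse_log(actual, baseline)
model_rmse = rmse_log(actual, model_forecast)
improvement_pct = 100.0 * (baseline_rmse - model_rmse) / baseline_rmse
print(baseline_rmse, model_rmse, improvement_pct)
```

Reporting the relative reduction in error, as the abstract does with its 26% and 17% figures, makes results comparable across datasets with different migration volumes.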


A real-world demonstration of machine learning generalizability in the detection of intracranial hemorrhage on head computerized tomography

Scientific Reports ◽

10.1038/s41598-021-95533-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Hojjat Salehinejad ◽

Jumpei Kitamura ◽

Noah Ditkofsky ◽

Amy Lin ◽

Aditya Bharatha ◽

...

Keyword(s):

Machine Learning ◽

Medical Imaging ◽

Intracranial Hemorrhage ◽

Real World ◽

External Validation ◽

Model Performance ◽

Training Dataset ◽

Validation Dataset ◽

Great Promise ◽

Clinical Environments

Abstract Machine learning (ML) holds great promise in transforming healthcare. While published studies have shown the utility of ML models in interpreting medical imaging examinations, these are often evaluated under laboratory settings. The importance of real world evaluation is best illustrated by case studies that have documented successes and failures in the translation of these models into clinical environments. A key prerequisite for the clinical adoption of these technologies is demonstrating generalizable ML model performance under real world circumstances. The purpose of this study was to demonstrate that ML model generalizability is achievable in medical imaging with the detection of intracranial hemorrhage (ICH) on non-contrast computed tomography (CT) scans serving as the use case. An ML model was trained using 21,784 scans from the RSNA Intracranial Hemorrhage CT dataset while generalizability was evaluated using an external validation dataset obtained from our busy trauma and neurosurgical center. This real world external validation dataset consisted of every unenhanced head CT scan (n = 5965) performed in our emergency department in 2019 without exclusion. The model demonstrated an AUC of 98.4%, sensitivity of 98.8%, and specificity of 98.0%, on the test dataset. On external validation, the model demonstrated an AUC of 95.4%, sensitivity of 91.3%, and specificity of 94.1%. Evaluating the ML model using a real world external validation dataset that is temporally and geographically distinct from the training dataset indicates that ML generalizability is achievable in medical imaging applications.
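Sensitivity and specificity, the two threshold-dependent figures quoted in this abstract, follow directly from the confusion counts. A minimal plain-Python sketch with made-up labels (1 = hemorrhage present, 0 = absent); the function name `sensitivity_specificity` is hypothetical and this is not the study's evaluation code:

```python
def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for binary 0/1 labels and predictions."""
    tp = fn = tn = fp = 0
    for y, p in zip(labels, predictions):
        if y == 1 and p == 1:
            tp += 1      # true positive: disease present, flagged
        elif y == 1 and p == 0:
            fn += 1      # false negative: disease present, missed
        elif y == 0 and p == 0:
            tn += 1      # true negative: disease absent, not flagged
        else:
            fp += 1      # false positive: disease absent, flagged
    sensitivity = tp / (tp + fn)   # share of true cases detected
    specificity = tn / (tn + fp)   # share of non-cases correctly cleared
    return sensitivity, specificity


# Hypothetical scan-level ground truth and model output (not study data).
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(truth, predicted)
print(sens, spec)
```

Computing both figures separately on the internal test set and on the external set is what exposes the kind of drop the abstract reports between the two.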


A Study of Cross-National Differences in Happiness Factors Using Machine Learning Approach

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194015710023 ◽

2015 ◽

Vol 25 (09n10) ◽

pp. 1699-1702 ◽

Cited By ~ 2

Author(s):

Theresia Ratih Dewi Saputri ◽

Seok-Won Lee

Keyword(s):

Machine Learning ◽

Information Gain ◽

Development Project ◽

Support Vector ◽

National Differences ◽

Dimensionality Reduction Technique ◽

Machine Learning Approach ◽

National Happiness ◽

Using Data ◽

Cross National

National happiness has been actively studied in recent years. The factors that contribute to happiness vary with human perspective. The factors used in this work cover both the physical and the mental needs of humanity, for example, education. This work identified more than 90 features that can be used to predict a country's happiness. With so many features, predicting national happiness by manual analysis is impractical. Therefore, this work used a machine learning technique, the Support Vector Machine (SVM), to learn and predict country happiness. To improve prediction accuracy, a dimensionality reduction technique based on information gain was also used. This technique was chosen for its ability to explore the interrelationships among a set of variables. Using data on 187 countries from the UN Development Project, this work is able to identify which factors a given country needs to improve to increase the happiness of its citizens.
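Information gain, the criterion this abstract uses to cut down its feature set, measures how much knowing a feature's value reduces the entropy of the target label. A minimal sketch for categorical features, using made-up country data; the function names and the feature names `education` and `climate` are illustrative assumptions, not taken from the paper:

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical data for four countries (not UN data).
happiness = ["high", "high", "low", "low"]
education = ["good", "good", "poor", "poor"]   # perfectly aligned with the label
climate = ["warm", "cold", "warm", "cold"]     # unrelated to the label

gain_education = information_gain(education, happiness)
gain_climate = information_gain(climate, happiness)
print(gain_education, gain_climate)
```

Ranking features by this score and keeping only the top ones is one common way to shrink a feature set before training a classifier such as an SVM.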


Mapping Informal Settlements in the Middle East Environment using an Object-Based Machine-Learning Approach

10.20944/preprints201809.0219.v1 ◽

2018 ◽

Author(s):

Ahmad Fallatah ◽

Simon Jones ◽

David Mitchell

Keyword(s):

Machine Learning ◽

Urban Areas ◽

Hybrid Approach ◽

Informal Settlements ◽

Validation Dataset ◽

Relative Importance ◽

Object Based Image Analysis ◽

Object Based ◽

Machine Learning Approach ◽

Very High

The identification of informal settlements in urban areas is an important step in developing and implementing pro-poor urban policies. Understanding when, where and who lives inside informal settlements is critical to efforts to improve their resilience. This study aims to analyse the capability of machine-learning (ML) methods to map informal areas in Jeddah, Saudi Arabia, using very-high-resolution (VHR) imagery and terrain data. Fourteen indicators of settlement characteristics were derived and mapped using an object-based ML approach and VHR imagery. These indicators were categorised according to three different spatial levels: environ, settlement and object. The most useful indicators for prediction were found to be density and texture measures (with random forest (RF) relative importance measures of over 25% and 23%, respectively). The success of this approach was evaluated using a small, fully independent validation dataset. Informal areas were mapped with an overall accuracy of 91%. Object-based ML as a hybrid approach performed better (by 8%) than object-based image analysis alone due to its ability to encompass all available geospatial levels.


Machine-Learning Prediction of Comorbid Substance Use Disorders in ADHD Youth Using Swedish Registry Data

10.1101/661983 ◽

2019 ◽

Author(s):

Yanli Zhang-James ◽

Qi Chen ◽

Ralf Kuja-Halkola ◽

Paul Lichtenstein ◽

Henrik Larsson ◽

...

Keyword(s):

Machine Learning ◽

At Risk ◽

Substance Use ◽

Substance Use Disorders ◽

Prediction Models ◽

Characteristic Curve ◽

Registry Data ◽

Longitudinal Models ◽

Cross Sectional ◽

Using Data

Abstract Background: Children with attention-deficit/hyperactivity disorder (ADHD) have a high risk for substance use disorders (SUDs). Early identification of at-risk youth would help allocate scarce resources for prevention programs. Methods: Psychiatric and somatic diagnoses, family history of these disorders, measures of socioeconomic distress and information about birth complications were obtained from the national registers in Sweden for 19,787 children with ADHD born between 1989-1993. We trained 1) cross-sectional machine learning models using data available by age 17 to predict SUD diagnosis between ages 18-19; and 2) a longitudinal model to predict new diagnoses at each age. Results: The area under the receiver operating characteristic curve (AUC) was 0.73 and 0.71 for the random forest and multilayer perceptron cross-sectional models. A prior diagnosis of SUD was the most important predictor, accounting for 25% of correct predictions. However, after excluding this predictor, our model still significantly predicted the first-time diagnosis of SUD during age 18-19 with an AUC of 0.67. The average of the AUCs from longitudinal models predicting new diagnoses one, two, five and ten years in the future was 0.63. Conclusions: Significant predictions of at-risk co-morbid SUDs in individuals with ADHD can be achieved using population registry data, even many years prior to the first diagnosis. Longitudinal models can potentially monitor their risks over time. More work is needed to create prediction models based on electronic health records or linked population-registers that are sufficiently accurate for use in the clinic.


Can machine learning augment clinician adjudication of events in cardiovascular trials? A case study of major adverse cardiovascular events (MACE) across CVRM trials

European Heart Journal ◽

10.1093/eurheartj/ehab724.3061 ◽

2021 ◽

Vol 42 (Supplement_1) ◽

Author(s):

H Lea ◽

E Hutchinson ◽

A Meeson ◽

S Nampally ◽

G Dennis ◽

...

Keyword(s):

Machine Learning ◽

Clinical Trials ◽

Ischemic Stroke ◽

Cardiovascular Events ◽

Learning Algorithms ◽

Model Performance ◽

Machine Learning Algorithms ◽

Classification Models ◽

Using Data

Abstract Background and introduction: Accurate identification of clinical outcome events is critical to obtaining reliable results in cardiovascular outcomes trials (CVOTs). Current processes for event adjudication are expensive and hampered by delays. As part of a larger project to more reliably identify outcomes, we evaluated the use of machine learning to automate event adjudication using data from the SOCRATES trial (NCT01994720), a large randomized trial comparing ticagrelor and aspirin in reducing risk of major cardiovascular events after acute ischemic stroke or transient ischemic attack (TIA). Purpose: We studied whether machine learning algorithms could replicate the outcome of the expert adjudication process for clinical events of ischemic stroke and TIA. Could classification models be trained on historical CVOT data and demonstrate performance comparable to human adjudicators? Methods: Using data from the SOCRATES trial, multiple machine learning algorithms were tested using grid search and cross validation. Models tested included Support Vector Machines, Random Forest and XGBoost. Performance was assessed on a validation subset of the adjudication data not used for training or testing in model development. Metrics used to evaluate model performance were Receiver Operating Characteristic (ROC), Matthews Correlation Coefficient, Precision and Recall. The contribution of features (attributes of the data used by the algorithm as it is trained to classify an event) to a classification was examined using both Mutual Information and Recursive Feature Elimination. Results: Classification models were trained on historical CVOT data using adjudicator consensus decision as the ground truth. Best performance was observed on models trained to classify ischemic stroke (ROC 0.95) and TIA (ROC 0.97). 
Top ranked features that contributed to classification of ischemic stroke or TIA corresponded to site investigator decision or variables used to define the event in the trial charter, such as duration of symptoms. Model performance was comparable across the different machine learning algorithms tested, with XGBoost demonstrating the best ROC on the validation set for correctly classifying both stroke and TIA. Conclusions: Our results indicate that machine learning may augment or even replace clinician adjudication in clinical trials, with potential to gain efficiencies, speed up clinical development, and retain reliability. Our current models demonstrate good performance at binary classification of ischemic stroke and TIA within a single CVOT with high consistency and accuracy between automated and clinician adjudication. Further work will focus on harmonizing features between multiple historical clinical trials and training models to classify several different endpoint events across trials. Our aim is to utilize these clinical trial datasets to optimize the delivery of CVOTs in further cardiovascular drug development. Funding Acknowledgement: Type of funding sources: Private company. Main funding source(s): AstraZeneca Plc
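Three of the metrics named in this abstract (Precision, Recall, and the Matthews Correlation Coefficient) can be computed from the four confusion counts. The sketch below is a plain-Python illustration with made-up adjudication labels (1 = event confirmed, 0 = not confirmed); the function name `precision_recall_mcc` is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something the abstract specifies.

```python
import math


def precision_recall_mcc(labels, predictions):
    """Return (precision, recall, MCC) for binary 0/1 labels and predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)

    # Assumed convention: report 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc


# Hypothetical adjudicator consensus vs. model output (not SOCRATES data).
consensus = [1, 1, 1, 0, 0, 0, 0, 1]
model_out = [1, 1, 0, 0, 0, 0, 1, 1]
precision, recall, mcc = precision_recall_mcc(consensus, model_out)
print(precision, recall, mcc)
```

Unlike precision and recall, the MCC uses all four confusion counts, which makes it informative when confirmed and unconfirmed events are unevenly represented.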
