Improving Machine Learning Prediction of ADHD Using Gene Set Polygenic Risk Scores and Risk Scores from Genetically Correlated Phenotypes

Background: Polygenic risk scores (PRSs), which sum the effects of SNPs throughout the genome to measure risk afforded by common genetic variants, have improved our ability to estimate disorder risk for Attention-Deficit/Hyperactivity Disorder (ADHD), but the accuracy of risk prediction is rarely investigated. Methods: With the goal of improving risk prediction, we performed gene set analysis of GWAS data to select gene sets associated with ADHD within a training subset. For each selected gene set, we generated gene set polygenic risk scores (gsPRSs), which sum the effects of SNPs for each selected gene set. We created gsPRSs for ADHD and for phenotypes having a high genetic correlation with ADHD. These gsPRSs were added to the standard PRS as input to machine learning models predicting ADHD. We used feature importance scores to select gsPRSs for a final model and to generate a ranking of the most consistently predictive gsPRSs. Results: For a test subset that had not been used for training or validation, a random forest (RF) model using PRSs from ADHD and genetically correlated phenotypes and an optimized group of 20 gsPRSs had an area under the receiver operating characteristic curve (AUC) of 0.72 (95% CI: 0.70 to 0.74). This AUC was a statistically significant improvement over logistic regression models and RF models using only PRSs from ADHD and genetically correlated phenotypes. Conclusions: Summing risk at the gene set level and incorporating genetic risk from disorders with high genetic correlations with ADHD improved the accuracy of predicting ADHD. Learning curves suggest that additional improvements would be expected with larger study sizes. Our study suggests that better accounting of genetic risk and the genetic context of allelic differences results in more predictive models.
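
The abstract above describes a PRS as a sum of SNP effects across the genome and a gsPRS as the same sum restricted to the SNPs of one gene set. A minimal sketch of that idea follows. The function name, the SNP identifiers, the effect weights, and the dosages are all hypothetical illustrations and are not taken from the study, which does not describe its scoring pipeline here.

```python
from typing import Dict, Iterable, Optional


def polygenic_score(
    dosages: Dict[str, float],
    weights: Dict[str, float],
    snp_subset: Optional[Iterable[str]] = None,
) -> float:
    """Sum effect-weighted allele dosages over SNPs.

    With snp_subset=None every weighted SNP is used (a genome-wide PRS).
    With a subset, only those SNPs are summed (a gene set PRS).
    SNPs missing from either dictionary are skipped.
    """
    snps = weights.keys() if snp_subset is None else snp_subset
    return sum(
        weights[snp] * dosages[snp]
        for snp in snps
        if snp in weights and snp in dosages
    )


# Hypothetical toy data: per-SNP effect sizes and one person's allele counts.
weights = {"rs1": 0.10, "rs2": -0.05, "rs3": 0.20}
dosages = {"rs1": 2, "rs2": 1, "rs3": 0}
gene_set = ["rs1", "rs3"]  # hypothetical SNPs mapped to one gene set

prs = polygenic_score(dosages, weights)               # all SNPs
gs_prs = polygenic_score(dosages, weights, gene_set)  # gene set SNPs only
print(prs, gs_prs)
```

In this toy setup each gsPRS would become one extra input feature alongside the genome-wide PRS, which matches the abstract's description of the model inputs only in outline.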

Risk Prediction Using Polygenic Risk Scores for Prevention of Stroke and Other Cardiovascular Diseases

Stroke ◽

10.1161/strokeaha.120.032619 ◽

2021 ◽

Author(s):

Gad Abraham ◽

Loes Rutten-Jacobs ◽

Michael Inouye

Keyword(s):

Risk Prediction ◽

Risk Scores ◽

Future Research ◽

Cvd Risk ◽

Polygenic Risk ◽

Monogenic Disorders ◽

Current State ◽

Common Genetic Variants ◽

Artery Disease

Early prediction of risk of cardiovascular disease (CVD), including stroke, is a cornerstone of disease prevention. Clinical risk scores have been widely used for predicting CVD risk from known risk factors. Most CVDs have a substantial genetic component, which also has been confirmed for stroke in recent gene discovery efforts. However, the role of genetics in prediction of risk of CVD, including stroke, has been limited to testing for highly penetrant monogenic disorders. In contrast, the importance of polygenic variation, the aggregated effect of many common genetic variants across the genome with individually small effects, has become more apparent in the last 5 to 10 years, and powerful polygenic risk scores for CVD have been developed. Here we review the current state of the field of polygenic risk scores for CVD including stroke, and their potential to improve CVD risk prediction. We present findings and lessons from diseases such as coronary artery disease as these will likely be useful to inform future research in stroke polygenic risk prediction.

29.3 IMPROVING MACHINE LEARNING PREDICTION OF ADHD USING GENE SET POLYGENIC RISK SCORES AND RISK SCORES FROM GENETICALLY CORRELATED DISORDERS

Journal of the American Academy of Child & Adolescent Psychiatry ◽

10.1016/j.jaac.2021.07.721 ◽

2021 ◽

Vol 60 (10) ◽

pp. S303

Author(s):

Stephen V. Faraone

Keyword(s):

Machine Learning ◽

Risk Scores ◽

Polygenic Risk ◽

Gene Set

Integrating genome-wide polygenic risk scores and non-genetic risk factors to develop and validate risk prediction models for colorectal cancer

10.1101/2021.09.22.21263962 ◽

2021 ◽

Author(s):

Sarah EW Briggs ◽

Philip Law ◽

James E East ◽

Sarah Wordsworth ◽

Malcolm Dunlop ◽

...

Keyword(s):

Colorectal Cancer ◽

Risk Factors ◽

Risk Prediction ◽

Genetic Risk ◽

Prediction Models ◽

Fold Increase ◽

Risk Scores ◽

Risk Prediction Models ◽

Polygenic Risk ◽

Genome Wide

Objective: While population screening programs for colorectal cancer (CRC) have proven benefit, risk-stratified approaches may improve screening outcomes further. To date, genome-wide polygenic risk scores (PRS) for CRC have not been integrated with non-genetic risk factors. We aimed to evaluate several genome-wide approaches, and the benefit of adding PRS to the QCancer-10 (colorectal cancer) non-genetic risk model, to identify those at highest risk of CRC. Design: Using UK Biobank we developed and compared six different PRS for CRC. The top-performing genome-wide and GWAS-significant PRS were then combined with QCancer-10 and performance compared to QCancer-10 alone. Results: PRS derived using LDpred2 software performed best, with an odds ratio per standard deviation of 1.58, and top age- and sex-adjusted C-statistic of 0.733 in logistic regression and 0.724 in Cox regression models in the Geographic Validation Cohort. Integrated QCancer-10+PRS models out-performed QCancer-10, with C-statistics of 0.730 and 0.693, and explained variation of 28.1% and 21.0% from QCancer-10+LDpred2 and QCancer-10 respectively in men; performance improvements in women were similar. Using QCancer-10+LDpred2 models, men in the top 20% of risk accounted for 47.6% of cases and women in the top 20% for 42.5%, with a 3.49-fold increase in risk in men and a 2.75-fold increase in women in the top 5% of risk, compared to average risk. Decision curve analysis showed that adding PRS to QCancer-10 improved net benefit and interventions avoided across most probability thresholds. Conclusion: Integrated QCancer-10+PRS models out-perform existing CRC risk prediction models. Evaluation of risk-stratified screening using this approach in a bowel screening population could be warranted.
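
The abstract reports what share of cases falls in the top 20% of predicted risk. One way such a figure could be computed is sketched below, assuming each person has a predicted risk and a case indicator. The function name and the ten-person dataset are hypothetical, and the study's own procedure may differ.

```python
from typing import Sequence


def case_share_in_top(
    risks: Sequence[float], is_case: Sequence[int], top_fraction: float
) -> float:
    """Fraction of all cases found among the highest-risk individuals.

    Individuals are ranked by predicted risk, the top `top_fraction` of them
    is selected (at least one person), and the cases in that group are
    divided by the total number of cases.
    """
    total_cases = sum(is_case)
    if total_cases == 0:
        raise ValueError("no cases in the data")
    n_top = max(1, round(len(risks) * top_fraction))
    ranked = sorted(zip(risks, is_case), key=lambda pair: pair[0], reverse=True)
    return sum(case for _, case in ranked[:n_top]) / total_cases


# Hypothetical toy cohort of ten people, three of whom are cases.
risks = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
is_case = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]

share = case_share_in_top(risks, is_case, top_fraction=0.2)
print(f"{share:.3f}")
```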

Genetic risk prediction of late‐onset Alzheimer’s disease based on tissue‐specific transcriptomic analysis and polygenic risk scores

Alzheimer's & Dementia ◽

10.1002/alz.045184 ◽

2020 ◽

Vol 16 (S2) ◽

Author(s):

Sang‐Hyuk Jung ◽

Kwangsik Nho ◽

Dokyoon Kim ◽

Hong‐Hee Won ◽

Keyword(s):

Alzheimer's Disease ◽

Risk Prediction ◽

Genetic Risk ◽

Late Onset ◽

Transcriptomic Analysis ◽

Risk Scores ◽

Tissue Specific ◽

Polygenic Risk ◽

Genetic Risk Prediction

Effects of polygenic risk for major mental disorders and cross-disorder on cortical complexity

Psychological Medicine ◽

10.1017/s0033291721001082 ◽

2021 ◽

pp. 1-12

Author(s):

Simon Schmitt ◽

Tina Meller ◽

Frederike Stein ◽

Katharina Brosch ◽

Kai Ringwald ◽

...

Keyword(s):

Bipolar Disorder ◽

Mental Disorders ◽

Cortical Thickness ◽

Genetic Risk ◽

Right Hemisphere ◽

Autism Spectrum ◽

Risk Scores ◽

Grey Matter Volume ◽

Polygenic Risk ◽

Risk For Depression

Background: MRI-derived cortical folding measures are an indicator of largely genetically driven early developmental processes. However, the effects of genetic risk for major mental disorders on early brain development are not well understood. Methods: We extracted cortical complexity values from structural MRI data of 580 healthy participants using the CAT12 toolbox. Polygenic risk scores (PRS) for schizophrenia, bipolar disorder, major depression, and cross-disorder (incorporating cumulative genetic risk for depression, schizophrenia, bipolar disorder, autism spectrum disorder, and attention-deficit hyperactivity disorder) were computed and used in separate general linear models with cortical complexity as the regressand. In brain regions that showed a significant association between polygenic risk for mental disorders and cortical complexity, volume of interest (VOI)/region of interest (ROI) analyses were conducted to investigate additional changes in their volume and cortical thickness. Results: The PRS for depression was associated with cortical complexity in the right orbitofrontal cortex (right hemisphere: p = 0.006). A subsequent VOI/ROI analysis showed no association between polygenic risk for depression and either grey matter volume or cortical thickness. We found no associations between cortical complexity and polygenic risk for schizophrenia, bipolar disorder, or psychiatric cross-disorder when correcting for multiple testing. Conclusions: Changes in cortical complexity associated with polygenic risk for depression might facilitate well-established volume changes in orbitofrontal cortices in depression. Despite the absence of psychopathology, changed cortical complexity that parallels polygenic risk for depression might also change reward systems, which are also structurally affected in patients with depressive syndrome.

Identifying individuals with high risk of Alzheimer’s disease using polygenic risk scores

Nature Communications ◽

10.1038/s41467-021-24082-z ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Ganna Leonenko ◽

Emily Baker ◽

Joshua Stevenson-Hoare ◽

Annerieke Sierksma ◽

Mark Fiers ◽

...

Keyword(s):

High Risk ◽

Risk Score ◽

Genetic Risk ◽

Prediction Accuracy ◽

Genetic Risk Score ◽

Low Risk ◽

Risk Scores ◽

Sample Mean ◽

Polygenic Risk ◽

Reliable Identification

Polygenic risk scores (PRS) for Alzheimer's disease (AD) offer unique possibilities for reliable identification of individuals at high and low risk of AD. However, there is little agreement in the field as to what approach should be used for genetic risk score calculations, how to model the effect of APOE, what the optimal p-value threshold (pT) for SNP selection is, and how to compare scores between studies and methods. We show that the best prediction accuracy is achieved with a model with two predictors (APOE and PRS excluding the APOE region) with pT<0.1 for SNP selection. Prediction accuracy in a sample across different PRS approaches is similar, but individuals’ scores and their associated ranking differ. We show that standardising PRS against the population mean, as opposed to the sample mean, makes the individuals’ scores comparable between studies. Our work highlights the best strategies for polygenic profiling when assessing individuals for AD risk.
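
The point about standardising against the population rather than the sample can be illustrated with a small sketch. All numbers below are hypothetical: two invented study samples, an invented population reference mean and standard deviation, and one individual with the same raw score in both studies.

```python
from statistics import mean, pstdev


def standardise(score: float, ref_mean: float, ref_sd: float) -> float:
    """Convert a raw score to a z-score relative to a reference distribution."""
    return (score - ref_mean) / ref_sd


# Hypothetical raw PRS values in two studies with different case mixes.
study_a = [0.8, 1.0, 1.2]
study_b = [1.0, 1.4, 1.8]

# Hypothetical population reference (assumed known from an external panel).
POP_MEAN, POP_SD = 0.5, 0.5

raw = 1.0  # the same individual's raw score, observed in either study

# Sample-based standardisation: the z-score depends on who else was sampled.
z_a_sample = standardise(raw, mean(study_a), pstdev(study_a))
z_b_sample = standardise(raw, mean(study_b), pstdev(study_b))

# Population-based standardisation: the z-score is the same in both studies.
z_a_pop = standardise(raw, POP_MEAN, POP_SD)
z_b_pop = standardise(raw, POP_MEAN, POP_SD)

print(z_a_sample, z_b_sample, z_a_pop, z_b_pop)
```

Under sample-based scaling the same raw score lands at the mean of one study and well below the mean of the other, while the population-based value is identical in both.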

Dynamic Detection of Delayed Cerebral Ischemia Using Machine Learning

10.1101/2020.04.15.20067041 ◽

2020 ◽

Author(s):

Murad Megjhani ◽

Kalijah Terilli ◽

Ayham Alkhachroum ◽

David J. Roh ◽

Sachin Agarwal ◽

...

Keyword(s):

Machine Learning ◽

Cerebral Ischemia ◽

Characteristic Curve ◽

Delayed Cerebral Ischemia ◽

Risk Scores ◽

Support Vector ◽

Model Parameters ◽

Learning Approaches ◽

Physiologic Data ◽

Over Time

Objective: To develop a machine learning based tool, using routine vital signs, to assess delayed cerebral ischemia (DCI) risk over time. Methods: In this retrospective analysis, physiologic data for 540 consecutive acute subarachnoid hemorrhage patients were collected and annotated as part of a prospective observational cohort study between May 2006 and December 2014. Patients were excluded if (i) no physiologic data was available, (ii) they expired prior to the DCI onset window (< post bleed day 3) or (iii) early angiographic vasospasm was detected on admitting angiogram. DCI was prospectively labeled by consensus of treating physicians. Occurrence of DCI was classified using various machine learning approaches including logistic regression, random forest, support vector machine (linear and kernel), and an ensemble classifier, trained on vitals and subject characteristic features. Hourly risk scores were generated as the posterior probability at time t. We performed five-fold nested cross validation to tune the model parameters and to report the accuracy. All classifiers were evaluated for good discrimination using the area under the receiver operating characteristic curve (AU-ROC) and confusion matrices. Results: Of 310 patients included in our final analysis, 101 (32.6%) patients developed DCI. We achieved maximal classification of 0.81 [0.75-0.82] AU-ROC. We also predicted 74.7% of all DCI events 12 hours before typical clinical detection with a ratio of 3 true alerts for every 2 false alerts. Conclusion: A data-driven machine learning based detection tool offered hourly assessments of DCI risk and incorporated new physiologic information over time.
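
The AU-ROC used above to evaluate the classifiers can be read as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case, with ties counted as one half. A minimal sketch of that pairwise definition follows. The function name and the four-patient example are hypothetical, and the study does not state how it computed the metric.

```python
from typing import Sequence


def auc_from_scores(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    Counts, over every (positive, negative) pair, how often the positive
    case scores higher; ties contribute 0.5. This is quadratic in the
    number of cases, which is fine for illustration but slow at scale.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores for four patients (1 = event, 0 = no event).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]

auc = auc_from_scores(labels, scores)
print(auc)  # 3 of the 4 positive-negative pairs are ordered correctly
```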

Molecular Genetic Risk for Psychosis Is Associated With Psychosis Risk Symptoms in a Population-Based UK Cohort: Findings From Generation Scotland

Schizophrenia Bulletin ◽

10.1093/schbul/sbaa042 ◽

2020 ◽

Vol 46 (5) ◽

pp. 1045-1052

Author(s):

Anna R Docherty ◽

Andrey A Shabalin ◽

Daniel E Adkins ◽

Frank Mann ◽

Robert F Krueger ◽

...

Keyword(s):

Genetic Risk ◽

Molecular Genetic ◽

Large Population ◽

Genetic Data ◽

Population Based ◽

Health Study ◽

Polygenic Risk ◽

Generation Scotland ◽

Common Genetic Variants ◽

Psychosis Risk

Objective: Subthreshold psychosis risk symptoms in the general population may be associated with molecular genetic risk for psychosis. This study sought to optimize the association of risk symptoms with genetic risk for psychosis in a large population-based cohort in the UK (N = 9104 individuals 18–65 years of age) by properly accounting for population stratification, factor structure, and sex. Methods: The newly expanded Generation Scotland: Scottish Family Health Study includes 5391 females and 3713 males with age M [SD] = 45.2 [13] with both risk symptom data and genetic data. Subthreshold psychosis symptoms were measured using the Schizotypal Personality Questionnaire-Brief (SPQ-B) and calculation of polygenic risk for schizophrenia was based on 11 425 349 imputed common genetic variants passing quality control. Follow-up examination of other genetic risks included attention-deficit hyperactivity disorder (ADHD), autism, bipolar disorder, major depression, and neuroticism. Results: Empirically derived symptom factor scores reflected interpersonal/negative symptoms and were positively associated with polygenic risk for schizophrenia. This signal was largely sex specific and limited to males. Across both sexes, scores were positively associated with neuroticism and major depressive disorder. Conclusions: A data-driven phenotypic analysis enabled detection of association with genetic risk for schizophrenia in a population-based sample. Multiple polygenic risk signals and important sex differences suggest that genetic data may be useful in improving future phenotypic risk assessment.
