Fine-tuning Polygenic Risk Scores with GWAS Summary Statistics

2019
Author(s):
Zijie Zhao
Yanyao Yi
Yuchang Wu
Xiaoyuan Zhong
Yupei Lin
...  

Abstract Polygenic risk scores (PRSs) have wide applications in human genetics research. Notably, most PRS models include tuning parameters which improve predictive performance when properly selected. However, existing model-tuning methods require individual-level genetic data as the training dataset or as a validation dataset independent from both training and testing samples. These data rarely exist in practice, creating a significant gap between PRS methodology and applications. Here, we introduce PUMAS (Parameter-tuning Using Marginal Association Statistics), a novel method to fine-tune PRS models using summary statistics from genome-wide association studies (GWASs). Through extensive simulations, external validations, and analysis of 65 traits, we demonstrate that PUMAS can perform a variety of model-tuning procedures (e.g., cross-validation) using GWAS summary statistics and can effectively benchmark and optimize PRS models under diverse genetic architecture. On average, PUMAS improves the predictive R² by 205.6% and 62.5% compared to PRSs with arbitrary p-value cutoffs of 0.01 and 1, respectively. Applied to 211 neuroimaging traits and Alzheimer’s disease, we show that fine-tuned PRSs will significantly improve statistical power in downstream association analysis. We believe our method resolves a fundamental problem without a current solution and will greatly benefit genetic prediction applications.
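
For orientation, the conventional tuning step that the abstract says requires individual-level validation data can be sketched as a grid search over p-value cutoffs. This is a minimal illustration only, not the PUMAS procedure (which works from summary statistics alone); the function name `tune_p_cutoff`, its arguments, and the use of squared correlation as the predictive R² are assumptions made for this example.

```python
import numpy as np

def tune_p_cutoff(genotypes, phenotype, betas, pvalues, cutoffs):
    """Hypothetical helper: pick the p-value cutoff with the best validation R^2.

    genotypes: (n_individuals, n_snps) allele counts from a validation sample
    phenotype: (n_individuals,) observed trait values in that sample
    betas, pvalues: (n_snps,) GWAS effect sizes and p-values
    cutoffs: iterable of candidate p-value cutoffs
    """
    r2_by_cutoff = {}
    for cutoff in cutoffs:
        selected = pvalues <= cutoff
        if not selected.any():
            # No SNP passes the cutoff, so the score carries no information.
            r2_by_cutoff[cutoff] = 0.0
            continue
        # PRS = sum of allele counts weighted by GWAS effect sizes.
        score = genotypes[:, selected] @ betas[selected]
        if np.std(score) == 0:
            r2_by_cutoff[cutoff] = 0.0
            continue
        # Squared Pearson correlation between score and trait.
        r2_by_cutoff[cutoff] = np.corrcoef(score, phenotype)[0, 1] ** 2
    best = max(r2_by_cutoff, key=r2_by_cutoff.get)
    return best, r2_by_cutoff
```

The sketch makes the data requirement explicit: `genotypes` and `phenotype` must come from individuals who were not part of the GWAS, which is the resource the abstract describes as rarely available.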

Author(s):  
Lars G. Fritsche
Snehal Patil
Lauren J. Beesley
Peter VandeHaar
Maxwell Salvatore
...  

Abstract To facilitate scientific collaboration on polygenic risk scores (PRS) research, we created an extensive PRS online repository for 49 common cancer traits integrating freely available genome-wide association studies (GWAS) summary statistics from three sources: published GWAS, the NHGRI-EBI GWAS Catalog, and UK Biobank-based GWAS. Our framework condenses these summary statistics into PRS using various approaches such as linkage disequilibrium pruning / p-value thresholding (fixed or data-adaptively optimized thresholds) and penalized, genome-wide effect size weighting. We evaluated the PRS in two biobanks: the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort at Michigan Medicine, and the population-based UK Biobank (UKB). For each PRS construct, we provide measures on predictive performance, calibration, and discrimination. Besides PRS evaluation, the Cancer-PRSweb platform features construct downloads and phenome-wide PRS association study results (PRS-PheWAS) for predictive PRS. We expect this integrated platform to accelerate PRS-related cancer research.


2021
Vol 22 (1)
Author(s):
Zijie Zhao
Yanyao Yi
Jie Song
Yuchang Wu
Xiaoyuan Zhong
...  

Abstract Polygenic risk scores (PRSs) have wide applications in human genetics research, but often include tuning parameters which are difficult to optimize in practice due to limited access to individual-level data. Here, we introduce PUMAS, a novel method to fine-tune PRS models using summary statistics from genome-wide association studies (GWASs). Through extensive simulations, external validations, and analysis of 65 traits, we demonstrate that PUMAS can perform various model-tuning procedures using GWAS summary statistics and effectively benchmark and optimize PRS models under diverse genetic architecture. Furthermore, we show that fine-tuned PRSs will significantly improve statistical power in downstream association analysis.


2018
Author(s):
Florian Privé
Hugues Aschard
Michael G.B. Blum

Abstract Polygenic Risk Scores (PRS) combine information across many single-nucleotide polymorphisms (SNPs) into a score reflecting the genetic risk of developing a disease. PRS might have a major impact on public health, possibly allowing for screening campaigns to identify high-genetic risk individuals for a given disease. The “Clumping+Thresholding” (C+T) approach is the most common method to derive PRS. C+T uses only univariate genome-wide association studies (GWAS) summary statistics, which makes it fast and easy to use. However, previous work showed that jointly estimating SNP effects for computing PRS has the potential to significantly improve the predictive performance of PRS as compared to C+T. In this paper, we present an efficient method to jointly estimate SNP effects, allowing for practical application of penalized logistic regression (PLR) on modern datasets including hundreds of thousands of individuals. Moreover, our implementation of PLR directly includes automatic choices for hyper-parameters. The choice of hyper-parameters for a predictive model is very important since it can dramatically impact its predictive performance. As an example, AUC values range from less than 60% to 90% in a model with 30 causal SNPs, depending on the p-value threshold in C+T. We compare the performance of PLR, C+T and a derivation of random forests using both real and simulated data. PLR consistently achieves higher predictive performance than the two other methods while being as fast as C+T. We find that improvement in predictive performance is more pronounced when there are few effects located in nearby genomic regions with correlated SNPs; for instance, AUC values increase from 83% with the best prediction of C+T to 92.5% with PLR. We confirm these results in a data analysis of a case-control study for celiac disease where PLR and the standard C+T method achieve AUCs of 89% and 82.5%, respectively. In conclusion, our study demonstrates that penalized logistic regression can achieve more discriminative polygenic risk scores, while being applicable to large-scale individual-level data thanks to the implementation we provide in the R package bigstatsr.
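
As a rough illustration of the C+T idea named in this abstract, the sketch below thresholds SNPs on their GWAS p-value and then greedily clumps the survivors by squared genotype correlation. It is a simplified stand-in rather than the authors' implementation: it ignores genomic distance windows, and the function names `clump_and_threshold` and `polygenic_score` and the default cutoffs are assumptions chosen for this example.

```python
import numpy as np

def clump_and_threshold(genotypes, pvalues, p_cutoff=0.05, r2_cutoff=0.2):
    """Hypothetical helper: indices of SNPs kept after thresholding and clumping.

    genotypes: (n_individuals, n_snps) allele counts used to estimate LD
    pvalues: (n_snps,) univariate GWAS p-values
    """
    # Thresholding: keep SNPs whose p-value passes the cutoff.
    candidates = np.flatnonzero(pvalues <= p_cutoff)
    # Visit candidates from most to least significant.
    candidates = candidates[np.argsort(pvalues[candidates], kind="stable")]
    kept = []
    for j in candidates:
        # Clumping: drop SNP j if it is strongly correlated with a kept SNP.
        in_ld = any(
            np.corrcoef(genotypes[:, j], genotypes[:, k])[0, 1] ** 2 > r2_cutoff
            for k in kept
        )
        if not in_ld:
            kept.append(j)
    return np.array(kept, dtype=int)

def polygenic_score(genotypes, betas, kept):
    """Weighted sum of allele counts over the retained SNPs."""
    return genotypes[:, kept] @ betas[kept]
```

Both `p_cutoff` and `r2_cutoff` are the kind of hyper-parameters the abstract refers to: changing them changes which SNPs enter the score, and therefore its predictive performance.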


2020
Vol 46 (Supplement_1)
pp. S103-S103
Author(s):
Tim Bigdeli
Ayman Fanous
Nallakkandi Rajeevan
Frederick Sayward
Yuli Li
...  

Abstract Background Schizophrenia and bipolar disorder are debilitating neuropsychiatric illnesses that collectively affect 2% of the world’s population and cause tremendous suffering for patients, their families and their communities. Recognizing the major impact of these disorders on the psychosocial function of more than 200,000 US Veterans, the Department of Veterans Affairs (VA) recently completed genotyping of nearly 9,000 veterans with schizophrenia or bipolar I disorder in Cooperative Studies Program (CSP) #572: “Genetics of Functional Disability in Schizophrenia and Bipolar Illness”, all of whom were extensively assessed for neurocognitive function and disability, and genotyped using a custom Affymetrix Axiom Biobank array. Methods Primary genome-wide association studies (GWAS) of schizophrenia and bipolar disorder were performed across and within ancestry groups, with attempted replication in matched subjects from the PGC and Genomic Psychiatry Cohort (GPC). We combined results for CSP#572 with available summary statistics from the PGC, Indonesia Schizophrenia Consortium and Genetic REsearch on schizophreniA neTwork-China and Netherland (GREAT-CN) study, and multi-ethnic GPC cohorts, achieving among the largest and most diverse studies of these disorders to date. Results Polygenic risk scores based on published PGC summary statistics for schizophrenia or bipolar disorder were significantly associated with case status among European-American (EA; P < 10⁻³⁰) and African-American (AA; P < 0.0005) participants in CSP#572. Our primary analyses of schizophrenia yielded a single genome-wide significant association with variants in CHD7 at 8q12.2 for EA participants, which remained significant in a joint analysis of EA and AA subjects (P=4.62e-08). 
While no genome-wide significant associations were detected by our within-ancestry analyses of bipolar disorder, a cross-ancestry meta-analysis of CSP#572 participants yielded a significant finding at 10q25 with variants in SORCS3 (P=2.62e-08). Among loci attaining P<0.0001 in our within-ancestry analyses, 4 and 8 subsequently achieved genome-wide significance, respectively, when jointly analyzed with matched subjects from the PGC and GPC. Combining our results with published summary statistics, we performed a cross-ancestry GWAS meta-analysis of 69,280 schizophrenia cases and 138,379 controls, identifying 200 genome-wide significant loci of which 76 are newly reported here. Cross-ancestry analysis of 28,326 bipolar cases and 90,570 controls identified 24 genome-wide significant loci, including novel associations with common variants in PAX5, DOCK2, MACROD2, BRE, KCNG1, and LINC01378. Discussion We newly describe genome-wide analyses in a diverse cohort of US Veterans with schizophrenia or bipolar disorder, benchmarking the predictive value of polygenic risk scores based on published GWAS findings. Leveraging available summary statistics from studies of global populations, we add to burgeoning lists of genomic loci implicated in the etiologies of these disorders.


Author(s):  
Niccolo’ Tesi
Sven J van der Lee
Marc Hulsman
Iris E Jansen
Najada Stringa
...  

Abstract Studying the genome of centenarians may give insights into the molecular mechanisms underlying extreme human longevity and the escape of age-related diseases. Here, we set out to construct polygenic risk scores (PRSs) for longevity and to investigate the functions of longevity-associated variants. Using a cohort of centenarians with maintained cognitive health (N = 343), a population-matched cohort of older adults from 5 cohorts (N = 2905), and summary statistics data from genome-wide association studies on parental longevity, we constructed a PRS including 330 variants that significantly discriminated between centenarians and older adults. This PRS was also associated with longer survival in an independent sample of younger individuals (p = .02), leading up to a 4-year difference in survival based on common genetic factors only. We show that this PRS was, in part, able to compensate for the deleterious effect of the APOE-ε4 allele. Using an integrative framework, we annotated the 330 variants included in this PRS by the genes they associate with. We find that they are enriched with genes associated with cellular differentiation, developmental processes, and cellular response to stress. Together, our results indicate that an extended human life span is, in part, the result of a constellation of variants each exerting small advantageous effects on aging-related biological mechanisms that maintain overall health and decrease the risk of age-related diseases.


2018
Author(s):
Tom G. Richardson
Sean Harrison
Gibran Hemani
George Davey Smith

Abstract The age of large-scale genome-wide association studies (GWAS) has provided us with an unprecedented opportunity to evaluate the genetic liability of complex disease using polygenic risk scores (PRS). In this study, we have analysed 162 PRS (P < 5×10⁻⁵) derived from GWAS and 551 heritable traits from the UK Biobank study (N=334,398). Findings can be investigated using a web application (http://mrcieu.mrsoftware.org/PRS_atlas/), which we envisage will help uncover both known and novel mechanisms which contribute towards disease susceptibility. To demonstrate this, we have investigated the results from a phenome-wide evaluation of schizophrenia genetic liability. Amongst the findings were inverse associations with measures of cognitive function, for which extensive follow-up analyses using Mendelian randomization (MR) provided evidence of a causal relationship. We have also investigated the effect of multiple risk factors on disease using mediation and multivariable MR frameworks. Our atlas provides a resource for future endeavours seeking to unravel the causal determinants of complex disease.


2018
Author(s):  
Roman Teo Oliynyk

Abstract Background Genome-wide association studies and other computational biology techniques are gradually discovering the causal gene variants that contribute to late-onset human diseases. After more than a decade of genome-wide association study efforts, these can account for only a fraction of the heritability implied by familial studies, the so-called “missing heritability” problem. Methods Computer simulations of polygenic late-onset diseases in an aging population have quantified the risk allele frequency decrease at older ages caused by individuals with higher polygenic risk scores becoming ill proportionately earlier. This effect is most prominent for diseases characterized by high cumulative incidence and high heritability, examples of which include Alzheimer’s disease, coronary artery disease, cerebral stroke, and type 2 diabetes. Results The incidence rate for late-onset diseases grows exponentially for decades after early onset ages, guaranteeing that the cohorts used for genome-wide association studies overrepresent older individuals with lower polygenic risk scores, whose disease cases are disproportionately due to environmental causes such as old age itself. This mechanism explains the decline in clinical predictive power with age and the lower discovery power of familial studies of heritability and genome-wide association studies. It also explains the relatively constant-with-age heritability found for late-onset diseases of lower prevalence, exemplified by cancers. Conclusions For late-onset polygenic diseases showing high cumulative incidence together with high initial heritability, rather than using relatively old age-matched cohorts, study cohorts combining the youngest possible cases with the oldest possible controls may significantly improve the discovery power of genome-wide association studies.


Author(s):  
Tim B Bigdeli
Ayman H Fanous
Yuli Li
Nallakkandi Rajeevan
Frederick Sayward
...  

Abstract Background Schizophrenia (SCZ) and bipolar disorder (BIP) are debilitating neuropsychiatric disorders, collectively affecting 2% of the world’s population. Recognizing the major impact of these psychiatric disorders on the psychosocial function of more than 200 000 US Veterans, the Department of Veterans Affairs (VA) recently completed genotyping of more than 8000 veterans with SCZ and BIP in the Cooperative Studies Program (CSP) #572. Methods We performed genome-wide association studies (GWAS) in CSP #572 and benchmarked the predictive value of polygenic risk scores (PRS) constructed from published findings. We combined our results with available summary statistics from several recent GWAS, realizing the largest and most diverse studies of these disorders to date. Results Our primary GWAS uncovered new associations between CHD7 variants and SCZ, and novel BIP associations with variants in Sortilin Related VPS10 Domain Containing Receptor 3 (SORCS3) and downstream of PCDH11X. Combining our results with published summary statistics for SCZ yielded 39 novel susceptibility loci including CRHR1, and we identified 10 additional findings for BIP (28 326 cases and 90 570 controls). PRS trained on published GWAS were significantly associated with case-control status among European American (P < 10⁻³⁰) and African American (P < .0005) participants in CSP #572. Conclusions We have demonstrated that published findings for SCZ and BIP are robustly generalizable to a diverse cohort of US veterans. Leveraging available summary statistics from GWAS of global populations, we report 52 new susceptibility loci and improved fine-mapping resolution for dozens of previously reported associations.

