Cancer PRSweb – an Online Repository with Polygenic Risk Scores (PRS) for Major Cancer Traits and Their Phenome-wide Exploration in Two Independent Biobanks

Abstract: To facilitate scientific collaboration on polygenic risk scores (PRS) research, we created an extensive PRS online repository for 49 common cancer traits integrating freely available genome-wide association studies (GWAS) summary statistics from three sources: published GWAS, the NHGRI-EBI GWAS Catalog, and UK Biobank-based GWAS. Our framework condenses these summary statistics into PRS using various approaches such as linkage disequilibrium pruning / p-value thresholding (fixed or data-adaptively optimized thresholds) and penalized, genome-wide effect size weighting. We evaluated the PRS in two biobanks: the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort at Michigan Medicine, and the population-based UK Biobank (UKB). For each PRS construct, we provide measures of predictive performance, calibration, and discrimination. Besides PRS evaluation, the Cancer-PRSweb platform features construct downloads and phenome-wide PRS association study results (PRS-PheWAS) for predictive PRS. We expect this integrated platform to accelerate PRS-related cancer research.
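
The p-value thresholding step mentioned in this abstract can be illustrated with a minimal sketch. It is not the Cancer PRSweb pipeline: the SNP identifiers, effect sizes, p-values, and function names below are hypothetical, and linkage disequilibrium pruning is assumed to have been done beforehand. The score is simply the sum of effect-allele dosages weighted by GWAS effect sizes, restricted to SNPs that pass the threshold.

```python
from typing import Dict, List, Tuple

# Hypothetical summary statistics: SNP id -> (effect size beta, GWAS p-value).
# Assumption: the SNPs have already been LD-pruned.
SummaryStats = Dict[str, Tuple[float, float]]


def select_snps(stats: SummaryStats, p_threshold: float) -> List[str]:
    """Keep SNPs whose GWAS p-value is at or below the threshold."""
    return sorted(snp for snp, (_, p) in stats.items() if p <= p_threshold)


def polygenic_risk_score(dosages: Dict[str, float], stats: SummaryStats,
                         p_threshold: float) -> float:
    """Sum of effect-allele dosages (0 to 2) weighted by GWAS effect sizes."""
    return sum(stats[snp][0] * dosages.get(snp, 0.0)
               for snp in select_snps(stats, p_threshold))


# Toy data, invented for illustration only.
stats = {
    "rs1": (0.20, 1e-9),
    "rs2": (-0.10, 3e-4),
    "rs3": (0.05, 0.20),
}
person = {"rs1": 2, "rs2": 1, "rs3": 1}

strict = polygenic_risk_score(person, stats, p_threshold=5e-8)  # rs1 only
lenient = polygenic_risk_score(person, stats, p_threshold=1.0)  # all three SNPs
```

A fixed threshold corresponds to passing one `p_threshold` value; a data-adaptive threshold would instead be chosen by comparing the scores from several candidate values against an outcome.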


Fine-tuning Polygenic Risk Scores with GWAS Summary Statistics

10.1101/810713 ◽

2019 ◽

Cited By ~ 4

Author(s):

Zijie Zhao ◽

Yanyao Yi ◽

Yuchang Wu ◽

Xiaoyuan Zhong ◽

Yupei Lin ◽

...

Keyword(s):

Association Studies ◽

Fine Tuning ◽

Risk Scores ◽

Training Dataset ◽

Validation Dataset ◽

P Value ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Polygenic Risk ◽

Model Tuning

Abstract: Polygenic risk scores (PRSs) have wide applications in human genetics research. Notably, most PRS models include tuning parameters which improve predictive performance when properly selected. However, existing model-tuning methods require individual-level genetic data as the training dataset or as a validation dataset independent from both training and testing samples. These data rarely exist in practice, creating a significant gap between PRS methodology and applications. Here, we introduce PUMAS (Parameter-tuning Using Marginal Association Statistics), a novel method to fine-tune PRS models using summary statistics from genome-wide association studies (GWASs). Through extensive simulations, external validations, and analysis of 65 traits, we demonstrate that PUMAS can perform a variety of model-tuning procedures (e.g. cross-validation) using GWAS summary statistics and can effectively benchmark and optimize PRS models under diverse genetic architecture. On average, PUMAS improves the predictive R² by 205.6% and 62.5% compared to PRSs with arbitrary p-value cutoffs of 0.01 and 1, respectively. Applied to 211 neuroimaging traits and Alzheimer’s disease, we show that fine-tuned PRSs will significantly improve statistical power in downstream association analysis. We believe our method resolves a fundamental problem without a current solution and will greatly benefit genetic prediction applications.


Leveraging effect size distributions to improve polygenic risk scores derived from summary statistics of genome-wide association studies

PLoS Computational Biology ◽

10.1371/journal.pcbi.1007565 ◽

2020 ◽

Vol 16 (2) ◽

pp. e1007565 ◽

Cited By ~ 1

Author(s):

Shuang Song ◽

Wei Jiang ◽

Lin Hou ◽

Hongyu Zhao

Keyword(s):

Effect Size ◽

Association Studies ◽

Genome Wide Association ◽

Risk Scores ◽

Genome Wide Association Studies ◽

Size Distributions ◽

Summary Statistics ◽

Polygenic Risk ◽

Genome Wide


Polygenic transcriptome risk scores (PTRS) can improve portability of polygenic risk scores across ancestries

Genome Biology ◽

10.1186/s13059-021-02591-w ◽

2022 ◽

Vol 23 (1) ◽

Author(s):

Yanyu Liang ◽

Milton Pividori ◽

Ani Manichaikul ◽

Abraham A. Palmer ◽

Nancy J. Cox ◽

...

Keyword(s):

Association Studies ◽

Poor Performance ◽

Genome Wide Association ◽

European Ancestry ◽

Risk Scores ◽

Genome Wide Association Studies ◽

Uk Biobank ◽

Transcript Levels ◽

Polygenic Risk ◽

Genome Wide

Abstract Background: Polygenic risk scores (PRS) are valuable for translating the results of genome-wide association studies (GWAS) into clinical practice. To date, most GWAS have been based on individuals of European ancestry, leading to poor performance in populations of non-European ancestry. Results: We introduce the polygenic transcriptome risk score (PTRS), which is based on predicted transcript levels (rather than SNPs), and explore the portability of PTRS across populations using UK Biobank data. Conclusions: We show that PTRS has significantly higher portability (Wilcoxon p=0.013) in the African-descent samples, where the loss of performance is most acute, with better performance than PRS when used in combination.


Efficient implementation of penalized regression for genetic risk prediction

10.1101/403337 ◽

2018 ◽

Cited By ~ 1

Author(s):

Florian Privé ◽

Hugues Aschard ◽

Michael G.B. Blum

Keyword(s):

Logistic Regression ◽

Genetic Risk ◽

Association Studies ◽

Predictive Performance ◽

Penalized Regression ◽

Risk Scores ◽

P Value ◽

Genome Wide Association Studies ◽

Polygenic Risk ◽

Penalized Logistic Regression

Abstract: Polygenic risk scores (PRS) combine information across many single-nucleotide polymorphisms (SNPs) into a score reflecting the genetic risk of developing a disease. PRS might have a major impact on public health, possibly allowing for screening campaigns to identify individuals at high genetic risk for a given disease. The “Clumping+Thresholding” (C+T) approach is the most common method to derive PRS. C+T uses only univariate genome-wide association studies (GWAS) summary statistics, which makes it fast and easy to use. However, previous work showed that jointly estimating SNP effects for computing PRS has the potential to significantly improve the predictive performance of PRS as compared to C+T. In this paper, we present an efficient method to jointly estimate SNP effects, allowing for practical application of penalized logistic regression (PLR) on modern datasets including hundreds of thousands of individuals. Moreover, our implementation of PLR directly includes automatic choices for hyper-parameters. The choice of hyper-parameters for a predictive model is very important since it can dramatically impact its predictive performance. As an example, AUC values range from less than 60% to 90% in a model with 30 causal SNPs, depending on the p-value threshold in C+T. We compare the performance of PLR, C+T and a derivation of random forests using both real and simulated data. PLR consistently achieves higher predictive performance than the two other methods while being as fast as C+T. We find that improvement in predictive performance is more pronounced when there are few effects located in nearby genomic regions with correlated SNPs; for instance, AUC values increase from 83% with the best prediction of C+T to 92.5% with PLR. We confirm these results in a data analysis of a case-control study for celiac disease where PLR and the standard C+T method achieve AUC of 89% and 82.5%, respectively. In conclusion, our study demonstrates that penalized logistic regression can achieve more discriminative polygenic risk scores, while being applicable to large-scale individual-level data thanks to the implementation we provide in the R package bigstatsr.
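
The AUC values quoted in this abstract measure discrimination: the probability that a randomly chosen case has a higher score than a randomly chosen control. The sketch below computes that quantity directly from pairwise comparisons. It is a plain-Python illustration, not code from bigstatsr, and the scores and labels are invented.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    Labels are 1 for cases and 0 for controls.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy risk scores (hypothetical): two cases and three controls.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
toy_auc = auc(scores, labels)  # 5 of the 6 case-control pairs are ordered correctly
```

The pairwise loop is quadratic in sample size, which is acceptable for a small illustration; a rank-based formulation would be preferable for datasets of the size discussed in the abstract.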


S173. GENOME-WIDE ASSOCIATION STUDIES OF SCHIZOPHRENIA AND BIPOLAR DISORDER IN A DIVERSE COHORT OF US VETERANS

Schizophrenia Bulletin ◽

10.1093/schbul/sbaa031.239 ◽

2020 ◽

Vol 46 (Supplement_1) ◽

pp. S103-S103

Author(s):

Tim Bigdeli ◽

Ayman Fanous ◽

Nallakkandi Rajeevan ◽

Frederick Sayward ◽

Yuli Li ◽

...

Keyword(s):

Bipolar Disorder ◽

Association Studies ◽

Meta Analysis ◽

Genome Wide Association ◽

Risk Scores ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Polygenic Risk ◽

Genome Wide ◽

Us Veterans

Abstract Background: Schizophrenia and bipolar disorder are debilitating neuropsychiatric illnesses that collectively affect 2% of the world’s population and cause tremendous human suffering for patients, their families and their communities. Recognizing the major impact of these disorders on the psychosocial function of more than 200,000 US Veterans, the Department of Veterans Affairs (VA) recently undertook genotyping of nearly 9,000 veterans with schizophrenia or bipolar I disorder in Cooperative Studies Program (CSP) #572: “Genetics of Functional Disability in Schizophrenia and Bipolar Illness”, all of whom were extensively assessed for neurocognitive function and disability, and genotyped using a custom Affymetrix Axiom Biobank array. Methods: Primary genome-wide association studies (GWAS) of schizophrenia and bipolar disorder were performed across and within ancestry groups, with attempted replication in matched subjects from the PGC and Genomic Psychiatry Cohort (GPC). We combined results for CSP#572 with available summary statistics from the PGC, Indonesia Schizophrenia Consortium and Genetic REsearch on schizophreniA neTwork-China and Netherland (GREAT-CN) study, and multi-ethnic GPC cohorts, achieving among the largest and most diverse studies of these disorders to date. Results: Polygenic risk scores based on published PGC summary statistics for schizophrenia or bipolar disorder were significantly associated with case status among European-American (EA) (P<1e-30) and African-American (AA) (P<0.0005) participants in CSP#572. Our primary analyses of schizophrenia yielded a single genome-wide significant association with variants in CHD7 at 8q12.2 for EA participants, which remained significant in a joint analysis of EA and AA subjects (P=4.62e-08). While no genome-wide significant associations were detected by our within-ancestry analyses of bipolar disorder, a cross-ancestry meta-analysis of CSP#572 participants yielded a significant finding at 10q25 with variants in SORCS3 (P=2.62e-08). Among loci attaining P<0.0001 in our within-ancestry analyses, 4 and 8 subsequently achieved genome-wide significance, respectively, when jointly analyzed with matched subjects from the PGC and GPC. Combining our results with published summary statistics, we performed a cross-ancestry GWAS meta-analysis of 69,280 schizophrenia cases and 138,379 controls, identifying 200 genome-wide significant loci of which 76 are newly reported here. Cross-ancestry analysis of 28,326 bipolar cases and 90,570 controls identified 24 genome-wide significant loci, including novel associations with common variants in PAX5, DOCK2, MACROD2, BRE, KCNG1, and LINC01378. Discussion: We newly describe genome-wide analyses in a diverse cohort of US Veterans with schizophrenia or bipolar disorder, benchmarking the predictive value of polygenic risk scores based on published GWAS findings. Leveraging available summary statistics from studies of global populations, we add to burgeoning lists of genomic loci implicated in the etiologies of these disorders.


A meta-analysis of polygenic risk scores for mood disorders, neuroticism, and schizophrenia in antidepressant response

10.1101/2021.05.28.21257812 ◽

2021 ◽

Author(s):

Giuseppe Fanelli ◽

Katharina Domschke ◽

Alessandra Minelli ◽

Massimo Gennarelli ◽

Paolo Martini ◽

...

Keyword(s):

Antidepressant Treatment ◽

Association Studies ◽

Meta Analysis ◽

Predictive Performance ◽

Risk Scores ◽

Clinical Samples ◽

Genome Wide Association Studies ◽

Bonferroni Correction ◽

Polygenic Risk ◽

Genome Wide

About two-thirds of patients with major depressive disorder (MDD) fail to achieve symptom remission after the initial antidepressant treatment. Although a role of genetic factors has been demonstrated, the specific underpinnings are not yet fully understood. Polygenic risk scores (PRSs), which summarise the additive effect of multiple risk variants across the genome, might provide insights into the underlying genetics. This study aims to investigate the possible association of PRSs for bipolar disorder, MDD, neuroticism, and schizophrenia (SCZ) with antidepressant non-response or non-remission in patients with MDD. PRSs were calculated at eight genome-wide P-thresholds based on publicly available summary statistics of the largest genome-wide association studies. Logistic regressions were performed between PRSs and non-response or non-remission in six European clinical samples, adjusting for age, sex, baseline symptom severity, recruitment sites, and population stratification. Results were meta-analysed across samples, including up to 3,637 individuals. Bonferroni correction was applied. In the meta-analysis, no result was significant after Bonferroni correction. The top result was found for MDD-PRS and non-remission (p=0.004), with patients in the highest vs. lowest PRS quintile being more likely not to achieve remission (OR=1.5, 95% CI=1.11-1.98, p=0.007). Nominal associations were also found between MDD-PRS and non-response (p=0.013), as well as between SCZ-PRS and non-remission (p=0.035). Although PRSs are still not able to predict non-response or non-remission, our results are in line with previous work; methodological improvements in PRS calculation may improve their predictive performance and give them a meaningful role in precision psychiatry.
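
The gap between nominal and Bonferroni-corrected significance reported here can be made concrete with a short sketch. The abstract does not state how many tests were counted, so the figure of 64 below (4 PRS traits × 2 outcomes × 8 P-thresholds) is an assumption made only for illustration; the three p-values are the ones quoted in the abstract, and the function name is hypothetical.

```python
from typing import Dict


def bonferroni_significant(p_values: Dict[str, float], n_tests: int,
                           alpha: float = 0.05) -> Dict[str, bool]:
    """Flag each p-value as significant if it is below alpha / n_tests."""
    threshold = alpha / n_tests
    return {name: p < threshold for name, p in p_values.items()}


# Nominal p-values quoted in the abstract.
reported = {
    "MDD-PRS vs non-remission": 0.004,
    "MDD-PRS vs non-response": 0.013,
    "SCZ-PRS vs non-remission": 0.035,
}

# Assumption, not stated in the abstract: 4 traits x 2 outcomes x 8 thresholds.
N_TESTS = 4 * 2 * 8

result = bonferroni_significant(reported, N_TESTS)
corrected_threshold = 0.05 / N_TESTS  # 0.00078125 under the assumed test count
```

Under this assumed test count, all three associations fall short of the corrected threshold, which matches the abstract's statement that no result survived correction.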


Reproducibility in the UK Biobank of Genome-Wide Significant Signals Discovered in Earlier Genome-wide Association Studies

10.1101/2020.06.24.20139576 ◽

2020 ◽

Cited By ~ 1

Author(s):

Jack W. O’Sullivan ◽

John P. A. Ioannidis

Keyword(s):

Effect Size ◽

Association Studies ◽

Genome Wide Association ◽

P Value ◽

Genome Wide Association Studies ◽

Uk Biobank ◽

Single Nucleotide ◽

Genome Wide ◽

The Uk ◽

Open Question

Abstract: With the establishment of large biobanks, discovery of single nucleotide polymorphisms (SNPs) that are associated with various phenotypes has been accelerated. An open question is whether SNPs identified with genome-wide significance in earlier genome-wide association studies (GWAS) are also replicated in later GWAS conducted in biobanks. To address this question, the authors examined a publicly available GWAS database and identified two independent GWAS on the same phenotype (an earlier “discovery” GWAS and a later replication GWAS done in the UK Biobank). The analysis evaluated 136,318,924 SNPs (of which 6,289 had reached p<5e-8 in the discovery GWAS) from 4,397,962 participants across nine phenotypes. The overall replication rate was 85.0%, and it was lower for binary than for quantitative phenotypes (58.1% versus 94.8%, respectively). There was an 18.0% decrease in SNP effect size for binary phenotypes, but a 12.0% increase for quantitative phenotypes. Using the discovery SNP effect size, phenotype trait (binary or quantitative), and discovery p-value, we built and validated a model that predicted SNP replication with area under the receiver operating characteristic curve = 0.90. While non-replication may often reflect lack of power rather than genuine false-positive findings, these results provide insights about which discovered associations are likely to be seen again across subsequent GWAS.


Age-related late-onset disease heritability patterns and implications for genome-wide association studies

10.1101/349019 ◽

2018 ◽

Cited By ~ 1

Author(s):

Roman Teo Oliynyk

Keyword(s):

Old Age ◽

Cumulative Incidence ◽

Late Onset ◽

Association Studies ◽

Genome Wide Association ◽

Risk Scores ◽

Cerebral Stroke ◽

Genome Wide Association Studies ◽

Polygenic Risk ◽

Genome Wide

Abstract Background: Genome-wide association studies and other computational biology techniques are gradually discovering the causal gene variants that contribute to late-onset human diseases. After more than a decade of genome-wide association study efforts, these can account for only a fraction of the heritability implied by familial studies, the so-called “missing heritability” problem. Methods: Computer simulations of polygenic late-onset diseases in an aging population have quantified the risk allele frequency decrease at older ages caused by individuals with higher polygenic risk scores becoming ill proportionately earlier. This effect is most prominent for diseases characterized by high cumulative incidence and high heritability, examples of which include Alzheimer’s disease, coronary artery disease, cerebral stroke, and type 2 diabetes. Results: The incidence rate for late-onset diseases grows exponentially for decades after early onset ages, guaranteeing that the cohorts used for genome-wide association studies overrepresent older individuals with lower polygenic risk scores, whose disease cases are disproportionately due to environmental causes such as old age itself. This mechanism explains the decline in clinical predictive power with age and the lower discovery power of familial studies of heritability and genome-wide association studies. It also explains the relatively constant-with-age heritability found for late-onset diseases of lower prevalence, exemplified by cancers. Conclusions: For late-onset polygenic diseases showing high cumulative incidence together with high initial heritability, rather than using relatively old age-matched cohorts, study cohorts combining the youngest possible cases with the oldest possible controls may significantly improve the discovery power of genome-wide association studies.


Polygenic Risk Scores for Kidney Function and Their Associations with Circulating Proteome, and Incident Kidney Diseases

Journal of the American Society of Nephrology ◽

10.1681/asn.2020111599 ◽

2021 ◽

pp. ASN.2020111599

Author(s):

Zhi Yu ◽

Jin Jin ◽

Adrienne Tin ◽

Anna Köttgen ◽

Bing Yu ◽

...

Keyword(s):

Kidney Function ◽

Kidney Diseases ◽

Association Studies ◽

Polygenic Risk Score ◽

Genome Wide Association Studies ◽

Plasma Proteome ◽

Uk Biobank ◽

Polygenic Risk ◽

Genome Wide ◽

A Genome

Background: Genome-wide association studies (GWAS) have revealed numerous loci for kidney function (estimated glomerular filtration rate, eGFR). The relationship of polygenic predictors of eGFR, risk of incident adverse kidney outcomes, and the plasma proteome is not known. Methods: We developed a genome-wide polygenic risk score (PRS) for eGFR by applying the LDpred algorithm to summary statistics generated from a multiethnic meta-analysis of CKDGen Consortium GWAS (N=765,348) and UK Biobank GWAS (90% of the cohort; N=451,508), followed by best parameter selection using the remaining 10% of UK Biobank (N=45,158). We then tested the association of the PRS in the Atherosclerosis Risk in Communities (ARIC) study (N=8,866) with incident chronic kidney disease, kidney failure, and acute kidney injury. We also examined associations between the PRS and 4,877 plasma proteins measured at middle age and older adulthood and evaluated mediation of PRS associations by eGFR. Results: The developed PRS showed significant associations with all outcomes, with hazard ratios (95% CI) per 1 SD lower PRS ranging from 1.06 (1.01, 1.11) to 1.33 (1.28, 1.37). The PRS was significantly associated with 132 proteins at both time points. The strongest associations were with cystatin-C, collagen alpha-1(XV) chain, and desmocollin-2. Most proteins were higher at lower kidney function, except for 5 proteins including testican-2. Most correlations of the genetic PRS with proteins were mediated by eGFR. Conclusions: A PRS for eGFR is now sufficiently strong to capture risk for a spectrum of incident kidney diseases and broadly influences the plasma proteome, primarily mediated by eGFR.
