A principal component approach to improve association testing with polygenic risk scores

2019 ◽  
Author(s):  
Brandon J. Coombes ◽  
Joanna M. Biernacka

Abstract Polygenic risk scores (PRSs) have become an increasingly popular approach for demonstrating polygenic influences on complex traits and for establishing common polygenic signals between different traits. PRSs are typically constructed using pruning and thresholding (P+T), but the best choice of parameters is uncertain; thus multiple settings are used and the best is chosen. This optimization can lead to inflated type I error. To correct this, permutation procedures can be used, but they can be computationally intensive. Alternatively, a single parameter setting can be chosen a priori for the PRS, but choosing suboptimal settings results in loss of power. We propose computing PRSs under a range of parameter settings, performing principal component analysis (PCA) on the resulting set of PRSs, and using the first PRS-PC in association tests. The first PC reweights the variants included in the PRS with new weights to achieve maximum variation over all PRS settings used. Using simulations, we compare the performance of the proposed PRS-PCA approach with a permutation test and a priori selection of p-value threshold. We then apply the approach to the Mayo Clinic Bipolar Disorder Biobank study to test for PRS association with psychosis using a variety of PRSs constructed from summary statistics from the largest studies of psychiatric disorders and related traits. The PRS-PCA approach is simple to implement, outperforms the other strategies in most scenarios, and provides an unbiased estimate of prediction performance. We therefore recommend its use in PRS association studies where multiple phenotypes and/or PRSs are being investigated.
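The PRS-PCA idea in this abstract (several PRSs per individual, one PCA, first component used as the predictor) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `first_prs_pc`, the column standardization, the sign convention, and the simulated data are all assumptions made for illustration only.

```python
import numpy as np

def first_prs_pc(prs_matrix):
    """Return (scores, loadings) of the first principal component of a
    samples-by-settings matrix of PRSs (hypothetical helper)."""
    X = np.asarray(prs_matrix, dtype=float)
    # Standardize each PRS column so no parameter setting dominates by scale.
    # Assumes every column has non-zero variance.
    X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized matrix; rows of vt are the PC loadings.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    loadings = vt[0]
    # The sign of a PC is arbitrary; orient it to agree with the average PRS.
    if loadings.sum() < 0:
        loadings = -loadings
    return X @ loadings, loadings

# Simulated data (an assumption): five PRSs built at five hypothetical
# p-value thresholds, each a shared signal plus threshold-specific noise.
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=n)
prs = np.column_stack(
    [latent + rng.normal(scale=s, size=n) for s in (0.5, 0.8, 1.0, 1.5, 2.0)]
)
pc1, loadings = first_prs_pc(prs)
```

The resulting `pc1` vector would then take the place of a single PRS as the predictor in an ordinary association test, so only one test is run regardless of how many parameter settings were tried.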

2018 ◽  
Author(s):  
Tao He ◽  
Shaoyu Li ◽  
Ping-Shou Zhong ◽  
Yuehua Cui

ABSTRACT Single-variant based genome-wide association studies have successfully detected many genetic variants that are associated with many complex traits. However, their power is limited due to weak marginal signals and ignoring potential complex interactions among genetic variants. A set-based strategy was proposed to provide a remedy where multiple genetic variants in a given set (e.g., gene or pathway) are jointly evaluated, so that the systematic effect of the set is considered. Among many, the kernel-based testing (KBT) framework is one of the most popular and powerful methods in set-based association studies. Given a set of candidate kernels, a method has been proposed to choose the one with the smallest p-value. Such a method, however, can yield inflated type I error, especially when the number of variants in a set is large. Alternatively, one can get p-values by permutations, which, however, could be very time consuming. In this work, we proposed an efficient testing procedure that can not only control type I error rate but also generate power close to the one obtained under the optimal kernel. Our method is built upon the KBT framework and is based on asymptotic results under a high-dimensional setting. Hence it can efficiently deal with the case where the number of variants in a set is much larger than the sample size. Both simulation and real data analysis demonstrate the advantages of the method compared with its counterparts.


2019 ◽  
Author(s):  
Zijie Zhao ◽  
Yanyao Yi ◽  
Yuchang Wu ◽  
Xiaoyuan Zhong ◽  
Yupei Lin ◽  
...  

Abstract Polygenic risk scores (PRSs) have wide applications in human genetics research. Notably, most PRS models include tuning parameters which improve predictive performance when properly selected. However, existing model-tuning methods require individual-level genetic data as the training dataset or as a validation dataset independent from both training and testing samples. These data rarely exist in practice, creating a significant gap between PRS methodology and applications. Here, we introduce PUMAS (Parameter-tuning Using Marginal Association Statistics), a novel method to fine-tune PRS models using summary statistics from genome-wide association studies (GWASs). Through extensive simulations, external validations, and analysis of 65 traits, we demonstrate that PUMAS can perform a variety of model-tuning procedures (e.g. cross-validation) using GWAS summary statistics and can effectively benchmark and optimize PRS models under diverse genetic architecture. On average, PUMAS improves the predictive R² by 205.6% and 62.5% compared to PRSs with arbitrary p-value cutoffs of 0.01 and 1, respectively. Applied to 211 neuroimaging traits and Alzheimer’s disease, we show that fine-tuned PRSs will significantly improve statistical power in downstream association analysis. We believe our method resolves a fundamental problem without a current solution and will greatly benefit genetic prediction applications.


2020 ◽  
Author(s):  
Jiawen Chen ◽  
Jing You ◽  
Zijie Zhao ◽  
Zheng Ni ◽  
Kunling Huang ◽  
...  

Abstract Polygenic risk scores (PRS) derived from summary statistics of genome-wide association studies (GWAS) have enjoyed great popularity in human genetics research. Applied to population cohorts, PRS can effectively stratify individuals by risk group and has promising applications in early diagnosis and clinical intervention. However, our understanding of within-family polygenic risk is incomplete, in part because the small samples per family significantly limit power. Here, to address this challenge, we introduce ORIGAMI, a computational framework that uses parental genotype data to simulate offspring genomes. ORIGAMI uses state-of-the-art genetic maps to simulate realistic recombination events on phased parental genomes and allows quantifying the prospective PRS variability within each family. We quantify and showcase the substantially reduced yet highly heterogeneous PRS variation within families for numerous complex traits. Further, we incorporate within-family PRS variability to improve polygenic transmission disequilibrium test (pTDT). Through simulations, we demonstrate that modeling within-family risk substantially improves the statistical power of pTDT. Applied to 7,805 trios of autism spectrum disorder (ASD) probands and healthy parents, we successfully replicated previously reported over-transmission of ASD, educational attainment, and schizophrenia risk, and identified multiple novel traits with significant transmission disequilibrium. These results provided novel etiologic insights into the shared genetic basis of various complex traits and ASD.


2018 ◽  
Author(s):  
Florian Privé ◽  
Hugues Aschard ◽  
Michael G.B. Blum

Abstract Polygenic Risk Scores (PRS) consist of combining the information across many single-nucleotide polymorphisms (SNPs) in a score reflecting the genetic risk of developing a disease. PRS might have a major impact on public health, possibly allowing for screening campaigns to identify high-genetic risk individuals for a given disease. The “Clumping+Thresholding” (C+T) approach is the most common method to derive PRS. C+T uses only univariate genome-wide association studies (GWAS) summary statistics, which makes it fast and easy to use. However, previous work showed that jointly estimating SNP effects for computing PRS has the potential to significantly improve the predictive performance of PRS as compared to C+T. In this paper, we present an efficient method to jointly estimate SNP effects, allowing for practical application of penalized logistic regression (PLR) on modern datasets including hundreds of thousands of individuals. Moreover, our implementation of PLR directly includes automatic choices for hyper-parameters. The choice of hyper-parameters for a predictive model is very important since it can dramatically impact its predictive performance. As an example, AUC values range from less than 60% to 90% in a model with 30 causal SNPs, depending on the p-value threshold in C+T. We compare the performance of PLR, C+T and a derivation of random forests using both real and simulated data. PLR consistently achieves higher predictive performance than the two other methods while being as fast as C+T. We find that improvement in predictive performance is more pronounced when there are few effects located in nearby genomic regions with correlated SNPs; for instance, AUC values increase from 83% with the best prediction of C+T to 92.5% with PLR.
We confirm these results in a data analysis of a case-control study for celiac disease where PLR and the standard C+T method achieve AUC of 89% and of 82.5%. In conclusion, our study demonstrates that penalized logistic regression can achieve more discriminative polygenic risk scores, while being applicable to large-scale individual-level data thanks to the implementation we provide in the R package bigstatsr.


Author(s):  
Lars G. Fritsche ◽  
Snehal Patil ◽  
Lauren J. Beesley ◽  
Peter VandeHaar ◽  
Maxwell Salvatore ◽  
...  

Abstract To facilitate scientific collaboration on polygenic risk scores (PRS) research, we created an extensive PRS online repository for 49 common cancer traits integrating freely available genome-wide association studies (GWAS) summary statistics from three sources: published GWAS, the NHGRI-EBI GWAS Catalog, and UK Biobank-based GWAS. Our framework condenses these summary statistics into PRS using various approaches such as linkage disequilibrium pruning / p-value thresholding (fixed or data-adaptively optimized thresholds) and penalized, genome-wide effect size weighting. We evaluated the PRS in two biobanks: the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort at Michigan Medicine, and the population-based UK Biobank (UKB). For each PRS construct, we provide measures on predictive performance, calibration, and discrimination. Besides PRS evaluation, the Cancer-PRSweb platform features construct downloads and phenome-wide PRS association study results (PRS-PheWAS) for predictive PRS. We expect this integrated platform to accelerate PRS-related cancer research.


Genetics ◽  
2001 ◽  
Vol 157 (2) ◽  
pp. 885-897 ◽  
Author(s):  
Hong-Wen Deng ◽  
Wei-Min Chen ◽  
Robert R Recker

Abstract In association studies searching for genes underlying complex traits, the results are often inconsistent, and population admixture has been recognized qualitatively as one major potential cause. Hardy-Weinberg equilibrium (HWE) is often employed to test for population admixture; however, its power is generally unknown. Through analytical and simulation approaches, we quantify the power of the HWE test for population admixture and the effects of population admixture on increasing the type I error rate of association studies under various scenarios of population differentiation and admixture. We found that (1) the power of the HWE test for detecting population admixture is usually small; (2) population admixture seriously elevates type I error rate for detecting genes underlying complex traits, the extent of which depends on the degrees of population differentiation and admixture; (3) HWE testing for population admixture should be performed with random samples or only with controls at the candidate genes, or the test can be performed for combined samples of cases and controls at marker loci that are not linked to the disease; (4) testing HWE for population admixture generally reduces false positive association findings of genes underlying complex traits but the effect is small; and (5) with population admixture, a linkage disequilibrium method that employs cases only is more robust and yields many fewer false positive findings than conventional case-control analyses. Therefore, unless random samples are carefully selected from one homogeneous population, admixture is always a legitimate concern for positive findings in association studies except for the analyses that deliberately control population admixture.
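For readers unfamiliar with the HWE test this abstract evaluates, the following is a minimal sketch of the standard one-degree-of-freedom chi-square goodness-of-fit version for a biallelic locus. It is not taken from the paper: the function name `hwe_chi_square` and the genotype counts are hypothetical, and the code assumes both alleles are observed so that no expected count is zero.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium from genotype counts.
    Returns (statistic, p_value) using 1 degree of freedom."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Counts exactly at HWE proportions for p = 0.5.
chi2_eq, p_eq = hwe_chi_square(100, 200, 100)

# Pooling two equally sized subpopulations (200 each) with allele
# frequencies 0.1 and 0.9, each at HWE internally, gives a heterozygote
# deficit in the combined sample (the Wahlund effect).
chi2_mix, p_mix = hwe_chi_square(2 + 162, 36 + 36, 162 + 2)
```

The second call uses deliberately extreme allele-frequency differences, so the departure from HWE is large; with modest differentiation between subpopulations the statistic is much smaller, which is in line with the abstract's point that the test usually has little power to detect admixture.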


Author(s):  
Qing Cheng ◽  
Tingting Qiu ◽  
Xiaoran Chai ◽  
Baoluo Sun ◽  
Yingcun Xia ◽  
...  

Abstract Motivation Mendelian randomization (MR) is a valuable tool to examine the causal relationships between health risk factors and outcomes from observational studies. Along with the proliferation of genome-wide association studies, a variety of two-sample MR methods for summary data have been developed to account for horizontal pleiotropy (HP), primarily based on the assumption that the effects of variants on exposure (γ) and HP (α) are independent. In practice, this assumption is too strict and can be easily violated because of the correlated HP. Results To account for this correlated HP, we propose a Bayesian approach, MR-Corr2, that uses the orthogonal projection to reparameterize the bivariate normal distribution for γ and α, and a spike-slab prior to mitigate the impact of correlated HP. We have also developed an efficient algorithm with paralleled Gibbs sampling. To demonstrate the advantages of MR-Corr2 over existing methods, we conducted comprehensive simulation studies to compare for both type-I error control and point estimates in various scenarios. By applying MR-Corr2 to study the relationships between exposure–outcome pairs in complex traits, we did not identify the contradictory causal relationship between HDL-c and CAD. Moreover, the results provide a new perspective of the causal network among complex traits. Availability and implementation The developed R package and code to reproduce all the results are available at https://github.com/QingCheng0218/MR.Corr2. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 117 (32) ◽  
pp. 18924-18933
Author(s):  
Daniel J. M. Crouch ◽  
Walter F. Bodmer

The reconciliation between Mendelian inheritance of discrete traits and the genetically based correlation between relatives for quantitative traits was Fisher’s infinitesimal model of a large number of genetic variants, each with very small effects, whose causal effects could not be individually identified. The development of genome-wide genetic association studies (GWAS) raised the hope that it would be possible to identify single polymorphic variants with identifiable functional effects on complex traits. It soon became clear that, with larger and larger GWAS on more and more complex traits, most of the significant associations had such small effects, that identifying their individual functional effects was essentially hopeless. Polygenic risk scores that provide an overall estimate of the genetic propensity to a trait at the individual level have been developed using GWAS data. These provide useful identification of groups of individuals with substantially increased risks, which can lead to recommendations of medical treatments or behavioral modifications to reduce risks. However, each such claim will require extensive investigation to justify its practical application. The challenge now is to use limited genetic association studies to find individually identifiable variants of significant functional effect that can help to understand the molecular basis of complex diseases and traits, and so lead to improved disease prevention and treatment. This can best be achieved by 1) the study of rare variants, often chosen by careful candidate assessment, and 2) the careful choice of phenotypes, often extremes of a quantitative variable, or traits with relatively high heritability.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Lukasz Smigielski ◽  
Sergi Papiol ◽  
Anastasia Theodoridou ◽  
Karsten Heekeren ◽  
Miriam Gerstenberg ◽  
...  

Abstract As early detection of symptoms in the subclinical to clinical psychosis spectrum may improve health outcomes, knowing the probabilistic susceptibility of developing a disorder could guide mitigation measures and clinical intervention. In this context, polygenic risk scores (PRSs) quantifying the additive effects of multiple common genetic variants hold the potential to predict complex diseases and index severity gradients. PRSs for schizophrenia (SZ) and bipolar disorder (BD) were computed using Bayesian regression and continuous shrinkage priors based on the latest SZ and BD genome-wide association studies (Psychiatric Genomics Consortium, third release). Eight well-phenotyped groups (n = 1580; 56% males) were assessed: control (n = 305), lower (n = 117) and higher (n = 113) schizotypy (both groups of healthy individuals), at-risk for psychosis (n = 120), BD type-I (n = 359), BD type-II (n = 96), schizoaffective disorder (n = 86), and SZ groups (n = 384). PRS differences were investigated for binary traits and the quantitative Positive and Negative Syndrome Scale. Both BD-PRS and SZ-PRS significantly differentiated controls from at-risk and clinical groups (Nagelkerke’s pseudo-R²: 1.3–7.7%), except for BD type-II for SZ-PRS. Out of 28 pairwise comparisons for SZ-PRS and BD-PRS, 9 and 12, respectively, reached the Bonferroni-corrected significance. BD-PRS differed between control and at-risk groups, but not between at-risk and BD type-I groups. There was no difference between controls and schizotypy. SZ-PRSs, but not BD-PRSs, were positively associated with transdiagnostic symptomology. Overall, PRSs support the continuum model across the psychosis spectrum at the genomic level with possible irregularities for schizotypy. The at-risk state demands heightened clinical attention and research addressing symptom course specifiers. Continued efforts are needed to refine the diagnostic and prognostic accuracy of PRSs in mental healthcare.


2019 ◽  
Vol 40 (37) ◽  
pp. 3097-3107 ◽  
Author(s):  
Rafik Tadros ◽  
Hanno L Tan ◽  
Sulayman el Mathari ◽  
Jan A Kors ◽  
Pieter G Postema ◽  
...  

Abstract Aims Sodium-channel blockers (SCBs) are associated with arrhythmia, but variability of cardiac electrical response remains unexplained. We sought to identify predictors of ajmaline-induced PR and QRS changes and Type I Brugada syndrome (BrS) electrocardiogram (ECG). Methods and results In 1368 patients that underwent ajmaline infusion for suspected BrS, we performed measurements of 26 721 ECGs, dose–response mixed modelling and genotyping. We calculated polygenic risk scores (PRS) for PR interval (PRSPR), QRS duration (PRSQRS), and Brugada syndrome (PRSBrS) derived from published genome-wide association studies and used regression analysis to identify predictors of ajmaline dose related PR change (slope) and QRS slope. We derived and validated using bootstrapping a predictive model for ajmaline-induced Type I BrS ECG. Higher PRSPR, baseline PR, and female sex are associated with more pronounced PR slope, while PRSQRS and age are positively associated with QRS slope (P < 0.01 for all). PRSBrS, baseline QRS duration, presence of Type II or III BrS ECG at baseline, and family history of BrS are independently associated with the occurrence of a Type I BrS ECG, with good predictive accuracy (optimism-corrected C-statistic 0.74). Conclusion We show for the first time that genetic factors underlie the variability of cardiac electrical response to SCB. PRSBrS, family history, and a baseline ECG can predict the development of a diagnostic drug-induced Type I BrS ECG with clinically relevant accuracy. These findings could lead to the use of PRS in the diagnosis of BrS and, if confirmed in population studies, to identify patients at risk for toxicity when given SCB.

