LDpred2: better, faster, stronger

Author(s):  
Florian Privé ◽  
Julyan Arbel ◽  
Bjarni J. Vilhjálmsson

Abstract Polygenic scores have become a central tool in human genetics research. LDpred is a popular method for deriving polygenic scores based on summary statistics and a matrix of correlation between genetic variants. However, LDpred has limitations that may reduce its predictive performance. Here we present LDpred2, a new version of LDpred that addresses these issues. We also provide two new options in LDpred2: a “sparse” option that can learn effects that are exactly 0, and an “auto” option that directly learns the two LDpred parameters from data. We benchmark predictive performance of LDpred2 against the previous version on simulated and real data, demonstrating substantial improvements in robustness and predictive accuracy compared to LDpred1. We then show that LDpred2 also outperforms other polygenic score methods recently developed, with a mean AUC over the 8 real traits analyzed here of 65.1%, compared to 63.8% for lassosum, 62.9% for PRS-CS and 61.5% for SBayesR. Note that, in contrast to what was recommended in the first version of this paper, we now recommend running LDpred2 genome-wide instead of per chromosome. LDpred2 is implemented in the R package bigsnpr.

Author(s):  
Florian Privé ◽  
Julyan Arbel ◽  
Bjarni J Vilhjálmsson

Abstract Motivation: Polygenic scores have become a central tool in human genetics research. LDpred is a popular method for deriving polygenic scores based on summary statistics and a matrix of correlation between genetic variants. However, LDpred has limitations that may reduce its predictive performance. Results: Here, we present LDpred2, a new version of LDpred that addresses these issues. We also provide two new options in LDpred2: a ‘sparse’ option that can learn effects that are exactly 0, and an ‘auto’ option that directly learns the two LDpred parameters from data. We benchmark predictive performance of LDpred2 against the previous version on simulated and real data, demonstrating substantial improvements in robustness and predictive accuracy compared to LDpred1. We then show that LDpred2 also outperforms other polygenic score methods recently developed, with a mean AUC over the 8 real traits analyzed here of 65.1%, compared to 63.8% for lassosum, 62.9% for PRS-CS and 61.5% for SBayesR. Note that LDpred2 provides more accurate polygenic scores when run genome-wide, instead of per chromosome. Availability and implementation: LDpred2 is implemented in the R package bigsnpr. Supplementary information: Supplementary data are available at Bioinformatics online.


2017 ◽  
Author(s):  
Patrick Turley ◽  
Raymond K. Walters ◽  
Omeed Maghzian ◽  
Aysu Okbay ◽  
James J. Lee ◽  
...  

Abstract We introduce Multi-Trait Analysis of GWAS (MTAG), a method for joint analysis of summary statistics from GWASs of different traits, possibly from overlapping samples. We apply MTAG to summary statistics for depressive symptoms (Neff = 354,862), neuroticism (N = 168,105), and subjective well-being (N = 388,538). Compared to 32, 9, and 13 genome-wide significant loci in the single-trait GWASs (most of which are themselves novel), MTAG increases the number of loci to 64, 37, and 49, respectively. Moreover, association statistics from MTAG yield more informative bioinformatics analyses and increase variance explained by polygenic scores by approximately 25%, matching theoretical expectations.


2015 ◽  
Author(s):  
Hon-Cheong SO ◽  
Pak C. SHAM

Genome-wide association studies (GWAS) have become increasingly popular these days and one of the key questions is how much heritability could be explained by all variants in GWAS. We have previously proposed an approach to answer this question, based on recovering the "true" z-statistics from a set of observed z-statistics. Only summary statistics are required. However, methods for standard error (SE) estimation are not available yet, thereby limiting the interpretation of the results. In this study we developed resampling-based approaches to estimate the SE and the methods are implemented in an R package. We found that delete-d-jackknife and parametric bootstrap approaches provide good estimates of the SE. Methods to compute the sum of heritability explained and the corresponding SE are implemented in the R package SumVg, available at https://sites.google.com/site/honcheongso/software/var-totalvg
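The delete-d jackknife mentioned above can be illustrated with a minimal Python sketch. This is not the SumVg implementation: the function name `delete_d_jackknife_se`, its parameters, and the choice to sample deleted subsets at random when there are too many to enumerate are all assumptions made for illustration. The estimator is passed in as a generic function, so the sketch says nothing about how the heritability sum itself is computed from z-statistics.

```python
import itertools
import math
import random
from statistics import mean


def delete_d_jackknife_se(data, estimator, d, max_subsets=1000, seed=0):
    """Delete-d jackknife standard error of estimator(data).

    Hypothetical helper for illustration only. Each replicate recomputes the
    estimator after deleting d observations; the replicate spread is rescaled
    by (n - d) / d to give a variance estimate.
    """
    n = len(data)
    if not 0 < d < n:
        raise ValueError("d must satisfy 0 < d < len(data)")

    # Enumerate every deleted subset when feasible, otherwise sample them.
    if math.comb(n, d) <= max_subsets:
        deleted_sets = list(itertools.combinations(range(n), d))
    else:
        rng = random.Random(seed)
        deleted_sets = [tuple(rng.sample(range(n), d)) for _ in range(max_subsets)]

    estimates = []
    for deleted in deleted_sets:
        drop = set(deleted)
        kept = [x for i, x in enumerate(data) if i not in drop]
        estimates.append(estimator(kept))

    centre = mean(estimates)
    sum_sq = sum((e - centre) ** 2 for e in estimates)
    variance = (n - d) / (d * len(estimates)) * sum_sq
    return math.sqrt(variance)
```

With d = 1 and the sample mean as the estimator, this reduces to the ordinary leave-one-out jackknife, whose standard error equals the sample standard deviation divided by the square root of n.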


2021 ◽  
Author(s):  
Maryn O. Carlson ◽  
Daniel P. Rice ◽  
Jeremy J. Berg ◽  
Matthias Steinrücken

Abstract Polygenic scores link the genotypes of ancient individuals to their phenotypes, which are often unobservable, offering a tantalizing opportunity to reconstruct complex trait evolution. In practice, however, interpretation of ancient polygenic scores is subject to numerous assumptions. For one, the genome-wide association (GWA) studies from which polygenic scores are derived can only estimate effect sizes for loci segregating in contemporary populations. Therefore, a GWA study may not correctly identify all loci relevant to trait variation in the ancient population. In addition, the frequencies of trait-associated loci may have changed in the intervening years. Here, we devise a theoretical framework to quantify the effect of this allelic turnover on the statistical properties of polygenic scores as functions of population genetic dynamics, trait architecture, power to detect significant loci, and the age of the ancient sample. We model the allele frequencies of loci underlying trait variation using the Wright-Fisher diffusion, and employ the spectral representation of its transition density to find analytical expressions for several error metrics, including the correlation between an ancient individual’s polygenic score and true phenotype, referred to as polygenic score accuracy. Our theory also applies to a two-population scenario and demonstrates that allelic turnover alone may explain a substantial percentage of the reduced accuracy observed in cross-population predictions, akin to those performed in human genetics. Finally, we use simulations to explore the effects of recent directional selection, a bias-inducing process, on the statistics of interest. We find that even in the presence of bias, weak selection induces minimal deviations from our neutral expectations for the decay of polygenic score accuracy. By quantifying the limitations of polygenic scores in an explicit evolutionary context, our work lays the foundation for the development of more sophisticated statistical procedures to analyze both temporally and geographically resolved polygenic scores.


2019 ◽  
Author(s):  
Florian Privé ◽  
Bjarni J. Vilhjálmsson ◽  
Hugues Aschard ◽  
Michael G.B. Blum

Abstract Polygenic prediction has the potential to contribute to precision medicine. Clumping and Thresholding (C+T) is a widely used method to derive polygenic scores. When using C+T, it is common to test several p-value thresholds to maximize predictive ability of the derived polygenic scores. Along with this p-value threshold, we propose to tune three other hyper-parameters for C+T. We implement an efficient way to derive thousands of different C+T polygenic scores corresponding to a grid over four hyper-parameters. For example, it takes a few hours to derive 123,200 different C+T scores for 300K individuals and 1M variants on a single node with 16 cores. We find that optimizing over these four hyper-parameters improves the predictive performance of C+T in both simulations and real data applications as compared to tuning only the p-value threshold. A particularly large increase can be noted when predicting depression status, from an AUC of 0.557 (95% CI: [0.544-0.569]) when tuning only the p-value threshold in C+T to an AUC of 0.592 (95% CI: [0.580-0.604]) when tuning all four hyper-parameters we propose for C+T. We further propose Stacked Clumping and Thresholding (SCT), a polygenic score that results from stacking all derived C+T scores. Instead of choosing one set of hyper-parameters that maximizes prediction in some training set, SCT learns an optimal linear combination of all C+T scores by using an efficient penalized regression. We apply SCT to 8 different case-control diseases in the UK Biobank data and find that SCT substantially improves prediction accuracy with an average AUC increase of 0.035 over standard C+T.
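The thresholding half of C+T can be sketched in a few lines of Python. This is only an illustration of scoring over a grid of p-value thresholds, not the authors' implementation: it assumes clumping has already been applied to the variant list, the function name `thresholded_scores` and its arguments are hypothetical, and the three additional hyper-parameters and the stacking step are left out because the abstract does not specify them.

```python
def thresholded_scores(genotypes, betas, pvalues, thresholds):
    """Polygenic scores for each p-value threshold (hypothetical helper).

    genotypes:  one row per individual, each a list of allele counts (0, 1, 2)
                for variants that are assumed to be already clumped.
    betas:      GWAS effect size per variant.
    pvalues:    GWAS p-value per variant.
    thresholds: p-value cut-offs to evaluate.
    Returns a dict mapping each threshold to a list of per-individual scores.
    """
    if not (len(betas) == len(pvalues)
            and all(len(row) == len(betas) for row in genotypes)):
        raise ValueError("genotypes, betas and pvalues must agree on variant count")

    scores = {}
    for threshold in thresholds:
        # Keep only variants whose association p-value passes the threshold.
        keep = [j for j, p in enumerate(pvalues) if p <= threshold]
        scores[threshold] = [
            sum(betas[j] * row[j] for j in keep) for row in genotypes
        ]
    return scores
```

In a tuning setting, each threshold's scores would be evaluated against the phenotype in a training set and the best-performing threshold retained; stacking would instead feed all score vectors into a regression model.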


2021 ◽  
Author(s):  
Paul O’Reilly ◽  
Shing Choi ◽  
Judit Garcia-Gonzalez ◽  
Yunfeng Ruan ◽  
Hei Man Wu ◽  
...  

Abstract Polygenic risk scores (PRSs) have been among the leading advances in biomedicine in recent years. As a proxy of genetic liability, PRSs are utilised across multiple fields and applications. While numerous statistical and machine learning methods have been developed to optimise their predictive accuracy, all of these distil genetic liability to a single number based on aggregation of an individual’s genome-wide alleles. This results in a key loss of information about an individual’s genetic profile, which could be critical given the functional sub-structure of the genome and the heterogeneity of complex disease. Here we evaluate the performance of pathway-based PRSs, in which polygenic scores are calculated across genomic pathways for each individual, and we introduce a software, PRSet, for computing and analysing pathway PRSs. We find that pathway PRSs have similar power for evaluating pathway enrichment of GWAS signal as the leading methods, with the distinct advantage of providing estimates of pathway genetic liability at the individual-level. Exemplifying their utility, we demonstrate that pathway PRSs can stratify diseases into subtypes in the UK Biobank with substantially greater power than genome-wide PRSs. Compared to genome-wide PRSs, we expect pathway-based PRSs to offer greater insights into the heterogeneity of complex disease and treatment response, generate more biologically tractable therapeutic targets, and provide a more powerful path to precision medicine.


Author(s):  
Lars G. Fritsche ◽  
Snehal Patil ◽  
Lauren J. Beesley ◽  
Peter VandeHaar ◽  
Maxwell Salvatore ◽  
...  

Abstract To facilitate scientific collaboration on polygenic risk scores (PRS) research, we created an extensive PRS online repository for 49 common cancer traits integrating freely available genome-wide association studies (GWAS) summary statistics from three sources: published GWAS, the NHGRI-EBI GWAS Catalog, and UK Biobank-based GWAS. Our framework condenses these summary statistics into PRS using various approaches such as linkage disequilibrium pruning / p-value thresholding (fixed or data-adaptively optimized thresholds) and penalized, genome-wide effect size weighting. We evaluated the PRS in two biobanks: the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort at Michigan Medicine, and the population-based UK Biobank (UKB). For each PRS construct, we provide measures on predictive performance, calibration, and discrimination. Besides PRS evaluation, the Cancer-PRSweb platform features construct downloads and phenome-wide PRS association study results (PRS-PheWAS) for predictive PRS. We expect this integrated platform to accelerate PRS-related cancer research.


2020 ◽  
Vol 2 (1) ◽  
Author(s):  
Hanna Julienne ◽  
Pierre Lechat ◽  
Vincent Guillemot ◽  
Carla Lasry ◽  
Chunzi Yao ◽  
...  

Abstract Genome-wide association study (GWAS) has been the driving force for identifying association between genetic variants and human phenotypes. Thousands of GWAS summary statistics covering a broad range of human traits and diseases are now publicly available. These GWAS have proven their utility for a range of secondary analyses, including in particular the joint analysis of multiple phenotypes to identify new associated genetic variants. However, although several methods have been proposed, there are very few large-scale applications published so far because of challenges in implementing these methods on real data. Here, we present JASS (Joint Analysis of Summary Statistics), a polyvalent Python package that addresses this need. Our package incorporates recently developed joint tests such as the omnibus approach and various weighted sum of Z-score tests while solving all practical and computational barriers for large-scale multivariate analysis of GWAS summary statistics. This includes data cleaning and harmonization tools, an efficient algorithm for fast derivation of joint statistics, an optimized data management process and a web interface for exploration purposes. Both benchmark analyses and real data applications demonstrated the robustness and strong potential of JASS for the detection of new associated genetic variants. Our package is freely available at https://gitlab.pasteur.fr/statistical-genetics/jass.
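As an illustration of the weighted sum of Z-score tests mentioned above, the following Python sketch combines per-trait Z-scores for one variant into a single statistic. It does not use or reproduce the JASS API: the function name `weighted_sum_z` and its arguments are hypothetical, and the correlation matrix of the Z-scores under the null is assumed to be supplied by the caller. The omnibus test is not shown.

```python
import math


def weighted_sum_z(z_scores, weights, corr):
    """Weighted sum of Z-scores test for one variant (hypothetical helper).

    z_scores: per-trait Z-scores for the variant.
    weights:  one weight per trait.
    corr:     k x k correlation matrix of the Z-scores under the null,
              given as a list of lists (assumed known).
    Returns the combined statistic, standard normal under the null,
    and its two-sided p-value.
    """
    k = len(z_scores)
    if len(weights) != k or len(corr) != k or any(len(row) != k for row in corr):
        raise ValueError("z_scores, weights and corr must have matching dimensions")

    numerator = sum(w * z for w, z in zip(weights, z_scores))
    # Variance of the weighted sum is w' * corr * w.
    variance = sum(
        weights[i] * weights[j] * corr[i][j] for i in range(k) for j in range(k)
    )
    if variance <= 0:
        raise ValueError("weights and corr give a non-positive variance")

    stat = numerator / math.sqrt(variance)
    p_value = math.erfc(abs(stat) / math.sqrt(2))  # two-sided normal p-value
    return stat, p_value
```

Accounting for the correlation matrix matters when traits come from overlapping samples: two perfectly correlated Z-scores of 2 combine to a statistic of 2, whereas two independent ones combine to about 2.83.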


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Declan Bennett ◽  
Donal O’Shea ◽  
John Ferguson ◽  
Derek Morris ◽  
Cathal Seoighe

Abstract Ongoing increases in the size of human genotype and phenotype collections offer the promise of improved understanding of the genetics of complex diseases. In addition to the biological insights that can be gained from the nature of the variants that contribute to the genetic component of complex trait variability, these data bring forward the prospect of predicting complex traits and the risk of complex genetic diseases from genotype data. Here we show that advances in phenotype prediction can be applied to improve the power of genome-wide association studies. We demonstrate a simple and efficient method to model genetic background effects using polygenic scores derived from SNPs that are not on the same chromosome as the target SNP. Using simulated and real data we found that this can result in a substantial increase in the number of variants passing genome-wide significance thresholds. This increase in power to detect trait-associated variants also translates into an increase in the accuracy with which the resulting polygenic score predicts the phenotype from genotype data. Our results suggest that advances in methods for phenotype prediction can be exploited to improve the control of background genetic effects, leading to more accurate GWAS results and further improvements in phenotype prediction.


2019 ◽  
Author(s):  
Hakhamanesh Mostafavi ◽  
Arbel Harpak ◽  
Dalton Conley ◽  
Jonathan K Pritchard ◽  
Molly Przeworski

Abstract Fields as diverse as human genetics and sociology are increasingly using polygenic scores based on genome-wide association studies (GWAS) for phenotypic prediction. However, recent work has shown that polygenic scores have limited portability across groups of different genetic ancestries, restricting the contexts in which they can be used reliably and potentially creating serious inequities in future clinical applications. Using the UK Biobank data, we demonstrate that even within a single ancestry group, the prediction accuracy of polygenic scores depends on characteristics such as the age or sex composition of the individuals in which the GWAS and the prediction were conducted, and on the GWAS study design. Our findings highlight both the complexities of interpreting polygenic scores and underappreciated obstacles to their broad use.

