Improving Polygenic Prediction in Ancestrally Diverse Populations

Abstract Polygenic risk scores (PRS) have attenuated cross-population predictive performance. As existing genome-wide association studies (GWAS) were predominantly conducted in individuals of European descent, the limited transferability of PRS reduces its clinical value in non-European populations and may exacerbate healthcare disparities. Recent efforts to level ancestry imbalance in genomic research have expanded the scale of non-European GWAS, although they remain under-powered. Here we present a novel PRS construction method, PRS-CSx, which improves cross-population polygenic prediction by integrating GWAS summary statistics from multiple populations. PRS-CSx couples genetic effects across populations via a shared continuous shrinkage prior, enabling more accurate effect size estimation by sharing information between summary statistics and leveraging linkage disequilibrium (LD) diversity across discovery samples, while inheriting computational efficiency and robustness from PRS-CS. We show that PRS-CSx outperforms alternative methods across traits with a wide range of genetic architectures and cross-population genetic correlations in simulations, and substantially improves the prediction of quantitative traits and schizophrenia risk in non-European populations.
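As background for the abstract above, a polygenic risk score is commonly computed as a weighted sum of an individual's effect-allele dosages, with weights taken from GWAS effect size estimates. The following is a minimal sketch of that generic weighted sum only; it is not PRS-CSx, and the function name and toy numbers are hypothetical.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted sum of effect-allele dosages for one individual.

    dosages: effect-allele counts per variant (0, 1 or 2; may be fractional if imputed).
    weights: per-variant effect size estimates, in the same variant order.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical toy data: three variants for a single individual.
example_weights = [0.5, -0.25, 1.0]
example_dosages = [2, 1, 0]
example_score = polygenic_score(example_dosages, example_weights)
```

Methods such as the one described in the abstract differ mainly in how the weights are estimated, not in this final summation step.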

10.1101/2020.12.27.20248738 ◽

2021 ◽

Author(s):

Yunfeng Ruan ◽

Yen-Chen Anne Feng ◽

Chia-Yen Chen ◽

Max Lam ◽

Akira Sawa ◽

...

Keyword(s):

Association Studies ◽

Genetic Correlations ◽

Genomic Research ◽

Alternative Methods ◽

Risk Scores ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Size Estimation ◽

Wide Range ◽

European Populations

Fine-tuning Polygenic Risk Scores with GWAS Summary Statistics

10.1101/810713 ◽

2019 ◽

Cited By ~ 4

Author(s):

Zijie Zhao ◽

Yanyao Yi ◽

Yuchang Wu ◽

Xiaoyuan Zhong ◽

Yupei Lin ◽

...

Keyword(s):

Association Studies ◽

Fine Tuning ◽

Risk Scores ◽

Training Dataset ◽

Validation Dataset ◽

P Value ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Polygenic Risk ◽

Model Tuning

Abstract Polygenic risk scores (PRSs) have wide applications in human genetics research. Notably, most PRS models include tuning parameters which improve predictive performance when properly selected. However, existing model-tuning methods require individual-level genetic data as the training dataset or as a validation dataset independent from both training and testing samples. These data rarely exist in practice, creating a significant gap between PRS methodology and applications. Here, we introduce PUMAS (Parameter-tuning Using Marginal Association Statistics), a novel method to fine-tune PRS models using summary statistics from genome-wide association studies (GWASs). Through extensive simulations, external validations, and analysis of 65 traits, we demonstrate that PUMAS can perform a variety of model-tuning procedures (e.g. cross-validation) using GWAS summary statistics and can effectively benchmark and optimize PRS models under diverse genetic architecture. On average, PUMAS improves the predictive R² by 205.6% and 62.5% compared to PRSs with arbitrary p-value cutoffs of 0.01 and 1, respectively. Applied to 211 neuroimaging traits and Alzheimer’s disease, we show that fine-tuned PRSs will significantly improve statistical power in downstream association analysis. We believe our method resolves a fundamental problem without a current solution and will greatly benefit genetic prediction applications.
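The p-value cutoffs mentioned in the abstract above act as a tuning parameter: only variants whose GWAS p-value is at or below the cutoff contribute to the score. The sketch below illustrates that idea under simplifying assumptions (no LD clumping, a single individual); it is not the PUMAS procedure, and the `Variant` type, function name, and toy values are hypothetical.

```python
from typing import NamedTuple, Sequence


class Variant(NamedTuple):
    beta: float     # marginal effect size estimate from the GWAS
    p_value: float  # association p-value from the GWAS


def thresholded_score(
    dosages: Sequence[float], variants: Sequence[Variant], p_cutoff: float
) -> float:
    """Sum dosage * beta over variants with p_value <= p_cutoff."""
    if len(dosages) != len(variants):
        raise ValueError("dosages and variants must have the same length")
    return sum(
        d * v.beta for d, v in zip(dosages, variants) if v.p_value <= p_cutoff
    )


# Hypothetical toy summary statistics and dosages for one individual.
toy_variants = [Variant(0.5, 0.001), Variant(0.25, 0.2), Variant(-1.0, 0.009)]
toy_dosages = [1, 2, 1]

score_strict = thresholded_score(toy_dosages, toy_variants, p_cutoff=0.01)
score_all = thresholded_score(toy_dosages, toy_variants, p_cutoff=1.0)
```

A cutoff of 1 keeps every variant, so the two cutoffs named in the abstract correspond to a restrictive score and an all-variant score; choosing between them (or any value in between) is the tuning problem the abstract describes.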

Leveraging effect size distributions to improve polygenic risk scores derived from summary statistics of genome-wide association studies

PLoS Computational Biology ◽

10.1371/journal.pcbi.1007565 ◽

2020 ◽

Vol 16 (2) ◽

pp. e1007565 ◽

Cited By ~ 1

Author(s):

Shuang Song ◽

Wei Jiang ◽

Lin Hou ◽

Hongyu Zhao

Keyword(s):

Effect Size ◽

Association Studies ◽

Genome Wide Association ◽

Risk Scores ◽

Genome Wide Association Studies ◽

Size Distributions ◽

Summary Statistics ◽

Polygenic Risk ◽

Genome Wide

Genome-Wide Association Studies of Schizophrenia and Bipolar Disorder in a Diverse Cohort of US Veterans

Schizophrenia Bulletin ◽

10.1093/schbul/sbaa133 ◽

2020 ◽

Author(s):

Tim B Bigdeli ◽

Ayman H Fanous ◽

Yuli Li ◽

Nallakkandi Rajeevan ◽

Frederick Sayward ◽

...

Keyword(s):

Bipolar Disorder ◽

Association Studies ◽

Genome Wide Association ◽

Risk Scores ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Susceptibility Loci ◽

New Associations ◽

Genome Wide ◽

US Veterans

Abstract Background: Schizophrenia (SCZ) and bipolar disorder (BIP) are debilitating neuropsychiatric disorders, collectively affecting 2% of the world’s population. Recognizing the major impact of these psychiatric disorders on the psychosocial function of more than 200 000 US Veterans, the Department of Veterans Affairs (VA) recently completed genotyping of more than 8000 veterans with SCZ and BIP in the Cooperative Studies Program (CSP) #572. Methods: We performed genome-wide association studies (GWAS) in CSP #572 and benchmarked the predictive value of polygenic risk scores (PRS) constructed from published findings. We combined our results with available summary statistics from several recent GWAS, realizing the largest and most diverse studies of these disorders to date. Results: Our primary GWAS uncovered new associations between CHD7 variants and SCZ, and novel BIP associations with variants in Sortilin Related VPS10 Domain Containing Receptor 3 (SORCS3) and downstream of PCDH11X. Combining our results with published summary statistics for SCZ yielded 39 novel susceptibility loci including CRHR1, and we identified 10 additional findings for BIP (28 326 cases and 90 570 controls). PRS trained on published GWAS were significantly associated with case-control status among European American (P < 10⁻³⁰) and African American (P < .0005) participants in CSP #572. Conclusions: We have demonstrated that published findings for SCZ and BIP are robustly generalizable to a diverse cohort of US veterans. Leveraging available summary statistics from GWAS of global populations, we report 52 new susceptibility loci and improved fine-mapping resolution for dozens of previously reported associations.

MR-LDP: a two-sample Mendelian randomization for GWAS summary statistics accounting for linkage disequilibrium and horizontal pleiotropy

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa028 ◽

2020 ◽

Vol 2 (2) ◽

Cited By ~ 2

Author(s):

Qing Cheng ◽

Yi Yang ◽

Xingjie Shi ◽

Kar-Fu Yeung ◽

Can Yang ◽

...

Keyword(s):

Risk Factors ◽

Linkage Disequilibrium ◽

Genetic Variants ◽

Mendelian Randomization ◽

Association Studies ◽

Alternative Methods ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Causal Relationships ◽

Disease Outcomes

Abstract The proliferation of genome-wide association studies (GWAS) has prompted the use of two-sample Mendelian randomization (MR) with genetic variants as instrumental variables (IVs) for drawing reliable causal relationships between health risk factors and disease outcomes. However, the unique features of GWAS demand that MR methods account for both linkage disequilibrium (LD) and ubiquitous horizontal pleiotropy among complex traits, the phenomenon wherein a variant affects the outcome through mechanisms other than exclusively through the exposure. Statistical methods that fail to consider LD and horizontal pleiotropy can therefore lead to biased estimates and false-positive causal relationships. To overcome these limitations, we propose MR-LDP, a probabilistic model for MR analysis that identifies causal effects between risk factors and disease outcomes from GWAS summary statistics in the presence of LD while properly accounting for horizontal pleiotropy among genetic variants, and we develop a computationally efficient algorithm to make the causal inference. We then conducted comprehensive simulation studies to demonstrate the advantages of MR-LDP over the existing methods. Moreover, we used two real exposure–outcome pairs to validate the results from MR-LDP compared with alternative methods, showing that our method is more efficient in using all-instrumental variants in LD. By further applying MR-LDP to lipid traits and body mass index (BMI) as risk factors for complex diseases, we identified multiple pairs of significant causal relationships, including a protective effect of high-density lipoprotein cholesterol on peripheral vascular disease and a positive causal effect of BMI on hemorrhoids.

Perspective: The Clinical Use of Polygenic Risk Scores: Race, Ethnicity, and Health Disparities

Ethnicity & Disease ◽

10.18865/ed.29.3.513 ◽

2019 ◽

Vol 29 (3) ◽

pp. 513-516 ◽

Cited By ~ 2

Author(s):

Megan C. Roberts ◽

Muin J. Khoury ◽

George A. Mensah

Keyword(s):

Precision Medicine ◽

Association Studies ◽

Clinical Care ◽

Genomic Research ◽

Risk Scores ◽

Genome Wide Association Studies ◽

Polygenic Risk ◽

Adverse Health Outcomes ◽

Genome Wide ◽

Disease Risks

Polygenic risk scores (PRS) are an emerging precision medicine tool based on multiple gene variants that, taken alone, have weak associations with disease risks, but collectively may enhance disease predictive value in the population. However, the benefit of PRS may not be equal among non-European populations, as they are under-represented in genome-wide association studies (GWAS) that serve as the basis for PRS development. In this perspective, we discuss a path forward, which includes: 1) inclusion of underrepresented populations in PRS research; 2) global efforts to build capacity for genomic research; 3) equitable implementation of these tools in clinical practice; and 4) traditional public health approaches to reduce risk of adverse health outcomes as an important component to precision health. As precision medicine is implemented in clinical care, researchers must ensure that advances from PRS research will benefit all. Ethn Dis. 2019;29(3):513-516; doi:10.18865/ed.29.3.513.

Human demographic history impacts genetic risk prediction across diverse populations

10.1101/070797 ◽

2016 ◽

Cited By ~ 7

Author(s):

Alicia R. Martin ◽

Christopher R. Gignoux ◽

Raymond K. Walters ◽

Genevieve L. Wojcik ◽

Benjamin M. Neale ◽

...

Keyword(s):

Risk Prediction ◽

Large Scale ◽

Disease Risk ◽

Association Studies ◽

Demographic History ◽

Population History ◽

Risk Scores ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Medical Genomics

Abstract The vast majority of genome-wide association studies are performed in Europeans, and their transferability to other populations is dependent on many factors (e.g. linkage disequilibrium, allele frequencies, genetic architecture). As medical genomics studies become increasingly large and diverse, gaining insights into population history and consequently the transferability of disease risk measurement is critical. Here, we disentangle recent population history in the widely-used 1000 Genomes Project reference panel, with an emphasis on populations underrepresented in medical studies. To examine the transferability of single-ancestry GWAS, we used published summary statistics to calculate polygenic risk scores for six well-studied traits and diseases. We identified directional inconsistencies in all scores; for example, height is predicted to decrease with genetic distance from Europeans, despite robust anthropological evidence that West Africans are as tall as Europeans on average. To gain deeper quantitative insights into GWAS transferability, we developed a complex trait coalescent-based simulation framework considering effects of polygenicity, causal allele frequency divergence, and heritability. As expected, correlations between true and inferred risk were typically highest in the population from which summary statistics were derived. We demonstrated that scores inferred from European GWAS were biased by genetic drift in other populations even when choosing the same causal variants, and that biases in any direction were possible and unpredictable. This work cautions that summarizing findings from large-scale GWAS may have limited portability to other populations using standard approaches, and highlights the need for generalized risk prediction methods and the inclusion of more diverse individuals in medical genomics.

Sleep Deficits and Cannabis Use Behaviors: An Analysis of Shared Genetics Using Linkage Disequilibrium Score Regression and Polygenic Risk Prediction

10.1101/2020.05.02.053983 ◽

2020 ◽

Author(s):

Evan A. Winiger ◽

Jarrod M. Ellingson ◽

Claire L. Morrison ◽

Robin P. Corley ◽

Joëlle A. Pasman ◽

...

Keyword(s):

Linkage Disequilibrium ◽

Sleep Duration ◽

Association Studies ◽

Genetic Correlations ◽

Cannabis Use ◽

European Ancestry ◽

Risk Scores ◽

Genome Wide Association Studies ◽

Short Sleep Duration ◽

Polygenic Risk

Abstract Study Objectives: Estimate the genetic relationship of cannabis use with sleep deficits and eveningness chronotype. Methods: We used linkage disequilibrium score regression (LDSC) to analyze genetic correlations between sleep deficits and cannabis use behaviors. Secondly, we generated sleep deficit polygenic risk scores (PRSs) and estimated their ability to predict cannabis use behaviors using logistic regression. Summary statistics came from existing genome wide association studies (GWASs) of European ancestry that were focused on sleep duration, insomnia, chronotype, lifetime cannabis use, and cannabis use disorder (CUD). A target sample for PRS prediction consisted of high-risk participants and participants from twin/family community-based studies (n = 796, male = 66%; mean age = 26.81). Target data consisted of self-reported sleep (sleep duration, feeling tired, and taking naps) and cannabis use behaviors (lifetime use, number of lifetime uses, past 180-day use, age of first use, and lifetime CUD symptoms). Results: Significant genetic correlation between lifetime cannabis use and eveningness chronotype (rG = 0.24, p < 0.01), as well as between CUD and both short sleep duration (<7 h) (rG = 0.23, p = 0.02) and insomnia (rG = 0.20, p = 0.02). Insomnia PRS predicted earlier age of first cannabis use (β = −0.09, p = 0.02) and increased lifetime CUD symptom count use (β = 0.07, p = 0.03). Conclusion: Cannabis use is genetically associated with both sleep deficits and an eveningness chronotype, suggesting that there are genes that predispose individuals to both cannabis use and sleep deficits.

Comparison of methods for estimating genetic correlation between complex traits using GWAS summary statistics

10.1101/2020.10.12.336867 ◽

2020 ◽

Cited By ~ 1

Author(s):

Yiliang Zhang ◽

Youshu Cheng ◽

Wei Jiang ◽

Yixuan Ye ◽

Qiongshi Lu ◽

...

Keyword(s):

Genetic Correlation ◽

Complex Traits ◽

Association Studies ◽

Genetic Correlations ◽

Real Data ◽

Estimation Methods ◽

Easy Access ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Correlation Estimation

Abstract Genetic correlation is the correlation of additive genetic effects on two phenotypes. It is an informative metric to quantify the overall genetic similarity between complex traits, which provides insights into their polygenic genetic architecture. Several methods have been proposed to estimate genetic correlations based on data collected from genome-wide association studies (GWAS). Due to the easy access of GWAS summary statistics and computational efficiency, methods only requiring GWAS summary statistics as input have become more popular than methods utilizing individual-level genotype data. Here, we present a benchmark study for different summary-statistics-based genetic correlation estimation methods through simulation and real data applications. We focus on two major technical challenges in estimating genetic correlation: marker dependency caused by linkage disequilibrium (LD) and sample overlap between different studies. To assess the performance of different methods in the presence of these two challenges, we first conducted comprehensive simulations with diverse LD patterns and sample overlaps. Then we applied these methods to real GWAS summary statistics for a wide spectrum of complex traits. Based on these experiments, we conclude that methods relying on accurate LD estimation are less robust in real data applications compared to other methods due to the imprecision of LD obtained from reference panels. Our findings offer guidance on how to appropriately choose the method for genetic correlation estimation in post-GWAS analysis in interpretation.
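The definition in the first sentence of the abstract above can be written out as a formula. This is the standard textbook form, stated here as an assumption rather than taken from the paper; the symbols $g_1$ and $g_2$ (the additive genetic components of the two phenotypes) are introduced only for illustration.

```latex
r_g \;=\; \frac{\operatorname{Cov}(g_1, g_2)}
               {\sqrt{\operatorname{Var}(g_1)\,\operatorname{Var}(g_2)}},
\qquad -1 \le r_g \le 1 .
```

Summary-statistics-based methods of the kind benchmarked in the abstract estimate the covariance and variance terms without access to individual-level genotypes.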

Estimating Heritability and Genetic Correlation in Case Control Studies Directly and with Summary Statistics

10.1101/256388 ◽

2018 ◽

Author(s):

Omer Weissbrod ◽

Jonathan Flint ◽

Saharon Rosset

Keyword(s):

Genetic Correlation ◽

Association Studies ◽

Genetic Correlations ◽

Large Data ◽

Case Control ◽

Data Sets ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Case Control Studies ◽

Individual Level

Abstract Methods that estimate heritability and genetic correlations from genome-wide association studies have proven to be powerful tools for investigating the genetic architecture of common diseases and exposing unexpected relationships between disorders. Many relevant studies employ a case-control design, yet most methods are primarily geared towards analyzing quantitative traits. Here we investigate the validity of three common methods for estimating genetic heritability and genetic correlation. We find that the Phenotype-Correlation-Genotype-Correlation (PCGC) approach is the only method that can estimate both quantities accurately in the presence of important non-genetic risk factors, such as age and sex. We extend PCGC to work with summary statistics that take the case-control sampling into account, and demonstrate that our new method, PCGC-s, accurately estimates both heritability and genetic correlations and can be applied to large data sets without requiring individual-level genotypic or phenotypic information. Finally, we use PCGC-s to estimate the genetic correlation between schizophrenia and bipolar disorder, and demonstrate that previous estimates are biased due to incorrect handling of sex as a strong risk factor. PCGC-s is available at https://github.com/omerwe/PCGCs.
