Polygenic Adaptation has Impacted Multiple Anthropometric Traits

Abstract

Our understanding of the genetic basis of human adaptation is biased toward loci of large phenotypic effect. Genome-wide association studies (GWAS) now enable the study of genetic adaptation in polygenic phenotypes. We test for polygenic adaptation among 187 worldwide human populations using polygenic scores constructed from GWAS of 34 complex traits. We identify signals of polygenic adaptation for anthropometric traits including height, infant head circumference (IHC), hip circumference and waist-to-hip ratio (WHR). Analysis of ancient DNA samples indicates that a north-south cline of height within Europe and a west-east cline across Eurasia can be traced to selection for increased height in two late Pleistocene hunter-gatherer populations living in western and west-central Eurasia. Our observation that IHC and WHR follow a latitudinal cline in Western Eurasia supports the role of natural selection in driving Bergmann’s Rule in humans, consistent with thermoregulatory adaptation in response to latitudinal temperature variation.

Author’s Note on Failure to Replicate

After this preprint was posted, the UK Biobank dataset was released, providing a new and open GWAS resource. When attempting to replicate the height selection results from this preprint using GWAS data from the UK Biobank, we discovered that we could not. In subsequent analyses, we determined that both the GIANT consortium height GWAS data and another dataset that was used for replication were impacted by stratification issues that created or, at a minimum, substantially inflated the height selection signals reported here. The results of this second investigation, written together with additional coauthors, have now been published (https://elifesciences.org/articles/39725), along with another paper by a separate group of authors showing similar issues (https://elifesciences.org/articles/39702).
A preliminary investigation shows that the other, non-height-based results may suffer from similar issues. We stand by the theory and statistical methods reported in this paper, and the paper can be cited for these results. However, we have shown that the data on which the major empirical results were based are not sound, and so should be treated with caution until replicated.
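The abstract tests whether polygenic scores differ among populations more than genetic drift alone would allow. As a minimal sketch of that idea, loosely in the spirit of the Qx statistic of Berg and Coop (2014) and not necessarily the procedure used in this preprint, the Python below assumes a star-shaped population tree with a single, known drift parameter `f`, unlinked loci, and ancestral frequencies estimated as the across-population mean. The function names and input numbers are hypothetical.

```python
"""Simplified overdispersion test for polygenic scores across populations.

Hypothetical assumptions: populations drift independently from a shared
ancestor with one drift parameter f (star-shaped tree), loci are unlinked,
and ancestral frequencies are estimated as the mean across populations.
"""
from typing import List, Tuple


def polygenic_scores(effects: List[float], freqs: List[List[float]]) -> List[float]:
    """Population-level genetic value Z_m = 2 * sum_l alpha_l * p_ml."""
    return [2.0 * sum(a * p for a, p in zip(effects, pop)) for pop in freqs]


def qx_star_tree(effects: List[float], freqs: List[List[float]], f: float) -> Tuple[float, int]:
    """Return (Qx, degrees of freedom) under the star-tree drift model.

    Under neutrality Var(Z_m) = 4 * f * sum_l alpha_l^2 * eps_l * (1 - eps_l),
    so Qx = sum_m (Z_m - mean(Z))^2 / Var(Z_m) is approximately
    chi-squared with M - 1 degrees of freedom.
    """
    n_pops = len(freqs)
    n_loci = len(effects)
    # Ancestral frequency estimate: mean across populations at each locus.
    eps = [sum(pop[l] for pop in freqs) / n_pops for l in range(n_loci)]
    neutral_var = 4.0 * f * sum(a * a * e * (1.0 - e) for a, e in zip(effects, eps))
    z = polygenic_scores(effects, freqs)
    z_mean = sum(z) / n_pops
    qx = sum((zm - z_mean) ** 2 for zm in z) / neutral_var
    return qx, n_pops - 1


if __name__ == "__main__":
    alpha = [1.0, 0.5]                        # hypothetical per-allele effects
    p = [[0.2, 0.4], [0.3, 0.5], [0.4, 0.6]]  # hypothetical population frequencies
    qx, df = qx_star_tree(alpha, p, f=0.1)
    # Values far above df would suggest overdispersion relative to drift.
    print(f"Qx = {qx:.3f} on {df} df")
```

Under these assumptions Qx is compared with a chi-squared distribution on M - 1 degrees of freedom. As the author's note explains, a large Qx can also arise from stratification in the GWAS effect estimates rather than from selection.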

How robust are cross-population signatures of polygenic adaptation in humans?

10.1101/2020.07.13.200030, 2020

Author(s): Alba Refoyo-Martínez, Siyang Liu, Anja Moltke Jørgensen, Xin Jin, Anders Albrechtsen, ...

Keyword(s): Effect Size, Association Studies, GWAS Data, Genome Wide Association Studies, Summary Statistics, UK Biobank, Polygenic Adaptation, The UK, Meta Analyses, Size Estimates

Abstract

Over the past decade, summary statistics from genome-wide association studies (GWAS) have been used to detect and quantify polygenic adaptation in humans. Several studies have reported signatures of natural selection at sets of SNPs associated with complex traits, like height and body mass index. However, more recent studies suggest that some of these signals may be caused by biases from uncorrected population stratification in the GWAS data with which these tests are performed. Moreover, past studies have predominantly relied on SNP effect size estimates obtained from GWAS panels of European ancestries, which are known to be poor predictors of phenotypes in non-European populations. Here, we collated GWAS data from multiple anthropometric and metabolic traits that have been measured in more than one cohort around the world, including the UK Biobank, FINRISK, Chinese NIPT, Biobank Japan, APCDR and PAGE. We then evaluated how robust signals of polygenic adaptation are to the choice of GWAS cohort used to identify associated variants and their effect size estimates, while using the same panel to obtain population allele frequencies (the 1000 Genomes Project). We observe many discrepancies across tests performed on the same phenotype and find that association studies performed using multiple different cohorts, like meta-analyses, tend to produce scores with strong overdispersion across populations. This results in apparent signatures of polygenic adaptation which are not observed when using effect size estimates from biobank-based GWAS of homogeneous ancestries. Indeed, we were able to artificially create score overdispersion when taking the UK Biobank cohort and simulating a meta-analysis on multiple subsets of the cohort. This suggests that extreme caution should be taken in the execution and interpretation of future tests of polygenic adaptation based on population differentiation, especially when using summary statistics from GWAS meta-analyses.
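Meta-analyses combine per-cohort estimates of each SNP's effect into one estimate. A minimal sketch of the textbook fixed-effect, inverse-variance-weighted combination follows; it is an assumed, generic method rather than the exact procedure of any cohort or consortium named above, and the numbers are hypothetical.

```python
"""Fixed-effect inverse-variance-weighted (IVW) meta-analysis of one SNP.

Generic textbook method, assumed here for illustration only.
"""
import math
from typing import List, Tuple


def ivw_meta(betas: List[float], ses: List[float]) -> Tuple[float, float]:
    """Combine per-cohort estimates (beta_i, se_i) into (beta, se)."""
    if len(betas) != len(ses) or not betas:
        raise ValueError("need matching, non-empty lists of betas and SEs")
    weights = [1.0 / (se * se) for se in ses]  # w_i = 1 / se_i^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)       # se = sqrt(1 / sum w_i)


if __name__ == "__main__":
    # Hypothetical estimates for one SNP from two sub-cohorts.
    beta, se = ivw_meta([0.10, 0.30], [0.10, 0.20])
    print(f"beta = {beta:.3f}, se = {se:.4f}")
```

Any cohort-specific bias in an input estimate is carried into the combined estimate in proportion to that cohort's weight; the sketch itself does not model stratification.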

Signals of polygenic adaptation on height have been overestimated due to uncorrected population structure in genome-wide association studies

10.1101/355057, 2018, Cited By ~ 19

Author(s): Mashaal Sohail, Robert M. Maier, Andrea Ganna, Alex Bloemendal, Alicia R. Martin, ...

Keyword(s): Population Structure, Association Studies, Meta Analysis, Human Populations, Genome Wide Association Studies, Multiple Traits, Large Numbers, Genome Wide, Polygenic Adaptation, The UK

Abstract

Genetic predictions of height differ among human populations and these differences are too large to be explained by genetic drift. This observation has been interpreted as evidence of polygenic adaptation. Differences across populations were detected using SNPs genome-wide significantly associated with height, and many studies also found that the signals grew stronger when large numbers of subsignificant SNPs were analyzed. This has led to excitement about the prospect of analyzing large fractions of the genome to detect subtle signals of selection and claims of polygenic adaptation for multiple traits. Polygenic adaptation studies of height have been based on SNP effect size measurements in the GIANT Consortium meta-analysis. Here we repeat the height analyses in the UK Biobank, a much more homogeneously designed study. Our results show that polygenic adaptation signals based on large numbers of SNPs below genome-wide significance are extremely sensitive to biases due to uncorrected population structure.
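The abstract notes that signals grew stronger as sub-significant SNPs were added. A toy sketch, under loudly hypothetical assumptions, shows why a bias aligned with population structure behaves this way: if every SNP is truly neutral but its estimate picks up a bias c·d_l, where d_l is the frequency difference between two populations, each SNP adds 2c·d_l² to the score difference. That term is never negative, so the spurious difference grows linearly with the number of SNPs instead of averaging out. The functions and numbers are illustrative, not taken from the study.

```python
"""Toy model: how a small stratification bias accumulates over many SNPs.

Hypothetical assumptions: every SNP is truly neutral (beta = 0), and each
GWAS estimate picks up a bias c * d_l, where d_l is the allele-frequency
difference between populations A and B along the uncorrected structure
axis. No sampling noise is modelled.
"""
from typing import List


def score_difference(est_betas: List[float], freq_a: List[float], freq_b: List[float]) -> float:
    """Difference in mean polygenic score: 2 * sum_l beta_l * (p_A,l - p_B,l)."""
    return 2.0 * sum(b * (pa - pb) for b, pa, pb in zip(est_betas, freq_a, freq_b))


def biased_estimates(freq_a: List[float], freq_b: List[float], c: float) -> List[float]:
    """True effect 0 plus a stratification bias proportional to d_l."""
    return [0.0 + c * (pa - pb) for pa, pb in zip(freq_a, freq_b)]


if __name__ == "__main__":
    n_snps = 1000
    # Hypothetical frequency differences alternating in sign (mean zero).
    freq_a = [0.5 + (0.05 if i % 2 == 0 else -0.05) for i in range(n_snps)]
    freq_b = [0.5] * n_snps
    betas = biased_estimates(freq_a, freq_b, c=0.01)
    for k in (10, 100, 1000):
        diff = score_difference(betas[:k], freq_a[:k], freq_b[:k])
        print(f"{k:5d} SNPs -> score difference {diff:.4f}")
```

Even though the frequency differences alternate in sign, the bias follows them, so every SNP pushes the score difference in the same direction.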

Variable prediction accuracy of polygenic scores within an ancestry group

10.1101/629949, 2019, Cited By ~ 14

Author(s): Hakhamanesh Mostafavi, Arbel Harpak, Dalton Conley, Jonathan K Pritchard, Molly Przeworski

Keyword(s): Prediction Accuracy, Human Genetics, Association Studies, Genome Wide Association Studies, UK Biobank, GWAS Study, Ancestry Group, Genome Wide, Polygenic Scores, The UK

Abstract

Fields as diverse as human genetics and sociology are increasingly using polygenic scores based on genome-wide association studies (GWAS) for phenotypic prediction. However, recent work has shown that polygenic scores have limited portability across groups of different genetic ancestries, restricting the contexts in which they can be used reliably and potentially creating serious inequities in future clinical applications. Using the UK Biobank data, we demonstrate that even within a single ancestry group, the prediction accuracy of polygenic scores depends on characteristics such as the age or sex composition of the individuals in which the GWAS and the prediction were conducted, and on the GWAS study design. Our findings highlight both the complexities of interpreting polygenic scores and underappreciated obstacles to their broad use.

Variable prediction accuracy of polygenic scores within an ancestry group

eLife, 10.7554/elife.48376, 2020, Vol 9, Cited By ~ 33

Author(s): Hakhamanesh Mostafavi, Arbel Harpak, Ipsita Agarwal, Dalton Conley, Jonathan K Pritchard, ...

Keyword(s): Prediction Accuracy, Human Genetics, Association Studies, Economic Status, Genome Wide Association Studies, UK Biobank, Ancestry Group, Genome Wide, Polygenic Scores, The UK

Fields as diverse as human genetics and sociology are increasingly using polygenic scores based on genome-wide association studies (GWAS) for phenotypic prediction. However, recent work has shown that polygenic scores have limited portability across groups of different genetic ancestries, restricting the contexts in which they can be used reliably and potentially creating serious inequities in future clinical applications. Using the UK Biobank data, we demonstrate that even within a single ancestry group (i.e., when there are negligible differences in linkage disequilibrium or in causal allele frequencies), the prediction accuracy of polygenic scores can depend on characteristics such as the socio-economic status, age or sex of the individuals in which the GWAS and the prediction were conducted, as well as on the GWAS design. Our findings highlight both the complexities of interpreting polygenic scores and underappreciated obstacles to their broad use.
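To make "prediction accuracy within subgroups" concrete, the sketch below computes the squared correlation between a polygenic score and a phenotype separately in each group. It is a simplification under stated assumptions: real analyses usually report incremental R² after adjusting for covariates, and the group labels and values here are hypothetical.

```python
"""Prediction accuracy of a polygenic score within subgroups.

Simplified: accuracy is the squared Pearson correlation between score and
phenotype within each group, with no covariate adjustment. The grouping
variable (e.g. age band or sex) is a hypothetical input.
"""
from collections import defaultdict
from collections.abc import Hashable
from typing import Dict, List


def pearson_r(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def r2_by_group(pgs: List[float], pheno: List[float], groups: List[Hashable]) -> Dict[Hashable, float]:
    """Return {group: r^2} computed separately within each group."""
    buckets = defaultdict(lambda: ([], []))  # group -> (scores, phenotypes)
    for s, y, g in zip(pgs, pheno, groups):
        buckets[g][0].append(s)
        buckets[g][1].append(y)
    return {g: pearson_r(xs, ys) ** 2 for g, (xs, ys) in buckets.items()}


if __name__ == "__main__":
    pgs = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
    pheno = [1.1, 2.0, 2.9, 1.0, 3.0, 2.0]
    groups = ["younger"] * 3 + ["older"] * 3
    print(r2_by_group(pgs, pheno, groups))
```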

Theoretical and empirical quantification of the accuracy of polygenic scores in ancestry divergent populations

10.1101/2020.01.14.905927, 2020, Cited By ~ 2

Author(s): Ying Wang, Jing Guo, Guiyan Ni, Jian Yang, Peter M. Visscher, ...

Keyword(s): Complex Traits, Association Studies, African Ancestry, Real Data, European Ancestry, Genome Wide Association Studies, Genome Wide, Polygenic Scores, Causal Variants, The UK

Abstract

Polygenic scores (PGS) have been widely used to predict complex traits and risk of diseases using variants identified from genome-wide association studies (GWASs). To date, most GWASs have been conducted in populations of European ancestry, which limits the use of GWAS-derived PGS in non-European populations. Here, we develop a new theory to predict the relative accuracy (RA, relative to the accuracy in populations of the same ancestry as the discovery population) of PGS across ancestries. We used simulations and real data from the UK Biobank to evaluate our results. We found across various simulation scenarios that the RA of PGS based on trait-associated SNPs can be predicted accurately from modelling linkage disequilibrium (LD), minor allele frequencies (MAF), cross-population correlations of SNP effect sizes and heritability. Altogether, we find that LD and MAF differences between ancestries alone explain up to ~70% of the loss of RA using European-based PGS in African ancestry for traits like body mass index and height. Our results suggest that causal variants underlying common genetic variation identified in European ancestry GWASs are mostly shared across continents.
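The abstract lists cross-population correlation of SNP effect sizes among the factors that predict relative accuracy. As a minimal sketch of that one factor only (not the authors' theory, which also models LD and MAF of trait-associated SNPs), assume unlinked causal SNPs scored with discovery-population weights w_l in a target population whose true effects are β_l and whose genotype variances are h_l = 2p_l(1 − p_l). The score's R² divided by that of an ideal target-ancestry score is then (Σ h·w·β)² / (Σ h·w² · Σ h·β²), a squared weighted correlation. Names and numbers are hypothetical.

```python
"""Relative R^2 of a PGS when effect sizes differ across populations.

Hypothetical assumptions: unlinked causal loci in Hardy-Weinberg
equilibrium, genotypes independent of environment, and an "ideal" target
score that uses the true target effects. LD and tagging are ignored.
"""
from typing import List


def genotype_variances(freqs: List[float]) -> List[float]:
    """h_l = 2 p_l (1 - p_l), the per-locus genotype variance under HWE."""
    return [2.0 * p * (1.0 - p) for p in freqs]


def relative_accuracy_r2(weights: List[float], target_effects: List[float],
                         target_freqs: List[float]) -> float:
    """R^2 of a score with `weights` divided by R^2 of the ideal target score.

    Equals (sum h*w*b)^2 / (sum h*w^2 * sum h*b^2), which is at most 1
    by Cauchy-Schwarz and equals 1 when w is proportional to b.
    """
    h = genotype_variances(target_freqs)
    cross = sum(hl * w * b for hl, w, b in zip(h, weights, target_effects))
    var_score = sum(hl * w * w for hl, w in zip(h, weights))
    var_genetic = sum(hl * b * b for hl, b in zip(h, target_effects))
    return cross * cross / (var_score * var_genetic)


if __name__ == "__main__":
    w = [0.10, 0.20, -0.05]      # hypothetical discovery-population effect estimates
    beta_t = [0.08, 0.15, 0.02]  # hypothetical true effects in the target population
    p_t = [0.30, 0.10, 0.45]     # hypothetical target allele frequencies
    print(round(relative_accuracy_r2(w, beta_t, p_t), 3))
```

In this toy setting frequencies only act as weights, so it cannot reproduce the MAF and LD contributions the abstract reports; those arise when scores use tagging SNPs rather than causal variants.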

Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies

eLife, 10.7554/elife.39702, 2019, Vol 8, Cited By ~ 90

Author(s): Mashaal Sohail, Robert M Maier, Andrea Ganna, Alex Bloemendal, Alicia R Martin, ...

Keyword(s): Population Stratification, Association Studies, Editorial Note, Human Populations, Genome Wide Association Studies, Multiple Traits, Large Numbers, Genome Wide, Polygenic Scores, Polygenic Adaptation

Genetic predictions of height differ among human populations and these differences have been interpreted as evidence of polygenic adaptation. These differences were first detected using SNPs genome-wide significantly associated with height, and shown to grow stronger when large numbers of sub-significant SNPs were included, leading to excitement about the prospect of analyzing large fractions of the genome to detect polygenic adaptation for multiple traits. Previous studies of height have been based on SNP effect size measurements in the GIANT Consortium meta-analysis. Here we repeat the analyses in the UK Biobank, a much more homogeneously designed study. We show that polygenic adaptation signals based on large numbers of SNPs below genome-wide significance are extremely sensitive to biases due to uncorrected population stratification. More generally, our results imply that typical constructions of polygenic scores are sensitive to population stratification and that population-level differences should be interpreted with caution.

Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed (see decision letter).

Exploring the predictive power of polygenic scores derived from genome-wide association studies: a study of 10 complex traits

Bioinformatics, 10.1093/bioinformatics/btw745, 2017, pp. btw745, Cited By ~ 8

Author(s): Hon-Cheong So, Pak C. Sham

Keyword(s): Complex Traits, Predictive Power, Association Studies, Genome Wide Association, Genome Wide Association Studies, Genome Wide, Polygenic Scores

Assessing the contribution of rare-to-common protein-coding variants to circulating metabolic biomarker levels via 412,394 UK Biobank exome sequences

10.1101/2021.12.24.21268381, 2021

Author(s): Abhishek Nag, Lawrence Middleton, Ryan S Dhindsa, Dimitrios Vitsios, Eleanor M Wigmore, ...

Keyword(s): Gene Networks, Rare Variants, Association Studies, Low Frequency, Genome Wide Association Studies, UK Biobank, Protein Coding, The UK, Metabolic Biomarkers, Coding Variants

Genome-wide association studies have established the contribution of common and low frequency variants to metabolic biomarkers in the UK Biobank (UKB); however, the role of rare variants remains to be assessed systematically. We evaluated rare coding variants for 198 metabolic biomarkers, including metabolites assayed by Nightingale Health, using exome sequencing in participants from four genetically diverse ancestries in the UKB (N=412,394). Gene-level collapsing analysis, which evaluated a range of genetic architectures, identified a total of 1,303 significant relationships between genes and metabolic biomarkers (p < 1 × 10⁻⁸), encompassing 207 distinct genes. These include associations between rare non-synonymous variants in GIGYF1 and glucose and lipid biomarkers, SYT7 and creatinine, and others, which may provide insights into novel disease biology. Compared with a previous microarray-based genotyping study in the same cohort, we observed that 40% of gene-biomarker relationships identified in the collapsing analysis were novel. Finally, we applied Gene-SCOUT, a novel tool that utilises the gene-biomarker association statistics from the collapsing analysis to identify genes having similar biomarker fingerprints and thus expand our understanding of gene networks.
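Gene-level collapsing analysis aggregates many rare variants in a gene into a single carrier indicator per participant before testing. A minimal sketch, under hypothetical assumptions that are not the study's pipeline: a participant is a carrier if they carry at least one qualifying variant (the filter is supplied by the caller), carriers are compared with non-carriers by Welch's t statistic, and the two-sided p-value uses a large-sample normal approximation. Real collapsing pipelines typically use regression with covariates.

```python
"""Gene-level collapsing test for one gene and one biomarker (sketch)."""
import math
from statistics import mean, variance
from typing import List


def collapse_to_carriers(genotypes: List[List[int]], qualifying: List[bool]) -> List[bool]:
    """genotypes[i][v] = alt-allele count of participant i at variant v.

    A participant is a carrier if any qualifying variant is non-reference.
    """
    return [any(g > 0 and q for g, q in zip(row, qualifying)) for row in genotypes]


def welch_t(carrier_values: List[float], noncarrier_values: List[float]) -> float:
    """Welch's t statistic; each group needs at least two observations."""
    m1, m0 = mean(carrier_values), mean(noncarrier_values)
    se = math.sqrt(variance(carrier_values) / len(carrier_values)
                   + variance(noncarrier_values) / len(noncarrier_values))
    return (m1 - m0) / se


def normal_two_sided_p(t: float) -> float:
    """Large-sample normal approximation to the two-sided p-value."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def collapsing_test(genotypes: List[List[int]], qualifying: List[bool],
                    biomarker: List[float]) -> float:
    carriers = collapse_to_carriers(genotypes, qualifying)
    x1 = [y for y, c in zip(biomarker, carriers) if c]
    x0 = [y for y, c in zip(biomarker, carriers) if not c]
    return welch_t(x1, x0)


if __name__ == "__main__":
    # Hypothetical: 5 participants, 3 variants; variant 2 fails the filter.
    g = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0], [0, 0, 0]]
    q = [True, False, True]
    t = collapsing_test(g, q, [3.0, 1.0, 5.0, 2.0, 3.0])
    p = normal_two_sided_p(t)
    print(f"t = {t:.3f}, approx p = {p:.3g}, significant at 1e-8: {p < 1e-8}")
```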

GWAS of three molecular traits highlights core genes and pathways alongside a highly polygenic background

10.1101/2020.04.20.051631, 2020, Cited By ~ 6

Author(s): Nasa Sinnott-Armstrong, Sahin Naqvi, Manuel Rivas, Jonathan K Pritchard

Keyword(s): Complex Traits, Genetic Basis, Association Studies, Genome Wide Association, Genome Wide Association Studies, Biological Processes, UK Biobank, The Core, Genome Wide, Core Genes

Summary

Genome-wide association studies (GWAS) have been used to study the genetic basis of a wide variety of complex diseases and other traits. However, for most traits it remains difficult to interpret what genes and biological processes are impacted by the top hits. Here, as a contrast, we describe UK Biobank GWAS results for three molecular traits (urate, IGF-1, and testosterone) that are biologically simpler than most diseases, and for which we know a great deal in advance about the core genes and pathways. Unlike most GWAS of complex traits, for all three traits we find that most top hits are readily interpretable. We observe huge enrichment of significant signals near genes involved in the relevant biosynthesis, transport, or signaling pathways. We show how GWAS data illuminate the biology of variation in each trait, including insights into differences in testosterone regulation between females and males. Meanwhile, in other respects the results are reminiscent of GWAS for more-complex traits. In particular, even these molecular traits are highly polygenic, with most of the variance coming not from core genes, but from thousands to tens of thousands of variants spread across most of the genome. Given that diseases are often impacted by many distinct biological processes, including these three, our results help to illustrate why so many variants can affect risk for any given disease.
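The claim that most variance comes from thousands of small-effect variants rather than a handful of core genes is a statement about partitioning additive genetic variance. A minimal sketch with made-up numbers, assuming unlinked loci in Hardy-Weinberg equilibrium so that each locus contributes 2p(1 − p)β², shows how a few large-effect core loci can still account for a minority of the total. Which loci count as core is a hypothetical input.

```python
"""Share of additive genetic variance attributed to a set of "core" loci."""
from typing import List


def variance_share(effects: List[float], freqs: List[float], is_core: List[bool]) -> float:
    # Per-locus additive variance 2 p (1 - p) beta^2 (unlinked, HWE assumed).
    contrib = [2.0 * p * (1.0 - p) * b * b for b, p in zip(effects, freqs)]
    total = sum(contrib)
    core = sum(c for c, k in zip(contrib, is_core) if k)
    return core / total


if __name__ == "__main__":
    # Hypothetical architecture: 5 core loci with large effects plus
    # 10,000 peripheral loci with small effects, all at frequency 0.3.
    effects = [0.2] * 5 + [0.01] * 10_000
    freqs = [0.3] * 10_005
    is_core = [True] * 5 + [False] * 10_000
    print(f"core share of variance: {variance_share(effects, freqs, is_core):.2f}")
```

Here each core locus contributes 400 times the variance of a peripheral one, yet the core set explains only about a sixth of the total because the peripheral loci are so numerous.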

Reproducibility in the UK Biobank of Genome-Wide Significant Signals Discovered in Earlier Genome-wide Association Studies

10.1101/2020.06.24.20139576, 2020, Cited By ~ 1

Author(s): Jack W. O’Sullivan, John P. A. Ioannidis

Keyword(s): Effect Size, Association Studies, Genome Wide Association, P Value, Genome Wide Association Studies, UK Biobank, Single Nucleotide, Genome Wide, The UK, Open Question

Abstract

With the establishment of large biobanks, discovery of single nucleotide polymorphisms (SNPs) that are associated with various phenotypes has been accelerated. An open question is whether SNPs identified with genome-wide significance in earlier genome-wide association studies (GWAS) are also replicated in later GWAS conducted in biobanks. To address this question, the authors examined a publicly available GWAS database and identified two independent GWAS on the same phenotype (an earlier “discovery” GWAS and a later replication GWAS done in the UK Biobank). The analysis evaluated 136,318,924 SNPs (of which 6,289 had reached p<5e-8 in the discovery GWAS) from 4,397,962 participants across nine phenotypes. The overall replication rate was 85.0%, and it was lower for binary than for quantitative phenotypes (58.1% versus 94.8%, respectively). There was an 18.0% decrease in SNP effect size for binary phenotypes, but a 12.0% increase for quantitative phenotypes. Using the discovery SNP effect size, trait type (binary or quantitative), and discovery p-value, we built and validated a model that predicted SNP replication with an area under the receiver operating characteristic curve of 0.90. While non-replication may often reflect lack of power rather than genuine false-positive findings, these results provide insights about which discovered associations are likely to be seen again across subsequent GWAS.
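The two summary numbers in this abstract, a replication rate and an area under the ROC curve, can be computed as below. This is a minimal sketch under hypothetical assumptions: the abstract does not define "replication", so here a SNP replicates if its replication estimate has the same sign as the discovery estimate and p below a chosen threshold; AUC uses the Mann-Whitney pairwise formulation with ties counted as 0.5. All numbers are made up.

```python
"""Replication rate and ROC AUC for predicting SNP replication (sketch)."""
from typing import List


def replicates(beta_disc: float, beta_rep: float, p_rep: float, alpha: float = 0.05) -> bool:
    """Hypothetical criterion: same effect direction and p_rep < alpha."""
    return beta_disc * beta_rep > 0 and p_rep < alpha


def replication_rate(flags: List[bool]) -> float:
    return sum(flags) / len(flags)


def roc_auc(scores: List[float], labels: List[bool]) -> float:
    """P(score of a replicated SNP > score of a non-replicated SNP); ties = 0.5.

    Quadratic in the number of SNPs; a rank-based formula scales better.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical (beta_discovery, beta_replication, p_replication) triples.
    snps = [(0.12, 0.10, 1e-6), (0.05, -0.01, 0.6), (-0.08, -0.07, 0.003)]
    flags = [replicates(b0, b1, p) for b0, b1, p in snps]
    print("replication rate:", replication_rate(flags))
    # Hypothetical model scores, e.g. predicted probability of replication.
    print("AUC:", roc_auc([0.95, 0.40, 0.80], flags))
```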
