Quantitative Human Paleogenetics: What can Ancient DNA Tell us About Complex Trait Evolution?

Genetic association data from national biobanks and large-scale association studies have provided new prospects for understanding the genetic evolution of complex traits and diseases in humans. In turn, genomes from ancient human archaeological remains are now easier than ever to obtain, and provide a direct window into changes in frequencies of trait-associated alleles in the past. This has generated a new wave of studies aiming to analyse the genetic component of traits in historic and prehistoric times using ancient DNA, and to determine whether any such traits were subject to natural selection. In humans, however, issues about the portability and robustness of complex trait inference across different populations are particularly concerning when predictions are extended to individuals that died thousands of years ago, and for which little, if any, phenotypic validation is possible. In this review, we discuss the advantages of incorporating ancient genomes into studies of trait-associated variants, the need for models that can better accommodate ancient genomes into quantitative genetic frameworks, and the existing limits to inferences about complex trait evolution, particularly with respect to past populations.

Download Full-text

Investigating tissue-relevant causal molecular mechanisms of complex traits using probabilistic TWAS analysis

10.1101/808295 ◽

2019 ◽

Cited By ~ 2

Author(s):

Yuhua Zhang ◽

Corbin Quick ◽

Ketian Yu ◽

Alvaro Barbeira ◽

Francesca Luca ◽

...

Keyword(s):

Gene Expression ◽

Complex Traits ◽

Large Scale ◽

Molecular Mechanisms ◽

Association Studies ◽

Complex Trait ◽

Causal Effects ◽

Biological Mechanisms ◽

Integrative Framework ◽

Eqtl Data

AbstractTranscriptome-wide association studies (TWAS), an integrative framework using expression quantitative trait loci (eQTLs) to construct proxies for gene expression, have emerged as a promising method to investigate the biological mechanisms underlying associations between genotypes and complex traits. However, challenges remain in interpreting TWAS results, especially regarding their causality implications. In this paper, we describe a new computational framework, probabilistic TWAS (PTWAS), to detect associations and investigate causal relationships between gene expression and complex traits. We use established concepts and principles from instrumental variables (IV) analysis to delineate and address the unique challenges that arise in TWAS. PTWAS utilizes probabilistic eQTL annotations derived from multi-variant Bayesian fine-mapping analysis conferring higher power to detect TWAS associations than existing methods. Additionally, PTWAS provides novel functionalities to evaluate the causal assumptions and estimate tissue- or cell-type specific causal effects of gene expression on complex traits. These features make PTWAS uniquely suited for in-depth investigations of the biological mechanisms that contribute to complex trait variation. Using eQTL data across 49 tissues from GTEx v8, we apply PTWAS to analyze 114 complex traits using GWAS summary statistics from several large-scale projects, including the UK Biobank. Our analysis reveals an abundance of genes with strong evidence of eQTL-mediated causal effects on complex traits and highlights the heterogeneity and tissue-relevance of these effects across complex traits. We distribute software and eQTL annotations to enable users performing rigorous TWAS analysis by leveraging the full potentials of the latest GTEx multi-tissue eQTL data.

Download Full-text

Dissecting the genetics of complex traits using summary association statistics

10.1101/072934 ◽

2016 ◽

Cited By ~ 10

Author(s):

Bogdan Pasaniuc ◽

Alkes L. Price

Keyword(s):

Complex Traits ◽

Genetic Basis ◽

Association Studies ◽

Genome Wide Association Studies ◽

Privacy Concerns ◽

Individual Level ◽

The Past ◽

Genome Wide ◽

Association Data ◽

Limit Access

AbstractDuring the past decade, genome-wide association studies (GWAS) have successfully identified tens of thousands of genetic variants associated with complex traits and diseases. These studies have produced vast repositories of genetic variation and trait measurements across millions of individuals, providing tremendous opportunities for further analyses. However, privacy concerns and other logistical considerations often limit access to individual-level genetic data, motivating the development of methods that analyze summary association statistics. Here we review recent progress on statistical methods that leverage summary association data to gain insights into the genetic basis of complex traits and diseases.

Download Full-text

Improved methods for multi-trait fine mapping of pleiotropic risk loci

10.1101/054684 ◽

2016 ◽

Author(s):

Gleb Kichaev ◽

Megan Roytman ◽

Ruth Johnson ◽

Eleazar Eskin ◽

Sara Lindstroem ◽

...

Keyword(s):

Fine Mapping ◽

Complex Traits ◽

Functional Annotation ◽

Large Scale ◽

Association Studies ◽

Real Data ◽

Causal Variant ◽

Genome Wide Association Studies ◽

Annotation Data ◽

Association Data

AbstractGenome-wide association studies (GWAS) have identified thousands of regions in the genome that contain genetic variants that increase risk for complex traits and diseases. However, the variants uncovered in GWAS are typically not biologicaly causal, but rather, correlated to the true causal variant through linkage disequilibrium (LD). To discern the true causal variant(s), a variety of statistical fine-mapping methods have been proposed to prioritize variants for functional validation. In this work we introduce a new approach, fastPAINTOR, that leverages evidence across correlated traits, as well as functional annotation data, to improve fine-mapping accuracy at pleiotropic risk loci. To improve computational efficiency, we describe an new importance sampling scheme to perform model inference. First, we demonstrate in simulations that by leveraging functional annotation data, fastPAINTOR increases fine-mapping resolution relative to existing methods. Next, we show that jointly modeling pleiotropic risk regions improves fine-mapping resolution relative to standard single trait and pleiotropic fine mapping strategies. We report a reduction in the number of SNPs required for follow-up in order to capture 90% of the causal variants from 23 SNPs per locus using a single trait to 12 SNPs when fine-mapping two traits simultaneously. Finally, we analyze summary association data from a large-scale GWAS of lipids and show that these improvements are largely sustained in real data.

Download Full-text

Analyzing and Reconciling Colocalization and Transcriptome-wide Association Studies from the Perspective of Inferential Reproducibility

10.1101/2021.10.29.466468 ◽

2021 ◽

Author(s):

Abhay Hukku ◽

Matthew G Sampson ◽

Francesca Luca ◽

Roger Pigue-Regi ◽

Xiaoquan Wen

Keyword(s):

Genetic Association ◽

Complex Traits ◽

Association Studies ◽

Complex Trait ◽

Theoretical Derivation ◽

Disease Etiology ◽

Specificity And Sensitivity ◽

Novel Approach ◽

Association Data ◽

Eqtl Data

Transcriptome-wide association studies and colocalization analysis are popular computational approaches for integrating genetic association data from molecular and complex traits. They show the unique ability to go beyond variant-level genetic association evidence and implicate critical functional units, e.g., genes, in disease etiology. However, in practice, when the two approaches are applied to the same molecular and complex trait data, the inference results can be markedly different. This paper systematically investigates the inferential reproducibility between the two approaches through theoretical derivation, numerical experiments, and analyses of 4 complex trait GWAS and GTEx eQTL data. We identify two classes of inconsistent inference results. We find that the first class of inconsistent results may suggest an interesting biological phenomenon, i.e., horizontal pleiotropy; thus, the two approaches are truly complementary. The inconsistency in the second class can be understood and effectively reconciled. To this end, we propose a novel approach for locus-level colocalization analysis. We demonstrate that the joint TWAS and locus-level colocalization analysis improves specificity and sensitivity for implicating biological-relevant genes.

Download Full-text

Integrative approaches for large-scale transcriptome-wide association studies

10.1101/024083 ◽

2015 ◽

Cited By ~ 6

Author(s):

Alexander Gusev ◽

Arthur Ko ◽

Huwenbo Shi ◽

Gaurav Bhatia ◽

Wonil Chung ◽

...

Keyword(s):

Gene Expression ◽

Complex Traits ◽

Large Scale ◽

Association Studies ◽

The Novel ◽

Expression Trait ◽

Novel Genes ◽

Genome Wide ◽

Hybrid Mouse Diversity Panel ◽

Association Data

Many genetic variants influence complex traits by modulating gene expression, thus altering the abundance levels of one or multiple proteins. In this work we introduce a powerful strategy that integrates gene expression measurements with large-scale genome-wide association data to identify genes whose cis-regulated expression is associated to complex traits. We use a relatively small reference panel of individuals for which both genetic variation and gene expression have been measured to impute gene expression into large cohorts of individuals and identify expression-trait associations. We extend our methods to allow for indirect imputation of the expression-trait association from summary association statistics of large-scale GWAS1-3. We applied our approaches to expression data from blood and adipose tissue measured in ~3,000 individuals overall. We then imputed gene expression into GWAS data from over 900,000 phenotype measurements4-6 to identify 69 novel genes significantly associated to obesity-related traits (BMI, lipids, and height). Many of the novel genes were associated with relevant phenotypes in the Hybrid Mouse Diversity Panel. Overall our results showcase the power of integrating genotype, gene expression and phenotype to gain insights into the genetic basis of complex traits.

Download Full-text

Guidelines for Large-Scale Sequence-Based Complex Trait Association Studies: Lessons Learned from the NHLBI Exome Sequencing Project

The American Journal of Human Genetics ◽

10.1016/j.ajhg.2016.08.012 ◽

2016 ◽

Vol 99 (4) ◽

pp. 791-801 ◽

Cited By ~ 48

Author(s):

Paul L. Auer ◽

Alex P. Reiner ◽

Gao Wang ◽

Hyun Min Kang ◽

Goncalo R. Abecasis ◽

...

Keyword(s):

Exome Sequencing ◽

Large Scale ◽

Association Studies ◽

Complex Trait ◽

Lessons Learned ◽

Sequencing Project ◽

Trait Association ◽

Exome Sequencing Project ◽

Scale Sequence

Download Full-text

Within and across populations complex traits and diseases prediction using summary statistics from large-scale genomewide association studies

10.14264/11574da ◽

2021 ◽

Author(s):

◽

Ying Wang

Keyword(s):

Complex Traits ◽

Large Scale ◽

Association Studies ◽

Summary Statistics ◽

Genomewide Association

Download Full-text

Better estimation of SNP heritability from summary statistics provides a new understanding of the genetic architecture of complex traits

10.1101/284976 ◽

2018 ◽

Cited By ~ 6

Author(s):

Doug Speed ◽

David J Balding

Keyword(s):

Complex Traits ◽

Genetic Architecture ◽

Large Scale ◽

Association Studies ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Confounding Bias ◽

Conserved Regions ◽

Genome Wide ◽

Variation Explained

LD Score Regression (LDSC) has been widely applied to the results of genome-wide association studies. However, its estimates of SNP heritability are derived from an unrealistic model in which each SNP is expected to contribute equal heritability. As a consequence, LDSC tends to over-estimate confounding bias, under-estimate the total phenotypic variation explained by SNPs, and provide misleading estimates of the heritability enrichment of SNP categories. Therefore, we present SumHer, software for estimating SNP heritability from summary statistics using more realistic heritability models. After demonstrating its superiority over LDSC, we apply SumHer to the results of 24 large-scale association studies (average sample size 121 000). First we show that these studies have tended to substantially over-correct for confounding, and as a result the number of genome-wide significant loci has under-reported by about 20%. Next we estimate enrichment for 24 categories of SNPs defined by functional annotations. A previous study using LDSC reported that conserved regions were 13-fold enriched, and found a further twelve categories with above 2-fold enrichment. By contrast, our analysis using SumHer finds that conserved regions are only 1.6-fold (SD 0.06) enriched, and that no category has enrichment above 1.7-fold. SumHer provides an improved understanding of the genetic architecture of complex traits, which enables more efficient analysis of future genetic data.

Download Full-text

Reference Trait Analysis Reveals Correlations Between Gene Expression and Quantitative Traits in Disjoint Samples

Genetics ◽

10.1534/genetics.118.301865 ◽

2019 ◽

Vol 212 (3) ◽

pp. 919-929

Author(s):

Daniel A. Skelly ◽

Narayanan Raghupathy ◽

Raymond F. Robledo ◽

Joel H. Graber ◽

Elissa J. Chesler

Keyword(s):

Gene Expression ◽

Canonical Correlation ◽

Complex Traits ◽

Behavioral Genetics ◽

Association Studies ◽

Complex Trait ◽

Integrated Analysis ◽

Data Set ◽

Trait Analysis ◽

Molecular Features

Systems genetic analysis of complex traits involves the integrated analysis of genetic, genomic, and disease-related measures. However, these data are often collected separately across multiple study populations, rendering direct correlation of molecular features to complex traits impossible. Recent transcriptome-wide association studies (TWAS) have harnessed gene expression quantitative trait loci (eQTL) to associate unmeasured gene expression with a complex trait in genotyped individuals, but this approach relies primarily on strong eQTL. We propose a simple and powerful alternative strategy for correlating independently obtained sets of complex traits and molecular features. In contrast to TWAS, our approach gains precision by correlating complex traits through a common set of continuous phenotypes instead of genetic predictors, and can identify transcript–trait correlations for which the regulation is not genetic. In our approach, a set of multiple quantitative “reference” traits is measured across all individuals, while measures of the complex trait of interest and transcriptional profiles are obtained in disjoint subsamples. A conventional multivariate statistical method, canonical correlation analysis, is used to relate the reference traits and traits of interest to identify gene expression correlates. We evaluate power and sample size requirements of this methodology, as well as performance relative to other methods, via extensive simulation and analysis of a behavioral genetics experiment in 258 Diversity Outbred mice involving two independent sets of anxiety-related behaviors and hippocampal gene expression. After splitting the data set and hiding one set of anxiety-related traits in half the samples, we identified transcripts correlated with the hidden traits using the other set of anxiety-related traits and exploiting the highest canonical correlation (R = 0.69) between the trait data sets. We demonstrate that this approach outperforms TWAS in identifying associated transcripts. Together, these results demonstrate the validity, reliability, and power of reference trait analysis for identifying relations between complex traits and their molecular substrates.

Download Full-text

Estimating heritability and its enrichment in tissue-specific gene sets in admixed populations

10.1101/503144 ◽

2018 ◽

Cited By ~ 3

Author(s):

Yang Luo ◽

Xinyi Li ◽

Xin Wang ◽

Steven Gazal ◽

Josep Maria Mercader ◽

...

Keyword(s):

African American ◽

Complex Traits ◽

Association Studies ◽

Genetic Data ◽

Age At Menarche ◽

Specific Gene ◽

Genome Wide Association Studies ◽

Diverse Populations ◽

Tissue Specific ◽

Different Populations

AbstractThe increasing size and diversity of genome-wide association studies provide an exciting opportunity to study how the genetics of complex traits vary among diverse populations. Here, we introduce covariate-adjusted LD score regression (cov-LDSC), a method to accurately estimate genetic heritability and its enrichment in both homogenous and admixed populations with summary statistics and in-sample LD estimates. In-sample LD can be estimated from a subset of the GWAS samples, allowing our method to be applied efficiently to very large cohorts. In simulations, we show that unadjusted LDSC underestimates by 10% − 60% in admixed populations; in contrast, cov-LDSC is robust to all simulation parameters. We apply cov-LDSC to genotyping data from approximately 170,000 Latino, 47,000 African American and 135,000 European individuals. We estimate and detect heritability enrichment in three quantitative and five dichotomous phenotypes respectively, making this, to our knowledge, the most comprehensive heritability-based analysis of admixed individuals. Our results show that most traits have high concordance of and consistent tissue-specific heritability enrichment among different populations. However, for age at menarche, we observe population-specific heritability estimates of . We observe consistent patterns of tissue-specific heritability enrichment across populations; for example, in the limbic system for BMI, the per-standardized-annotation effect size τ* is 0.16 ± 0.04, 0.28 ± 0.11 and 0.18 ± 0.03 in Latino, African American and European populations respectively. Our results demonstrate that our approach is a powerful way to analyze genetic data for complex traits from underrepresented populations.Author summaryAdmixed populations such as African Americans and Hispanic Americans bear a disproportionately high burden of disease but remain underrepresented in current genetic studies. It is important to extend current methodological advancements for understanding the genetic basis of complex traits in homogeneous populations to individuals with admixed genetic backgrounds. Here, we develop a computationally efficient method to answer two specific questions. First, does genetic variation contribute to the same amount of phenotypic variation (heritability) across diverse populations? Second, are the genetic mechanisms shared among different populations? To answer these questions, we use our novel method to conduct the first comprehensive heritability-based analysis of a large number of admixed individuals. We show that there is a high degree of concordance in total heritability and tissue-specific enrichment between different ancestral groups. However, traits such as age at menarche show a noticeable differences among populations. Our work provides a powerful way to analyze genetic data in admixed populations and may contribute to the applicability of genomic medicine to admixed population groups.

Download Full-text