Using gene genealogies to localize rare variants associated with complex traits in diploid populations

2017 ◽  
Author(s):  
Charith B. Karunarathna ◽  
Jinko Graham

Abstract
Background and Aims: Many methods can detect trait association with causal variants in candidate genomic regions; however, a comparison of their ability to localize causal variants is lacking. We extend a previous study of the detection abilities of these methods to a comparison of their localization abilities.
Methods: Through coalescent simulation, we compare several popular association methods. Cases and controls are sampled from a diploid population to mimic human studies. As benchmarks for comparison, we include two methods that cluster phenotypes on the true genealogical trees: a naive Mantel test considered previously in haploid populations, and an extension that takes into account whether case haplotypes carry a causal variant. We first work through a simulated dataset to illustrate the methods. We then perform a simulation study to score the localization and detection properties.
Results: In our simulations, the association signal was localized least precisely by the naive Mantel test and most precisely by its extension. Most other approaches had intermediate performance similar to the single-variant Fisher's exact test.
Conclusions: Our results confirm earlier findings in haploid populations about potential gains in performance from genealogy-based approaches. They also highlight differences between haploid and diploid populations when localizing and detecting causal variants.
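
As an illustration of the single-variant benchmark named in the abstract above, the following is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table of cases/controls by carriers/non-carriers. The function name, the table layout, and the example counts are assumptions made for illustration; the abstract does not describe the authors' implementation.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout (illustrative only):
        rows    = cases, controls
        columns = carriers of the variant, non-carriers
    """
    row1, row2 = a + b, c + d      # number of cases, number of controls
    col1 = a + c                   # total number of carriers
    n = row1 + row2
    denom = comb(n, col1)

    # Hypergeometric probability of seeing x carriers among the cases,
    # with all table margins held fixed.
    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)       # smallest feasible count in the top-left cell
    hi = min(row1, col1)           # largest feasible count in the top-left cell
    p_obs = prob(a)

    # Sum the probabilities of all tables at least as unlikely as the observed
    # one; the small relative tolerance guards against floating-point ties.
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-7):
            total += p
    return min(1.0, total)


# Hypothetical counts: 3 carrier cases, 0 non-carrier cases,
# 0 carrier controls, 3 non-carrier controls.
print(fisher_exact_two_sided(3, 0, 0, 3))
```

In a localization setting such a test would be applied to each variant in turn, with the strongest signal taken as the estimated location; that scanning step is not shown here.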


2016 ◽  
Author(s):  
Andrew Anand Brown ◽  
Ana Viñuela ◽  
Olivier Delaneau ◽  
Tim Spector ◽  
Kerrin Small ◽  
...  

Genetic association mapping produces statistical links between phenotypes and genomic regions, but identifying the causal variants themselves remains difficult. Complete knowledge of all genetic variants, as provided by whole genome sequence (WGS), will help, but is currently financially prohibitive for well-powered GWAS. To explore the advantages of WGS in a well-powered setting, we performed eQTL mapping using WGS and RNA-seq, and showed that the lead eQTL variants called using WGS are more likely to be causal. We derived properties of the causal variant from simulation studies, and used these to propose a method for implicating likely causal SNPs. This method predicts that 25% to 70% of the causal variants lie in open chromatin regions, depending on tissue and experiment. Finally, we identify a set of high-confidence causal variants and show that they are more enriched in GWAS associations than other eQTL. Of these, we find 65 associations with GWAS traits and show examples where the gene implicated by expression has been functionally validated as relevant for complex traits.



2021 ◽  
Author(s):  
Megan Null ◽  
Josée Dupuis ◽  
Christopher R. Gignoux ◽  
Audrey E. Hendricks

Abstract: Identification of rare variant associations is crucial to fully characterize the genetic architecture of complex traits and diseases. Essential in this process is the evaluation of novel methods in simulated data that mirrors the distribution of rare variants and haplotype structure in real data. Additionally, importing real variant annotation enables in silico comparison of methods that focus on putative causal variants, such as rare variant association tests and polygenic scoring methods. Existing simulation methods are either unable to employ real variant annotation or severely under- or over-estimate the number of singletons and doubletons, reducing the ability to generalize simulation results to real studies. We present RAREsim, a flexible and accurate rare variant simulation algorithm. Using parameters and haplotypes derived from real sequencing data, RAREsim efficiently simulates the expected variant distribution and enables real variant annotations. We highlight RAREsim's utility across various genetic regions, sample sizes, ancestries, and variant classes.



2021 ◽  
Author(s):  
Roshni A. Patel ◽  
Shaila A. Musharoff ◽  
Jeffrey P. Spence ◽  
Harold Pimentel ◽  
Catherine Tcheandjieu ◽  
...  

Despite the growing number of genome-wide association studies (GWAS) for complex traits, it remains unclear whether effect sizes of causal genetic variants differ between populations. In principle, effect sizes of causal variants could differ between populations due to gene-by-gene or gene-by-environment interactions. However, comparing causal variant effect sizes is challenging: it is difficult to know which variants are causal, and comparisons of variant effect sizes are confounded by differences in linkage disequilibrium (LD) structure between ancestries. Here, we develop a method to assess causal variant effect size differences that overcomes these limitations. Specifically, we leverage the fact that segments of European ancestry shared between European-American and admixed African-American individuals have similar LD structure, allowing for unbiased comparisons of variant effect sizes in European ancestry segments. We apply our method to two types of traits: gene expression and low-density lipoprotein cholesterol (LDL-C). We find that causal variant effect sizes for gene expression are significantly different between European-Americans and African-Americans; for LDL-C, we observe a similar point estimate although this is not significant, likely due to lower statistical power. Cross-population differences in variant effect sizes highlight the role of genetic interactions in trait architecture and will contribute to the poor portability of polygenic scores across populations, reinforcing the importance of conducting GWAS on individuals of diverse ancestries and environments.



2017 ◽  
Author(s):  
Luke M. Evans ◽  
Rasool Tahmasbi ◽  
Scott I. Vrieze ◽  
Gonçalo R. Abecasis ◽  
Sayantan Das ◽  
...  

Abstract: Heritability, h², is a foundational concept in genetics, critical to understanding the genetic basis of complex traits. Recently developed methods that estimate heritability from genotyped SNPs, h²_SNP, explain substantially more genetic variance than genome-wide significant loci, but less than classical estimates from twins and families. However, h²_SNP estimates have yet to be comprehensively compared under a range of genetic architectures, making it difficult to draw conclusions from sometimes conflicting published estimates. Here, we used thousands of real whole genome sequences to simulate realistic phenotypes under a variety of genetic architectures, including those from very rare causal variants. We compared the performance of ten methods across different types of genotypic data (commercial SNP array positions, whole genome sequence variants, and imputed variants) and under differing causal variant frequencies, levels of stratification, and relatedness thresholds. These results provide guidance in interpreting past results and choosing optimal approaches for future studies. We then chose the two methods (GREML-MS and GREML-LDMS) that best estimated overall h²_SNP and the causal variant frequency spectra, and applied them to six phenotypes in the UK Biobank using imputed genome-wide variants. Our results suggest that as imputation reference panels become larger and more diverse, estimates of the frequency distribution of causal variants will become increasingly unbiased and the vast majority of trait narrow-sense heritability will be accounted for.



2018 ◽  
Vol 83 (1) ◽  
pp. 30-39 ◽  
Author(s):  
Charith B. Karunarathna ◽  
Jinko Graham


2019 ◽  
Author(s):  
Anna Hutchinson ◽  
Hope Watson ◽  
Chris Wallace

Abstract: Genome-wide association studies (GWAS) have successfully identified thousands of loci associated with human diseases. Bayesian genetic fine-mapping studies aim to identify the specific causal variants within GWAS loci responsible for each association, reporting credible sets of plausible causal variants, which are interpreted as containing the causal variant with some "coverage probability".
Here, we use simulations to demonstrate that the coverage probabilities are over-conservative in most fine-mapping situations. We show that this is because fine-mapping data sets are not randomly selected from amongst all causal variants, but from amongst causal variants with larger effect sizes. We present a method to re-estimate the coverage of credible sets using rapid simulations based on the observed, or estimated, SNP correlation structure; we call this the "corrected coverage estimate". This is extended to find "corrected credible sets", which are the smallest sets of variants such that their corrected coverage estimate meets the target coverage.
We use our method to improve the resolution of a fine-mapping study of type 1 diabetes. We found that in 27 out of 39 associated genomic regions our method could reduce the number of potentially causal variants to consider for follow-up, and found that none of the 95% or 99% credible sets required the inclusion of more variants, a pattern matched in simulations of well-powered GWAS.
Crucially, our correction method requires only GWAS summary statistics and remains accurate when SNP correlations are estimated from a large reference panel. Using our method to improve the resolution of fine-mapping studies will enable more efficient expenditure of resources in the follow-up process of annotating the variants in the credible set to determine the implicated genes and pathways in human diseases.
Author summary: Pinpointing specific genetic variants within the genome that are causal for human diseases is difficult due to complex correlation patterns existing between variants. Consequently, researchers typically prioritise a set of plausible causal variants for functional validation; these sets of putative causal variants are called "credible sets". We find that the probabilistic interpretation that these credible sets do indeed contain the true causal variant is variable, in that the reported probabilities often underestimate the true coverage of the causal variant in the credible set. We have developed a method to provide researchers with a "corrected coverage estimate" that the true causal variant appears in the credible set, and this has been extended to find "corrected credible sets", allowing for more efficient allocation of resources in the expensive follow-up laboratory experiments. We used our method to reduce the number of genetic variants to consider as causal candidates for follow-up in 27 genomic regions that are associated with type 1 diabetes.
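
To make the notion of a credible set in the abstract above concrete, here is a minimal sketch of the standard (uncorrected) construction: rank variants by posterior probability of causality and keep adding them until the cumulative probability reaches the target. The function name, the variant labels, and the probabilities are hypothetical, and this sketch does not implement the authors' corrected coverage estimate, which relies on simulations not described in enough detail here.

```python
from typing import Dict, List


def credible_set(posterior: Dict[str, float], target: float = 0.95) -> List[str]:
    """Return the smallest set of variants whose posterior probabilities
    of causality sum to at least `target`.

    `posterior` maps a variant label to its posterior probability; the
    values are assumed (for illustration) to sum to roughly 1 in a region.
    """
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")

    # Rank variants from most to least probable.
    ranked = sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)

    chosen: List[str] = []
    cumulative = 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        cumulative += prob
        if cumulative >= target:
            break
    # If the probabilities never reach the target, every variant is returned.
    return chosen


# Hypothetical posterior probabilities for four variants in one region.
example = {"v1": 0.60, "v2": 0.30, "v3": 0.06, "v4": 0.04}
print(credible_set(example, target=0.95))
```

The cumulative probability of the returned set is what is usually reported as its claimed coverage; the abstract's point is that this number can differ from the true coverage and may therefore be worth re-estimating.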



2021 ◽  
Vol 17 (10) ◽  
pp. e1009483
Author(s):  
Ruth Johnson ◽  
Kathryn S. Burch ◽  
Kangcheng Hou ◽  
Mario Paciuc ◽  
Bogdan Pasaniuc ◽  
...  

The number of variants that have a non-zero effect on a trait (i.e. polygenicity) is a fundamental parameter in the study of the genetic architecture of a complex trait. Although many previous studies have investigated polygenicity at a genome-wide scale, a detailed understanding of how polygenicity varies across genomic regions is currently lacking. In this work, we propose an accurate and scalable statistical framework to estimate regional polygenicity for a complex trait. We show that our approach yields approximately unbiased estimates of regional polygenicity in simulations across a wide range of genetic architectures. We then partition the polygenicity of anthropometric and blood pressure traits across 6-Mb genomic regions (N = 290K, UK Biobank) and observe that all analyzed traits are highly polygenic: over one-third of regions harbor at least one causal variant for each of the traits analyzed. Additionally, we observe wide variation in regional polygenicity: on average across all traits, 48.9% of regions contain at least 5 causal SNPs and 5.44% of regions contain at least 50 causal SNPs. Finally, we find that heritability is proportional to polygenicity at the regional level, which is consistent with the hypothesis that heritability enrichments are largely driven by the variation in the number of causal SNPs.



2016 ◽  
Author(s):  
Huwenbo Shi ◽  
Nicholas Mancuso ◽  
Sarah Spendlove ◽  
Bogdan Pasaniuc

Abstract: Although genetic correlations between complex traits provide valuable insights into epidemiological and etiological studies, a precise quantification of which genomic regions contribute to the genome-wide genetic correlation is currently lacking. Here, we introduce ρ-HESS, a technique to quantify the correlation between pairs of traits due to genetic variation at a small region in the genome. Our approach only requires GWAS summary data and makes no distributional assumption on the causal variant effect sizes while accounting for linkage disequilibrium (LD) and overlapping GWAS samples. We analyzed large-scale GWAS summary data across 35 complex traits, and identified 27 genomic regions that contribute significantly to the genetic correlation among these traits. Notably, we find 7 genomic regions that contribute to the genetic correlation of 12 pairs of traits that show negligible genome-wide correlation, further showcasing the power of local genetic correlation analyses. Finally, we leverage the distribution of local genetic correlations across the genome to assign putative direction of causality for 15 pairs of traits.



2021 ◽  
Author(s):  
Irene Novo ◽  
Eugenio López-Cortegano ◽  
Armando Caballero

Abstract: Recent studies have shown the ubiquity of pleiotropy for variants affecting human complex traits. These studies also show that rare variants tend to be less pleiotropic than common ones, suggesting that purifying natural selection acts against highly pleiotropic variants of large effect. Here we investigate the mean frequency, effect size and recombination rate associated with pleiotropic variants, and focus particularly on whether highly pleiotropic variants are enriched in regions with putative strong background selection. We evaluate variants for 41 human traits using data from the NHGRI-EBI GWAS Catalog, as well as data from three other studies. Our results show that variants involving a higher degree of pleiotropy tend to be more common, have larger mean effect sizes, and contribute more to heritability than variants with a lower degree of pleiotropy. Using data from four different studies, we show that more pleiotropic variants are enriched in genome regions with stronger background selection than less pleiotropic variants. Thus, we conclude that even though highly pleiotropic variants found so far have larger average effect sizes and frequencies than less pleiotropic ones, they are likely to be subjected to stronger background selection.



2021 ◽  
Author(s):  
Irene Novo ◽  
Eugenio López-Cortegano ◽  
Armando Caballero

Abstract: Recent studies have shown the ubiquity of pleiotropy for variants affecting human complex traits. These studies also show that rare variants tend to be less pleiotropic than common ones, suggesting that purifying natural selection acts against highly pleiotropic variants of large effect. Here, we investigate the mean frequency, effect size and recombination rate associated with pleiotropic variants, and focus particularly on whether highly pleiotropic variants are enriched in regions with putative strong background selection. We evaluate variants for 41 human traits using data from the NHGRI-EBI GWAS Catalog, as well as data from three other studies. Our results show that variants involving a higher degree of pleiotropy tend to be more common, have larger mean effect sizes, and contribute more to heritability than variants with a lower degree of pleiotropy. This is consistent with the fact that variants of large effect and frequency are more likely to be detected by GWAS. Using data from four different studies, we also show that more pleiotropic variants are enriched in genome regions with stronger background selection than less pleiotropic variants, suggesting that highly pleiotropic variants are subjected to strong purifying selection. From the above results, we hypothesize that a number of highly pleiotropic variants of low effect or frequency may go undetected by GWAS.


