Predicting causal variants affecting expression using whole genome sequence and RNA-seq from multiple human tissues

2016 ◽  
Author(s):  
Andrew Anand Brown ◽  
Ana Viñuela ◽  
Olivier Delaneau ◽  
Tim Spector ◽  
Kerrin Small ◽  
...  

Genetic association mapping produces statistical links between phenotypes and genomic regions, but identifying the causal variants themselves remains difficult. Complete knowledge of all genetic variants, as provided by whole genome sequence (WGS), will help, but is currently financially prohibitive for well-powered GWAS. To explore the advantages of WGS in a well-powered setting, we performed eQTL mapping using WGS and RNA-seq, and showed that the lead eQTL variants called using WGS are more likely to be causal. We derived properties of the causal variant from simulation studies, and used these to propose a method for implicating likely causal SNPs. This method predicts that 25% to 70% of the causal variants lie in open chromatin regions, depending on tissue and experiment. Finally, we identify a set of high-confidence causal variants and show that they are more enriched in GWAS associations than other eQTL. Of these, we find 65 associations with GWAS traits and show examples where the gene implicated by expression has been functionally validated as relevant for complex traits.


2017 ◽  
Author(s):  
Luke M. Evans ◽  
Rasool Tahmasbi ◽  
Scott I. Vrieze ◽  
Gonçalo R. Abecasis ◽  
Sayantan Das ◽  
...  

Abstract Heritability, h², is a foundational concept in genetics, critical to understanding the genetic basis of complex traits. Recently developed methods that estimate heritability from genotyped SNPs, h²_SNP, explain substantially more genetic variance than genome-wide significant loci, but less than classical estimates from twins and families. However, h²_SNP estimates have yet to be comprehensively compared under a range of genetic architectures, making it difficult to draw conclusions from sometimes conflicting published estimates. Here, we used thousands of real whole genome sequences to simulate realistic phenotypes under a variety of genetic architectures, including those from very rare causal variants. We compared the performance of ten methods across different types of genotypic data (commercial SNP array positions, whole genome sequence variants, and imputed variants) and under differing causal variant frequencies, levels of stratification, and relatedness thresholds. These results provide guidance in interpreting past results and choosing optimal approaches for future studies. We then applied the two methods (GREML-MS and GREML-LDMS) that best estimated overall h²_SNP and the causal variant frequency spectra to six phenotypes in the UK Biobank using imputed genome-wide variants. Our results suggest that as imputation reference panels become larger and more diverse, estimates of the frequency distribution of causal variants will become increasingly unbiased and the vast majority of trait narrow-sense heritability will be accounted for.
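
For orientation, the two quantities contrasted in the abstract above can be written in standard quantitative-genetics notation. The variance symbols are assumptions introduced here for illustration and are not taken from the paper:

```latex
% Narrow-sense heritability and its SNP-based counterpart.
% \sigma^2_A            : additive genetic variance (assumed symbol)
% \sigma^2_{\mathrm{SNP}}: additive variance captured by the SNPs in the analysis (assumed symbol)
% \sigma^2_P            : total phenotypic variance (assumed symbol)
h^2 = \frac{\sigma^2_A}{\sigma^2_P},
\qquad
h^2_{\mathrm{SNP}} = \frac{\sigma^2_{\mathrm{SNP}}}{\sigma^2_P}
```

Under this reading, the gap between family-based estimates and SNP-based estimates corresponds to additive variance that the analysed SNPs fail to tag, for example variance from very rare causal variants.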



2021 ◽  
Vol 99 (Supplement_3) ◽  
pp. 25-25
Author(s):  
Muhammad Yasir Nawaz ◽  
Rodrigo Pelicioni Savegnago ◽  
Cedric Gondro

Abstract In this study, we detected genome-wide footprints of selection in Hanwoo and Angus beef cattle using different allele frequency and haplotype-based methods based on imputed whole genome sequence data. Our dataset included 13,202 Angus and 10,437 Hanwoo animals with 10,057,633 and 13,241,550 imputed SNPs, respectively. A subset of data with 6,873,624 SNPs common to the two populations was used to estimate signatures of selection parameters, both within (runs of homozygosity and extended haplotype homozygosity) and between (allele fixation index, extended haplotype homozygosity) the breeds, in order to infer evidence of selection. We observed that correlations between the various measures of selection ranged from 0.01 to 0.42. Assuming these parameters were complementary to each other, we combined them into a composite selection signal to identify regions under selection in both beef breeds. The composite signal was based on the average of fractional ranks of individual selection measures for every SNP. We identified some selection signatures that were common between the breeds while others were independent. We also observed that more genomic regions were under selection in Angus than in Hanwoo. Candidate genes within significant genomic regions may help explain mechanisms of adaptation, domestication history and loci for important traits in Angus and Hanwoo cattle. In the future, we will use the top SNPs under selection for genomic prediction of carcass traits in both breeds.
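
The composite signal described above (the average of fractional ranks across selection measures) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the measure labels, and the choice to scale ranks by the number of SNPs (rather than, say, n + 1) are all assumptions.

```python
from typing import Dict, List


def fractional_ranks(values: List[float]) -> List[float]:
    """Rank values in ascending order and scale ranks to (0, 1].

    Ties receive the average of the ranks they span.
    Scaling by n is an assumption; other scalings are possible.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j to cover the run of tied values starting at position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank / n
        i = j + 1
    return ranks


def composite_signal(measures: Dict[str, List[float]]) -> List[float]:
    """Average the fractional ranks of every measure, SNP by SNP."""
    per_measure = [fractional_ranks(v) for v in measures.values()]
    n_snps = len(per_measure[0])
    if any(len(r) != n_snps for r in per_measure):
        raise ValueError("all measures must cover the same SNPs")
    return [
        sum(r[s] for r in per_measure) / len(per_measure)
        for s in range(n_snps)
    ]


# Hypothetical scores for three SNPs under two selection measures.
example = {"fst": [0.1, 0.5, 0.3], "xp_ehh": [1.0, 3.0, 2.0]}
scores = composite_signal(example)
```

Because each measure is converted to ranks before averaging, measures on very different scales contribute equally, which suits the weakly correlated measures reported in the abstract.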



BMC Genetics ◽  
2015 ◽  
Vol 16 (1) ◽  
Author(s):  
Johanna K. Höglund ◽  
Bernt Guldbrandtsen ◽  
Mogens S. Lund ◽  
Goutam Sahana


BMC Genomics ◽  
2014 ◽  
Vol 15 (1) ◽  
pp. 567 ◽  
Author(s):  
John J Schellenberg ◽  
Tobin J Verbeke ◽  
Peter McQueen ◽  
Oleg V Krokhin ◽  
Xiangli Zhang ◽  
...  


2021 ◽  
Vol 53 (1) ◽  
Author(s):  
Sunduimijid Bolormaa ◽  
Andrew A. Swan ◽  
Paul Stothard ◽  
Majid Khansefid ◽  
Nasir Moghaddar ◽  
...  

Abstract Background: Imputation to whole-genome sequence is now possible in large sheep populations. It is therefore of interest to use these data in genome-wide association studies (GWAS) to investigate putative causal variants and genes that underpin economically important traits. Merino wool is globally sought after for luxury fabrics, but some key wool quality attributes are unfavourably correlated with the characteristic skin wrinkle of Merinos. In turn, skin wrinkle is strongly linked to susceptibility to “fly strike” (cutaneous myiasis), which is a major welfare issue. Here, we use whole-genome sequence data in a multi-trait GWAS to identify pleiotropic putative causal variants and genes associated with changes in key wool traits and skin wrinkle. Results: A stepwise conditional multi-trait GWAS (CM-GWAS) identified putative causal variants and related genes from 178 independent quantitative trait loci (QTL) of 16 wool and skin wrinkle traits, measured on up to 7218 Merino sheep with 31 million imputed whole-genome sequence (WGS) genotypes. Novel candidate gene findings included the MAT1A gene that encodes an enzyme involved in the sulphur metabolism pathway critical to production of wool proteins, and the ESRP1 gene. We also discovered a significant wrinkle variant upstream of the HAS2 gene, which in dogs is associated with the exaggerated skin folds in the Shar-Pei breed. Conclusions: The wool and skin wrinkle traits studied here appear to be highly polygenic with many putative candidate variants showing considerable pleiotropy. Our CM-GWAS identified many highly plausible candidate genes for wool traits as well as breech wrinkle and breech area wool cover.



2017 ◽  
Author(s):  
Charith B. Karunarathna ◽  
Jinko Graham

Abstract Background and Aims: Many methods can detect trait association with causal variants in candidate genomic regions; however, a comparison of their ability to localize causal variants is lacking. We extend a previous study of the detection abilities of these methods to a comparison of their localization abilities. Methods: Through coalescent simulation, we compare several popular association methods. Cases and controls are sampled from a diploid population to mimic human studies. As benchmarks for comparison, we include two methods that cluster phenotypes on the true genealogical trees, a naive Mantel test considered previously in haploid populations and an extension that takes into account whether case haplotypes carry a causal variant. We first work through a simulated dataset to illustrate the methods. We then perform a simulation study to score the localization and detection properties. Results: In our simulations, the association signal was localized least precisely by the naive Mantel test and most precisely by its extension. Most other approaches had intermediate performance similar to the single-variant Fisher’s exact test. Conclusions: Our results confirm earlier findings in haploid populations about potential gains in performance from genealogy-based approaches. They also highlight differences between haploid and diploid populations when localizing and detecting causal variants.





2017 ◽  
Vol 49 (1) ◽  
Author(s):  
Qianqian Zhang ◽  
Mario P. L. Calus ◽  
Bernt Guldbrandtsen ◽  
Mogens Sandø Lund ◽  
Goutam Sahana


2017 ◽  
Author(s):  
Francisco C. Ceballos ◽  
Scott Hazelhurst ◽  
Michèle Ramsay

Abstract Runs of Homozygosity (ROH) are sequences that arise when identical haplotypes are inherited from each parent. Since their first detection due to technological advances in the late 1990s, ROHs have been shedding light on human population history and helping to decipher the genetic basis of monogenic and complex traits and diseases. ROH studies have predominantly exploited SNP array data, but are gradually moving to whole genome sequence (WGS) data as they become available. WGS data, covering more genetic variability, can add value to ROH studies, but require additional considerations during analysis. Using SNP array and low-coverage WGS data from 1885 individuals from 20 world populations, our aims were to compare ROH from the two datasets and to establish software conditions to get comparable results, thus providing guidelines for combining disparate datasets in joint ROH analyses. Using the PLINK Homozygosity functions, we found that by allowing 3 heterozygous SNPs per window when dealing with low-coverage WGS data, it is possible to establish meaningful comparisons between data using the two technologies.
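
To illustrate what a per-window heterozygote allowance means, the sketch below scans one individual's genotypes with a sliding window and keeps windows containing at most a given number of heterozygous calls. It is a deliberately simplified illustration and not PLINK's algorithm: the function name, the 0/1/2 genotype coding, and the window size are assumptions, and PLINK applies further criteria (for example on missing calls and run length) that are omitted here.

```python
from typing import List


def homozygous_windows(
    genotypes: List[int], window_size: int, max_het: int
) -> List[int]:
    """Return start indices of windows with at most max_het heterozygous calls.

    genotypes holds alternate-allele counts per SNP (0, 1 or 2);
    a value of 1 marks a heterozygous call. This coding is an assumption.
    """
    n = len(genotypes)
    if window_size <= 0 or window_size > n:
        return []

    # Count heterozygous calls in the first window, then slide by one SNP.
    het = sum(1 for g in genotypes[:window_size] if g == 1)
    starts = [0] if het <= max_het else []
    for start in range(1, n - window_size + 1):
        if genotypes[start - 1] == 1:  # call leaving the window
            het -= 1
        if genotypes[start + window_size - 1] == 1:  # call entering the window
            het += 1
        if het <= max_het:
            starts.append(start)
    return starts


# Hypothetical genotypes: sparse heterozygous calls inside a long run,
# as might occur through genotyping error in low-coverage data.
calls = [0, 1, 0, 2, 1, 1, 1, 0]
strict = homozygous_windows(calls, window_size=4, max_het=1)
relaxed = homozygous_windows(calls, window_size=4, max_het=3)
```

Raising the allowance lets more windows qualify as homozygous, which is the trade-off the abstract describes for low-coverage WGS data where erroneous heterozygous calls are more frequent.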



2019 ◽  
Author(s):  
Robert Literman ◽  
Rachel S. Schwartz

Abstract Accurate estimates of species relationships are integral to our understanding of evolution, yet many relationships remain controversial despite whole-genome sequence data. These controversies are due in part to complex patterns of phylogenetic and non-phylogenetic signal coming from regions of the genome experiencing distinct evolutionary forces, which can be difficult to disentangle. Here we profile the amounts and proportions of phylogenetic and non-phylogenetic signal derived from loci spread across mammalian genomes. We identified orthologous sequences from primates, rodents, and pecora, annotated sites as one or more of nine locus types (e.g. coding, intronic, intergenic), and profiled the phylogenetic information contained within locus types across evolutionary timescales associated with each clade. In all cases, non-coding loci provided more overall signal and a higher proportion of phylogenetic signal compared to coding loci. This suggests potential benefits of shifting away from primarily targeting genes or coding regions for phylogenetic studies, particularly in this era of accessible whole genome sequence data. In contrast to long-held assumptions about the phylogenetic utility of more variable genomic regions, most locus types provided relatively consistent phylogenetic information across timescales, although we find evidence that coding and intronic regions may, respectively and to a limited degree, inform disproportionately about older and younger splits. As part of this work we also validate the SISRS pipeline as an annotation-free ortholog discovery pipeline capable of identifying millions of phylogenetically informative sites directly from raw sequencing reads.


