Leveraging allele-specific expression to refine fine-mapping for eQTL studies

2018 ◽  
Author(s):  
Jennifer Zou ◽  
Farhad Hormozdiari ◽  
Brandon Jew ◽  
Jason Ernst ◽  
Jae Hoon Sul ◽  
...  

Many disease risk loci identified in genome-wide association studies are present in non-coding regions of the genome. It is hypothesized that these variants affect complex traits by acting as expression quantitative trait loci (eQTLs) that influence expression of nearby genes. This indicates that many causal variants for complex traits are likely to be causal variants for gene expression. Hence, identifying causal variants for gene expression is important for elucidating the genetic basis of not only gene expression but also complex traits. However, detecting causal variants is challenging due to complex genetic correlation among variants known as linkage disequilibrium (LD) and the presence of multiple causal variants within a locus. Although several fine-mapping approaches have been developed to overcome these challenges, they may produce large sets of putative causal variants when true causal variants are in high LD with many non-causal variants. In eQTL studies, there is an additional source of information that can be used to improve fine-mapping: allele-specific expression (ASE), which measures imbalance in gene expression due to different alleles. In this work, we develop a novel statistical method that leverages both ASE and eQTL information to detect causal variants that regulate gene expression. We illustrate through simulations and application to the Genotype-Tissue Expression (GTEx) dataset that our method identifies the true causal variants with higher specificity than an approach that uses only eQTL information. In the GTEx dataset, our method achieves a median reduction rate of 11% in the number of putative causal variants.
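To make the notion of allelic imbalance concrete, the following is a minimal sketch of a generic exact binomial test on allele-specific read counts at one heterozygous site. It is not the method developed in the paper above; the function names and the null ratio of 0.5 are assumptions chosen for illustration.

```python
import math


def allelic_ratio(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads carrying the reference allele (hypothetical helper)."""
    n = ref_reads + alt_reads
    return ref_reads / n if n else float("nan")


def binomial_two_sided_p(ref_reads: int, alt_reads: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value for allelic imbalance.

    Assumption: under no ASE, each read comes from the reference allele
    with probability p (0 < p < 1), independently of the other reads.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance

    def pmf(k: int) -> float:
        # Work in log space so that deep read coverage does not underflow.
        log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        return math.exp(log_choose + k * math.log(p) + (n - k) * math.log(1 - p))

    observed = pmf(ref_reads)
    # Sum the probability of every outcome at least as unlikely as the observed
    # one; the small relative tolerance absorbs floating-point ties.
    total = sum(q for q in (pmf(k) for k in range(n + 1)) if q <= observed * (1 + 1e-7))
    return min(1.0, total)


if __name__ == "__main__":
    # Illustrative counts only, not data from any study.
    print(allelic_ratio(30, 10), binomial_two_sided_p(30, 10))
```

A real ASE analysis would also need to handle reference-mapping bias and overdispersion in read counts, which this sketch ignores.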

2020 ◽  
Author(s):  
Yanyu Liang ◽  
François Aguet ◽  
Alvaro Barbeira ◽  
Kristin Ardlie ◽  
Hae Kyung Im

Genome-wide association studies (GWAS) have been highly successful in identifying genomic loci associated with complex traits. However, identification of the causal genes that mediate these associations remains challenging, and many approaches integrating transcriptomic data with GWAS have been proposed. However, there currently exist no computationally scalable methods that integrate total and allele-specific gene expression to maximize power to detect genetic effects on gene expression. Here, we describe a unified framework that is scalable to studies with thousands of samples. Using simulations and data from GTEx, we demonstrate an average power gain equivalent to a 29% increase in sample size for genes with sufficient allele-specific read coverage. We provide a suite of freely available tools, mixQTL, mixFine, and mixPred, that apply this framework for mapping of quantitative trait loci, fine-mapping, and prediction.
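One generic way to see why a second source of evidence can add power is inverse-variance weighting of two independent effect estimates. The sketch below illustrates that idea only and is not claimed to be the mixQTL procedure. The function name, the independence assumption, and the example numbers are all hypothetical.

```python
import math
from typing import List, Tuple


def inverse_variance_combine(estimates: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine (effect, standard_error) pairs by inverse-variance weighting.

    Assumption: the estimates are independent and target the same effect,
    for example one from total read counts and one from allele-specific counts.
    """
    weights = [1.0 / (se * se) for _, se in estimates]
    total_weight = sum(weights)
    effect = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    # The combined standard error is never larger than the smallest input one.
    return effect, math.sqrt(1.0 / total_weight)


if __name__ == "__main__":
    # Illustrative numbers only: a total-expression estimate and an ASE estimate.
    print(inverse_variance_combine([(0.2, 0.1), (0.4, 0.1)]))
```

Because the combined standard error shrinks whenever the second estimate carries any information, the combined estimate behaves like one obtained from a larger sample.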


2019 ◽  
Author(s):  
Xi Rao ◽  
Kriti S. Thapa ◽  
Andy B Chen ◽  
Hai Lin ◽  
Hongyu Gao ◽  
...  

Transcriptome studies can identify genes whose expression differs between alcoholics and controls. To test which variants associated with alcohol use disorders (AUDs) may cause expression differences, we integrated deep RNA-seq and genome-wide association study (GWAS) data from four postmortem brain regions of 30 subjects with AUDs and 30 controls (social/non-drinkers) and analyzed allele-specific expression (ASE). We identified 90 genes with differential ASE in subjects with AUDs compared to controls. Of these, 61 genes contained 437 single nucleotide polymorphisms (SNPs) in the 3’ untranslated regions (3’UTR) with at least one heterozygote among the subjects studied. Using a modified PASSPORT-seq (parallel assessment of polymorphisms in miRNA target-sites by sequencing) assay, we identified 25 SNPs that affected RNA levels in a consistent manner in two neuroblastoma cell lines, SH-SY5Y and SK-N-BE(2). Many of these are in binding sites of miRNAs and RNA binding proteins, indicating that these SNPs are likely causal variants of AUD-associated differential ASE.


2021 ◽  
Author(s):  
Roshni A. Patel ◽  
Shaila A. Musharoff ◽  
Jeffrey P. Spence ◽  
Harold Pimentel ◽  
Catherine Tcheandjieu ◽  
...  

Despite the growing number of genome-wide association studies (GWAS) for complex traits, it remains unclear whether effect sizes of causal genetic variants differ between populations. In principle, effect sizes of causal variants could differ between populations due to gene-by-gene or gene-by-environment interactions. However, comparing causal variant effect sizes is challenging: it is difficult to know which variants are causal, and comparisons of variant effect sizes are confounded by differences in linkage disequilibrium (LD) structure between ancestries. Here, we develop a method to assess causal variant effect size differences that overcomes these limitations. Specifically, we leverage the fact that segments of European ancestry shared between European-American and admixed African-American individuals have similar LD structure, allowing for unbiased comparisons of variant effect sizes in European ancestry segments. We apply our method to two types of traits: gene expression and low-density lipoprotein cholesterol (LDL-C). We find that causal variant effect sizes for gene expression are significantly different between European-Americans and African-Americans; for LDL-C, we observe a similar point estimate although this is not significant, likely due to lower statistical power. Cross-population differences in variant effect sizes highlight the role of genetic interactions in trait architecture and will contribute to the poor portability of polygenic scores across populations, reinforcing the importance of conducting GWAS on individuals of diverse ancestries and environments.


2021 ◽  
Author(s):  
Wenmin Zhang ◽  
Hamed S Najafabadi ◽  
Yue Li

Identifying causal variants from genome-wide association studies (GWASs) is challenging due to widespread linkage disequilibrium (LD). Functional annotations of the genome may help prioritize variants that are biologically relevant and thus improve fine-mapping of GWAS results. However, classical fine-mapping methods have a high computational cost, particularly when the underlying genetic architecture and LD patterns are complex. Here, we propose a novel approach, SparsePro, to efficiently conduct functionally informed statistical fine-mapping. Our method enjoys two major innovations: first, by creating a sparse low-dimensional projection of the high-dimensional genotype, we enable a linear search of causal variants instead of an exponential search of causal configurations used in existing methods; second, we adopt a probabilistic framework with a highly efficient variational expectation-maximization algorithm to integrate statistical associations and functional priors. We evaluate SparsePro through extensive simulations using resources from the UK Biobank. Compared to state-of-the-art methods, SparsePro achieved more accurate and well-calibrated posterior inference with greatly reduced computation time. We demonstrate the utility of SparsePro by investigating the genetic architecture of five functional biomarkers of vital organs. We identify potential causal variants contributing to the genetically encoded coordination mechanisms between vital organs and pinpoint target genes with potential pleiotropic effects. In summary, we have developed an efficient genome-wide fine-mapping method with the ability to integrate functional annotations. Our method may have wide utility in understanding the genetics of complex traits as well as in increasing the yield of functional follow-up studies of GWASs.


PLoS Genetics ◽  
2021 ◽  
Vol 17 (9) ◽  
pp. e1009733
Author(s):  
Nathan LaPierre ◽  
Kodi Taraszka ◽  
Helen Huang ◽  
Rosemary He ◽  
Farhad Hormozdiari ◽  
...  

Increasingly large Genome-Wide Association Studies (GWAS) have yielded numerous variants associated with many complex traits, motivating the development of “fine mapping” methods to identify which of the associated variants are causal. Additionally, GWAS of the same trait for different populations are increasingly available, raising the possibility of refining fine mapping results further by leveraging different linkage disequilibrium (LD) structures across studies. Here, we introduce multiple study causal variants identification in associated regions (MsCAVIAR), a method that extends the popular CAVIAR fine mapping framework to a multiple study setting using a random effects model. MsCAVIAR only requires summary statistics and LD as input, accounts for uncertainty in association statistics using a multivariate normal model, allows for multiple causal variants at a locus, and explicitly models the possibility of different SNP effect sizes in different populations. We demonstrate the efficacy of MsCAVIAR in both a simulation study and a trans-ethnic, trans-biobank fine mapping analysis of High Density Lipoprotein (HDL).


2016 ◽  
Author(s):  
Nicholas Mancuso ◽  
Huwenbo Shi ◽  
Pagé Goddard ◽  
Gleb Kichaev ◽  
Alexander Gusev ◽  
...  

Although genome-wide association studies (GWASs) have identified thousands of risk loci for many complex traits and diseases, the causal variants and genes at these loci remain largely unknown. We leverage recently introduced methods to integrate gene expression measurements from 45 expression panels with summary GWAS data to perform 30 transcriptome-wide association studies (TWASs). We identify 1,196 susceptibility genes whose expression is associated with these traits; of these, 168 reside more than 0.5 Mb away from any previously reported GWAS significant variant, thus providing new risk loci. Second, we find 43 pairs of traits with significant genetic correlation at the level of predicted expression; of these, 8 are not found through genetic correlation at the SNP level. Third, we use bi-directional regression to find evidence for BMI causally influencing triglyceride levels, and triglyceride levels causally influencing LDL. Taken together, our results provide insights into the role of expression in susceptibility to complex traits and diseases.


Author(s):  
Nathan LaPierre ◽  
Kodi Taraszka ◽  
Helen Huang ◽  
Rosemary He ◽  
Farhad Hormozdiari ◽  
...  

Increasingly large Genome-Wide Association Studies (GWAS) have yielded numerous variants associated with many complex traits, motivating the development of “fine mapping” methods to identify which of the associated variants are causal. Additionally, GWAS of the same trait for different populations are increasingly available, raising the possibility of refining fine mapping results further by leveraging different linkage disequilibrium (LD) structures across studies. Here, we introduce multiple study causal variants identification in associated regions (MsCAVIAR), a method that extends the popular CAVIAR fine mapping framework to a multiple study setting using a random effects model. MsCAVIAR only requires summary statistics and LD as input, accounts for uncertainty in association statistics using a multivariate normal model, allows for multiple causal variants at a locus, and explicitly models the possibility of different SNP effect sizes in different populations. In a trans-ethnic, trans-biobank Type 2 Diabetes analysis, we show that MsCAVIAR returns causal set sizes that are over 20% smaller than those given by current state-of-the-art methods for trans-ethnic fine-mapping.
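To illustrate what a "causal set" and its size mean, here is a deliberately simplified sketch that builds a credible set from z-scores under a single-causal-variant assumption with a uniform prior. It is not MsCAVIAR, which uses LD, allows multiple causal variants, and models several studies. The function names and the `shrinkage` parameter (prior variance divided by prior plus sampling variance, assumed equal for all variants) are assumptions for illustration.

```python
import math
from typing import List


def single_causal_posteriors(z_scores: List[float], shrinkage: float = 0.9) -> List[float]:
    """Posterior probability that each variant is the one causal variant.

    Assumptions: exactly one causal variant, uniform prior over variants, and
    a Wakefield-style approximate Bayes factor proportional to
    exp(shrinkage * z^2 / 2), whose constant factor cancels on normalisation.
    """
    log_bf = [0.5 * shrinkage * z * z for z in z_scores]
    largest = max(log_bf)
    weights = [math.exp(v - largest) for v in log_bf]  # subtract max for stability
    total = sum(weights)
    return [w / total for w in weights]


def credible_set(posteriors: List[float], coverage: float = 0.95) -> List[int]:
    """Indices of the smallest set of variants whose posteriors sum to >= coverage."""
    order = sorted(range(len(posteriors)), key=lambda i: posteriors[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += posteriors[i]
        if cumulative >= coverage:
            break
    return chosen


def reduction_rate(baseline_size: int, new_size: int) -> float:
    """Relative shrinkage of a causal set compared with a baseline method."""
    return (baseline_size - new_size) / baseline_size


if __name__ == "__main__":
    # Illustrative z-scores only, not results from any study.
    posteriors = single_causal_posteriors([5.0, 4.6, 1.2, 0.3])
    print(credible_set(posteriors))
```

Smaller credible sets at the same coverage mean fewer variants to follow up experimentally, which is the quantity the set-size comparisons in these abstracts refer to.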


2019 ◽  
Author(s):  
Jiaxin Fan ◽  
Jian Hu ◽  
Chenyi Xue ◽  
Hanrui Zhang ◽  
Muredach P. Reilly ◽  
...  

Allele-specific expression (ASE) analysis, which quantifies the relative expression of two alleles in a diploid individual, is a powerful tool for identifying cis-regulated gene expression variations that underlie phenotypic differences among individuals. Existing methods for gene-level ASE detection analyze one individual at a time, therefore wasting shared information across individuals. Failure to accommodate such shared information not only loses power, but also makes it difficult to interpret results across individuals. However, ASE detection across individuals is challenging because the data often include individuals that are either heterozygous or homozygous for the unobserved cis-regulatory SNP, leading to heterogeneity in ASE as only those heterozygous individuals are informative for ASE, whereas those homozygous individuals have balanced expression. To simultaneously model multi-individual information and account for such heterogeneity, we developed ASEP, a mixture model with subject-specific random effect accounting for multi-SNP correlations within the same gene. ASEP is able to detect gene-level ASE under one condition and differential ASE between two conditions (e.g., pre- versus post-treatment). Extensive simulations have demonstrated the convincing performance of ASEP under a wide range of scenarios. We further applied ASEP to RNA-seq data of human macrophages, and identified genes showing evidence of differential ASE pre- versus post-stimulation, which were extended through findings in cardiometabolic trait-relevant genome-wide association studies. To the best of our knowledge, ASEP is the first method for gene-level ASE detection at the population level. With the growing adoption of RNA-seq, we believe ASEP will be well-suited for various ASE studies for human diseases.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Margrete Langmyhr ◽  
Sandra Pilar Henriksen ◽  
Chiara Cappelletti ◽  
Wilma D. J. van de Berg ◽  
Lasse Pihlstrøm ◽  
...  

Genome-wide association studies have identified genetic variation in genomic loci associated with susceptibility to Parkinson’s disease (PD), the most common neurodegenerative movement disorder worldwide. We used allelic expression profiling of genes located within PD-associated loci to identify cis-regulatory variation affecting gene expression. DNA and RNA were extracted from post-mortem superior frontal gyrus tissue and whole blood samples from PD patients and controls. The relative allelic expression of transcribed SNPs in 12 GWAS risk genes was analysed by real-time qPCR. Allele-specific expression was identified for 9 out of 12 genes tested (GBA, TMEM175, RAB7L1, NUCKS1, MCCC1, BCKDK, ZNF646, LZTS3, and WDHD1) in brain tissue samples. Three genes (GPNMB, STK39 and SIPA1L2) did not show significant allele-specific effects. Allele-specific effects were confirmed in whole blood for three genes (BCKDK, LZTS3 and MCCC1), whereas two genes (RAB7L1 and NUCKS1) showed brain-specific allelic expression. Our study supports the hypothesis that changes to the cis-regulation of gene expression are a major mechanism behind a large proportion of genetic associations in PD. Interestingly, allele-specific expression was also observed for coding variants believed to be causal variants (GBA and TMEM175), indicating that splicing and other regulatory mechanisms may be involved in disease development.


2021 ◽  
Author(s):  
Masahiro Kanai ◽  
Jacob C Ulirsch ◽  
Juha Karjalainen ◽  
Mitja Kurki ◽  
Konrad J Karczewski ◽  
...  

Despite the great success of genome-wide association studies (GWAS) in identifying genetic loci significantly associated with diseases, the vast majority of causal variants underlying disease-associated loci have not been identified [1–3]. To create an atlas of causal variants, we performed and integrated fine-mapping across 148 complex traits in three large-scale biobanks (BioBank Japan [4,5], FinnGen [6], and UK Biobank [7,8]; total n = 811,261), resulting in 4,518 variant-trait pairs with high posterior probability (> 0.9) of causality. Of these, we found 285 high-confidence variant-trait pairs replicated across multiple populations, and we characterized multiple contributors to the surprising lack of overlap among fine-mapping results from different biobanks. By studying the bottlenecked Finnish and Japanese populations, we identified 21 and 26 putative causal coding variants with extreme allele frequency enrichment (> 10-fold) in these two populations, respectively. Aggregating data across populations enabled identification of 1,492 unique fine-mapped coding variants and 176 genes in which multiple independent coding variants influence the same trait (i.e., with an allelic series of coding variants). Our results demonstrate that fine-mapping in diverse populations enables novel insights into the biology of complex traits by pinpointing high-confidence causal variants for further characterization.

