Leveraging allele-specific expression to refine fine-mapping for eQTL studies

Abstract: Many disease risk loci identified in genome-wide association studies are present in non-coding regions of the genome. It is hypothesized that these variants affect complex traits by acting as expression quantitative trait loci (eQTLs) that influence expression of nearby genes. This indicates that many causal variants for complex traits are likely to be causal variants for gene expression. Hence, identifying causal variants for gene expression is important for elucidating the genetic basis of not only gene expression but also complex traits. However, detecting causal variants is challenging due to complex genetic correlation among variants, known as linkage disequilibrium (LD), and the presence of multiple causal variants within a locus. Although several fine-mapping approaches have been developed to overcome these challenges, they may produce large sets of putative causal variants when true causal variants are in high LD with many non-causal variants. In eQTL studies, there is an additional source of information that can be used to improve fine-mapping: allele-specific expression (ASE), which measures imbalance in gene expression due to different alleles. In this work, we develop a novel statistical method that leverages both ASE and eQTL information to detect causal variants that regulate gene expression. We illustrate through simulations and application to the Genotype-Tissue Expression (GTEx) dataset that our method identifies the true causal variants with higher specificity than an approach that uses only eQTL information. In the GTEx dataset, our method achieves a median reduction rate of 11% in the number of putative causal variants.
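The "sets of putative causal variants" discussed in this abstract are commonly summarized as credible sets. The following is a minimal sketch of that idea, not the authors' method: it assumes per-variant posterior probabilities are already available and that exactly one variant is causal (a simplification; the abstract describes loci with multiple causal variants). The function name `credible_set`, the threshold `rho`, the variant IDs, and all probabilities are hypothetical and chosen only for illustration.

```python
def credible_set(posteriors, rho=0.95):
    """Return the smallest list of variants whose posterior mass reaches rho.

    posteriors: dict mapping variant ID -> posterior probability of being
    the (single) causal variant. Values are assumed to sum to about 1.
    """
    # Rank variants from most to least probable.
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        total += prob
        if total >= rho:
            break
    return chosen


# Hypothetical posteriors for one locus: with eQTL data only, LD spreads
# the probability mass across many variants.
eqtl_only = {"rs1": 0.40, "rs2": 0.35, "rs3": 0.15, "rs4": 0.10}
# Hypothetical posteriors after adding a second information source that
# concentrates the mass on fewer variants.
with_ase = {"rs1": 0.70, "rs2": 0.26, "rs3": 0.03, "rs4": 0.01}

set_before = credible_set(eqtl_only)
set_after = credible_set(with_ase)
# Fractional reduction in set size for this toy locus.
reduction = 1 - len(set_after) / len(set_before)
print(set_before, set_after, reduction)
```

In this toy locus the set shrinks from four variants to two. A "reduction rate" in the sense of the abstract would be computed per gene on real data and then summarized, for example by the median.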


Scalable unified framework of total and allele-specific counts for cis-QTL, fine-mapping, and prediction

10.1101/2020.04.22.050666 ◽

2020 ◽

Author(s):

Yanyu Liang ◽

François Aguet ◽

Alvaro Barbeira ◽

Kristin Ardlie ◽

Hae Kyung Im

Keyword(s):

Gene Expression ◽

Fine Mapping ◽

Complex Traits ◽

Association Studies ◽

Average Power ◽

Specific Gene ◽

Genome Wide Association Studies ◽

Unified Framework ◽

Causal Genes ◽

Allele Specific

Abstract: Genome-wide association studies (GWAS) have been highly successful in identifying genomic loci associated with complex traits. However, identification of the causal genes that mediate these associations remains challenging, and many approaches integrating transcriptomic data with GWAS have been proposed. Yet there are currently no computationally scalable methods that integrate total and allele-specific gene expression to maximize power to detect genetic effects on gene expression. Here, we describe a unified framework that is scalable to studies with thousands of samples. Using simulations and data from GTEx, we demonstrate an average power gain equivalent to a 29% increase in sample size for genes with sufficient allele-specific read coverage. We provide a suite of freely available tools, mixQTL, mixFine, and mixPred, that apply this framework for mapping of quantitative trait loci, fine-mapping, and prediction.


Allele-Specific Expression and High-Throughput Reporter Assay Reveal Functional Variants in Human Brains with Alcohol Use Disorders

10.1101/514992 ◽

2019 ◽

Cited By ~ 3

Author(s):

Xi Rao ◽

Kriti S. Thapa ◽

Andy B Chen ◽

Hai Lin ◽

Hongyu Gao ◽

...

Keyword(s):

Alcohol Use ◽

Rna Binding ◽

Rna Binding Proteins ◽

Association Studies ◽

Postmortem Brain ◽

Genome Wide Association Studies ◽

Specific Expression ◽

Allele Specific Expression ◽

Functional Variants ◽

Allele Specific

Abstract: Transcriptome studies can identify genes whose expression differs between alcoholics and controls. To test which variants associated with alcohol use disorders (AUDs) may cause expression differences, we integrated deep RNA-seq and genome-wide association studies (GWAS) data from four postmortem brain regions of 30 subjects with AUDs and 30 controls (social/non-drinkers) and analyzed allele-specific expression (ASE). We identified 90 genes with differential ASE in subjects with AUDs compared to controls. Of these, 61 genes contained 437 single nucleotide polymorphisms (SNPs) in the 3’ untranslated regions (3’UTRs) with at least one heterozygote among the subjects studied. Using a modified PASSPORT-seq (parallel assessment of polymorphisms in miRNA target-sites by sequencing) assay, we identified 25 SNPs that affected RNA levels in a consistent manner in two neuroblastoma cell lines, SH-SY5Y and SK-N-BE(2). Many of these are in binding sites of miRNAs and RNA binding proteins, indicating that these SNPs are likely causal variants of AUD-associated differential ASE.


Effect sizes of causal variants for gene expression and complex traits differ between populations

10.1101/2021.12.06.471235 ◽

2021 ◽

Author(s):

Roshni A. Patel ◽

Shaila A. Musharoff ◽

Jeffrey P. Spence ◽

Harold Pimentel ◽

Catherine Tcheandjieu ◽

...

Keyword(s):

Gene Expression ◽

Complex Traits ◽

Association Studies ◽

Causal Variant ◽

Effect Sizes ◽

European Ancestry ◽

Genome Wide Association Studies ◽

Polygenic Scores ◽

Causal Variants ◽

Variant Effect

Despite the growing number of genome-wide association studies (GWAS) for complex traits, it remains unclear whether effect sizes of causal genetic variants differ between populations. In principle, effect sizes of causal variants could differ between populations due to gene-by-gene or gene-by-environment interactions. However, comparing causal variant effect sizes is challenging: it is difficult to know which variants are causal, and comparisons of variant effect sizes are confounded by differences in linkage disequilibrium (LD) structure between ancestries. Here, we develop a method to assess causal variant effect size differences that overcomes these limitations. Specifically, we leverage the fact that segments of European ancestry shared between European-American and admixed African-American individuals have similar LD structure, allowing for unbiased comparisons of variant effect sizes in European ancestry segments. We apply our method to two types of traits: gene expression and low-density lipoprotein cholesterol (LDL-C). We find that causal variant effect sizes for gene expression are significantly different between European-Americans and African-Americans; for LDL-C, we observe a similar point estimate although this is not significant, likely due to lower statistical power. Cross-population differences in variant effect sizes highlight the role of genetic interactions in trait architecture and will contribute to the poor portability of polygenic scores across populations, reinforcing the importance of conducting GWAS on individuals of diverse ancestries and environments.


SparsePro: an efficient genome-wide fine-mapping method integrating summary statistics and functional annotations

10.1101/2021.10.04.463133 ◽

2021 ◽

Author(s):

Wenmin Zhang ◽

Hamed S Najafabadi ◽

Yue Li

Keyword(s):

Fine Mapping ◽

Complex Traits ◽

Genetic Architecture ◽

Association Studies ◽

Computational Cost ◽

Mapping Method ◽

Genome Wide Association Studies ◽

Functional Annotations ◽

Genome Wide ◽

Causal Variants

Identifying causal variants from genome-wide association studies (GWASs) is challenging due to widespread linkage disequilibrium (LD). Functional annotations of the genome may help prioritize variants that are biologically relevant and thus improve fine-mapping of GWAS results. However, classical fine-mapping methods have a high computational cost, particularly when the underlying genetic architecture and LD patterns are complex. Here, we propose a novel approach, SparsePro, to efficiently conduct functionally informed statistical fine-mapping. Our method introduces two major innovations. First, by creating a sparse low-dimensional projection of the high-dimensional genotype, we enable a linear search of causal variants instead of the exponential search of causal configurations used in existing methods. Second, we adopt a probabilistic framework with a highly efficient variational expectation-maximization algorithm to integrate statistical associations and functional priors. We evaluate SparsePro through extensive simulations using resources from the UK Biobank. Compared to state-of-the-art methods, SparsePro achieved more accurate and well-calibrated posterior inference with greatly reduced computation time. We demonstrate the utility of SparsePro by investigating the genetic architecture of five functional biomarkers of vital organs. We identify potential causal variants contributing to the genetically encoded coordination mechanisms between vital organs and pinpoint target genes with potential pleiotropic effects. In summary, we have developed an efficient genome-wide fine-mapping method with the ability to integrate functional annotations. Our method may have wide utility in understanding the genetics of complex traits as well as in increasing the yield of functional follow-up studies of GWASs.


Identifying causal variants by fine mapping across multiple studies

PLoS Genetics ◽

10.1371/journal.pgen.1009733 ◽

2021 ◽

Vol 17 (9) ◽

pp. e1009733

Author(s):

Nathan LaPierre ◽

Kodi Taraszka ◽

Helen Huang ◽

Rosemary He ◽

Farhad Hormozdiari ◽

...

Keyword(s):

Fine Mapping ◽

Complex Traits ◽

Association Studies ◽

Density Lipoprotein ◽

Genome Wide Association Studies ◽

Multivariate Normal ◽

Multiple Study ◽

Genome Wide ◽

Causal Variants ◽

Different Populations

Increasingly large Genome-Wide Association Studies (GWAS) have yielded numerous variants associated with many complex traits, motivating the development of “fine mapping” methods to identify which of the associated variants are causal. Additionally, GWAS of the same trait for different populations are increasingly available, raising the possibility of refining fine mapping results further by leveraging different linkage disequilibrium (LD) structures across studies. Here, we introduce multiple study causal variants identification in associated regions (MsCAVIAR), a method that extends the popular CAVIAR fine mapping framework to a multiple study setting using a random effects model. MsCAVIAR only requires summary statistics and LD as input, accounts for uncertainty in association statistics using a multivariate normal model, allows for multiple causal variants at a locus, and explicitly models the possibility of different SNP effect sizes in different populations. We demonstrate the efficacy of MsCAVIAR in both a simulation study and a trans-ethnic, trans-biobank fine mapping analysis of High Density Lipoprotein (HDL).


Integrating gene expression with summary association statistics to identify susceptibility genes for 30 complex traits

10.1101/072967 ◽

2016 ◽

Cited By ~ 2

Author(s):

Nicholas Mancuso ◽

Huwenbo Shi ◽

Pagé Goddard ◽

Gleb Kichaev ◽

Alexander Gusev ◽

...

Keyword(s):

Gene Expression ◽

Genetic Correlation ◽

Complex Traits ◽

Association Studies ◽

Susceptibility Genes ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Causal Variants

Abstract: Although genome-wide association studies (GWASs) have identified thousands of risk loci for many complex traits and diseases, the causal variants and genes at these loci remain largely unknown. We leverage recently introduced methods to integrate gene expression measurements from 45 expression panels with summary GWAS data to perform 30 transcriptome-wide association studies (TWASs). First, we identify 1,196 susceptibility genes whose expression is associated with these traits; of these, 168 reside more than 0.5 Mb away from any previously reported GWAS significant variant, thus providing new risk loci. Second, we find 43 pairs of traits with significant genetic correlation at the level of predicted expression; of these, 8 are not found through genetic correlation at the SNP level. Third, we use bi-directional regression to find evidence for BMI causally influencing triglyceride levels, and triglyceride levels causally influencing LDL. Taken together, our results provide insights into the role of expression in susceptibility to complex traits and diseases.


Identifying Causal Variants by Fine Mapping Across Multiple Studies

10.1101/2020.01.15.908517 ◽

2020 ◽

Cited By ~ 2

Author(s):

Nathan LaPierre ◽

Kodi Taraszka ◽

Helen Huang ◽

Rosemary He ◽

Farhad Hormozdiari ◽

...

Keyword(s):

Fine Mapping ◽

Complex Traits ◽

Association Studies ◽

Genome Wide Association Studies ◽

Multiple Study ◽

Current State ◽

Genome Wide ◽

Causal Variants ◽

Different Populations

Abstract: Increasingly large Genome-Wide Association Studies (GWAS) have yielded numerous variants associated with many complex traits, motivating the development of “fine mapping” methods to identify which of the associated variants are causal. Additionally, GWAS of the same trait for different populations are increasingly available, raising the possibility of refining fine mapping results further by leveraging different linkage disequilibrium (LD) structures across studies. Here, we introduce multiple study causal variants identification in associated regions (MsCAVIAR), a method that extends the popular CAVIAR fine mapping framework to a multiple study setting using a random effects model. MsCAVIAR only requires summary statistics and LD as input, accounts for uncertainty in association statistics using a multivariate normal model, allows for multiple causal variants at a locus, and explicitly models the possibility of different SNP effect sizes in different populations. In a trans-ethnic, trans-biobank Type 2 Diabetes analysis, we show that MsCAVIAR returns causal set sizes that are over 20% smaller than those given by current state-of-the-art methods for trans-ethnic fine-mapping.


ASEP: gene-based detection of allele-specific expression in a population by RNA-seq

10.1101/798124 ◽

2019 ◽

Author(s):

Jiaxin Fan ◽

Jian Hu ◽

Chenyi Xue ◽

Hanrui Zhang ◽

Muredach P. Reilly ◽

...

Keyword(s):

Association Studies ◽

Genome Wide Association Studies ◽

Rna Seq ◽

Specific Expression ◽

Allele Specific Expression ◽

Wide Range ◽

Gene Level ◽

Shared Information ◽

Allele Specific ◽

Regulated Gene Expression

Abstract: Allele-specific expression (ASE) analysis, which quantifies the relative expression of two alleles in a diploid individual, is a powerful tool for identifying cis-regulated gene expression variations that underlie phenotypic differences among individuals. Existing methods for gene-level ASE detection analyze one individual at a time, therefore wasting shared information across individuals. Failure to accommodate such shared information not only loses power, but also makes it difficult to interpret results across individuals. However, ASE detection across individuals is challenging because the data often include individuals that are either heterozygous or homozygous for the unobserved cis-regulatory SNP, leading to heterogeneity in ASE as only those heterozygous individuals are informative for ASE, whereas those homozygous individuals have balanced expression. To simultaneously model multi-individual information and account for such heterogeneity, we developed ASEP, a mixture model with subject-specific random effect accounting for multi-SNP correlations within the same gene. ASEP is able to detect gene-level ASE under one condition and differential ASE between two conditions (e.g., pre-versus post-treatment). Extensive simulations have demonstrated the convincing performance of ASEP under a wide range of scenarios. We further applied ASEP to RNA-seq data of human macrophages, and identified genes showing evidence of differential ASE pre-versus post-stimulation, which were extended through findings in cardiometabolic trait-relevant genome-wide association studies. To the best of our knowledge, ASEP is the first method for gene-level ASE detection at the population level. With the growing adoption of RNA-seq, we believe ASEP will be well-suited for various ASE studies for human diseases.
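To make "relative expression of two alleles" concrete, here is a minimal sketch of the simplest single-individual, single-SNP check for allelic imbalance: an exact two-sided binomial test of the reference and alternative read counts against a 50/50 split. This is not ASEP's mixture model; it is the kind of one-individual-at-a-time analysis the abstract contrasts itself with. The function name `ase_binomial_pvalue` and the read counts are hypothetical, and the sketch ignores practical issues such as reference mapping bias and overdispersion.

```python
from math import comb


def ase_binomial_pvalue(ref_reads, alt_reads):
    """Exact two-sided binomial p-value for allelic imbalance.

    Tests the null hypothesis that reads from a heterozygous SNP come
    from the two alleles with equal probability (p = 0.5).
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    k = min(ref_reads, alt_reads)
    # Under p = 0.5 the distribution is symmetric, so the two-sided
    # p-value is twice the lower tail P(X <= k), capped at 1.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * lower_tail)


# Hypothetical read counts at one heterozygous SNP in two individuals.
balanced = ase_binomial_pvalue(10, 10)    # equal counts
imbalanced = ase_binomial_pvalue(18, 2)   # strong skew toward reference
print(balanced, imbalanced)
```

A per-SNP test like this treats each individual separately, which is why pooling information across individuals and across SNPs within a gene, as the abstract describes, can add power.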


Allele-specific expression of Parkinson’s disease susceptibility genes in human brain

Scientific Reports ◽

10.1038/s41598-020-79990-9 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Margrete Langmyhr ◽

Sandra Pilar Henriksen ◽

Chiara Cappelletti ◽

Wilma D. J. van de Berg ◽

Lasse Pihlstrøm ◽

...

Keyword(s):

Gene Expression ◽

Parkinson's Disease ◽

Whole Blood ◽

Allelic Expression ◽

Genome Wide Association Studies ◽

Specific Expression ◽

Allele Specific Expression ◽

Specific Effects ◽

Allele Specific

Abstract: Genome-wide association studies have identified genetic variation in genomic loci associated with susceptibility to Parkinson’s disease (PD), the most common neurodegenerative movement disorder worldwide. We used allelic expression profiling of genes located within PD-associated loci to identify cis-regulatory variation affecting gene expression. DNA and RNA were extracted from post-mortem superior frontal gyrus tissue and whole blood samples from PD patients and controls. The relative allelic expression of transcribed SNPs in 12 GWAS risk genes was analysed by real-time qPCR. Allele-specific expression was identified for 9 out of 12 genes tested (GBA, TMEM175, RAB7L1, NUCKS1, MCCC1, BCKDK, ZNF646, LZTS3, and WDHD1) in brain tissue samples. Three genes (GPNMB, STK39 and SIPA1L2) did not show significant allele-specific effects. Allele-specific effects were confirmed in whole blood for three genes (BCKDK, LZTS3 and MCCC1), whereas two genes (RAB7L1 and NUCKS1) showed brain-specific allelic expression. Our study supports the hypothesis that changes to the cis-regulation of gene expression are a major mechanism behind a large proportion of genetic associations in PD. Interestingly, allele-specific expression was also observed for coding variants believed to be causal variants (GBA and TMEM175), indicating that splicing and other regulatory mechanisms may be involved in disease development.


Insights from complex trait fine-mapping across diverse populations

10.1101/2021.09.03.21262975 ◽

2021 ◽

Author(s):

Masahiro Kanai ◽

Jacob C Ulirsch ◽

Juha Karjalainen ◽

Mitja Kurki ◽

Konrad J Karczewski ◽

...

Keyword(s):

Fine Mapping ◽

Complex Traits ◽

Large Scale ◽

Association Studies ◽

Great Success ◽

Genome Wide Association Studies ◽

Diverse Populations ◽

High Confidence ◽

Causal Variants ◽

Coding Variants

Abstract: Despite the great success of genome-wide association studies (GWAS) in identifying genetic loci significantly associated with diseases, the vast majority of causal variants underlying disease-associated loci have not been identified [1–3]. To create an atlas of causal variants, we performed and integrated fine-mapping across 148 complex traits in three large-scale biobanks (BioBank Japan [4,5], FinnGen [6], and UK Biobank [7,8]; total n = 811,261), resulting in 4,518 variant-trait pairs with high posterior probability (> 0.9) of causality. Of these, we found 285 high-confidence variant-trait pairs replicated across multiple populations, and we characterized multiple contributors to the surprising lack of overlap among fine-mapping results from different biobanks. By studying the bottlenecked Finnish and Japanese populations, we identified 21 and 26 putative causal coding variants with extreme allele frequency enrichment (> 10-fold) in these two populations, respectively. Aggregating data across populations enabled identification of 1,492 unique fine-mapped coding variants and 176 genes in which multiple independent coding variants influence the same trait (i.e., with an allelic series of coding variants). Our results demonstrate that fine-mapping in diverse populations enables novel insights into the biology of complex traits by pinpointing high-confidence causal variants for further characterization.
