MRLocus: identifying causal genes mediating a trait through Bayesian estimation of allelic heterogeneity

2020 ◽  
Author(s):  
Anqi Zhu ◽  
Nana Matoba ◽  
Emmaleigh Wilson ◽  
Amanda L. Tapia ◽  
Yun Li ◽  
...  

Abstract Expression quantitative trait loci (eQTL) studies are used to understand the regulatory function of non-coding genome-wide association study (GWAS) risk loci, but colocalization alone does not demonstrate a causal relationship of gene expression affecting a trait. Evidence for mediation, that perturbation of gene expression in a given tissue or developmental context will induce a change in the downstream GWAS trait, can be provided by two-sample Mendelian Randomization (MR). Here, we introduce a new statistical method, MRLocus, for Bayesian estimation of the gene-to-trait effect from eQTL and GWAS summary data for loci displaying allelic heterogeneity, that is, containing multiple LD-independent eQTLs. MRLocus makes use of a colocalization step applied to each eQTL, followed by an MR analysis step across eQTLs. Additionally, our method involves estimation of allelic heterogeneity through a dispersion parameter, indicating variable mediation effects from each individual eQTL on the downstream trait. Our method is evaluated against state-of-the-art methods for estimation of the gene-to-trait mediation effect, using an existing simulation framework. In simulation, MRLocus often has the highest accuracy among competing methods, and in each case provides more accurate estimation of uncertainty as assessed through interval coverage. MRLocus is then applied to five causal candidate genes for mediation of particular GWAS traits, where gene-to-trait effects are concordant with those previously reported. We find that MRLocus’ estimation of the causal effect across eQTLs within a locus provides useful information for determining how perturbation of gene expression or individual regulatory elements will affect downstream traits. The MRLocus method is implemented as an R package available at https://mikelove.github.io/mrlocus.

PLoS Genetics ◽  
2021 ◽  
Vol 17 (4) ◽  
pp. e1009455
Author(s):  
Anqi Zhu ◽  
Nana Matoba ◽  
Emma P. Wilson ◽  
Amanda L. Tapia ◽  
Yun Li ◽  
...  

Expression quantitative trait loci (eQTL) studies are used to understand the regulatory function of non-coding genome-wide association study (GWAS) risk loci, but colocalization alone does not demonstrate a causal relationship of gene expression affecting a trait. Evidence for mediation, that perturbation of gene expression in a given tissue or developmental context will induce a change in the downstream GWAS trait, can be provided by two-sample Mendelian Randomization (MR). Here, we introduce a new statistical method, MRLocus, for Bayesian estimation of the gene-to-trait effect from eQTL and GWAS summary data for loci with evidence of allelic heterogeneity, that is, containing multiple causal variants. MRLocus makes use of a colocalization step applied to each nearly-LD-independent eQTL, followed by an MR analysis step across eQTLs. Additionally, our method involves estimation of the extent of allelic heterogeneity through a dispersion parameter, indicating variable mediation effects from each individual eQTL on the downstream trait. Our method is evaluated against other state-of-the-art methods for estimation of the gene-to-trait mediation effect, using an existing simulation framework. In simulation, MRLocus often has the highest accuracy among competing methods, and in each case provides more accurate estimation of uncertainty as assessed through interval coverage. MRLocus is then applied to five candidate causal genes for mediation of particular GWAS traits, where gene-to-trait effects are concordant with those previously reported. We find that MRLocus’s estimation of the causal effect across eQTLs within a locus provides useful information for determining how perturbation of gene expression or individual regulatory elements will affect downstream traits. The MRLocus method is implemented as an R package available at https://mikelove.github.io/mrlocus.
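
The "MR analysis step across eQTLs" can be pictured with a much simpler, non-Bayesian stand-in: an inverse-variance-weighted (IVW) slope through the origin, fitted to per-eQTL effect sizes. The sketch below is only that generic estimator. It is not the MRLocus model, it has no colocalization step and no dispersion parameter, and the function name `ivw_slope` and all input values are hypothetical.

```python
from math import sqrt


def ivw_slope(beta_eqtl, beta_gwas, se_gwas):
    """Fixed-effect IVW estimate of the gene-to-trait slope.

    Each element describes one (assumed LD-independent) eQTL:
    its effect on expression, its effect on the trait, and the
    standard error of the trait effect.
    """
    if not (len(beta_eqtl) == len(beta_gwas) == len(se_gwas)):
        raise ValueError("all inputs must have the same length")
    weights = [1.0 / se ** 2 for se in se_gwas]
    # Weighted regression of GWAS effects on eQTL effects, no intercept.
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_eqtl, beta_gwas))
    den = sum(w * bx * bx for w, bx in zip(weights, beta_eqtl))
    return num / den, sqrt(1.0 / den)


# Hypothetical summary statistics for three eQTLs at one locus.
beta_eqtl = [0.5, 0.8, 1.2]
beta_gwas = [0.10, 0.16, 0.24]
se_gwas = [0.02, 0.02, 0.03]

slope, slope_se = ivw_slope(beta_eqtl, beta_gwas, se_gwas)
print(f"gene-to-trait slope: {slope:.3f} (SE {slope_se:.3f})")
```

A fixed-effect fit like this assumes every eQTL implies the same slope. The dispersion parameter described in the abstract addresses the case where that assumption fails and individual eQTLs show variable mediation effects.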


2021 ◽  
Author(s):  
Ken Chen ◽  
Zhenhuang Zhuang ◽  
Chunli Shao ◽  
Jilin Zheng ◽  
Qing Zhou ◽  
...  

Abstract
Objectives: To investigate the roles of cardiometabolic factors (including blood pressure, blood lipids, thyroid function, body mass, and insulin sensitivity) in mediating the causal effect of type 2 diabetes (T2DM) on cardiovascular disease (CVD) outcomes.
Design: Two-step, two-sample multivariable Mendelian randomization (MVMR) study.
Setting: International genome-wide association study (GWAS) consortia data.
Exposure: T2DM; blood pressure: systolic blood pressure (SBP), diastolic blood pressure (DBP); blood lipids: low-density lipoprotein (LDL), high-density lipoprotein (HDL), total cholesterol (TC), triglycerides (TG); thyroid function: hyperthyroidism, hypothyroidism; body mass index (BMI), waist-hip-ratio (WHR), and insulin sensitivity.
Main outcomes: CVD, including coronary heart disease (CHD), myocardial infarction (MI) and stroke.
Methods: Summary-level data for exposures and main outcomes were extracted from GWAS consortia. We used two-sample MR to illustrate the causal effect of T2DM on CVD subtypes and regression-based MVMR to quantify the possible mediation effects of cardiometabolic factors on CVD.
Results: Each additional unit of log odds of T2DM increased the risk of CHD by 16% [OR: 1.16, 95% confidence interval (CI): 1.12-1.21], the risk of MI by 15% (OR: 1.15, 95%CI: 1.10-1.20), and the risk of stroke by 10% (OR: 1.10, 95%CI: 1.06-1.13). In mediation analysis, SBP, DBP and TG were identified as the main mediators, while the mediation effects of other cardiometabolic factors were not significant. The proportions of the total effect of T2DM on CHD mediated by SBP, DBP and TG were 16% (95%CI: 8%-24%), 7% (95%CI: 1%-13%) and 10% (95%CI: 2%-18%), respectively. The mediation effects of SBP and DBP on MI and stroke, and of TG on MI, were also prominent, while the mediation effect of TG on stroke was not significant. The combined mediation effect of all three mediators accounted for 29%, 26% and 13% of the total effect of T2DM on CHD, MI and stroke, respectively.
Conclusion: SBP, DBP and TG mediate a substantial proportion of the causal effect of T2DM on CVD, and thus interventions on these factors might reduce considerable excess risk of CVD among T2DM patients.
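
To make the "proportion mediated" figures concrete, here is a minimal sketch using the difference method on the log-odds scale: the share of the total effect that disappears once the mediator is adjusted for. This is one common convention, and the abstract does not say which estimator the authors used. The total odds ratio of 1.16 is taken from the abstract; the direct effect is a hypothetical value chosen so that the mediated share is 16%, and `proportion_mediated` is an illustrative name.

```python
from math import log


def proportion_mediated(total_log_or, direct_log_or):
    """Share of the total effect not explained by the direct effect."""
    if total_log_or == 0:
        raise ValueError("total effect is zero; proportion is undefined")
    return (total_log_or - direct_log_or) / total_log_or


# Total effect of T2DM on CHD from the abstract (OR 1.16), as a log odds ratio.
total = log(1.16)
# Hypothetical direct effect after adjusting for one mediator.
direct = 0.84 * total

share = proportion_mediated(total, direct)
print(f"proportion mediated: {share:.2f}")
```

Working on the log-odds scale matters here: odds ratios combine multiplicatively, so proportions computed directly from the ORs would not add up in the same way.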


2016 ◽  
Author(s):  
Colby Chiang ◽  
Alexandra J. Scott ◽  
Joe R. Davis ◽  
Emily K. Tsang ◽  
Xin Li ◽  
...  

Abstract Structural variants (SVs) are an important source of human genetic diversity but their contribution to traits, disease, and gene regulation remains unclear. The Genotype-Tissue Expression (GTEx) project presents an unprecedented opportunity to address this question due to the availability of deep whole genome sequencing (WGS) and multi-tissue RNA-seq data from 147 individuals. We used comprehensive methods to identify 24,157 high confidence SVs, and mapped cis expression quantitative trait loci (eQTLs) in 13 tissues via joint analysis of SVs, single nucleotide (SNV) and short insertion/deletion (indel) variants. We identified 24,801 eQTLs affecting the expression of 10,101 distinct genes. Based on haplotype structure and heritability partitioning, we estimate that SVs are the causal variant at 3.3-7.0% of eQTLs, which is nearly an order of magnitude higher than prior estimates from low coverage WGS and represents a 26- to 54-fold enrichment relative to their scarcity in the genome. Expression-altering SVs also have significantly larger effect sizes than SNVs and indels. We identified 787 putatively causal SVs predicted to directly alter gene expression, most of which (88.3%) are noncoding variants that show significant enrichment at enhancers and other regulatory elements. By evaluating linkage disequilibrium between SVs, SNVs and indels, we nominate 49 SVs as plausible causal variants at published genome-wide association study (GWAS) loci. Remarkably, 29.9% of the common SV-eQTLs are not well tagged by flanking SNVs, and we observe a notable abundance (relative to SNVs and indels) of rare, high impact SVs associated with aberrant expression of nearby genes. These results suggest that comprehensive WGS-based SV analyses will increase the power of both common and rare variant association studies.


2020 ◽  
Author(s):  
Bo He ◽  
Chao Zhang ◽  
Xiaoxue Zhang ◽  
Yu Fan ◽  
Hu Zeng ◽  
...  

Abstract 5-Hydroxymethylcytosine (5hmC) is an important epigenetic mark that regulates gene expression. Charting the landscape of 5hmC in human tissues is fundamental to understand its regulatory functions. Here, we systematically profiled the whole-genome 5hmC landscape at single-base resolution for 19 types of human tissues. We found that 5hmC preferentially decorates gene bodies and outperforms gene body 5mC in reflecting gene expression. Approximately one-third of 5hmC peaks are tissue-specific differentially hydroxymethylated regions (tsDhMRs), which are deposited in regulatory elements that regulate the expression of nearby tissue-specific functional genes. In addition, tsDhMRs are enriched with tissue-specific transcription-factor-binding sites and may rewire tissue-specific gene expression networks. Moreover, tsDhMRs are associated with SNPs identified by genome-wide association study (GWAS), linked to tissue-specific phenotypes and diseases. Collectively, our results show the tissue-specific 5hmC landscape of the human genome and demonstrate that 5hmC serves as a fundamental regulatory element affecting tissue-specific development and diseases.


2021 ◽  
Author(s):  
Noah James Connally ◽  
Sumaiya Nazeen ◽  
Daniel Lee ◽  
Huwenbo Shi ◽  
John Stamatoyannopoulos ◽  
...  

The genetic basis of most complex traits is highly polygenic and dominated by non-coding alleles, and it is widely assumed that such alleles exert small regulatory effects on the expression of cis-linked genes. However, despite availability of expansive gene expression and epigenomic data sets, few variant-to-gene links have emerged. We identified 139 genes in which protein-coding variants cause severe or familial forms of nine human traits. We then computed the association between common complex forms of the same traits and non-coding variation, revealing that most such traits are also associated with non-coding variation in the vicinity of the same genes. However, we found colocalization evidence--the same variant influencing both the physiological trait and gene expression--for only 7% of genes, and transcriptome-wide association evidence with correct direction of effect for only 4% of genes, despite an abundance of eQTLs in most loci. Fine mapping variants to regulatory elements and assigning these to genes by linear distance similarly failed to implicate most genes in complex traits. These results contradict the hypothesis that most complex trait-associated variants coincide with currently ascertained expression quantitative trait loci. The field must confront this deficit, and pursue the "missing regulation."


2019 ◽  
Vol 116 (22) ◽  
pp. 10636-10645 ◽  
Author(s):  
Ashish Kapoor ◽  
Dongwon Lee ◽  
Luke Zhu ◽  
Elsayed Z. Soliman ◽  
Megan L. Grove ◽  
...  

The rationale for genome-wide association study (GWAS) results is sequence variation in cis-regulatory elements (CREs) modulating a target gene’s expression as the major cause of trait variation. To understand the complete molecular landscape of one of these GWAS loci, we performed in vitro reporter screens in cardiomyocyte cell lines for CREs overlapping nearly all common variants associated with any of five independent QT interval (QTi)-associated GWAS hits at the SCN5A-SCN10A locus. We identified 13 causal CRE variants using allelic reporter activity, cardiomyocyte nuclear extract-based binding assays, overlap with human cardiac tissue DNaseI hypersensitive regions, and predicted impact of sequence variants on DNaseI sensitivity. Our analyses identified at least one high-confidence causal CRE variant for each of the five sentinel hits that could collectively predict SCN5A cardiac gene expression and QTi association. Although all 13 variants could explain SCN5A gene expression, the highest statistical significance was obtained with seven variants (inclusive of the five above). Thus, multiple, causal, mutually associated CRE variants can underlie GWAS signals.


PLoS Genetics ◽  
2021 ◽  
Vol 17 (3) ◽  
pp. e1009398
Author(s):  
Arjun Bhattacharya ◽  
Yun Li ◽  
Michael I. Love

Traditional predictive models for transcriptome-wide association studies (TWAS) consider only single nucleotide polymorphisms (SNPs) local to genes of interest and perform parameter shrinkage with a regularization process. These approaches ignore the effect of distal-SNPs or other molecular effects underlying the SNP-gene association. Here, we outline multi-omics strategies for transcriptome imputation from germline genetics to allow more powerful testing of gene-trait associations by prioritizing distal-SNPs to the gene of interest. In one extension, we identify mediating biomarkers (CpG sites, microRNAs, and transcription factors) highly associated with gene expression and train predictive models for these mediators using their local SNPs. Imputed values for mediators are then incorporated into the final predictive model of gene expression, along with local SNPs. In the second extension, we assess distal-eQTLs (SNPs associated with genes not in a local window around it) for their mediation effect through mediating biomarkers local to these distal-eSNPs. Distal-eSNPs with large indirect mediation effects are then included in the transcriptomic prediction model with the local SNPs around the gene of interest. Using simulations and real data from ROS/MAP brain tissue and TCGA breast tumors, we show considerable gains of percent variance explained (1–2% additive increase) of gene expression and TWAS power to detect gene-trait associations. This integrative approach to transcriptome-wide imputation and association studies aids in identifying the complex interactions underlying genetic regulation within a tissue and important risk genes for various traits and disorders.


2020 ◽  
Author(s):  
Asa Thibodeau ◽  
Shubham Khetan ◽  
Alper Eroglu ◽  
Ryan Tewhey ◽  
Michael L. Stitzel ◽  
...  

Abstract Cis-regulatory elements (cis-REs) include promoters, enhancers, and insulators that regulate gene expression programs via binding of transcription factors. ATAC-seq technology effectively identifies active cis-REs in a given cell type (including from single cells) by mapping accessible chromatin at base-pair resolution. However, these maps are not immediately useful for inferring specific functions of cis-REs. For this purpose, we developed a deep learning framework (CoRE-ATAC) with novel data encoders that integrate DNA sequence (reference or personal genotypes) with ATAC-seq cut sites and read pileups. CoRE-ATAC was trained on 4 cell types (n=6 samples/replicates) and accurately predicted known cis-RE functions from 7 cell types (n=40 samples) that were not used in model training (mean average precision=0.80). CoRE-ATAC enhancer predictions from 19 human islet samples coincided with genetically modulated gain/loss of enhancer activity, which was confirmed by massively parallel reporter assays (MPRAs). Finally, CoRE-ATAC effectively inferred cis-RE function from aggregate single nucleus ATAC-seq (snATAC) data from human blood-derived immune cells that overlapped with known functional annotations in sorted immune cells, which established the efficacy of these models to study cis-RE functions of rare cells without the need for cell sorting. ATAC-seq maps from primary human cells reveal individual- and cell-specific variation in cis-RE activity. CoRE-ATAC increases the functional resolution of these maps, a critical step for studying regulatory disruptions behind diseases.

Author Summary
Non-coding DNA sequences serve different functional roles to regulate gene expression. For these sequences to be active, they must be accessible for proteins and other factors to bind in order to carry out a specific regulatory function. Even so, mutations within these sequences or other regulatory events may modulate their activity or regulatory function. It is therefore critical that we identify these non-coding sequences and their specific regulatory function to fully understand how specific genes are regulated. Current sequencing technologies allow us to identify accessible sequences via chromatin accessibility maps from low cell numbers, enabling the study of clinical samples. However, determining the functional role associated with these sequences remains a challenge. Towards this goal, we harnessed the power of deep learning to unravel the intricacies of chromatin accessibility maps to infer their associated gene regulatory functions. We demonstrate that our method, CoRE-ATAC, can infer regulatory functions in diverse cell types, captures activity differences modulated by genetic mutations, and can be applied to accessibility maps of single cell clusters to infer regulatory functions of rare cell populations. These inferences will further our understanding of how genes are regulated and enable the study of these mechanisms as they relate to disease.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Karolina Stępniak ◽  
Magdalena A. Machnicka ◽  
Jakub Mieczkowski ◽  
Anna Macioszek ◽  
Bartosz Wojtaś ◽  
...  

Abstract Chromatin structure and accessibility, and combinatorial binding of transcription factors to regulatory elements in genomic DNA control transcription. Genetic variations in genes encoding histones, epigenetics-related enzymes or modifiers affect chromatin structure/dynamics and result in alterations in gene expression contributing to cancer development or progression. Gliomas are brain tumors frequently associated with epigenetics-related gene deregulation. We perform whole-genome mapping of chromatin accessibility, histone modifications, DNA methylation patterns and transcriptome analysis simultaneously in multiple tumor samples to unravel epigenetic dysfunctions driving gliomagenesis. Based on the results of the integrative analysis of the acquired profiles, we create an atlas of active enhancers and promoters in benign and malignant gliomas. We explore these elements and intersect with Hi-C data to uncover molecular mechanisms instructing gene expression in gliomas.

