MRLocus: Identifying causal genes mediating a trait through Bayesian estimation of allelic heterogeneity

PLoS Genetics ◽  
2021 ◽  
Vol 17 (4) ◽  
pp. e1009455
Author(s):  
Anqi Zhu ◽  
Nana Matoba ◽  
Emma P. Wilson ◽  
Amanda L. Tapia ◽  
Yun Li ◽  
...  

Expression quantitative trait loci (eQTL) studies are used to understand the regulatory function of non-coding genome-wide association study (GWAS) risk loci, but colocalization alone does not demonstrate a causal relationship of gene expression affecting a trait. Evidence for mediation, that perturbation of gene expression in a given tissue or developmental context will induce a change in the downstream GWAS trait, can be provided by two-sample Mendelian Randomization (MR). Here, we introduce a new statistical method, MRLocus, for Bayesian estimation of the gene-to-trait effect from eQTL and GWAS summary data for loci with evidence of allelic heterogeneity, that is, containing multiple causal variants. MRLocus makes use of a colocalization step applied to each nearly-LD-independent eQTL, followed by an MR analysis step across eQTLs. Additionally, our method involves estimation of the extent of allelic heterogeneity through a dispersion parameter, indicating variable mediation effects from each individual eQTL on the downstream trait. Our method is evaluated against other state-of-the-art methods for estimation of the gene-to-trait mediation effect, using an existing simulation framework. In simulation, MRLocus often has the highest accuracy among competing methods, and in each case provides more accurate estimation of uncertainty as assessed through interval coverage. MRLocus is then applied to five candidate causal genes for mediation of particular GWAS traits, where gene-to-trait effects are concordant with those previously reported. We find that MRLocus’s estimation of the causal effect across eQTLs within a locus provides useful information for determining how perturbation of gene expression or individual regulatory elements will affect downstream traits. The MRLocus method is implemented as an R package available at https://mikelove.github.io/mrlocus.
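To make the "MR analysis step across eQTLs" concrete, the following is a minimal sketch of a standard fixed-effect inverse-variance-weighted (IVW) slope through the origin, computed from per-eQTL summary statistics. It is not MRLocus's Bayesian model; the function name `ivw_gene_to_trait` and the input values are hypothetical, and Cochran's Q is used here only as a rough stand-in for heterogeneity, not as the dispersion parameter described in the abstract.

```python
import math


def ivw_gene_to_trait(beta_eqtl, beta_gwas, se_gwas):
    """Fixed-effect IVW estimate of the gene-to-trait slope.

    beta_eqtl: SNP effect on expression, one value per LD-independent eQTL
    beta_gwas: SNP effect on the trait for the same SNPs
    se_gwas:   standard errors of the GWAS effects
    Returns (slope, standard error, Cochran's Q).
    Uncertainty in beta_eqtl is ignored (an assumption of this sketch).
    """
    if not beta_eqtl or not (len(beta_eqtl) == len(beta_gwas) == len(se_gwas)):
        raise ValueError("inputs must be non-empty and of equal length")
    weights = [1.0 / s ** 2 for s in se_gwas]
    num = sum(w * a * b for w, a, b in zip(weights, beta_eqtl, beta_gwas))
    den = sum(w * a * a for w, a in zip(weights, beta_eqtl))
    if den == 0:
        raise ValueError("all eQTL effects are zero")
    slope = num / den
    se = math.sqrt(1.0 / den)
    # Weighted squared residuals around the fitted line (heterogeneity check).
    q = sum(w * (b - slope * a) ** 2
            for w, a, b in zip(weights, beta_eqtl, beta_gwas))
    return slope, se, q


# Hypothetical summary statistics for three eQTLs at one locus.
slope, se, q = ivw_gene_to_trait([1.0, 2.0, 1.5], [0.5, 1.0, 0.75], [0.1, 0.1, 0.1])
print(slope, se, q)
```

A large Q relative to the number of eQTLs would suggest that individual eQTLs imply different gene-to-trait effects, which is the situation the abstract describes as allelic heterogeneity.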

2020 ◽  
Author(s):  
Anqi Zhu ◽  
Nana Matoba ◽  
Emmaleigh Wilson ◽  
Amanda L. Tapia ◽  
Yun Li ◽  
...  

Abstract
Expression quantitative trait loci (eQTL) studies are used to understand the regulatory function of non-coding genome-wide association study (GWAS) risk loci, but colocalization alone does not demonstrate a causal relationship of gene expression affecting a trait. Evidence for mediation, that perturbation of gene expression in a given tissue or developmental context will induce a change in the downstream GWAS trait, can be provided by two-sample Mendelian Randomization (MR). Here, we introduce a new statistical method, MRLocus, for Bayesian estimation of the gene-to-trait effect from eQTL and GWAS summary data for loci displaying allelic heterogeneity, that is, containing multiple LD-independent eQTLs. MRLocus makes use of a colocalization step applied to each eQTL, followed by an MR analysis step across eQTLs. Additionally, our method involves estimation of allelic heterogeneity through a dispersion parameter, indicating variable mediation effects from each individual eQTL on the downstream trait. Our method is evaluated against state-of-the-art methods for estimation of the gene-to-trait mediation effect, using an existing simulation framework. In simulation, MRLocus often has the highest accuracy among competing methods, and in each case provides more accurate estimation of uncertainty as assessed through interval coverage. MRLocus is then applied to five causal candidate genes for mediation of particular GWAS traits, where gene-to-trait effects are concordant with those previously reported. We find that MRLocus’s estimation of the causal effect across eQTLs within a locus provides useful information for determining how perturbation of gene expression or individual regulatory elements will affect downstream traits. The MRLocus method is implemented as an R package available at https://mikelove.github.io/mrlocus.


2021 ◽  
Author(s):  
Ken Chen ◽  
Zhenhuang Zhuang ◽  
Chunli Shao ◽  
Jilin Zheng ◽  
Qing Zhou ◽  
...  

Abstract
Objectives: To investigate the roles of cardiometabolic factors (including blood pressure, blood lipids, thyroid function, body mass, and insulin sensitivity) in mediating the causal effect of type 2 diabetes (T2DM) on cardiovascular disease (CVD) outcomes.
Design: Two-step, two-sample multivariable Mendelian randomization (MVMR) study.
Setting: International genome-wide association study (GWAS) consortia data.
Exposure: T2DM; blood pressure: systolic blood pressure (SBP), diastolic blood pressure (DBP); blood lipids: low-density lipoprotein (LDL), high-density lipoprotein (HDL), total cholesterol (TC), triglycerides (TG); thyroid function: hyperthyroidism, hypothyroidism; body mass index (BMI), waist-hip ratio (WHR), and insulin sensitivity.
Main outcomes: CVD, including coronary heart disease (CHD), myocardial infarction (MI), and stroke.
Methods: Summary-level data for exposures and main outcomes were extracted from GWAS consortia. We used two-sample MR to illustrate the causal effect of T2DM on CVD subtypes and regression-based MVMR to quantify the possible mediation effects of cardiometabolic factors on CVD.
Results: Each additional unit of log odds of T2DM increased the risk of CHD by 16% [OR: 1.16, 95% confidence interval (CI): 1.12-1.21], of MI by 15% (OR: 1.15, 95% CI: 1.10-1.20), and of stroke by 10% (OR: 1.10, 95% CI: 1.06-1.13). In the mediation analysis, SBP, DBP and TG were identified as the main mediators, while the mediation effects of the other cardiometabolic factors were not significant. The proportion of the total effect of T2DM on CHD mediated by SBP, DBP and TG was 16% (95% CI: 8%-24%), 7% (95% CI: 1%-13%) and 10% (95% CI: 2%-18%), respectively. The mediation effects of SBP and DBP on MI and stroke, and of TG on MI, were also prominent, while the mediation effect of TG on stroke was not significant. The combined mediation effect of all three mediators accounted for 29%, 26% and 13% of the total effect of T2DM on CHD, MI and stroke, respectively.
Conclusion: SBP, DBP and TG mediate a substantial proportion of the causal effect of T2DM on CVD, and thus interventions on these factors might considerably reduce the excess risk of CVD among T2DM patients.
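As a worked illustration of what a "proportion mediated" implies, the sketch below splits a total effect into indirect and direct parts on the log-odds scale, using the reported CHD odds ratio (1.16) and the reported 16% share for SBP. It assumes that effects are additive on the log-odds scale, which is a common convention but is not stated in the abstract; the function name `split_total_effect` is hypothetical.

```python
import math


def split_total_effect(or_total, proportion_mediated):
    """Split a total odds ratio into indirect and direct parts.

    Assumes additivity on the log-odds scale:
        log(OR_total) = indirect + direct
        indirect      = proportion_mediated * log(OR_total)
    Returns (indirect log odds, direct odds ratio).
    """
    if or_total <= 0:
        raise ValueError("odds ratio must be positive")
    if not 0.0 <= proportion_mediated <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    total = math.log(or_total)
    indirect = proportion_mediated * total
    direct_or = math.exp(total - indirect)
    return indirect, direct_or


# Values reported in the abstract: T2DM -> CHD OR 1.16, 16% mediated by SBP.
indirect, direct_or = split_total_effect(1.16, 0.16)
print(f"indirect log odds: {indirect:.4f}, direct OR: {direct_or:.3f}")
```

Under this assumption, the portion of the T2DM effect on CHD that does not pass through SBP corresponds to an odds ratio of 1.16 raised to the power 0.84.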


2019 ◽  
Author(s):  
James Boocock ◽  
Megan Leask ◽  
Yukinori Okada ◽  
Hirotaka Matsuo ◽  
Yusuke Kawamura ◽  
...  

Abstract
Serum urate is the end-product of purine metabolism. Elevated serum urate is a cause of gout and a predictor of renal disease, cardiovascular disease and other metabolic conditions. Genome-wide association studies (GWAS) have reported dozens of loci associated with serum urate control; however, there has been little progress in understanding the molecular basis of the associated loci. Here we employed trans-ancestral meta-analysis using data from European and East Asian populations to identify ten new loci for serum urate levels. Genome-wide colocalization with cis-expression quantitative trait loci (eQTL) identified a further five new loci. By cis- and trans-eQTL colocalization analysis we identified 24 and 20 genes, respectively, for which the causal eQTL variant is highly likely to be shared with the serum urate-associated locus. One new locus identified was SLC22A9, which encodes organic anion transporter 7 (OAT7). We demonstrate that OAT7 is a very weak urate-butyrate exchanger. Newly implicated genes identified in the eQTL analysis include those encoding proteins that make up the dystrophin complex, a scaffold for signaling proteins and transporters at the cell membrane; MLXIP, which, with the previously identified MLXIPL, is a transcription factor that may regulate serum urate via the pentose-phosphate pathway; and MRPS7 and IDH2, which encode proteins necessary for mitochondrial function. Trans-ancestral functional fine-mapping identified six loci (RREB1, INHBC, HLF, UBE2Q2, SFMBT1, HNF4G) with colocalized eQTL that contained putative causal SNPs (posterior probability of causality > 0.8). This systematic analysis of serum urate GWAS loci has identified candidate causal genes at 19 loci and a network of previously unidentified genes likely involved in control of serum urate levels, further illuminating the molecular mechanisms of urate control.

Author Summary
High serum urate is a prerequisite for gout and a risk factor for metabolic disease. Previous GWAS have identified numerous loci that are associated with serum urate control; however, only a small handful of these loci have known molecular consequences. The majority of loci are within the non-coding regions of the genome, and it is therefore difficult to ascertain how these variants might influence serum urate levels without tangible links to gene expression and/or protein function. We have applied a novel bioinformatic pipeline in which we combined population-specific GWAS data with gene expression and genome connectivity information to identify putative causal genes for serum urate-associated loci. Overall, we identified 15 novel serum urate loci and show that these loci, along with previously identified loci, are linked to the expression of 44 genes. We show that some of the variants within these loci have strong predicted regulatory function, which can be further tested in functional analyses. This study expands on previous GWAS by identifying further loci implicated in serum urate control and new causal mechanisms supported by gene expression changes.


2021 ◽  
Author(s):  
Noah James Connally ◽  
Sumaiya Nazeen ◽  
Daniel Lee ◽  
Huwenbo Shi ◽  
John Stamatoyannopoulos ◽  
...  

The genetic basis of most complex traits is highly polygenic and dominated by non-coding alleles, and it is widely assumed that such alleles exert small regulatory effects on the expression of cis-linked genes. However, despite availability of expansive gene expression and epigenomic data sets, few variant-to-gene links have emerged. We identified 139 genes in which protein-coding variants cause severe or familial forms of nine human traits. We then computed the association between common complex forms of the same traits and non-coding variation, revealing that most such traits are also associated with non-coding variation in the vicinity of the same genes. However, we found colocalization evidence--the same variant influencing both the physiological trait and gene expression--for only 7% of genes, and transcriptome-wide association evidence with correct direction of effect for only 4% of genes, despite an abundance of eQTLs in most loci. Fine mapping variants to regulatory elements and assigning these to genes by linear distance similarly failed to implicate most genes in complex traits. These results contradict the hypothesis that most complex trait-associated variants coincide with currently ascertained expression quantitative trait loci. The field must confront this deficit, and pursue the "missing regulation."


2019 ◽  
Author(s):  
Haoran Xue ◽  
Wei Pan ◽  

Abstract
Transcriptome-wide association study (TWAS) has become popular for integrating a reference eQTL dataset with an independent main GWAS dataset to identify (putatively) causal genes, shedding mechanistic insight into biological pathways from genetic variants to a GWAS trait mediated by gene expression. Statistically, TWAS is a (two-sample) 2-stage least squares (2SLS) method in the framework of instrumental variables analysis for causal inference: in Stage 1 it uses the reference eQTL data to impute a gene’s expression for the main GWAS data, then in Stage 2 it tests for association between the imputed gene expression and the GWAS trait; if an association is detected in Stage 2, a (putatively) causal relationship between the gene and the GWAS trait is claimed. If a non-linear model or a generalized linear model (GLM) is fitted in Stage 2 (e.g. for a binary GWAS trait), it is known that using only imputed gene expression, as in standard TWAS, in general does not lead to a consistent (i.e. asymptotically unbiased) estimate for the causal effect; accordingly, a variation of 2SLS, called two-stage residual inclusion (2SRI), has been proposed to yield better estimates (e.g. being consistent under suitable conditions). Our main goal is to investigate whether it is necessary or even better to apply 2SRI instead of the standard 2SLS. In addition, due to the use of imputed gene expression (i.e. with measurement errors), it is known that in general some correction to the standard error estimate of the causal effect estimate has to be applied, while in the standard TWAS no correction is applied. Is this an issue? We also compare one-sample 2SLS with two-sample 2SLS (i.e. the standard TWAS). We used the ADNI data and simulated data mimicking the ADNI data to address the above questions. In the end, we conclude that, in practice, with the large sample sizes and small effect sizes of genetic variants, the standard TWAS performs well and is recommended.
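The two stages described above can be sketched as follows for the simplest possible case: a single SNP as the instrument and a continuous trait. This is an illustrative toy, not the authors' pipeline; real TWAS models use many SNPs with regularization, and the names `ols_fit` and `two_sample_2sls` and the data are hypothetical.

```python
def ols_fit(x, y):
    """Simple least-squares fit of y on x; returns (slope, intercept)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def two_sample_2sls(ref_geno, ref_expr, gwas_geno, gwas_trait):
    """Two-sample 2SLS with one SNP.

    Stage 1: learn genotype -> expression in the reference eQTL sample.
    Stage 2: impute expression in the GWAS sample, regress trait on it.
    Returns (stage-1 eQTL weight, stage-2 gene-to-trait effect).
    """
    weight, intercept = ols_fit(ref_geno, ref_expr)
    imputed_expr = [intercept + weight * g for g in gwas_geno]
    effect, _ = ols_fit(imputed_expr, gwas_trait)
    return weight, effect


# Hypothetical noise-free data: expression = 1 + 0.5 * genotype,
# trait = 2 * expression, measured in two separate samples.
ref_geno = [0, 1, 2, 0, 1, 2]
ref_expr = [1.0, 1.5, 2.0, 1.0, 1.5, 2.0]
gwas_geno = [0, 1, 2, 1]
gwas_trait = [2.0, 3.0, 4.0, 3.0]
print(two_sample_2sls(ref_geno, ref_expr, gwas_geno, gwas_trait))
```

The sketch reports only a point estimate. The standard-error correction and the 2SRI variant discussed in the abstract are deliberately left out.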


PLoS Genetics ◽  
2021 ◽  
Vol 17 (3) ◽  
pp. e1009398
Author(s):  
Arjun Bhattacharya ◽  
Yun Li ◽  
Michael I. Love

Traditional predictive models for transcriptome-wide association studies (TWAS) consider only single nucleotide polymorphisms (SNPs) local to genes of interest and perform parameter shrinkage with a regularization process. These approaches ignore the effect of distal-SNPs or other molecular effects underlying the SNP-gene association. Here, we outline multi-omics strategies for transcriptome imputation from germline genetics to allow more powerful testing of gene-trait associations by prioritizing distal-SNPs to the gene of interest. In one extension, we identify mediating biomarkers (CpG sites, microRNAs, and transcription factors) highly associated with gene expression and train predictive models for these mediators using their local SNPs. Imputed values for mediators are then incorporated into the final predictive model of gene expression, along with local SNPs. In the second extension, we assess distal-eQTLs (SNPs associated with genes not in a local window around it) for their mediation effect through mediating biomarkers local to these distal-eSNPs. Distal-eSNPs with large indirect mediation effects are then included in the transcriptomic prediction model with the local SNPs around the gene of interest. Using simulations and real data from ROS/MAP brain tissue and TCGA breast tumors, we show considerable gains of percent variance explained (1–2% additive increase) of gene expression and TWAS power to detect gene-trait associations. This integrative approach to transcriptome-wide imputation and association studies aids in identifying the complex interactions underlying genetic regulation within a tissue and important risk genes for various traits and disorders.


2020 ◽  
Author(s):  
Asa Thibodeau ◽  
Shubham Khetan ◽  
Alper Eroglu ◽  
Ryan Tewhey ◽  
Michael L. Stitzel ◽  
...  

Abstract
Cis-regulatory elements (cis-REs) include promoters, enhancers, and insulators that regulate gene expression programs via binding of transcription factors. ATAC-seq technology effectively identifies active cis-REs in a given cell type (including from single cells) by mapping accessible chromatin at base-pair resolution. However, these maps are not immediately useful for inferring specific functions of cis-REs. For this purpose, we developed a deep learning framework (CoRE-ATAC) with novel data encoders that integrate DNA sequence (reference or personal genotypes) with ATAC-seq cut sites and read pileups. CoRE-ATAC was trained on 4 cell types (n=6 samples/replicates) and accurately predicted known cis-RE functions from 7 cell types (n=40 samples) that were not used in model training (mean average precision=0.80). CoRE-ATAC enhancer predictions from 19 human islet samples coincided with genetically modulated gain/loss of enhancer activity, which was confirmed by massively parallel reporter assays (MPRAs). Finally, CoRE-ATAC effectively inferred cis-RE function from aggregate single nucleus ATAC-seq (snATAC) data from human blood-derived immune cells that overlapped with known functional annotations in sorted immune cells, which established the efficacy of these models to study cis-RE functions of rare cells without the need for cell sorting. ATAC-seq maps from primary human cells reveal individual- and cell-specific variation in cis-RE activity. CoRE-ATAC increases the functional resolution of these maps, a critical step for studying regulatory disruptions behind diseases.

Author Summary
Non-coding DNA sequences serve different functional roles to regulate gene expression. For these sequences to be active, they must be accessible for proteins and other factors to bind in order to carry out a specific regulatory function. Even so, mutations within these sequences or other regulatory events may modulate their activity or regulatory function. It is therefore critical that we identify these non-coding sequences and their specific regulatory function to fully understand how specific genes are regulated. Current sequencing technologies allow us to identify accessible sequences via chromatin accessibility maps from low cell numbers, enabling the study of clinical samples. However, determining the functional role associated with these sequences remains a challenge. Towards this goal, we harnessed the power of deep learning to unravel the intricacies of chromatin accessibility maps to infer their associated gene regulatory functions. We demonstrate that our method, CoRE-ATAC, can infer regulatory functions in diverse cell types, captures activity differences modulated by genetic mutations, and can be applied to accessibility maps of single cell clusters to infer regulatory functions of rare cell populations. These inferences will further our understanding of how genes are regulated and enable the study of these mechanisms as they relate to disease.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Karolina Stępniak ◽  
Magdalena A. Machnicka ◽  
Jakub Mieczkowski ◽  
Anna Macioszek ◽  
Bartosz Wojtaś ◽  
...  

Abstract
Chromatin structure and accessibility, and combinatorial binding of transcription factors to regulatory elements in genomic DNA control transcription. Genetic variations in genes encoding histones, epigenetics-related enzymes or modifiers affect chromatin structure/dynamics and result in alterations in gene expression contributing to cancer development or progression. Gliomas are brain tumors frequently associated with epigenetics-related gene deregulation. We perform whole-genome mapping of chromatin accessibility, histone modifications, DNA methylation patterns and transcriptome analysis simultaneously in multiple tumor samples to unravel epigenetic dysfunctions driving gliomagenesis. Based on the results of the integrative analysis of the acquired profiles, we create an atlas of active enhancers and promoters in benign and malignant gliomas. We explore these elements and intersect with Hi-C data to uncover molecular mechanisms instructing gene expression in gliomas.


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Jonathan D. Licht ◽  
Richard L. Bennett

Abstract
Background: Epigenetic mechanisms regulate chromatin accessibility patterns that govern interaction of transcription machinery with genes and their cis-regulatory elements. Mutations that affect epigenetic mechanisms are common in cancer. Because epigenetic modifications are reversible, many anticancer strategies targeting these mechanisms are currently under development and in clinical trials.
Main body: Here we review evidence suggesting that epigenetic therapeutics can deactivate immunosuppressive gene expression or reprogram tumor cells to activate antigen presentation mechanisms. In addition, the dysregulation of epigenetic mechanisms commonly observed in cancer may alter the immunogenicity of tumor cells and effectiveness of immunotherapies.
Conclusions: Therapeutics targeting epigenetic mechanisms may be helpful to counter immune evasion and improve the effectiveness of immunotherapies.

