Some Statistical Consideration in Transcriptome-Wide Association Studies

2019 ◽  
Author(s):  
Haoran Xue ◽  
Wei Pan ◽  

Abstract Transcriptome-wide association study (TWAS) has become popular for integrating a reference eQTL dataset with an independent main GWAS dataset to identify (putatively) causal genes, shedding mechanistic insight into biological pathways from genetic variants to a GWAS trait mediated by gene expression. Statistically, TWAS is a (two-sample) two-stage least squares (2SLS) method in the framework of instrumental variables analysis for causal inference: in Stage 1 it uses the reference eQTL data to impute a gene’s expression for the main GWAS data, and in Stage 2 it tests for association between the imputed gene expression and the GWAS trait; if an association is detected in Stage 2, a (putatively) causal relationship between the gene and the GWAS trait is claimed. If a non-linear model or a generalized linear model (GLM) is fitted in Stage 2 (e.g. for a binary GWAS trait), it is known that using only imputed gene expression, as in standard TWAS, does not in general lead to a consistent (i.e. asymptotically unbiased) estimate of the causal effect; accordingly, a variation of 2SLS, called two-stage residual inclusion (2SRI), has been proposed to yield better estimates (e.g. being consistent under suitable conditions). Our main goal is to investigate whether it is necessary, or even better, to apply 2SRI instead of the standard 2SLS. In addition, because imputed gene expression carries measurement error, some correction to the standard error of the causal effect estimate is known to be necessary in general, whereas standard TWAS applies none. Is this an issue? We also compare one-sample 2SLS with two-sample 2SLS (i.e. the standard TWAS). We used the ADNI data and simulated data mimicking the ADNI data to address these questions. We conclude that, in practice, with the large sample sizes and small effect sizes of genetic variants, the standard TWAS performs well and is recommended.
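To make the Stage 1 / Stage 2 description concrete, here is a minimal Python sketch of standard TWAS viewed as two-sample 2SLS with a linear Stage 2 (i.e. a continuous trait). The simulation settings (five independent cis-SNPs, an unmeasured confounder `U`, a true causal effect of 0.2) and helper names such as `two_sample_2sls` are hypothetical, and the sketch omits the GLM/2SRI variants and the standard-error correction discussed in the abstract.

```python
import numpy as np


def add_intercept(X):
    """Prepend a column of ones to a design matrix."""
    return np.column_stack([np.ones(X.shape[0]), X])


def ols(X, y):
    """Ordinary least-squares coefficients for y ~ X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def two_sample_2sls(G_ref, expr_ref, G_gwas, y_gwas):
    """Standard TWAS as two-sample 2SLS with a linear Stage 2."""
    # Stage 1: expression ~ cis-SNPs in the reference eQTL sample.
    w = ols(add_intercept(G_ref), expr_ref)
    # Impute expression for the independent GWAS sample.
    expr_hat = add_intercept(G_gwas) @ w
    # Stage 2: trait ~ imputed expression; the slope estimates the causal effect.
    return ols(add_intercept(expr_hat), y_gwas)[1]


def simulate(n, rng, n_snps=5, maf=0.3, snp_effect=0.5, causal=0.2):
    """Hypothetical data in which U confounds expression and trait."""
    G = rng.binomial(2, maf, size=(n, n_snps)).astype(float)
    U = rng.normal(size=n)
    expr = G @ np.full(n_snps, snp_effect) + U + rng.normal(size=n)
    y = causal * expr + U + rng.normal(size=n)
    return G, expr, y


rng = np.random.default_rng(1)
G_ref, expr_ref, _ = simulate(1_000, rng)
G_gwas, expr_gwas, y_gwas = simulate(20_000, rng)

twas_est = two_sample_2sls(G_ref, expr_ref, G_gwas, y_gwas)
# Naive regression of the trait on *measured* expression is confounded by U.
naive_est = ols(add_intercept(expr_gwas), y_gwas)[1]
print(f"2SLS: {twas_est:.3f}  naive OLS: {naive_est:.3f}")
```

Under these assumptions the 2SLS estimate should land near 0.2 (slightly attenuated by Stage 1 estimation noise), whereas regressing the trait directly on measured expression is pulled upward by the confounder. Measured expression in the GWAS sample is simulated here only for that comparison; in a real two-sample TWAS it is unavailable.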

Author(s):  
Fernando Pires Hartwig ◽  
Kate Tilling ◽  
George Davey Smith ◽  
Deborah A Lawlor ◽  
Maria Carolina Borges

Abstract Background Two-sample Mendelian randomization (MR) allows the use of freely accessible summary association results from genome-wide association studies (GWAS) to estimate causal effects of modifiable exposures on outcomes. Some GWAS adjust for heritable covariables in an attempt to estimate direct effects of genetic variants on the trait of interest. One, both or neither of the exposure GWAS and outcome GWAS may have been adjusted for covariables. Methods We performed a simulation study comprising different scenarios that could motivate covariable adjustment in a GWAS and analysed real data to assess the influence of using covariable-adjusted summary association results in two-sample MR. Results In the absence of residual confounding between exposure and covariable, between exposure and outcome, and between covariable and outcome, using covariable-adjusted summary associations for two-sample MR eliminated bias due to horizontal pleiotropy. However, covariable adjustment led to bias in the presence of residual confounding (especially between the covariable and the outcome), even in the absence of horizontal pleiotropy (when the genetic variants would be valid instruments without covariable adjustment). In an analysis using real data from the Genetic Investigation of ANthropometric Traits (GIANT) consortium and UK Biobank, the causal effect estimate of waist circumference on blood pressure changed direction upon adjustment of waist circumference for body mass index. Conclusions Our findings indicate that using covariable-adjusted summary associations in MR should generally be avoided. When that is not possible, careful consideration of the causal relationships underlying the data (including potentially unmeasured confounders) is required to direct sensitivity analyses and interpret results with appropriate caution.
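The collider mechanism behind the residual-confounding result can be illustrated with a small simulation. In this hypothetical setup (all effect sizes are assumptions), the variant `g` affects the outcome only through the exposure `x`, so it is a valid instrument, but the heritable covariable `c` lies downstream of `x` and shares an unmeasured confounder `u` with the outcome. The Wald ratio is computed from simulated GWAS coefficients with and without adjustment for `c`.

```python
import numpy as np


def snp_coef(y, g, covar=None):
    """Coefficient of g from a linear regression of y on g, optionally adjusted for covar."""
    cols = [np.ones_like(g), g] if covar is None else [np.ones_like(g), g, covar]
    beta, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
    return beta[1]


rng = np.random.default_rng(2)
n = 200_000
g = rng.binomial(2, 0.3, size=n).astype(float)  # valid instrument for x
u = rng.normal(size=n)                           # confounds covariable c and outcome y
x = 0.5 * g + rng.normal(size=n)                 # exposure
c = 0.5 * x + u + rng.normal(size=n)             # heritable covariable, downstream of x
y = 0.3 * x + u + rng.normal(size=n)             # outcome; true causal effect is 0.3

# Wald ratio from unadjusted exposure and outcome GWAS
wald_unadj = snp_coef(y, g) / snp_coef(x, g)
# Wald ratio when both GWAS were adjusted for the covariable c
wald_adj = snp_coef(y, g, c) / snp_coef(x, g, c)
print(f"unadjusted: {wald_unadj:.3f}  adjusted for c: {wald_adj:.3f}")
```

Under these settings the unadjusted ratio should be close to the true value of 0.3, while adjusting both GWAS for `c` pulls it down to roughly 0.05 in expectation, even though there is no horizontal pleiotropy.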


2019 ◽  
Author(s):  
Jing Yang ◽  
Amanda McGovern ◽  
Paul Martin ◽  
Kate Duffus ◽  
Xiangyu Ge ◽  
...  

Abstract Genome-wide association studies have identified genetic variation contributing to complex disease risk. However, assigning causal genes and mechanisms has been more challenging because disease-associated variants are often found in distal regulatory regions with cell-type-specific behaviours. Here, we collect ATAC-seq, Hi-C, Capture Hi-C and nuclear RNA-seq data in stimulated CD4+ T-cells over 24 hours to identify functional enhancers regulating gene expression. We characterise changes in DNA interaction and activity dynamics that correlate with changes in gene expression, and find that the strongest correlations are observed within 200 kb of promoters. Using rheumatoid arthritis as an example of a T-cell mediated disease, we demonstrate interactions of expression quantitative trait loci with target genes, and confirm assigned genes or show complex interactions for 20% of disease-associated loci, including FOXO1, which we confirm using CRISPR/Cas9.


2021 ◽  
Vol 12 ◽  
Author(s):  
Yuquan Wang ◽  
Tingting Li ◽  
Liwan Fu ◽  
Siqian Yang ◽  
Yue-Qing Hu

Mendelian randomization uses genetic variants as instrumental variables to eliminate the influence of unknown confounders on causal estimation in epidemiological studies. However, with the soaring number of genetic variants identified in genome-wide association studies, pleiotropy and linkage disequilibrium among genetic variants are unavoidable and may produce severe bias in causal inference. In this study, by modeling the pleiotropic effect as a normally distributed random effect, we propose a novel mixed-effects regression model-based method, PLDMR (pleiotropy and linkage disequilibrium adaptive Mendelian randomization), which takes linkage disequilibrium into account and corrects for the pleiotropic effect in causal effect estimation and statistical inference. We conduct extensive simulation studies to evaluate the performance of the proposed and existing methods. Simulation results illustrate the validity and advantage of the novel method compared with other methods, especially in the presence of linkage disequilibrium and directional pleiotropic effects. In addition, by applying this novel method to data from the Atherosclerosis Risk in Communities Study, we conclude that body mass index has a significant causal effect on, and thus might be a potential risk factor for, systolic blood pressure. The novel method is implemented in R, and the corresponding R code is available for free download.
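PLDMR's exact likelihood is not given here. As a rough illustration of the two ingredients the abstract names (a normally distributed random pleiotropic effect and LD-correlated summary statistics), the sketch below fits a simplified random-effects generalized least squares (GLS) estimator. The AR(1) LD matrix, the treatment of SNP-exposure effects as known, the grid search for the pleiotropy variance `tau2`, and all numbers are assumptions; unlike PLDMR, this sketch assumes zero-mean (balanced) pleiotropy and does not handle directional pleiotropy.

```python
import numpy as np


def ar1_ld(p, rho):
    """Hypothetical LD matrix: correlation rho**|i - j| between SNPs i and j."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def re_gls_mr(gamma, Gamma_hat, se_Gamma, R, tau2_grid):
    """Random-effects GLS estimate of the causal effect beta.

    Assumed model: Gamma_hat = beta * gamma + alpha + eps, with pleiotropy
    alpha ~ N(0, tau2 * I) and sampling error eps ~ N(0, S R S), S = diag(se).
    tau2 is chosen by maximizing the profile log-likelihood over a grid.
    """
    Sigma = np.outer(se_Gamma, se_Gamma) * R
    best = None
    for tau2 in tau2_grid:
        V = Sigma + tau2 * np.eye(len(gamma))
        Vinv = np.linalg.inv(V)
        info = gamma @ Vinv @ gamma
        beta = (gamma @ Vinv @ Gamma_hat) / info
        r = Gamma_hat - beta * gamma
        _, logdet = np.linalg.slogdet(V)
        loglik = -0.5 * (logdet + r @ Vinv @ r)
        if best is None or loglik > best[0]:
            best = (loglik, beta, np.sqrt(1.0 / info), tau2)
    _, beta, se, tau2 = best
    return beta, se, tau2


rng = np.random.default_rng(3)
p, beta_true, tau = 50, 0.4, 0.01
R = ar1_ld(p, 0.5)
gamma = rng.uniform(0.05, 0.15, size=p)               # SNP-exposure effects, treated as known
se_Gamma = np.full(p, 0.01)
Sigma = np.outer(se_Gamma, se_Gamma) * R
eps = np.linalg.cholesky(Sigma) @ rng.normal(size=p)  # LD-correlated sampling errors
alpha = rng.normal(0.0, tau, size=p)                  # balanced random pleiotropy
Gamma_hat = beta_true * gamma + alpha + eps

grid = np.concatenate([[0.0], np.logspace(-7, -3, 41)])
beta_hat, se_hat, tau2_hat = re_gls_mr(gamma, Gamma_hat, se_Gamma, R, grid)
print(f"beta = {beta_hat:.3f} (SE {se_hat:.3f}), tau^2 = {tau2_hat:.2e}")
```

Because the GLS weights account for LD among the sampling errors, correlated SNPs are not counted as independent evidence; the estimate should fall near the simulated value of 0.4.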


2019 ◽  
Author(s):  
Tom G Richardson ◽  
Gibran Hemani ◽  
Tom R Gaunt ◽  
Caroline L Relton ◽  
George Davey Smith

Abstract Background Developing insight into tissue-specific transcriptional mechanisms can help improve our understanding of how genetic variants exert their effects on complex traits and disease. By applying the principles of Mendelian randomization, we have undertaken a systematic analysis to evaluate transcriptome-wide associations between gene expression across 48 different tissue types and 395 complex traits. Results Overall, we identified 100,025 gene-trait associations based on conventional genome-wide corrections (P < 5 × 10⁻⁸) that also provided evidence of genetic colocalization. These results indicated that genetic variants which influence gene expression levels in multiple tissues are more likely to influence multiple complex traits. We identified many examples of tissue-specific effects, such as genetically predicted TPO, NR3C2 and SPATA13 expression associating with thyroid disease only in thyroid tissue. Additionally, FBN2 expression was associated with both cardiovascular and lung function traits, but only when analysed in heart and lung tissue, respectively. We also demonstrate that conducting phenome-wide evaluations of our results can help flag adverse on-target side effects for therapeutic intervention, as well as propose drug repositioning opportunities. Moreover, we find that exploring the tissue-dependency of associations identified by genome-wide association studies (GWAS) can help elucidate the causal genes and tissues responsible for effects, as well as uncover putative novel associations. Conclusions The atlas of tissue-dependent associations we have constructed should prove extremely valuable to future studies investigating the genetic determinants of complex disease. The follow-up analyses we have performed in this study are merely a guide for future research. Similar evaluations can be undertaken systematically at http://mrcieu.mrsoftware.org/Tissue_MR_atlas/.
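When a single cis-eQTL serves as the instrument for a gene's expression in a given tissue, the MR estimate reduces to a Wald ratio. The sketch below assumes that setup, uses the simple first-order standard error (ignoring uncertainty in the eQTL effect), and compares the resulting p-value with the genome-wide threshold quoted above. It does not implement the colocalization step the atlas also required, and the input values are hypothetical.

```python
import math


def wald_ratio(beta_zy, se_zy, beta_zx):
    """Wald-ratio MR estimate for a single instrument Z.

    beta_zy: SNP-trait effect (GWAS); se_zy: its standard error;
    beta_zx: SNP-expression effect (eQTL), treated as fixed (first-order SE).
    Returns (estimate, standard error, two-sided normal p-value).
    """
    est = beta_zy / beta_zx
    se = se_zy / abs(beta_zx)
    p = math.erfc(abs(est / se) / math.sqrt(2.0))
    return est, se, p


GENOME_WIDE = 5e-8
est, se, p = wald_ratio(beta_zy=0.05, se_zy=0.005, beta_zx=0.5)  # hypothetical inputs
print(est, se, p < GENOME_WIDE)
```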


2021 ◽  
Author(s):  
Kimmo Eriksson ◽  
Kimmo Sorjonen ◽  
Daniel Falkstedt ◽  
Bo Melin ◽  
Gustav Nilsonne

Effects of education on intelligence are controversial. Earlier studies of longitudinal data have observed positive associations between level of education and a later measurement of intelligence, when statistically controlling for an earlier measurement of intelligence, and furthermore that this association is stronger among individuals with lower pre-education intelligence. Here we challenge the interpretation that these observations reflect a causal effect of education. We develop and analyze a mathematical model in which education is assumed to have zero effect on intelligence, showing that precisely the observed pattern of results arises as a statistical artefact due to measurement errors. Fitting our model to a dataset used in a prior study, we show that observed associations between education and intelligence are closely replicated in simulated data generated by our model. Thus, our reanalysis indicates that additional higher education does not cause an increase in intelligence. We discuss how positive findings in studies of policy changes and school-age cutoffs are limited to basic education and may not generalize to higher education.
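The measurement-error argument can be illustrated with a toy simulation. This is a generic sketch, not the authors' fitted model: latent intelligence never changes, both test scores measure it with error, and education depends on latent intelligence but has no causal effect on it. All parameter values are hypothetical.

```python
import numpy as np


def education_coef(iq_later, education, baseline):
    """Coefficient on education in OLS of later IQ on education plus a baseline measure."""
    X = np.column_stack([np.ones_like(baseline), education, baseline])
    beta, *_ = np.linalg.lstsq(X, iq_later, rcond=None)
    return beta[1]


rng = np.random.default_rng(4)
n = 100_000
true_iq = rng.normal(size=n)                      # latent intelligence, never changes
iq1 = true_iq + rng.normal(0.0, 0.5, size=n)      # earlier test score, with error
iq2 = true_iq + rng.normal(0.0, 0.5, size=n)      # later test score, with error
# Higher latent intelligence makes (binary) higher education more likely,
# but education has ZERO effect on later IQ by construction.
educ = (true_iq + rng.normal(size=n) > 0).astype(float)

coef_observed = education_coef(iq2, educ, iq1)    # adjust for error-prone baseline
coef_latent = education_coef(iq2, educ, true_iq)  # adjust for error-free baseline
print(f"adjusting for IQ1: {coef_observed:.3f}; adjusting for true IQ: {coef_latent:.3f}")
```

Adjusting for the error-prone baseline score leaves part of latent intelligence unaccounted for, and education, which tracks that latent part, absorbs it: the coefficient comes out clearly positive (about 0.3 in expectation here). Adjusting for the error-free latent score instead drives it to approximately zero. The sketch does not reproduce the second pattern in the abstract (a stronger association at lower baseline intelligence).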


2014 ◽  
Vol 9 ◽  
pp. BMI.S13729 ◽  
Author(s):  
Chindo Hicks ◽  
Tejaswi Koganti ◽  
Shankar Giri ◽  
Memory Tekere ◽  
Ritika Ramani ◽  
...  

Genome-wide association studies (GWAS) have achieved great success in identifying single nucleotide polymorphisms (SNPs, herein called genetic variants) and genes associated with risk of developing prostate cancer. However, GWAS do not typically link the genetic variants to the disease state or inform the broader context in which the genetic variants operate. Here, we present a novel integrative genomics approach that combines GWAS information with gene expression data to infer the causal association between gene expression and the disease and to identify the network states and biological pathways enriched for genetic variants. We identified gene regulatory networks and biological pathways enriched for genetic variants, including the prostate cancer, IGF-1, JAK2, androgen, and prolactin signaling pathways. The integration of GWAS information with gene expression data provides insights about the broader context in which genetic variants associated with an increased risk of developing prostate cancer operate.
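Pathway enrichment of this kind is often assessed with a one-sided hypergeometric (Fisher's exact) test: given how many genes carry GWAS variants, is their overlap with a pathway larger than expected by chance? The abstract does not state which test was used, so the following is an assumed, minimal version with toy numbers; `hypergeom_enrichment_p` is a hypothetical helper.

```python
from math import comb


def hypergeom_enrichment_p(n_total, n_pathway, n_hits, n_overlap):
    """One-sided P(X >= n_overlap) for X ~ Hypergeometric.

    n_total: genes in the background; n_pathway: genes in the pathway;
    n_hits: genes carrying GWAS variants; n_overlap: hits that fall in the pathway.
    """
    upper = min(n_pathway, n_hits)
    tail = sum(
        comb(n_pathway, k) * comb(n_total - n_pathway, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_total, n_hits)


# Toy numbers: 20 background genes, 5 in the pathway, 4 hits, 3 of them in the pathway
print(hypergeom_enrichment_p(20, 5, 4, 3))
```

With these toy numbers the tail probability is exactly 155/4845 (about 0.032). Real analyses use thousands of background genes and must correct for testing many pathways.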


2018 ◽  
Author(s):  
Charlie Hatcher ◽  
Caroline L. Relton ◽  
Tom R. Gaunt ◽  
Tom G. Richardson

Abstract Integrative approaches that harness large-scale molecular datasets can help develop mechanistic insight into findings from genome-wide association studies (GWAS). We have performed extensive analyses to uncover transcriptional and epigenetic processes which may play a role in neurological trait variation. This was undertaken by applying Bayesian multiple-trait colocalization systematically across the genome to identify genetic variants responsible for influencing intermediate molecular phenotypes as well as neurological traits. In this analysis we leveraged high-dimensional quantitative trait loci data derived from prefrontal cortex tissue (concerning gene expression, DNA methylation and histone acetylation) and GWAS findings for 5 neurological traits (Neuroticism, Schizophrenia, Educational Attainment, Insomnia and Alzheimer’s disease). There was evidence of colocalization for 118 associations, suggesting that the same underlying genetic variant influenced both nearby gene expression and neurological trait variation. Of these, 73 associations provided evidence that the genetic variant also influenced proximal DNA methylation and/or histone acetylation. These findings support previous evidence at loci where epigenetic mechanisms may putatively mediate effects of genetic variants on traits, such as KLC1 and schizophrenia. We also uncovered evidence implicating novel loci in neurological disease susceptibility, including genes expressed predominantly in brain tissue such as MDGA1, KIRREL3 and SLC12A5. An inverse relationship between DNA methylation and gene expression was observed more often than expected by chance, supporting previous findings implicating DNA methylation as a transcriptional repressor. Our study should prove valuable in helping future studies prioritise candidate genes and epigenetic mechanisms for in-depth functional follow-up analyses.
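The abstract applies Bayesian multiple-trait colocalization. As a simplified sketch, the following shows the two-trait case with one causal variant per trait, using Wakefield approximate Bayes factors together with the prior probabilities (1e-4, 1e-4, 1e-5) and prior effect SD (0.15) that the coloc R package uses by default for quantitative traits. The multi-trait extension used in the study is not shown, and the z-scores are hypothetical. H0 to H4 denote no association, trait 1 only, trait 2 only, two distinct causal variants, and one shared causal variant.

```python
import numpy as np


def log_abf(z, V, W=0.15 ** 2):
    """Wakefield log approximate Bayes factor for each SNP.

    z: SNP z-scores; V: squared standard errors; W: prior variance of the effect.
    """
    r = W / (V + W)
    return 0.5 * np.log(1.0 - r) + 0.5 * r * np.asarray(z, dtype=float) ** 2


def logsumexp(a):
    """Numerically stable log(sum(exp(a)))."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))


def coloc_posteriors(z1, V1, z2, V2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 for two traits (one causal SNP per trait)."""
    l1, l2 = log_abf(z1, V1), log_abf(z2, V2)
    s1, s2, s12 = logsumexp(l1), logsumexp(l2), logsumexp(l1 + l2)
    # H3 sums ABF products over pairs of *distinct* SNPs:
    # log(sum_i x_i * sum_j y_j - sum_i x_i * y_i)
    h3 = np.log(p1) + np.log(p2) + s1 + s2 + np.log1p(-np.exp(s12 - s1 - s2))
    lh = np.array([0.0, np.log(p1) + s1, np.log(p2) + s2, h3, np.log(p12) + s12])
    return np.exp(lh - logsumexp(lh))


n_snps, V = 20, 0.05 ** 2  # hypothetical: 20 SNPs, each with SE 0.05
z_trait1 = np.zeros(n_snps)
z_trait1[10] = 6.0
z_trait2_other = np.zeros(n_snps)
z_trait2_other[15] = 6.0

pp_same = coloc_posteriors(z_trait1, V, z_trait1, V)        # same peak SNP
pp_diff = coloc_posteriors(z_trait1, V, z_trait2_other, V)  # different peak SNPs
print("PP.H4 (same peak):", round(pp_same[4], 3))
print("PP.H3 (distinct peaks):", round(pp_diff[3], 3))
```

When both traits peak at the same SNP, almost all posterior mass falls on H4 (a shared variant); when they peak at different SNPs, it moves to H3.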


2019 ◽  
Author(s):  
Yi Yang ◽  
Xingjie Shi ◽  
Yuling Jiao ◽  
Jian Huang ◽  
Min Chen ◽  
...  

Abstract Motivation Although genome-wide association studies (GWAS) have deepened our understanding of the genetic architecture of complex traits, the mechanistic links that underlie how genetic variants cause complex traits remain elusive. To advance our understanding of these mechanistic links, various consortia have collected a vast volume of genomic data that enable us to investigate the role that genetic variants play in gene expression regulation. Recently, a collaborative mixed model (CoMM) [42] was proposed to jointly interrogate the genome for complex traits by integrating both the GWAS dataset and the expression quantitative trait loci (eQTL) dataset. Although CoMM is a powerful approach that leverages regulatory information while accounting for the uncertainty in using an eQTL dataset, it requires individual-level GWAS data and cannot fully make use of widely available GWAS summary statistics. Therefore, statistically efficient methods that leverage transcriptome information using only GWAS summary statistics are required. Results In this study, we propose a novel probabilistic model, CoMM-S2, to examine the mechanistic role that genetic variants play, using only GWAS summary statistics instead of individual-level GWAS data. Similar to CoMM, CoMM-S2 combines two models: the first examines the relationship between gene expression and genotype, while the second examines the relationship between the phenotype and the predicted gene expression from the first model. Distinct from CoMM, CoMM-S2 requires only GWAS summary statistics.
Using both simulation studies and real data analysis, we demonstrate that even though CoMM-S2 utilizes only GWAS summary statistics, it performs comparably to CoMM, which uses individual-level GWAS data. Availability and implementation The implementation of CoMM-S2 is included in the CoMM package, which can be downloaded from https://github.com/gordonliu810822/CoMM. Supplementary information Supplementary data are available at Bioinformatics online.
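CoMM-S2 itself is a joint probabilistic model, but the reason summary statistics can stand in for individual-level data in the second model can be shown with the simpler plug-in statistic used by summary-based TWAS tools such as FUSION: with Stage 1 weights `w` on standardized genotypes, marginal GWAS z-scores `z` and an LD matrix `R`, the gene-level statistic is w'z / sqrt(w'Rw). The sketch below (simulated data; all settings hypothetical) checks that this matches the individual-level z-score, with marginal z-scores approximated as sqrt(n) times the SNP-trait correlation.

```python
import numpy as np


def twas_z(weights, z_gwas, ld):
    """Gene-level association z-score from GWAS summary statistics.

    weights: Stage 1 expression-prediction weights on standardized genotypes;
    z_gwas: marginal GWAS z-scores for the same SNPs;
    ld: SNP correlation (LD) matrix, e.g. from a reference panel.
    """
    w = np.asarray(weights, dtype=float)
    return float(w @ np.asarray(z_gwas, dtype=float) / np.sqrt(w @ np.asarray(ld, dtype=float) @ w))


rng = np.random.default_rng(5)
n, p = 2_000, 8
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)
G[:, 1] = np.where(rng.random(n) < 0.8, G[:, 0], G[:, 1])  # induce LD between SNPs 0 and 1
X = (G - G.mean(axis=0)) / G.std(axis=0)                    # standardized genotypes
w = rng.normal(0.0, 0.3, size=p)                            # hypothetical Stage 1 weights
y = 0.1 * (X @ w) + rng.normal(size=n)                      # trait partly driven by predicted expression

z = np.sqrt(n) * np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
R = np.corrcoef(X, rowvar=False)
z_summary = twas_z(w, z, R)                                 # z-scores + LD only
z_individual = np.sqrt(n) * np.corrcoef(X @ w, y)[0, 1]     # individual-level data
print(round(z_summary, 6), round(z_individual, 6))
```

With in-sample LD the two values agree to numerical precision; with an external LD reference panel they agree only approximately. Unlike CoMM-S2, this plug-in approach treats the Stage 1 weights as fixed and ignores their uncertainty.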


2019 ◽  
Author(s):  
James Boocock ◽  
Megan Leask ◽  
Yukinori Okada ◽  
Hirotaka Matsuo ◽  
Yusuke Kawamura ◽  
...  

Abstract Serum urate is the end-product of purine metabolism. Elevated serum urate is a cause of gout and a predictor of renal disease, cardiovascular disease and other metabolic conditions. Genome-wide association studies (GWAS) have reported dozens of loci associated with serum urate control; however, there has been little progress in understanding the molecular basis of the associated loci. Here we employed trans-ancestral meta-analysis using data from European and East Asian populations to identify ten new loci for serum urate levels. Genome-wide colocalization with cis-expression quantitative trait loci (eQTL) identified a further five new loci. By cis- and trans-eQTL colocalization analysis we identified 24 and 20 genes, respectively, for which the causal eQTL variant is likely shared with the serum urate-associated locus. One new locus identified was SLC22A9, which encodes organic anion transporter 7 (OAT7). We demonstrate that OAT7 is a very weak urate-butyrate exchanger. Newly implicated genes identified in the eQTL analysis include those encoding proteins that make up the dystrophin complex, a scaffold for signaling proteins and transporters at the cell membrane; MLXIP, which, with the previously identified MLXIPL, is a transcription factor that may regulate serum urate via the pentose-phosphate pathway; and MRPS7 and IDH2, which encode proteins necessary for mitochondrial function. Trans-ancestral functional fine-mapping identified six loci (RREB1, INHBC, HLF, UBE2Q2, SFMBT1, HNF4G) with colocalized eQTL that contained putative causal SNPs (posterior probability of causality > 0.8). This systematic analysis of serum urate GWAS loci has identified candidate causal genes at 19 loci and a network of previously unidentified genes likely involved in control of serum urate levels, further illuminating the molecular mechanisms of urate control. Author Summary High serum urate is a prerequisite for gout and a risk factor for metabolic disease.
Previous GWAS have identified numerous loci that are associated with serum urate control; however, only a handful of these loci have known molecular consequences. The majority of loci are within non-coding regions of the genome, so it is difficult to ascertain how these variants might influence serum urate levels without tangible links to gene expression and/or protein function. We applied a novel bioinformatic pipeline that combined population-specific GWAS data with gene expression and genome connectivity information to identify putative causal genes for serum urate-associated loci. Overall, we identified 15 novel serum urate loci and show that these loci, along with previously identified loci, are linked to the expression of 44 genes. We show that some of the variants within these loci have strong predicted regulatory function, which can be further tested in functional analyses. This study expands on previous GWAS by identifying further loci implicated in serum urate control and new causal mechanisms supported by gene expression changes.
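Two standard building blocks help read the numbers above: a fixed-effect inverse-variance-weighted (IVW) meta-analysis for combining per-ancestry estimates, and single-causal-variant fine-mapping, in which each SNP's posterior probability of causality is its approximate Bayes factor divided by the locus total. Trans-ancestral methods, possibly including the one used in this study, may instead model heterogeneity between ancestries, so treat this as an assumed, simplified sketch; the helper names, the prior effect SD of 0.15 and all inputs are hypothetical.

```python
import numpy as np


def ivw_fixed_effect(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis across studies or ancestries."""
    w = 1.0 / np.asarray(ses, dtype=float) ** 2
    beta = np.sum(w * np.asarray(betas, dtype=float)) / np.sum(w)
    return beta, np.sqrt(1.0 / np.sum(w))


def single_causal_pp(z, V, W=0.15 ** 2):
    """Posterior probability that each SNP is causal, assuming exactly one causal SNP
    in the locus and Wakefield approximate Bayes factors (prior effect variance W)."""
    r = W / (V + W)
    labf = 0.5 * np.log(1.0 - r) + 0.5 * r * np.asarray(z, dtype=float) ** 2
    pp = np.exp(labf - labf.max())
    return pp / pp.sum()


# Hypothetical per-ancestry estimates for one SNP (European, East Asian)
beta, se = ivw_fixed_effect([0.10, 0.20], [0.10, 0.10])
print(beta, se)

# Hypothetical locus of four SNPs with one strong signal
pp = single_causal_pp([0.0, 1.0, 6.0, 0.5], V=0.05 ** 2)
print(pp.round(3), "putative causal:", np.flatnonzero(pp > 0.8))
```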


2016 ◽  
Author(s):  
Xiaoyu Song ◽  
Gen Li ◽  
Iuliana Ionita-Laza ◽  
Ying Wei

Abstract Over the past decade, there has been a remarkable improvement in our understanding of the role of genetic variation in complex human diseases, especially via genome-wide association studies. However, the underlying molecular mechanisms are still poorly characterized, impeding the development of therapeutic interventions. Identifying genetic variants that influence the expression level of a gene, i.e. expression quantitative trait loci (eQTLs), can help us understand how genetic variants influence traits at the molecular level. While most eQTL studies focus on identifying mean effects on gene expression using linear regression, evidence suggests that genetic variation can impact the entire distribution of the expression level. Indeed, several studies have already investigated higher-order associations, with a special focus on detecting heteroskedasticity. In this paper, we develop a Quantile Rank-score Based Test (QRBT) to identify eQTLs that are associated with the conditional quantile functions of gene expression. We applied the proposed QRBT to the Genotype-Tissue Expression project, an international tissue bank for studying the relationship between genetic variation and gene expression in human tissues, and found that QRBT complements existing methods and identifies new eQTLs genome-wide with heterogeneous effects across different quantile levels. Notably, we show that the eQTLs identified by QRBT but missed by linear regression are more likely to be tissue specific and are associated with greater enrichment in genome-wide significant SNPs from the GWAS catalog. An R package implementing QRBT is available on our website.
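The core of a quantile rank-score test can be sketched for a single quantile level τ with an intercept-only null model (no covariates), in which case the null fit is simply the sample τ-quantile. QRBT additionally handles covariates and combines evidence across quantile levels, which this sketch does not attempt; `rank_score_z`, the simulated "variance eQTL" and all settings are hypothetical.

```python
import numpy as np


def rank_score_z(y, g, tau):
    """Quantile rank-score z-statistic for genotype g at quantile level tau.

    Null model: intercept-only quantile regression, whose fit is the sample
    tau-quantile of y. Under H0 (g does not shift the tau-th conditional
    quantile) the statistic is approximately standard normal.
    """
    q = np.quantile(y, tau)
    a = tau - (y < q).astype(float)  # rank scores of the null fit
    g_c = g - g.mean()               # genotype residualized on the intercept
    return np.sum(g_c * a) / np.sqrt(tau * (1.0 - tau) * np.sum(g_c ** 2))


rng = np.random.default_rng(6)
n = 5_000
g = rng.binomial(2, 0.3, size=n).astype(float)
# Hypothetical "variance eQTL": genotype changes the spread, not the mean, of expression.
expr = (1.0 + g) * rng.normal(size=n)

zs = {tau: rank_score_z(expr, g, tau) for tau in (0.1, 0.5, 0.9)}
z_linear = np.sqrt(n) * np.corrcoef(g, expr)[0, 1]  # mean-based comparison
print({tau: round(z, 2) for tau, z in zs.items()}, "linear:", round(z_linear, 2))
```

Because the genotype changes only the spread of expression, the mean-based statistic stays near zero while the rank-score statistics at τ = 0.1 and 0.9 are large and of opposite sign, the kind of heterogeneous effect a linear-regression eQTL scan would miss.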

