Reversing Transcriptome-Wide Association Studies to improve expression Quantitative Trait Loci associations

2020 ◽  
Author(s):  
Jie Yuan ◽  
Ben Lai ◽  
Itsik Pe’er

Abstract Transcriptome-Wide Association Studies discover SNP effects mediated by gene expression through a two-stage process: a typically small reference panel is used to infer SNP-expression effects, and these are then applied to discover associations between imputed expression and phenotypes. We investigate whether the accuracy of SNP-expression and expression-phenotype associations can be increased by performing inference on both the reference panel and independent GWAS cohorts simultaneously. We develop EMBER (Estimation of Mediated Binary Effects in Regression) to re-estimate these effects using a liability threshold model with an adjustment to variance components accounting for imputed expression from GWAS data. In simulated data with only gene-mediated effects, EMBER more than doubles the performance of SNP-expression linear regression, increasing mean r² from 0.3 to 0.65 with a gene-mediated variance explained of 0.01. EMBER also improves estimation accuracy when the fraction of cis-SNP variance mediated by genes is as low as 30%. We apply EMBER to genotype and gene expression data in schizophrenia by combining 512 samples from the CommonMind Consortium and 56,081 samples from the Psychiatric Genomics Consortium. We evaluate the performance of EMBER in 36 genes suggested by TWAS by the concordance of inferred effects with effects reported independently for frontal cortex expression. Applying the EMBER framework to a baseline linear regression model increases performance in 26 out of 36 genes (sign test p-value 0.0020), with an increase in mean r² from 0.200 to 0.235.
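
The conventional two-stage process that the abstract starts from can be sketched as follows. This is a minimal illustration, not EMBER itself: it assumes a single SNP, a continuous phenotype and noise-free toy data, and the function names `ols_slope` and `two_stage_twas` are hypothetical.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x (model with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def two_stage_twas(ref_genotypes, ref_expression, gwas_genotypes, gwas_phenotype):
    """Return (SNP-expression weight, expression-phenotype effect)."""
    # Stage 1: estimate the SNP-expression weight in the small reference panel.
    snp_weight = ols_slope(ref_genotypes, ref_expression)
    # Stage 2: impute expression in the GWAS cohort and regress the phenotype on it.
    imputed_expression = [snp_weight * g for g in gwas_genotypes]
    expression_effect = ols_slope(imputed_expression, gwas_phenotype)
    return snp_weight, expression_effect


# Hypothetical toy data: expression = 0.5 * genotype, phenotype = 2 * expression.
ref_g = [0, 1, 2, 1, 0, 2]
ref_e = [0.5 * g for g in ref_g]
gwas_g = [0, 0, 1, 1, 2, 2, 1, 0]
gwas_y = [2 * 0.5 * g for g in gwas_g]

weight, effect = two_stage_twas(ref_g, ref_e, gwas_g, gwas_y)
print(weight, effect)
```

In this sketch the Stage 1 weight is never revisited once the GWAS data arrive; the abstract's proposal is to re-estimate it using both data sets.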

2019 ◽  
Author(s):  
Margaux L.A. Hujoel ◽  
Steven Gazal ◽  
Po-Ru Loh ◽  
Nick Patterson ◽  
Alkes L. Price

Abstract Family history of disease can provide valuable information about an individual’s genetic liability for disease in case-control association studies, but it is currently unclear how to best combine case-control status and family history of disease. We developed a new association method based on posterior mean genetic liabilities under a liability threshold model, conditional on both case-control status and family history (LT-FH); association statistics are computed via linear regression of genotypes and posterior mean genetic liabilities, equivalent to a score test. We applied LT-FH to 12 diseases from the UK Biobank (average N=350K). We compared LT-FH to genome-wide association without using family history (GWAS) and a previous proxy-based method for incorporating family history (GWAX). LT-FH was +63% (s.e. 6%) more powerful than GWAS and +36% (s.e. 4%) more powerful than the trait-specific maximum of GWAS and GWAX, based on the number of independent genome-wide significant loci detected across all diseases (e.g. 690 independent loci for LT-FH vs. 423 for GWAS); the second best method was GWAX for lower-prevalence diseases and GWAS for higher-prevalence diseases, consistent with simulations. We also confirmed that LT-FH was well-calibrated (assessed via stratified LD score regression attenuation ratio), consistent with simulations. When using BOLT-LMM (instead of linear regression) to compute association statistics for all three methods (increasing the power of each method), LT-FH was +67% (s.e. 6%) more powerful than GWAS and +39% (s.e. 4%) more powerful than the trait-specific maximum of GWAS and GWAX. In summary, LT-FH greatly increases association power in case-control association studies when family history of disease is available.
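
To make "posterior mean genetic liability" concrete, the sketch below covers only the simplest case, conditioning on case-control status alone. It assumes a standard normal total liability, a known prevalence and a known liability-scale heritability; the family-history conditioning that defines LT-FH is not reproduced, and the function name is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # total liability is assumed to be N(0, 1)


def posterior_mean_genetic_liability(is_case, prevalence, heritability):
    """E[genetic liability | case-control status] under a liability threshold model."""
    # Individuals are cases when their liability exceeds this threshold.
    threshold = STD_NORMAL.inv_cdf(1.0 - prevalence)
    density = STD_NORMAL.pdf(threshold)
    # Mean of a standard normal truncated above (cases) or below (controls) the threshold.
    if is_case:
        mean_total_liability = density / prevalence
    else:
        mean_total_liability = -density / (1.0 - prevalence)
    # With genetic variance h^2 and total variance 1, E[g | l] = h^2 * l.
    return heritability * mean_total_liability


case_value = posterior_mean_genetic_liability(True, prevalence=0.1, heritability=0.5)
control_value = posterior_mean_genetic_liability(False, prevalence=0.1, heritability=0.5)
print(case_value, control_value)
```

Without family history every case receives the same value, and so does every control. Conditioning on family history as well, as the abstract describes, gives values that differ among cases and among controls.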


2016 ◽  
Author(s):  
Xiaoyu Song ◽  
Gen Li ◽  
Iuliana Ionita-Laza ◽  
Ying Wei

Abstract Over the past decade, there has been a remarkable improvement in our understanding of the role of genetic variation in complex human diseases, especially via genome-wide association studies. However, the underlying molecular mechanisms are still poorly characterized, impeding the development of therapeutic interventions. Identifying genetic variants that influence the expression level of a gene, i.e. expression quantitative trait loci (eQTLs), can help us understand how genetic variants influence traits at the molecular level. While most eQTL studies focus on identifying mean effects on gene expression using linear regression, evidence suggests that genetic variation can impact the entire distribution of the expression level. Indeed, several studies have already investigated higher order associations with a special focus on detecting heteroskedasticity. In this paper, we develop a Quantile Rank-score Based Test (QRBT) to identify eQTLs that are associated with the conditional quantile functions of gene expression. We have applied the proposed QRBT to the Genotype-Tissue Expression project, an international tissue bank for studying the relationship between genetic variation and gene expression in human tissues, and found that the proposed QRBT complements the existing methods and identifies new eQTLs with heterogeneous effects genome-wide across different quantile levels. Notably, we show that the eQTLs identified by QRBT but missed by linear regression are more likely to be tissue specific, and are also associated with greater enrichment in genome-wide significant SNPs from the GWAS catalog. An R package implementing QRBT is available on our website.
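
Quantile-based methods replace the squared-error loss of linear regression with the check (pinball) loss. The sketch below shows only that loss and how its minimiser tracks a chosen quantile; it is not the rank-score test of QRBT, and the function names and data are hypothetical.

```python
def pinball_loss(residual, tau):
    """Check loss at quantile level tau: tau * r if r >= 0, (tau - 1) * r otherwise."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant_quantile(values, tau):
    """Observed value that minimises the total check loss at level tau."""
    return min(values, key=lambda c: sum(pinball_loss(v - c, tau) for v in values))


# Hypothetical expression values with one extreme observation.
expression = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_constant_quantile(expression, 0.5)
upper_fit = best_constant_quantile(expression, 0.9)
print(median_fit, upper_fit)
```

Fitting several quantile levels rather than the mean is what lets a method detect variants whose effect differs between the lower and upper parts of the expression distribution.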


2019 ◽  
Author(s):  
Ronald Yurko ◽  
Max G’Sell ◽  
Kathryn Roeder ◽  
Bernie Devlin

Abstract To correct for a large number of hypothesis tests, most researchers rely on simple multiple testing corrections. Yet, new methodologies of selective inference could potentially improve power while retaining statistical guarantees, especially those that enable exploration of test statistics using auxiliary information (covariates) to weight hypothesis tests for association. We explore one such method, adaptive p-value thresholding (Lei & Fithian 2018, AdaPT), in the framework of genome-wide association studies (GWAS) and gene expression/coexpression studies, with particular emphasis on schizophrenia (SCZ). Selected SCZ GWAS association p-values play the role of the primary data for AdaPT; SNPs are selected because they are gene expression quantitative trait loci (eQTLs). This natural pairing of SNPs and genes allows us to map the following covariate values to these pairs: GWAS statistics from genetically-correlated bipolar disorder, the effect size of SNP genotypes on gene expression, and gene-gene coexpression, captured by subnetwork (module) membership. In all, 24 covariates per SNP/gene pair were included in the AdaPT analysis using flexible gradient boosted trees. We demonstrate a substantial increase in power to detect SCZ associations using gene expression information from the developing human prefrontal cortex (Werling et al. 2019). We interpret these results in light of recent theories about the polygenic nature of SCZ. Importantly, our entire process for identifying enrichment and creating features with independent complementary data sources can be implemented in many different high-throughput settings to ultimately improve power.


2021 ◽  
Author(s):  
Claudia Solis-Lemus ◽  
Aaron M. Holleman ◽  
Andrei Todor ◽  
Bekh Bradley ◽  
Kerry J. Ressler ◽  
...  

Genomewide association studies increasingly employ multivariate tests of multiple correlated phenotypes to exploit likely pleiotropy to improve power. Typical multivariate methods produce a global p-value of association between a variant (or set of variants) and multiple phenotypes. When the global test is significant, subsequent interest then focuses on dissecting the signal and, in particular, delineating the set of phenotypes where the genetic variant(s) have a direct effect from the remaining phenotypes where the genetic variant(s) possess either indirect or no effect. While existing techniques like mediation models can be utilized for this purpose, they generally cannot handle high-dimensional phenotypic and genotypic data. To assist in filling this important gap, we propose a modification of a kernel distance-covariance framework for gene mapping of multiple variants with multiple phenotypes to test instead whether the association between the variants and a group of phenotypes is driven through a direct association with just a subset of the phenotypes. We use simulated data to show that our new method controls for type I error and is powerful to detect a variety of models demonstrating different patterns of direct and indirect effects. We further illustrate our method using GWAS data from the Grady Trauma Project and show that an existing signal between genetic variants in the ZHX2 gene and 21 items within the Beck Depression Inventory appears to be due to a direct effect of these variants on only 3 of these items. Our approach scales to genomewide analysis, and is applicable to high-dimensional correlated phenotypes.


2020 ◽  
Vol 14 (Supplement_1) ◽  
pp. S103-S104
Author(s):  
B Steere ◽  
J Schmitz ◽  
N Powell ◽  
R Higgs ◽  
K Gottlieb ◽  
...  

Abstract Background: Mirikizumab (miri), a p19-directed IL-23 antibody, demonstrated efficacy and was well-tolerated in a phase 2 randomised clinical trial in patients with moderate-to-severe UC (NCT02589665). This abstract explores gene expression changes in colonic tissue from study patients and their association with clinical outcomes. Methods: Patients were randomised 1:1:1:1 to receive intravenous placebo (PBO, N = 63), miri 50 mg (N = 63) or 200 mg (N = 62) with the possibility of exposure-based dose increases, or fixed miri 600 mg (N = 61) every 4 weeks for 12 weeks. Patient biopsies were collected at baseline (BL) and Week 12, and differential gene expression was measured using an Affymetrix HTA2.0 exon-format microarray workflow. Genes were represented by their largest groups of highly correlated exons. Weeks 0 and 12 data were compared in all treatment groups to produce differential expression values (DEVs). Mean fold changes in DEVs between PBO and each dose group were calculated in a mixed-effect model. A threshold of false discovery rate-adjusted p-value ≤ 0.05 was applied to the significance of the fold change values, and a filter of an absolute value for the fold changes of ≥ 0.5 log2 units was applied. Results: The greatest improvement in clinical outcomes at Week 12 was observed in the 200 mg miri group [1]; likewise, the greatest PBO-adjusted change from BL in transcripts was observed in this group. Transcripts correlating with key UC disease activity indices at BL, including modified Mayo score (MMS), Ulcerative Colitis Endoscopic Index of Severity (UCEIS), Geboes score, and Robarts Histopathology Index (RHI), included MMP1, MMP3, S100A8, IL1B, and UGT2A3, with the highest correlations occurring with the histopathologic indices (Figure 1).
Miri treatment modulated the expression of transcriptional modules predicted to be enriched in cell profiles identified as key drivers of UC [2] (Table 1, columns 1–2) as well as genes determined to be associated with UC by genome-wide association studies (GWAS; Table 1, column 3). Moreover, miri treatment affected transcripts involved in resistance to anti-TNF treatments (Table 1, column 4). A number of the genes in these categories were among those most affected by miri treatment (Table 1, columns 5–6). Conclusion: This is the first large-scale gene expression study of diseased tissue from UC patients treated with anti-IL23p19 therapy. It is the first study to show how anti-IL23p19 therapy modulates biological pathways involved in resistance to anti-TNFs. These results are consistent with the demonstrated efficacy of miri in patients in whom TNF antagonists have failed.
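
The two-part filter described in the Methods (adjusted p-value ≤ 0.05 and absolute fold change ≥ 0.5 log2 units) can be sketched as below. The abstract does not name the false discovery rate procedure, so the Benjamini-Hochberg adjustment used here is an assumption. The transcript names and values are hypothetical placeholders, not study results.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_transcripts(names, p_values, log2_fold_changes,
                       fdr=0.05, min_abs_log2fc=0.5):
    """Keep transcripts passing both the FDR threshold and the fold-change filter."""
    adjusted = benjamini_hochberg(p_values)
    return [name for name, q, fc in zip(names, adjusted, log2_fold_changes)
            if q <= fdr and abs(fc) >= min_abs_log2fc]


# Hypothetical input, for illustration only.
names = ["transcript_a", "transcript_b", "transcript_c", "transcript_d"]
p_vals = [0.001, 0.02, 0.03, 0.8]
log2fc = [-1.2, 0.3, 0.9, 1.5]
selected = select_transcripts(names, p_vals, log2fc)
print(selected)
```

Applying both criteria removes transcripts that are statistically significant but change very little, as well as large changes that are not statistically supported.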


Blood ◽  
2014 ◽  
Vol 124 (21) ◽  
pp. 5187-5187
Author(s):  
Antonino Greco ◽  
Alessandra Trojani ◽  
Milena Lodola ◽  
Barbara Di Camillo ◽  
Alessandra Tedeschi ◽  
...  

Abstract The 2nd IWWM tried to define reproducible criteria for the diagnosis of IgM-MGUS and Waldenstrom's Macroglobulinemia. IgM-MGUS was defined as an asymptomatic condition characterized by serum IgM monoclonal protein (MC) without morphologic evidence of bone marrow (BM) lymphoplasmacytic infiltration. The guidelines proposed to also classify as MGUS patients with equivocal evidence of BM infiltration, such as those presenting clonal B-cells by multiparameter flow cytometry (MFC) in the absence of morphologic evidence of BM infiltration, as well as those with equivocal BM infiltrates not confirmed by immunophenotypic studies. Patients: The diagnosis of IgM-MGUS was made in 11 patients (6 males, 5 females) according to the consensus panel criteria. The median age at diagnosis was 73 (range, 60-77). Ten patients had K light chains. The median erythrocyte sedimentation rate was 11. The MC level at diagnosis ranged from 0.1 to 1.2 g/dL (median 0.4). Only one patient had an MC value > 1.0 g/dL. The median IgM value was 697 mg/dL (range 116-1790). Five of 11 IgM-MGUS patients showed a small clonal B-cell population (light-chain-isotype-positive B-cells) detected by MFC without histologic evidence of BM infiltration. Therefore, patients were divided into 2 groups: group 1 (n=5) showing a clonal B-cell population, and group 2 (n=6) with polyclonal B-cells at MFC. Methods and results: We isolated BM CD19+ cells in the 11 IgM-MGUS patients using Miltenyi Microbeads and performed microarray analysis with the Affymetrix HG-U133 Plus 2.0 array. Gene set enrichment analysis (GSEA) was performed and different sets of genes were defined based on REACTOME pathways, KEGG pathways and GO Biological Process Terms. Interestingly, 17 top-ranking gene sets, including differentially expressed genes, reached a nominal p-value lower than 0.001; 2 gene sets were upregulated, while 15 gene sets were downregulated, in monoclonal vs. polyclonal IgM-MGUS (Table 1).
No genes were significantly differentially expressed between group 1 and group 2 using a classic SAM test for microarrays and correcting for multiple testing with a false discovery rate (FDR) threshold of 5%. Similarly, IgM and MC did not differ significantly between the two groups, although IgM showed a nominal p-value of 0.09 (t-test). However, when using linear regression to explain the expression of each gene as a function of both IgM and MC, the genes UBTF, TRIM5, FLJ35816 and RDH10 were selected based on an FDR of 5%, applied to the F-statistic p-value. In particular, the model fitting UBTF had a p-value of 9.461e-07 and an adjusted R-squared of 0.9786; Table 2 displays the coefficients of the model and the related p-values, showing a positive co-regulation of UBTF with MC. Conclusions: Microarray analysis of IgM-MGUS gives insights into gene expression differences in IgM-MGUS. Notably, UBTF is a transcription factor which plays a crucial role in the transcription of rRNA in the ERK pathway, suggesting a possible role of the ERK pathway in IgM-MGUS. Additional gene expression measurements are ongoing in a larger cohort of IgM-MGUS patients.

Table 1. Gene sets upregulated and downregulated in monoclonal vs. polyclonal IgM-MGUS

Upregulated gene sets (GENE SET NAME):
REACTOME_XENOBIOTICS
REGULATION_OF_CHROMOSOME_ORGANIZATION_AND_BIOGENESIS

Downregulated gene sets (GENE SET NAME):
REACTOME_ROLE_OF_DCC_IN_REGULATING_APOPTOSIS
REACTOME_P38MAPK_EVENTS
REACTOME_EARLY_PHASE_OF_HIV_LIFE_CYCLE
REACTOME_MRNA_DECAY_BY_3_TO_5_EXORIBONUCLEASE
KEGG_NICOTINATE_AND_NICOTINAMIDE_METABOLISM
CHROMATIN_ASSEMBLY_OR_DISASSEMBLY
ESTABLISHMENT_AND_OR_MAINTENANCE_OF_CHROMATIN_ARCHITECTURE
PROTEIN_DNA_COMPLEX_ASSEMBLY
RESPONSE_TO_DNA_DAMAGE_STIMULUS
CHROMATIN_ASSEMBLY
CHROMOSOME_ORGANIZATION_AND_BIOGENESIS
DOUBLE_STRAND_BREAK_REPAIR
CHROMATIN_REMODELING
CENTROSOME_ORGANIZATION_AND_BIOGENESIS
MICROTUBULE_ORGANIZING_CENTER_ORGANIZATION_AND_BIOGENESIS

Table 2. Linear regression results of both IgM and MC on UBTF expression

Term        Coefficient   p-value
Intercept   7.0738801     6.47e-12
IgM         -0.0021593    3.78e-07
MC          1.1925055     0.000501
IgM:MC      0.0010577     0.000152

Disclosures: No relevant conflicts of interest to declare.
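
The reported regression of UBTF expression on IgM, MC and their interaction can be read as a prediction rule. The sketch below plugs in the coefficients from the regression table and evaluates them at the cohort medians given in the abstract (IgM 697 mg/dL, MC 0.4 g/dL). That the model was fitted on these units is an assumption, since the abstract does not say, and the function names are hypothetical.

```python
# Coefficients as reported in the abstract's regression table for UBTF.
INTERCEPT = 7.0738801
BETA_IGM = -0.0021593
BETA_MC = 1.1925055
BETA_INTERACTION = 0.0010577  # IgM:MC term


def predicted_ubtf_expression(igm, mc):
    """Fitted UBTF expression for given IgM and MC values (units assumed, see above)."""
    return INTERCEPT + BETA_IGM * igm + BETA_MC * mc + BETA_INTERACTION * igm * mc


def igm_slope_at(mc):
    """Change in fitted expression per unit of IgM at a fixed MC level."""
    return BETA_IGM + BETA_INTERACTION * mc


prediction = predicted_ubtf_expression(697, 0.4)
slope = igm_slope_at(0.4)
print(prediction, slope)
```

Because of the interaction term, the effect of IgM is not constant: its slope depends on the MC level, and likewise the effect of MC depends on IgM.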


2020 ◽  
Author(s):  
Justin M. Luningham ◽  
Junyu Chen ◽  
Shizhen Tang ◽  
Philip L. De Jager ◽  
David A. Bennett ◽  
...  

Abstract Transcriptome-wide association studies (TWAS) have been widely used to integrate gene expression and genetic data for studying complex traits. Due to the computational burden, existing TWAS methods do not assess distant trans-expression quantitative trait loci (eQTL) that are known to explain important expression variation for most genes. We propose a Bayesian Genome-wide TWAS (BGW-TWAS) method which leverages both cis- and trans-eQTL information for TWAS. Our BGW-TWAS method is based on Bayesian variable selection regression, which not only accounts for cis- and trans-eQTL of the target gene but also enables efficient computation by using summary statistics from standard eQTL analyses. Our simulation studies illustrated that BGW-TWAS achieved higher power compared to existing TWAS methods that do not assess trans-eQTL information. We further applied BGW-TWAS to individual-level GWAS data (N ≈ 3.3K), which identified significant associations between the genetically regulated gene expression (GReX) of gene ZC3H12B and Alzheimer’s dementia (AD) (p-value = 5.42 × 10^-13), neurofibrillary tangle density (p-value = 1.89 × 10^-6), and a global measure of AD pathology (p-value = 9.59 × 10^-7). These associations for gene ZC3H12B were completely driven by trans-eQTL. Additionally, the GReX of gene KCTD12 was found to be significantly associated with β-amyloid (p-value = 3.44 × 10^-8), which was driven by both cis- and trans-eQTL. Four of the top driven trans-eQTL of ZC3H12B are located within gene APOC1, a known major risk gene of AD and blood lipids. Additionally, by applying BGW-TWAS with summary-level GWAS data of AD (N ≈ 54K), we identified 13 significant genes including known GWAS risk genes HLA-DRB1 and APOC1, as well as ZC3H12B.


2019 ◽  
Author(s):  
Haoran Xue ◽  
Wei Pan ◽  

Abstract Transcriptome-wide association study (TWAS) has become popular in integrating a reference eQTL dataset with an independent main GWAS dataset to identify (putatively) causal genes, shedding mechanistic insight into biological pathways from genetic variants to a GWAS trait mediated by gene expression. Statistically, TWAS is a (two-sample) 2-stage least squares (2SLS) method in the framework of instrumental variables analysis for causal inference: in Stage 1 it uses the reference eQTL data to impute a gene’s expression for the main GWAS data, then in Stage 2 it tests for association between the imputed gene expression and the GWAS trait; if an association is detected in Stage 2, a (putatively) causal relationship between the gene and the GWAS trait is claimed. If a non-linear model or a generalized linear model (GLM) is fitted in Stage 2 (e.g. for a binary GWAS trait), it is known that using only imputed gene expression, as in standard TWAS, in general does not lead to a consistent (i.e. asymptotically unbiased) estimate for the causal effect; accordingly, a variation of 2SLS, called two-stage residual inclusion (2SRI), has been proposed to yield better estimates (e.g. being consistent under suitable conditions). Our main goal is to investigate whether it is necessary or even better to apply 2SRI instead of the standard 2SLS. In addition, due to the use of imputed gene expression (i.e. with measurement errors), it is known that in general some correction to the standard error estimate of the causal effect estimate has to be applied, while in the standard TWAS no correction is applied. Is this an issue? We also compare one-sample 2SLS with two-sample 2SLS (i.e. the standard TWAS). We used the ADNI data and simulated data mimicking the ADNI data to address the above questions. In the end, we conclude that, in practice, with the large sample sizes and small effect sizes of genetic variants, the standard TWAS performs well and is recommended.
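
The difference between the two estimators compared in this abstract can be written out as follows. The notation is an assumption introduced here for illustration: Z denotes cis-SNP genotypes, X gene expression, Y the GWAS trait and g a link function; none of these symbols come from the abstract.

```latex
% Stage 1 (eQTL data): regress expression X on cis-SNP genotypes Z
X = Z\gamma + \varepsilon, \qquad
\hat{X} = Z\hat{\gamma}, \qquad
\hat{\varepsilon} = X - \hat{X}

% Stage 2, 2SLS (standard TWAS): only the imputed expression enters the model
g\big(\mathbb{E}[Y \mid Z]\big) = \beta_0 + \beta\,\hat{X}

% Stage 2, 2SRI: observed expression plus the Stage 1 residual
g\big(\mathbb{E}[Y \mid X, Z]\big) = \beta_0 + \beta\,X + \alpha\,\hat{\varepsilon}
```

In both cases the coefficient β is the causal effect of interest. Computing the Stage 1 residual requires expression to be observed in the Stage 2 sample, which the standard two-sample TWAS design does not provide.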


2021 ◽  
Author(s):  
Ping Zeng ◽  
Jing Dai ◽  
Siyi Jin ◽  
Xiang Zhou

Abstract Transcriptome-wide association study (TWAS) is an important integrative method for identifying genes that are causally associated with phenotypes. A key step of TWAS involves the construction of expression prediction models for every gene in turn using its cis-SNPs as predictors. Different TWAS methods rely on different models for gene expression prediction and each such model makes a distinct modeling assumption that is often suitable for a particular genetic architecture underlying expression. However, the genetic architectures underlying gene expression vary across genes throughout the transcriptome. Consequently, different TWAS methods may be beneficial in detecting genes with distinct genetic architectures. Here, we develop a new method, HMAT, which aggregates TWAS association evidence obtained across multiple gene expression prediction models by leveraging the harmonic mean p value combination strategy. Because each expression prediction model is suited to capture a particular genetic architecture, aggregating TWAS associations across prediction models as in HMAT improves accurate expression prediction and enables subsequent powerful TWAS analysis across the transcriptome. A key feature of HMAT is its ability to accommodate the correlations among different TWAS test statistics and produce calibrated p values after aggregation. Through numerical simulations, we illustrated the advantage of HMAT over commonly used TWAS methods as well as ad-hoc p value combination rules such as Fisher’s method. We also applied HMAT to analyze summary statistics of nine common diseases. In the real data applications, HMAT was on average 30.6% more powerful compared to the next best method, detecting many new disease-associated genes that were otherwise not identified by existing TWAS approaches. In conclusion, HMAT represents a flexible and powerful TWAS method that enjoys robust performance across a range of genetic architectures underlying gene expression.
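
The harmonic mean p value combination that HMAT builds on can be sketched in a few lines. This shows only the raw weighted harmonic mean of the p values obtained for one gene under different prediction models. The calibration step that turns it into a valid p value under correlation, which the abstract highlights, is not reproduced, and the function name and inputs are hypothetical.

```python
def harmonic_mean_p(p_values, weights=None):
    """Weighted harmonic mean of p values (equal weights by default)."""
    if weights is None:
        weights = [1.0 / len(p_values)] * len(p_values)
    total_weight = sum(weights)
    return total_weight / sum(w / p for w, p in zip(weights, p_values))


# Hypothetical p values for one gene from three expression prediction models.
per_model_p = [0.01, 0.5, 0.5]
combined = harmonic_mean_p(per_model_p)
print(combined)
```

The harmonic mean is dominated by the smallest p values, so a gene that is well captured by only one prediction model still yields a small combined statistic.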


2020 ◽  
Author(s):  
Guiyan Ni ◽  
Jian Zeng ◽  
Joana R Revez ◽  
Ying Wang ◽  
Tian Ge ◽  
...  

Polygenic scores (PGSs), which assess the genetic risk of individuals for a disease, are calculated as a weighted count of risk alleles identified in genome-wide association studies (GWASs). PGS methods differ in terms of which DNA variants are included in the score and the weights assigned to them. PGSs are evaluated in independent target samples of individuals with known disease status. New PGS methods are typically evaluated using simulated data or a single target cohort; however, in real data sets there can be heterogeneity between target sample cohorts, which could reflect a number of real or artefactual factors. The Psychiatric Genomics Consortium working groups for schizophrenia (SCZ) and major depressive disorder (MDD) bring together many independently collected case-control cohorts for GWAS meta-analysis. These resources are used here in repeated application of leave-one-cohort-out GWAS analyses, generating robust conclusions for PGS prediction applied across multiple target (left-out) cohorts. Eight PGS methods (P+T, SBLUP, LDpred-Inf, LDpred-funct, LDpred, PRS-CS, PRS-CS-auto, SBayesR) are compared. We found that SBayesR had the highest prediction evaluation statistics in most comparisons. For SCZ across 30 target cohorts, the SBayesR PGS achieved a mean area under the receiver operating characteristic curve (AUC) of 0.733, and explained 9.9% of variance on the liability scale. For MDD across 26 target cohorts, the AUC and variance explained were 0.601 and 4.0%, respectively. The variance explained by the SBayesR PGS was 46% and 43% higher for SCZ and MDD, respectively, compared to the basic p-value thresholding P+T method.
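
The definition of a PGS as a weighted count of risk alleles, and its evaluation by AUC in a target sample, can be sketched as below. The weights and genotypes are hypothetical toy values; the eight methods compared in the abstract differ precisely in how such weights are chosen, which this sketch does not address.

```python
def polygenic_score(allele_counts, weights):
    """Weighted count of risk alleles for one individual."""
    return sum(count * weight for count, weight in zip(allele_counts, weights))


def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count half)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical per-allele weights and allele counts (0, 1 or 2) at three variants.
weights = [0.2, -0.1, 0.05]
cases = [[2, 0, 1], [1, 0, 2]]
controls = [[0, 2, 0], [2, 0, 0]]

case_scores = [polygenic_score(g, weights) for g in cases]
control_scores = [polygenic_score(g, weights) for g in controls]
target_auc = auc(case_scores, control_scores)
print(target_auc)
```

In a leave-one-cohort-out design, the weights would be estimated without the target cohort and the AUC computed in that left-out cohort, repeated once per cohort.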

