MOSTWAS: Multi-Omic Strategies for Transcriptome-Wide Association Studies

PLoS Genetics ◽  
2021 ◽  
Vol 17 (3) ◽  
pp. e1009398
Author(s):  
Arjun Bhattacharya ◽  
Yun Li ◽  
Michael I. Love

Traditional predictive models for transcriptome-wide association studies (TWAS) consider only single nucleotide polymorphisms (SNPs) local to genes of interest and perform parameter shrinkage with a regularization process. These approaches ignore the effect of distal-SNPs or other molecular effects underlying the SNP-gene association. Here, we outline multi-omics strategies for transcriptome imputation from germline genetics to allow more powerful testing of gene-trait associations by prioritizing distal-SNPs to the gene of interest. In one extension, we identify mediating biomarkers (CpG sites, microRNAs, and transcription factors) highly associated with gene expression and train predictive models for these mediators using their local SNPs. Imputed values for mediators are then incorporated into the final predictive model of gene expression, along with local SNPs. In the second extension, we assess distal-eQTLs (SNPs associated with genes not in a local window around them) for their mediation effect through mediating biomarkers local to these distal-eSNPs. Distal-eSNPs with large indirect mediation effects are then included in the transcriptomic prediction model with the local SNPs around the gene of interest. Using simulations and real data from ROS/MAP brain tissue and TCGA breast tumors, we show considerable gains of percent variance explained (1–2% additive increase) of gene expression and TWAS power to detect gene-trait associations. This integrative approach to transcriptome-wide imputation and association studies aids in identifying the complex interactions underlying genetic regulation within a tissue and important risk genes for various traits and disorders.

Author summary:
Transcriptome-wide association studies (TWAS) are a powerful strategy to study gene-trait associations by integrating genome-wide association studies (GWAS) with gene expression datasets. TWAS increases study power and interpretability by mapping genetic variants to genes. However, traditional TWAS considers only variants that are close to a gene and thus ignores important variants far from the gene that may be involved in complex regulatory mechanisms. Here, we present MOSTWAS (Multi-Omic Strategies for TWAS), a suite of tools that extends the TWAS framework to include these distal variants. MOSTWAS leverages multi-omic data on regulatory biomarkers (transcription factors, microRNAs, epigenetics) and borrows techniques from mediation analysis to prioritize distal variants that lie around these regulatory biomarkers. Using simulations and real public data from brain tissue and breast tumors, we show that MOSTWAS improves upon traditional TWAS in both predictive performance and power to detect gene-trait associations. MOSTWAS also aids in identifying possible mechanisms for gene regulation using a novel added-last test that assesses the information gained from the distal variants beyond the local association. In conclusion, our method aids in detecting important risk genes for traits and disorders and the possible complex interactions underlying genetic regulation within a tissue.
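The first extension described in the abstract (impute a mediator from its own local SNPs, then use the imputed mediator alongside the gene's local SNPs) can be illustrated with a toy simulation. This is a minimal sketch and not the MOSTWAS implementation: all data are simulated, the effect sizes are arbitrary, plain least squares stands in for the regularized models the abstract mentions, and the helper names (`ols`, `predict`, `r_squared`, `genotypes`) are hypothetical.

```python
import random

def ols(columns, y):
    """Ordinary least squares with an intercept, via the normal equations.

    `columns` is a list of predictor columns; returns [intercept, b1, b2, ...].
    """
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    aug = [[sum(r[a] * r[b] for r in rows) for b in range(p)]
           + [sum(r[a] * yi for r, yi in zip(rows, y))] for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(aug[r][k]))
        aug[k], aug[pivot] = aug[pivot], aug[k]
        for r in range(p):
            if r != k:
                factor = aug[r][k] / aug[k][k]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[k])]
    return [aug[r][p] / aug[r][r] for r in range(p)]

def predict(beta, columns):
    n_obs = len(columns[0])
    return [beta[0] + sum(b * col[i] for b, col in zip(beta[1:], columns))
            for i in range(n_obs)]

def r_squared(y, y_hat):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

rng = random.Random(0)
n, n_train = 2000, 1500

def genotypes(freq):
    """Simulated 0/1/2 allele counts at a given allele frequency."""
    return [float((rng.random() < freq) + (rng.random() < freq)) for _ in range(n)]

# Assumed toy architecture: one SNP local to the gene, and one SNP that is
# local to a mediator (e.g. a transcription factor) but distal to the gene.
local_snp = genotypes(0.3)
distal_snp = genotypes(0.4)
mediator = [1.0 * d + rng.gauss(0, 0.5) for d in distal_snp]
expression = [0.5 * l + 1.0 * m + rng.gauss(0, 1)
              for l, m in zip(local_snp, mediator)]

train, test = slice(0, n_train), slice(n_train, n)

# Stage 1: predictive model of the mediator from its own local SNP.
mediator_model = ols([distal_snp[train]], mediator[train])
imputed_mediator = predict(mediator_model, [distal_snp])

# Baseline: expression predicted from the gene's local SNP only.
local_model = ols([local_snp[train]], expression[train])
r2_local = r_squared(expression[test], predict(local_model, [local_snp[test]]))

# Stage 2: expression predicted from the local SNP plus the imputed mediator.
full_model = ols([local_snp[train], imputed_mediator[train]], expression[train])
r2_full = r_squared(
    expression[test],
    predict(full_model, [local_snp[test], imputed_mediator[test]]),
)

print(f"held-out R^2, local SNP only:         {r2_local:.3f}")
print(f"held-out R^2, local SNP + mediator:   {r2_full:.3f}")
```

The simulated gain is far larger than the 1–2% reported for real data because the toy mediator effect is deliberately strong. A real pipeline would also need to guard against overfitting when the stage 1 predictions are reused for training in stage 2, for example by cross-validation; that step is omitted here.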


Author(s):  
Ting-Hao Chen ◽  
Chen-Cheng Yang ◽  
Kuei-Hau Luo ◽  
Chia-Yen Dai ◽  
Yao-Chung Chuang ◽  
...  

Aluminum (Al) toxicity is related to renal failure and the failure of other systems. Although some genome-wide association studies (GWAS) have been conducted in Australia and England, to our knowledge none has been conducted in Han Chinese. Thus, this research used whole-genome genotypes from the Taiwan Biobank to explore the association between plasma Al concentrations and renal function. Participants, who underwent questionnaire interviews, biomarker measurement, and genotyping, were drawn from the Taiwan Biobank database. We then measured their plasma Al concentrations with ICP-MS in the laboratory at Kaohsiung Medical University. We used these data in genome-wide association (GWA) tests to look for candidate genes and to relate plasma Al concentration to renal function. Furthermore, we examined the path relationship among single nucleotide polymorphisms (SNPs), Al concentrations, and estimated glomerular filtration rate (eGFR) through mediation analysis with 3000 bootstrap replications. Following the principles of GWAS, we focused on three SNPs within the dipeptidyl peptidase-like protein 6 (DPP6) gene on chromosome 7: rs10224371, rs2316242, and rs10268004. The mediation analysis showed that all of the selected SNPs indirectly affected eGFR through the mediation of Al concentrations. Our analysis revealed an association among DPP6 SNPs, plasma Al concentrations, and eGFR. However, further longitudinal studies and mechanistic research are needed. Nevertheless, our analysis is the first study to explore the association between DPP6 SNPs and plasma Al affecting eGFR.
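A bootstrap mediation analysis of the kind described (SNP to plasma Al to eGFR) can be sketched as follows. This is an illustrative sketch only: the data are simulated rather than taken from the Taiwan Biobank, the effect sizes and signs are arbitrary assumptions, the indirect effect is taken as the usual product of coefficients, and the function names are hypothetical. It uses 500 bootstrap replicates to keep the run short, whereas the study reports 3000.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (len(u) - 1)

def indirect_effect(x, m, y):
    """Product-of-coefficients estimate a*b of the effect of x on y through m.

    a: slope of m regressed on x.
    b: coefficient of m when y is regressed on both x and m.
    """
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    a = sxm / sxx
    b = (smy * sxx - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b

def bootstrap_ci(x, m, y, n_boot=500, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    k = int(n_boot * alpha / 2)
    return estimates[k], estimates[n_boot - 1 - k]

# Simulated stand-ins: genotype (0/1/2), plasma Al, and eGFR on arbitrary scales.
rng = random.Random(1)
n = 500
snp = [float((rng.random() < 0.3) + (rng.random() < 0.3)) for _ in range(n)]
al = [0.5 * g + rng.gauss(0, 1) for g in snp]
egfr = [-0.8 * a + rng.gauss(0, 1) for a in al]

point = indirect_effect(snp, al, egfr)
ci_low, ci_high = bootstrap_ci(snp, al, egfr)
print(f"indirect effect: {point:.3f}, 95% CI: ({ci_low:.3f}, {ci_high:.3f})")
```

An interval that excludes zero is the usual evidence for mediation in this framework; covariates, which a real analysis would adjust for, are left out of the sketch.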


2021 ◽  
Author(s):  
Milton Pividori ◽  
Sumei Lu ◽  
Binglan Li ◽  
Chun Su ◽  
Matthew E. Johnson ◽  
...  

Understanding how dysregulated transcriptional processes result in tissue-specific pathology requires a mechanistic interpretation of expression regulation across different cell types. It has been shown that this insight is key for the development of new therapies. These mechanisms can be identified with transcriptome-wide association studies (TWAS), which have represented an important step forward to test the mediating role of gene expression in GWAS associations. However, due to pervasive eQTL sharing across tissues, TWAS has not been successful in identifying causal tissues, and other methods generally do not take advantage of the large amounts of RNA-seq data publicly available. Here we introduce a polygenic approach that leverages gene modules (genes with similar co-expression patterns) to project both gene-trait associations and pharmacological perturbation data into a common latent representation for a joint analysis. We observed that diseases were significantly associated with gene modules expressed in relevant cell types, such as hypothyroidism with T cells and thyroid, hypertension and lipids with adipose tissue, and coronary artery disease with cardiomyocytes. Our approach was more accurate in predicting known drug-disease pairs and revealed stable trait clusters, including a complex branch involving lipids with cardiovascular, autoimmune, and neuropsychiatric disorders. Furthermore, using a CRISPR-screen, we show that genes involved in lipid regulation exhibit more consistent trait associations through gene modules than individual genes. Our results suggest that a gene module perspective can contextualize genetic associations and prioritize alternative treatment targets when GWAS hits are not druggable.


2020 ◽  
Vol 117 (26) ◽  
pp. 15028-15035 ◽  
Author(s):  
Ronald Yurko ◽  
Max G’Sell ◽  
Kathryn Roeder ◽  
Bernie Devlin

To correct for a large number of hypothesis tests, most researchers rely on simple multiple testing corrections. Yet, new methodologies of selective inference could potentially improve power while retaining statistical guarantees, especially those that enable exploration of test statistics using auxiliary information (covariates) to weight hypothesis tests for association. We explore one such method, adaptive P-value thresholding (AdaPT), in the framework of genome-wide association studies (GWAS) and gene expression/coexpression studies, with particular emphasis on schizophrenia (SCZ). Selected SCZ GWAS association P values play the role of the primary data for AdaPT; single-nucleotide polymorphisms (SNPs) are selected because they are gene expression quantitative trait loci (eQTLs). This natural pairing of SNPs and genes allows us to map the following covariate values to these pairs: GWAS statistics from genetically correlated bipolar disorder, the effect size of SNP genotypes on gene expression, and gene–gene coexpression, captured by subnetwork (module) membership. In all, 24 covariates per SNP/gene pair were included in the AdaPT analysis using flexible gradient boosted trees. We demonstrate a substantial increase in power to detect SCZ associations using gene expression information from the developing human prefrontal cortex. We interpret these results in light of recent theories about the polygenic nature of SCZ. Importantly, our entire process for identifying enrichment and creating features with independent complementary data sources can be implemented in many different high-throughput settings to ultimately improve power.


2019 ◽  
Author(s):  
Chan Wang ◽  
Jiyuan Hu ◽  
Martin J Blaser ◽  
Huilin Li

Motivation: Recent microbiome association studies have revealed important associations between the microbiome and disease/health status. Such findings encourage scientists to dive deeper to uncover the causal role of the microbiome in the underlying biological mechanism, and have led to the application of statistical models to quantify causal microbiome effects and to identify the specific microbial agents. However, there are no existing causal mediation methods specifically designed to handle high dimensional and compositional microbiome data. Results: We propose a rigorous Sparse Microbial Causal Mediation Model (SparseMCMM) specifically designed for high dimensional and compositional microbiome data in a typical three-factor (treatment, microbiome and outcome) causal study design. In particular, a linear log-contrast regression model and a Dirichlet regression model are proposed to estimate the causal direct effect of treatment and the causal mediation effects of the microbiome at both the community and individual taxon levels. Regularization techniques are used to perform variable selection in the proposed model framework to identify signature causal microbes. Two hypothesis tests on the overall mediation effect are proposed and their statistical significance is estimated by permutation procedures. Extensive simulated scenarios show that SparseMCMM has excellent performance in estimation and hypothesis testing. Finally, we showcase the utility of SparseMCMM in a study in which the murine microbiome was manipulated, where it provides a clear and sensible causal path among antibiotic treatment, microbiome composition and mouse weight. Availability and implementation: https://sites.google.com/site/huilinli09/software and https://github.com/chanw0/SparseMCMM. Supplementary information: Supplementary data are available at Bioinformatics online.
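The linear log-contrast regression mentioned in the Results can be illustrated in its simplest form. The sketch below is not SparseMCMM: it assumes three simulated taxa, no treatment term, no interactions, and no regularization, and it only shows how the zero-sum constraint on the coefficients of log relative abundances can be enforced by regressing on log-ratios against a reference taxon. The helper `ols` and all numeric values are hypothetical.

```python
import math
import random

def ols(columns, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    aug = [[sum(r[a] * r[b] for r in rows) for b in range(p)]
           + [sum(r[a] * yi for r, yi in zip(rows, y))] for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(aug[r][k]))
        aug[k], aug[pivot] = aug[pivot], aug[k]
        for r in range(p):
            if r != k:
                factor = aug[r][k] / aug[k][k]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[k])]
    return [aug[r][p] / aug[r][r] for r in range(p)]

rng = random.Random(2)
n = 400
true_beta = [1.0, -0.4, -0.6]  # assumed coefficients; they sum to zero

compositions, outcome = [], []
for _ in range(n):
    raw = [rng.lognormvariate(0, 1) for _ in range(3)]
    total = sum(raw)
    x = [value / total for value in raw]  # relative abundances sum to 1
    compositions.append(x)
    outcome.append(2.0
                   + sum(b * math.log(v) for b, v in zip(true_beta, x))
                   + rng.gauss(0, 0.5))

# With sum(beta) = 0, the model y = b0 + sum_j beta_j * log(x_j) is equivalent
# to a regression on log-ratios against a reference taxon (here the third).
z1 = [math.log(x[0] / x[2]) for x in compositions]
z2 = [math.log(x[1] / x[2]) for x in compositions]
fit = ols([z1, z2], outcome)

# Recover the reference taxon's coefficient from the zero-sum constraint.
beta_hat = [fit[1], fit[2], -(fit[1] + fit[2])]
print("estimated log-contrast coefficients:", [round(b, 3) for b in beta_hat])
```

The log-ratio reparameterization makes the fit invariant to the arbitrary total of each sample, which is the property that motivates log-contrast models for compositional data.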


2019 ◽  
Author(s):  
Wen Zhang ◽  
Georgios Voloudakis ◽  
Veera M. Rajagopal ◽  
Ben Readhead ◽  
Joel T. Dudley ◽  
...  

Transcriptome-wide association studies integrate gene expression data with common risk variation to identify gene-trait associations. By incorporating epigenome data to estimate the functional importance of genetic variation on gene expression, we improve the accuracy of transcriptome prediction and the power to detect significant expression-trait associations. Joint analysis of 14 large-scale transcriptome datasets and 58 traits identifies 13,724 significant expression-trait associations that converge to biological processes and relevant phenotypes in human and mouse phenotype databases. We perform drug repurposing analysis and identify known and novel compounds that mimic or reverse trait-specific changes. We identify genes that exhibit agonistic pleiotropy for genetically correlated traits that converge on shared biological pathways and elucidate distinct processes in disease etiopathogenesis. Overall, this comprehensive analysis provides insight into the specificity and convergence of gene expression on susceptibility to complex traits.


Author(s):  
Huang Yaoxing ◽  
Yu Danchun ◽  
Sun Xiaojuan ◽  
Jiang Shuman ◽  
Yan Qingqing ◽  
...  

Gastric cancer (GC) is one of the most common causes of cancer-related deaths in the world. It is regarded as a biologically and genetically heterogeneous disease whose carcinogenesis is poorly understood at the molecular level. Thousands of biomarkers and susceptibility loci have been explored via experimental and computational methods, but their effects on disease outcome are still unknown. Genome-wide association studies (GWAS) have identified multiple susceptibility loci for GC, but because of linkage disequilibrium (LD), the associated single-nucleotide polymorphisms (SNPs) may fall within non-coding regions and exert their biological function by modulating gene expression levels. In this study, we collected 1,091 cases and 410,350 controls from the GWAS catalog database. Integrating these data with gene expression data obtained from stomach tissue, we applied a machine learning-based method to predict GC-susceptible genes. As a result, we identified 787 novel susceptible genes related to GC, which will provide new insight into the genetic and biological basis for the mechanism and pathology of GC development.


Author(s):  
Yingjie Guo ◽  
Chenxi Wu ◽  
Zhian Yuan ◽  
Yansu Wang ◽  
Zhen Liang ◽  
...  

Among the myriad statistical methods that identify gene–gene interactions in qualitative genome-wide association studies, gene-based methods are not only statistically powerful but also biologically interpretable. However, their detection power is limited by the assumptions they make about the association between traits and single nucleotide polymorphisms. Thus, a gene-based method derived from XGBoost (GGInt-XGBoost) is proposed in this article. Assuming that the log odds ratio of disease traits is additive when a pair of genes does not interact, the difference in error between XGBoost models with and without an additive constraint can indicate a gene–gene interaction; we then used a permutation-based statistical test to assess this difference and to provide a p-value representing the significance of the interaction. Experimental results on both simulated and real data showed that our approach performed better than previous methods at detecting gene–gene interactions.


2019 ◽  
Vol 97 (Supplement_3) ◽  
pp. 461-461
Author(s):  
Alexander S Zlobin ◽  
Natalia A Volkova ◽  
Pavel M Borodin ◽  
Tatiana I Aksenovich ◽  
Yakov A Tsepilov

Identification of quantitative trait loci (QTL) and candidate genes that affect growth intensity is a prerequisite for marker-assisted selection for economically important traits. The number of QTL studies on sheep is relatively small in comparison to those on cattle and pigs. The current sheep QTL database (Sheep QTLdb) contains information on 1658 QTL for 225 different traits. A few genes and markers associated with growth, carcass and meat productivity traits have been reported. The information about QTLs from the Sheep QTLdb cannot be used directly in marker-assisted selection because it lacks essential information such as effective and reference alleles and the effect direction, and it requires manual curation and validation. In this study we performed a comprehensive search for QTLs, focusing on single nucleotide polymorphisms (SNPs) associated with growth and meat traits in sheep. Using 15 different keyword combinations we found 152 papers (including duplicates). Next, all the papers found were manually curated by two researchers and filtered by relevance. We selected the most relevant papers, which led to a final list of 17 publications. From these 17 papers we extracted information about associated genes and QTLs (SNPs), including all available details for the associated SNPs (effect sizes, effective and reference alleles, etc.). In total we found information about 156 SNP-trait associations (123 unique SNPs). We also made a list of 164 unique genes associated with growth, carcass and meat productivity traits. As a result we built a database that contains information about 156 SNP-trait associations (123 unique SNPs) and a list of 165 associated genes. The updated information is freely available at https://github.com/Defrag1236/Ovines_2018. This information can be useful for further association studies and preliminary estimation of genetic variability for economically important traits in different breeds.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Laura Florez-Sampedro ◽  
Corry-Anke Brandsma ◽  
Maaike de Vries ◽  
Wim Timens ◽  
Rene Bults ◽  
...  

Macrophage migration inhibitory factor (MIF) is a cytokine found to be associated with chronic obstructive pulmonary disease (COPD). However, there is no consensus on how MIF levels differ in COPD compared to control conditions and there are no reports on MIF expression in lung tissue. Here we studied gene expression of members of the MIF family MIF, D-Dopachrome Tautomerase (DDT) and DDT-like (DDTL) in a lung tissue dataset with 1087 subjects and identified single nucleotide polymorphisms (SNPs) regulating their gene expression. We found higher MIF and DDT expression in COPD patients compared to non-COPD subjects and found 71 SNPs significantly influencing gene expression of MIF and DDTL. Furthermore, the platform used to measure MIF (microarray or RNAseq) was found to influence the splice variants detected and subsequently the direction of the SNP effects on MIF expression. Among the SNPs found to regulate MIF expression, the major LD block identified was linked to rs5844572, a SNP previously found to be associated with lower diffusion capacity in COPD. This suggests that MIF may be contributing to the pathogenesis of COPD, as SNPs that influence MIF expression are also associated with symptoms of COPD. Our study shows that MIF levels are affected not only by disease but also by genetic diversity (i.e. SNPs). Since none of our significant eSNPs for MIF or DDTL have been described in GWAS for COPD or lung function, MIF expression in COPD patients is more likely a consequence of disease-related factors rather than a cause of the disease.

