Untargeted metabolome- and transcriptome-wide association study identifies causal genes modulating metabolite concentrations in urine

2020 ◽  
Author(s):  
Reyhan Sönmez Flitman ◽  
Bita Khalili ◽  
Zoltan Kutalik ◽  
Rico Rueedi ◽  
Sven Bergmann

Summary In this study we investigate the results of a metabolome- and transcriptome-wide association study to identify genes influencing the human metabolome. We used RNAseq data from lymphoblastoid cell lines (LCLs) derived from 555 Caucasian individuals to characterize their transcriptome. As for the metabolome, we took an untargeted approach using binned features from 1H nuclear magnetic resonance spectroscopy (NMR) of urine samples from the same subjects, allowing for data-driven discovery of associated compounds (rather than working with a limited set of quantified metabolites). Using pairwise linear regression we identified 21 study-wide significant associations between metabolome features and gene expression levels. We observed the most significant association between the gene ALMS1 and two adjacent metabolome features at 2.0325 and 2.0375 ppm. By using our previously developed metabomatching methodology, we found N-Acetylaspartate (NAA) as the potential underlying metabolite whose urine concentration is correlated with ALMS1 expression. Indeed, a number of metabolome- and genome-wide association studies (mGWAS) had already suggested the locus of this gene to be involved in regulation of N-acetylated compounds, yet were not able to identify unambiguously the exact metabolite, nor to disambiguate between ALMS1 and NAT8, another gene found in the same locus, as the mediator gene. The second most significant association was observed between HPS1 and two metabolome features at 2.8575 and 2.8725 ppm. Metabomatching of the association profile of HPS1 with all metabolite features pointed at trimethylamine (TMA) as the most likely underlying metabolite. mGWAS had previously implicated a locus containing HPS1 to be associated with TMA concentrations in urine but could not disambiguate this association signal from PYROXD2, a gene in the same locus. 
We used Mendelian randomization to show for both ALMS1 and HPS1 that their expression is causally linked to the respective metabolite concentrations. Our study provides evidence that the integration of metabolomics with gene expression data can support mQTL analysis, helping to identify the most likely gene involved in the modulation of the metabolite concentration.
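As an illustration of the Mendelian randomization step mentioned above, the following is a minimal sketch of a single-instrument Wald ratio estimate. The abstract does not state which estimator was used, so the choice of the Wald ratio, the function name `wald_ratio`, and the example numbers are all assumptions made for illustration only.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument Wald ratio (illustrative assumption, not the paper's stated method).

    beta_exposure: effect of the genetic variant on gene expression (the exposure)
    beta_outcome:  effect of the same variant on the metabolite feature (the outcome)
    se_outcome:    standard error of beta_outcome
    """
    # Causal effect of expression on the metabolite, per unit of expression.
    estimate = beta_outcome / beta_exposure
    # First-order standard error that ignores uncertainty in beta_exposure.
    se = abs(se_outcome / beta_exposure)
    z = estimate / se
    # Two-sided p-value under a standard normal null.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return estimate, se, p_value


# Hypothetical summary statistics, not taken from the study.
estimate, se, p_value = wald_ratio(beta_exposure=0.5, beta_outcome=0.2, se_outcome=0.05)
print(f"causal estimate = {estimate:.3f}, SE = {se:.3f}, p = {p_value:.2e}")
```

The first-order standard error is only reasonable when the variant is a strong instrument for expression; with several instruments, an inverse-variance weighted combination of such ratios is a common alternative.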

2019 ◽  
Author(s):  
James Boocock ◽  
Megan Leask ◽  
Yukinori Okada ◽  
Hirotaka Matsuo ◽  
Yusuke Kawamura ◽  
...  

Abstract Serum urate is the end-product of purine metabolism. Elevated serum urate is causal of gout and a predictor of renal disease, cardiovascular disease and other metabolic conditions. Genome-wide association studies (GWAS) have reported dozens of loci associated with serum urate control; however, there has been little progress in understanding the molecular basis of the associated loci. Here we employed trans-ancestral meta-analysis using data from European and East Asian populations to identify ten new loci for serum urate levels. Genome-wide colocalization with cis-expression quantitative trait loci (eQTL) identified a further five new loci. By cis- and trans-eQTL colocalization analysis we identified 24 and 20 genes respectively where the causal eQTL variant has a high likelihood that it is shared with the serum urate-associated locus. One new locus identified was SLC22A9 that encodes organic anion transporter 7 (OAT7). We demonstrate that OAT7 is a very weak urate-butyrate exchanger. Newly implicated genes identified in the eQTL analysis include those encoding proteins that make up the dystrophin complex, a scaffold for signaling proteins and transporters at the cell membrane; MLXIP that, with the previously identified MLXIPL, is a transcription factor that may regulate serum urate via the pentose-phosphate pathway; and MRPS7 and IDH2 that encode proteins necessary for mitochondrial function. Trans-ancestral functional fine-mapping identified six loci (RREB1, INHBC, HLF, UBE2Q2, SFMBT1, HNF4G) with colocalized eQTL that contained putative causal SNPs (posterior probability of causality > 0.8). This systematic analysis of serum urate GWAS loci has identified candidate causal genes at 19 loci and a network of previously unidentified genes likely involved in control of serum urate levels, further illuminating the molecular mechanisms of urate control. Author Summary High serum urate is a prerequisite for gout and a risk factor for metabolic disease. 
Previous GWAS have identified numerous loci that are associated with serum urate control; however, only a small handful of these loci have known molecular consequences. The majority of loci are within the non-coding regions of the genome and therefore it is difficult to ascertain how these variants might influence serum urate levels without tangible links to gene expression and / or protein function. We have applied a novel bioinformatic pipeline where we combined population-specific GWAS data with gene expression and genome connectivity information to identify putative causal genes for serum urate associated loci. Overall, we identified 15 novel serum urate loci and show that these loci along with previously identified loci are linked to the expression of 44 genes. We show that some of the variants within these loci have strong predicted regulatory function which can be further tested in functional analyses. This study expands on previous GWAS by identifying further loci implicated in serum urate control and new causal mechanisms supported by gene expression changes.


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Calwing Liao ◽  
Alexandre D. Laporte ◽  
Dan Spiegelman ◽  
Fulya Akçimen ◽  
Ridha Joober ◽  
...  

Abstract Attention deficit/hyperactivity disorder (ADHD) is a common neurodevelopmental psychiatric disorder. Genome-wide association studies (GWAS) have identified several loci associated with ADHD. However, understanding the biological relevance of these genetic loci has proven to be difficult. Here, we conduct an ADHD transcriptome-wide association study (TWAS) consisting of 19,099 cases and 34,194 controls and identify 9 transcriptome-wide significant hits, of which 6 genes were not implicated in the original GWAS. We demonstrate that two of the previous GWAS hits can be largely explained by expression regulation. Probabilistic causal fine-mapping of TWAS signals prioritizes KAT2B with a posterior probability of 0.467 in the dorsolateral prefrontal cortex and TMEM161B with a posterior probability of 0.838 in the amygdala. Furthermore, pathway enrichment identifies dopaminergic and norepinephrine pathways, which are highly relevant for ADHD. Overall, our findings highlight the power of TWAS to identify and prioritize putatively causal genes.


2021 ◽  
Vol 12 ◽  
Author(s):  
Xi Su ◽  
Wenqiang Li ◽  
Luxian Lv ◽  
Xiaoyan Li ◽  
Jinfeng Yang ◽  
...  

Anxiety disorders are common mental disorders that often result in disability. Recently, large-scale genome-wide association studies (GWASs) have identified several novel risk variants and loci for anxiety disorders (or anxiety traits). Nevertheless, how the reported risk variants confer risk of anxiety remains unknown. To identify genes whose cis-regulated expression levels are associated with risk of anxiety traits, we conducted a transcriptome-wide association study (TWAS) by integrating genome-wide associations from a large-scale GWAS (N = 175,163) (which evaluated anxiety traits based on Generalized Anxiety Disorder 2-item scale (GAD-2) score) and brain expression quantitative trait loci (eQTL) data (from the PsychENCODE and GTEx). We identified 19 and 17 transcriptome-wide significant (TWS) genes in the PsychENCODE and GTEx, respectively. Intriguingly, 10 genes showed significant associations with anxiety in both datasets, strongly suggesting that genetic risk variants may confer risk of anxiety traits by regulating the expression of these genes. Top TWS genes included RNF123, KANSL1-AS1, GLYCTK, CRHR1, DND1P1, MAPT and ARHGAP27. Of note, 25 TWS genes were not implicated in the original GWAS. Our TWAS identified 26 risk genes whose cis-regulated expression was significantly associated with anxiety, providing important insights into the genetic component of gene expression in anxiety disorders/traits and new clues for future drug development.


2020 ◽  
Vol 112 (10) ◽  
pp. 1003-1012 ◽  
Author(s):  
Jun Zhong ◽  
Ashley Jermusyk ◽  
Lang Wu ◽  
Jason W Hoskins ◽  
Irene Collins ◽  
...  

Abstract Background Although 20 pancreatic cancer susceptibility loci have been identified through genome-wide association studies in individuals of European ancestry, much of its heritability remains unexplained and the genes responsible largely unknown. Methods To discover novel pancreatic cancer risk loci and possible causal genes, we performed a pancreatic cancer transcriptome-wide association study in Europeans using three approaches: FUSION, MetaXcan, and Summary-MulTiXcan. We integrated genome-wide association studies summary statistics from 9040 pancreatic cancer cases and 12 496 controls, with gene expression prediction models built using transcriptome data from histologically normal pancreatic tissue samples (NCI Laboratory of Translational Genomics [n = 95] and Genotype-Tissue Expression v7 [n = 174] datasets) and data from 48 different tissues (Genotype-Tissue Expression v7, n = 74–421 samples). Results We identified 25 genes whose genetically predicted expression was statistically significantly associated with pancreatic cancer risk (false discovery rate < .05), including 14 candidate genes at 11 novel loci (1p36.12: CELA3B; 9q31.1: SMC2, SMC2-AS1; 10q23.31: RP11-80H5.9; 12q13.13: SMUG1; 14q32.33: BTBD6; 15q23: HEXA; 15q26.1: RCCD1; 17q12: PNMT, CDK12, PGAP3; 17q22: SUPT4H1; 18q11.22: RP11-888D10.3; and 19p13.11: PGPEP1) and 11 at six known risk loci (5p15.33: TERT, CLPTM1L, ZDHHC11B; 7p14.1: INHBA; 9q34.2: ABO; 13q12.2: PDX1; 13q22.1: KLF5; and 16q23.1: WDR59, CFDP1, BCAR1, TMEM170A). The association for 12 of these genes (CELA3B, SMC2, and PNMT at novel risk loci and TERT, CLPTM1L, INHBA, ABO, PDX1, KLF5, WDR59, CFDP1, and BCAR1 at known loci) remained statistically significant after Bonferroni correction. Conclusions By integrating gene expression and genotype data, we identified novel pancreatic cancer risk loci and candidate functional genes that warrant further investigation.
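The abstract above reports genes at a false discovery rate below .05 and a smaller subset that survives Bonferroni correction. The sketch below contrasts the two thresholds on a toy list of p-values. It assumes the Benjamini-Hochberg procedure for FDR control, which the abstract does not specify, and the function names and p-values are hypothetical.

```python
def bonferroni(pvalues, alpha=0.05):
    """Flag tests whose p-value is below alpha divided by the number of tests."""
    m = len(pvalues)
    return [p < alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Flag tests that pass the Benjamini-Hochberg step-up procedure (assumed FDR method)."""
    m = len(pvalues)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k such that the k-th smallest p-value is <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_rank = rank
    significant = [False] * m
    # Every test ranked at or below max_rank is declared significant.
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant


# Hypothetical gene-level p-values, not taken from the study.
pvals = [0.001, 0.012, 0.02, 0.035, 0.5]
print("Bonferroni:", bonferroni(pvals))
print("BH FDR:    ", benjamini_hochberg(pvals))
```

Bonferroni controls the chance of any false positive and is therefore stricter, which is consistent with only a subset of FDR-significant genes remaining significant after it.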


Author(s):  
Shi Yao ◽  
Hao Wu ◽  
Tong-Tong Liu ◽  
Jia-Hao Wang ◽  
Jing-Miao Ding ◽  
...  

Abstract Since the bipolar disorder (BD) signals identified by genome-wide association study (GWAS) often reside in the non-coding regions, understanding the biological relevance of these genetic loci has proven to be complicated. Transcriptome-wide association studies (TWAS) provide a powerful approach to identify novel disease risk genes and uncover possible causal genes at loci identified previously by GWAS. However, these methods did not consider the importance of epigenetic regulation in gene expression. Here, we developed a novel epigenetic element-based transcriptome-wide association study (ETWAS) that tested the effects of genetic variants on gene expression levels with the epigenetic features as prior and further mediated the association between predicted expression and BD. We conducted an ETWAS consisting of 20 352 cases and 31 358 controls and identified 44 transcriptome-wide significant hits. We found 14 conditionally independent genes, and 10 genes that had not previously been implicated in BD were regarded as novel candidate genes, such as ASB16 in the cerebellar hemisphere (P = 9.29 × 10⁻⁸). We demonstrated that several genome-wide significant signals from the BD GWAS were driven by genetically regulated expression, and NEK4 explained 90.1% of the GWAS signal. Additionally, ETWAS-identified genes could explain heritability beyond that explained by GWAS-associated SNPs (P = 5.60 × 10⁻⁶⁶). By querying the SNPs in the final models of identified genes in phenome databases, we identified several phenotypes previously associated with BD, such as schizophrenia and depression. In conclusion, ETWAS is a powerful method, and we identified several novel candidate genes associated with BD.


2020 ◽  
Author(s):  
Shi Yao ◽  
Jing-Miao Ding ◽  
Hao Wu ◽  
Ruo-Han Hao ◽  
Yu Rong ◽  
...  

Abstract Since the bipolar disorder (BD) signals identified by genome-wide association study (GWAS) often reside in the non-coding regions, understanding the biological relevance of these genetic loci has proven to be complicated. Transcriptome-wide association studies (TWAS) provide a powerful approach to identify novel disease risk genes and uncover possible causal genes at loci identified previously by GWAS. However, these methods did not consider the importance of epigenetic regulation in gene expression. Here, we developed a novel epigenetic element-based transcriptome-wide association study (ETWAS) that tests the effects of genetic variants on gene expression levels with the epigenetic features as prior and further mediates the association between predicted expression and BD. We conducted an ETWAS consisting of 20,352 cases and 31,358 controls and identified 44 transcriptome-wide significant hits. We found 14 conditionally independent genes, of which 11 had not previously been implicated in BD and were regarded as novel candidate genes, such as ASB16 in the cerebellar hemisphere (P = 9.29 × 10⁻⁸). We demonstrated that several genome-wide significant signals from the BD GWAS were driven by genetically regulated expression, with conditioning on NEK4 explaining 90.1% of the GWAS signal. Additionally, ETWAS-identified genes could explain heritability beyond that explained by GWAS-associated SNPs (P = 0.019). By querying the SNPs in the final model of identified genes in phenome databases, we identified several phenotypes previously associated with BD, such as schizophrenia and depression. In conclusion, ETWAS is a powerful method, and we identified several novel candidate genes associated with BD.


2018 ◽  
Author(s):  
Nicholas Mancuso ◽  
Simon Gayther ◽  
Alexander Gusev ◽  
Wei Zheng ◽  
Kathryn L. Penney ◽  
...  

Abstract Although genome-wide association studies (GWAS) for prostate cancer (PrCa) have identified more than 100 risk regions, most of the risk genes at these regions remain largely unknown. Here, we integrate the largest PrCa GWAS (N=142,392) with gene expression measured in 45 tissues (N=4,458), including normal and tumor prostate, to perform a multi-tissue transcriptome-wide association study (TWAS) for PrCa. We identify 235 genes at 87 independent 1Mb regions associated with PrCa risk, 9 of which are regions with no genome-wide significant SNP within 2Mb. 24 genes are significant in TWAS only for alternative splicing models in prostate tumor thus supporting the hypothesis of splicing driving risk for continued oncogenesis. Finally, we use a Bayesian probabilistic approach to estimate credible sets of genes containing the causal gene at a pre-defined level; this reduced the list of 235 associations to 120 genes in the 90% credible set. Overall, our findings highlight the power of integrating expression with PrCa GWAS to identify novel risk loci and prioritize putative causal genes at known risk loci.
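To make the idea of a 90% credible set concrete, here is a simplified sketch that builds one from per-gene posterior probabilities within a single risk region. It assumes exactly one causal gene per region and that each gene already has a posterior probability of being that gene. The abstract does not describe how the posteriors are computed, so the function name `credible_set`, the gene labels, and the values are hypothetical.

```python
def credible_set(posteriors, level=0.9):
    """Return the smallest set of genes whose normalized posteriors sum to at least `level`.

    posteriors: dict mapping gene name -> posterior probability of being causal
                (assumed input; rescaled here to sum to 1 within the region).
    """
    total = sum(posteriors.values())
    # Rank genes from most to least probable.
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    selected = []
    cumulative = 0.0
    for gene, posterior in ranked:
        selected.append(gene)
        cumulative += posterior / total
        if cumulative >= level:
            break
    return selected


# Hypothetical region with four candidate genes.
region = {"GENE_A": 0.60, "GENE_B": 0.25, "GENE_C": 0.10, "GENE_D": 0.05}
print(credible_set(region, level=0.9))
```

In this toy region the lowest-ranked gene is dropped because the first three already account for at least 90% of the posterior mass, which mirrors how a credible set can shorten a list of associated genes.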


2020 ◽  
Vol 36 (9) ◽  
pp. 2936-2937 ◽  
Author(s):  
Gareth Peat ◽  
William Jones ◽  
Michael Nuhn ◽  
José Carlos Marugán ◽  
William Newell ◽  
...  

Abstract Motivation Genome-wide association studies (GWAS) are a powerful method to detect even weak associations between variants and phenotypes; however, many of the identified associated variants are in non-coding regions, and presumably influence gene expression regulation. Identifying potential drug targets, i.e. causal protein-coding genes, therefore, requires crossing the genetics results with functional data. Results We present a novel data integration pipeline that analyses GWAS results in the light of experimental epigenetic and cis-regulatory datasets, such as ChIP-Seq, Promoter-Capture Hi-C or eQTL, and presents them in a single report, which can be used for inferring likely causal genes. This pipeline was then fed into an interactive data resource. Availability and implementation The analysis code is available at www.github.com/Ensembl/postgap and the interactive data browser at postgwas.opentargets.io.


2021 ◽  
Author(s):  
Robin N Beaumont ◽  
Isabelle K Mayne ◽  
Rachel M Freathy ◽  
Caroline F Wright

Abstract Birth weight is an important factor in newborn survival; both low and high birth weights are associated with adverse later-life health outcomes. Genome-wide association studies (GWAS) have identified 190 loci associated with maternal or fetal effects on birth weight. Knowledge of the underlying causal genes is crucial to understand how these loci influence birth weight and the links between infant and adult morbidity. Numerous monogenic developmental syndromes are associated with birth weights at the extreme ends of the distribution. Genes implicated in those syndromes may provide valuable information to prioritize candidate genes at the GWAS loci. We examined the proximity of genes implicated in developmental disorders (DDs) to birth weight GWAS loci using simulations to test whether they fall disproportionately close to the GWAS loci. We found birth weight GWAS single nucleotide polymorphisms (SNPs) fall closer to such genes than expected both when the DD gene is the nearest gene to the birth weight SNP and also when examining all genes within 258 kb of the SNP. This enrichment was driven by genes causing monogenic DDs with dominant modes of inheritance. We found examples of SNPs in the intron of one gene marking plausible effects via different nearby genes, highlighting the closest gene to the SNP not necessarily being the functionally relevant gene. This is the first application of this approach to birth weight, which has helped identify GWAS loci likely to have direct fetal effects on birth weight, which could not previously be classified as fetal or maternal owing to insufficient statistical power.
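The simulation-based proximity test described above can be sketched as follows. This is a deliberately simplified version under stated assumptions: the genome is a single linear coordinate, genes are reduced to single positions, and null SNPs are placed uniformly at random, whereas the study's simulations may have matched SNPs on other properties. Only the 258 kb window comes from the abstract; the function names, coordinates, and counts are hypothetical.

```python
import bisect
import random


def count_near(snps, genes_sorted, window):
    """Count SNPs that have at least one gene position within `window` base pairs."""
    count = 0
    for snp in snps:
        # First gene at or beyond the left edge of the window around this SNP.
        i = bisect.bisect_left(genes_sorted, snp - window)
        if i < len(genes_sorted) and genes_sorted[i] <= snp + window:
            count += 1
    return count


def proximity_enrichment(snps, genes, genome_length, window, n_sim=1000, seed=0):
    """Empirical p-value for SNPs lying near genes more often than random positions do."""
    rng = random.Random(seed)
    genes_sorted = sorted(genes)
    observed = count_near(snps, genes_sorted, window)
    as_extreme = 0
    for _ in range(n_sim):
        # Null model (assumption): the same number of SNPs at uniform random positions.
        simulated = [rng.randrange(genome_length) for _ in snps]
        if count_near(simulated, genes_sorted, window) >= observed:
            as_extreme += 1
    # Add-one correction keeps the empirical p-value strictly above zero.
    return observed, (as_extreme + 1) / (n_sim + 1)


# Hypothetical data: 10 gene positions and 10 SNPs, each 10 kb from a gene.
genes = [1_000_000 * k for k in range(1, 11)]
snps = [g + 10_000 for g in genes]
observed, p_value = proximity_enrichment(snps, genes, 100_000_000, window=258_000)
print(observed, p_value)
```

A realistic analysis would work per chromosome with gene start and end coordinates, and would draw null SNPs so that they resemble the GWAS SNPs in local gene density and allele frequency.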

