Using regulatory genomics data to interpret the function of disease variants and prioritise genes from expression studies

The identification of therapeutic targets is a critical step in the research and development of new drugs, with several drug discovery programmes failing because of a weak linkage between target and disease. Genome-wide association studies and large-scale gene expression experiments are providing insights into the biology of several common and complex diseases, but the complexity of transcriptional regulation mechanisms often limits our understanding of how genetic variation can influence changes in gene expression. Several initiatives in the field of regulatory genomics are aiming to close this gap by systematically identifying and cataloguing regulatory elements such as promoters and enhancers across different tissues and cell types. In this Bioconductor workflow, we will explore how different types of regulatory genomic data can be used for the functional interpretation of disease-associated variants and for the prioritisation of gene lists from gene expression experiments.

Using regulatory genomics data to interpret the function of disease variants and prioritise genes from expression studies

F1000Research ◽

10.12688/f1000research.13577.2 ◽

2018 ◽

Vol 7 ◽

pp. 121

Author(s):

Enrico Ferrero

Keyword(s):

Gene Expression ◽

Large Scale ◽

Association Studies ◽

Cell Types ◽

Regulatory Elements ◽

New Drugs ◽

Genome Wide Association Studies ◽

Functional Interpretation ◽

Expression Studies ◽

Regulatory Genomics

The identification of therapeutic targets is a critical step in the research and development of new drugs, with several drug discovery programmes failing because of a weak linkage between target and disease. Genome-wide association studies and large-scale gene expression experiments are providing insights into the biology of several common diseases, but the complexity of transcriptional regulation mechanisms often limits our understanding of how genetic variation can influence changes in gene expression. Several initiatives in the field of regulatory genomics are aiming to close this gap by systematically identifying and cataloguing regulatory elements such as promoters and enhancers across different tissues and cell types. In this Bioconductor workflow, we will explore how different types of regulatory genomic data can be used for the functional interpretation of disease-associated variants and for the prioritisation of gene lists from gene expression experiments.

EpiRegio: analysis and retrieval of regulatory elements linked to genes

Nucleic Acids Research ◽

10.1093/nar/gkaa382 ◽

2020 ◽

Vol 48 (W1) ◽

pp. W193-W199 ◽

Cited By ~ 4

Author(s):

Nina Baumgarten ◽

Dennis Hecker ◽

Sivarajan Karunanithi ◽

Florian Schmidt ◽

Markus List ◽

...

Keyword(s):

Gene Expression ◽

Target Genes ◽

Association Studies ◽

Web Server ◽

Cell Types ◽

Regulatory Elements ◽

Chromatin Accessibility ◽

Genome Wide Association Studies ◽

Coding Regions ◽

Genomic Regions

Abstract A current challenge in genomics is to interpret non-coding regions and their role in transcriptional regulation of possibly distant target genes. Genome-wide association studies show that a large part of genomic variants are found in those non-coding regions, but their mechanisms of gene regulation are often unknown. An additional challenge is to reliably identify the target genes of the regulatory regions, which is an essential step in understanding their impact on gene expression. Here we present the EpiRegio web server, a resource of regulatory elements (REMs). REMs are genomic regions that exhibit variations in their chromatin accessibility profile associated with changes in expression of their target genes. EpiRegio incorporates both epigenomic and gene expression data for various human primary cell types and tissues, providing an integrated view of REMs in the genome. Our web server allows the analysis of genes and their associated REMs, including each REM’s activity and its estimated cell type-specific contribution to its target gene’s expression. Further, it is possible to explore genomic regions for their regulatory potential and to investigate overlapping REMs, thereby dissecting regions of large epigenomic complexity. EpiRegio allows programmatic access through a REST API and is freely available at https://epiregio.de/.
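The region query described in this abstract (finding the REMs that overlap a genomic region of interest) can be illustrated with a minimal Python sketch. This is not the EpiRegio REST API, whose endpoints are not given here; the `Rem` record, the `overlapping_rems` helper, the half-open coordinate convention and all identifiers and coordinates below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rem:
    """Hypothetical record for a regulatory element (REM) and its target gene."""
    rem_id: str
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # 0-based, exclusive (assumed convention)
    target_gene: str


def overlapping_rems(rems, chrom, start, end):
    """Return the REMs that share at least one base with [start, end) on chrom."""
    return [
        r for r in rems
        if r.chrom == chrom and r.start < end and start < r.end
    ]


# Made-up REMs; REM_A and REM_B overlap each other on chr1.
rems = [
    Rem("REM_A", "chr1", 1000, 1500, "GENE1"),
    Rem("REM_B", "chr1", 1400, 2000, "GENE2"),
    Rem("REM_C", "chr2", 1000, 1500, "GENE3"),
]

hits = overlapping_rems(rems, "chr1", 1450, 1460)
for r in hits:
    print(r.rem_id, r.target_gene)
```

A linear scan is enough for a handful of records; a genome-scale lookup would more plausibly use an interval tree or a sorted index.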

Analysis of putative cis-regulatory elements regulating blood pressure variation

Human Molecular Genetics ◽

10.1093/hmg/ddaa098 ◽

2020 ◽

Vol 29 (11) ◽

pp. 1922-1932

Author(s):

Priyanka Nandakumar ◽

Dongwon Lee ◽

Thomas J Hoffmann ◽

Georg B Ehret ◽

Dan Arking ◽

...

Keyword(s):

Blood Pressure ◽

Association Studies ◽

Specific Effect ◽

Cell Types ◽

Regulatory Elements ◽

Open Chromatin ◽

Genome Wide Association Studies ◽

Cell Type ◽

Functional Scores ◽

Cell Type Specific

Abstract Hundreds of loci have been associated with blood pressure (BP) traits from many genome-wide association studies. We identified an enrichment of these loci in aorta and tibial artery expression quantitative trait loci in our previous work in ~100 000 Genetic Epidemiology Research on Aging study participants. In the present study, we sought to fine-map known loci and identify novel genes by determining putative regulatory regions for these and other tissues relevant to BP. We constructed maps of putative cis-regulatory elements (CREs) using publicly available open chromatin data for the heart, aorta and tibial arteries, and multiple kidney cell types. Variants within these regions may be evaluated quantitatively for their tissue- or cell-type-specific regulatory impact using deltaSVM functional scores, as described in our previous work. We aggregate variants within these putative CREs within 50 Kb of the start or end of ‘expressed’ genes in these tissues or cell types using public expression data and use deltaSVM scores as weights in the group-wise sequence kernel association test to identify candidates. We test for association with both BP traits and expression within these tissues or cell types of interest and identify the candidates MTHFR, C10orf32, CSK, NOV, ULK4, SDCCAG8, SCAMP5, RPP25, HDGFRP3, VPS37B and PPCDC. Additionally, we examined two known QT interval genes, SCN5A and NOS1AP, in the Atherosclerosis Risk in Communities Study, as a positive control, and observed the expected heart-specific effect. Thus, our method identifies variants and genes for further functional testing using tissue- or cell-type-specific putative regulatory information.
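The variant aggregation step in this abstract (variants inside putative CREs that lie within 50 Kb of the start or end of an expressed gene) can be sketched in Python as follows. This is an illustrative reading, not the authors' pipeline: it assumes a single chromosome, 1-based inclusive coordinates, and a window running from 50 Kb upstream of the gene start to 50 Kb downstream of the gene end. The function names and all coordinates and scores are hypothetical.

```python
WINDOW = 50_000  # 50 Kb flank on each side of the gene (from the abstract)


def in_gene_window(pos, gene_start, gene_end, window=WINDOW):
    """True if pos lies between gene_start - window and gene_end + window."""
    return gene_start - window <= pos <= gene_end + window


def in_any_cre(pos, cres):
    """True if pos falls inside at least one (start, end) CRE interval."""
    return any(start <= pos <= end for start, end in cres)


def select_variants(variants, gene_start, gene_end, cres):
    """Keep (variant_id, score) for variants in a CRE and inside the gene window.

    variants: iterable of (variant_id, position, delta_svm_score) tuples.
    """
    return [
        (vid, score)
        for vid, pos, score in variants
        if in_gene_window(pos, gene_start, gene_end) and in_any_cre(pos, cres)
    ]


# Made-up gene, CREs and variants on one chromosome.
gene_start, gene_end = 100_000, 120_000
cres = [(60_000, 61_000), (130_000, 131_000), (10_000, 11_000)]
variants = [
    ("v1", 60_500, 2.1),    # in a CRE, inside the window
    ("v2", 130_200, -0.7),  # in a CRE, inside the window
    ("v3", 10_500, 3.0),    # in a CRE, but more than 50 Kb from the gene
    ("v4", 90_000, 1.2),    # inside the window, but not in any CRE
]

selected = select_variants(variants, gene_start, gene_end, cres)
print(selected)
```

The retained scores are the per-variant values that would then serve as weights in a group-wise association test.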

The New Frontier of Functional Genomics: From Chromatin Architecture and Noncoding RNAs to Therapeutic Targets

SLAS DISCOVERY Advancing Life Sciences ◽

10.1177/2472555220926158 ◽

2020 ◽

Vol 25 (6) ◽

pp. 568-580

Author(s):

Natali Papanicolaou ◽

Alessandro Bonetti

Keyword(s):

Gene Expression ◽

Association Studies ◽

Noncoding RNAs ◽

Regulatory Elements ◽

Long Noncoding RNAs ◽

Human Diseases ◽

Genome Wide Association Studies ◽

Complex Disorders ◽

Chromatin Architecture ◽

Common Diseases

Common diseases are complex, multifactorial disorders whose pathogenesis is influenced by the interplay of genetic predisposition and environmental factors. Genome-wide association studies have interrogated genetic polymorphisms across genomes of individuals to test associations between genotype and susceptibility to specific disorders, providing insights into the genetic architecture of several complex disorders. However, genetic variants associated with the susceptibility to common diseases are often located in noncoding regions of the genome, such as tissue-specific enhancers or long noncoding RNAs, suggesting that regulatory elements might play a relevant role in human diseases. Enhancers are cis-regulatory genomic sequences that act in concert with promoters to regulate gene expression in a precise spatiotemporal manner. They can be located at a considerable distance from their cognate target promoters, increasing the difficulty of their identification. Genomes are organized in domains of chromatin folding, namely topologically associating domains (TADs). Identification of enhancer–promoter interactions within TADs has revealed principles of cell-type specificity across several organisms and tissues. The vast majority of mammalian genomes are pervasively transcribed, accounting for a previously unappreciated complexity of the noncoding RNA fraction. Particularly, long noncoding RNAs have emerged as key players for the establishment of chromatin architecture and regulation of gene expression. In this perspective, we describe the new advances in the fields of transcriptomics and genome organization, focusing on the role of noncoding genomic variants in the predisposition to common diseases. Finally, we propose a new framework for the identification of the next generation of pharmacological targets for common human diseases.

Analysis of putative cis-regulatory elements regulating blood pressure variation

10.1101/820522 ◽

2019 ◽

Author(s):

Priyanka Nandakumar ◽

Dongwon Lee ◽

Thomas J. Hoffmann ◽

Georg B. Ehret ◽

Dan Arking ◽

...

Keyword(s):

Gene Expression ◽

Blood Pressure ◽

Cell Types ◽

Regulatory Elements ◽

Open Chromatin ◽

Genome Wide Association Studies ◽

Cell Type ◽

Functional Scores ◽

Cell Type Specific ◽

Different Tissues

Abstract Hundreds of loci have been associated with blood pressure traits from many genome-wide association studies. We identified an enrichment of these loci in aorta and tibial artery expression quantitative trait loci in our previous work in ∼100,000 Genetic Epidemiology Research on Aging (GERA) study participants. In the present study, we subsequently focused on determining putative regulatory regions for these and other tissues of relevance to blood pressure, to both fine-map these loci by pinpointing genes and variants of functional interest within them, and to identify any novel genes. We constructed maps of putative cis-regulatory elements using publicly available open chromatin data for the heart, aorta and tibial arteries, and multiple kidney cell types. Sequence variants within these regions may be evaluated quantitatively for their tissue- or cell-type-specific regulatory impact using deltaSVM functional scores, as described in our previous work. In order to identify genes of interest, we aggregate these variants in these putative cis-regulatory elements within 50 Kb of the start or end of genes considered as “expressed” in these tissues or cell types using publicly available gene expression data, and use the deltaSVM scores as weights in the well-known group-wise sequence kernel association test (SKAT). We test for association with both blood pressure traits as well as expression within these tissues or cell types of interest, and identify several genes, including MTHFR, C10orf32, CSK, NOV, ULK4, SDCCAG8, SCAMP5, RPP25, HDGFRP3, VPS37B, and PPCDC. Although our study centers on blood pressure traits, we additionally examined two known genes, SCN5A and NOS1AP, involved in the cardiac trait QT interval, in the Atherosclerosis Risk in Communities Study (ARIC), as a positive control, and observed an expected heart-specific effect. Thus, our method may be used to identify variants and genes for further functional testing using tissue- or cell-type-specific putative regulatory information.

Author Summary: Sequence changes in genes (“variants”) are linked to the presence and severity of different traits or diseases. However, as genes may be expressed in different tissues and at different times and degrees, using this information is expected to more accurately identify genes of interest. Variants within the genes are essential, but so are variants in the sequences (“regulatory elements”) that control the genes’ expression in different tissues or cell types. In this study, we aim to use this information about expression and variants potentially involved in gene expression regulation to better pinpoint genes and variants in regulatory elements of interest for blood pressure regulation. We do so by taking advantage of such data that are publicly available, and use methods to combine information about variants in aggregate within a gene’s putative regulatory elements in tissues thought to be relevant for blood pressure, and identify several genes, meant to enable experimental follow-up.
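The role of per-variant weights in a group-wise sequence kernel association test can be made concrete with a small Python sketch of a SKAT-style score statistic, Q = Σ_j w_j S_j², where S_j = Σ_i g_ij (y_i − ȳ). This is a simplified illustration under stated assumptions, not the authors' analysis: the null model is intercept-only (no covariates), the weights stand in for non-negative values derived from deltaSVM scores (how scores map to weights is assumed here), some implementations square the weights instead, and no p-value is computed. The function name `skat_q` and all numbers are hypothetical.

```python
def skat_q(genotypes, phenotype, weights):
    """SKAT-style statistic Q = sum_j w_j * S_j**2 under an intercept-only null.

    genotypes: list of per-individual rows, each a list of allele counts (0/1/2)
    phenotype: list of quantitative trait values, one per individual
    weights:   list of non-negative per-variant weights
    """
    n = len(phenotype)
    mean_y = sum(phenotype) / n
    residuals = [y - mean_y for y in phenotype]

    q = 0.0
    for j, w in enumerate(weights):
        # Score for variant j: genotype-weighted sum of phenotype residuals.
        s_j = sum(genotypes[i][j] * residuals[i] for i in range(n))
        q += w * s_j ** 2
    return q


# Made-up data: 4 individuals, 2 variants.
genotypes = [
    [0, 1],
    [1, 0],
    [2, 1],
    [0, 0],
]
phenotype = [1.0, 2.0, 4.0, 1.0]
weights = [2.0, 0.5]  # hypothetical weights, e.g. derived from |deltaSVM|

q = skat_q(genotypes, phenotype, weights)
print(q)
```

Up-weighting a variant increases its contribution to Q, which is how functional scores can focus the test on variants predicted to have regulatory impact.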

Leveraging brain cortex-derived molecular data to elucidate epigenetic and transcriptomic drivers of neurological function and disease

10.1101/429134 ◽

2018 ◽

Author(s):

Charlie Hatcher ◽

Caroline L. Relton ◽

Tom R. Gaunt ◽

Tom G. Richardson

Keyword(s):

Gene Expression ◽

DNA Methylation ◽

Histone Acetylation ◽

Genetic Variants ◽

Large Scale ◽

Genetic Variant ◽

Association Studies ◽

Genome Wide Association Studies ◽

Epigenetic Mechanisms ◽

Trait Variation

Abstract Integrative approaches which harness large-scale molecular datasets can help develop mechanistic insight into findings from genome-wide association studies (GWAS). We have performed extensive analyses to uncover transcriptional and epigenetic processes which may play a role in neurological trait variation. This was undertaken by applying Bayesian multiple-trait colocalization systematically across the genome to identify genetic variants responsible for influencing intermediate molecular phenotypes as well as neurological traits. In this analysis we leveraged high dimensional quantitative trait loci data derived from prefrontal cortex tissue (concerning gene expression, DNA methylation and histone acetylation) and GWAS findings for 5 neurological traits (Neuroticism, Schizophrenia, Educational Attainment, Insomnia and Alzheimer’s disease). There was evidence of colocalization for 118 associations suggesting that the same underlying genetic variant influenced both nearby gene expression as well as neurological trait variation. Of these, 73 associations provided evidence that the genetic variant also influenced proximal DNA methylation and/or histone acetylation. These findings support previous evidence at loci where epigenetic mechanisms may putatively mediate effects of genetic variants on traits, such as KLC1 and schizophrenia. We also uncovered evidence implicating novel loci in neurological disease susceptibility, including genes expressed predominantly in brain tissue such as MDGA1, KIRREL3 and SLC12A5. An inverse relationship between DNA methylation and gene expression was observed more than can be accounted for by chance, supporting previous findings implicating DNA methylation as a transcriptional repressor. Our study should prove valuable in helping future studies prioritise candidate genes and epigenetic mechanisms for in-depth functional follow-up analyses.

Inference of cell-type specific imprinted regulatory elements and genes during human neuronal differentiation

10.1101/2021.10.04.463060 ◽

2021 ◽

Author(s):

Dan Liang ◽

Nil Aygün ◽

Nana Matoba ◽

Folami Ideraabdullah ◽

Michael I Love ◽

...

Keyword(s):

Gene Expression ◽

Neural Progenitor Cells ◽

Association Studies ◽

CpG Islands ◽

Uniparental Disomy ◽

Cell Types ◽

Regulatory Elements ◽

Imprinted Genes ◽

Specific Cell ◽

Cell Type

Genomic imprinting results in gene expression biased by parental chromosome of origin and occurs in genes with important roles during human brain development. However, the cell-type and temporal specificity of imprinting during human neurogenesis is generally unknown. By detecting within-donor allelic biases in chromatin accessibility and gene expression that are unrelated to cross-donor genotype, we inferred imprinting in both primary human neural progenitor cells (phNPCs) and their differentiated neuronal progeny from up to 85 donors. We identified 43/20 putatively imprinted regulatory elements (IREs) in neurons/progenitors, and 133/79 putatively imprinted genes in neurons/progenitors. Though 10 IREs and 42 genes were shared between neurons and progenitors, most imprinting was only detected within specific cell types. In addition to well-known imprinted genes and their promoters, we inferred novel IREs and imprinted genes. We found IREs overlapped with CpG islands more than non-imprinted regulatory elements. Consistent with DNA methylation-based regulation of imprinted expression, some putatively imprinted regulatory elements also overlapped with differentially methylated regions on the maternal germline. Finally, we identified a progenitor-specific putatively imprinted gene that overlaps with copy number variation associated with uniparental disomy-like phenotypes. Our results can therefore be useful in interpreting the function of variants identified in future parent-of-origin association studies.
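A within-donor allelic bias of the kind mentioned in this abstract can be illustrated with an exact two-sided binomial test on reference and alternate read counts at one heterozygous site. This Python sketch is a generic textbook test, not the authors' inference procedure: it does not model the cross-donor genotype component, overdispersion or mapping bias. The function name and the read counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(ref_count, alt_count, p=0.5):
    """Exact two-sided binomial p-value for allelic imbalance.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one, under the null that each read is reference with probability p.
    """
    n = ref_count + alt_count
    probs = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_count]
    # Small relative tolerance so equally likely outcomes are not lost to rounding.
    total = sum(pr for pr in probs if pr <= observed * (1 + 1e-9))
    return min(1.0, total)


# Made-up read counts at a heterozygous site in one donor.
p_biased = binomial_two_sided_p(10, 0)    # all reads from one allele
p_balanced = binomial_two_sided_p(5, 5)   # perfectly balanced reads
print(p_biased, p_balanced)
```

Under imprinting, the favoured allele depends on parent of origin rather than on sequence, so the direction of bias at a site would be expected to vary between donors; the per-site test above only captures the strength of imbalance within a single donor.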

Genetic variant effects on gene expression in human pancreatic islets and their implications for T2D

Nature Communications ◽

10.1038/s41467-020-18581-8 ◽

2020 ◽

Vol 11 (1) ◽

Cited By ~ 6

Author(s):

Ana Viñuela ◽

Arushi Varshney ◽

Martijn van de Bunt ◽

Rashmi B. Prasad ◽

Olof Asplund ◽

...

Keyword(s):

Large Scale ◽

Association Studies ◽

Cell Types ◽

Genome Wide Association ◽

Regulatory Sequences ◽

Genome Wide Association Studies ◽

Risk Variants ◽

Genome Wide ◽

GWAS Signal ◽

Using Data

Abstract Most signals detected by genome-wide association studies map to non-coding sequence and their tissue-specific effects influence transcriptional regulation. However, key tissues and cell-types required for functional inference are absent from large-scale resources. Here we explore the relationship between genetic variants influencing predisposition to type 2 diabetes (T2D) and related glycemic traits, and human pancreatic islet transcription using data from 420 donors. We find: (a) 7741 cis-eQTLs in islets with a replication rate across 44 GTEx tissues between 40% and 73%; (b) marked overlap between islet cis-eQTL signals and active regulatory sequences in islets, with reduced eQTL effect size observed in the stretch enhancers most strongly implicated in GWAS signal location; (c) enrichment of islet cis-eQTL signals with T2D risk variants identified in genome-wide association studies; and (d) colocalization between 47 islet cis-eQTLs and variants influencing T2D or glycemic traits, including DGKB and TCF7L2. Our findings illustrate the advantages of performing functional and regulatory studies in disease relevant tissues.

Integration of methylation QTL and enhancer–target gene maps with schizophrenia GWAS summary results identifies novel genes

Bioinformatics ◽

10.1093/bioinformatics/btz161 ◽

2019 ◽

Vol 35 (19) ◽

pp. 3576-3583 ◽

Cited By ~ 7

Author(s):

Chong Wu ◽

Wei Pan

Keyword(s):

Quantitative Trait ◽

Large Scale ◽

Target Gene ◽

Association Studies ◽

Regulatory Elements ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Novel Genes ◽

Gene Pairs ◽

Genome Wide

Abstract Motivation: Most trait-associated genetic variants identified in genome-wide association studies (GWASs) are located in non-coding regions of the genome and thought to act through their regulatory roles. Results: To account for enriched association signals in DNA regulatory elements, we propose a novel and general gene-based association testing strategy that integrates enhancer-target gene pairs and methylation quantitative trait locus data with GWAS summary results; it aims to both boost statistical power for new discoveries and enhance mechanistic interpretability of any new discovery. By reanalyzing two large-scale schizophrenia GWAS summary datasets, we demonstrate that the proposed method could identify some significant and novel genes (containing no genome-wide significant SNPs nearby) that would have been missed by other competing approaches, including the standard and some integrative gene-based association methods, such as one incorporating enhancer-target gene pairs and one integrating expression quantitative trait loci. Availability and implementation: Software: wuchong.org/egmethyl.html. Supplementary information: Supplementary data are available at Bioinformatics online.

TWENTY-FOUR GENES ARE UPREGULATED IN PATIENTS WITH HYPOSPADIAS

Balkan Journal of Medical Genetics ◽

10.2478/bjmg-2013-0030 ◽

2013 ◽

Vol 16 (2) ◽

pp. 39-43 ◽

Cited By ~ 6

Author(s):

R. Karabulut ◽

Z. Turkyilmaz ◽

K. Sonmez ◽

G. Kumas ◽

Sg. Ergun ◽

...

Keyword(s):

Gene Expression ◽

Expression Profiles ◽

Association Studies ◽

Gene Expression Profiles ◽

Ventral Surface ◽

Genome Wide Association Studies ◽

Expression Array ◽

Large Patient ◽

Expression Studies ◽

Genome Wide

ABSTRACT Hypospadias is a congenital hypoplasia of the penis, with displacement of the urethral opening along the ventral surface, and has been reported to be one of the most common congenital anomalies, occurring in approximately 1:250 to 1:300 live births. As hypospadias is reported to be an easily diagnosed malformation at the crossroads of genetics and environment, it is important to study the genetic component in order to elucidate its etiology. In this study, the gene expression profiles in both human hypospadias tissues and normal penile tissues were studied by Human Gene Expression Array. Twenty-four genes were found to be upregulated. Among these, ATF3 and CYR61 have been reported previously. Other genes that have not been previously reported were also found to be upregulated: BTG2, CD69, CD9, DUSP1, EGR1, EIF4A1, FOS, FOSB, HBEGF, HNRNPUL1, IER2, JUN, JUNB, KLF2, NR4A1, NR4A2, PTGS2, RGS1, RTN4, SLC25A25, SOCS3 and ZFP36 (p < 0.05). Further studies, including genome-wide association studies (GWAS) with expression studies in a large patient group, will help us identify the candidate gene(s) in the etiology of hypospadias.
