Integrative analysis of epigenetics data identifies gene-specific regulatory elements

2021 ◽  
Author(s):  
Florian Schmidt ◽  
Alexander Marx ◽  
Nina Baumgarten ◽  
Marie Hebel ◽  
Martin Wegner ◽  
...  

Abstract Understanding how epigenetic variation in non-coding regions is involved in distal gene-expression regulation is an important problem. Regulatory regions can be associated with genes using large-scale datasets of epigenetic and expression data. However, for regions with complex epigenomic signals and for enhancers that regulate many genes, these associations are difficult to interpret. We present StitchIt, an approach that dissects epigenetic variation in a gene-specific manner to detect regulatory elements (REMs) without relying on peak calls in individual samples. StitchIt segments epigenetic signal tracks over many samples to infer the location and the target genes of a REM simultaneously. We show that this approach leads to more accurate and refined REM detection than standard methods, even on heterogeneous datasets, which are challenging to model. In addition, StitchIt REMs are highly enriched in experimentally determined chromatin interactions and expression quantitative trait loci. We validated several newly predicted REMs using CRISPR-Cas9 experiments, demonstrating the reliability of StitchIt. StitchIt is able to dissect regulation in superenhancers and predicts thousands of putative REMs that go unnoticed by peak-based approaches, suggesting that a large part of the regulome might be uncharted waters.
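The sketch below is a deliberately simplified illustration of gene-specific region detection across many samples. It is not StitchIt's actual segmentation algorithm. For one target gene, each fixed-width bin near the gene is scored by the Pearson correlation between its epigenetic signal and the gene's expression across samples. Runs of adjacent bins at or above a threshold are then merged into candidate regulatory regions. The function name candidate_rems, the binning and the 0.8 threshold are hypothetical choices.

```python
import numpy as np


def candidate_rems(signal, expression, threshold=0.8):
    """Hypothetical helper: find runs of adjacent bins whose signal tracks expression.

    signal: array (n_samples, n_bins) of epigenetic signal per bin near one gene.
    expression: array (n_samples,) with that gene's expression in the same samples.
    threshold: assumed minimum Pearson correlation for a bin to be kept.
    Returns a list of (first_bin, last_bin) tuples, both inclusive.
    """
    signal = np.asarray(signal, dtype=float)
    expression = np.asarray(expression, dtype=float)

    # Pearson correlation of every bin with expression across samples.
    s = signal - signal.mean(axis=0)
    e = expression - expression.mean()
    denom = np.sqrt((s ** 2).sum(axis=0) * (e ** 2).sum())
    # Constant bins (zero variance) get correlation 0 instead of dividing by zero.
    corr = np.divide(s.T @ e, denom, out=np.zeros(signal.shape[1]), where=denom > 0)

    # Merge consecutive passing bins into candidate regions.
    regions, start = [], None
    for i, passes in enumerate(corr >= threshold):
        if passes and start is None:
            start = i
        elif not passes and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(corr) - 1))
    return regions


# Toy data: 4 samples x 5 bins; bins 0, 1 and 3 rise with expression.
signal = [[1, 2, 4, 1, 5],
          [2, 4, 3, 2, 5],
          [3, 6, 2, 3, 5],
          [4, 8, 1, 4, 5]]
print(candidate_rems(signal, [1, 2, 3, 4]))  # [(0, 1), (3, 3)]
```

In the toy data, bin 2 falls as expression rises and bin 4 is constant, so neither is kept. A real method must also handle noise, many genes and variable region boundaries, which this sketch ignores.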

eLife ◽  
2018 ◽  
Vol 7 ◽  
Author(s):  
Lindsey E Montefiori ◽  
Debora R Sobreira ◽  
Noboru J Sakabe ◽  
Ivy Aneas ◽  
Amelia C Joslin ◽  
...  

Over 500 genetic loci have been associated with the risk of cardiovascular diseases (CVDs); however, most of these loci are located in gene-distal non-coding regions, and their target genes are not known. Here, we generated high-resolution promoter capture Hi-C (PCHi-C) maps in human induced pluripotent stem cells (iPSCs) and iPSC-derived cardiomyocytes (CMs) to provide a resource for identifying and prioritizing the functional targets of CVD associations. We validate these maps by demonstrating that promoters preferentially contact distal sequences that are enriched for tissue-specific transcription factor motifs and for chromatin marks that correlate with dynamic changes in gene expression. Using the CM PCHi-C map, we linked 1999 CVD-associated SNPs to 347 target genes. Remarkably, more than 90% of SNP-target gene interactions did not involve the nearest gene, and 40% of SNPs interacted with at least two genes, demonstrating the importance of considering long-range chromatin interactions when interpreting the functional targets of disease loci.
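Two statistics of this kind can be computed from a list of SNP-gene contacts. The first is the share of links that skip the nearest gene; the second is the share of SNPs linked to two or more genes. The sketch below shows one way to do it. It assumes all positions lie on one chromosome and defines the nearest gene by TSS distance. The function names and toy coordinates are invented for illustration and are not data from the study.

```python
def nearest_gene(pos, tss):
    """Gene whose TSS is closest to pos; assumes everything is on one chromosome."""
    return min(tss, key=lambda gene: abs(tss[gene] - pos))


def summarize_links(snps, tss, links):
    """snps: {snp_id: position}; tss: {gene: TSS position};
    links: list of (snp_id, gene) pairs, e.g. derived from promoter capture Hi-C.

    Returns (fraction of links whose gene is not the SNP's nearest gene,
             fraction of linked SNPs that contact at least two genes).
    """
    skips_nearest = sum(1 for snp, gene in links
                        if gene != nearest_gene(snps[snp], tss))
    # Collect the set of target genes per SNP.
    targets = {}
    for snp, gene in links:
        targets.setdefault(snp, set()).add(gene)
    multi = sum(1 for genes in targets.values() if len(genes) >= 2)
    return skips_nearest / len(links), multi / len(targets)


# Invented toy coordinates.
tss = {"A": 1_000, "B": 50_000, "C": 120_000}
snps = {"rs1": 2_000, "rs2": 110_000}
links = [("rs1", "B"), ("rs1", "C"), ("rs2", "C")]
print(summarize_links(snps, tss, links))
```

In this toy case, rs1 is closest to gene A but contacts only B and C, so two of the three links skip the nearest gene. Half of the SNPs contact two genes.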


2019 ◽  
Author(s):  
Il Bin Kim ◽  
Taeyeop Lee ◽  
Junehawk Lee ◽  
Jonghun Kim ◽  
Hyunseong Lee ◽  
...  

Three-dimensional chromatin structures regulate gene expression across the genome. The significance of de novo mutations (DNMs) affecting chromatin interactions in autism spectrum disorder (ASD) remains poorly understood. We generated 931 whole-genome sequences from Korean simplex families to detect DNMs and identified target genes dysregulated by noncoding DNMs via long-range chromatin interactions between regulatory elements. Notably, noncoding DNMs that affect chromatin interactions exhibited transcriptional dysregulation implicated in ASD risk. Correspondingly, the target genes were significantly involved in histone modification, prenatal brain development, and pregnancy. Noncoding and coding DNMs together contributed to low IQ in ASD. Indeed, in primitive neural stem cells derived from human induced pluripotent stem cells of an ASD subject, noncoding DNMs altered target gene expression via chromatin interactions. The emerging neurodevelopmental genes not previously implicated in ASD include CTNNA2, GRB10, IKZF1, PDE3B, and BACE1. Our results were reproducible in 517 probands from the MSSNG cohort. This work demonstrates that noncoding DNMs contribute to ASD via chromatin interactions.


2019 ◽  
Author(s):  
Guoliang Li ◽  
Tongkai Sun ◽  
Huidan Chang ◽  
Liuyang Cai ◽  
Ping Hong ◽  
...  

Abstract Understanding chromatin interactions is important because they shape chromosome conformation and link cis- and trans-regulatory elements to their target genes for transcriptional regulation. Chromatin Interaction Analysis with Paired-End Tag (ChIA-PET) sequencing is a genome-wide, high-throughput technology that detects chromatin interactions associated with a specific protein of interest. We previously developed ChIA-PET Tool in 2010 for ChIA-PET data analysis. Here we present an updated version, ChIA-PET Tool V3, a computational package for processing the next-generation sequencing data generated by ChIA-PET experiments. It processes both short-read and long-read ChIA-PET data with multithreading and reports summary statistics of the results in an HTML file. In this paper, we describe the design of ChIA-PET Tool V3 in detail and demonstrate how to install it and use it to analyze a specific ChIA-PET data set. Several other ChIA-PET data analysis tools are also available, including ChiaSig, MICC, Mango and ChIA-PET2. We compared our tool with other tools using the same public data set on the same machine. Most of the peaks detected by ChIA-PET Tool V3 overlap with those from the other tools, and significant chromatin interactions called by ChIA-PET Tool V3 show higher enrichment in aggregate peak analysis (APA) plots. ChIA-PET Tool V3 is open source and is available at GitHub (https://github.com/GuoliangLi-HZAU/ChIA-PET_Tool_V3/).
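One common step in ChIA-PET processing is grouping paired-end tags (PETs) whose two anchors both overlap. The size of each group then reflects the support for an interaction. The sketch below illustrates that idea with a simple union-find over all PET pairs. It is a generic illustration and not ChIA-PET Tool V3's implementation. The function names, the anchor extension parameter ext and the toy PETs are hypothetical.

```python
def overlaps(a, b, ext=0):
    """True if intervals a=(start, end) and b overlap after extending each by ext bp."""
    return a[0] - ext <= b[1] + ext and b[0] - ext <= a[1] + ext


def cluster_pets(pets, ext=0):
    """Group PETs whose left anchors and right anchors both overlap.

    pets: list of ((left_start, left_end), (right_start, right_end)) on one chromosome.
    Returns clusters as sorted lists of PET indices (O(n^2) pairwise comparison).
    """
    parent = list(range(len(pets)))

    def find(i):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(pets)):
        for j in range(i + 1, len(pets)):
            if (overlaps(pets[i][0], pets[j][0], ext)
                    and overlaps(pets[i][1], pets[j][1], ext)):
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(pets)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


pets = [((100, 150), (5000, 5050)),
        ((120, 170), (5020, 5070)),
        ((130, 180), (9000, 9050)),
        ((2000, 2050), (7000, 7050))]
print(cluster_pets(pets))  # [[0, 1], [2], [3]]
```

PET 2 shares a left anchor with PETs 0 and 1 but its right anchor lies elsewhere, so it forms its own cluster. Production tools replace the quadratic loop with sorted or indexed anchor comparisons.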


2021 ◽  
Author(s):  
Noha Osman ◽  
Abd-El-Monsif Shawky ◽  
Michal Brylinski

Abstract Background: Numerous genome-wide association studies (GWAS) conducted to date have revealed genetic variants associated with various diseases, including breast and prostate cancers. Despite the availability of these large-scale data, relatively few variants have been functionally characterized, mainly because the majority of single-nucleotide polymorphisms (SNPs) map to non-coding regions of the human genome. The functional characterization of these non-coding variants and the identification of their target genes remain challenging. Results: In this communication, we explore the potential functional mechanisms of non-coding SNPs by integrating GWAS with high-resolution chromosome conformation capture (Hi-C) data for breast and prostate cancers. We show that more genetic variants map to regulatory elements through the 3D genome structure than through the 1D linear genome, which lacks physical chromatin interactions. Importantly, the association of enhancers, transcription factors, and their target genes with breast and prostate cancers tends to be higher when these regulatory elements are mapped to high-risk SNPs through spatial interactions than when simple linear proximity is used. Finally, we demonstrate that topologically associating domains (TADs) carrying high-risk SNPs also contain gene regulatory elements whose association with cancer is generally higher than that of regulatory elements in control TADs containing no high-risk variants. Conclusions: Our results suggest that many SNPs may contribute to cancer development by affecting the expression of certain tumor-related genes through long-range chromatin interactions with gene regulatory elements. Integrating large-scale genetic datasets with the 3D genome structure offers an attractive and unique approach to systematically investigate the functional mechanisms of genetic variants in disease risk and progression.
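The sketch below contrasts two ways of mapping SNPs to regulatory elements. Linear mapping counts an enhancer within a fixed distance. Spatial mapping counts an enhancer in any Hi-C bin that the SNP's bin contacts, including the SNP's own bin. It is a hypothetical illustration, not the authors' pipeline. The window size, bin size, data structures and toy inputs are all assumptions, and the contact set is assumed to be already filtered for significance.

```python
def linear_hits(snps, enhancers, window):
    """SNPs with an enhancer within `window` bp (1D linear proximity).

    snps: {snp_id: position}; enhancers: list of (start, end) on one chromosome.
    """
    return {snp for snp, pos in snps.items()
            if any(start - window <= pos <= end + window for start, end in enhancers)}


def spatial_hits(snps, enhancers, contacts, bin_size):
    """SNPs whose Hi-C bin is, or contacts, a bin overlapping an enhancer.

    contacts: set of frozenset({bin_a, bin_b}) significant Hi-C contacts (hypothetical input).
    """
    enhancer_bins = set()
    for start, end in enhancers:
        enhancer_bins.update(range(start // bin_size, end // bin_size + 1))
    hits = set()
    for snp, pos in snps.items():
        b = pos // bin_size
        # The SNP's own bin plus every bin it is in contact with.
        partners = {b} | {x for pair in contacts if b in pair for x in pair}
        if partners & enhancer_bins:
            hits.add(snp)
    return hits


enhancers = [(95_000, 96_000)]
snps = {"rs1": 12_000, "rs2": 94_000, "rs3": 50_000}
contacts = {frozenset({1, 9})}  # bin 1 (rs1) loops to bin 9 (enhancer)
print(sorted(linear_hits(snps, enhancers, window=5_000)))         # ['rs2']
print(sorted(spatial_hits(snps, enhancers, contacts, 10_000)))    # ['rs1', 'rs2']
```

Here rs1 lies about 83 kb from the enhancer and is missed by linear mapping. A Hi-C loop between their bins recovers it, which is the kind of gain the abstract describes.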


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Chiara Regondi ◽  
Maddalena Fratelli ◽  
Giovanna Damia ◽  
Federica Guffanti ◽  
Monica Ganzinelli ◽  
...  

Abstract Background In-depth analysis of the regulatory networks of genes aberrantly expressed in cancer is essential for better understanding tumors and identifying key genes that could be therapeutically targeted. Results We developed a quantitative analysis approach to investigate the main biological relationships among different regulatory elements and target genes, and we applied it to ovarian serous cystadenocarcinoma and 177 target genes belonging to three main pathways (DNA REPAIR, STEM CELLS and GLUCOSE METABOLISM) relevant to this tumor. Combining data from the ENCODE and TCGA datasets, we built a predictive linear model for the regulation of each target gene, assessing the relationships between its expression, promoter methylation, the expression of genes in the same or in the other pathways, and the expression of putative transcription factors. We confirmed the reliability and significance of our approach in a similar tumor type (basal-like breast cancer) and with a different existing algorithm (ARACNe), and we obtained experimental confirmation of potentially interesting results. Conclusions Analysis of the proposed models revealed the relations between a gene and its related biological processes and the interconnections between the different gene sets, and allowed the relevant regulatory elements to be evaluated at the single-gene level. This led to the identification of already known regulators and/or gene correlations and unveiled a set of previously unknown and potentially interesting biological relationships of possible pharmacological and clinical use.
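A per-gene linear regulation model of this kind can be sketched with ordinary least squares. In the sketch below, a target gene's expression is regressed on its promoter methylation and the expression of one putative transcription factor across samples. This is a minimal sketch and not the authors' model. Their models also include genes from the same and other pathways and may use a different fitting procedure. The function name and toy data are hypothetical.

```python
import numpy as np


def fit_regulation_model(methylation, tf_expression, target_expression):
    """OLS fit of target ~ b0 + b1 * methylation + b2 * tf_expression.

    Each argument holds one value per sample. Returns (coefficients, R^2),
    with coefficients ordered as [intercept, methylation, TF expression].
    """
    X = np.column_stack([np.ones(len(methylation)), methylation, tf_expression])
    y = np.asarray(target_expression, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    centered = y - y.mean()
    r2 = 1 - (residuals @ residuals) / (centered @ centered)
    return coef, r2


# Toy samples generated from a known rule: target = 0.5 - 1.0*meth + 2.0*tf.
meth = np.array([0.1, 0.4, 0.2, 0.8, 0.5, 0.3])
tf = np.array([1.0, 2.0, 3.0, 1.5, 2.5, 0.5])
target = 0.5 - 1.0 * meth + 2.0 * tf
coef, r2 = fit_regulation_model(meth, tf, target)
print(np.round(coef, 6), round(r2, 6))
```

Because the toy data are noise-free, the fit recovers the generating coefficients and an R^2 of essentially 1. On real expression data, the sign and size of each coefficient are what suggest activating or repressing relationships.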


2017 ◽  
Author(s):  
Yanli Wang ◽  
Bo Zhang ◽  
Lijun Zhang ◽  
Lin An ◽  
Jie Xu ◽  
...  

ABSTRACT The recent advent of 3C-based technologies such as Hi-C and ChIA-PET provides an opportunity to explore chromatin interactions and 3D genome organization at an unprecedented scale and resolution. However, visualizing chromatin interaction data remains a challenge because of its size and complexity. Here, we introduce the 3D Genome Browser (http://3dgenome.org), which allows users to conveniently explore both publicly available chromatin interaction data and their own. Users can also seamlessly integrate other “omics” data sets, such as ChIP-Seq and RNA-Seq for the same genomic region, to gain a complete view of both the regulatory landscape and the 3D genome structure for any given gene. Finally, our browser provides multiple methods to link distal cis-regulatory elements with their potential target genes, including virtual 4C, ChIA-PET, Capture Hi-C and cross-cell-type correlation of proximal and distal DNA hypersensitive sites, and therefore represents a valuable resource for the study of gene regulation in mammalian genomes.
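Of the linking methods listed above, virtual 4C is the simplest to illustrate. It takes the row of a Hi-C contact matrix that corresponds to a chosen viewpoint, which gives that locus's contact profile with every other bin. The sketch below is minimal and makes several assumptions: a dense, already-normalized matrix for a single chromosome, a hypothetical function name, and a hypothetical min_sep parameter. min_sep masks bins close to the viewpoint before picking the strongest distal partner. The browser's own implementation is not shown here.

```python
import numpy as np


def virtual_4c(contact_matrix, bin_size, viewpoint_pos, min_sep=1):
    """Return (bin_starts, profile, strongest_partner_start) for one viewpoint.

    contact_matrix: square (n_bins, n_bins) Hi-C matrix for one chromosome.
    min_sep: bins within this many bins of the viewpoint are ignored when picking
             the strongest partner, since near-diagonal contacts dominate.
    """
    m = np.asarray(contact_matrix, dtype=float)
    v = viewpoint_pos // bin_size          # bin containing the viewpoint
    profile = m[v].copy()                  # the virtual 4C track
    starts = np.arange(m.shape[0]) * bin_size
    far = np.abs(np.arange(m.shape[0]) - v) > min_sep
    best = int(np.argmax(np.where(far, profile, -np.inf)))
    return starts, profile, int(starts[best])


M = [[60, 8, 3, 1, 0],
     [8, 50, 9, 1, 6],
     [3, 9, 55, 7, 2],
     [1, 1, 7, 40, 5],
     [0, 6, 2, 5, 45]]
starts, profile, partner = virtual_4c(M, bin_size=1_000, viewpoint_pos=1_500)
print(profile, partner)
```

With a viewpoint at position 1,500 (bin 1), bins 0 to 2 are masked. The strongest remaining contact is bin 4, which starts at 4,000.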


Author(s):  
Tianshun Gao ◽  
Jiang Qian

Abstract Enhancers are distal cis-regulatory elements that activate the transcription of their target genes. They regulate a wide range of important biological functions and processes, including embryogenesis, development, and homeostasis. As more large-scale technologies have been developed for enhancer identification, a comprehensive database for annotating enhancers based on various genome-wide profiling datasets across different species has become highly desirable. Here, we present an updated database, EnhancerAtlas 2.0 (http://www.enhanceratlas.org/indexv2.php), covering 586 tissue/cell types, including a large number of normal tissues, cancer cell lines, and cells at different developmental stages across nine species. Overall, the database contains 13 494 603 enhancers, which were obtained from 16 055 datasets using 12 high-throughput experimental methods (e.g. H3K4me1/H3K27ac, DNase-seq/ATAC-seq, P300, POLR2A, CAGE, ChIA-PET, GRO-seq, STARR-seq and MPRA). The updated version is a large expansion of the first version, which contained only enhancers in human cells. In addition, we predicted enhancer–target gene relationships in human, mouse and fly. Finally, users can search enhancers and enhancer–target gene relationships through five user-friendly, interactive modules. We believe the new enhancer annotations in EnhancerAtlas 2.0 will help users perform functional analyses of enhancers in various genomes.


2020 ◽  
Author(s):  
Pavel P. Kuksa ◽  
Chien-Yueh Lee ◽  
Alexandre Amlie-Wolf ◽  
Prabhakaran Gangadharan ◽  
Elizabeth E. Mlynarski ◽  
...  

Abstract Summary: We report SparkINFERNO (Spark-based INFERence of the molecular mechanisms of NOn-coding genetic variants), a scalable bioinformatics pipeline for characterizing noncoding GWAS association findings. SparkINFERNO prioritizes causal variants underlying GWAS association signals and reports the relevant regulatory elements, tissue contexts, and plausible target genes they affect. To achieve this, the SparkINFERNO algorithm integrates GWAS summary statistics with a large-scale collection of functional genomics datasets spanning enhancer activity, transcription factor binding, expression quantitative trait loci, and other functional datasets across more than 400 tissues and cell types. Scalability is achieved through an underlying API implemented with Apache Spark and Giggle-based genomic indexing. We evaluated SparkINFERNO on large GWAS studies and show that it is more than 60 times more efficient and scales with data size and the amount of computational resources. Availability: SparkINFERNO runs on clusters or on a single server with an Apache Spark environment, and is available at https://bitbucket.org/wanglab-upenn/SparkINFERNO or https://hub.docker.com/r/wanglab/
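The core operation behind overlaying variants on functional annotations is an interval overlap query. It asks which annotated regions overlap a variant's position. The sketch below shows a naive per-chromosome index in which intervals are sorted by start, with a binary search that limits the candidates. It stands in for GIGGLE, which uses a far more scalable index, and for Spark's distributed execution; neither is reproduced here. The class name, half-open coordinate convention and toy annotations are hypothetical.

```python
from bisect import bisect_right


class IntervalIndex:
    """Naive overlap index (not GIGGLE): per-chromosome intervals sorted by start."""

    def __init__(self, intervals):
        # intervals: iterable of (chrom, start, end, label), half-open [start, end).
        self.by_chrom = {}
        for chrom, start, end, label in intervals:
            self.by_chrom.setdefault(chrom, []).append((start, end, label))
        for ivs in self.by_chrom.values():
            ivs.sort()
        self.starts = {c: [iv[0] for iv in ivs] for c, ivs in self.by_chrom.items()}

    def query(self, chrom, start, end):
        """Labels of intervals overlapping [start, end), in order of start."""
        ivs = self.by_chrom.get(chrom, [])
        # Only intervals that start before the query end can overlap it.
        k = bisect_right(self.starts.get(chrom, []), end - 1)
        return [label for s, e, label in ivs[:k] if e > start]


index = IntervalIndex([
    ("chr1", 100, 200, "enhancer_A"),
    ("chr1", 150, 400, "tfbs_B"),
    ("chr1", 500, 600, "eqtl_C"),
    ("chr2", 100, 200, "enhancer_D"),
])
print(index.query("chr1", 160, 161))  # a single-base variant at chr1:160
```

The binary search only removes intervals that start after the query. Long intervals can still force a linear scan, which is one reason dedicated genomic indexes exist.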


2021 ◽  
Vol 12 ◽  
Author(s):  
Yu Ding ◽  
Jiannan Zhu ◽  
Dongsheng Zhao ◽  
Qiaoquan Liu ◽  
Qingqing Yang ◽  
...  

Rice is the most important source of food worldwide, providing energy and nutrition for more than half of the world's population. Rice grain quality is a complex trait that is affected by several factors, such as genotype and environment, and is a major target for rice breeders. Cis-regulatory elements (CREs) are regions of non-coding DNA that play a critical role in regulating gene expression. Compared with gene knockout, CRE modification can fine-tune the expression levels of target genes. Genome editing has provided opportunities to modify the genomes of organisms in a precise and predictable way. Recently, modifying the promoters of coding genes with genome editing technologies has become a popular approach to plant improvement. In this study, we reviewed the results of recent studies on the identification, characterization, and application of CREs involved in rice grain quality. We proposed CREs as preferred potential targets for creating allelic diversity and improving quality traits via genome editing strategies in rice. We also discussed potential challenges and experimental considerations for improving grain quality in crop plants.


2020 ◽  
Author(s):  
Noha Osman ◽  
Michal Brylinski

Abstract Numerous genome-wide association studies (GWAS) conducted to date have revealed genetic variants associated with various diseases, including breast and prostate cancers. Despite the availability of these large-scale data, relatively few variants have been functionally characterized, mainly because the majority of single-nucleotide polymorphisms (SNPs) map to non-coding regions of the human genome. The functional characterization of these non-coding variants and the identification of their target genes remain challenging. In this communication, we explore the potential functional mechanisms of non-coding SNPs by integrating GWAS with high-resolution chromosome conformation capture (Hi-C) data for breast and prostate cancers. We show that more genetic variants map to regulatory elements through the 3D genome structure than through the 1D linear genome, which lacks physical chromatin interactions. Importantly, the association of enhancers, transcription factors, and their target genes with breast and prostate cancers tends to be higher when these regulatory elements are mapped to high-risk SNPs through spatial interactions than when simple linear proximity is used. Finally, we demonstrate that topologically associating domains (TADs) carrying high-risk SNPs also contain gene regulatory elements whose association with cancer is generally higher than that of regulatory elements in control TADs containing no high-risk variants. Our results suggest that many SNPs may contribute to cancer development by affecting the expression of certain tumor-related genes through long-range chromatin interactions with gene regulatory elements. Integrating large-scale genetic datasets with the 3D genome structure offers an attractive and unique approach to systematically investigate the functional mechanisms of genetic variants in disease risk and progression.

