Chromatin Interaction Neural Network (ChINN): A machine learning-based method for predicting chromatin interactions from DNA sequences

2020 ◽  
Author(s):  
Fan Cao ◽  
Yu Zhang ◽  
Yichao Cai ◽  
Sambhavi Animesh ◽  
Ying Zhang ◽  
...  

Abstract: Chromatin interactions play important roles in regulating gene expression. However, the availability of genome-wide chromatin interaction data is limited. Various computational methods have been developed to predict chromatin interactions. Most of these methods rely on large collections of ChIP-Seq/RNA-Seq/DNase-Seq datasets and predict only enhancer-promoter interactions. Some of the ‘state-of-the-art’ methods have poor experimental designs, leading to exaggerated performance and misleading conclusions. Here we developed a computational method, Chromatin Interaction Neural Network (ChINN), to predict chromatin interactions between open chromatin regions by using only DNA sequences of the interacting open chromatin regions. ChINN is able to predict CTCF-, RNA polymerase II- and HiC-associated chromatin interactions between open chromatin regions. ChINN also shows good across-sample performance and captures various sequence features that are predictive of chromatin interactions. To extend our results to clinical patient data, we applied ChINN to predict chromatin interactions in 6 chronic lymphocytic leukemia (CLL) patient samples and in a previously published cohort of open chromatin data from 84 CLL samples. Our results demonstrated extensive heterogeneity in chromatin interactions in patient samples, and one of the sources of this heterogeneity was the different subtypes of CLL.
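The abstract above states that ChINN takes only the DNA sequences of the two interacting open chromatin regions as input. A minimal sketch of how such a sequence pair might be turned into numeric input is shown below. One-hot encoding is an assumption here (the abstract does not say how sequences are encoded), and the names `one_hot` and `encode_pair` are hypothetical, not taken from the ChINN software.

```python
# Illustrative only: one-hot encode the two anchors of a candidate interaction.
# The encoding scheme and function names are assumptions, not ChINN's actual code.
BASES = "ACGT"


def one_hot(seq):
    """Return one 4-element row per base; unknown bases (e.g. N) become all zeros."""
    rows = []
    for base in seq.upper():
        rows.append([1 if base == b else 0 for b in BASES])
    return rows


def encode_pair(left_seq, right_seq):
    """Encode both anchor sequences and return them as a (left, right) tuple."""
    return one_hot(left_seq), one_hot(right_seq)


left, right = encode_pair("ACGTN", "ttga")
```

Each anchor becomes a length-by-4 matrix, which is the usual input shape for a sequence-based convolutional model; a real pipeline would also need to fix or pad the sequence length.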

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Fan Cao ◽  
Yu Zhang ◽  
Yichao Cai ◽  
Sambhavi Animesh ◽  
Ying Zhang ◽  
...  

Abstract: Chromatin interactions play important roles in regulating gene expression. However, the availability of genome-wide chromatin interaction data is limited. We develop a computational method, chromatin interaction neural network (ChINN), to predict chromatin interactions between open chromatin regions using only DNA sequences. ChINN predicts CTCF- and RNA polymerase II-associated and Hi-C chromatin interactions. ChINN shows good across-sample performances and captures various sequence features for chromatin interaction prediction. We apply ChINN to 6 chronic lymphocytic leukemia (CLL) patient samples and a published cohort of 84 CLL open chromatin samples. Our results demonstrate extensive heterogeneity in chromatin interactions among CLL patient samples.


2019 ◽  
Author(s):  
Fan Cao ◽  
Ying Zhang ◽  
Yan Ping Loh ◽  
Yichao Cai ◽  
Melissa J. Fullwood

Abstract: Chromatin interactions play important roles in regulating gene expression. However, the availability of genome-wide chromatin interaction data is very limited. Various computational methods have been developed to predict chromatin interactions. Most of these methods rely on large collections of ChIP-Seq/RNA-Seq/DNase-Seq datasets and predict only enhancer-promoter interactions. Some of the ‘state-of-the-art’ methods have poor experimental designs, leading to exaggerated performance and misleading conclusions. Here we developed a computational method, Chromatin Interaction Neural Network (CHINN), to predict chromatin interactions between open chromatin regions by using only DNA sequences of the interacting open chromatin regions. CHINN is able to predict CTCF- and RNA polymerase II-associated chromatin interactions between open chromatin regions. CHINN also shows good across-sample performance and captures various sequence features that are predictive of chromatin interactions. We applied CHINN to 84 chronic lymphocytic leukemia (CLL) samples and detected systematic differences in the chromatin interactome between IGVH-mutated and IGVH-unmutated CLL samples.


2019 ◽  
Vol 36 (6) ◽  
pp. 1704-1711
Author(s):  
Artur Jaroszewicz ◽  
Jason Ernst

Abstract Motivation: Chromatin interactions play an important role in genome architecture and gene regulation. The Hi-C assay generates such interaction maps genome-wide, but at relatively low resolutions (e.g. 5-25 kb), which is substantially coarser than the resolution of transcription factor binding sites or open chromatin sites that are potential sources of such interactions. Results: To predict the sources of Hi-C-identified interactions at a high resolution (e.g. 100 bp), we developed a computational method that integrates data from DNase-seq and ChIP-seq of TFs and histone marks. Our method, χ-CNN, uses this data to first train a convolutional neural network (CNN) to discriminate between called Hi-C interactions and non-interactions. χ-CNN then predicts the high-resolution source of each Hi-C interaction using a feature attribution method. We show these predictions recover original Hi-C peaks after extending them to be coarser. We also show χ-CNN predictions enrich for evolutionarily conserved bases, eQTLs and CTCF motifs, supporting their biological significance. χ-CNN provides an approach for analyzing important aspects of genome architecture and gene regulation at a higher resolution than previously possible. Availability and implementation: χ-CNN software is available on GitHub (https://github.com/ernstlab/X-CNN). Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Artur Jaroszewicz ◽  
Jason Ernst

Abstract: Chromatin interactions play an important role in genome architecture and regulation. The Hi-C assay generates such interaction maps genome-wide, but at relatively low resolutions (e.g., 5-25 kb), which is substantially coarser than the resolution of transcription factor binding sites or open chromatin sites that are potential sources of such interactions. To predict the sources of Hi-C-identified interactions at a high resolution (e.g., 100 bp), we developed a computational method that integrates ChIP-seq data of transcription factors and histone marks and DNase-seq data. Our method, χ-SCNN, uses this data to first train a Siamese Convolutional Neural Network (SCNN) to discriminate between called Hi-C interactions and non-interactions. χ-SCNN then predicts the high-resolution source of each Hi-C interaction using a feature attribution method. We show these predictions recover original Hi-C peaks after extending them to be coarser. We also show χ-SCNN predictions enrich for evolutionarily conserved bases, eQTLs, and CTCF motifs, supporting their biological significance. χ-SCNN provides an approach for analyzing important aspects of genome architecture and regulation at a higher resolution than previously possible. χ-SCNN software is available on GitHub (https://github.com/ernstlab/X-SCNN).


Genes ◽  
2021 ◽  
Vol 12 (3) ◽  
pp. 354
Author(s):  
Lu Zhang ◽  
Xinyi Qin ◽  
Min Liu ◽  
Ziwei Xu ◽  
Guangzhong Liu

As a prevalent post-transcriptional modification of RNA, N6-methyladenosine (m6A) plays a crucial role in various biological processes. To better reveal its regulatory mechanism and provide new insights for drug design, accurate genome-wide identification of m6A sites is vital. Because traditional experimental methods are time-consuming and cost-prohibitive, a more efficient computational method is needed to detect m6A sites. In this study, we propose a novel cross-species computational method, DNN-m6A, based on a deep neural network (DNN), to identify m6A sites in multiple tissues of human, mouse and rat. First, binary encoding (BE), tri-nucleotide composition (TNC), enhanced nucleic acid composition (ENAC), K-spaced nucleotide pair frequencies (KSNPFs), nucleotide chemical property (NCP), pseudo dinucleotide composition (PseDNC), position-specific nucleotide propensity (PSNP) and position-specific dinucleotide propensity (PSDP) are employed to extract RNA sequence features, which are subsequently fused to construct the initial feature vector set. Second, we use elastic net to eliminate redundant features and build the optimal feature subset. Finally, the hyper-parameters of the DNN are tuned with Bayesian hyper-parameter optimization based on the selected feature subset. Five-fold cross-validation on the training datasets shows that the proposed DNN-m6A method outperforms the state-of-the-art method for predicting m6A sites, with an accuracy (ACC) of 73.58%–83.38% and an area under the curve (AUC) of 81.39%–91.04%. Furthermore, on the independent datasets the method achieved an ACC of 72.95%–83.04% and an AUC of 80.79%–91.09%, which shows the excellent generalization ability of our proposed method.
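To make the feature-fusion step described above more concrete, the sketch below computes two of the listed encodings, binary encoding (BE) and tri-nucleotide composition (TNC), and concatenates them into one vector. It is a simplified illustration under assumptions: the base order `ACGU`, the conversion of T to U, and the function names are all hypothetical, and the other six encodings, the elastic net selection and the DNN itself are omitted.

```python
from itertools import product

# Assumed base order for both encodings; the paper's exact ordering is not given here.
NUCLEOTIDES = "ACGU"


def binary_encoding(seq):
    """BE: each base becomes a 4-bit indicator vector, concatenated along the sequence."""
    vec = []
    for base in seq:
        vec.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vec


def trinucleotide_composition(seq):
    """TNC: frequency of each of the 64 possible 3-mers over all overlapping windows."""
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=3)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - 2  # number of overlapping 3-base windows
    for i in range(total):
        kmer = seq[i:i + 3]
        if kmer in counts:  # windows containing unknown bases are skipped
            counts[kmer] += 1
    return [counts[k] / total if total > 0 else 0.0 for k in kmers]


def fused_features(seq):
    """Concatenate BE and TNC into a single feature vector (the 'fusion' step)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as a convenience
    return binary_encoding(seq) + trinucleotide_composition(seq)


features = fused_features("GGACU")
```

For a sequence of length n, this yields 4n binary values followed by 64 frequencies; a feature selector such as elastic net would then operate on vectors of this kind.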


2020 ◽  
Author(s):  
Claire Marchal ◽  
Nivedita Singh ◽  
Ximena Corso-Díaz ◽  
Anand Swaroop

Abstract: Three-dimensional (3D) conformation of the chromatin is crucial to stringently regulate gene expression patterns and DNA replication in a cell-type-specific manner. HiC is a key technique for measuring 3D chromatin interactions genome-wide. Estimating and predicting the resolution of a library is an essential step in any HiC experimental design. Here, we present the mathematical concepts to estimate the resolution of a library and predict whether deeper sequencing would enhance the resolution. We have developed HiCRes, a Docker pipeline, by applying these concepts to human and mouse HiC libraries.
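The abstract above does not spell out how library resolution is defined. As a rough illustration, the sketch below assumes one definition commonly used for Hi-C maps: the smallest bin size at which at least 80% of bins contain at least 1,000 contacts. Both thresholds, the input layout and the function names are assumptions for this example and are not taken from the HiCRes pipeline.

```python
def passes_coverage(bin_contacts, min_contacts=1000, min_fraction=0.8):
    """True if enough bins reach the contact threshold (thresholds are assumed defaults)."""
    if not bin_contacts:
        return False
    covered = sum(1 for c in bin_contacts if c >= min_contacts)
    return covered / len(bin_contacts) >= min_fraction


def estimate_resolution(contacts_by_bin_size, min_contacts=1000, min_fraction=0.8):
    """Return the smallest bin size (in bp) that passes the coverage test, else None.

    contacts_by_bin_size maps a bin size in bp to the list of per-bin contact counts
    obtained when the library is binned at that size (hypothetical input format).
    """
    for bin_size in sorted(contacts_by_bin_size):
        if passes_coverage(contacts_by_bin_size[bin_size], min_contacts, min_fraction):
            return bin_size
    return None


# Toy counts: at 5 kb only 2 of 4 bins reach 1,000 contacts; at 10 kb, 4 of 5 do.
toy_library = {
    5000: [1200, 900, 500, 1500],
    10000: [2100, 2000, 1100, 1500, 900],
}
resolution = estimate_resolution(toy_library)
```

Under this definition, predicting whether deeper sequencing helps amounts to asking how the per-bin counts would grow with more reads, which this sketch does not attempt to model.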


2019 ◽  
Vol 20 (S15) ◽  
Author(s):  
Hongda Bu ◽  
Jiaqi Hao ◽  
Yanglan Gan ◽  
Shuigeng Zhou ◽  
Jihong Guan

Abstract Background: Super-enhancers (SEs) are clusters of transcriptionally active enhancers, which dictate the expression of genes defining cell identity and play an important role in the development and progression of tumors and other diseases. Many key cancer oncogenes are driven by super-enhancers, and the mutations associated with common diseases such as Alzheimer’s disease are significantly enriched in super-enhancers. Super-enhancers have shown great potential for the identification of key oncogenes and the discovery of disease-associated mutational sites. Results: In this paper, we propose a new computational method called DEEPSEN for predicting super-enhancers based on a convolutional neural network. The proposed method integrates 36 kinds of features. Compared with existing approaches, our method performs better and can be used for genome-wide prediction of super-enhancers. In addition, we screen important features for predicting super-enhancers. Conclusion: The convolutional neural network is effective in boosting the performance of super-enhancer prediction.


2014 ◽  
Vol 32 (4_suppl) ◽  
pp. 464-464
Author(s):  
Thai Huu Ho ◽  
Jeong-Heon Lee ◽  
Rafael Nunez Nateras ◽  
Erik P. Castle ◽  
Melissa L. Stanton ◽  
...  

464 Background: Although the von Hippel-Lindau (VHL) tumor suppressor gene is mutated in 60% of ccRCC, deletion of VHL in mice is insufficient for tumorigenesis. Sequencing of ccRCC tumors identified mutations in SETD2, a histone H3 lysine 36 (H3K36) trimethyltransferase. We hypothesize that loss of SETD2 methyltransferase activity alters the genome-wide pattern of H3K36 trimethylation (H3K36me3) in ccRCC, and contributes to the cancer phenotype. Methods: To generate a genome-wide profile of H3K36me3 in frozen nephrectomy samples and RCC cell lines, we optimized a chromatin immunoprecipitation (ChIP) protocol for the isolation of DNA associated with H3K36me3. H3K36me3 is associated with open chromatin, and an H3K36me3-specific antibody was used for immunoprecipitation of endogenous H3K36me3-bound DNA. ChIP PCR primers were optimized for active genes, such as actin and glyceraldehyde-3-phosphate dehydrogenase (GAPDH), and for a “gene desert” on chromosome 12 (negative control). ChIP libraries were then generated from 3 paired uninvolved kidney and RCC samples and 2 RCC cell lines. To identify H3K36me3-upregulated regions in uninvolved kidney and RCC, reads from the ChIP sequencing were mapped to the human genome using the Burrows-Wheeler Aligner and SICER algorithms. Results: Using ChIP PCR, we found that active genomic regions were enriched 15-30 fold over the negative controls, indicating that the quality and yield of immunoprecipitated DNA/chromatin complexes from frozen tissue were sufficient for ChIP sequencing. A preliminary ChIP sequencing analysis of RCC cell lines and frozen ccRCC tissue indicates that H3K36me3-enriched DNA sequences mapped to exons (31.3%) more often than to introns (13.5%, p<0.001), consistent with the role of H3K36me3 in transcription. Conclusions: Genomic regions enriched for H3K36me3 binding were identified from patient-derived tissue and RCC cell lines. Current efforts are focused on comparing the H3K36me3 profiles between matched tumor and uninvolved kidney ChIP libraries to generate a genome-wide map of dysregulated H3K36me3 modifications.


2021 ◽  
Vol 7 (26) ◽  
pp. eabf8962
Author(s):  
Ke Xiao ◽  
Dan Xiong ◽  
Gong Chen ◽  
Jinsong Yu ◽  
Yue Li ◽  
...  

Like most DNA viruses, herpesviruses precisely deliver their genomes into the highly organized nuclei of infected host cells to initiate subsequent transcription and replication. However, it remains elusive how the viral genome specifically interacts with the host genome and hijacks the host transcription machinery. Using pseudorabies virus (PRV) as a model virus, we performed chromosome conformation capture assays to demonstrate a genome-wide specific trans-species chromatin interaction between the virus and host. Our data show that the PRV genome is delivered by the host DNA binding protein RUNX1 into the open chromatin and active transcription zone. This facilitates viral hijacking of host RNAPII to efficiently transcribe viral genes, which is significantly inhibited by either a RUNX1 inhibitor or RNA interference. Together, these findings provide insights into the chromatin interaction between viral and host genomes and identify new areas of research to advance the understanding of herpesvirus genome transcription.


2020 ◽  
Author(s):  
Jing Zhang ◽  
Jason Liu ◽  
Donghoon Lee ◽  
Shaoke Lou ◽  
Zhanlin Chen ◽  
...  

Abstract Background: During transcription, numerous transcription factors (TFs) bind to targets in a highly coordinated manner to control gene expression. Alterations in groups of TF-binding profiles (i.e. “co-binding changes”) can affect the co-regulating associations between TFs (i.e. “rewiring the co-regulator network”). This, in turn, can potentially drive downstream expression changes, phenotypic variation, and even disease. However, quantification of co-regulatory network rewiring has not been comprehensively studied. Methods: To address this, we propose DiNeR, a computational method to directly construct a differential TF co-regulation network from paired disease-to-normal ChIP-seq data. Specifically, DiNeR uses a graphical model to capture the gained and lost edges in the co-regulation network. Then, it adopts a stability-based, sparsity-tuning criterion (sub-sampling the complete binding profiles to remove spurious edges) to report only significant co-regulation alterations. Finally, DiNeR highlights hubs in the resultant differential network as key TFs associated with disease. Results: We assembled genome-wide binding profiles of 104 TFs in the K562 and GM12878 cell lines, which loosely model the transition between normal and cancerous states in chronic myeloid leukemia (CML). In total, we identified 351 significantly altered TF co-regulation pairs. In particular, we found that the co-binding of the tumor suppressor BRCA1 and RNA polymerase II, a well-known transcriptional pair in healthy cells, was disrupted in tumors. Thus, DiNeR successfully extracted hub regulators and discovered well-known risk genes. Conclusions: Our method DiNeR makes it possible to quantify changes in co-regulatory networks and identify alterations to TF co-binding patterns, highlighting key disease regulators.

