Integrating distal and proximal information to predict gene expression via a densely connected convolutional neural network

2018 ◽  
Author(s):  
Wanwen Zeng ◽  
Yong Wang ◽  
Rui Jiang

Abstract Motivation Interactions among such cis-regulatory elements as enhancers and promoters are main driving forces shaping context-specific chromatin structure and gene expression. Although there have been computational methods for predicting gene expression from genomic and epigenomic information, most of them overlook long-range enhancer-promoter interactions, due to the difficulty in precisely linking regulatory enhancers to target genes. Recently, a novel high-throughput experimental approach named HiChIP has been developed and is generating comprehensive data on high-resolution interactions between promoters and distal enhancers. On the other hand, plenty of studies have suggested that deep learning achieves state-of-the-art performance in epigenomic signal prediction, thus promoting the understanding of regulatory elements. In consideration of these two factors, we integrate proximal promoter sequences and HiChIP distal enhancer-promoter interactions to accurately model gene expression. Results We propose DeepExpression, a densely connected convolutional neural network to predict gene expression using both promoter sequences and enhancer-promoter interactions. We demonstrate that our model consistently outperforms baseline methods not only in the classification of binary gene expression status but also in the regression of continuous gene expression levels, in both cross-validation experiments and cross-cell line predictions. We show that sequential promoter information is more informative than experimental enhancer information, while the most beneficial enhancer-promoter interactions are those within ±100 kbp around the TSS of a gene. We finally visualize motifs in both promoter and enhancer regions and show the match of identified sequence signatures and known motifs. We expect to see a wide spectrum of applications using HiChIP data in deciphering the mechanism of gene regulation.
Availability DeepExpression is freely available at https://github.com/wanwenzeng/[email protected], [email protected] Supplementary information Supplementary data are available at Bioinformatics online.

Author(s):  
Wanwen Zeng ◽  
Yong Wang ◽  
Rui Jiang

Abstract Motivation Interactions among cis-regulatory elements such as enhancers and promoters are main driving forces shaping context-specific chromatin structure and gene expression. Although there have been computational methods for predicting gene expression from genomic and epigenomic information, most of them neglect long-range enhancer–promoter interactions, due to the difficulty in precisely linking regulatory enhancers to target genes. Recently, HiChIP, a novel high-throughput experimental approach, has generated comprehensive data on high-resolution interactions between promoters and distal enhancers. Moreover, plenty of studies suggest that deep learning achieves state-of-the-art performance in epigenomic signal prediction, thus promoting the understanding of regulatory elements. In consideration of these two factors, we integrate proximal promoter sequences and HiChIP distal enhancer–promoter interactions to accurately predict gene expression. Results We propose DeepExpression, a densely connected convolutional neural network, to predict gene expression using both promoter sequences and enhancer–promoter interactions. We demonstrate that our model consistently outperforms baseline methods, not only in the classification of binary gene expression status but also in regression of continuous gene expression levels, in both cross-validation experiments and cross-cell line predictions. We show that the sequential promoter information is more informative than the experimental enhancer information; meanwhile, the enhancer–promoter interactions within ±100 kbp around the TSS of a gene are most beneficial. We finally visualize motifs in both promoter and enhancer regions and show the match of identified sequence signatures with known motifs. We expect to see a wide spectrum of applications using HiChIP data in deciphering the mechanism of gene regulation.
Availability and implementation DeepExpression is freely available at https://github.com/wanwenzeng/DeepExpression. Supplementary information Supplementary data are available at Bioinformatics online.
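
The ±100 kbp window mentioned in the abstract above can be illustrated with a small filter. This is only a sketch under stated assumptions, not DeepExpression's preprocessing: the `Interaction` record, its field names, and the `within_window` helper are hypothetical, and all coordinates are assumed to lie on the same chromosome.

```python
from typing import List, NamedTuple


class Interaction(NamedTuple):
    """Hypothetical record for one enhancer-promoter contact (not the authors' format)."""
    gene: str
    tss: int           # transcription start site, in bp
    enhancer_mid: int  # midpoint of the interacting enhancer, in bp
    score: float       # interaction strength (illustrative field)


def within_window(interactions: List[Interaction],
                  window_bp: int = 100_000) -> List[Interaction]:
    """Keep contacts whose enhancer lies within +/- window_bp of the gene's TSS."""
    return [i for i in interactions if abs(i.enhancer_mid - i.tss) <= window_bp]


example = [
    Interaction("geneA", 1_000_000, 1_050_000, 3.0),  # 50 kbp away: kept
    Interaction("geneA", 1_000_000, 850_000, 5.0),    # 150 kbp away: dropped
    Interaction("geneB", 2_000_000, 2_100_000, 1.5),  # exactly 100 kbp: kept
]
kept = within_window(example)
```

The boundary is treated as inclusive here; whether the original analysis includes contacts at exactly 100 kbp is not stated in the abstract.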


2022 ◽  
Vol 270 ◽  
pp. 547-554
Author(s):  
Grant Schumaker ◽  
Andrew Becker ◽  
Gary An ◽  
Stephen Badylak ◽  
Scott Johnson ◽  
...  

Blood ◽  
1998 ◽  
Vol 92 (12) ◽  
pp. 4529-4538 ◽  
Author(s):  
Steve N. Georas ◽  
John E. Cumberland ◽  
Thomas F. Burke ◽  
Rongbing Chen ◽  
Ulrike Schindler ◽  
...  

Abstract The differentiation of naive T-helper (Th) cells into cytokine-secreting effector Th cells requires exposure to multiple signals, including exogenous cytokines. Interleukin-4 (IL-4) plays a major role in this process by promoting the differentiation of IL-4–secreting Th2 cells. In Th2 cells, IL-4 gene expression is tightly controlled at the level of transcription by the coordinated binding of multiple transcription factors to regulatory elements in the proximal promoter region. Nuclear factor of activated T cell (NFAT) family members play a critical role in regulating IL-4 transcription and interact with up to five sequences (termed P0 through P4) in the IL-4 promoter. The molecular mechanisms by which IL-4 induces expression of the IL-4 gene are not known, although the IL-4–activated transcription factor signal transducer and activator of transcription 6 (Stat6) is required for this effect. We report here that Stat6 interacts with three binding sites in the human IL-4 promoter by electrophoretic mobility shift assays. These sites overlap the P1, P2, and P4 NFAT elements. To investigate the role of Stat6 in regulating IL-4 transcription, we used Stat6-deficient Jurkat T cells with different intact IL-4 promoter constructs in cotransfection assays. We show that, whereas a multimerized response element from the germline IgE promoter was highly induced by IL-4 in Stat6-expressing Jurkat cells, the intact human IL-4 promoter was repressed under similar conditions. We conclude that the function of Stat6 is highly dependent on promoter context and that this factor promotes IL-4 gene expression in an indirect manner.


2019 ◽  
Author(s):  
Pengchao Ye ◽  
Wenbin Ye ◽  
Congting Ye ◽  
Shuchao Li ◽  
Lishan Ye ◽  
...  

Abstract Motivation Single-cell RNA-sequencing (scRNA-seq) is fast becoming a powerful technique for studying dynamic gene regulation at unprecedented resolution. However, scRNA-seq data suffer from problems of extremely high dropout rate and cell-to-cell variability, demanding new methods to recover gene expression loss. Despite the availability of various dropout imputation approaches for scRNA-seq, most studies focus on data with a medium or large number of cells, while few studies have explicitly investigated the differential performance across different sample sizes or the applicability of the approach on small or imbalanced data. It is imperative to develop new imputation approaches with higher generalizability for data with various sample sizes. Results We proposed a method called scHinter for imputing dropout events for scRNA-seq with special emphasis on data with limited sample size. scHinter incorporates a voting-based ensemble distance and leverages the synthetic minority oversampling technique for random interpolation. A hierarchical framework is also embedded in scHinter to increase the reliability of the imputation for small samples. We demonstrated the ability of scHinter to recover gene expression measurements across a wide spectrum of scRNA-seq datasets with varied sample sizes. We comprehensively examined the impact of sample size and cluster number on imputation. Comprehensive evaluation of scHinter across diverse scRNA-seq datasets with imbalanced or limited sample size showed that scHinter achieved higher and more robust performance than competing approaches, including MAGIC, scImpute, SAVER and netSmooth. Availability and implementation Freely available for download at https://github.com/BMILAB/scHinter. Supplementary information Supplementary data are available at Bioinformatics online.
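
The random interpolation step of the synthetic minority oversampling technique (SMOTE) named in the abstract above can be sketched as follows. This is a generic illustration of SMOTE-style interpolation between one cell and one neighbor, not scHinter's implementation; the function name `interpolate_profiles` and the toy expression values are assumptions.

```python
import random
from typing import List, Optional


def interpolate_profiles(cell: List[float], neighbor: List[float],
                         rng: Optional[random.Random] = None) -> List[float]:
    """Return a synthetic profile on the line segment between two expression profiles.

    As in standard SMOTE, a single random factor u in [0, 1) is drawn and applied
    to every gene: synthetic = cell + u * (neighbor - cell).
    """
    if len(cell) != len(neighbor):
        raise ValueError("profiles must cover the same genes")
    rng = rng or random.Random()
    u = rng.random()
    return [c + u * (n - c) for c, n in zip(cell, neighbor)]


cell = [0.0, 4.0, 2.0]       # toy expression values for three genes
neighbor = [3.0, 4.0, 0.0]   # a nearby cell chosen by some distance measure
synthetic = interpolate_profiles(cell, neighbor, random.Random(0))
```

Each synthetic value lies between the two input values for that gene, so a gene with identical expression in both cells keeps that value.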


Author(s):  
Bong-Hyun Kim ◽  
Kijin Yu ◽  
Peter C W Lee

Abstract Motivation Cancer classification based on gene expression profiles has provided insight on the causes of cancer and cancer treatment. Recently, machine learning-based approaches have been attempted in downstream cancer analysis to address the large differences in gene expression values, as determined by single-cell RNA sequencing (scRNA-seq). Results We designed cancer classifiers that can identify 21 types of cancers and normal tissues based on bulk RNA-seq as well as scRNA-seq data. Training was performed with 7398 cancer samples and 640 normal samples from 21 tumors and normal tissues in TCGA based on the 300 most significant genes expressed in each cancer. Then, we compared neural network (NN), support vector machine (SVM), k-nearest neighbors (kNN) and random forest (RF) methods. The NN performed consistently better than other methods. We further applied our approach to scRNA-seq transformed by kNN smoothing and found that our model successfully classified cancer types and normal samples. Availability and implementation Cancer classification by neural network. Supplementary information Supplementary data are available at Bioinformatics online.
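
Of the methods compared in the abstract above, k-nearest neighbors is the simplest to show in a few lines. The sketch below is a generic majority-vote kNN classifier on toy two-gene profiles; the function `knn_predict`, the labels, and the data are illustrative assumptions and do not reproduce the authors' classifiers or the TCGA data.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (expression profile, class label)


def knn_predict(train: List[Sample], query: Sequence[float], k: int = 3) -> str:
    """Label a query profile by majority vote among its k nearest training samples."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training set: two genes per sample, two classes.
train = [
    ([1.0, 0.2], "normal"), ([0.9, 0.1], "normal"), ([1.1, 0.3], "normal"),
    ([0.1, 2.0], "tumor"), ([0.2, 2.2], "tumor"), ([0.0, 1.9], "tumor"),
]
label_a = knn_predict(train, [0.15, 2.1])
label_b = knn_predict(train, [1.0, 0.25])
```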


1989 ◽  
Vol 17 (11) ◽  
pp. 4327-4337 ◽  
Author(s):  
Mark W. Nachtigal ◽  
Barbara E. Nickel ◽  
Margaret E. Klassen ◽  
Wengang Zhang ◽  
Norman L. Eberhardt ◽  
...  

2021 ◽  
Author(s):  
Hayfa H. Hassani ◽  
Rakad M. Kh AL-Jumaily ◽  
Fadhel M. Lafta

Male infertility is a complex medical condition in which epigenetic factors play an important role. Epigenetics has recently gained significant scientific attention since it has added a new dimension to genomic and proteomic research. As a mechanism for maintaining genomic integrity and controlling gene expression, epigenetic modifications hold great promise in capturing the subtle, yet very important, regulatory elements that might drive normal and abnormal sperm functions. The sperm’s epigenome is known to change constantly during spermatogenesis and is highly susceptible to a wide spectrum of environmental stimuli. Recently, epigenetic aberrations have been recognized as one of the causes of idiopathic male infertility. Recent advances in technology have made it possible to study the role of epigenetics in male infertility.


2019 ◽  
Author(s):  
Dan MacLean

Abstract Gene regulatory networks that control gene expression are widely studied, yet the interactions that make them up are difficult to predict from high-throughput data. Deep learning methods such as convolutional neural networks can perform surprisingly good classifications on a variety of data types, and the matrix-like gene expression profiles would seem to be ideal input data for deep learning approaches. In this short study I compiled training sets of expression data using the Arabidopsis AtGenExpress global stress expression data set and known transcription factor-target interactions from the Arabidopsis PLACE database. I built and optimised convolutional neural networks, with the best model providing 95% classification accuracy on a held-out validation set. Investigation of the activations within this model revealed that classification was based on positive correlation of expression profiles in short sections. This result shows that a convolutional neural network can be used to make classifications and reveal the basis of those classifications for gene expression data sets, indicating that a convolutional neural network is a useful and interpretable tool for exploratory classification of biological data. The final model is available for download and as a web application.
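
The finding that classification relied on positive correlation of expression profiles in short sections can be made concrete with a sliding-window Pearson correlation. This is only an illustrative sketch, not the model or analysis from the study; the helper names `pearson` and `windowed_correlation`, the window width, and the toy profiles are assumptions.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sx * sy)


def windowed_correlation(tf: Sequence[float], target: Sequence[float],
                         width: int) -> List[float]:
    """Correlation of two profiles within each sliding window of the given width."""
    return [pearson(tf[i:i + width], target[i:i + width])
            for i in range(len(tf) - width + 1)]


tf_profile = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0]      # toy transcription factor expression
target_profile = [2.0, 4.0, 6.0, 8.0, 1.0, 5.0]  # toy candidate target expression
scores = windowed_correlation(tf_profile, target_profile, width=3)
```

In this toy case the first two windows are perfectly correlated while later windows are not, which is the kind of local signal the abstract describes.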


2017 ◽  
Author(s):  
Sungsoo Park ◽  
Bonggun Shin ◽  
Yoonjung Choi ◽  
Kilsoo Kang ◽  
Keunsoo Kang

Abstract Motivation Next-generation sequencing (NGS), which allows the simultaneous sequencing of billions of DNA fragments, has revolutionized how we study genomics and molecular biology by generating genome-wide molecular maps of molecules of interest. For example, an NGS-based transcriptomic assay called RNA-seq can be used to estimate the abundance of approximately 190,000 transcripts together. As the cost of next-generation sequencing sharply declines, researchers in many fields have been conducting research using NGS. The amount of information produced by NGS has made it difficult for researchers to choose the optimal set of target genes (or genomic loci). Results We have sought to resolve this issue by developing a neural network-based feature (gene) selection algorithm called Wx. The Wx algorithm ranks genes based on the discriminative index (DI) score that represents the classification power for distinguishing given groups. With a gene list ranked by DI score, researchers can intuitively select the optimal set of genes from the highest-ranking ones. We applied the Wx algorithm to a TCGA pan-cancer gene-expression cohort to identify an optimal set of gene-expression biomarker (universal gene-expression biomarkers) candidates that can distinguish cancer samples from normal samples for 12 different types of cancer. The 14 gene-expression biomarker candidates identified by Wx were comparable to or outperformed previously reported universal gene expression biomarkers, highlighting the usefulness of the Wx algorithm for next-generation sequencing data. Thus, we anticipate that the Wx algorithm can complement current state-of-the-art analytical applications for the identification of biomarker candidates as an alternative method. Availability https://github.com/deargen/[email protected] Supplementary information Supplementary data are available at online.

