ConvChrome: Predicting Gene Expression Based on Histone Modifications using Deep Learning Techniques.

2021 ◽  
Vol 16 ◽  
Author(s):  
Rania Hamdy ◽  
Yasser M.K. Omar ◽  
Fahima A. Maghraby

Background: Gene regulation is a complex and dynamic process that depends not only on the DNA sequence of genes but also on epigenetic mechanisms. These mechanisms, together with other factors, change the behavior of DNA: they do not alter the structure of DNA, but they control its behavior by turning genes "on" or "off", which determines which proteins are produced. Objective: This paper focuses on histone modifications. Histones are the group of proteins that bundle DNA into structural units called nucleosomes (coils); how DNA wraps around these histone proteins determines whether a gene can be accessed for expression. When histones are bound tightly to DNA, the gene cannot be expressed, and vice versa. It is important to know the combinatorial patterns of histone modifications and how these patterns work together to control gene expression. Methods: ConvChrome, a deep learning approach, is proposed for predicting gene expression from histone modification data. It uses more than one convolutional network model to recognize patterns in histone signals and to interpret their spatial arrangement along the chromatin structure, giving insight into the regulatory signatures of histone modifications. Results and Conclusion: Experimental results show that ConvChrome achieved an Area Under the Curve (AUC) score of 88.741%, a substantial improvement over the baseline for the gene expression classification task, using combinatorial interactions among five histone modifications on 56 different cell types.
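
To make the AUC figure quoted in this abstract concrete, here is a minimal sketch of how an ROC AUC can be computed for a binary "expressed / not expressed" classifier. It is not the ConvChrome evaluation code: the function name `roc_auc`, the 0/1 label encoding, and the toy scores are assumptions made for illustration. It relies on the standard rank-sum (Mann-Whitney U) identity.

```python
def roc_auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) identity.

    labels: 0/1 values (1 = gene "on"; this encoding is an assumption)
    scores: classifier scores, higher meaning more likely to be 1
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)

    # Assign 1-based ranks, averaging ranks over tied scores.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")

    # U statistic of the positive class, normalised to [0, 1].
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy data only; not results from any of the papers listed here.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a reported 88.741% means that a randomly chosen "on" gene receives a higher score than a randomly chosen "off" gene in roughly 89 of 100 pairs.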

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
John A. Halsall ◽  
Simon Andrews ◽  
Felix Krueger ◽  
Charlotte E. Rutledge ◽  
Gabriella Ficz ◽  
...  

Abstract Chromatin configuration influences gene expression in eukaryotes at multiple levels, from individual nucleosomes to chromatin domains several Mb long. Post-translational modifications (PTM) of core histones seem to be involved in chromatin structural transitions, but how remains unclear. To explore this, we used ChIP-seq and two cell types, HeLa and lymphoblastoid (LCL), to define how changes in chromatin packaging through the cell cycle influence the distributions of three transcription-associated histone modifications, H3K9ac, H3K4me3 and H3K27me3. We show that chromosome regions (bands) of 10–50 Mb, detectable by immunofluorescence microscopy of metaphase (M) chromosomes, are also present in G1 and G2. They comprise 1–5 Mb sub-bands that differ between HeLa and LCL but remain consistent through the cell cycle. The same sub-bands are defined by H3K9ac and H3K4me3, while H3K27me3 spreads more widely. We found little change between cell cycle phases, whether compared by 5 Kb rolling windows or when analysis was restricted to functional elements such as transcription start sites and topologically associating domains. Only a small number of genes showed cell-cycle related changes: at genes encoding proteins involved in mitosis, H3K9 became highly acetylated in G2M, possibly because of ongoing transcription. In conclusion, modified histone isoforms H3K9ac, H3K4me3 and H3K27me3 exhibit a characteristic genomic distribution at resolutions of 1 Mb and below that differs between HeLa and lymphoblastoid cells but remains remarkably consistent through the cell cycle. We suggest that this cell-type-specific chromosomal bar-code is part of a homeostatic mechanism by which cells retain their characteristic gene expression patterns, and hence their identity, through multiple mitoses.


2021 ◽  
Author(s):  
Anthony Mark Raus ◽  
Tyson D Fuller ◽  
Nellie E Nelson ◽  
David A Valientes ◽  
Anita Bayat ◽  
...  

Aerobic exercise promotes physiological and molecular adaptations in neurons to influence brain function and behavior. The most well studied neurobiological consequences of exercise are those which underlie exercise-induced improvements in hippocampal memory, including the expression and regulation of the neurotrophic factor Bdnf. Whether aerobic exercise taking place during early-life periods of postnatal brain maturation has similar impacts on gene expression and its regulation remains to be investigated. Using unbiased next-generation sequencing we characterize gene expression programs and their regulation by specific, memory-associated histone modifications during juvenile-adolescent voluntary exercise (ELE). Traditional transcriptomic and epigenomic sequencing approaches have either used heterogeneous cell populations from whole tissue homogenates or flow cytometry for single cell isolation to distinguish cell types / subtypes. These methods fall short in providing cell-type specificity without compromising sequencing depth or procedure-induced changes to cellular phenotype. In this study, we use simultaneous isolation of translating mRNA and nuclear chromatin from a neuron-enriched cell population to more accurately pair ELE-induced changes in gene expression with epigenetic modifications. We employ a line of transgenic mice expressing the NuTRAP (Nuclear Tagging and Translating Ribosome Affinity Purification) cassette under the Emx1 promoter allowing for brain cell-type specificity. We then developed a technique that combines nuclear isolation using Isolation of Nuclei TAgged in Specific Cell Types (INTACT) with Translating Ribosomal Affinity Purification (TRAP) methods to determine cell type-specific epigenetic modifications influencing gene expression programs from a population of Emx1 expressing hippocampal neurons. 
Data from RNA-seq and CUT&RUN-seq were coupled to evaluate histone modifications influencing the expression of translating mRNA in neurons after early-life exercise (ELE). We also performed separate INTACT and TRAP isolations to validate our protocol, and gene ontology (GO) analysis implicated similar molecular functions and biological processes. Finally, as prior studies use tissue from opposite brain hemispheres to pair transcriptomic and epigenomic data from the same rodent, we take a bioinformatics approach to compare hemispheric differences in gene expression programs and histone modifications altered by ELE. Our data reveal transcriptional and epigenetic signatures of ELE exposure and identify novel candidate gene-histone modification interactions for further investigation. Importantly, our novel approach of combined INTACT/TRAP methods from the same cell suspension allows for simultaneous transcriptomic and epigenomic sequencing in a cell-type specific manner.


2020 ◽  
Author(s):  
John A. Halsall ◽  
Simon Andrews ◽  
Felix Krueger ◽  
Charlotte E. Rutledge ◽  
Gabriella Ficz ◽  
...  

Abstract Background: Chromatin configuration influences gene expression in eukaryotes at multiple levels, from individual nucleosomes to chromatin domains several Mb long. Post-translational modifications (PTM) of core histones seem to be involved in chromatin structural transitions, but how remains unclear. To explore this, we used ChIP-seq and two cell types, HeLa and lymphoblastoid (LCL), to define how changes in chromatin packaging through the cell cycle influence the distributions of three transcription-associated histone modifications, H3K9ac, H3K4me3 and H3K27me3. Results: Chromosome regions (bands) of 10-50 Mb, detectable by immunofluorescence microscopy of metaphase (M) chromosomes, are also present in G1 and G2. We show that they comprise 1-5 Mb sub-bands that differ between HeLa and LCL but remain consistent through the cell cycle. The same sub-bands are defined by H3K9ac and H3K4me3, while H3K27me3 spreads more widely. We found little change between cell cycle phases, whether compared by 5 Kb rolling windows or when analysis was restricted to functional elements such as transcription start sites and topologically associating domains. Only a small number of genes showed cell-cycle related changes: at genes encoding proteins involved in mitosis, H3K9 became highly acetylated in G2M, possibly because of ongoing transcription. Conclusions: Modified histone isoforms H3K9ac, H3K4me3 and H3K27me3 exhibit a characteristic genomic distribution at resolutions of 1 Mb and below that differs between HeLa and lymphoblastoid cells but remains remarkably consistent through the cell cycle. We suggest that this cell-type-specific chromosomal bar-code is part of a homeostatic mechanism by which cells retain their characteristic gene expression patterns, and hence their identity, through multiple mitoses.


2021 ◽  
Author(s):  
Jiachen Li ◽  
Siheng Chen ◽  
Xiaoyong Pan ◽  
Ye Yuan ◽  
Hong-bin Shen

Abstract Spatial transcriptomics data can provide high-throughput gene expression profiling and spatial structure of tissues simultaneously. An essential question of its initial analysis is cell clustering. However, most existing studies rely on only gene expression information and cannot utilize spatial information efficiently. Taking advantages of two recent technical development, spatial transcriptomics and graph neural network, we thus introduce CCST, Cell Clustering for Spatial Transcriptomics data with graph neural network, an unsupervised cell clustering method based on graph convolutional network to improve ab initio cell clustering and discovering of novel sub cell types based on curated cell category annotation. CCST is a general framework for dealing with various kinds of spatially resolved transcriptomics. With application to five in vitro and in vivo spatial datasets, we show that CCST outperforms other spatial cluster approaches on spatial transcriptomics datasets, and can clearly identify all four cell cycle phases from MERFISH data of cultured cells, and find novel functional sub cell types with different micro-environments from seqFISH+ data of brain, which are all validated experimentally, inspiring novel biological hypotheses about the underlying interactions among cell state, cell type and micro-environment.
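
The idea of combining gene expression with spatial position through a graph convolutional network can be sketched in a few lines. The Python below is not CCST: it only shows one symmetric-normalised neighbourhood-averaging step of the kind a graph convolution layer applies, with no learned weights and no clustering. The distance threshold `radius`, the function names, and the toy coordinates are all assumptions made for illustration.

```python
import math


def spatial_adjacency(coords, radius):
    """Connect two cells when their Euclidean distance is <= radius.

    The fixed-radius rule is an assumption; other neighbour rules are possible.
    """
    n = len(coords)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= radius:
                adj[i][j] = adj[j][i] = 1.0
    return adj


def gcn_propagate(adj, features):
    """One propagation step: D^-1/2 (A + I) D^-1/2 X, without learned weights."""
    n = len(adj)
    # Add self-loops so each cell keeps part of its own expression profile.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    out = []
    for i in range(n):
        row = [0.0] * len(features[0])
        for j in range(n):
            if a_hat[i][j]:
                w = a_hat[i][j] / math.sqrt(deg[i] * deg[j])
                for k, x in enumerate(features[j]):
                    row[k] += w * x
        out.append(row)
    return out


if __name__ == "__main__":
    # Three hypothetical cells: two neighbours and one isolated cell.
    coords = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
    expression = [[1.0], [3.0], [10.0]]  # one toy gene per cell
    adj = spatial_adjacency(coords, radius=1.5)
    print(gcn_propagate(adj, expression))
```

In this toy case the two neighbouring cells are pulled toward a shared value while the isolated cell is unchanged, which is the sense in which spatial information enters the cell embedding before any clustering is applied.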


2016 ◽  
Vol 32 (17) ◽  
pp. i639-i648 ◽  
Author(s):  
Ritambhara Singh ◽  
Jack Lanchantin ◽  
Gabriel Robins ◽  
Yanjun Qi

Author(s):  
Musu Yuan ◽  
Liang Chen ◽  
Minghua Deng

Abstract Motivation: Single-cell RNA-seq (scRNA-seq) has been widely used to resolve cellular heterogeneity. After collecting scRNA-seq data, the natural next step is to integrate the accumulated data to achieve a common ontology of cell types and states. Thus, an effective and efficient cell-type identification method is urgently needed. Meanwhile, high quality reference data remain a necessity for precise annotation. However, such tailored reference data are always lacking in practice. To address this, we aggregated multiple datasets into a meta-dataset on which annotation is conducted. Existing supervised or semi-supervised annotation methods suffer from batch effects caused by different sequencing platforms, the effect of which increases in severity with multiple reference datasets. Results: Herein, a robust deep learning based single-cell Multiple Reference Annotator (scMRA) is introduced. In scMRA, a knowledge graph is constructed to represent the characteristics of cell types in different datasets, and a graph convolutional network (GCN) serves as a discriminator based on this graph. scMRA keeps intra-cell-type closeness and the relative position of cell types across datasets. scMRA is remarkably powerful at transferring knowledge from multiple reference datasets to the unlabeled target domain, thereby gaining an advantage over other state-of-the-art annotation methods in multi-reference data experiments. Furthermore, scMRA can remove batch effects. To the best of our knowledge, this is the first attempt to use multiple insufficient reference datasets to annotate target data, and it is, comparatively, the best annotation method for multiple scRNA-seq datasets. Availability: An implementation of scMRA is available from https://github.com/ddb-qiwang/scMRA-torch Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 116 (43) ◽  
pp. 21914-21924 ◽  
Author(s):  
Laura R. Lee ◽  
Diego L. Wengier ◽  
Dominique C. Bergmann

Plant cells maintain remarkable developmental plasticity, allowing them to clonally reproduce and to repair tissues following wounding; yet plant cells normally stably maintain consistent identities. Although this capacity was recognized long ago, our mechanistic understanding of the establishment, maintenance, and erasure of cellular identities in plants remains limited. Here, we develop a cell-type–specific reprogramming system that can be probed at the genome-wide scale for alterations in gene expression and histone modifications. We show that relationships among H3K27me3, H3K4me3, and gene expression in single cell types mirror trends from complex tissue, and that H3K27me3 dynamics regulate guard cell identity. Further, upon initiation of reprogramming, guard cells induce H3K27me3-mediated repression of a regulator of wound-induced callus formation, suggesting that cells in intact tissues may have mechanisms to sense and resist inappropriate dedifferentiation. The matched ChIP-sequencing (seq) and RNA-seq datasets created for this analysis also serve as a resource enabling inquiries into the dynamic and global-scale distribution of histone modifications in single cell types in plants.


2019 ◽  
Vol 36 (7) ◽  
pp. 2293-2294
Author(s):  
Xiao Tan ◽  
Andrew Su ◽  
Minh Tran ◽  
Quan Nguyen

Abstract Motivation: Spatial transcriptomics (ST) technology is increasingly being applied because it enables the measurement of spatial gene expression in an intact tissue along with imaging morphology of the same tissue. However, current analysis methods for ST data do not use image pixel information, thus missing the quantitative links between gene expression and tissue morphology. Results: We developed a user-friendly deep learning software, SpaCell, to integrate millions of pixel intensity values with thousands of gene expression measurements from spatially barcoded spots in a tissue. We show the integration approach outperforms the use of gene-count data alone or imaging data alone to build deep learning models to identify cell types or predict labels of tissue images with high resolution and accuracy. Availability and implementation: The SpaCell package is open source under an MIT licence and it is available at https://github.com/BiomedicalMachineLearning/SpaCell. Supplementary information: Supplementary data are available at Bioinformatics online.


Diagnostics ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 66
Author(s):  
Yung-Hsien Hsieh ◽  
Fang-Rong Hsu ◽  
Seng-Tong Dai ◽  
Hsin-Ya Huang ◽  
Dar-Ren Chen ◽  
...  

In this study, we applied semantic segmentation using a fully convolutional deep learning network to identify characteristics of the Breast Imaging Reporting and Data System (BI-RADS) lexicon from breast ultrasound images to facilitate the clinical classification of malignant tumors. Among 378 images (204 benign and 174 malignant images) from 189 patients (102 benign breast tumor patients and 87 malignant patients), we identified seven malignant characteristics related to the BI-RADS lexicon in breast ultrasound. The mean accuracy and mean intersection over union (IU) of the semantic segmentation were 32.82% and 28.88%, respectively. The weighted intersection over union was 85.35%, and the area under the curve was 89.47%, showing better performance than similar semantic segmentation networks, SegNet and U-Net, in the same dataset. Our results suggest that the utilization of a deep learning network in combination with the BI-RADS lexicon can be an important supplemental tool when using ultrasound to diagnose breast malignancy.
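
The gap between the mean IU and the weighted IU reported in this abstract is easier to read with the metric definitions in hand. The sketch below uses the usual confusion-matrix definitions of mean accuracy, mean IU, and frequency-weighted IU for semantic segmentation; it is an assumption that the paper uses exactly these definitions, and the function name `segmentation_metrics` and the toy pixel counts are hypothetical.

```python
def segmentation_metrics(conf):
    """Compute mean accuracy, mean IU and frequency-weighted IU.

    conf[i][j] = number of pixels whose true class is i and predicted class is j.
    Classes with no true pixels are skipped (a choice made for this sketch).
    """
    n_cls = len(conf)
    total = sum(sum(row) for row in conf)
    true_per_class = [sum(conf[i]) for i in range(n_cls)]
    pred_per_class = [sum(conf[i][j] for i in range(n_cls)) for j in range(n_cls)]

    accs, ious, weighted_iu = [], [], 0.0
    for c in range(n_cls):
        if true_per_class[c] == 0:
            continue
        tp = conf[c][c]
        union = true_per_class[c] + pred_per_class[c] - tp
        iou = tp / union
        accs.append(tp / true_per_class[c])
        ious.append(iou)
        # Weight each class IU by its share of all pixels.
        weighted_iu += true_per_class[c] / total * iou

    return {
        "mean_accuracy": sum(accs) / len(accs),
        "mean_iu": sum(ious) / len(ious),
        "weighted_iu": weighted_iu,
    }


if __name__ == "__main__":
    # Toy two-class example: a dominant class and a rare, poorly segmented one.
    print(segmentation_metrics([[90, 10], [5, 5]]))
```

In the toy example the rare class drags the unweighted mean IU down while the frequency-weighted IU stays high, because the dominant class carries most of the pixels. A similar imbalance between background and the seven lexicon classes would be one possible reading of the reported numbers, though the abstract itself does not say so.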


2019 ◽  
Author(s):  
Xiao Tan ◽  
Andrew Su ◽  
Minh Tran ◽  
Quan Nguyen

Abstract Motivation: Spatial transcriptomics technology is increasingly being applied because it enables the measurement of spatial gene expression in an intact tissue along with imaging morphology of the same tissue. However, current analysis methods for spatial transcriptomics data do not use image pixel information, thus missing the quantitative links between gene expression and tissue morphology. Results: We developed a user-friendly deep learning software, SpaCell, to integrate millions of pixel intensity values with thousands of gene expression measurements from spatially-barcoded spots in a tissue. We show the integration approach outperforms the use of gene count alone or imaging data alone to create deep learning models to identify cell types or predict labels of tissue images with high resolution and accuracy. Availability: The SpaCell package is open source under an MIT license and it is available at https://github.com/BiomedicalMachineLearning/[email protected]

