Advantages of using graph databases to explore chromatin conformation capture experiments

2021 ◽  
Vol 22 (S2) ◽  
Author(s):  
Daniele D’Agostino ◽  
Pietro Liò ◽  
Marco Aldinucci ◽  
Ivan Merelli

Abstract Background: High-throughput sequencing Chromosome Conformation Capture (Hi-C) allows the study of DNA interactions and 3D chromosome folding at the genome-wide scale. Usually, these data are represented as matrices describing the binary contacts among the different chromosome regions. A graph-based representation, however, can be advantageous for describing the complex topology that DNA adopts in the nucleus of eukaryotic cells. Methods: Here we discuss the use of a graph database for storing and analysing data obtained from Hi-C experiments. The main issue is the size of the data produced and, with a graph-based representation, the consequent need to manage adequately the large number of edges (contacts) connecting nodes (genes), which represent the sources of information. Currently available graph visualisation tools and libraries fall short in this respect with Hi-C data. Graph databases, instead, support both the analysis and the visualisation of the spatial patterns present in Hi-C data, in particular for comparing different experiments or for efficiently re-mapping omics data in a space-aware context. Moreover, the possibility of describing graphs through statistical indicators and, even more, of correlating them through statistical distributions makes it possible to highlight similarities and differences among Hi-C experiments performed in different cell conditions or cell types. Results: These concepts have been implemented in NeoHiC, an open-source and user-friendly web application for the progressive visualisation and analysis of Hi-C networks, based on the Neo4j graph database (version 3.5). Conclusion: With the accumulation of more experiments, the tool will provide invaluable support for comparing the neighbours of genes across experiments and conditions, helping to highlight changes in functional domains and to identify new co-organised genomic compartments.
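The move from a contact matrix to a graph with statistical indicators, as described in the abstract above, can be illustrated with a minimal sketch. It is not NeoHiC's implementation: the function names, the `min_count` threshold, the gene labels and the toy matrix are all assumptions made for illustration, and the degree distribution stands in for whichever indicators the tool actually computes.

```python
from collections import Counter

def contacts_to_edges(matrix, labels, min_count=1):
    """Turn a symmetric contact matrix into a weighted edge list.

    Only the upper triangle is read, so each contact appears once.
    `min_count` is a hypothetical filter for weak contacts.
    """
    edges = []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] >= min_count:
                edges.append((labels[i], labels[j], matrix[i][j]))
    return edges

def degree_distribution(edges):
    """Count how many nodes have each degree (number of contacts)."""
    degree = Counter()
    for a, b, _weight in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())

# Toy data: four hypothetical genes and made-up contact counts.
labels = ["geneA", "geneB", "geneC", "geneD"]
matrix = [
    [0, 5, 0, 2],
    [5, 0, 3, 0],
    [0, 3, 0, 0],
    [2, 0, 0, 0],
]

edges = contacts_to_edges(matrix, labels, min_count=2)
dist = degree_distribution(edges)
```

Two experiments could then be compared by computing `dist` for each and contrasting the resulting distributions. In a graph database the same edge list would be loaded as relationships between nodes rather than held in memory.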

Author(s):  
Fu-Ying Dao ◽  
Hao Lv ◽  
Dan Zhang ◽  
Zi-Mei Zhang ◽  
Li Liu ◽  
...  

Abstract The protein Yin Yang 1 (YY1) can form dimers that facilitate the interaction between active enhancers and promoter-proximal elements. YY1-mediated enhancer–promoter interaction is a general feature of mammalian gene control. Recently, some computational methods have been developed to characterize the interactions between DNA elements by elucidating important features of chromatin folding; however, none has been developed for identifying YY1-mediated chromatin loops. In this study, we developed a deep learning algorithm named DeepYY1, based on word2vec, to determine whether a pair of YY1 motifs would form a loop. The proposed models showed high prediction performance (AUCs $\ge$ 0.93) on both training and testing datasets in different cell types, demonstrating that DeepYY1 performs well in identifying YY1-mediated chromatin loops. Our study also suggested that sequences play an important role in the formation of YY1-mediated chromatin loops. Furthermore, we briefly discussed the distribution of replication origin sites in the loops. Finally, a user-friendly web server was established, and it can be freely accessed at http://lin-group.cn/server/DeepYY1.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
W. J. Pereira ◽  
F. M. Almeida ◽  
D. Conde ◽  
K. M. Balmant ◽  
P. M. Triozzi ◽  
...  

Abstract Background: Single-cell RNA sequencing (scRNA-seq) has revolutionized the study of transcriptomes, arising as a powerful tool for discovering and characterizing cell types and their developmental trajectories. However, scRNA-seq analysis is complex, requiring a continuous, iterative process to refine the data and uncover relevant biological information. A diversity of tools has been developed to address the multiple aspects of scRNA-seq data analysis. However, an easy-to-use web application capable of conducting all critical steps of scRNA-seq data analysis is still lacking. Summary: We present Asc-Seurat, a feature-rich workbench that provides a user-friendly and easy-to-install web application encapsulating tools for an all-encompassing and fluid scRNA-seq data analysis. Asc-Seurat implements functions from the Seurat package for quality control, clustering, and differential gene expression. In addition, Asc-Seurat provides a pseudotime module containing dozens of models for trajectory inference and a functional annotation module that allows recovering gene annotation and detecting enriched gene ontology terms. We showcase Asc-Seurat’s capabilities by analyzing a peripheral blood mononuclear cell dataset. Conclusions: Asc-Seurat is a comprehensive workbench providing an accessible graphical interface for scRNA-seq analysis by biologists. Asc-Seurat significantly reduces the time and effort required to analyze and interpret the information in scRNA-seq datasets.


2019 ◽  
Author(s):  
Ji Hyun Bak ◽  
Min Hyeok Kim ◽  
Lei Liu ◽  
Changbong Hyeon

Abstract Identifying chromatin domains (CDs) from high-throughput chromosome conformation capture (Hi-C) data is currently a central problem in genome research. Here we present a unified algorithm, Multi-CD, which infers CDs at various genomic scales by leveraging the information from Hi-C. By integrating a model of the chromosome from polymer physics, statistical physics-based clustering analysis, and Bayesian inference, Multi-CD identifies the CDs that best represent the global pattern of correlation manifested in Hi-C. The multi-scale intra-chromosomal structures compared across different cell types allow us to glean the principles of chromatin organization: (i) Sub-TADs, TADs, and meta-TADs constitute a robust hierarchical structure. (ii) The assemblies of compartments and TAD-based domains are governed by different organizational principles. (iii) Sub-TADs are the common building blocks of chromosome architecture. CDs obtained by applying Multi-CD to Hi-C data enable a quantitative and comparative analysis of chromosome organization in different cell types, providing glimpses into the structure-function relationship in the genome.


2021 ◽  
Author(s):  
Sheng Zhu ◽  
Qiwei Lian ◽  
Wenbin Ye ◽  
Wei Qin ◽  
Zhe Wu ◽  
...  

Abstract Alternative polyadenylation (APA) is a widespread regulatory mechanism of transcript diversification in eukaryotes, which is increasingly recognized as an important layer of eukaryotic gene expression. Recent studies based on single-cell RNA-seq (scRNA-seq) have revealed cell-to-cell heterogeneity in APA usage and APA dynamics across different cell types in various tissues, biological processes and diseases. However, currently available APA databases were all collected from bulk 3′-seq and/or RNA-seq data, and no existing database provides APA information at single-cell resolution. Here, we present a user-friendly database called scAPAdb (http://www.bmibig.cn/scAPAdb), which provides a comprehensive and manually curated atlas of poly(A) sites, APA events and poly(A) signals at the single-cell level. Currently, scAPAdb collects APA information from > 360 scRNA-seq experiments, covering six species including human, mouse and several plant species. scAPAdb also provides batch download of data, and users can query the database through a variety of keywords such as gene identifier, gene function and accession number. scAPAdb would be a valuable and extendable resource for the study of cell-to-cell heterogeneity in APA isoform usage and APA-mediated gene regulation at the single-cell level across diverse cell types, tissues and species.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e11333
Author(s):  
Daniyar Karabayev ◽  
Askhat Molkenov ◽  
Kaiyrgali Yerulanuly ◽  
Ilyas Kabimoldayev ◽  
Asset Daniyarov ◽  
...  

Background: High-throughput sequencing platforms generate a massive amount of high-dimensional genomic datasets that are available for analysis. Modern and user-friendly bioinformatics tools for the analysis and interpretation of genomics data become essential when working with sequencing data. Different standard data types and file formats have been developed to store and analyze sequence and genomics data. Variant Call Format (VCF) is the most widespread genomics file type and standard format containing genomic information and variants of sequenced samples. Results: Existing tools for processing VCF files don’t usually have an intuitive graphical interface, but instead have just a command-line interface that may be challenging to use for the broader biomedical community interested in genomics data analysis. re-Searcher solves this problem by pre-processing VCF files in chunks, so that a whole file never has to be loaded into the computer's RAM. The tool can be used as a standalone, user-friendly, multiplatform GUI application as well as a web application (https://nla-lbsb.nu.edu.kz). The software, including source code, tested VCF files and additional information, is publicly available in the GitHub repository (https://github.com/LabBandSB/re-Searcher).
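The chunked pre-processing mentioned above can be sketched as a generator that yields a bounded number of parsed records at a time. This is only an illustration of the general technique, not re-Searcher's code: `parse_record`, `iter_vcf_chunks`, the `chunk_size` default and the in-memory demo file are all hypothetical, and only the first five fixed VCF columns (CHROM, POS, ID, REF, ALT) are parsed.

```python
import io
from itertools import islice

def parse_record(line):
    """Parse the first five fixed columns of one VCF data line."""
    chrom, pos, vid, ref, alt = line.rstrip("\n").split("\t")[:5]
    return {"chrom": chrom, "pos": int(pos), "id": vid,
            "ref": ref, "alt": alt.split(",")}

def iter_vcf_chunks(handle, chunk_size=1000):
    """Yield lists of at most `chunk_size` records from an open VCF.

    Header lines (starting with '#') and blank lines are skipped.
    Only one chunk is held in memory at a time.
    """
    records = (parse_record(line) for line in handle
               if line.strip() and not line.startswith("#"))
    while True:
        chunk = list(islice(records, chunk_size))
        if not chunk:
            return
        yield chunk

# Tiny in-memory stand-in for a real file (made-up variants).
demo = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\trs1\tA\tG\t50\tPASS\t.\n"
    "1\t200\trs2\tC\tT,G\t60\tPASS\t.\n"
    "2\t300\t.\tG\tA\t70\tPASS\t.\n"
)
chunk_sizes = [len(chunk) for chunk in iter_vcf_chunks(demo, chunk_size=2)]
```

With a real file, `handle` would be the object returned by `open(path)`. Because the generator reads lazily, memory use depends on `chunk_size` rather than on file size.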


2019 ◽  
Author(s):  
Mikhail Pomaznoy ◽  
Brendan Ha ◽  
Bjoern Peters

Abstract Analysis of transcriptomic data derived from blood samples is complicated by the complex mixture of cell types such samples contain. Transcriptomic signatures derived from such samples are often driven by a particular cell lineage within the mixture. Identifying the lineage that contributes most can help to provide a biological interpretation of the signature. We created a web application, CellTypeScore, which quantifies and visually represents the expression level of signature genes in common blood cell types. This is done by constructing an interactive stacked bar plot with the bars representing expression of genes across blood cell types. Summed scores serve as a measure of how highly the combined signature is expressed in different cell types. An online version of the application can be found at https://tools.dice-database.org/celltypescore/.
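The summed score described above amounts to adding up, for each cell type, the expression of every gene in the signature. The sketch below shows that arithmetic on toy data. The gene names, cell types and expression values are invented, `summed_scores` is a hypothetical helper, and the actual application may normalise or transform expression before summing.

```python
def summed_scores(expression, signature):
    """Sum the expression of the signature genes per cell type.

    `expression` maps gene -> {cell_type: expression value}.
    Genes missing from `expression` contribute nothing.
    """
    scores = {}
    for gene in signature:
        for cell_type, value in expression.get(gene, {}).items():
            scores[cell_type] = scores.get(cell_type, 0) + value
    return scores

# Made-up expression values for three hypothetical genes.
expression = {
    "GENE1": {"B cell": 10, "T cell": 2, "Monocyte": 1},
    "GENE2": {"B cell": 7, "T cell": 3, "Monocyte": 0},
    "GENE3": {"B cell": 0, "T cell": 9, "Monocyte": 4},
}
signature = ["GENE1", "GENE2"]

scores = summed_scores(expression, signature)
top_lineage = max(scores, key=scores.get)
```

Each gene's per-cell-type values correspond to one layer of the stacked bar, and `scores` gives the total bar height for each cell type.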


2016 ◽  
Author(s):  
Vijay Ramani ◽  
Xinxian Deng ◽  
Kevin L Gunderson ◽  
Frank J Steemers ◽  
Christine M Disteche ◽  
...  

Abstract We present combinatorial single cell Hi-C, a novel method that leverages combinatorial cellular indexing to measure chromosome conformation in large numbers of single cells. In this proof-of-concept, we generate and sequence combinatorial single cell Hi-C libraries for two mouse and four human cell types, comprising a total of 9,316 single cells across 5 experiments. We demonstrate the utility of single-cell Hi-C data in separating different cell types, identify previously uncharacterized cell-to-cell heterogeneity in the conformational properties of mammalian chromosomes, and demonstrate that combinatorial indexing is a generalizable molecular strategy for single-cell genomics.


2021 ◽  
Author(s):  
Juexiao Zhou ◽  
Bin Zhang ◽  
Haoyang Li ◽  
Longxi Zhou ◽  
Zhongxiao Li ◽  
...  

The accurate annotation of transcription start sites (TSSs) and their usage is critical for the mechanistic understanding of gene regulation in different biological contexts. To this end, specific high-throughput experimental technologies have been developed to capture TSSs in a genome-wide manner. Various computational tools have also been developed for in silico prediction of TSSs based solely on genomic sequences. Most of these tools produce large numbers of false positive predictions when applied at genome scale. Here, we present DeeReCT-TSS, a deep-learning-based method capable of identifying TSSs across the whole genome from DNA sequences and conventional RNA-seq data. We show that by effectively incorporating these two sources of information, DeeReCT-TSS significantly outperforms solely sequence-based methods in the precise annotation of TSSs used in different cell types. Furthermore, we develop a meta-learning-based extension for simultaneous TSS annotation on 10 cell types, which enables the identification of cell-type-specific TSSs. Finally, we demonstrate the high precision of DeeReCT-TSS on two independent datasets from the ENCODE project by correlating our predicted TSSs with experimentally defined TSS chromatin states.


2021 ◽  
Author(s):  
WJ Pereira ◽  
FM Almeida ◽  
KM Balmant ◽  
DC Rodriguez ◽  
PM Triozzi ◽  
...  

Abstract Summary: Single-cell RNA sequencing (scRNA-seq) has become a popular approach for studying the transcriptome, providing a powerful tool for discovering and characterizing cell types and their developmental trajectories. However, scRNA-seq analysis is complex, requiring a continuous, iterative process to refine the data processing and uncover relevant biological information. We present Asc-Seurat, a feature-rich workbench, providing a user-friendly and easy-to-install web application encapsulating the necessary tools for an all-encompassing and fluid scRNA-seq data analysis. Availability and implementation: Asc-Seurat is available at https://github.com/KirstLab/asc_seurat/ and released under the GNU 3 license. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 48 (3) ◽  
pp. 1131-1145
Author(s):  
She Zhang ◽  
Fangyuan Chen ◽  
Ivet Bahar

Abstract Advances in chromosome conformation capture techniques as well as computational characterization of genomic loci structural dynamics open new opportunities for exploring the mechanistic aspects of genome-scale differences across different cell types. We examined here the dynamic basis of variabilities between different cell types by investigating their chromatin mobility profiles inferred from Hi-C data using an elastic network model representation of the chromatin. Our comparative analysis of sixteen cell lines reveals close similarities between chromosomal dynamics across different cell lines on a global scale, but notable cell-specific variations emerge in the detailed spatial mobilities of genomic loci. Closer examination reveals that the differences in spatial dynamics mainly originate from the difference in the frequencies of their intrinsically accessible modes of motion. Thus, even though the chromosomes of different types of cells have access to similar modes of collective movements, not all modes are deployed by all cells, such that the effective mobilities and cross-correlations of genomic loci are cell-type-specific. Comparison with RNA-seq expression data reveals a strong overlap between highly expressed genes and those distinguished by high mobilities in the present study, in support of the role of the intrinsic spatial dynamics of chromatin as a determinant of cell differentiation.

