Computer vision for pattern detection in chromosome contact maps

2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Cyril Matthey-Doret ◽  
Lyam Baudry ◽  
Axel Breuer ◽  
Rémi Montagne ◽  
Nadège Guiglielmoni ◽  
...  

Abstract: Chromosomes of all species studied so far display a variety of higher-order organisational features, such as self-interacting domains or loops. These structures, which are often associated with biological functions, form distinct, visible patterns on genome-wide contact maps generated by chromosome conformation capture approaches such as Hi-C. Here we present Chromosight, an algorithm inspired by computer vision that can detect patterns in contact maps. Chromosight has greater sensitivity than existing methods on simulated data, while being faster and applicable to any type of genome, including bacteria, viruses, yeasts and mammals. Our method does not require any prior training dataset and works well with default parameters on data generated with various protocols.
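
The abstract does not spell out how patterns are detected. One common computer-vision technique for this kind of task is template matching, where a small kernel is slid over the map and scored by Pearson correlation at each position. The sketch below illustrates only that general idea on a toy matrix; it is not Chromosight's implementation, and the names `correlation_map` and `make_dot_kernel`, the Gaussian "dot" template, and the random test map are all assumptions made for the example.

```python
import numpy as np


def correlation_map(matrix, kernel):
    """Pearson correlation between the kernel and every same-sized window."""
    kh, kw = kernel.shape
    h, w = matrix.shape
    k = kernel - kernel.mean()
    k_norm = np.sqrt((k ** 2).sum())
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            win = matrix[i:i + kh, j:j + kw]
            win = win - win.mean()
            denom = k_norm * np.sqrt((win ** 2).sum())
            # A flat window has no defined correlation; score it as 0.
            out[i, j] = (k * win).sum() / denom if denom > 0 else 0.0
    return out


def make_dot_kernel(size=5):
    """Hypothetical loop-like template: a small Gaussian dot."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    return np.exp(-(xx ** 2 + yy ** 2) / 2.0)


# Toy "contact map": weak uniform noise plus one planted dot (assumed data).
rng = np.random.default_rng(0)
contact = rng.random((40, 40)) * 0.1
kernel = make_dot_kernel(5)
contact[10:15, 25:30] += kernel  # window with top-left corner at (10, 25)

scores = correlation_map(contact, kernel)
best = np.unravel_index(np.argmax(scores), scores.shape)
print("best-scoring window (row, col):", best)
```

Scoring by correlation rather than raw intensity makes the match depend on the shape of the signal and not on its absolute level, which is one reason template matching is attractive for maps whose coverage varies.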

Author(s):  
Cyril Matthey-Doret ◽  
Lyam Baudry ◽  
Axel Breuer ◽  
Rémi Montagne ◽  
Nadège Guiglielmoni ◽  
...  

Abstract: Chromosomes of all species studied so far display a variety of higher-order organizational features, such as domains or loops, that are often associated with biological functions and visible on Hi-C contact maps. We developed Chromosight, an algorithm inspired by computer vision that can detect patterns in Hi-C maps. Chromosight has greater sensitivity than existing methods, while being faster and applicable to any type of genome, including bacteria, viruses, yeasts and mammals. Code and documentation: https://github.com/koszullab/chromosight


2019 ◽  
Author(s):  
Tarik J. Salameh ◽  
Xiaotao Wang ◽  
Fan Song ◽  
Bo Zhang ◽  
Sage M. Wright ◽  
...  

Abstract: Accurately predicting chromatin loops from genome-wide interaction matrices such as Hi-C data is critical to deepening our understanding of proper gene regulation events. Current approaches mainly focus on searching for statistically enriched dots on a genome-wide map. However, given the availability of a wide variety of orthogonal data types such as ChIA-PET, GAM, SPRITE, and high-throughput imaging, a supervised learning approach could facilitate the discovery of a comprehensive set of chromatin interactions. Here we present Peakachu, a Random Forest classification framework that predicts chromatin loops from genome-wide contact maps. Compared with current enrichment-based approaches, Peakachu identified more meaningful short-range interactions. We show that our models perform well on different platforms such as Hi-C, Micro-C, and DNA SPRITE, across different sequencing depths, and across different species. We applied this framework to systematically predict chromatin loops in 56 Hi-C datasets, and the results are available at the 3D Genome Browser (www.3dgenome.org).


2014 ◽  
Vol 30 (15) ◽  
pp. 2105-2113 ◽  
Author(s):  
Hervé Marie-Nelly ◽  
Martial Marbouty ◽  
Axel Cournac ◽  
Gianni Liti ◽  
Gilles Fischer ◽  
...  

Soft Matter ◽  
2015 ◽  
Vol 11 (5) ◽  
pp. 1019-1025 ◽  
Author(s):  
Leonid I. Nazarov ◽  
Mikhail V. Tamm ◽  
Vladik A. Avetisov ◽  
Sergei K. Nechaev

A statistical model describing the fine structure of the intra-chromosome maps obtained by a genome-wide chromosome conformation capture method (Hi-C) is proposed.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Tarik J. Salameh ◽  
Xiaotao Wang ◽  
Fan Song ◽  
Bo Zhang ◽  
Sage M. Wright ◽  
...  

2017 ◽  
Vol 20 (4) ◽  
pp. 1205-1214
Author(s):  
Jincheol Park ◽  
Shili Lin

Abstract: How chromosomes fold and how distal genomic elements interact with one another at a genomic scale have been actively pursued in the past decade following the seminal work describing the Chromosome Conformation Capture (3C) assay. Essentially, 3C-based technologies produce two-dimensional (2D) contact maps that capture interactions between genomic fragments. Accordingly, a plethora of analytical methods have been proposed that take a 2D contact map as input to recapitulate the underlying whole-genome three-dimensional (3D) structure of the chromatin. However, their performance in terms of several factors, including data resolution and ability to handle contact map features, has not been sufficiently evaluated. This task is taken up in this article, in which we consider several recent and/or well-regarded methods, both optimization-based and model-based, for their aptness for producing 3D structures using contact maps generated from a population of cells. These methods are evaluated and compared using both simulated and real data. Several criteria have been used. For simulated data sets, the focus is on accurate recapitulation of the entire structure given the existence of the gold standard. For real data sets, comparison with distances measured by fluorescence in situ hybridization and consistency with several genomic features of known biological functions are examined.


Author(s):  
Zhen Tian ◽  
Xiaodong Qin ◽  
Hui Wang ◽  
Ji Li ◽  
Jinfeng Chen

Abstract: The CONSTANS-like (COL) gene family is one of the plant-specific transcription factor families that play important roles in plant growth and development. However, knowledge of COLs in cucumber is limited, and their biological functions, especially in the photoperiod-dependent flowering process, are still unclear. In this study, twelve CsaCOL genes were identified in the cucumber genome. Phylogenetic and conserved motif analyses provided insights into the evolutionary relationship between the CsaCOLs. Further, the comparative genome analysis revealed that COL genes are conserved in different plant species, especially collinear gene pairs related to CsaCOL5. Ten kinds of cis-acting elements were detected in CsaCOL promoter regions, including five light-responsive elements, which echo the diurnal rhythm expression patterns of seven CsaCOL genes under SD and LD photoperiod regimes. Combined with the expression data across developmental stages, three CsaCOL genes are involved in the flowering network and play pivotal roles in the floral induction process. Our results provide useful information for further elucidating the structural characteristics, expression patterns, and biological functions of COL family genes in many plants.


Methods ◽  
2017 ◽  
Vol 123 ◽  
pp. 56-65 ◽  
Author(s):  
Houda Belaghzal ◽  
Job Dekker ◽  
Johan H. Gibcus

1999 ◽  
Vol 17 (S1) ◽  
pp. S621-S626
Author(s):  
Li Hsu ◽  
Corinne Aragaki ◽  
Filemon Quiaoit ◽  
Xiangjing Wang ◽  
Xiubin Xu ◽  
...  

2022 ◽  
Author(s):  
Lars Wienbrandt ◽  
David Ellinghaus

Background: Reference-based phasing and genotype imputation algorithms have been developed with sublinear theoretical runtime behaviour, but runtimes are still high in practice when large genome-wide reference datasets are used. Methods: We developed EagleImp, a software tool with algorithmic and technical improvements and new features for accurate and accelerated phasing and imputation. Results: We compared accuracy and runtime of EagleImp with Eagle2, PBWT and prominent imputation servers using whole-genome sequencing data from the 1000 Genomes Project, the Haplotype Reference Consortium and simulated data with more than 1 million reference genomes. EagleImp is 2 to 10 times faster (depending on the single or multiprocessor configuration selected) than Eagle2/PBWT, with the same or better phasing and imputation quality in all tested scenarios. For common variants investigated in typical GWAS, EagleImp provides the same or higher imputation accuracy than the Sanger Imputation Service, Michigan Imputation Server and the newly developed TOPMed Imputation Server, even though these servers use larger (not publicly available) reference panels. It has many new features, including automated chromosome splitting and memory management at runtime to avoid job aborts, fast reading and writing of large files, and various user-configurable algorithm and output options. Conclusions: Due to the technical optimisations, EagleImp can perform fast and accurate reference-based phasing and imputation for future very large reference panels with more than 1 million genomes. EagleImp is freely available for download from https://github.com/ikmb/eagleimp.

