A supervised learning framework for chromatin loop detection in genome-wide contact maps

2019 ◽  
Author(s):  
Tarik J. Salameh ◽  
Xiaotao Wang ◽  
Fan Song ◽  
Bo Zhang ◽  
Sage M. Wright ◽  
...  

ABSTRACT Accurately predicting chromatin loops from genome-wide interaction matrices such as Hi-C data is critical to deepen our understanding of proper gene regulation events. Current approaches are mainly focused on searching for statistically enriched dots on a genome-wide map. However, given the availability of a wide variety of orthogonal data types such as ChIA-PET, GAM, SPRITE, and high-throughput imaging, a supervised learning approach could facilitate the discovery of a comprehensive set of chromatin interactions. Here we present Peakachu, a Random Forest classification framework that predicts chromatin loops from genome-wide contact maps. Compared with current enrichment-based approaches, Peakachu identified more meaningful short-range interactions. We show that our models perform well in different platforms such as Hi-C, Micro-C, and DNA SPRITE, across different sequencing depths, and across different species. We applied this framework to systematically predict chromatin loops in 56 Hi-C datasets, and the results are available at the 3D Genome Browser (www.3dgenome.org).

2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Tarik J. Salameh ◽  
Xiaotao Wang ◽  
Fan Song ◽  
Bo Zhang ◽  
Sage M. Wright ◽  
...  

Author(s):  
Robert A. Beagrie ◽  
Christoph J. Thieme ◽  
Carlo Annunziatella ◽  
Catherine Baugher ◽  
Yingnan Zhang ◽  
...  

Summary (Abstract) Technologies for measuring 3D genome topology are increasingly important for studying mechanisms of gene regulation, for genome assembly and for mapping of genome rearrangements. Hi-C and other ligation-based methods have become routine but have specific biases. Here, we develop multiplex-GAM, a faster and more affordable version of Genome Architecture Mapping (GAM), a ligation-free technique to map chromatin contacts genome-wide. We perform a detailed comparison of contacts obtained by multiplex-GAM and Hi-C using mouse embryonic stem (mES) cells. We find that both methods detect similar topologically associating domains (TADs). However, when examining the strongest contacts detected by either method, we find that only one third of these are shared. The strongest contacts specifically found in GAM often involve “active” regions, including many transcribed genes and super-enhancers, whereas in Hi-C they more often contain “inactive” regions. Our work shows that active genomic regions are involved in extensive complex contacts that are currently underestimated in genome-wide ligation-based approaches, and highlights the need for orthogonal advances in genome-wide contact mapping technologies.


2019 ◽  
Author(s):  
Qiang Wu ◽  
Ya Guo ◽  
Yujia Lu ◽  
Jingwei Li ◽  
Yonghu Wu ◽  
...  

ABSTRACT CTCF is a key insulator-binding protein and mammalian genomes contain numerous CTCF-binding sites (CBSs), many of which are organized in tandem arrays. Here we provide direct evidence that CBSs, if located between enhancers and promoters in the Pcdhα and β-globin clusters, function as enhancer-blocking insulators by forming distinct directional chromatin loops, regardless of whether the enhancers contain CBSs. Moreover, computational simulation and experimental capture revealed balanced promoter usage in cell populations and stochastic monoallelic expression in single cells by large arrays of tandem variable CBSs. Finally, gene expression levels are negatively correlated with CBS insulators located between enhancers and promoters on a genome-wide scale. Thus, single CBS insulators ensure proper enhancer insulation and promoter activation while tandem-arrayed CBS insulators determine balanced promoter usage. This finding has interesting implications for the role of topological insulators in 3D genome folding and developmental gene regulation.


2014 ◽  
Vol 30 (15) ◽  
pp. 2105-2113 ◽  
Author(s):  
Hervé Marie-Nelly ◽  
Martial Marbouty ◽  
Axel Cournac ◽  
Gianni Liti ◽  
Gilles Fischer ◽  
...  

Soft Matter ◽  
2015 ◽  
Vol 11 (5) ◽  
pp. 1019-1025 ◽  
Author(s):  
Leonid I. Nazarov ◽  
Mikhail V. Tamm ◽  
Vladik A. Avetisov ◽  
Sergei K. Nechaev

A statistical model describing a fine structure of the intra-chromosome maps obtained by a genome-wide chromosome conformation capture method (Hi–C) is proposed.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Cyril Matthey-Doret ◽  
Lyam Baudry ◽  
Axel Breuer ◽  
Rémi Montagne ◽  
Nadège Guiglielmoni ◽  
...  

Abstract Chromosomes of all species studied so far display a variety of higher-order organisational features, such as self-interacting domains or loops. These structures, which are often associated with biological functions, form distinct, visible patterns on genome-wide contact maps generated by chromosome conformation capture approaches such as Hi-C. Here we present Chromosight, an algorithm inspired by computer vision that can detect patterns in contact maps. Chromosight has greater sensitivity than existing methods on synthetic simulated data, while being faster and applicable to any type of genome, including those of bacteria, viruses, yeasts and mammals. Our method does not require any prior training dataset and works well with default parameters on data generated with various protocols.
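As an illustration of what training-free pattern detection on a contact map can look like, the following is a minimal sketch of template matching, a generic computer-vision technique. It is not taken from the abstract and is not claimed to be Chromosight's actual implementation. The function names `pearson` and `match_template`, the square list-of-lists matrix, and the toy loop kernel are all assumptions made for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def match_template(matrix, kernel):
    """Score every kernel-sized window of a square matrix against the kernel.

    Returns a dict mapping the window's centre (row, col) to its correlation
    with the kernel. Both inputs are assumed to be square lists of lists.
    """
    size = len(matrix)
    k = len(kernel)
    flat_kernel = [value for row in kernel for value in row]
    scores = {}
    for i in range(size - k + 1):
        for j in range(size - k + 1):
            window = [matrix[i + di][j + dj] for di in range(k) for dj in range(k)]
            scores[(i + k // 2, j + k // 2)] = pearson(window, flat_kernel)
    return scores


# Hypothetical loop-like kernel: a bright centre with a weaker halo.
LOOP_KERNEL = [[1.0, 2.0, 1.0], [2.0, 5.0, 2.0], [1.0, 2.0, 1.0]]

# Toy 7x7 contact map: flat background with one loop pattern centred at (2, 4).
toy_map = [[1.0] * 7 for _ in range(7)]
for di in range(3):
    for dj in range(3):
        toy_map[1 + di][3 + dj] = LOOP_KERNEL[di][dj]

scores = match_template(toy_map, LOOP_KERNEL)
best_position = max(scores, key=scores.get)
```

Because the score is a correlation rather than a raw sum, a window matches when it has the kernel's shape regardless of overall signal intensity, which is one reason this style of detector needs no labelled training set.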


2015 ◽  
Author(s):  
Joshua Robert Puzey ◽  
John H Willis ◽  
John K Kelly

Across western North America, Mimulus guttatus exists as many local populations adapted to site-specific challenges including salt spray, temperature, water availability, and soil chemistry. Gene flow between locally adapted populations will affect genetic diversity both in local demes and across the larger meta-population. A single population of annual M. guttatus from Iron Mountain, Oregon (IM) has been extensively studied, and here we build on this research by analyzing whole genome sequences from 34 inbred lines from IM in conjunction with sequences from 22 Mimulus individuals from across the geographic range. Three striking features of these data address hypotheses about migration and selection in a locally adapted population. First, we find very high intra-population polymorphism (synonymous π = 0.033). Variation outside genes may be even higher, but is difficult to estimate because excessive divergence affects read mapping. Second, IM exhibits a significantly positive genome-wide average for Tajima's D. This indicates allele frequencies are typically more intermediate than expected from neutrality, opposite the pattern observed in other species. Third, IM exhibits a distinctive haplotype structure. There is a genome-wide excess of positive associations between minor alleles, consistent with an important effect of gene flow from nearby Mimulus populations. The combination of multiple data types, including a novel, tree-based analytic method and estimates for structural polymorphism (inversions) from previous genetic mapping studies, illustrates how the balance of strong local selection, limited dispersal, and meta-population dynamics manifests across the genome.
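To make the link between Tajima's D and intermediate allele frequencies concrete, here is a minimal sketch of the standard Tajima (1989) statistic computed from a small set of aligned haplotypes. The function name `tajimas_d` and the toy sequences are assumptions for illustration and are unrelated to the study's data or pipeline. Note that `pi` here is the mean number of pairwise differences over the whole alignment; dividing it by the number of sites would give a per-site value comparable to the π quoted above.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Return (S, pi, D) for a list of equal-length haplotype strings."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 sequences; the variance term is zero below that")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must all have the same length")

    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(len({h[i] for h in haplotypes}) > 1 for i in range(length))
    if seg_sites == 0:
        raise ValueError("D is undefined when there are no segregating sites")

    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # D compares pi with Watterson's estimator S / a1, scaled by its standard deviation.
    d = (pi - seg_sites / a1) / sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))
    return seg_sites, pi, d


# Every variant is carried by half the sample (intermediate frequency).
balanced = tajimas_d(["AAAA", "AAAA", "TTTT", "TTTT"])
# Every variant is a singleton (rare alleles only).
singletons = tajimas_d(["TAAA", "ATAA", "AATA", "AAAT"])
```

Both toy samples have the same number of segregating sites, but the balanced sample has more pairwise differences, so its D is positive, while the singleton-only sample gives a negative D.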


2021 ◽  
Author(s):  
Kim Philipp Jablonski ◽  
Leopold Carron ◽  
Julien Mozziconacci ◽  
Thierry Forné ◽  
Marc-Thorsten Hütt ◽  
...  

Abstract Background Genome-wide association studies have identified statistical associations between various diseases, including cancers, and a large number of single-nucleotide polymorphisms (SNPs). However, they provide no direct explanation of the mechanisms underlying the association. Based on the recent discovery that changes in 3-dimensional genome organization may have functional consequences on gene regulation favoring diseases, we investigated systematically the genome-wide distribution of disease-associated SNPs with respect to a specific feature of 3D genome organization: topologically-associating domains (TADs) and their borders. Results For each of 449 diseases, we tested whether the associated SNPs are present in TAD borders more often than observed by chance, where chance (i.e. the null model in statistical terms) corresponds to the same number of pointwise loci drawn at random either in the entire genome, or in the entire set of disease-associated SNPs listed in the GWAS catalog. Our analysis shows that a fraction of diseases displays such a preferential localization of their risk loci. Moreover, cancers are relatively more frequent among these diseases, and this predominance is generally enhanced when considering only intergenic SNPs. The structure of SNP-based diseasome networks confirms that localization of risk loci in TAD borders differs between cancers and non-cancer diseases. Furthermore, different TAD border enrichments are observed in embryonic stem cells and differentiated cells, consistent with changes in topological domains along embryogenesis and delineating their contribution to disease risk. Conclusions Our results suggest that, for certain diseases, part of the genetic risk lies in a local genetic variation affecting the genome partitioning in topologically-insulated domains. Investigating this possible contribution to genetic risk is particularly relevant in cancers. 
This study thus opens a way of interpreting genome-wide association studies, by distinguishing two types of disease-associated SNPs: one with a direct effect on an individual gene, the other acting in interplay with 3D genome organization.
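The null model described in the Results (the same number of loci drawn at random across the genome) can be sketched as a simple Monte Carlo test. This is a minimal illustration under stated assumptions, not the authors' code: the function name `border_enrichment`, the single linear coordinate system, the half-open border intervals, and the toy numbers are all hypothetical.

```python
import random


def border_enrichment(snp_positions, borders, genome_length, n_draws=1000, seed=0):
    """Return (observed count in borders, empirical one-sided p-value).

    borders is a list of half-open (start, end) intervals on one linear
    coordinate system of size genome_length (an assumption of this sketch).
    """
    rng = random.Random(seed)

    def count_in_borders(positions):
        return sum(any(start <= p < end for start, end in borders) for p in positions)

    observed = count_in_borders(snp_positions)
    n_snps = len(snp_positions)

    # Null model: the same number of pointwise loci placed uniformly at random.
    at_least_as_extreme = 0
    for _ in range(n_draws):
        null_positions = [rng.randrange(genome_length) for _ in range(n_snps)]
        if count_in_borders(null_positions) >= observed:
            at_least_as_extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (at_least_as_extreme + 1) / (n_draws + 1)


# Toy data: two 100-bp borders on a 10,000-bp "genome".
toy_borders = [(100, 200), (500, 600)]
enriched = border_enrichment([110, 150, 510, 590, 3000], toy_borders, 10_000)
not_enriched = border_enrichment([1000, 2000, 3000], toy_borders, 10_000)
```

The second null model mentioned in the abstract, drawing from the full set of disease-associated SNPs in the GWAS catalog, would replace the uniform draw with sampling from that list of positions, for example with `rng.sample`.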


2022 ◽  
Vol 16 (1) ◽  
Author(s):  
Kim Philipp Jablonski ◽  
Leopold Carron ◽  
Julien Mozziconacci ◽  
Thierry Forné ◽  
Marc-Thorsten Hütt ◽  
...  

Abstract Background Genome-wide association studies have identified statistical associations between various diseases, including cancers, and a large number of single-nucleotide polymorphisms (SNPs). However, they provide no direct explanation of the mechanisms underlying the association. Based on the recent discovery that changes in three-dimensional genome organization may have functional consequences on gene regulation favoring diseases, we investigated systematically the genome-wide distribution of disease-associated SNPs with respect to a specific feature of 3D genome organization: topologically associating domains (TADs) and their borders. Results For each of 449 diseases, we tested whether the associated SNPs are present in TAD borders more often than observed by chance, where chance (i.e., the null model in statistical terms) corresponds to the same number of pointwise loci drawn at random either in the entire genome, or in the entire set of disease-associated SNPs listed in the GWAS catalog. Our analysis shows that a fraction of diseases displays such a preferential localization of their risk loci. Moreover, cancers are relatively more frequent among these diseases, and this predominance is generally enhanced when considering only intergenic SNPs. The structure of SNP-based diseasome networks confirms that localization of risk loci in TAD borders differs between cancers and non-cancer diseases. Furthermore, different TAD border enrichments are observed in embryonic stem cells and differentiated cells, consistent with changes in topological domains along embryogenesis and delineating their contribution to disease risk. Conclusions Our results suggest that, for certain diseases, part of the genetic risk lies in a local genetic variation affecting the genome partitioning in topologically insulated domains. Investigating this possible contribution to genetic risk is particularly relevant in cancers. 
This study thus opens a way of interpreting genome-wide association studies, by distinguishing two types of disease-associated SNPs: one with an effect on an individual gene, the other acting in interplay with 3D genome organization.

