multiHiCcompare: joint normalization and comparative analysis of complex Hi-C experiments

2019 ◽  
Vol 35 (17) ◽  
pp. 2916-2923 ◽  
Author(s):  
John C Stansfield ◽  
Kellen G Cresswell ◽  
Mikhail G Dozmorov

Abstract Motivation With the development of chromatin conformation capture technology and its high-throughput derivative Hi-C sequencing, studies of the three-dimensional interactome of the genome that involve multiple Hi-C datasets are becoming available. To account for the technology-driven biases unique to each dataset, there is a distinct need for methods to jointly normalize multiple Hi-C datasets. Previous attempts at removing biases from Hi-C data have made use of techniques which normalize individual Hi-C datasets, or, at best, jointly normalize two datasets. Results Here, we present multiHiCcompare, a cyclic loess regression-based joint normalization technique for removing biases across multiple Hi-C datasets. In contrast to other normalization techniques, it properly handles the Hi-C-specific decay of chromatin interaction frequencies with the increasing distance between interacting regions. multiHiCcompare uses the general linear model framework for comparative analysis of multiple Hi-C datasets, adapted for the Hi-C-specific decay of chromatin interaction frequencies. multiHiCcompare outperforms other methods when detecting a priori known chromatin interaction differences from jointly normalized datasets. Applied to the analysis of auxin-treated versus untreated experiments, and CTCF depletion experiments, multiHiCcompare was able to recover the expected epigenetic and gene expression signatures of loss of chromatin interactions and reveal novel insights. Availability and implementation multiHiCcompare is freely available on GitHub and as a Bioconductor R package https://bioconductor.org/packages/multiHiCcompare. Supplementary information Supplementary data are available at Bioinformatics online.

2017 ◽  
Author(s):  
John C. Stansfield ◽  
Mikhail G. Dozmorov

Abstract Changes in spatial chromatin interactions are now emerging as a unifying mechanism orchestrating regulation of gene expression. Evolution of chromatin conformation capture methods into Hi-C sequencing technology now allows an insight into chromatin interactions on a genome-wide scale. However, Hi-C data contains many DNA sequence- and technology-driven biases. These biases prevent effective comparison of chromatin interactions aimed at identifying genomic regions differentially interacting between disease-normal states or different cell types. Several methods have been developed for normalizing individual Hi-C datasets. However, they fail to account for biases between two or more Hi-C datasets, hindering comparative analysis of chromatin interactions. We developed a simple and effective method, HiCcompare, for the joint normalization and differential analysis of multiple Hi-C datasets. The method avoids constraining Hi-C data within a rigid statistical model, allowing a data-driven normalization of biases using locally weighted linear regression (loess). The method identifies region-specific chromatin interaction changes complementary to changes due to large-scale genomic rearrangements, such as copy number variants (CNVs). HiCcompare outperforms methods for normalizing individual Hi-C datasets in detecting a priori known chromatin interaction differences in simulated and real-life settings while detecting biologically relevant changes. HiCcompare is freely available as a Bioconductor R package https://bioconductor.org/packages/HiCcompare/. Author Summary Advances in chromosome conformation capture sequencing technologies (Hi-C) have sparked interest in studying the 3-dimensional (3D) chromatin interaction structure of the human genome. The 3D structure of the genome is now considered as a primary regulator of gene expression. Changes to the 3D chromatin interactions are now emerging as a hallmark of cancer and other complex diseases. 
With the growing availability of Hi-C data generated under different conditions (e.g. tumor-normal, cell-type-specific), methods are needed to compare them. However, biases in Hi-C data hinder their comparative analysis. To account for biases, several normalization techniques have been developed for removing biases in individual Hi-C datasets, but very few were designed to account for between-datasets biases. We developed a new method and R package HiCcompare for the joint normalization of multiple Hi-C datasets and differential chromatin interaction detection. Our results show the superiority of our joint normalization methods compared to methods for normalizing individual datasets in detecting true chromatin interaction changes. HiCcompare enables further research into discovering the dynamics of 3D genomic changes.
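
The abstract describes loess-based joint normalization only in outline. The following is a minimal sketch of that idea, not the HiCcompare implementation: it assumes the between-dataset bias is modelled as a smooth function of genomic distance, fitted to the log2 ratio of interaction frequencies, and that half of the fitted bias is removed from each dataset. The function names `loess_at` and `joint_normalize`, the `frac` parameter, and the toy contact maps are all hypothetical.

```python
import math

def loess_at(x, y, x0, frac=0.5):
    """Local linear fit at x0 with tricube weights over the nearest `frac` of points."""
    k = min(len(x), max(2, math.ceil(frac * len(x))))
    # Bandwidth: distance to the k-th nearest point (fallback 1.0 if that distance is 0).
    h = sorted(abs(xi - x0) for xi in x)[k - 1] * 1.0001 or 1.0
    sw = swx = swy = swxx = swxy = 0.0
    for xi, yi in zip(x, y):
        u = abs(xi - x0) / h
        w = (1 - u ** 3) ** 3 if u < 1 else 0.0
        sw += w
        swx += w * xi
        swy += w * yi
        swxx += w * xi * xi
        swxy += w * xi * yi
    denom = sw * swxx - swx * swx
    if abs(denom) < 1e-12:          # all weighted points share one x: use weighted mean
        return swy / sw
    slope = (sw * swxy - swx * swy) / denom
    return (swy - slope * swx) / sw + slope * x0

def joint_normalize(if1, if2, frac=0.5):
    """Assumed scheme: fit M = log2(IF2/IF1) against distance D, split the fit evenly."""
    keys = sorted(k for k in if1 if k in if2 and if1[k] > 0 and if2[k] > 0)
    d = [abs(i - j) for i, j in keys]
    m = [math.log2(if2[k] / if1[k]) for k in keys]
    fitted = {dist: loess_at(d, m, dist, frac) for dist in set(d)}
    adj1, adj2 = {}, {}
    for key, dist in zip(keys, d):
        shift = fitted[dist] / 2
        adj1[key] = if1[key] * 2 ** shift      # raise dataset 1 by half the bias
        adj2[key] = if2[key] * 2 ** (-shift)   # lower dataset 2 by half the bias
    return adj1, adj2

# Toy contact maps: dataset 2 equals dataset 1 times a distance-dependent bias.
raw1 = {(i, j): 100.0 / (1 + j - i) for i in range(8) for j in range(i, 8)}
raw2 = {k: v * 2 ** (0.1 * (k[1] - k[0])) for k, v in raw1.items()}
adj1, adj2 = joint_normalize(raw1, raw2)
residual = max(abs(math.log2(adj2[k] / adj1[k])) for k in adj1)
print(f"largest remaining |log2 ratio|: {residual:.2e}")
```

Because the simulated bias is linear in distance, the local linear fit removes it almost entirely; real Hi-C data would leave residual differences, which is where differential interaction detection would begin.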


2019 ◽  
Vol 35 (14) ◽  
pp. i145-i153 ◽  
Author(s):  
Abbas Roayaei Ardakany ◽  
Ferhat Ay ◽  
Stefano Lonardi

Abstract Motivation High-throughput conformation capture experiments, such as Hi-C, provide genome-wide maps of chromatin interactions, enabling life scientists to investigate the role of the three-dimensional structure of genomes in gene regulation and other essential cellular functions. A fundamental problem in the analysis of Hi-C data is how to compare two contact maps derived from Hi-C experiments. Detecting similarities and differences between contact maps is critical in evaluating the reproducibility of replicate experiments and for identifying differential genomic regions with biological significance. Due to the complexity of chromatin conformations and the presence of technology-driven and sequence-specific biases, the comparative analysis of Hi-C data is analytically and computationally challenging. Results We present a novel method called Selfish for the comparative analysis of Hi-C data that takes advantage of the structural self-similarity in contact maps. We define a novel self-similarity measure to design algorithms for (i) measuring reproducibility for Hi-C replicate experiments and (ii) finding differential chromatin interactions between two contact maps. Extensive experimental results on simulated and real data show that Selfish is more accurate and robust than state-of-the-art methods. Availability and implementation https://github.com/ucrbioinfo/Selfish


2020 ◽  
Author(s):  
Timothy Kunz ◽  
Lila Rieber ◽  
Shaun Mahony

Abstract Few existing methods enable the visualization of relationships between regulatory genomic activities and genome organization as captured by Hi-C experimental data. Genome-wide Hi-C datasets are often displayed using “heatmap” matrices, but it is difficult to intuit from these heatmaps which biochemical activities are compartmentalized together. High-dimensional Hi-C data vectors can alternatively be projected onto three-dimensional space using dimensionality reduction techniques. The resulting three-dimensional structures can serve as scaffolds for projecting other forms of genomic information, thereby enabling the exploration of relationships between genome organization and various genome annotations. However, while three-dimensional models are contextually appropriate for chromatin interaction data, some analyses and visualizations may be more intuitively and conveniently performed in two-dimensional space. We present a novel approach to the visualization and analysis of chromatin organization based on the Self-Organizing Map (SOM). The SOM algorithm provides a two-dimensional manifold which adapts to represent the high dimensional chromatin interaction space. The resulting data structure can then be used to assess the relationships between regulatory genomic activities and chromatin interactions. For example, given a set of genomic coordinates corresponding to a given biochemical activity, the degree to which this activity is segregated or compartmentalized in chromatin interaction space can be intuitively visualized on the 2D SOM grid and quantified using Lorenz curve analysis. We demonstrate our approach for exploratory analysis of genome compartmentalization in a high-resolution Hi-C dataset from the human GM12878 cell line. Our SOM-based approach provides an intuitive visualization of the large-scale structure of Hi-C data and serves as a platform for integrative analyses of the relationships between various genomic activities and genome organization.
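
The abstract does not say how the Lorenz curve is computed from the SOM grid. As a rough illustration only, the sketch below assumes each SOM node carries a count of genomic regions showing a given activity, and summarizes how unevenly those counts are spread with a Lorenz curve and the associated Gini coefficient. The function names and the example counts are hypothetical.

```python
def lorenz_curve(counts):
    """Cumulative share of activity versus cumulative share of SOM nodes (ascending)."""
    vals = sorted(counts)
    total = sum(vals)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    points = [(0.0, 0.0)]
    running = 0.0
    for rank, value in enumerate(vals, start=1):
        running += value
        points.append((rank / len(vals), running / total))
    return points

def gini(counts):
    """1 minus twice the area under the Lorenz curve (trapezoid rule)."""
    pts = lorenz_curve(counts)
    area = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return 1 - 2 * area

# Hypothetical per-node counts on a four-node grid.
evenly_spread = [5, 5, 5, 5]   # activity present equally on every node
concentrated = [0, 0, 0, 20]   # activity confined to a single node
print(gini(evenly_spread), gini(concentrated))
```

Under this assumption, a value near 0 indicates an activity spread evenly across the map, while larger values indicate an activity confined to a few nodes, that is, stronger compartmentalization.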


Author(s):  
Tiago Olivoto ◽  
Maicon Nardino

Abstract Motivation Multivariate data are common in biological experiments and using the information on multiple traits is crucial to make better decisions for treatment recommendations or genotype selection. However, identifying genotypes/treatments that combine high performance across many traits has been a challenging task. Classical linear multi-trait selection indexes are available, but the presence of multicollinearity and the arbitrary choice of weighting coefficients may erode the genetic gains. Results We propose a novel approach for genotype selection and treatment recommendation based on multiple traits that overcomes the fragility of classical linear indexes. Here, we use the distance between the genotypes/treatment with an ideotype defined a priori as a multi-trait genotype–ideotype distance index (MGIDI) to provide a selection process that is unique, easy-to-interpret, free from weighting coefficients and multicollinearity issues. The performance of the MGIDI index is assessed through a Monte Carlo simulation study where the percentage of success in selecting traits with desired gains is compared with classical and modern indexes under different scenarios. Two real plant datasets are used to illustrate the application of the index from breeders and agronomists’ points of view. Our experimental results indicate that MGIDI can effectively select superior treatments/genotypes based on multi-trait data, outperforming state-of-the-art methods, and helping practitioners to make better strategic decisions toward an effective multivariate selection in biological experiments. Availability and implementation The source code is available in the R package metan (https://github.com/TiagoOlivoto/metan) under the function mgidi(). Supplementary information Supplementary data are available at Bioinformatics online.
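
The core idea of a genotype-to-ideotype distance can be sketched in a few lines. This is a deliberately simplified illustration, not the `mgidi()` function from metan: it assumes each trait is rescaled to 0–100 so that 100 is always the desired value, takes the ideotype to be 100 on every trait, and ranks genotypes by Euclidean distance to it. It does nothing to address multicollinearity, which the published index handles. The function names, trait values and genotype labels are hypothetical.

```python
import math

def rescale(values, higher_is_better=True):
    """Map a trait to 0-100 so that 100 is always the desired direction."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [100.0] * len(values)
    scaled = [100.0 * (v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [100.0 - s for s in scaled]

def ideotype_distance(traits, higher_is_better):
    """Euclidean distance of each genotype to an ideotype scoring 100 on every trait."""
    names = list(traits)
    columns = [
        rescale([traits[g][t] for g in names], better)
        for t, better in enumerate(higher_is_better)
    ]
    return {
        g: math.sqrt(sum((100.0 - col[i]) ** 2 for col in columns))
        for i, g in enumerate(names)
    }

# Hypothetical data: (grain yield, lodging %); high yield and low lodging are desired.
traits = {"G1": (5.0, 10.0), "G2": (7.0, 30.0), "G3": (9.0, 10.0)}
dist = ideotype_distance(traits, higher_is_better=[True, False])
ranking = sorted(dist, key=dist.get)   # smallest distance = closest to the ideotype
print(ranking, dist)
```

No weighting coefficients appear anywhere in the calculation: each trait contributes through its rescaled distance to the ideotype, which is the property the abstract emphasizes.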


2019 ◽  
Author(s):  
Gustavo A. Ruiz Buendía ◽  
Marion Leleu ◽  
Flavia Marzetta ◽  
Ludovica Vanzan ◽  
Jennifer Y. Tan ◽  
...  

Abstract Expanded CAG/CTG repeats underlie thirteen neurological disorders, including myotonic dystrophy (DM1) and Huntington’s disease (HD). Upon expansion, CAG/CTG repeat loci acquire heterochromatic characteristics. This observation raises the hypothesis that repeat expansion provokes changes to higher order chromatin folding and thereby affects both gene expression in cis and the genetic instability of the repeat tract. Here we tested this hypothesis directly by performing 4C sequencing at the DMPK and HTT loci from DM1 and HD patient-derived cells. Surprisingly, chromatin contacts remain unchanged upon repeat expansion at both loci. This was true for loci with different DNA methylation levels and CTCF binding. Repeat sizes ranging from 15 to 1,700 displayed strikingly similar chromatin interaction profiles. Our findings argue that extensive changes in heterochromatic properties are not enough to alter chromatin folding at expanded CAG/CTG repeat loci. Moreover, the ectopic insertion of an expanded repeat tract did not change three-dimensional chromatin contacts. We conclude that expanded CAG/CTG repeats have little to no effect on chromatin conformation.


2017 ◽  
Author(s):  
Adrian Zetner ◽  
Jennifer Cabral ◽  
Laura Mataseje ◽  
Natalie C Knox ◽  
Philip Mabon ◽  
...  

Abstract Summary Comparative analysis of bacterial plasmids from whole genome sequence (WGS) data generated from short read sequencing is challenging. This is due to the difficulty in identifying contigs harbouring plasmid sequence data, and further difficulty in assembling such contigs into a full plasmid. As such, few software programs and bioinformatics pipelines exist to perform comprehensive comparative analyses of plasmids within and amongst sequenced isolates. To address this gap, we have developed Plasmid Profiler, a pipeline to perform comparative plasmid content analysis without the need for de novo assembly. The pipeline is designed to rapidly identify plasmid sequences by mapping reads to a plasmid reference sequence database. Predicted plasmid sequences are then annotated with their incompatibility group, if known. The pipeline allows users to query plasmids for genes or regions of interest and visualize results as an interactive heat map. Availability and Implementation Plasmid Profiler is freely available software released under the Apache 2.0 open source software license. 
A stand-alone version of the entire Plasmid Profiler pipeline is available as a Docker container at https://hub.docker.com/r/phacnml/plasmidprofiler_0_1_6/. The conda recipe for the Plasmid R package is available at: https://anaconda.org/bioconda/r-plasmidprofiler The custom Plasmid Profiler R package is also available as a CRAN package at https://cran.r-project.org/web/packages/Plasmidprofiler/index.html Galaxy tools associated with the pipeline are available as a Galaxy tool suite at https://toolshed.g2.bx.psu.edu/repository?repository_id=55e082200d16a504 The source code is available at: https://github.com/phac-nml/plasmidprofiler The Galaxy implementation is available at: https://github.com/phac-nml/plasmidprofiler-galaxy Contact Email: [email protected]: National Microbiology Laboratory, Public Health Agency of Canada, 1015 Arlington Street, Winnipeg, Manitoba, Canada Supplementary information Documentation: http://plasmid-profiler.readthedocs.io/en/latest/


2020 ◽  
Vol 6 (27) ◽  
pp. eaaz4012 ◽  
Author(s):  
Gustavo A. Ruiz Buendía ◽  
Marion Leleu ◽  
Flavia Marzetta ◽  
Ludovica Vanzan ◽  
Jennifer Y. Tan ◽  
...  

Expanded CAG/CTG repeats underlie 13 neurological disorders, including myotonic dystrophy type 1 (DM1) and Huntington’s disease (HD). Upon expansion, disease loci acquire heterochromatic characteristics, which may provoke changes to chromatin conformation and thereby affect both gene expression and repeat instability. Here, we tested this hypothesis by performing 4C sequencing at the DMPK and HTT loci from DM1 and HD–derived cells. We find that allele sizes ranging from 15 to 1700 repeats displayed similar chromatin interaction profiles. This was true for both loci and for alleles with different DNA methylation levels and CTCF binding. Moreover, the ectopic insertion of an expanded CAG repeat tract did not change the conformation of the surrounding chromatin. We conclude that CAG/CTG repeat expansions are not enough to alter chromatin conformation in cis. Therefore, it is unlikely that changes in chromatin interactions drive repeat instability or changes in gene expression in these disorders.


2019 ◽  
Author(s):  
David Gerard ◽  
Luís Felipe Ventorim Ferrão

Abstract Motivation Empirical Bayes techniques to genotype polyploid organisms usually either (i) assume technical artifacts are known a priori or (ii) estimate technical artifacts simultaneously with the prior genotype distribution. Case (i) is unappealing as it places the onus on the researcher to estimate these artifacts, or to ensure that there are no systematic biases in the data. However, as we demonstrate with a few empirical examples, case (ii) makes choosing the class of prior genotype distributions extremely important. Choosing a class that is either too flexible or too restrictive results in poor genotyping performance. Results We propose two classes of prior genotype distributions that are of intermediate levels of flexibility: the class of proportional normal distributions and the class of unimodal distributions. We provide a complete characterization of and optimization details for the class of unimodal distributions. We demonstrate, using both simulated and real data, that using these classes results in superior genotyping performance. Availability and implementation Genotyping methods that use these priors are implemented in the updog R package available on the Comprehensive R Archive Network: https://cran.r-project.org/package=updog. All code needed to reproduce the results of this paper is available on GitHub: https://github.com/dcgerard/reproduce_prior_sims. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Abbas Roayaei Ardakany ◽  
Ferhat Ay ◽  
Stefano Lonardi

Abstract Motivation High-throughput conformation capture experiments such as Hi-C provide genome-wide maps of chromatin interactions, enabling life scientists to investigate the role of the three-dimensional structure of genomes in gene regulation and other essential cellular functions. A fundamental problem in the analysis of Hi-C data is how to compare two contact maps derived from Hi-C experiments. Detecting similarities and differences between contact maps is critical in evaluating the reproducibility of replicate experiments and identifying differential genomic regions with biological significance. Due to the complexity of chromatin conformations and the presence of technology-driven and sequence-specific biases, the comparative analysis of Hi-C data is analytically and computationally challenging. Results We present a novel method called Selfish for the comparative analysis of Hi-C data that takes advantage of the structural self-similarity in contact maps. We define a novel self-similarity measure to design algorithms for (i) measuring reproducibility for Hi-C replicate experiments and (ii) finding differential chromatin interactions between two contact maps. Extensive experimental results on simulated and real data show that Selfish is more accurate and robust than state-of-the-art methods. Availability https://github.com/ucrbioinfo/[email protected] and [email protected]


2021 ◽  
Vol 12 ◽  
Author(s):  
Sambhavi Animesh ◽  
Ruchi Choudhary ◽  
Bertrand Jern Han Wong ◽  
Charlotte Tze Jia Koh ◽  
Xin Yi Ng ◽  
...  

Nasopharyngeal cancer (NPC), a cancer derived from epithelial cells in the nasopharynx, is a cancer common in China, Southeast Asia, and Africa. The three-dimensional (3D) genome organization of nasopharyngeal cancer is poorly understood. A major challenge in understanding the 3D genome organization of cancer samples is the lack of a method for the characterization of chromatin interactions in solid cancer needle biopsy samples. Here, we developed Biop-C, a modified in situ Hi-C method using solid cancer needle biopsy samples. We applied Biop-C to characterize three nasopharyngeal cancer solid cancer needle biopsy patient samples. We identified topologically associated domains (TADs), chromatin interaction loops, and frequently interacting regions (FIREs) at key oncogenes in nasopharyngeal cancer from the Biop-C heatmaps. We observed that the genomic features are shared at some important oncogenes, but the patients also display extensive heterogeneity at certain genomic loci. On analyzing the super enhancer landscape in nasopharyngeal cancer cell lines, we found that the super enhancers are associated with FIREs and can be linked to distal genes via chromatin loops in NPC. Taken together, our results demonstrate the utility of our Biop-C method in investigating 3D genome organization in solid cancers.

