4C-ker: A method to reproducibly identify genome-wide interactions captured by 4C-Seq experiments

2015 ◽  
Author(s):  
Ramya Raviram ◽  
Pedro P. Rocha ◽  
Christian L. Müller ◽  
Emily R. Miraldi ◽  
Sana Badri ◽  
...  

ABSTRACT 4C-Seq has proven to be a powerful technique to identify genome-wide interactions with a single locus of interest (or “bait”) that can be important for gene regulation. However, analysis of 4C-Seq data is complicated by the many biases inherent to the technique. An important consideration when dealing with 4C-Seq data is the differences in resolution of signal across the genome that result from differences in 3D distance separation from the bait. This leads to the highest signal in the region immediately surrounding the bait and increasingly lower signals in far-cis and trans. Another important aspect of 4C-Seq experiments is the resolution, which is greatly influenced by the choice of restriction enzyme and the frequency at which it can cut the genome. Thus, it is important that a 4C-Seq analysis method is flexible enough to analyze data generated using different enzymes and to identify interactions across the entire genome. Current methods for 4C-Seq analysis only identify interactions in regions near the bait or in regions located in far-cis and trans, but no method comprehensively analyzes 4C signals of different length scales. In addition, some methods also fail in experiments where chromatin fragments are generated using frequent cutter restriction enzymes. Here, we describe 4C-ker, a Hidden Markov Model-based pipeline that identifies regions throughout the genome that interact with the 4C bait locus. In addition, we incorporate methods for the identification of differential interactions in multiple 4C-Seq datasets collected from different genotypes or experimental conditions. Adaptive window sizes are used to correct for differences in signal coverage in near-bait regions, far-cis and trans chromosomes. Using several datasets, we demonstrate that 4C-ker outperforms all existing 4C-Seq pipelines in its ability to reproducibly identify interaction domains at all genomic ranges with different resolution enzymes.
AUTHORS SUMMARY Circularized chromosome conformation capture, or 4C-Seq, is a technique developed to identify regions of the genome that are in close spatial proximity to a single locus of interest (‘bait’). This technique is used to detect regulatory interactions between promoters and enhancers and to characterize the nuclear environment of different regions within and across different cell types. So far, existing methods for 4C-Seq data analysis do not comprehensively identify interactions across the entire genome due to biases in the technique that are related to the decrease in 4C signal that results from increased 3D distance from the bait. To compensate for these weaknesses in existing methods, we developed 4C-ker, a method that explicitly models these biases to improve the analysis of 4C-Seq to better understand the genome-wide interaction profile of an individual locus.

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Damien J. Downes ◽  
Robert A. Beagrie ◽  
Matthew E. Gosden ◽  
Jelena Telenius ◽  
Stephanie J. Carpenter ◽  
...  

Abstract Chromosome conformation capture (3C) provides an adaptable tool for studying diverse biological questions. Current 3C methods generally provide either low-resolution interaction profiles across the entire genome, or high-resolution interaction profiles at limited numbers of loci. Due to technical limitations, generation of reproducible high-resolution interaction profiles has not been achieved at genome-wide scale. Here, to overcome this barrier, we systematically test each step of 3C and report two improvements over current methods. We show that up to 30% of reporter events generated using the popular in situ 3C method arise from ligations between two individual nuclei, but this noise can be almost entirely eliminated by isolating intact nuclei after ligation. Using Nuclear-Titrated Capture-C, we generate reproducible high-resolution genome-wide 3C interaction profiles by targeting 8055 gene promoters in erythroid cells. By pairing high-resolution 3C interaction calls with nascent gene expression we interrogate the role of promoter hubs and super-enhancers in gene regulation.


Author(s):  
Damien J. Downes ◽  
Matthew E. Gosden ◽  
Jelena Telenius ◽  
Stephanie J. Carpenter ◽  
Lea Nussbaum ◽  
...  

ABSTRACT Chromosome conformation capture (3C) provides an adaptable tool for studying diverse biological questions. Current 3C methods provide either low-resolution interaction profiles across the entire genome, or high-resolution interaction profiles at up to several hundred loci. All 3C methods are affected to varying degrees by inefficiency, bias and noise. As such, generation of reproducible high-resolution interaction profiles has not been achieved at scale. To overcome this barrier, we systematically tested and improved upon current methods. We show that isolation of 3C libraries from intact nuclei, as well as shortening and titration of the enrichment oligonucleotides used in high-resolution methods, reduces noise and increases on-target sequencing. We combined these technical modifications into a new method, Nuclear-Titrated (NuTi) Capture-C, which provides a >3-fold increase in informative sequencing content over current Capture-C protocols. Using NuTi Capture-C we target 8,061 promoters in triplicate, demonstrating that this method generates reproducible high-resolution genome-wide 3C interaction profiles at scale.


2014 ◽  
Author(s):  
Felix A. Klein ◽  
Tibor Pakozdi ◽  
Simon Anders ◽  
Yad Ghavi-Helm ◽  
Eileen E. M. Furlong ◽  
...  

Abstract Motivation: Circularized Chromosome Conformation Capture (4C) is a powerful technique for studying the spatial interactions of a specific genomic region called the “viewpoint” with the rest of the genome, either in a single condition or when comparing different experimental conditions or cell types. Observed ligation frequencies show a strong, regular dependence on genomic distance from the viewpoint, on top of which specific interaction peaks are superimposed. Here, we address the computational task of finding these specific interactions and detecting changes between interaction profiles of different conditions. Results: We model the overall trend of decreasing interaction frequency with genomic distance by fitting a smooth, monotonically decreasing function to suitably transformed count data. Based on the fit, z-scores are calculated from the residuals, with high z-scores being interpreted as peaks providing evidence for specific interactions. To compare different conditions, we normalize fragment counts between samples and call differential contact frequencies using the statistical method DESeq2, adapted from RNA-Seq analysis. Availability and Implementation: A full end-to-end analysis pipeline is implemented in the R package FourCSeq available at www.bioconductor.org.
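The fit-then-score idea in this abstract can be illustrated with a minimal sketch. It is not the FourCSeq implementation: the log2 transform, the pool-adjacent-violators step fit (a non-smooth stand-in for the smooth monotone fit the authors describe), the population standard deviation used for scaling, and the toy counts are all assumptions made here for illustration.

```python
import math
from statistics import pstdev


def fit_decreasing(values):
    """Least-squares non-increasing fit via pool-adjacent-violators (assumed stand-in)."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block's mean is below the newest block's mean,
        # because that would break the non-increasing trend.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]:
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fit = []
    for s, n in blocks:
        fit.extend([s / n] * n)
    return fit


def z_scores(counts):
    """counts: fragment counts ordered by increasing distance from the viewpoint."""
    transformed = [math.log2(c + 1) for c in counts]  # assumed transform
    fit = fit_decreasing(transformed)
    residuals = [t - f for t, f in zip(transformed, fit)]
    sd = pstdev(residuals)
    if sd == 0:
        return [0.0] * len(counts)
    return [r / sd for r in residuals]


# Toy counts (made up): a decaying trend with one elevated fragment at index 5.
counts = [900, 700, 520, 400, 310, 1200, 180, 140, 100, 80]
z = z_scores(counts)
peak = max(range(len(z)), key=z.__getitem__)
print(peak, round(z[peak], 2))
```

In this toy profile the fragment with the largest positive residual is the one that rises above the distance-decay trend, which is the sense in which a high z-score is read as a candidate interaction peak.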


Author(s):  
Laura D. Martens ◽  
Oisín Faust ◽  
Liviu Pirvan ◽  
Dóra Bihary ◽  
Shamith A. Samarajiwa

Abstract Chromosome conformation capture methods such as Hi-C enable mapping of genome-wide chromatin interactions and are a promising technology for understanding the role of spatial chromatin organisation in gene regulation. However, the generation and analysis of these data sets at high resolutions remain technically challenging and costly. We developed a machine and deep learning approach to predict functionally important, highly interacting chromatin regions (HICR) and topologically associated domain (TAD) boundaries independent of Hi-C data in both normal physiological states and pathological conditions such as cancer. This approach utilises gradient boosted trees and convolutional neural networks trained on both Hi-C and histone modification epigenomic data from three different cell types. Given only epigenomic modification data, these models are able to predict chromatin interactions and TAD boundaries with high accuracy. We demonstrate that our models are transferable across cell types, indicating that combinatorial histone mark signatures may be universal predictors for highly interacting chromatin regions and spatial chromatin architecture elements.


2019 ◽  
pp. 1-10
Author(s):  
Wei-Sheng Wu ◽  
Jer-Wei Chang ◽  
Hung-Jiun Liaw ◽  
Yu-Han Chu ◽  
Yu-Xuan Jiang

Background Recent advances in ChIP-seq technologies have led to the identification of thousands of TP53 binding loci in various cell types, providing unmatched opportunities for analysis and comparison of the TP53 genome-wide binding patterns under different experimental conditions. These ChIP-seq datasets provide valuable resources for studying the function of TP53. However, there are currently no databases available for easily comparing and analyzing TP53 genome-wide binding patterns derived from different cell lines. Moreover, the TP53 ChIP-seq datasets are scattered among different papers, so extensive work is required to collect and process them for further analysis. Description To solve these problems, we comprehensively collected 13 publicly available TP53 ChIP-seq datasets derived from various cell lines. We re-mapped these 13 ChIP-seq datasets to the most recent human reference genome, hg38, and identified the binding peaks (regions with significant enrichment of TP53 binding) and the target genes of TP53 in the human genome using the same data processing pipeline. Note that processing these 13 ChIP-seq datasets using the same pipeline is crucial because it makes it possible to compare the identified peaks and target genes of TP53 across datasets. Finally, we developed a web-based platform (called the p53BLD), which provides a browse mode to visualize the binding loci of TP53 in the genome and a search mode to retrieve genes whose promoters are bound by TP53. The search mode is very powerful. Users can apply union, intersect, and/or difference operations on the 13 ChIP-seq datasets to generate a list of TP53 binding target genes that satisfies the users’ specifications. The generated gene list can then be downloaded for further analysis. Therefore, the p53BLD can also be regarded as a discovery tool that helps users to generate interesting gene lists for studying TP53. Conclusions Here we present the first p53 Binding Loci Database (p53BLD). In the case study, we showed that using p53BLD can identify novel TP53 binding targets (KAT6A and KMT2A) in specific cancer cell lines. We believe that p53BLD is a useful resource for studying the function of TP53 in different cancer cell lines. p53BLD is available online at link1/, link2/, or link3/
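The union, intersect and difference operations described for the search mode behave like ordinary set algebra on per-dataset target gene lists. The sketch below illustrates that idea only; it does not use the p53BLD interface, and the dataset names and gene labels are hypothetical placeholders rather than database content.

```python
# Hypothetical per-dataset target gene sets (placeholder names, not p53BLD data).
targets = {
    "dataset_1": {"GENE_A", "GENE_B", "GENE_C"},
    "dataset_2": {"GENE_B", "GENE_C", "GENE_D"},
    "dataset_3": {"GENE_C", "GENE_E"},
}

# Intersect: genes bound in both dataset_1 and dataset_2.
shared = targets["dataset_1"] & targets["dataset_2"]

# Union: genes bound in dataset_1 or dataset_3.
either = targets["dataset_1"] | targets["dataset_3"]

# Difference: genes bound in dataset_1 but in neither of the other two.
specific = targets["dataset_1"] - (targets["dataset_2"] | targets["dataset_3"])

print(sorted(shared), sorted(either), sorted(specific))
```

A difference query of this kind is the sort of operation that would isolate targets seen in only one cell line.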


2021 ◽  
Author(s):  
Anna K. Simonsen

Abstract Bacteria have highly flexible pangenomes, which are thought to facilitate evolutionary responses to environmental change, but the impacts of environmental stress on pangenome evolution remain unclear. Using a landscape pangenomics approach, I demonstrate that environmental stress leads to consistent, continuous reduction in genome content along four environmental stress gradients (acidity, aridity, heat, salinity) in naturally occurring populations of Bradyrhizobium diazoefficiens (widespread soil-dwelling plant mutualists). Using gene-level network and duplication functional traits to predict accessory gene distributions across environments, genes predicted to be superfluous are more likely lost in high stress, while genes with multi-functional roles are more likely retained. Genes with higher probabilities of being lost with stress contain significantly higher proportions of codons under strong purifying and positive selection. Gene loss is widespread across the entire genome, with high gene-retention hotspots in close spatial proximity to core genes, suggesting Bradyrhizobium has evolved to cluster essential-function genes (accessory genes with multifunctional roles and core genes) in discrete genomic regions, which may stabilise viability during genomic decay. In conclusion, pangenome evolution through genome streamlining is an important evolutionary response to environmental change. This raises questions about impacts of genome streamlining on the adaptive capacity of bacterial populations facing rapid environmental change.


2021 ◽  
Author(s):  
Yajing Hao ◽  
Changwei Shao ◽  
Guofeng Zhao ◽  
Xiang-Dong Fu

Abstract The rapid advance of high-throughput technologies has enabled the generation of two-dimensional or even multi-dimensional high-throughput data, e.g., genome-wide siRNA screen (1st dimension) for multiple changes in gene expression (2nd dimension) in many different cell types or tissues or under different experimental conditions (3rd dimension). We show that the simple Z-based statistic and derivatives are no longer suitable for analyzing such data because of the accumulation of experimental noise and/or off-target effects. Here, we introduce ZetaSuite, a statistical package designed to score and rank hits from two-dimensional screens, construct regulatory networks based on response similarities, and eliminate off-targets. Applying this method to two large cancer dependency screen datasets, we identify not only genes critical for cell fitness, but also those required for constraining cell proliferation. Strikingly, most of those cancer constraining genes function in DNA replication/repair checkpoint, suggesting that cancer cells also need to protect their genomes for long-term survival.


Author(s):  
Jelena M. Telenius ◽  
Damien J. Downes ◽  
Martin Sergeant ◽  
A. Marieke Oudelaar ◽  
Simon McGowan ◽  
...  

ABSTRACT DNA folding within nuclei is a highly ordered process, with implications for gene regulation and development. An array of chromosome conformation capture (3C) methods have been developed to investigate how DNA is packaged within nuclei and to interrogate specific interactions. While these methods use different approaches to examine target loci (many-versus-all) or the entire genome (all-versus-all), they all rely on the core principle of endonuclease digestion and proximity-based ligation to re-arrange genomic order to reflect the three-dimensional nuclear conformation. This sequence reorganization creates novel chimeric DNA fragments which require specialist bioinformatic tools to analyze and visualize. Despite this need for specialist bioinformatic skills, the core biological importance of genome folding has seen widespread methodological uptake. To service the needs of experimentalists using the many-versus-all Capture-C family of methods we have developed CaptureCompendium; a toolkit of software to simplify the design, analysis and presentation of 3C experiments.


Development ◽  
2021 ◽  
Vol 148 (9) ◽  
Author(s):  
Nicolas Lonfat ◽  
Su Wang ◽  
ChangHee Lee ◽  
Mauricio Garcia ◽  
Jiho Choi ◽  
...  

ABSTRACT The vertebrate retina is generated by retinal progenitor cells (RPCs), which produce >100 cell types. Although some RPCs produce many cell types, other RPCs produce restricted types of daughter cells, such as a cone photoreceptor and a horizontal cell (HC). We used genome-wide assays of chromatin structure to compare the profiles of a restricted cone/HC RPC and those of other RPCs in chicks. These data nominated regions of regulatory activity, which were tested in tissue, leading to the identification of many cis-regulatory modules (CRMs) active in cone/HC RPCs and developing cones. Two transcription factors, Otx2 and Oc1, were found to bind to many of these CRMs, including those near genes important for cone development and function, and their binding sites were required for activity. We also found that Otx2 has a predicted autoregulatory CRM. These results suggest that Otx2, Oc1 and possibly other Onecut proteins have a broad role in coordinating cone development and function. The many newly discovered CRMs for cones are potentially useful reagents for gene therapy of cone diseases.


2021 ◽  
Vol 7 (3) ◽  
pp. eabd9036
Author(s):  
Sara Saez-Atienzar ◽  
Sara Bandres-Ciga ◽  
Rebekah G. Langston ◽  
Jonggeol J. Kim ◽  
Shing Wan Choi ◽  
...  

Despite the considerable progress in unraveling the genetic causes of amyotrophic lateral sclerosis (ALS), we do not fully understand the molecular mechanisms underlying the disease. We analyzed genome-wide data involving 78,500 individuals using a polygenic risk score approach to identify the biological pathways and cell types involved in ALS. This data-driven approach identified multiple aspects of the biology underlying the disease that resolved into broader themes, namely, neuron projection morphogenesis, membrane trafficking, and signal transduction mediated by ribonucleotides. We also found that genomic risk in ALS maps consistently to GABAergic interneurons and oligodendrocytes, as confirmed in human single-nucleus RNA-seq data. Using two-sample Mendelian randomization, we nominated six differentially expressed genes (ATG16L2, ACSL5, MAP1LC3A, MAPKAPK3, PLXNB2, and SCFD1) within the significant pathways as relevant to ALS. We conclude that the disparate genetic etiologies of this fatal neurological disease converge on a smaller number of final common pathways and cell types.

