peak calling
Recently Published Documents


TOTAL DOCUMENTS: 84 (FIVE YEARS 39)
H-INDEX: 11 (FIVE YEARS 3)

2022 ◽  
Author(s):  
William M Yashar ◽  
Garth Kong ◽  
Jake VanCampen ◽  
Brittany M Smith ◽  
Daniel J Coleman ◽  
...  

Genome-wide mapping of the histone modification landscape is critical to understanding transcriptional regulation. Cleavage Under Targets and Tagmentation (CUT&Tag) is a new method for profiling the localization of covalent histone modifications, offering improved sensitivity and decreased cost compared with Chromatin Immunoprecipitation Sequencing (ChIP-seq). Here, we present GoPeaks, a peak calling method specifically designed for histone modification CUT&Tag data. GoPeaks implements a Binomial distribution and stringent read count cut-off to nominate candidate genomic regions. We compared the performance of GoPeaks against commonly used peak calling algorithms to detect H3K4me3, H3K4me1, and H3K27Ac peaks from CUT&Tag data. These histone modifications display a range of peak profiles and are frequently used in epigenetic studies. We found GoPeaks robustly detects genome-wide histone modifications and, notably, identifies H3K27Ac with improved sensitivity compared to other standard peak calling algorithms.
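The idea of combining a binomial test with a read-count cut-off can be illustrated with a minimal sketch. This is not the GoPeaks implementation: the uniform null model (every bin equally likely to receive a read), the function names, and the thresholds `min_count` and `alpha` are all assumptions made for illustration.

```python
import math

def binom_log_pmf(k, n, p):
    """Log of P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    if k <= 0:
        return 1.0
    total = sum(math.exp(binom_log_pmf(i, n, p)) for i in range(k, n + 1))
    return min(1.0, total)

def call_enriched_bins(counts, min_count=15, alpha=1e-5):
    """Return indices of bins that pass both the count cut-off and the test.

    Assumed null model: each read falls into any bin with equal probability.
    """
    n = sum(counts)
    if n == 0 or len(counts) < 2:
        return []
    p = 1.0 / len(counts)
    return [i for i, k in enumerate(counts)
            if k >= min_count and binom_upper_tail(k, n, p) < alpha]
```

The explicit tail sum is only practical for small read totals; a real tool would need a faster tail computation and a correction for multiple testing, neither of which is shown here.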


2021 ◽  
Vol 22 (15) ◽  
pp. 8123
Author(s):  
Anna Macioszek ◽  
Bartek Wilczynski

The explosive development of next-generation sequencing-based technologies has allowed us to take an unprecedented look at many molecular signatures of the non-coding genome. In particular, the ChIP-seq (Chromatin ImmunoPrecipitation followed by sequencing) technique is now very commonly used to assess the proteins associated with different non-coding DNA regions genome-wide. While the analysis of such data related to transcription factor binding is relatively straightforward, many modified histone variants, such as H3K27me3, are very important for the process of gene regulation but are very difficult to interpret. We propose a novel method, called HERON (HiddEn MaRkov mOdel based peak calliNg), for genome-wide data analysis that is able to detect DNA regions enriched for a certain feature, even in difficult settings of weakly enriched long DNA domains. We demonstrate the performance of our method both on simulated and experimental data.
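To show how a hidden Markov model can pick out long, weakly enriched domains, here is a minimal two-state Viterbi sketch. It is not HERON's model: the Poisson emissions, the two rates `lam_bg` and `lam_enriched`, and the self-transition probability `stay` are assumptions chosen for illustration, and no parameter fitting is performed.

```python
import math

def poisson_log_pmf(k, lam):
    """Log of P(X = k) for X ~ Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def viterbi_two_state(counts, lam_bg=2.0, lam_enriched=6.0, stay=0.95):
    """Most likely state path over binned counts (0 = background, 1 = enriched)."""
    if not counts:
        return []
    lams = (lam_bg, lam_enriched)
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)
    # Uniform prior over the two states for the first bin.
    score = [math.log(0.5) + poisson_log_pmf(counts[0], lam) for lam in lams]
    back = []
    for k in counts[1:]:
        new_score, pointers = [], []
        for state in (0, 1):
            cand = [score[prev] + (log_stay if prev == state else log_switch)
                    for prev in (0, 1)]
            best_prev = 0 if cand[0] >= cand[1] else 1
            pointers.append(best_prev)
            new_score.append(cand[best_prev] + poisson_log_pmf(k, lams[state]))
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

Because switching states is penalised, a run of modestly elevated bins is labelled as one domain even when no single bin would stand out on its own.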


2021 ◽  
Author(s):  
Korin Sahinyan ◽  
Darren M Blackburn ◽  
Marie-Michelle Simon ◽  
Felicia Lazure ◽  
Tony Kwan ◽  
...  

Skeletal myofibers are the main components of skeletal muscle, which is the largest tissue in the body. Myofibers are highly adaptive in nature and they can vary in different biological and disease conditions. Therefore, transcriptional and epigenetic studies on myofibers are crucial to discover how chromatin alterations occur in the skeletal muscle under different conditions. However, due to the heterogeneous nature of skeletal muscle, studying myofibers in isolation proves to be a challenging task. Single cell sequencing has permitted the study of the epigenome of isolated myonuclei. While this provides sequencing with high dimensionality, the sequencing depth is lacking, which makes comparisons between different biological conditions difficult. Here we report the first implementation of single myofiber ATAC-Seq, which permits the sequencing of an individual myofiber at a depth sufficient for peak calling and for comparative analysis of chromatin accessibility under various physiological, physical and disease conditions. Application of this technique revealed significant differences in chromatin accessibility between resting and regenerating myofibers. This technique can lead to wide application in identifying chromatin regulatory elements and epigenetic mechanisms in muscle fibers during development and in muscle-wasting diseases.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
An Zheng ◽  
Michael Lamkin ◽  
Yutong Qiu ◽  
Kevin Ren ◽  
Alon Goren ◽  
...  

Background: A major challenge in evaluating quantitative ChIP-seq analyses, such as peak calling and differential binding, is a lack of reliable ground truth data. Accurate simulation of ChIP-seq data can mitigate this challenge, but existing frameworks are either too cumbersome to apply genome-wide or unable to model a number of important experimental conditions in ChIP-seq. Results: We present ChIPs, a toolkit for rapidly simulating ChIP-seq data using statistical models of key experimental steps. We demonstrate how ChIPs can be used for a range of applications, including benchmarking analysis tools and evaluating the impact of various experimental parameters. ChIPs is implemented as a standalone command-line program written in C++ and is available from https://github.com/gymreklab/chips. Conclusions: ChIPs is an efficient ChIP-seq simulation framework that generates realistic datasets over a flexible range of experimental conditions. It can serve as an important component in various ChIP-seq analyses where ground truth data are needed.


2021 ◽  
Author(s):  
Jeremiah Suryatenggara ◽  
Kol Jia Yong ◽  
Danielle E. Tenen ◽  
Daniel G. Tenen ◽  
Mahmoud A. Bassal

ChIP-Seq is a technique used to analyse protein-DNA interactions. The protein-DNA complex is pulled down using a protein antibody, after which sequencing and analysis of the bound DNA fragments is performed. A key bioinformatics analysis step is “peak” calling - identifying regions of enrichment. Benchmarking studies have consistently shown that no optimal peak caller exists. Peak callers have distinct selectivity and specificity characteristics which are often not additive and seldom completely overlap in many scenarios. In the absence of a universal peak caller, we reasoned that one ought to use multiple peak callers to 1) gauge peak confidence as determined through detection by multiple algorithms, and 2) more thoroughly survey the protein-bound landscape by capturing peaks not detected by individual peak callers owing to algorithmic limitations and biases. We therefore developed an integrated ChIP-Seq Analysis Pipeline (ChIP-AP) which performs all analysis steps from raw fastq files to final result, and utilizes four commonly used peak callers to more thoroughly and comprehensively analyse datasets. Results are integrated and presented in a single file enabling users to apply selectivity and sensitivity thresholds to select the consensus peak set, the union peak set, or any sub-set in-between to more confidently and comprehensively explore the protein-bound landscape. ChIP-AP is available at https://github.com/JSuryatenggara/ChIP-AP.
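The choice between a consensus set, a union set, or anything in between can be sketched as merging overlapping peaks and counting how many callers support each merged region. This is an illustration only, not ChIP-AP's code: the function names, the half-open `(chrom, start, end)` interval convention, and the caller labels are hypothetical.

```python
def merge_with_support(peaks_by_caller):
    """Merge overlapping peaks and record which callers support each region.

    peaks_by_caller maps a caller label to a list of (chrom, start, end)
    half-open intervals. Returns a list of (chrom, start, end, callers),
    where callers is a sorted tuple of labels.
    """
    tagged = sorted(
        (chrom, start, end, caller)
        for caller, peaks in peaks_by_caller.items()
        for chrom, start, end in peaks
    )
    merged = []
    for chrom, start, end, caller in tagged:
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start < last[2]:
            # Overlaps the current region: extend it and note the caller.
            last[2] = max(last[2], end)
            last[3].add(caller)
        else:
            merged.append([chrom, start, end, {caller}])
    return [(c, s, e, tuple(sorted(callers))) for c, s, e, callers in merged]

def select_peaks(merged, min_callers):
    """Keep regions supported by at least min_callers distinct callers."""
    return [region for region in merged if len(region[3]) >= min_callers]
```

With `min_callers=1` this yields the union set, and with `min_callers` equal to the number of callers it yields the strict consensus set. Note that chained overlaps can fuse peaks that do not all overlap one another, so a production pipeline may want a stricter overlap rule.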


2021 ◽  
Author(s):  
Len Taing ◽  
Clara Cousins ◽  
Gali Bai ◽  
Paloma Cejas ◽  
Xintao Qiu ◽  
...  

Motivation: The chromatin profile measured by ATAC-seq, ChIP-seq, or DNase-seq experiments can identify genomic regions critical in regulating gene expression and provide insights on biological processes such as diseases and development. However, quality control and processing chromatin profiling data involve many steps, and different bioinformatics tools are used at each step. It can be challenging to manage the analysis. Results: We developed a Snakemake pipeline called CHIPS (CHromatin enrichment Processor) to streamline the processing of ChIP-seq, ATAC-seq, and DNase-seq data. The pipeline supports single- and paired-end data and is flexible to start with FASTQ or BAM files. It includes basic steps such as read trimming, mapping, and peak calling. In addition, it calculates quality control metrics such as contamination profiles, PCR bottleneck coefficient, the fraction of reads in peaks, percentage of peaks overlapping with the union of public DNaseI hypersensitivity sites, and conservation profile of the peaks. For downstream analysis, it carries out peak annotations, motif finding, and regulatory potential calculation for all genes. The pipeline ensures that the processing is robust and reproducible. Availability: CHIPS is available at https://github.com/liulab-dfci/CHIPS
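Two of the quality control metrics named above have simple, commonly used definitions that can be sketched directly. The code below is not taken from the CHIPS pipeline: it assumes the widely used definition of the PCR bottleneck coefficient (positions covered by exactly one read divided by all distinct positions) and treats each read as a single `(chrom, pos, strand)` tuple, which is a simplification.

```python
from collections import Counter

def pcr_bottleneck_coefficient(reads):
    """Fraction of distinct read positions that are seen exactly once.

    reads is a list of (chrom, pos, strand) tuples, one per mapped read.
    """
    position_counts = Counter(reads)
    if not position_counts:
        raise ValueError("no reads supplied")
    seen_once = sum(1 for c in position_counts.values() if c == 1)
    return seen_once / len(position_counts)

def fraction_of_reads_in_peaks(reads, peaks):
    """Fraction of reads whose position falls inside any peak.

    peaks is a list of (chrom, start, end) half-open intervals.
    """
    if not reads:
        raise ValueError("no reads supplied")
    in_peaks = sum(
        1 for chrom, pos, _strand in reads
        if any(c == chrom and s <= pos < e for c, s, e in peaks)
    )
    return in_peaks / len(reads)
```

The linear scan over peaks keeps the sketch short; genome-scale data would call for an interval index or a dedicated tool instead.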


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Aseel Awdeh ◽  
Marcel Turcotte ◽  
Theodore J. Perkins

Background: Chromatin immunoprecipitation followed by high throughput sequencing (ChIP-seq), initially introduced more than a decade ago, is widely used by the scientific community to detect protein/DNA binding and histone modifications across the genome. Every experiment is prone to noise and bias, and ChIP-seq experiments are no exception. To alleviate bias, the incorporation of control datasets in ChIP-seq analysis is an essential step. The controls are used to account for the background signal, while the remainder of the ChIP-seq signal captures true binding or histone modification. However, a recurrent issue is different types of bias in different ChIP-seq experiments. Depending on which controls are used, different aspects of ChIP-seq bias are better or worse accounted for, and peak calling can produce different results for the same ChIP-seq experiment. Consequently, generating “smart” controls, which model the non-signal effect for a specific ChIP-seq experiment, could enhance contrast and increase the reliability and reproducibility of the results. Results: We propose a peak calling algorithm, Weighted Analysis of ChIP-seq (WACS), which is an extension of the well-known peak caller MACS2. There are two main steps in WACS: First, weights are estimated for each control using non-negative least squares regression. The goal is to customize controls to model the noise distribution for each ChIP-seq experiment. This is then followed by peak calling. We demonstrate that WACS significantly outperforms MACS2 and AIControl, another recent algorithm for generating smart controls, in the detection of enriched regions along the genome, in terms of motif enrichment and reproducibility analyses. Conclusions: This ultimately improves our understanding of ChIP-seq controls and their biases, and shows that WACS results in a better approximation of the noise distribution in controls.
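The weight-estimation step can be illustrated with a small non-negative least squares solver. This is a sketch under assumptions, not the WACS implementation: it assumes that binned read counts from each control are regressed onto the binned read counts of the treatment sample, and it uses plain cyclic coordinate descent rather than whatever solver WACS relies on. The function name and the iteration count are hypothetical.

```python
def nnls_coordinate_descent(columns, target, n_iter=500):
    """Minimise ||sum_j w[j] * columns[j] - target||^2 subject to w[j] >= 0.

    columns is a list of equal-length numeric lists (one per control),
    target is the treatment signal over the same bins.
    """
    weights = [0.0] * len(columns)
    residual = [float(t) for t in target]  # target minus current fit
    norms = [sum(v * v for v in col) for col in columns]
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            if norms[j] == 0:
                continue  # an all-zero control carries no information
            # Best unconstrained move along coordinate j, clipped at zero.
            step = sum(c * r for c, r in zip(col, residual)) / norms[j]
            new_weight = max(0.0, weights[j] + step)
            delta = new_weight - weights[j]
            if delta:
                residual = [r - delta * c for r, c in zip(residual, col)]
                weights[j] = new_weight
    return weights
```

The weighted sum of controls could then stand in for a single pooled control when calling peaks; controls that do not help explain the treatment background end up with a weight of zero.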


2021 ◽  
Author(s):  
Maëlle Daunesse ◽  
Rachel Legendre ◽  
Hugo Varet ◽  
Adrien Pain ◽  
Claudia Chica

We present ChIPuana, a Snakemake-based pipeline for epigenomic data from the raw fastq files to the differential analysis. It can be applied to any chromatin factor, e.g. histone modification or transcription factor, which can be profiled with ChIP-seq. ChIPuana streamlines critical steps like the quality assessment of the immunoprecipitation using cross-correlation and the replicate comparison for both narrow and broad peaks. For the differential analysis ChIPuana provides linear and nonlinear methods for normalisation between samples as well as conservative and stringent models for estimating the variance and testing the significance of the observed binding/marking differences. ChIPuana can process in parallel multiple chromatin factors with different experimental designs, number of biological replicates and/or conditions. It also facilitates the specific parametrisation of each dataset allowing either narrow or broad peak calling, as well as comparisons between the conditions using multiple statistical settings. Finally, complete reports are produced at the end of the bioinformatic and the statistical part of the analysis, which facilitate the data quality control and the interpretation of the results. We explored the discriminative power of the statistical settings for the differential analysis, using a published dataset of three histone marks (H3K4me3, H3K27ac and H3K4me1) and two transcription factors (Oct4 and Klf4) profiled with ChIP-seq in two biological conditions (shControl and shUbc9). We show that distinct results are obtained depending on the sources of ChIP-seq variability and the dynamics of the chromatin factor under study. We propose that ChIPuana can be used to measure the richness of the epigenomic landscape underlying a biological process by identifying diverse regulatory regimes and the associated gene sets.


2021 ◽  
Author(s):  
Lance D. Hentges ◽  
Martin J. Sergeant ◽  
Damien J. Downes ◽  
Jim R. Hughes ◽  
Stephen Taylor

Genomics technologies, such as ATAC-seq, ChIP-seq, and DNase-seq, have revolutionized molecular biology, generating a complete genome’s worth of signal in a single assay. Coupled with the use of genome browsers, researchers can now see and identify important DNA encoded elements as peaks in an analog signal. Despite the ease with which humans can visually identify peaks, converting these signals into meaningful genome-wide peak calls from such massive datasets requires complex analytical techniques. Current methods use statistical frameworks to identify peaks as sites of significant signal enrichment, discounting that the analog data do not follow any archetypal distribution. Recent advances in artificial intelligence have shown great promise in image recognition, on par with or exceeding human ability, providing an opportunity to reimagine and improve peak calling. We present an interactive and intuitive peak calling framework, LanceOtron, built around image recognition using a wide and deep neural network. We hand-labelled 499Mb of genomic data, built 5,000 models, and tested with over 100 unique users from labs around the world. In benchmarking open chromatin, transcription factor binding, and chromatin modification datasets, LanceOtron outperforms the long-standing, gold-standard peak caller MACS2 with its increased selectivity and near perfect sensitivity. Additionally, this command-line optional approach allows researchers to easily generate optimal peak-calls using only a web interface. Together, the enhanced performance and usability of LanceOtron will improve the reliability and reproducibility of peak calls and subsequent data analysis. This tool highlights the general utility of applying machine learning to genomic data extraction and analysis.

