WACS: improving ChIP-seq peak calling by optimally weighting controls

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Aseel Awdeh ◽  
Marcel Turcotte ◽  
Theodore J. Perkins

Abstract Background Chromatin immunoprecipitation followed by high throughput sequencing (ChIP-seq), initially introduced more than a decade ago, is widely used by the scientific community to detect protein/DNA binding and histone modifications across the genome. Every experiment is prone to noise and bias, and ChIP-seq experiments are no exception. To alleviate bias, the incorporation of control datasets in ChIP-seq analysis is an essential step. The controls are used to account for the background signal, while the remainder of the ChIP-seq signal captures true binding or histone modification. However, a recurrent issue is different types of bias in different ChIP-seq experiments. Depending on which controls are used, different aspects of ChIP-seq bias are better or worse accounted for, and peak calling can produce different results for the same ChIP-seq experiment. Consequently, generating “smart” controls, which model the non-signal effect for a specific ChIP-seq experiment, could enhance contrast and increase the reliability and reproducibility of the results. Result We propose a peak calling algorithm, Weighted Analysis of ChIP-seq (WACS), which is an extension of the well-known peak caller MACS2. There are two main steps in WACS: First, weights are estimated for each control using non-negative least squares regression. The goal is to customize controls to model the noise distribution for each ChIP-seq experiment. This is then followed by peak calling. We demonstrate that WACS significantly outperforms MACS2 and AIControl, another recent algorithm for generating smart controls, in the detection of enriched regions along the genome, in terms of motif enrichment and reproducibility analyses. Conclusions This ultimately improves our understanding of ChIP-seq controls and their biases, and shows that WACS results in a better approximation of the noise distribution in controls.
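The weight-estimation step named in this abstract can be illustrated with a small sketch. It assumes that each control and the ChIP-seq treatment have already been reduced to read counts over the same genomic bins (the abstract does not specify the representation), and it solves the non-negative least squares problem with plain projected gradient descent, which is not necessarily the solver WACS uses. The function name and the toy counts are hypothetical.

```python
def nnls_projected_gradient(controls, treatment, steps=2000):
    """Find weights w >= 0 minimising ||sum_j w[j] * controls[j] - treatment||^2.

    controls:  list of k lists, each holding binned read counts for one control
    treatment: list of binned read counts for the ChIP-seq experiment
    """
    k = len(controls)
    # Gram matrix G[i][j] = <control_i, control_j> and vector b[i] = <control_i, treatment>
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in controls] for ci in controls]
    rhs = [sum(a * b for a, b in zip(ci, treatment)) for ci in controls]
    # The trace of G bounds its largest eigenvalue, so 1/trace is a safe step size.
    step = 1.0 / sum(gram[i][i] for i in range(k))
    w = [0.0] * k
    for _ in range(steps):
        # Gradient of 0.5 * ||C w - t||^2 is G w - b.
        grad = [sum(gram[i][j] * w[j] for j in range(k)) - rhs[i] for i in range(k)]
        # Gradient step followed by projection onto the non-negative orthant.
        w = [max(0.0, wi - step * gi) for wi, gi in zip(w, grad)]
    return w


# Toy data (hypothetical): three controls over six bins.
control_a = [4, 0, 1, 0, 2, 0]
control_b = [0, 3, 0, 1, 0, 2]
control_c = [1, 1, 5, 0, 0, 1]
# Background built as 2 * control_a + 0.5 * control_c, so control_b should get weight 0.
treatment = [2 * a + 0.5 * c for a, c in zip(control_a, control_c)]

weights = nnls_projected_gradient([control_a, control_b, control_c], treatment)
print(weights)
```

In this toy case the fitted weights recover the mixture used to build the treatment profile. A weighted control of this kind would then be passed to the peak-calling step in place of a single or pooled control.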

2019 ◽  
Author(s):  
Aseel Awdeh ◽  
Marcel Turcotte ◽  
Theodore J. Perkins

Abstract Motivation Chromatin immunoprecipitation followed by high throughput sequencing (ChIP-seq), initially introduced more than a decade ago, is widely used by the scientific community to detect protein/DNA binding and histone modifications across the genome. Every experiment is prone to noise and bias, and ChIP-seq experiments are no exception. To alleviate bias, the incorporation of control datasets in ChIP-seq analysis is an essential step. The controls are used to account for the background signal, while the remainder of the ChIP-seq signal captures true binding or histone modification. However, a recurrent issue is different types of bias in different ChIP-seq experiments. Depending on which controls are used, different aspects of ChIP-seq bias are better or worse accounted for, and peak calling can produce different results for the same ChIP-seq experiment. Consequently, generating “smart” controls, which model the non-signal effect for a specific ChIP-seq experiment, could enhance contrast and increase the reliability and reproducibility of the results. Results We propose a peak calling algorithm, Weighted Analysis of ChIP-seq (WACS), which is an extension of the well-known peak caller MACS2. There are two main steps in WACS: First, weights are estimated for each control using non-negative least squares regression. The goal is to customize controls to model the noise distribution for each ChIP-seq experiment. This is then followed by peak calling. We demonstrate that WACS significantly outperforms MACS2 and AIControl, another recent algorithm for generating smart controls, in the detection of enriched regions along the genome, in terms of motif enrichment and reproducibility analyses. Conclusion This ultimately improves our understanding of ChIP-seq controls and their biases, and shows that WACS results in a better approximation of the noise distribution in controls.


2020 ◽  
Author(s):  
Nanxiang Zhao ◽  
Alan P. Boyle

ABSTRACT Genomic and epigenomic features are captured at a genome-wide level by using high-throughput sequencing technologies. Peak calling is one of the first essential steps in analyzing these features by delineating regions such as open chromatin regions and transcription factor binding sites. Our original peak calling software, F-Seq, has been widely used and shown to be the most sensitive and accurate peak caller for DNase I hypersensitive sites sequencing (DNase-seq) data. However, F-Seq neither supports user-supplied control datasets nor reports test statistics, limiting its ability to capture systematic and experimental biases and accurately estimate background distributions. Here we present an improved version, F-Seq2, which combines the power of kernel density estimation and a dynamic “continuous” Poisson distribution to robustly account for local biases and resolve ties when ranking candidate peaks. In F-score and motif distance analyses, we demonstrate the superior performance of F-Seq2 over competing peak callers used by the ENCODE Consortium on simulated and real ATAC-seq and ChIP-seq datasets. The output of F-Seq2 is suitable for irreproducible discovery rate (IDR) analysis because test statistics are calculated for each candidate summit and ties are robustly resolved.


Mobile DNA ◽  
2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Jonathan Filée ◽  
Sarah Farhat ◽  
Dominique Higuet ◽  
Laure Teysset ◽  
Dominique Marie ◽  
...  

Abstract Background With the expansion of high throughput sequencing, we now have access to a larger number of genome-wide studies analyzing the transposable element (TE) composition in a wide variety of organisms. However, genomic analyses often remain too limited in number and diversity of species investigated to study in depth the dynamics and evolutionary success of the different types of TEs among metazoans. Therefore, we chose to investigate the use of transcriptomes to describe the diversity of TEs in phylogenetically related species by conducting the first comparative analysis of TEs in two groups of polychaetes and to evaluate the diversity of TEs that might impact genomic evolution as a result of their mobility. Results We present a detailed analysis of TE distribution in transcriptomes extracted from 15 polychaetes depending on the number of reads used during assembly, and also compare these results with additional TE scans on associated low-coverage genomes. We then characterized the clades defined by 1021 LTR-retrotransposon families identified in 26 species. Clade richness was highly dependent on the considered superfamily. Copia elements appear rare and are equally distributed in only three clades, GalEa, Hydra and CoMol. Among the eight BEL/Pao clades identified in annelids, two small clades within the Sailor lineage are new to science. We characterized 17 Gypsy clades of which only 4 are new; the C-clade largely dominates with a quarter of the families. Finally, most species also expressed two distinct transcripts encoding PIWI proteins, known to be involved in the control of TE mobility. Conclusions This study shows that transcriptomes assembled from 40 million reads were sufficient to access the diversity and proportion of transposable elements, compared with those obtained by low-coverage sequencing. Among LTR-retrotransposons, Gypsy elements were unequivocally dominant, but results suggest that the number of Gypsy clades, although high, may be more limited than previously thought in metazoans. For BEL/Pao elements, the organization of clades within the Sailor lineage appears more difficult to establish clearly. The Copia elements remain rare and result from the evolutionarily consistent success of the same three clades.


Vision ◽  
2020 ◽  
Vol 4 (1) ◽  
pp. 10 ◽  
Author(s):  
George Mather

Research to date has not found strong evidence for a universal link between any single low-level image statistic, such as fractal dimension or Fourier spectral slope, and aesthetic ratings of images in general. This study assessed whether different image statistics are important for artistic images containing different subjects and used partial least squares regression (PLSR) to identify the statistics that correlated most reliably with ratings. Fourier spectral slope, fractal dimension and Shannon entropy were estimated separately for paintings containing landscapes, people, still life, portraits, nudes, animals, buildings and abstracts. Separate analyses were performed on the luminance and colour information in the images. PLSR fits showed shared variance of up to 75% between image statistics and aesthetic ratings. The most important statistics and image planes varied across genres. Variation in statistics may reflect characteristic properties of the different neural sub-systems that process different types of image.
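One of the image statistics named in this abstract, Shannon entropy, can be made concrete with a short sketch. It assumes the statistic is computed from the histogram of pixel values in a single image plane (for example, 8-bit luminance); the abstract does not state how the study estimated it, and the function name and sample values are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of the empirical distribution of pixel values."""
    counts = Counter(pixels)
    total = len(pixels)
    # H = -sum(p * log2(p)) over the observed pixel values.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical flattened luminance planes.
two_tone = [0, 0, 255, 255]      # two equally likely values
four_tone = [0, 85, 170, 255]    # four equally likely values
flat = [128] * 10                # a single value everywhere

print(shannon_entropy(two_tone), shannon_entropy(four_tone), shannon_entropy(flat))
```

Higher values indicate a more even spread of pixel values across the available levels, and a uniform image has zero entropy. A per-image value of this kind would be one predictor column in a regression against aesthetic ratings.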


GigaScience ◽  
2019 ◽  
Vol 8 (12) ◽  
Author(s):  
Miriam Payá-Milans ◽  
Laura Poza-Viejo ◽  
Patxi San Martín-Uriz ◽  
David Lara-Astiaso ◽  
Mark D Wilkinson ◽  
...  

Abstract Background Genome-wide maps of histone modifications have been obtained for several plant species. However, most studies focus on model systems and do not enforce FAIR data management principles. Here we study the H3K27me3 epigenome and associated transcriptome of Brassica rapa, an important vegetable cultivated worldwide. Findings We performed H3K27me3 chromatin immunoprecipitation followed by high-throughput sequencing and transcriptomic analysis by 3′-end RNA sequencing from B. rapa leaves and inflorescences. To analyze these data we developed a Reproducible Epigenomic Analysis pipeline using Galaxy and Jupyter, packaged into Docker images to facilitate transparency and reuse. We found that H3K27me3 covers roughly one-third of all B. rapa protein-coding genes and its presence correlates with low transcript levels. The comparative analysis between leaves and inflorescences suggested that the expression of various floral regulatory genes during development depends on H3K27me3. To demonstrate the importance of H3K27me3 for B. rapa development, we characterized a mutant line deficient in the H3K27 methyltransferase activity. We found that braA.clf mutant plants presented pleiotropic alterations, e.g., curly leaves due to increased expression and reduced H3K27me3 levels at AGAMOUS-like loci. Conclusions We characterized the epigenetic mark H3K27me3 at genome-wide levels and provide genetic evidence for its relevance in B. rapa development. Our work reveals the epigenomic landscape of H3K27me3 in B. rapa and provides novel genomics datasets and bioinformatics analytical resources. We anticipate that this work will lead the way to further epigenomic studies in the complex genome of Brassica crops.


2018 ◽  
Vol 121 ◽  
pp. 295-305
Author(s):  
Małgorzata Orczyk

The article presents the results of a study of interior noise in selected public transport buses. Noise levels were measured while driving a selected route as part of the regular bus timetable. To assess the noise inside the buses, a measuring set was used consisting of 10 microphones, placed at a height of 1.6 m above the floor along the aisle between the seats, and the PULSE system from Brüel & Kjær. The aim of the study was to assess the noise distribution along the length of the buses and to compare it across different types of buses.

