ssvQC: an integrated CUT&RUN quality control workflow for histone modifications and transcription factors

2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Joseph Boyd ◽  
Princess Rodriguez ◽  
Hilde Schjerven ◽  
Seth Frietze

Abstract Objective Among the different methods to profile the genome-wide patterns of transcription factor binding and histone modifications in cells and tissues, CUT&RUN has emerged as a more efficient approach that allows for a higher signal-to-noise ratio using fewer cells compared to ChIP-seq. The results from CUT&RUN and other related sequence enrichment assays require comprehensive quality control (QC) and comparative analysis of data quality across replicates. While several computational tools currently exist for read mapping and analysis, a systematic reporting of data quality is lacking. Our aims were to (1) compare methods using frozen versus fresh cells for CUT&RUN and (2) develop an easy-to-use pipeline for assessing data quality. Results We compared a CUT&RUN workflow on fresh and frozen samples, and present an R package called ssvQC for quality control and comparison of data quality derived from CUT&RUN and other enrichment-based sequence data. Using ssvQC, we evaluate results from different CUT&RUN protocols for transcription factors and histone modifications from fresh and frozen tissue samples. Overall, this process facilitates evaluation of data quality across datasets and permits inspection of peak calling and replicate analyses across different data types. The package ssvQC is readily available at https://github.com/FrietzeLabUVM/ssvQC.
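As a minimal sketch of the kind of per-replicate metric such a QC report typically includes, the R code below computes the fraction of reads in peaks (FRiP) from a BAM file and a narrowPeak file. This is an illustration only, not the ssvQC API, and the file names are hypothetical.

## Minimal FRiP sketch per replicate; NOT the ssvQC interface, just an example of a
## replicate-level quality metric. Assumes aligned CUT&RUN reads (BAM) and called
## peaks (narrowPeak) exist for each sample.
library(GenomicAlignments)  # readGAlignments()
library(rtracklayer)        # import() for narrowPeak files

frip <- function(bam_file, peak_file) {
  reads <- granges(readGAlignments(bam_file))           # aligned reads as genomic ranges
  peaks <- import(peak_file, format = "narrowPeak")     # called peaks
  sum(countOverlaps(reads, peaks) > 0) / length(reads)  # fraction of reads inside peaks
}

## Hypothetical files comparing a fresh and a frozen replicate
frip("fresh_rep1.bam",  "fresh_rep1_peaks.narrowPeak")
frip("frozen_rep1.bam", "frozen_rep1_peaks.narrowPeak")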


2018 ◽  
Vol 10 (9) ◽  
pp. 1476 ◽  
Author(s):  
Simone Cosoli ◽  
Badema Grcic ◽  
Stuart de Vos ◽  
Yasha Hetzel

Quality-control procedures and their impact on data quality are described for the High-Frequency Ocean Radar (HFR) network in Australia, in particular for the commercial phased-array (WERA) HFR type. Threshold-based quality-control procedures were applied to radial velocity and signal-to-noise ratio (SNR); threshold values were set through quantitative analyses against independent measurements within the HFR coverage, when available, or otherwise from long-term data statistics. An artifact-removal procedure was also applied to the spatial distribution of SNR for the first-order Bragg peaks, under the assumptions that SNR is a valid proxy for radial velocity quality and that SNR decays with range from the receiver. The proposed iterative procedure was specifically designed to remove anomalous observations associated with strong SNR peaks caused by 50 Hz interference sources. The procedure iteratively fits a polynomial along the radial beam (1-D case) or a surface (2-D case) to the SNR associated with the radial velocity. Observations that exceeded a detection threshold were then identified and flagged. After suspect data were removed, new iterations were run with updated detection thresholds until no additional spikes were found or a maximum number of iterations was reached.
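As a minimal sketch of the 1-D version of this iterative procedure (not the operational radar code), the R function below fits a low-order polynomial to SNR along range, flags observations whose residuals exceed a robust threshold, refits on the remaining data, and repeats until no new spikes are found or a maximum number of iterations is reached. The polynomial degree and the k-MAD threshold rule are assumptions for illustration.

## Iterative fit-and-flag despiking of SNR along a radial beam (illustrative sketch)
despike_snr <- function(range_km, snr_db, degree = 3, k = 3, max_iter = 10) {
  d <- data.frame(range_km = range_km, snr_db = snr_db)
  keep <- rep(TRUE, nrow(d))                                       # TRUE = still trusted
  for (i in seq_len(max_iter)) {
    fit <- lm(snr_db ~ poly(range_km, degree), data = d[keep, ])   # fit on kept points
    res <- d$snr_db - predict(fit, newdata = d)                    # residuals everywhere
    spikes <- keep & abs(res) > k * mad(res[keep])                 # robust residual threshold
    if (!any(spikes)) break                                        # converged: no new spikes
    keep <- keep & !spikes                                         # drop spikes, then refit
  }
  !keep   # TRUE where an observation was flagged as anomalous
}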


2021 ◽  
Author(s):  
Bofeng Liu ◽  
Fengling Chen ◽  
Wei Xie

Several chromatin immunocleavage-based (ChIC) methods using Tn5 transposase have been developed to profile histone modifications and transcription factor binding. A recent preprint by Wang et al. raised potential concerns that these methods are prone to open-chromatin bias. While we appreciate the authors alerting the community to this issue, it has been previously described and discussed by Henikoff and colleagues in the original CUT&Tag paper. As described for CUT&Tag, the signal-to-noise ratio is essential for Tn5-based profiling methods and all antibody-based enrichment assays. Based on this notion, we would like to point out a major analysis issue in Wang et al. that caused a complete loss or dramatic reduction of enrichment at true targets for datasets generated by Tn5-based methods, which in turn artificially enhanced the relative enrichment of the potential open-chromatin bias. This analysis issue stems from the distinct background normalizations that Wang et al. applied to ChIP-based (chromatin immunoprecipitation) data versus Tn5-based data: only the normalization of the Tn5-based data, not of the ChIP-seq data, produced these effects. Such distortion of the signal-to-noise ratio consequently leads to misleading results.
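As a purely illustrative toy example (not the analysis from either paper), the numbers below show why a background normalization contaminated with true signal penalizes a high signal-to-noise profile far more than a noisier one; all counts are invented.

## Toy counts per bin at a true target and in flanking background (invented values)
peak_cuttag <- 100; bg_cuttag <- 1     # Tn5-based profile: sharp peak, near-zero background
peak_chip   <- 100; bg_chip   <- 25    # ChIP-seq profile: same peak, higher background

peak_cuttag / bg_cuttag                # 100-fold enrichment before any correction
peak_chip   / bg_chip                  #   4-fold

## Dividing by a background estimate into which half of the peak signal has leaked
## erases most of the enrichment precisely where the signal-to-noise ratio is highest
peak_cuttag / (bg_cuttag + peak_cuttag / 2)   # ~2-fold
peak_chip   / (bg_chip   + peak_chip   / 2)   # ~1.3-fold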


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Arnaud Liehrmann ◽  
Guillem Rigaill ◽  
Toby Dylan Hocking

Abstract Background Histone modification constitutes a basic mechanism for the genetic regulation of gene expression. In the early 2000s, a powerful technique emerged that couples chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq). This technique provides a direct survey of the DNA regions associated with these modifications. In order to realize the full potential of this technique, increasingly sophisticated statistical algorithms have been developed or adapted to analyze the massive amount of data it generates. Many of these algorithms were built around natural assumptions, such as a Poisson distribution to model the noise in the count data. In this work we start from these natural assumptions and show that it is possible to improve upon them. Results Our comparisons on seven reference datasets of histone modifications (H3K36me3 and H3K4me3) suggest that the natural assumptions are not always realistic under application conditions. We show that an unconstrained multiple-changepoint detection model with alternative noise assumptions and supervised learning of the penalty parameter accounts for the over-dispersion exhibited by count data. These models, implemented in the R package CROCS (https://github.com/aLiehrmann/CROCS), detect peaks more accurately than algorithms which rely on the natural assumptions. Conclusion The segmentation models we propose can benefit researchers in the field of epigenetics by providing new high-quality peak prediction tracks for the H3K36me3 and H3K4me3 histone modifications.
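As a minimal sketch of the over-dispersion problem the abstract refers to (not the CROCS package itself), the R code below simulates negative-binomial window counts as a stand-in for ChIP-seq coverage and shows that their variance greatly exceeds their mean, violating the Poisson assumption; the simulation parameters are arbitrary.

## Over-dispersion check on simulated window counts (illustrative only)
set.seed(1)
counts <- rnbinom(n = 5000, mu = 10, size = 2)   # over-dispersed "coverage" in 5000 windows

mean(counts)   # ~10
var(counts)    # ~60, far above the mean: the variance = mean assumption fails

## Classic dispersion test: under a Poisson model, (n-1)*var/mean ~ chi-squared(n-1)
n <- length(counts)
stat <- (n - 1) * var(counts) / mean(counts)
pchisq(stat, df = n - 1, lower.tail = FALSE)     # ~0, so the Poisson noise model is rejected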


Author(s):  
Antonella D. Pontoriero ◽  
Giovanna Nordio ◽  
Rubaida Easmin ◽  
Alessio Giacomel ◽  
Barbara Santangelo ◽  
...  

2021 ◽  
Vol 22 (S6) ◽  
Author(s):  
Yasmine Mansour ◽  
Annie Chateau ◽  
Anna-Sophie Fiston-Lavier

Abstract Background Meiotic recombination is a vital biological process that plays an essential role in a genome's structural and functional dynamics. Genomes exhibit highly variable recombination profiles along chromosomes, associated with several chromatin states. However, eu-heterochromatin boundaries are neither available nor easily obtained for non-model organisms, especially newly sequenced ones. Hence, the accurate local recombination rates needed to address evolutionary questions are missing. Results Here, we propose an automated computational tool, based on the Marey map method, that identifies heterochromatin boundaries along chromosomes and estimates local recombination rates. Our method, called BREC (heterochromatin Boundaries and RECombination rate estimates), is non-genome-specific and runs even on non-model genomes as long as genetic and physical maps are available. BREC is based on pure statistics and is data-driven, implying that good input data quality remains a strong requirement; therefore, a data pre-processing module (data quality control and cleaning) is provided. Experiments show that BREC handles differences in marker density and distribution. Conclusions BREC's heterochromatin boundaries have been validated against cytological equivalents experimentally generated on the fruit fly Drosophila melanogaster genome, for which BREC returns congruent values. BREC's recombination rates have also been compared with previously reported estimates. Based on these promising results, we believe our tool has the potential to help bring data science into the service of genome biology and evolution. We provide BREC as an R package and a user-friendly Shiny web application, yielding a fast, easy-to-use, and broadly accessible resource. The BREC R package is available at the GitHub repository https://github.com/GenomeStructureOrganization.
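As a minimal sketch of the Marey map idea that BREC builds on (not the BREC code itself), the R function below fits a smoothing spline of genetic position (cM) against physical position (Mb) for markers on one chromosome and takes its derivative as the local recombination rate in cM/Mb; the column names and the smoothing choice are assumptions for illustration.

## Local recombination rate from a Marey map (illustrative sketch)
marey_rate <- function(physical_mb, genetic_cm, eval_points = 200) {
  fit  <- smooth.spline(physical_mb, genetic_cm)                   # smooth Marey map
  grid <- seq(min(physical_mb), max(physical_mb), length.out = eval_points)
  rate <- predict(fit, grid, deriv = 1)$y                          # d(cM)/d(Mb)
  data.frame(position_mb = grid, rate_cM_per_Mb = pmax(rate, 0))   # clip negative estimates
}

## Hypothetical usage with a marker table holding physical and genetic positions
# markers <- read.csv("chr2L_markers.csv")        # assumed columns: phys_mb, gen_cm
# rates   <- marey_rate(markers$phys_mb, markers$gen_cm)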


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Charith B. Karunarathna ◽  
Jinko Graham

Abstract Background A perfect phylogeny is a rooted binary tree that recursively partitions sequences. The nested partitions of a perfect phylogeny provide insight into the pattern of ancestry of genetic sequence data. For example, sequences may cluster together in a partition indicating that they arise from a common ancestral haplotype. Results We present an R package to reconstruct the local perfect phylogenies underlying a sample of binary sequences. The package enables users to associate the reconstructed partitions with a user-defined partition. We describe and demonstrate the major functionality of the package. Conclusion The package should be of use to researchers seeking insight into the ancestral structure of their sequence data. The reconstructed partitions have many applications, including the mapping of trait-influencing variants.
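As a minimal sketch of the compatibility condition that underlies perfect-phylogeny reconstruction for binary sequences (not this package's API), the R code below implements the classic four-gamete test: under the infinite-sites model, two sites can be placed on a common perfect phylogeny only if their columns do not contain all four combinations 00, 01, 10 and 11.

## Four-gamete compatibility test for two binary sites (illustrative sketch)
four_gamete_compatible <- function(site_i, site_j) {
  gametes <- unique(paste0(site_i, site_j))   # haplotype pairs observed at the two sites
  length(gametes) < 4                         # all four pairs present => incompatible
}

## Toy 0/1 haplotype matrix: rows are sequences, columns are sites
haps <- rbind(c(0, 0, 1),
              c(0, 1, 1),
              c(1, 0, 0),
              c(1, 1, 0))
four_gamete_compatible(haps[, 1], haps[, 2])   # FALSE: all four gametes occur
four_gamete_compatible(haps[, 1], haps[, 3])   # TRUE: these two sites are compatible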


2001 ◽  
Vol 27 (7) ◽  
pp. 867-876 ◽  
Author(s):  
Pankajakshan Thadathil ◽  
Aravind K Ghosh ◽  
J.S. Sarupria ◽  
V.V. Gopalakrishna
