ssvQC: an integrated CUT&RUN quality control workflow for histone modifications and transcription factors

2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Joseph Boyd ◽  
Princess Rodriguez ◽  
Hilde Schjerven ◽  
Seth Frietze

Abstract Objective Among the different methods to profile the genome-wide patterns of transcription factor binding and histone modifications in cells and tissues, CUT&RUN has emerged as a more efficient approach that allows for a higher signal-to-noise ratio using fewer cells compared to ChIP-seq. The results from CUT&RUN and other related sequence enrichment assays require comprehensive quality control (QC) and comparative analysis of data quality across replicates. While several computational tools currently exist for read mapping and analysis, a systematic reporting of data quality is lacking. Our aims were to (1) compare methods using frozen versus fresh cells for CUT&RUN and (2) develop an easy-to-use pipeline for assessing data quality. Results We compared a CUT&RUN workflow on fresh and frozen samples, and present an R package called ssvQC for quality control and comparison of data quality derived from CUT&RUN and other enrichment-based sequence data. Using ssvQC, we evaluate results from different CUT&RUN protocols for transcription factors and histone modifications from fresh and frozen tissue samples. Overall, this process facilitates evaluation of data quality across datasets and permits inspection of peak calling and replicate analyses across different data types. The package ssvQC is readily available at https://github.com/FrietzeLabUVM/ssvQC.
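As a minimal sketch of the kind of per-replicate metric such a QC report typically includes, the R code below computes the fraction of reads in peaks (FRiP) from a BAM file and a narrowPeak file. This is an illustration only, not the ssvQC API, and the file names are hypothetical.

## Minimal FRiP sketch per replicate; NOT the ssvQC interface, just an example of a
## replicate-level quality metric. Assumes aligned CUT&RUN reads (BAM) and called
## peaks (narrowPeak) exist for each sample.
library(GenomicAlignments)  # readGAlignments()
library(rtracklayer)        # import() for narrowPeak files

frip <- function(bam_file, peak_file) {
  reads <- granges(readGAlignments(bam_file))           # aligned reads as genomic ranges
  peaks <- import(peak_file, format = "narrowPeak")     # called peaks
  sum(countOverlaps(reads, peaks) > 0) / length(reads)  # fraction of reads inside peaks
}

## Hypothetical files comparing a fresh and a frozen replicate
frip("fresh_rep1.bam",  "fresh_rep1_peaks.narrowPeak")
frip("frozen_rep1.bam", "frozen_rep1_peaks.narrowPeak")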


2018 ◽  
Vol 10 (9) ◽  
pp. 1476 ◽  
Author(s):  
Simone Cosoli ◽  
Badema Grcic ◽  
Stuart de Vos ◽  
Yasha Hetzel

Quality-control procedures and their impact on data quality are described for the High-Frequency Ocean Radar (HFR) network in Australia, in particular for the commercial phased-array (WERA) HFR type. Threshold-based quality-control procedures were applied to radial velocity and signal-to-noise ratio (SNR); threshold values were set through quantitative analyses against independent measurements within the HFR coverage, when available, or otherwise from long-term data statistics. An artifact-removal procedure was also applied to the spatial distribution of SNR for the first-order Bragg peaks, under the assumptions that SNR is a valid proxy for radial velocity quality and that SNR decays with range from the receiver. The proposed iterative procedure was specifically designed to remove anomalous observations associated with strong SNR peaks caused by 50 Hz interference sources. The procedure iteratively fits a polynomial along the radial beam (1-D case) or a surface (2-D case) to the SNR associated with the radial velocity. Observations that exceeded a detection threshold were then identified and flagged. After suspect data were removed, new iterations were run with updated detection thresholds until no additional spikes were found or a maximum number of iterations was reached.
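As a minimal sketch of the 1-D version of this iterative procedure (not the operational radar code), the R function below fits a low-order polynomial to SNR along range, flags observations whose residuals exceed a robust threshold, refits on the remaining data, and repeats until no new spikes are found or a maximum number of iterations is reached. The polynomial degree and the k-MAD threshold rule are assumptions for illustration.

## Iterative fit-and-flag despiking of SNR along a radial beam (illustrative sketch)
despike_snr <- function(range_km, snr_db, degree = 3, k = 3, max_iter = 10) {
  d <- data.frame(range_km = range_km, snr_db = snr_db)
  keep <- rep(TRUE, nrow(d))                                       # TRUE = still trusted
  for (i in seq_len(max_iter)) {
    fit <- lm(snr_db ~ poly(range_km, degree), data = d[keep, ])   # fit on kept points
    res <- d$snr_db - predict(fit, newdata = d)                    # residuals everywhere
    spikes <- keep & abs(res) > k * mad(res[keep])                 # robust residual threshold
    if (!any(spikes)) break                                        # converged: no new spikes
    keep <- keep & !spikes                                         # drop spikes, then refit
  }
  !keep   # TRUE where an observation was flagged as anomalous
}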


2021 ◽  
Author(s):  
Bofeng Liu ◽  
Fengling Chen ◽  
Wei Xie

Several chromatin immunocleavage-based (ChIC) methods using Tn5 transposase have been developed to profile histone modifications and transcription factor binding. A recent preprint by Wang et al. raised potential concerns that these methods are prone to open-chromatin bias. While we appreciate the authors alerting the community to this issue, it has been previously described and discussed by Henikoff and colleagues in the original CUT&Tag paper. As described for CUT&Tag, the signal-to-noise ratio is essential for Tn5-based profiling methods and all antibody-based enrichment assays. Based on this notion, we would like to point out a major analysis issue in Wang et al. that caused a complete loss or dramatic reduction of enrichment at true targets for datasets generated by Tn5-based methods, which in turn artificially enhanced the relative enrichment of the potential open-chromatin bias. This analysis issue stems from the distinct background normalizations that Wang et al. applied to ChIP-based (chromatin immunoprecipitation) data versus Tn5-based data: only the normalization of the Tn5-based data, not of the ChIP-seq data, produced these effects. Such distortion of the signal-to-noise ratio consequently leads to misleading results.
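As a purely illustrative toy example (not the analysis from either paper), the numbers below show why a background normalization contaminated with true signal penalizes a high signal-to-noise profile far more than a noisier one; all counts are invented.

## Toy counts per bin at a true target and in flanking background (invented values)
peak_cuttag <- 100; bg_cuttag <- 1     # Tn5-based profile: sharp peak, near-zero background
peak_chip   <- 100; bg_chip   <- 25    # ChIP-seq profile: same peak, higher background

peak_cuttag / bg_cuttag                # 100-fold enrichment before any correction
peak_chip   / bg_chip                  #   4-fold

## Dividing by a background estimate into which half of the peak signal has leaked
## erases most of the enrichment precisely where the signal-to-noise ratio is highest
peak_cuttag / (bg_cuttag + peak_cuttag / 2)   # ~2-fold
peak_chip   / (bg_chip   + peak_chip   / 2)   # ~1.3-fold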


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Arnaud Liehrmann ◽  
Guillem Rigaill ◽  
Toby Dylan Hocking

Abstract Background Histone modification constitutes a basic mechanism for the genetic regulation of gene expression. In the early 2000s, a powerful technique emerged that couples chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq). This technique provides a direct survey of the DNA regions associated with these modifications. In order to realize the full potential of this technique, increasingly sophisticated statistical algorithms have been developed or adapted to analyze the massive amount of data it generates. Many of these algorithms were built around natural assumptions, such as a Poisson distribution to model the noise in the count data. In this work we start from these natural assumptions and show that it is possible to improve upon them. Results Our comparisons on seven reference datasets of histone modifications (H3K36me3 and H3K4me3) suggest that the natural assumptions are not always realistic under application conditions. We show that an unconstrained multiple-changepoint detection model with alternative noise assumptions and supervised learning of the penalty parameter accounts for the over-dispersion exhibited by count data. These models, implemented in the R package CROCS (https://github.com/aLiehrmann/CROCS), detect peaks more accurately than algorithms which rely on the natural assumptions. Conclusion The segmentation models we propose can benefit researchers in the field of epigenetics by providing new high-quality peak prediction tracks for the H3K36me3 and H3K4me3 histone modifications.
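As a minimal sketch of the over-dispersion problem the abstract refers to (not the CROCS package itself), the R code below simulates negative-binomial window counts as a stand-in for ChIP-seq coverage and shows that their variance greatly exceeds their mean, violating the Poisson assumption; the simulation parameters are arbitrary.

## Over-dispersion check on simulated window counts (illustrative only)
set.seed(1)
counts <- rnbinom(n = 5000, mu = 10, size = 2)   # over-dispersed "coverage" in 5000 windows

mean(counts)   # ~10
var(counts)    # ~60, far above the mean: the variance = mean assumption fails

## Classic dispersion test: under a Poisson model, (n-1)*var/mean ~ chi-squared(n-1)
n <- length(counts)
stat <- (n - 1) * var(counts) / mean(counts)
pchisq(stat, df = n - 1, lower.tail = FALSE)     # ~0, so the Poisson noise model is rejected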


Author(s):  
Antonella D. Pontoriero ◽  
Giovanna Nordio ◽  
Rubaida Easmin ◽  
Alessio Giacomel ◽  
Barbara Santangelo ◽  
...  

2021 ◽  
Vol 22 (S6) ◽  
Author(s):  
Yasmine Mansour ◽  
Annie Chateau ◽  
Anna-Sophie Fiston-Lavier

Abstract Background Meiotic recombination is a vital biological process that plays an essential role in a genome's structural and functional dynamics. Genomes exhibit highly variable recombination profiles along chromosomes, associated with several chromatin states. However, eu-heterochromatin boundaries are neither available nor easily obtained for non-model organisms, especially newly sequenced ones. Hence, the accurate local recombination rates needed to address evolutionary questions are missing. Results Here, we propose an automated computational tool, based on the Marey map method, that identifies heterochromatin boundaries along chromosomes and estimates local recombination rates. Our method, called BREC (heterochromatin Boundaries and RECombination rate estimates), is non-genome-specific and runs even on non-model genomes as long as genetic and physical maps are available. BREC is based on pure statistics and is data-driven, implying that good input data quality remains a strong requirement; therefore, a data pre-processing module (data quality control and cleaning) is provided. Experiments show that BREC handles differences in marker density and distribution. Conclusions BREC's heterochromatin boundaries have been validated against cytological equivalents experimentally generated on the fruit fly Drosophila melanogaster genome, for which BREC returns congruent values. BREC's recombination rates have also been compared with previously reported estimates. Based on these promising results, we believe our tool has the potential to help bring data science into the service of genome biology and evolution. We provide BREC as an R package and a user-friendly Shiny web application, yielding a fast, easy-to-use, and broadly accessible resource. The BREC R package is available at the GitHub repository https://github.com/GenomeStructureOrganization.
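As a minimal sketch of the Marey map idea that BREC builds on (not the BREC code itself), the R function below fits a smoothing spline of genetic position (cM) against physical position (Mb) for markers on one chromosome and takes its derivative as the local recombination rate in cM/Mb; the column names and the smoothing choice are assumptions for illustration.

## Local recombination rate from a Marey map (illustrative sketch)
marey_rate <- function(physical_mb, genetic_cm, eval_points = 200) {
  fit  <- smooth.spline(physical_mb, genetic_cm)                   # smooth Marey map
  grid <- seq(min(physical_mb), max(physical_mb), length.out = eval_points)
  rate <- predict(fit, grid, deriv = 1)$y                          # d(cM)/d(Mb)
  data.frame(position_mb = grid, rate_cM_per_Mb = pmax(rate, 0))   # clip negative estimates
}

## Hypothetical usage with a marker table holding physical and genetic positions
# markers <- read.csv("chr2L_markers.csv")        # assumed columns: phys_mb, gen_cm
# rates   <- marey_rate(markers$phys_mb, markers$gen_cm)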


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Charith B. Karunarathna ◽  
Jinko Graham

Abstract Background A perfect phylogeny is a rooted binary tree that recursively partitions sequences. The nested partitions of a perfect phylogeny provide insight into the pattern of ancestry of genetic sequence data. For example, sequences may cluster together in a partition indicating that they arise from a common ancestral haplotype. Results We present an R package to reconstruct the local perfect phylogenies underlying a sample of binary sequences. The package enables users to associate the reconstructed partitions with a user-defined partition. We describe and demonstrate the major functionality of the package. Conclusion The package should be of use to researchers seeking insight into the ancestral structure of their sequence data. The reconstructed partitions have many applications, including the mapping of trait-influencing variants.
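As a minimal sketch of the compatibility condition that underlies perfect-phylogeny reconstruction for binary sequences (not this package's API), the R code below implements the classic four-gamete test: under the infinite-sites model, two sites can be placed on a common perfect phylogeny only if their columns do not contain all four combinations 00, 01, 10 and 11.

## Four-gamete compatibility test for two binary sites (illustrative sketch)
four_gamete_compatible <- function(site_i, site_j) {
  gametes <- unique(paste0(site_i, site_j))   # haplotype pairs observed at the two sites
  length(gametes) < 4                         # all four pairs present => incompatible
}

## Toy 0/1 haplotype matrix: rows are sequences, columns are sites
haps <- rbind(c(0, 0, 1),
              c(0, 1, 1),
              c(1, 0, 0),
              c(1, 1, 0))
four_gamete_compatible(haps[, 1], haps[, 2])   # FALSE: all four gametes occur
four_gamete_compatible(haps[, 1], haps[, 3])   # TRUE: these two sites are compatible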


2001 ◽  
Vol 27 (7) ◽  
pp. 867-876 ◽  
Author(s):  
Pankajakshan Thadathil ◽  
Aravind K Ghosh ◽  
J.S. Sarupria ◽  
V.V. Gopalakrishna
