S3norm: simultaneous normalization of sequencing depth and signal-to-noise ratio in epigenomic data

2020
Vol 48 (8)
pp. e43-e43
Author(s):
Guanjue Xiang
Cheryl A. Keller
Belinda Giardine
Lin An
Qunhua Li
...  

Abstract: Quantitative comparison of epigenomic data across multiple cell types or experimental conditions is a promising way to understand the biological functions of epigenetic modifications. However, differences in sequencing depth and signal-to-noise ratios in the data from different experiments can hinder our ability to identify real biological variation from raw epigenomic data. Proper normalization is required prior to data analysis to gain meaningful insights. Most existing methods for data normalization standardize signals by rescaling either background regions or peak regions, assuming that the same scale factor is applicable to both background and peak regions. While such methods adjust for differences in sequencing depths, they do not address differences in the signal-to-noise ratios across different experiments. We developed a new data normalization method, called S3norm, that normalizes the sequencing depths and signal-to-noise ratios across different data sets simultaneously by a monotonic nonlinear transformation. We show empirically that the epigenomic data normalized by our method, compared to existing methods, can better capture real biological variation, such as the impact on gene expression regulation.
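
To make the transformation concrete, below is a minimal Python sketch of an S3norm-style normalization: the target signal is transformed as y = A·x^B, with the two parameters chosen so that the transformed track matches a reference in common peak regions and in common background regions simultaneously. Matching geometric means is assumed here to obtain a closed-form solution; the published tool estimates its parameters differently, so this illustrates the principle rather than the authors' implementation.

```python
import numpy as np

def s3norm_like(target, reference, peak_mask, bg_mask, eps=1e-3):
    """Monotonic nonlinear normalization y = A * x**B, S3norm-style.

    A minimal sketch, not the published implementation: A and B are chosen
    so that the transformed target matches the reference in common peak
    regions AND in common background regions at once. Matching geometric
    means is assumed here to get a closed-form solution.
    """
    log_t = np.log(target + eps)
    log_r = np.log(reference + eps)
    # Mean log-signal in common peaks and common background for both tracks.
    t_pk, t_bg = log_t[peak_mask].mean(), log_t[bg_mask].mean()
    r_pk, r_bg = log_r[peak_mask].mean(), log_r[bg_mask].mean()
    # Solve log(A) + B*t_pk = r_pk and log(A) + B*t_bg = r_bg.
    B = (r_pk - r_bg) / (t_pk - t_bg)
    A = np.exp(r_pk - B * t_pk)
    return A * (target + eps) ** B
```

Because the transform is monotonic, within-track signal ranks are preserved while peaks and background receive different effective scale factors, which no single global scale factor can achieve.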

2017
Author(s):
Katarzyna Wreczycka
Vedran Franke
Bora Uyar
Ricardo Wurmus
Altuna Akalin

Abstract: High-occupancy target (HOT) regions are segments of the genome with an unusually high number of transcription factor binding sites. These regions are observed in multiple species and are thought to have biological importance due to their high transcription factor occupancy. Furthermore, they coincide with housekeeping gene promoters, and the associated genes are stably expressed across multiple cell types. Despite these features, HOT regions are defined solely using ChIP-seq experiments and have been shown to lack canonical motifs for the transcription factors that are thought to bind there. Although ChIP-seq experiments are the gold standard for finding genome-wide binding sites of a protein, they are not noise free. Here, we show that HOT regions are likely to be ChIP-seq artifacts and that they are similar to previously proposed "hyper-ChIPable" regions. Using ChIP-seq data sets for knocked-out transcription factors, we demonstrate the presence of false positive signals in HOT regions. We observe sequence characteristics and genomic features that are discriminatory of HOT regions, such as GC/CpG-rich k-mers and enrichment of RNA-DNA hybrids (R-loops) and DNA tertiary structures (G-quadruplex DNA). The artificial ChIP-seq enrichment in HOT regions could be associated with these discriminatory features. Furthermore, we propose strategies for dealing with such artifacts in future ChIP-seq studies.
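
As a simplified illustration of the sequence features reported to discriminate HOT regions, the sketch below computes the GC content and the CpG observed/expected ratio of a candidate region; the study itself used richer k-mer models, so these two numbers are a proxy only.

```python
def gc_cpg_features(seq):
    """GC content and CpG observed/expected ratio of a candidate region.

    A simplified proxy for the GC/CpG-rich k-mer features reported to
    discriminate HOT regions; the study itself used richer k-mer models.
    """
    seq = seq.upper()
    n = len(seq)
    gc = (seq.count("G") + seq.count("C")) / n
    c, g, cg = seq.count("C"), seq.count("G"), seq.count("CG")
    # Observed/expected CpG, Gardiner-Garden & Frommer style: (CpG * N) / (C * G).
    cpg_oe = (cg * n) / (c * g) if c and g else 0.0
    return gc, cpg_oe
```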


1999
Vol 55 (10)
pp. 1733-1741
Author(s):  
Dominique Bourgeois

Tools originally developed for the treatment of weak and/or spatially overlapped time-resolved Laue patterns were extended to improve the processing of difficult monochromatic data sets. The integration program PrOW allows deconvolution of spatially overlapped spots, which are usually rejected by standard packages. By using dynamically adjusted profile-fitting areas, a carefully built library of reference spots and interpolation of reference profiles, this program also provides a more accurate evaluation of weak spots. In addition, by using Wilson statistics, it allows rejection of non-redundant strong outliers such as zingers, which may otherwise badly corrupt the data. A weighting method for optimizing structure-factor amplitude differences, based on Bayesian statistics and originally applied to low signal-to-noise ratio time-resolved Laue data, is also shown to significantly improve other types of subtle amplitude differences, such as anomalous differences.
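
The Wilson-statistics rejection step can be sketched briefly. Assuming acentric reflections, for which the normalized intensity z = I/⟨I⟩ within a resolution shell follows an exponential distribution, a spot can be flagged as an outlier (e.g., a zinger) when its Wilson tail probability exp(−z) falls below a chosen significance level. This sketches the general idea only; PrOW's actual thresholds and centric/acentric handling may differ.

```python
import numpy as np

def wilson_outliers(intensities, alpha=1e-6):
    """Flag implausibly strong spots (e.g. zingers) in one resolution shell.

    A sketch of Wilson-statistics rejection, assuming acentric reflections,
    for which z = I/<I> is exponentially distributed; PrOW's actual handling
    of centric/acentric zones and thresholds may differ.
    """
    z = np.asarray(intensities, dtype=float) / np.mean(intensities)
    return np.exp(-z) < alpha   # True where the Wilson tail probability is tiny
```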


Geophysics
2009
Vol 74 (4)
pp. J35-J48
Author(s):
Bernard Giroux
Abderrezak Bouchedda
Michel Chouteau

We introduce two new traveltime picking schemes developed specifically for crosshole ground-penetrating radar (GPR) applications. The main objective is to automate, at least partially, the traveltime picking procedure and to provide first-arrival times that are closer in quality to those of manual picking approaches. The first scheme is an adaptation of a method based on cross-correlation of radar traces collated in gathers according to their associated transmitter-receiver angle. A detector is added to isolate the first cycle of the radar wave and to suppress secondary arrivals that might be mistaken for first arrivals. To improve the accuracy of the arrival times obtained from the cross-correlation lags, a time-rescaling scheme is implemented to resize the radar wavelets to a common time-window length. The second method is based on the Akaike information criterion (AIC) and the continuous wavelet transform (CWT). It is not tied to the restrictive criterion of waveform similarity that underlies cross-correlation approaches, which is not guaranteed for traces sorted in common ray-angle gathers. It has the advantage of being fully automated. The performance of the new algorithms is tested with synthetic and real data. In all tests, the approach that adds first-cycle isolation to the original cross-correlation scheme improves the results. In contrast, the time-rescaling approach brings limited benefits, except when strong dispersion is present in the data. In addition, the performance of cross-correlation picking schemes degrades for data sets with disparate waveforms despite the high signal-to-noise ratio of the data. In general, the AIC-CWT approach is more versatile and performs well on all data sets. Only with data showing low signal-to-noise ratios is the AIC-CWT picker superseded by the modified cross-correlation picker.
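
The AIC step of the second picker is simple enough to sketch. The trace is split at each sample k into a leading noise segment and a trailing signal segment, AIC(k) = k·log(var(x[1..k])) + (N−k−1)·log(var(x[k+1..N])) is evaluated, and the global minimum marks the first arrival. The sketch below implements only this step; in the paper it is combined with a continuous wavelet transform that first localizes the analysis window.

```python
import numpy as np

def aic_pick(trace):
    """First-arrival picking by the Akaike information criterion (AIC).

    A sketch of the AIC step alone; the paper's picker combines it with a
    continuous wavelet transform (CWT) that first localizes the analysis
    window around the first arrival.
    """
    x = np.asarray(trace, dtype=float)
    n = len(x)
    aic = np.full(n, np.inf)
    for k in range(1, n - 1):
        v1, v2 = np.var(x[:k]), np.var(x[k:])
        if v1 > 0.0 and v2 > 0.0:
            # AIC(k): trace splits into noise (before k) and signal (after k).
            aic[k] = k * np.log(v1) + (n - k - 1) * np.log(v2)
    return int(np.argmin(aic))  # sample index of the estimated first break
```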


2019
Vol 9 (11)
pp. 2181
Author(s):
Anuradha Goswami
Jia-Qian Jiang

This research aims to depict the comparative performance of micropollutant removal by FeSO4- and zero-valent iron (Fe(0))-catalytic Fenton oxidation and to explore the possibilities of minimising sludge production from the process. The emerging micropollutants used for the study were gabapentin, sulfamethoxazole, diuron, terbutryn and terbuthylazine. The Taguchi method, which evaluates the signal-to-noise ratio instead of the standard deviation, was used to develop robust experimental conditions. Though both FeSO4- and Fe(0)-catalytic Fenton oxidation were able to completely degrade the stated micropollutants, the Fe(0)-catalytic Fenton process delivered better removal of dissolved organic carbon (DOC; 70%) than FeSO4-catalytic Fenton oxidation (45%). Fe(0)-catalytic Fenton oxidation facilitated heterogeneous treatment functions, which eliminated toxicity from the contaminated solution, and there was no recognisable sludge production.


2019
Author(s):
A. Fragasso
S. Schmid
C. Dekker

Abstract: Nanopores bear great potential as single-molecule tools for bioanalytical sensing and sequencing, due to their exceptional sensing capabilities, high throughput, and low cost. The detection principle relies on detecting small differences in the ionic current as biomolecules traverse the nanopore. A major bottleneck for the further progress of this technology is the noise present in the ionic current recordings, because it limits the signal-to-noise ratio and thereby the effective time resolution of the experiment. Here, we review the main types of noise at low and high frequencies and discuss the underlying physics. Moreover, we compare biological and solid-state nanopores in terms of the signal-to-noise ratio (SNR), the important figure of merit, by measuring free translocations of short ssDNA through a selected set of nanopores under typical experimental conditions. We find that SiNx solid-state nanopores provide the highest SNR, due to the large currents at which they can be operated and their relatively low noise at high frequencies. However, the real game-changer for many applications is a controlled slowdown of the translocation speed, which for MspA was shown to increase the SNR >160-fold. Finally, we discuss practical approaches for lowering the noise for optimal experimental performance and further development of nanopore technology.
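
As a rough sketch of how such a figure of merit can be computed, the SNR of a nanopore recording may be estimated as the mean current blockade amplitude divided by the RMS noise of the open-pore baseline. The result depends strongly on the filter bandwidth at which both quantities are measured, which the sketch below leaves to the caller; it is an illustration, not the paper's analysis code.

```python
import numpy as np

def nanopore_snr(open_pore_current, blockade_amplitudes):
    """SNR of a nanopore recording: mean blockade depth over baseline RMS noise.

    A sketch under stated assumptions -- both inputs are taken at the same
    filter bandwidth, which strongly affects the noise and hence the SNR;
    this is an illustration, not the paper's analysis code.
    """
    i_noise = np.std(open_pore_current)             # RMS noise of open-pore baseline
    delta_i = np.mean(np.abs(blockade_amplitudes))  # mean current blockade per event
    return delta_i / i_noise
```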


PLoS ONE
2020
Vol 15 (12)
pp. e0243219
Author(s):
Tim Scherr
Katharina Löffler
Moritz Böhland
Ralf Mikut

The accurate segmentation and tracking of cells in microscopy image sequences is an important task in biomedical research, e.g., for studying the development of tissues, organs or entire organisms. However, the segmentation of touching cells in images with a low signal-to-noise ratio is still a challenging problem. In this paper, we present a method for the segmentation of touching cells in microscopy images. By using a novel representation of cell borders, inspired by distance maps, our method is capable of utilizing not only touching cells but also close cells in the training process. Furthermore, this representation is notably robust to annotation errors and shows promising results for the segmentation of cell types that are underrepresented in, or missing from, the training data. For the prediction of the proposed neighbor distances, an adapted U-Net convolutional neural network (CNN) with two decoder paths is used. In addition, we adapt a graph-based cell tracking algorithm to evaluate our proposed method on the task of cell tracking. The adapted tracking algorithm includes a movement estimation in the cost function to re-link tracks with missing segmentation masks over a short sequence of frames. Our combined tracking-by-detection method has proven its potential in the IEEE ISBI 2020 Cell Tracking Challenge (http://celltrackingchallenge.net/), where, as team KIT-Sch-GE, we achieved multiple top-three rankings, including two top performances, using a single segmentation model for the diverse data sets.
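
A minimal sketch of distance-map-style training targets is given below: for every labeled cell, one map holds each pixel's distance to the cell's own border, and a second holds its distance to the nearest neighboring cell. The function and its exact outputs are assumptions for illustration; the published representation normalizes and further transforms such raw distances.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def cell_and_neighbor_distances(labels):
    """Two distance-based training targets for border-aware segmentation.

    A sketch inspired by the cell/neighbor distance idea; the published
    representation normalizes and further transforms these raw distances.
    `labels` is an integer mask with 0 = background, k > 0 = cell k.
    """
    cell_dist = np.zeros(labels.shape, dtype=float)
    neigh_dist = np.zeros(labels.shape, dtype=float)
    for lab in np.unique(labels):
        if lab == 0:
            continue
        mask = labels == lab
        # Distance of each cell pixel to the cell's own border/background.
        cell_dist[mask] = distance_transform_edt(mask)[mask]
        # Distance of each cell pixel to the nearest *other* cell.
        others = (labels > 0) & ~mask
        neigh_dist[mask] = distance_transform_edt(~others)[mask]
    return cell_dist, neigh_dist
```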


2021
pp. 1250-1258
Author(s):
Yilin Wu
Huei-Chung Huang
Li-Xuan Qin

PURPOSE: Accurate assessment of a molecular classifier that guides patient care is of paramount importance in precision oncology. Recent years have seen an increasing use of external validation for such assessment. However, little is known about how it is affected by ubiquitous unwanted variations in test data, arising from disparate experimental handling, or by the use of data normalization to alleviate such variations. METHODS: In this paper, we studied these issues using two microarray data sets for the same set of tumor samples, plus additional data simulated by resampling under various levels of signal-to-noise ratio and different designs for array-to-sample allocation. RESULTS: We showed that (1) unwanted variations can lead to biased classifier assessment and (2) data normalization mitigates the bias to varying extents depending on the specific method used. In particular, frozen normalization methods for test data outperform their conventional forms in terms of both reducing the bias in accuracy estimation and increasing robustness to handling effects. We make our benchmarking tool available as an R package on GitHub for performing such evaluations on additional normalization and classification methods. CONCLUSION: Our findings highlight the importance of proper test-data normalization for valid assessment by external validation and call for caution in the choice of normalization method for molecular classifier development.
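
To illustrate what a "frozen" method means here, the sketch below normalizes one test sample at a time against a reference distribution fixed from the training data, in the spirit of frozen quantile normalization (e.g., frozen RMA). It is a schematic stand-in, not the authors' R benchmarking package.

```python
import numpy as np

def frozen_quantile_normalize(test_sample, frozen_reference):
    """Normalize one test sample against a distribution frozen from training.

    A schematic stand-in for frozen normalization: each value is replaced by
    the frozen-reference quantile at its rank, so test samples are handled
    one at a time and the training data never change.
    """
    ref = np.sort(np.asarray(frozen_reference, dtype=float))
    x = np.asarray(test_sample, dtype=float)
    ranks = np.argsort(np.argsort(x))                  # 0 .. n-1
    # Map ranks onto a reference distribution of possibly different length.
    idx = np.round(ranks * (len(ref) - 1) / (len(x) - 1)).astype(int)
    return ref[idx]
```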


2019
Vol 14 (1)
Author(s):  
Saravanan S

Optimization of weld width and tensile strength in pulsed Nd:YAG laser-welded Hastelloy C-276 sheets, subjected to varied welding speed (350-450 mm/min), pulse energy (10-14 J) and pulse duration (6-8 ms), is attempted. Experimental conditions are designed based on a Taguchi L9 orthogonal array. The parameters for attaining a minimum weld width and maximum tensile strength were determined by computing the signal-to-noise ratio. Further, a mathematical model for predicting the weld width and tensile strength of the weld is developed by regression analysis using the statistical software MINITAB-16, and the goodness of fit is determined by analysis of variance.
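
The two signal-to-noise criteria used in such a Taguchi analysis are textbook formulas and easy to state in code: "smaller-the-better" for a response to be minimized (weld width) and "larger-the-better" for one to be maximized (tensile strength). The sketch below implements these standard definitions; it is illustrative and independent of the MINITAB workflow used in the paper.

```python
import numpy as np

def sn_smaller_is_better(y):
    """Taguchi S/N ratio when the response should be minimized (weld width)."""
    y = np.asarray(y, dtype=float)
    return -10.0 * np.log10(np.mean(y ** 2))

def sn_larger_is_better(y):
    """Taguchi S/N ratio when the response should be maximized (tensile strength)."""
    y = np.asarray(y, dtype=float)
    return -10.0 * np.log10(np.mean(1.0 / y ** 2))

# For each L9 run, the replicate measurements y are condensed into one S/N
# value; the factor levels with the highest mean S/N are taken as optimal.
```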


1994 ◽  
Vol 158 ◽  
pp. 373-375
Author(s):  
T. Reinheimer ◽  
K.-H. Hofmann ◽  
G. Weigelt

We have studied interferometric imaging in the multi-speckle mode by computer simulations. From various simulated data sets, diffraction-limited images were reconstructed by the speckle masking method and the iterative building block method. The reconstructed images show the dependence of the signal-to-noise ratio on photon noise.

