Issues of Z-factor and an approach to avoid them for quality control in high-throughput screening studies

Bioinformatics ◽

10.1093/bioinformatics/btaa1049 ◽

2020 ◽

Author(s):

Xiaohua Douglas Zhang ◽

Dandan Wang ◽

Shixue Sun ◽

Heping Zhang

Keyword(s):

Quality Control ◽

High Throughput ◽

High Throughput Screening ◽

Theoretical Basis ◽

Sampling Error ◽

R Package ◽

Supplementary Information ◽

Supplementary Data ◽

Automation Technology ◽

General Public License

Abstract Motivation High-throughput screening (HTS) is a vital automation technology in biomedical research in both industry and academia. The well-known Z-factor has been widely used as a gatekeeper to assure assay quality in an HTS study. However, many researchers and users may not have realized that Z-factor has major issues. Results In this article, the following four major issues are explored and demonstrated so that researchers may use the Z-factor appropriately. First, the Z-factor violates the Pythagorean theorem of statistics. Second, there is no adjustment of sampling error in the application of the Z-factor for quality control (QC) in HTS studies. Third, the expectation of the sample-based Z-factor does not exist. Fourth, the thresholds in the Z-factor-based criterion lack a theoretical basis. Here, an approach to avoid these issues was proposed and new QC criteria under homoscedasticity were constructed so that researchers can choose a statistically grounded criterion for QC in the HTS studies. We implemented this approach in an R package and demonstrated its utility in multiple CRISPR/CAS9 or siRNA HTS studies. Availability and implementation The R package qcSSMDhomo is freely available from GitHub: https://github.com/Karena6688/qcSSMDhomo. The file qcSSMDhomo_1.0.0.tar.gz (for Windows) containing qcSSMDhomo is also available at Bioinformatics online. qcSSMDhomo is distributed under the GNU General Public License. Supplementary information Supplementary data are available at Bioinformatics online.
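
For readers unfamiliar with the quantities discussed in this abstract, the following is a minimal sketch of the classical Z-factor next to the strictly standardized mean difference (SSMD), both computed from positive- and negative-control wells. The function names and the sample readings are illustrative assumptions; this is not the API of qcSSMDhomo, and it does not reproduce the homoscedastic criteria the article constructs. The Z-factor adds the two standard deviations directly, whereas SSMD combines the two variances under a square root, which is the usual rule for the spread of a difference of independent quantities.

```python
import math
import statistics


def z_factor(pos, neg):
    """Classical sample Z-factor: 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    mu_p, mu_n = statistics.mean(pos), statistics.mean(neg)
    sd_p, sd_n = statistics.stdev(pos), statistics.stdev(neg)
    # Standard deviations are summed directly here.
    return 1.0 - 3.0 * (sd_p + sd_n) / abs(mu_p - mu_n)


def ssmd(pos, neg):
    """Moment-based SSMD estimate for two independent control groups."""
    mu_p, mu_n = statistics.mean(pos), statistics.mean(neg)
    var_p, var_n = statistics.variance(pos), statistics.variance(neg)
    # Variances (not standard deviations) are summed before taking the root.
    return (mu_p - mu_n) / math.sqrt(var_p + var_n)


# Hypothetical control-well readings, for illustration only.
positive_controls = [10, 12, 14]
negative_controls = [1, 2, 3]

print(z_factor(positive_controls, negative_controls))
print(ssmd(positive_controls, negative_controls))
```

Both functions use sample estimates from a handful of wells, so the values they return carry sampling error, which is one of the issues the abstract raises.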


ngsReports: a Bioconductor package for managing FastQC reports and other NGS related log files

Bioinformatics ◽

10.1093/bioinformatics/btz937 ◽

2019 ◽

Vol 36 (8) ◽

pp. 2587-2588 ◽

Cited By ~ 10

Author(s):

Christopher M Ward ◽

Thu-Hien To ◽

Stephen M Pederson

Keyword(s):

Quality Control ◽

R Package ◽

Supplementary Information ◽

Bioconductor Package ◽

Supplementary Data ◽

Large Sample ◽

Log Files ◽

Shiny App ◽

Next Generation Sequencing (NGS) ◽

Generation Sequencing

Abstract Motivation High throughput next generation sequencing (NGS) has become exceedingly cheap, facilitating studies to be undertaken containing large sample numbers. Quality control (QC) is an essential stage during analytic pipelines and the outputs of popular bioinformatics tools such as FastQC and Picard can provide information on individual samples. Although these tools provide considerable power when carrying out QC, large sample numbers can make inspection of all samples and identification of systemic bias a challenge. Results We present ngsReports, an R package designed for the management and visualization of NGS reports from within an R environment. The available methods allow direct import into R of FastQC reports along with outputs from other tools. Visualization can be carried out across many samples using default, highly customizable plots with options to perform hierarchical clustering to quickly identify outlier libraries. Moreover, these can be displayed in an interactive shiny app or HTML report for ease of analysis. Availability and implementation The ngsReports package is available on Bioconductor and the GUI shiny app is available at https://github.com/UofABioinformaticsHub/shinyNgsreports. Supplementary information Supplementary data are available at Bioinformatics online.


GladiaTOX: GLobal Assessment of Dose-IndicAtor in TOXicology

Bioinformatics ◽

10.1093/bioinformatics/btz187 ◽

2019 ◽

Vol 35 (20) ◽

pp. 4190-4192 ◽

Cited By ~ 4

Author(s):

Vincenzo Belcastro ◽

Stephane Cano ◽

Diego Marescotti ◽

Stefano Acali ◽

Carine Poussin ◽

...

Keyword(s):

Quality Control ◽

Data Processing ◽

Web Service ◽

Biomedical Research ◽

Global Assessment ◽

R Package ◽

Supplementary Information ◽

High Content Screening ◽

Supplementary Data ◽

Severity Scores

Abstract Summary GladiaTOX R package is an open-source, flexible solution to high-content screening data processing and reporting in biomedical research. GladiaTOX takes advantage of the ‘tcpl’ core functionalities and provides a number of extensions: it provides a web-service solution to fetch raw data; it computes severity scores and exports ToxPi formatted files; furthermore it contains a suite of functionalities to generate PDF reports for quality control and data processing. Availability and implementation GladiaTOX R package (bioconductor). Also available via: git clone https://github.com/philipmorrisintl/GladiaTOX.git. Supplementary information Supplementary data are available at Bioinformatics online.


Breeze: an integrated quality control and data analysis application for high-throughput drug screening

Bioinformatics ◽

10.1093/bioinformatics/btaa138 ◽

2020 ◽

Vol 36 (11) ◽

pp. 3602-3604 ◽

Cited By ~ 6

Author(s):

Swapnil Potdar ◽

Aleksandr Ianevski ◽

John-Patrick Mpindi ◽

Dmitrii Bychkov ◽

Clément Fiere ◽

...

Keyword(s):

Quality Control ◽

Data Analysis ◽

High Throughput ◽

High Throughput Screening ◽

Complete Solution ◽

Supplementary Information ◽

Source Analysis ◽

Technical Documentation ◽

Resistance Patterns ◽

Analysis Application

Abstract Summary High-throughput screening (HTS) enables systematic testing of thousands of chemical compounds for potential use as investigational and therapeutic agents. HTS experiments are often conducted in multi-well plates that inherently bear technical and experimental sources of error. Thus, HTS data processing requires the use of robust quality control procedures before analysis and interpretation. Here, we have implemented an open-source analysis application, Breeze, an integrated quality control and data analysis application for HTS data. Furthermore, Breeze enables a reliable way to identify individual drug sensitivity and resistance patterns in cell lines or patient-derived samples for functional precision medicine applications. The Breeze application provides a complete solution for data quality assessment, dose–response curve fitting and quantification of the drug responses along with interactive visualization of the results. Availability and implementation The Breeze application with video tutorial and technical documentation is accessible at https://breeze.fimm.fi; the R source code is publicly available at https://github.com/potdarswapnil/Breeze under GNU General Public License v3.0. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


VeriNA3d: an R package for nucleic acids data mining

Bioinformatics ◽

10.1093/bioinformatics/btz553 ◽

2019 ◽

Vol 35 (24) ◽

pp. 5334-5336 ◽

Cited By ~ 1

Author(s):

Diego Gallego ◽

Leonardo Darré ◽

Pablo D Dans ◽

Modesto Orozco

Keyword(s):

Data Mining ◽

Nucleic Acids ◽

High Throughput ◽

Structural Data ◽

R Package ◽

Supplementary Information ◽

Supplementary Data ◽

RNA Structures ◽

High Throughput Analysis ◽

Single Structure

Abstract Summary veriNA3d is an R package for the analysis of nucleic acids structural data, with an emphasis on complex RNA structures. In addition to single-structure analyses, veriNA3d also implements functions to handle whole datasets of mmCIF/PDB structures that could be retrieved from public/local repositories. Our package aims to fill a gap in the data mining of nucleic acids structures to produce flexible and high throughput analysis of structural databases. Availability and implementation http://mmb.irbbarcelona.org/gitlab/dgallego/veriNA3d. Supplementary information Supplementary data are available at Bioinformatics online.


NACHO: an R package for quality control of NanoString nCounter data

Bioinformatics ◽

10.1093/bioinformatics/btz647 ◽

2019 ◽

Author(s):

Mickaël Canouil ◽

Gerard A Bouland ◽

Amélie Bonnefond ◽

Philippe Froguel ◽

Leen M ’t Hart ◽

...

Keyword(s):

Quality Control ◽

R Package ◽

Third Parties ◽

Supplementary Information ◽

Expression Data ◽

Supplementary Data ◽

Scaling Factors ◽

Comprehensive Method ◽

Normalization Methods

Abstract Summary The NanoString™ nCounter® is a platform for the targeted quantification of expression data in biofluids and tissues. While software by the manufacturer is available in addition to third-party packages, they do not provide a complete quality control (QC) pipeline. Here, we present NACHO (‘NAnostring quality Control dasHbOard’), a comprehensive QC R-package. The package consists of three subsequent steps: summarize, visualize and normalize. The summarize function collects all the relevant data and stores it in a tidy format; the visualize function initiates a dashboard with plots of the relevant QC outcomes. It contains QC metrics that are measured by default by the manufacturer, but also calculates other insightful measures, including the scaling factors that are needed in the normalization step. In this normalization step, different normalization methods can be chosen to optimally preprocess data. Together, NACHO is a comprehensive method that optimizes insight and preprocessing of nCounter® data. Availability and implementation NACHO is available as an R-package on CRAN and the development version on GitHub https://github.com/mcanouil/NACHO. Supplementary information Supplementary data are available at Bioinformatics online.


A pair of new statistical parameters for quality control in RNA interference high-throughput screening assays

Genomics ◽

10.1016/j.ygeno.2006.12.014 ◽

2007 ◽

Vol 89 (4) ◽

pp. 552-561 ◽

Cited By ~ 119

Author(s):

Xiaohua Douglas Zhang

Keyword(s):

Quality Control ◽

RNA Interference ◽

High Throughput ◽

High Throughput Screening ◽

Statistical Parameters ◽

Screening Assays


Robust Method for High-Throughput Screening of Fatty Acids by Multisegment Injection-Nonaqueous Capillary Electrophoresis–Mass Spectrometry with Stringent Quality Control

Analytical Chemistry ◽

10.1021/acs.analchem.8b05054 ◽

2018 ◽

Vol 91 (3) ◽

pp. 2329-2336 ◽

Cited By ~ 22

Author(s):

Sandi Azab ◽

Ritchie Ly ◽

Philip Britz-McKibbin

Keyword(s):

Mass Spectrometry ◽

Fatty Acids ◽

Quality Control ◽

Capillary Electrophoresis ◽

High Throughput ◽

High Throughput Screening ◽

Robust Method ◽

Nonaqueous Capillary Electrophoresis ◽

Stringent Quality ◽

Capillary Electrophoresis Mass Spectrometry


DEsingle for detecting three types of differential expression in single-cell RNA-seq data

10.1101/173997 ◽

2017 ◽

Cited By ~ 1

Author(s):

Zhun Miao ◽

Ke Deng ◽

Xiaowo Wang ◽

Xuegong Zhang

Keyword(s):

Single Cell ◽

Differential Expression ◽

Negative Binomial ◽

Single Cells ◽

R Package ◽

Supplementary Information ◽

Binomial Model ◽

Supplementary Data ◽

RNA-Seq ◽

Real Zeros

Abstract Summary The excess zeros in single-cell RNA-seq data include “real” zeros due to the on-off nature of gene transcription in single cells and “dropout” zeros due to technical reasons. Existing differential expression (DE) analysis methods cannot distinguish these two types of zeros. We developed an R package, DEsingle, which employs a Zero-Inflated Negative Binomial model to estimate the proportion of real and dropout zeros and to define and detect 3 types of DE genes in single-cell RNA-seq data with higher accuracy. Availability and implementation The R package DEsingle is freely available at https://github.com/miaozhun/DEsingle and is under Bioconductor’s consideration. Contact [email protected] Supplementary information Supplementary data are available at bioRxiv online.
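
As a rough illustration of the zero-inflated negative binomial idea mentioned in this abstract, the sketch below evaluates a textbook ZINB probability mass function: a point mass at zero mixed with an ordinary negative binomial count distribution. The parameter names (`theta`, `r`, `p`) and the example values are assumptions chosen for illustration; the abstract does not state DEsingle's parametrization or fitting procedure, and this is not the package's code.

```python
import math


def nb_pmf(k, r, p):
    """Negative binomial pmf: probability of k failures before the r-th success.

    r > 0 is the size parameter and 0 < p < 1 is the success probability.
    """
    log_coef = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef + r * math.log(p) + k * math.log1p(-p))


def zinb_pmf(k, theta, r, p):
    """Zero-inflated NB pmf: extra zero mass theta plus (1 - theta) times the NB pmf."""
    base = (1.0 - theta) * nb_pmf(k, r, p)
    # Only the zero count receives the additional point mass.
    return theta + base if k == 0 else base


# Hypothetical parameters: 30% extra zeros on top of an NB(r=2, p=0.5) count model.
theta, r, p = 0.3, 2.0, 0.5
print(zinb_pmf(0, theta, r, p))  # zero mass from both mixture components
print(zinb_pmf(3, theta, r, p))  # a non-zero count comes only from the NB part
```

In this mixture a zero can arise from either component, which is why a plain count model cannot separate the two kinds of zeros the abstract describes.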


Top-Down Garbage Collector: a tool for selecting high-quality top-down proteomics mass spectra

Bioinformatics ◽

10.1093/bioinformatics/btz085 ◽

2019 ◽

Vol 35 (18) ◽

pp. 3489-3490 ◽

Cited By ~ 1

Author(s):

Diogo B Lima ◽

André R F Silva ◽

Mathieu Dupré ◽

Marlon D M Santos ◽

Milan A Clasen ◽

...

Keyword(s):

Quality Control ◽

Mass Spectra ◽

Rate Increase ◽

Supplementary Information ◽

Supplementary Data ◽

Top Down ◽

High Quality ◽

Garbage Collector ◽

E. coli ◽

Spectral Libraries

Abstract Motivation We present the first tool for unbiased quality control of top-down proteomics datasets. Our tool can select high-quality top-down proteomics spectra, serve as a gateway for building top-down spectral libraries and, ultimately, improve identification rates. Results We demonstrate that a twofold rate increase for two E. coli top-down proteomics datasets may be achievable. Availability and implementation http://patternlabforproteomics.org/tdgc, freely available for academic use. Supplementary information Supplementary data are available at Bioinformatics online.


Dissecting differential signals in high-throughput data from complex tissues

Bioinformatics ◽

10.1093/bioinformatics/btz196 ◽

2019 ◽

Vol 35 (20) ◽

pp. 3898-3905 ◽

Cited By ~ 5

Author(s):

Ziyi Li ◽

Zhijin Wu ◽

Peng Jin ◽

Hao Wu

Keyword(s):

High Throughput ◽

Cell Types ◽

R Package ◽

Supplementary Information ◽

Simulation Studies ◽

Clinical Practices ◽

High Throughput Data ◽

Heterogeneous Samples ◽

Cell Type Specific ◽

Different Cell Types

Abstract Motivation Samples from clinical practices are often mixtures of different cell types. The high-throughput data obtained from these samples are thus mixed signals. The cell mixture brings complications to data analysis, and will lead to biased results if not properly accounted for. Results We develop a method to model the high-throughput data from mixed, heterogeneous samples, and to detect differential signals. Our method allows flexible statistical inference for detecting a variety of cell-type specific changes. Extensive simulation studies and analyses of two real datasets demonstrate the favorable performance of our proposed method compared with existing ones serving similar purpose. Availability and implementation The proposed method is implemented as an R package and is freely available on GitHub (https://github.com/ziyili20/TOAST). Supplementary information Supplementary data are available at Bioinformatics online.
