Quality control and processing of nascent RNA profiling data

Experiments that profile nascent RNA are growing in popularity; however, there is no standard analysis pipeline to uniformly process the data and assess quality. Here, we introduce PEPPRO, a comprehensive, scalable workflow for GRO-seq, PRO-seq, and ChRO-seq data. PEPPRO produces uniform processed output files for downstream analysis, including alignment files, signal tracks, and count matrices. Furthermore, PEPPRO simplifies downstream analysis by using a standard project definition format which can be read using metadata APIs in R and Python. For quality control, PEPPRO provides several novel statistics and plots, including assessments of adapter abundance, RNA integrity, library complexity, nascent RNA purity, and run-on efficiency. PEPPRO is restartable and fault-tolerant, records copious logs, and provides a web-based project report for navigating results. It can be run on local hardware or using any cluster resource manager, using either native software or a provided modular Linux container environment. PEPPRO is thus a robust and portable first step for genomic nascent RNA analysis. Availability: BSD2-licensed code and documentation: https://peppro.databio.org.

PEPPRO: quality control and processing of nascent RNA profiling data

Genome Biology ◽

10.1186/s13059-021-02349-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Jason P. Smith ◽

Arun B. Dutta ◽

Kizhakke Mattada Sathyan ◽

Michael J. Guertin ◽

Nathan C. Sheffield

Keyword(s):

Fault Tolerant ◽

Analysis Pipeline ◽

Web Based ◽

Rna Integrity ◽

Rna Profiling ◽

Project Report ◽

Library Complexity ◽

Nascent Rna ◽

Assess Quality ◽

Downstream Analysis

Abstract: Nascent RNA profiling is growing in popularity; however, there is no standard analysis pipeline to uniformly process the data and assess quality. Here, we introduce PEPPRO, a comprehensive, scalable workflow for GRO-seq, PRO-seq, and ChRO-seq data. PEPPRO produces uniformly processed output files for downstream analysis and assesses adapter abundance, RNA integrity, library complexity, nascent RNA purity, and run-on efficiency. PEPPRO is restartable and fault-tolerant, records copious logs, and provides a web-based project report. PEPPRO can be run locally or using a cluster, providing a portable first step for genomic nascent RNA analysis.

PEPATAC: An optimized ATAC-seq pipeline with serial alignments

10.1101/2020.10.21.347054 ◽

2020 ◽

Author(s):

Jason P. Smith ◽

M. Ryan Corces ◽

Jin Xu ◽

Vincent P. Reuter ◽

Howard Y. Chang ◽

...

Keyword(s):

Quality Control ◽

Large Scale ◽

Fault Tolerant ◽

Chromatin Accessibility ◽

Resource Manager ◽

Specific Data ◽

Data Formats ◽

Quality Control Metrics ◽

Downstream Analysis ◽

Analytical Approaches

Motivation: As chromatin accessibility data from ATAC-seq experiments continues to expand, there is continuing need for standardized analysis pipelines. Here, we present PEPATAC, an ATAC-seq pipeline that is easily applied to ATAC-seq projects of any size, from one-off experiments to large-scale sequencing projects. Results: PEPATAC leverages unique features of ATAC-seq data to optimize for speed and accuracy, and it provides several unique analytical approaches. Output includes convenient quality control plots, summary statistics, and a variety of generally useful data formats to set the groundwork for subsequent project-specific data analysis. Downstream analysis is simplified by a standard definition format, modularity of components, and metadata APIs in R and Python. It is restartable, fault-tolerant, and can be run on local hardware, using any cluster resource manager, or in provided Linux containers. We also demonstrate the advantage of aligning to the mitochondrial genome serially, which improves the accuracy of alignment statistics and quality control metrics. PEPATAC is a robust and portable first step for any ATAC-seq project. Availability: BSD2-licensed code and documentation at https://pepatac.databio.org.

PEPATAC: an optimized pipeline for ATAC-seq data analysis with serial alignments

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab101 ◽

2021 ◽

Vol 3 (4) ◽

Author(s):

Jason P Smith ◽

M Ryan Corces ◽

Jin Xu ◽

Vincent P Reuter ◽

Howard Y Chang ◽

...

Keyword(s):

Quality Control ◽

Data Analysis ◽

Large Scale ◽

Fault Tolerant ◽

Chromatin Accessibility ◽

Resource Manager ◽

Data Formats ◽

Quality Control Metrics ◽

Downstream Analysis ◽

Analytical Approaches

Abstract As chromatin accessibility data from ATAC-seq experiments continues to expand, there is continuing need for standardized analysis pipelines. Here, we present PEPATAC, an ATAC-seq pipeline that is easily applied to ATAC-seq projects of any size, from one-off experiments to large-scale sequencing projects. PEPATAC leverages unique features of ATAC-seq data to optimize for speed and accuracy, and it provides several unique analytical approaches. Output includes convenient quality control plots, summary statistics, and a variety of generally useful data formats to set the groundwork for subsequent project-specific data analysis. Downstream analysis is simplified by a standard definition format, modularity of components, and metadata APIs in R and Python. It is restartable, fault-tolerant, and can be run on local hardware, using any cluster resource manager, or in provided Linux containers. We also demonstrate the advantage of aligning to the mitochondrial genome serially, which improves the accuracy of alignment statistics and quality control metrics. PEPATAC is a robust and portable first step for any ATAC-seq project. BSD2-licensed code and documentation are available at https://pepatac.databio.org.
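
The idea of serial alignment mentioned in the abstract can be illustrated with a toy sketch. This is not PEPATAC's implementation: the function name `serial_align`, the example sequences, and the use of exact substring matching in place of a real aligner are all assumptions made for illustration. Each read is offered to the references in order, and only reads that fail to match one reference are passed on to the next, so a read that could match both the mitochondrial and the nuclear sequence is counted once.

```python
def serial_align(reads, references):
    """Assign each read to the first reference that contains it.

    reads: list of read sequences (strings).
    references: ordered list of (name, sequence) pairs; order defines
        the serial alignment order (hypothetical toy data, not real genomes).
    Returns a dict of counts per reference plus an "unaligned" count.
    """
    counts = {name: 0 for name, _ in references}
    remaining = list(reads)
    for name, sequence in references:
        unaligned = []
        for read in remaining:
            # Exact substring match stands in for a real aligner here.
            if read in sequence:
                counts[name] += 1
            else:
                unaligned.append(read)
        # Only reads that did not match are passed to the next reference.
        remaining = unaligned
    counts["unaligned"] = len(remaining)
    return counts


# Toy example: "ACGT" occurs in both sequences but is counted only for chrM.
references = [("chrM", "ACGTACGTTT"), ("nuclear", "GGGACGTCCCAAA")]
reads = ["ACGT", "CCCA", "TTTT", "GTTT"]
result = serial_align(reads, references)
print(result)
```

Because reads matching the first reference are removed before the second pass, the per-reference counts sum to the number of input reads, which is the property that keeps downstream alignment statistics from double counting.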

A Regional Web-Based Automated Quality Control Platform

Breast Imaging - Lecture Notes in Computer Science ◽

10.1007/978-3-319-07887-8_62 ◽

2014 ◽

pp. 444-451 ◽

Cited By ~ 1

Author(s):

Stephen Smithbower ◽

Rasika Rajapakshe ◽

Janette Sam ◽

Nancy Aldoff ◽

Teresa Wight

Keyword(s):

Quality Control ◽

Web Based ◽

Automated Quality Control

Effect of implementing a web-based application for spirometry quality control in a public health system. A 10-year prospective study, including the covid-19 pandemic

10.1183/13993003.congress-2021.pa3629 ◽

2021 ◽

Author(s):

Joseba Andia Iturrate ◽

Elena Garay Llorente ◽

Alejandro Rezola Carasusan ◽

Edurne Echevarria Guerrero ◽

Elena Lopez Santamaria ◽

...

Keyword(s):

Public Health ◽

Quality Control ◽

Prospective Study ◽

Health System ◽

Public Health System ◽

Web Based ◽

System A

Fault Tolerant Cloud Systems

Advances in Computer and Electrical Engineering - Advanced Methodologies and Technologies in Network Architecture, Mobile Computing, and Data Analytics ◽

10.4018/978-1-5225-7598-6.ch013 ◽

2019 ◽

pp. 171-190

Author(s):

Sathish Kumar ◽

Balamurugan B

Keyword(s):

Cloud Computing ◽

Fault Tolerant ◽

Software As A Service ◽

Arrival Process ◽

Cloud Provider ◽

Infrastructure As A Service ◽

Batch Mode ◽

Web Based ◽

Platform As A Service ◽

Cloud Systems

Cloud computing refers to a model for remotely accessing computing resources such as networks, servers, storage, applications, and services. Cloud computing offers these resources as a service, namely infrastructure-as-a-service, platform-as-a-service, and software-as-a-service. Two roles are involved in using these services: the cloud provider offers the service and the cloud customer consumes it. Customers share and use these resources, and this use is called the workload. Workload requirements depend on customer demand, which varies from high to low. Based on customer demand, the cloud provider makes resources available efficiently. In the cloud context, the workload consists of web-based services or jobs processed in batch mode. The arrival process of jobs in the cloud is often not deterministic. Irregular increases or decreases in workload have a significant impact on resource provisioning. Monitoring the resources helps measure the performance of the cloud so that resources can be provisioned to customers efficiently.

A Value Based Dynamic Resource Provisioning Model in Cloud

International Journal of Cloud Applications and Computing ◽

10.4018/ijcac.2013040104 ◽

2013 ◽

Vol 3 (2) ◽

pp. 35-46 ◽

Cited By ~ 1

Author(s):

Sandeep K. Sood

Keyword(s):

Resource Utilization ◽

Virtual Machines ◽

Fault Tolerant ◽

Resource Provisioning ◽

Resource Usage ◽

Resource Manager ◽

Usage Rate ◽

Computing Paradigm ◽

A Value ◽

Dynamic Resource Provisioning

Cloud computing has become an innovative computing paradigm that aims to provide reliable, customized, and Quality of Service (QoS)-guaranteed computing infrastructures for users. Efficient resource provisioning is required in the cloud for effective resource utilization. For resource provisioning, the cloud provides virtualized computing resources that are dynamically scalable. This property differentiates the cloud from the traditional computing paradigm. However, initializing a new virtual instance causes a delay of several minutes in hardware resource allocation. Furthermore, the cloud provides a fault-tolerant service to its clients using virtualization. To attain higher resource utilization with this technology, a strategy is needed for deploying virtual machines over physical machines by predicting the need for them in advance, so that the delay can be avoided. To address these issues, this paper proposes a value-based prediction model for resource provisioning in which a resource manager dynamically allocates or releases a virtual machine depending on the resource usage rate. To track the recent resource usage rate, the resource manager uses a sliding window to analyze the usage rate and predict the system behavior in advance. Predicting the resource requirements in advance can save a lot of processing time. Previously, a server had to perform all the calculations regarding resource usage, which wasted a lot of processing power and decreased its overall capacity to handle incoming requests. The main feature of the proposed model is that much of the load is shifted from the individual server to the resource manager, which performs all the calculations, so the server is free to handle incoming requests at full capacity.
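
A sliding-window usage tracker of the kind described in the abstract might look like the sketch below. It is not the paper's model: the class name `SlidingWindowManager`, the use of a simple mean as the prediction, and the threshold values are assumptions chosen for illustration. The manager keeps only the most recent usage samples and decides whether to allocate or release a virtual machine from their average.

```python
from collections import deque


class SlidingWindowManager:
    """Toy resource manager that decides from recent usage samples.

    window_size, scale_up, and scale_down are hypothetical parameters;
    usage samples are fractions of capacity between 0.0 and 1.0.
    """

    def __init__(self, window_size, scale_up=0.8, scale_down=0.3):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        # deque with maxlen drops the oldest sample automatically.
        self.samples = deque(maxlen=window_size)
        self.scale_up = scale_up
        self.scale_down = scale_down

    def observe(self, usage):
        """Record one usage-rate sample."""
        self.samples.append(usage)

    def predicted_usage(self):
        """Predict the next usage rate as the mean of the window."""
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)

    def decide(self):
        """Return 'allocate', 'release', or 'hold' for the next interval."""
        rate = self.predicted_usage()
        if rate > self.scale_up:
            return "allocate"
        if rate < self.scale_down:
            return "release"
        return "hold"


manager = SlidingWindowManager(window_size=3)
for usage in (0.9, 0.9, 0.9):
    manager.observe(usage)
print(manager.decide())
```

Running the decision logic in a separate manager object, rather than on each server, mirrors the abstract's point that the calculation load is moved off the servers handling requests.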

Web-based quality control of ready mixed concrete

Building and Environment ◽

10.1016/j.buildenv.2005.12.020 ◽

2007 ◽

Vol 42 (3) ◽

pp. 1465-1470 ◽

Cited By ~ 4

Author(s):

Ömer Arıöz ◽

Gökhan Arslan ◽

Mustafa Tuncan ◽

Serkan Kıvrak

Keyword(s):

Quality Control ◽

Web Based ◽

Ready Mixed Concrete ◽

Mixed Concrete

PathoQC: Computationally Efficient Read Preprocessing and Quality Control for High-Throughput Sequencing Data Sets

Cancer Informatics ◽

10.4137/cin.s13890 ◽

2014 ◽

Vol 13s1 ◽

pp. CIN.S13890 ◽

Cited By ~ 1

Author(s):

Changjin Hong ◽

Solaiappan Manimaran ◽

William Evan Johnson

Keyword(s):

Quality Control ◽

High Throughput ◽

High Performance ◽

High Throughput Sequencing ◽

Next Generation Sequencing Data ◽

Data Sets ◽

Sequencing Data ◽

Computationally Efficient ◽

High Throughput Sequencing Data ◽

Downstream Analysis

Quality control and read preprocessing are critical steps in the analysis of data sets generated from high-throughput genomic screens. In the most extreme cases, improper preprocessing can negatively affect downstream analyses and may lead to incorrect biological conclusions. Here, we present PathoQC, a streamlined toolkit that seamlessly combines the benefits of several popular quality control software approaches for preprocessing next-generation sequencing data. PathoQC provides a variety of quality control options appropriate for most high-throughput sequencing applications. PathoQC is primarily developed as a module in the PathoScope software suite for metagenomic analysis. However, PathoQC is also available as an open-source Python module that can run as a stand-alone application or can be easily integrated into any bioinformatics workflow. PathoQC achieves high performance by supporting parallel computation and is an effective tool that removes technical sequencing artifacts and facilitates robust downstream analysis. The PathoQC software package is available at http://sourceforge.net/projects/PathoScope/ .

Demystification of RNAseq Quality Control

JITA - Journal of Information Technology and Applications (Banja Luka) - APEIRON ◽

10.7251/jit2102073d ◽

2021 ◽

Vol 22 (2) ◽

Author(s):

Dragana Dudić ◽

Bojana Banović Đeri ◽

Vesna Pajić ◽

Gordana Pavlović-Lažetić

Keyword(s):

Quality Control ◽

Next Generation Sequencing ◽

Rna Sequencing ◽

Next Generation ◽

Comprehensive Guidance ◽

Dna And Rna ◽

Control Evaluation ◽

Downstream Analysis ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Next Generation Sequencing (NGS) analysis has become a widely used method for studying the structure of DNA and RNA, but the complexity of the procedure yields error-prone datasets that need to be cleaned to avoid misinterpretation of the data. We address the usage and proper interpretation of the characteristic metrics for RNA sequencing (RNAseq) quality control implemented in and reported by FastQC, and provide comprehensive guidance for their assessment in the context of total RNAseq quality control of Illumina raw reads. Additionally, we give recommendations on how to adequately perform the quality control preprocessing step for raw total RNAseq Illumina reads according to the results of the quality control evaluation step; the aim is to provide the best dataset for downstream analysis, rather than to get better FastQC results. We also tested the effects of different preprocessing approaches on the downstream analysis and recommended the most suitable approach.
