Quality control and processing of nascent RNA profiling data

2020 ◽  
Author(s):  
Jason P. Smith ◽  
Arun B. Dutta ◽  
Kizhakke Mattada Sathyan ◽  
Michael J. Guertin ◽  
Nathan C. Sheffield

Experiments that profile nascent RNA are growing in popularity; however, there is no standard analysis pipeline to uniformly process the data and assess quality. Here, we introduce PEPPRO, a comprehensive, scalable workflow for GRO-seq, PRO-seq, and ChRO-seq data. PEPPRO produces uniformly processed output files for downstream analysis, including alignment files, signal tracks, and count matrices. Furthermore, PEPPRO simplifies downstream analysis by using a standard project definition format that can be read using metadata APIs in R and Python. For quality control, PEPPRO provides several novel statistics and plots, including assessments of adapter abundance, RNA integrity, library complexity, nascent RNA purity, and run-on efficiency. PEPPRO is restartable and fault-tolerant, records copious logs, and provides a web-based project report for navigating results. It can be run on local hardware or using any cluster resource manager, using either native software or a provided modular Linux container environment. PEPPRO is thus a robust and portable first step for genomic nascent RNA analysis. Availability: BSD2-licensed code and documentation are available at https://peppro.databio.org.

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Jason P. Smith ◽  
Arun B. Dutta ◽  
Kizhakke Mattada Sathyan ◽  
Michael J. Guertin ◽  
Nathan C. Sheffield

Abstract: Nascent RNA profiling is growing in popularity; however, there is no standard analysis pipeline to uniformly process the data and assess quality. Here, we introduce PEPPRO, a comprehensive, scalable workflow for GRO-seq, PRO-seq, and ChRO-seq data. PEPPRO produces uniformly processed output files for downstream analysis and assesses adapter abundance, RNA integrity, library complexity, nascent RNA purity, and run-on efficiency. PEPPRO is restartable and fault-tolerant, records copious logs, and provides a web-based project report. PEPPRO can be run locally or using a cluster, providing a portable first step for genomic nascent RNA analysis.


2020 ◽  
Author(s):  
Jason P. Smith ◽  
M. Ryan Corces ◽  
Jin Xu ◽  
Vincent P. Reuter ◽  
Howard Y. Chang ◽  
...  

Motivation: As chromatin accessibility data from ATAC-seq experiments continues to expand, there is continuing need for standardized analysis pipelines. Here, we present PEPATAC, an ATAC-seq pipeline that is easily applied to ATAC-seq projects of any size, from one-off experiments to large-scale sequencing projects. Results: PEPATAC leverages unique features of ATAC-seq data to optimize for speed and accuracy, and it provides several unique analytical approaches. Output includes convenient quality control plots, summary statistics, and a variety of generally useful data formats to set the groundwork for subsequent project-specific data analysis. Downstream analysis is simplified by a standard definition format, modularity of components, and metadata APIs in R and Python. It is restartable, fault-tolerant, and can be run on local hardware, using any cluster resource manager, or in provided Linux containers. We also demonstrate the advantage of aligning to the mitochondrial genome serially, which improves the accuracy of alignment statistics and quality control metrics. PEPATAC is a robust and portable first step for any ATAC-seq project. Availability: BSD2-licensed code and documentation are available at https://pepatac.databio.org.


2021 ◽  
Vol 3 (4) ◽  
Author(s):  
Jason P Smith ◽  
M Ryan Corces ◽  
Jin Xu ◽  
Vincent P Reuter ◽  
Howard Y Chang ◽  
...  

Abstract: As chromatin accessibility data from ATAC-seq experiments continues to expand, there is continuing need for standardized analysis pipelines. Here, we present PEPATAC, an ATAC-seq pipeline that is easily applied to ATAC-seq projects of any size, from one-off experiments to large-scale sequencing projects. PEPATAC leverages unique features of ATAC-seq data to optimize for speed and accuracy, and it provides several unique analytical approaches. Output includes convenient quality control plots, summary statistics, and a variety of generally useful data formats to set the groundwork for subsequent project-specific data analysis. Downstream analysis is simplified by a standard definition format, modularity of components, and metadata APIs in R and Python. It is restartable, fault-tolerant, and can be run on local hardware, using any cluster resource manager, or in provided Linux containers. We also demonstrate the advantage of aligning to the mitochondrial genome serially, which improves the accuracy of alignment statistics and quality control metrics. PEPATAC is a robust and portable first step for any ATAC-seq project. BSD2-licensed code and documentation are available at https://pepatac.databio.org.


Author(s):  
Stephen Smithbower ◽  
Rasika Rajapakshe ◽  
Janette Sam ◽  
Nancy Aldoff ◽  
Teresa Wight

Author(s):  
Joseba Andia Iturrate ◽  
Elena Garay Llorente ◽  
Alejandro Rezola Carasusan ◽  
Edurne Echevarria Guerrero ◽  
Elena Lopez Santamaria ◽  
...  

Author(s):  
Sathish Kumar ◽  
Balamurugan B

Cloud computing refers to a model for remotely accessing computing resources such as networks, servers, storage, applications, and services. Cloud computing offers these resources as a service, namely infrastructure-as-a-service, platform-as-a-service, and software-as-a-service. Two roles are involved in using these services: the cloud provider offers the service and the cloud customer consumes it. The customers' shared use of these resources is called the workload. Workload requirements depend on customer demand, which varies from high to low. Based on that demand, the cloud provider makes resources available efficiently. In the cloud context, the workload consists of web-based services or jobs processed in batch mode. The arrival process of jobs in the cloud is rarely deterministic. An irregular increase or decrease in workload has a significant impact on resource provisioning. Monitoring the resources helps measure the performance of the cloud so that resources can be provisioned to customers efficiently.


2013 ◽  
Vol 3 (2) ◽  
pp. 35-46 ◽  
Author(s):  
Sandeep K. Sood

Cloud computing has become an innovative computing paradigm that aims to provide reliable, customized, and Quality of Service (QoS) guaranteed computing infrastructures for users. Efficient resource provisioning is required in the cloud for effective resource utilization. For resource provisioning, the cloud provides virtualized computing resources that are dynamically scalable, a property that differentiates it from the traditional computing paradigm. However, initializing a new virtual instance causes a delay of several minutes in hardware resource allocation. Furthermore, the cloud provides a fault-tolerant service to its clients through virtualization. To attain higher resource utilization with this technology, a strategy is needed for deploying virtual machines over physical machines by predicting the need for them in advance, so that the delay can be avoided. To address these issues, this paper proposes a value-based prediction model for resource provisioning in which a resource manager dynamically allocates or releases a virtual machine depending on the resource usage rate. To track the recent resource usage rate, the resource manager uses a sliding window to analyze usage and predict system behavior in advance. Predicting resource requirements in advance saves considerable processing time. Previously, each server had to perform all the resource usage calculations itself, which wasted processing power and reduced its overall capacity to handle incoming requests. The main feature of the proposed model is that much of this load is shifted from the individual server to the resource manager, which performs all the calculations, leaving the server free to handle incoming requests at full capacity.
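The sliding-window idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the paper's value-based prediction model, so the code below substitutes a plain moving average over the most recent usage rates. The class name, the window size, and both thresholds are hypothetical choices made for illustration only.

```python
from collections import deque


class SlidingWindowPredictor:
    """Illustrative resource manager helper (hypothetical, not the paper's model).

    Keeps the most recent usage rates in a fixed-size window and uses their
    moving average as the predicted usage rate.
    """

    def __init__(self, window_size=5, scale_up=0.8, scale_down=0.3):
        # Thresholds are assumed values, expressed as a fraction of capacity.
        self.scale_up = scale_up
        self.scale_down = scale_down
        # A deque with maxlen drops the oldest sample automatically.
        self.window = deque(maxlen=window_size)

    def observe(self, usage_rate):
        """Record one usage-rate sample (0.0 = idle, 1.0 = fully used)."""
        self.window.append(usage_rate)

    def predict(self):
        """Return the moving average of the window, or None if it is empty."""
        if not self.window:
            return None
        return sum(self.window) / len(self.window)

    def decide(self):
        """Map the predicted usage rate to a provisioning action."""
        predicted = self.predict()
        if predicted is None:
            return "hold"
        if predicted > self.scale_up:
            return "allocate"  # start a virtual machine ahead of demand
        if predicted < self.scale_down:
            return "release"   # free an underused virtual machine
        return "hold"


manager = SlidingWindowPredictor(window_size=3)
for rate in (0.85, 0.90, 0.95):
    manager.observe(rate)
print(manager.decide())
```

Because the window has a fixed length, each decision reflects only recent samples, which is what lets a manager of this kind react to a change in load without recomputing over the full usage history.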


2007 ◽  
Vol 42 (3) ◽  
pp. 1465-1470 ◽  
Author(s):  
Ömer Arıöz ◽  
Gökhan Arslan ◽  
Mustafa Tuncan ◽  
Serkan Kıvrak

2014 ◽  
Vol 13s1 ◽  
pp. CIN.S13890 ◽  
Author(s):  
Changjin Hong ◽  
Solaiappan Manimaran ◽  
William Evan Johnson

Quality control and read preprocessing are critical steps in the analysis of data sets generated from high-throughput genomic screens. In the most extreme cases, improper preprocessing can negatively affect downstream analyses and may lead to incorrect biological conclusions. Here, we present PathoQC, a streamlined toolkit that seamlessly combines the benefits of several popular quality control software approaches for preprocessing next-generation sequencing data. PathoQC provides a variety of quality control options appropriate for most high-throughput sequencing applications. PathoQC is primarily developed as a module in the PathoScope software suite for metagenomic analysis. However, PathoQC is also available as an open-source Python module that can run as a stand-alone application or can be easily integrated into any bioinformatics workflow. PathoQC achieves high performance by supporting parallel computation and is an effective tool that removes technical sequencing artifacts and facilitates robust downstream analysis. The PathoQC software package is available at http://sourceforge.net/projects/PathoScope/ .


Author(s):  
Dragana Dudić ◽  
Bojana Banović Đeri ◽  
Vesna Pajić ◽  
Gordana Pavlović-Lažetić

Next Generation Sequencing (NGS) analysis has become a widely used method for studying the structure of DNA and RNA, but the complexity of the procedure yields error-prone datasets that need to be cleaned in order to avoid misinterpretation of data. We address the usage and proper interpretation of characteristic metrics for RNA sequencing (RNAseq) quality control, implemented in and reported by FastQC, and provide comprehensive guidance for their assessment in the context of total RNAseq quality control of Illumina raw reads. Additionally, we give recommendations on how to adequately perform the quality control preprocessing step of raw total RNAseq Illumina reads according to the results of the quality control evaluation step; the aim is to provide the best dataset for downstream analysis, rather than to get better FastQC results. We also tested the effects of different preprocessing approaches on downstream analysis and recommended the most suitable approach.
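As a rough illustration of the kind of read-level metric and preprocessing step discussed in this abstract, the sketch below decodes a FASTQ quality string into Phred scores and trims low-quality bases from the 3' end of a read. It assumes the Phred+33 encoding used by current Illumina output. The function names and the threshold of 20 are hypothetical, and this is not how FastQC or any particular trimming tool is implemented.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality_string]


def mean_quality(quality_string, offset=33):
    """Return the mean Phred score of a read, or 0.0 for an empty read."""
    scores = phred_scores(quality_string, offset)
    return sum(scores) / len(scores) if scores else 0.0


def trim_low_quality_tail(sequence, quality_string, threshold=20, offset=33):
    """Remove trailing bases whose Phred score is below the threshold.

    Only the 3' end is trimmed; low-quality bases inside the read are kept.
    """
    end = len(quality_string)
    while end > 0 and ord(quality_string[end - 1]) - offset < threshold:
        end -= 1
    return sequence[:end], quality_string[:end]


# 'I' encodes Phred 40 and '!' encodes Phred 0 under Phred+33.
trimmed_seq, trimmed_qual = trim_low_quality_tail("ACGT", "II!!")
print(trimmed_seq, mean_quality(trimmed_qual))
```

Trimming changes which reads and bases reach the aligner, which is why the abstract stresses choosing the preprocessing step by its effect on downstream analysis and not by how the quality report looks afterward.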

