PEPPRO: quality control and processing of nascent RNA profiling data

Abstract: Nascent RNA profiling is growing in popularity; however, there is no standard analysis pipeline to uniformly process the data and assess quality. Here, we introduce PEPPRO, a comprehensive, scalable workflow for GRO-seq, PRO-seq, and ChRO-seq data. PEPPRO produces uniformly processed output files for downstream analysis and assesses adapter abundance, RNA integrity, library complexity, nascent RNA purity, and run-on efficiency. PEPPRO is restartable and fault-tolerant, records copious logs, and provides a web-based project report. PEPPRO can be run locally or using a cluster, providing a portable first step for genomic nascent RNA analysis.

Quality control and processing of nascent RNA profiling data

10.1101/2020.02.27.956110 ◽

2020 ◽

Author(s):

Jason P. Smith ◽

Arun B. Dutta ◽

Kizhakke Mattada Sathyan ◽

Michael J. Guertin ◽

Nathan C. Sheffield

Keyword(s):

Quality Control ◽

Fault Tolerant ◽

Resource Manager ◽

Web Based ◽

Rna Integrity ◽

Rna Profiling ◽

Library Complexity ◽

Nascent Rna ◽

Assess Quality ◽

Downstream Analysis

Experiments that profile nascent RNA are growing in popularity; however, there is no standard analysis pipeline to uniformly process the data and assess quality. Here, we introduce PEPPRO, a comprehensive, scalable workflow for GRO-seq, PRO-seq, and ChRO-seq data. PEPPRO produces uniformly processed output files for downstream analysis, including alignment files, signal tracks, and count matrices. Furthermore, PEPPRO simplifies downstream analysis by using a standard project definition format that can be read using metadata APIs in R and Python. For quality control, PEPPRO provides several novel statistics and plots, including assessments of adapter abundance, RNA integrity, library complexity, nascent RNA purity, and run-on efficiency. PEPPRO is restartable and fault-tolerant, records copious logs, and provides a web-based project report for navigating results. It can be run on local hardware or using any cluster resource manager, using either native software or a provided modular Linux container environment. PEPPRO is thus a robust and portable first step for genomic nascent RNA analysis. Availability: BSD2-licensed code and documentation: https://peppro.databio.org.

SC1: A web-based single cell RNA-seq analysis pipeline

2018 IEEE 8th International Conference on Computational Advances in Bio and Medical Sciences (ICCABS) ◽

10.1109/iccabs.2018.8542088 ◽

2018 ◽

Cited By ~ 2

Author(s):

Marmar Moussa ◽

Ion I. Mandoiu

Keyword(s):

Single Cell ◽

Rna Seq ◽

Analysis Pipeline ◽

Web Based

Bactopia: a flexible pipeline for complete analysis of bacterial genomes

10.1101/2020.02.28.969394 ◽

2020 ◽

Author(s):

Robert A. Petit ◽

Timothy D. Read

Keyword(s):

Standard Procedure ◽

Bacterial Species ◽

Bacterial Genome ◽

Complete Analysis ◽

Comparative Genomic ◽

Bacterial Genomes ◽

Analysis Pipeline ◽

Genomic Analyses ◽

Conserved Genes ◽

Downstream Analysis

Abstract: Sequencing of bacterial genomes using Illumina technology has become such a standard procedure that data are often generated faster than they can be conveniently analyzed. We created a new series of pipelines called Bactopia, built using Nextflow workflow software, to provide efficient comparative genomic analyses for bacterial species or genera. Bactopia consists of a dataset setup step (Bactopia Datasets; BaDs), where a series of customizable datasets are created for the species of interest; the Bactopia Analysis Pipeline (BaAP), which performs quality control, genome assembly, and several other functions based on the available datasets and outputs the processed data to a structured directory format; and a series of Bactopia Tools (BaTs) that perform specific post-processing on some or all of the processed data. BaTs include pan-genome analysis, computing average nucleotide identity between samples, extracting and profiling the 16S genes, and taxonomic classification using highly conserved genes. It is expected that the number of BaTs will increase to fill specific applications in the future. As a demonstration, we performed an analysis of 1,664 public Lactobacillus genomes, focusing on L. crispatus, a species that is a common part of the human vaginal microbiome. Bactopia is an open source system that can scale from projects as small as one bacterial genome to thousands, allowing great flexibility in choosing comparison datasets and options for downstream analysis. Bactopia code can be accessed at https://www.github.com/bactopia/bactopia.

Fault Tolerant Cloud Systems

Advances in Computer and Electrical Engineering - Advanced Methodologies and Technologies in Network Architecture, Mobile Computing, and Data Analytics ◽

10.4018/978-1-5225-7598-6.ch013 ◽

2019 ◽

pp. 171-190

Author(s):

Sathish Kumar ◽

Balamurugan B

Keyword(s):

Cloud Computing ◽

Fault Tolerant ◽

Software As A Service ◽

Arrival Process ◽

Cloud Provider ◽

Infrastructure As A Service ◽

Batch Mode ◽

Web Based ◽

Platform As A Service ◽

Cloud Systems

Cloud computing refers to a model for remotely accessing computing resources such as networks, servers, storage, applications, and services. Cloud computing offers these resources as a service, namely infrastructure-as-a-service, platform-as-a-service, and software-as-a-service. Using these services involves two roles: the cloud provider offers the service and the cloud customer consumes it. Customers share and use these resources, and this usage is called the workload. The workload requirement depends on customer demand, which varies from high to low. Based on customer demand, the cloud provider makes resources available efficiently. In the context of the cloud, the workload consists of web-based services or jobs processed in batch mode. The arrival process of jobs in the cloud is often not deterministic. An irregular increase or decrease in workload has a vital impact on resource provisioning. Monitoring the resources helps in measuring the performance of the cloud so that resources can be provisioned to customers efficiently.

ASAP 2020 update: an open, scalable and interactive web-based portal for (single-cell) omics analyses

Nucleic Acids Research ◽

10.1093/nar/gkaa412 ◽

2020 ◽

Vol 48 (W1) ◽

pp. W403-W414

Author(s):

Fabrice P A David ◽

Maria Litovchenko ◽

Bart Deplancke ◽

Vincent Gardeux

Keyword(s):

Big Data ◽

Single Cell ◽

Single Cell Analysis ◽

Omics Data ◽

Cell Analysis ◽

Rna Seq ◽

Analysis Pipeline ◽

Web Based ◽

Data Analyses ◽

The Web

Abstract: Single-cell omics enables researchers to dissect biological systems at a resolution that was unthinkable just 10 years ago. However, this analytical revolution also triggered new demands in ‘big data’ management, forcing researchers to stay up to speed with increasingly complex analytical processes and rapidly evolving methods. To render these processes and approaches more accessible, we developed the web-based, collaborative portal ASAP (Automated Single-cell Analysis Portal). Our primary goal is thereby to democratize single-cell omics data analyses (scRNA-seq and more recently scATAC-seq). By taking advantage of a Docker system to enhance reproducibility, and novel bioinformatics approaches that were recently developed for improving scalability, ASAP meets challenging requirements set by recent cell atlasing efforts such as the Human (HCA) and Fly (FCA) Cell Atlas Projects. Specifically, ASAP can now handle datasets containing millions of cells, integrating intuitive tools that allow researchers to collaborate on the same project synchronously. ASAP tools are versioned, and researchers can create unique access IDs for storing complete analyses that can be reproduced or completed by others. Finally, ASAP does not require any installation and provides a full and modular single-cell RNA-seq analysis pipeline. ASAP is freely available at https://asap.epfl.ch.

MicroRNA-Seq Data Analysis Pipeline to Identify Blood Biomarkers for Alzheimer's Disease from Public Data

Biomarker Insights ◽

10.4137/bmi.s25132 ◽

2015 ◽

Vol 10 ◽

pp. BMI.S25132 ◽

Cited By ~ 68

Author(s):

Jun-ichi Satoh ◽

Yoshihiro Kino ◽

Shumpei Niida

Keyword(s):

Alzheimer's Disease ◽

Data Analysis ◽

Small Rna ◽

Small Rna Sequencing ◽

Rna Seq ◽

Analysis Pipeline ◽

Web Based ◽

Public Data ◽

Data Analysis Pipeline

Background: Alzheimer's disease (AD) is the most common cause of dementia with no curative therapy currently available. Establishment of sensitive and non-invasive biomarkers that promote an early diagnosis of AD is crucial for the effective administration of disease-modifying drugs. MicroRNAs (miRNAs) mediate posttranscriptional repression of numerous target genes. Aberrant regulation of miRNA expression is implicated in AD pathogenesis, and circulating miRNAs serve as potential biomarkers for AD. However, data analysis of numerous AD-specific miRNAs derived from small RNA-sequencing (RNA-Seq) is often laborious. Methods: To identify circulating miRNA biomarkers for AD, we reanalyzed a publicly available small RNA-Seq dataset, composed of blood samples derived from 48 AD patients and 22 normal control (NC) subjects, by a simple web-based miRNA data analysis pipeline that combines omiRas and DIANA miRPath. Results: By using omiRas, we identified 27 miRNAs expressed differentially between the two groups, including upregulation in AD of miR-26b-3p, miR-28-3p, miR-30c-5p, miR-30d-5p, miR-148b-5p, miR-151a-3p, miR-186-5p, miR-425-5p, miR-550a-5p, miR-1468, miR-4781-3p, miR-5001-3p, and miR-6513-3p and downregulation in AD of let-7a-5p, let-7e-5p, let-7f-5p, let-7g-5p, miR-15a-5p, miR-17-3p, miR-29b-3p, miR-98-5p, miR-144-5p, miR-148a-3p, miR-502-3p, miR-660-5p, miR-1294, and miR-3200-3p. DIANA miRPath indicated that miRNA-regulated pathways potentially downregulated in AD are linked with neuronal synaptic functions, while those upregulated in AD are implicated in cell survival and cellular communication. Conclusions: The simple web-based miRNA data analysis pipeline helps us to effortlessly identify candidates for miRNA biomarkers and pathways of AD from the complex small RNA-Seq data.

Demystifying “drop-outs” in single cell UMI data

10.1101/2020.03.31.018911 ◽

2020 ◽

Author(s):

Tae Kim ◽

Xiang Zhou ◽

Mengjie Chen

Keyword(s):

Feature Selection ◽

Data Analysis ◽

Comparative Studies ◽

Biological Information ◽

Cell Type ◽

Analysis Pipeline ◽

Count Model ◽

Variance Stabilization ◽

Downstream Analysis ◽

Drop Outs

Abstract: Analysis of scRNA-seq data has been challenging particularly because of excessive zeros observed in UMI counts. Prevalent opinions are that many of the detected zeros are “drop-outs” that occur during experiments and that those zeros should be accounted for through procedures such as normalization, variance stabilization, and imputation. Here, we extensively analyze publicly available UMI datasets and challenge the existing scRNA-seq workflows. Our results strongly suggest that resolving cell-type heterogeneity should be the foremost step of the scRNA-seq analysis pipeline because once cell-type heterogeneity is resolved, “drop-outs” disappear. Additionally, we show that the simplest parametric count model, Poisson, is sufficient to fully leverage the biological information contained in the UMI data, thus offering a more optimistic view of the data analysis. However, if the cell-type heterogeneity is not appropriately taken into account, pre-processing such as normalization or imputation becomes inappropriate and can introduce unwanted noise. Inspired by these analyses, we propose a zero inflation test that can select gene features contributing to cell-type heterogeneity. We integrate feature selection and clustering into iterative pre-processing in our novel, efficient, and straightforward framework for UMI analysis, HIPPO (Heterogeneity Inspired Pre-Processing tOol). HIPPO leads to downstream analysis with much better interpretability than alternatives in our comparative studies.
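
The idea behind checking zeros against a Poisson model can be illustrated with a small sketch. This is not HIPPO's implementation: the function names are hypothetical, and the z-score uses a simple binomial standard error that ignores the uncertainty in the estimated mean. Under a Poisson model with mean λ, the expected fraction of zeros is exp(-λ); a gene whose observed zero fraction is far above that value is a candidate for reflecting mixed cell types.

```python
import math
from typing import Sequence


def expected_zero_fraction(mean_count: float) -> float:
    """Probability of observing zero under Poisson(mean_count)."""
    return math.exp(-mean_count)


def zero_excess_z(counts: Sequence[int]) -> float:
    """Z-score of the observed zero fraction against the Poisson expectation.

    Illustrative assumption: the standard error is the binomial one,
    sqrt(p0 * (1 - p0) / n), with p0 computed from the sample mean.
    """
    n = len(counts)
    mean = sum(counts) / n
    p0 = expected_zero_fraction(mean)
    observed = sum(1 for c in counts if c == 0) / n
    se = math.sqrt(p0 * (1.0 - p0) / n)
    if se == 0.0:
        # All counts are zero, so there is nothing to compare against.
        return 0.0
    return (observed - p0) / se


# Hypothetical UMI counts for one gene across ten cells.
one_cell_type = [0, 1, 2, 1, 0, 3, 1, 2, 1, 1]
two_cell_types = [0, 0, 0, 0, 0, 0, 8, 9, 10, 9]

z_single = zero_excess_z(one_cell_type)
z_mixed = zero_excess_z(two_cell_types)
print(f"single population z = {z_single:.2f}")
print(f"mixed population z = {z_mixed:.2f}")
```

In this toy input, the mixed population has far more zeros than a single Poisson mean would predict, while the single population does not, which mirrors the abstract's claim that apparent "drop-outs" can arise from unresolved heterogeneity.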

PEPATAC: An optimized ATAC-seq pipeline with serial alignments

10.1101/2020.10.21.347054 ◽

2020 ◽

Author(s):

Jason P. Smith ◽

M. Ryan Corces ◽

Jin Xu ◽

Vincent P. Reuter ◽

Howard Y. Chang ◽

...

Keyword(s):

Quality Control ◽

Large Scale ◽

Fault Tolerant ◽

Chromatin Accessibility ◽

Resource Manager ◽

Specific Data ◽

Data Formats ◽

Quality Control Metrics ◽

Downstream Analysis ◽

Analytical Approaches

Motivation: As chromatin accessibility data from ATAC-seq experiments continues to expand, there is continuing need for standardized analysis pipelines. Here, we present PEPATAC, an ATAC-seq pipeline that is easily applied to ATAC-seq projects of any size, from one-off experiments to large-scale sequencing projects. Results: PEPATAC leverages unique features of ATAC-seq data to optimize for speed and accuracy, and it provides several unique analytical approaches. Output includes convenient quality control plots, summary statistics, and a variety of generally useful data formats to set the groundwork for subsequent project-specific data analysis. Downstream analysis is simplified by a standard definition format, modularity of components, and metadata APIs in R and Python. It is restartable, fault-tolerant, and can be run on local hardware, using any cluster resource manager, or in provided Linux containers. We also demonstrate the advantage of aligning to the mitochondrial genome serially, which improves the accuracy of alignment statistics and quality control metrics. PEPATAC is a robust and portable first step for any ATAC-seq project. Availability: BSD2-licensed code and documentation at https://pepatac.databio.org.

Granatum: a graphical single-cell RNA-Seq analysis pipeline for genomics scientists

10.1101/110759 ◽

2017 ◽

Cited By ~ 1

Author(s):

Xun Zhu ◽

Thomas Wolfgruber ◽

Austin Tasato ◽

David G. Garmire ◽

Lana X Garmire

Keyword(s):

Single Cell ◽

Enrichment Analysis ◽

Graphical Interface ◽

Analysis Pipeline ◽

Web Based ◽

Cell Clustering ◽

Gene Filtering ◽

Study Heterogeneity ◽

Differential Gene ◽

Cell Series

Abstract. Background: Single-cell RNA sequencing (scRNA-Seq) is an increasingly popular platform to study heterogeneity at the single-cell level. Computational methods to process scRNA-Seq have limited accessibility to bench scientists as they require significant bioinformatics skills. Results: We have developed Granatum, a web-based scRNA-Seq analysis pipeline to make analysis more broadly accessible to researchers. Without a single line of programming code, users can click through the pipeline, setting parameters and visualizing results via the interactive graphical interface. Granatum conveniently walks users through various steps of scRNA-Seq analysis. It has a comprehensive list of modules, including plate merging and batch-effect removal, outlier-sample removal, gene filtering, gene expression normalization, cell clustering, differential gene expression analysis, pathway/ontology enrichment analysis, protein-network interaction visualization, and pseudo-time cell series construction. Conclusions: Granatum enables broad adoption of scRNA-Seq technology by empowering bench scientists with an easy-to-use graphical interface for scRNA-Seq data analysis. The package is freely available for research use at http://garmiregroup.org/granatum/app

Fault Tolerant Cloud Systems

Encyclopedia of Information Science and Technology, Fourth Edition ◽

10.4018/978-1-5225-2255-3.ch093 ◽

2018 ◽

pp. 1075-1090

Author(s):

Sathish Kumar ◽

Balamurugan B

Keyword(s):

Cloud Computing ◽

Fault Tolerant ◽

Software As A Service ◽

Arrival Process ◽

Cloud Provider ◽

Infrastructure As A Service ◽

Batch Mode ◽

Web Based ◽

Platform As A Service ◽

Cloud Systems

Cloud computing refers to a model for remotely accessing computing resources such as networks, servers, storage, applications, and services. Cloud computing offers these resources as a service, namely infrastructure-as-a-service, platform-as-a-service, and software-as-a-service. Using these services involves two roles: the cloud provider offers the service and the cloud customer consumes it. Customers share and use these resources, and this usage is called the workload. The workload requirement depends on customer demand, which varies from high to low. Based on customer demand, the cloud provider makes resources available efficiently. In the context of the cloud, the workload consists of web-based services or jobs processed in batch mode. The arrival process of jobs in the cloud is often not deterministic. An irregular increase or decrease in workload has a vital impact on resource provisioning. Monitoring the resources helps in measuring the performance of the cloud so that resources can be provisioned to customers efficiently.
