Reproducible Bioconductor Workflows Using Browser-based Interactive Notebooks and Containers

2017 ◽  
Author(s):  
Reem Almugbel ◽  
Ling-Hong Hung ◽  
Jiaming Hu ◽  
Abeer Almutairy ◽  
Nicole Ortogero ◽  
...  

Abstract Objective Bioinformatics publications typically include complex software workflows that are difficult to describe in a manuscript. We describe and demonstrate the use of interactive software notebooks to document and distribute bioinformatics research. We provide a user-friendly tool, BiocImageBuilder, to allow users to easily distribute their bioinformatics protocols through interactive notebooks uploaded to either a GitHub repository or a private server. Materials and methods We present three different interactive Jupyter notebooks using R and Bioconductor workflows to infer differential gene expression, analyze cross-platform datasets and process RNA-seq data. These interactive notebooks are available on GitHub. The analytical results can be viewed in a browser. Most importantly, the software contents can be executed and modified. This is accomplished using Binder, which runs the notebook inside software containers, thus avoiding the need for installation of any software and ensuring reproducibility. All the notebooks were produced using custom files generated by BiocImageBuilder. Results BiocImageBuilder facilitates the publication of workflows with a point-and-click user interface. We demonstrate that interactive notebooks can be used to disseminate a wide range of bioinformatics analyses. The use of software containers to mirror the original software environment ensures reproducibility of results. Parameters and code can be dynamically modified, allowing for robust verification of published results and encouraging rapid adoption of new methods. Conclusion Given the increasing complexity of bioinformatics workflows, we anticipate that these interactive software notebooks will become as ubiquitous and necessary for documenting software methods as traditional laboratory notebooks have been for documenting bench protocols.

2017 ◽  
Vol 25 (1) ◽  
pp. 4-12 ◽  
Author(s):  
Reem Almugbel ◽  
Ling-Hong Hung ◽  
Jiaming Hu ◽  
Abeer Almutairy ◽  
Nicole Ortogero ◽  
...  

Abstract Objective Bioinformatics publications typically include complex software workflows that are difficult to describe in a manuscript. We describe and demonstrate the use of interactive software notebooks to document and distribute bioinformatics research. We provide a user-friendly tool, BiocImageBuilder, that allows users to easily distribute their bioinformatics protocols through interactive notebooks uploaded to either a GitHub repository or a private server. Materials and methods We present four different interactive Jupyter notebooks using R and Bioconductor workflows to infer differential gene expression, analyze cross-platform datasets, process RNA-seq data and KinomeScan data. These interactive notebooks are available on GitHub. The analytical results can be viewed in a browser. Most importantly, the software contents can be executed and modified. This is accomplished using Binder, which runs the notebook inside software containers, thus avoiding the need to install any software and ensuring reproducibility. All the notebooks were produced using custom files generated by BiocImageBuilder. Results BiocImageBuilder facilitates the publication of workflows with a point-and-click user interface. We demonstrate that interactive notebooks can be used to disseminate a wide range of bioinformatics analyses. The use of software containers to mirror the original software environment ensures reproducibility of results. Parameters and code can be dynamically modified, allowing for robust verification of published results and encouraging rapid adoption of new methods. Conclusion Given the increasing complexity of bioinformatics workflows, we anticipate that these interactive software notebooks will become as ubiquitous and necessary for documenting software methods as traditional laboratory notebooks have been for documenting bench protocols.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Guan Wang ◽  
Traci Kitaoka ◽  
Ali Crawford ◽  
Qian Mao ◽  
Andrew Hesketh ◽  
...  

Abstract RNA-seq has matured and become an important tool for studying RNA biology. Here we compared two RNA-seq (MGI DNBSEQ and Illumina NextSeq 500) and two microarray platforms (GeneChip Human Transcriptome Array 2.0 and Illumina Expression BeadChip) in healthy individuals administered recombinant human erythropoietin for transcriptome-wide quantification of differential gene expression. The results show that total RNA DNB-seq generated a multitude of target genes compared to other platforms. Pathway enrichment analyses revealed that these genes correlate not only with erythropoiesis and oxygen transport but also with a wide range of other functions, such as tissue protection and immune regulation. This study provides a knowledge base of genes relevant to EPO biology through cross-platform comparisons and validation.


2021 ◽  
Vol 22 (18) ◽  
pp. 9870
Author(s):  
Julia Panov ◽  
Hanoch Kaphzan

Angelman-like syndromes are a group of neurodevelopmental disorders whose clinical presentation is similar to that of Angelman Syndrome (AS). In our previous study, we showed that calcium signaling is disrupted in AS, and we identified calcium-target and calcium-regulating gene signatures that are able to differentiate between AS and their controls in different models. In the present study, we evaluated these sets of calcium-target and calcium-regulating genes as signatures of AS-like and non-AS-like syndromes. We collected a number of RNA-seq datasets of various AS-like and non-AS-like syndromes and performed Principal Component Analysis (PCA) separately on the two sets of signature genes to visualize the distribution of samples on the PC1–PC2 plane. In addition to the evaluation of calcium signature genes, we performed differential gene expression analyses to identify calcium-related genes dysregulated in each of the studied syndromes. These analyses showed that the calcium-target and calcium-regulating signatures differentiate well between AS-like syndromes and their controls. However, although many of the non-AS-like syndromes have multiple differentially expressed calcium-related genes, the calcium signatures were not efficient classifiers for non-AS-like neurodevelopmental disorders. These results show that features based on clinical presentation are reflected in signatures derived from bioinformatics analyses and suggest the use of bioinformatics as a tool for classification.
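The PCA step described in this abstract can be illustrated with a minimal sketch. It restricts an expression matrix to a signature gene set and projects the samples onto the PC1–PC2 plane. The gene names, the toy data and the helper `pca_pc1_pc2` are hypothetical and are not taken from the study; the authors' actual preprocessing and software are not stated here.

```python
import numpy as np

def pca_pc1_pc2(expr, gene_names, signature):
    """Project samples onto PC1 and PC2 using only the signature genes.

    expr: (n_samples, n_genes) array of expression values.
    Returns the (n_samples, 2) scores and the variance fraction of PC1 and PC2.
    """
    # Keep only the columns that belong to the signature gene set.
    idx = [i for i, g in enumerate(gene_names) if g in signature]
    sub = expr[:, idx]
    # Center each gene, then obtain the principal axes via SVD.
    centered = sub - sub.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :2] * s[:2]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:2]

# Toy data (hypothetical): 5 controls and 5 cases, 6 genes,
# with the first three genes shifted upward in the cases.
rng = np.random.default_rng(0)
genes = ["G1", "G2", "G3", "G4", "G5", "G6"]
signature = {"G1", "G2", "G3"}
controls = rng.normal(0.0, 0.3, size=(5, 6))
cases = rng.normal(0.0, 0.3, size=(5, 6))
cases[:, :3] += 3.0
expr = np.vstack([controls, cases])

scores, explained = pca_pc1_pc2(expr, genes, signature)
```

Plotting `scores[:, 0]` against `scores[:, 1]`, colored by group, gives the kind of PC1–PC2 view the abstract refers to. In this toy example the two groups separate along PC1 because the shift was built into the signature genes.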


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Alessandro La Ferlita ◽  
Salvatore Alaimo ◽  
Sebastiano Di Bella ◽  
Emanuele Martorana ◽  
Georgios I. Laliotis ◽  
...  

Abstract Background RNA-Seq is a well-established technology extensively used for transcriptome profiling, allowing the analysis of coding and non-coding RNA molecules. However, this technology produces a vast amount of data requiring more sophisticated computational approaches for their analysis than other traditional technologies such as Real-Time PCR or microarrays, strongly discouraging non-expert users. For this reason, dozens of pipelines have been deployed for the analysis of RNA-Seq data. Although interesting, these present several limitations and their usage requires a technical background, which may be uncommon in small research laboratories. Therefore, the application of these technologies in such contexts is still limited and causes a clear bottleneck in knowledge advancement. Results Motivated by these considerations, we have developed RNAdetector, a new free cross-platform and user-friendly RNA-Seq data analysis software that can be used locally or in cloud environments through an easy-to-use Graphical User Interface allowing the analysis of coding and non-coding RNAs from RNA-Seq datasets of any sequenced biological species. Conclusions RNAdetector is a new software that fills an essential gap between the needs of biomedical and research labs to process RNA-Seq data and their common lack of technical background in performing such analysis, which usually relies on outsourcing such steps to third party bioinformatics facilities or using expensive commercial software.


2021 ◽  
Author(s):  
Guan Wang ◽  
Traci Kitaoka ◽  
Ali Crawford ◽  
Qian Mao ◽  
Andrew Hesketh ◽  
...  

Abstract RNA-seq has matured and become an important tool for studying RNA biology. Here we compared two RNA-seq (Illumina sequencing by synthesis and MGI DNBSEQ™) and two microarray platforms (Illumina Expression BeadChip and GeneChip™ Human Transcriptome Array 2.0) in healthy individuals administered recombinant human erythropoietin for transcriptome-wide quantification of differential gene expression. The results show that total RNA sequencing combined with DNB-seq produced a multitude of genes of biological relevance and significance in response to recombinant human erythropoietin, in contrast to other platforms. Through data triangulation linking genes to functions, genes representing the processes of erythropoiesis as well as non-erythropoietic functions of erythropoietin were unveiled. This study provides a knowledge base of genes characterising the responses to recombinant human erythropoietin through cross-platform comparison and validation.


2017 ◽  
Author(s):  
Gabriela A. Merino ◽  
Ana Conesa ◽  
Elmer A. Fernández

Abstract Over the last few years, RNA-seq has been used to study alterations in alternative splicing related to several diseases. Bioinformatics workflows used to perform these studies can be divided into two groups, those finding changes in the absolute isoform expression and those studying differential splicing. Many computational methods for transcriptomics analysis have been developed, evaluated and compared; however, there are not enough reports of systematic and objective assessment of processing pipelines as a whole. Moreover, comparative studies have been performed considering separately the changes in absolute or relative isoform expression levels. Consequently, no consensus exists about the best practices and appropriate workflows to analyse alternative and differential splicing. To assist in choosing an adequate pipeline, we present here a benchmarking of nine commonly used workflows to detect differential isoform expression and splicing. We evaluated workflow performance over three different experimental scenarios where changes in absolute and relative isoform expression occurred simultaneously. In addition, the effects of the number of isoforms per gene and the magnitude of the expression change on pipeline performance were also evaluated. Our results suggest that workflow performance is influenced by the number of replicates per condition and the heterogeneity of the conditions. In general, workflows based on DESeq2, DEXSeq, Limma and NOISeq performed well over a wide range of transcriptomics experiments. In particular, we suggest the use of workflows based on Limma when high precision is required, and DESeq2 and DEXSeq pipelines to prioritize sensitivity. When several replicates per condition are available, NOISeq and Limma pipelines are indicated.
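The trade-off between precision and sensitivity mentioned in this abstract can be made concrete with a small sketch. It uses the standard definitions (precision = TP / (TP + FP), sensitivity = TP / (TP + FN)); whether the benchmark computed its metrics in exactly this way is an assumption, and the gene identifiers and call sets are hypothetical.

```python
def precision_sensitivity(called, truth):
    """Return (precision, sensitivity) for a set of calls against a truth set."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # correctly called features
    fp = len(called - truth)   # called but not truly changed
    fn = len(truth - called)   # truly changed but missed
    precision = tp / (tp + fp) if called else 0.0
    sensitivity = tp / (tp + fn) if truth else 0.0
    return precision, sensitivity

# Hypothetical truth set and two hypothetical workflows' calls.
truth = {"geneA", "geneB", "geneC", "geneD"}
conservative = ["geneA", "geneB"]
permissive = ["geneA", "geneB", "geneC", "geneD", "geneE", "geneF"]

p_cons, s_cons = precision_sensitivity(conservative, truth)
p_perm, s_perm = precision_sensitivity(permissive, truth)
```

Here the conservative call set makes no false calls but misses half of the true changes, while the permissive set recovers all of them at the cost of two false positives.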


Author(s):  
Katharina T. Schmid ◽  
Cristiana Cruceanu ◽  
Anika Böttcher ◽  
Heiko Lickert ◽  
Elisabeth B. Binder ◽  
...  

Abstract Background The identification of genes associated with specific experimental conditions, genotypes or phenotypes through differential expression analysis has long been the cornerstone of transcriptomic analysis. Single cell RNA-seq is revolutionizing transcriptomics and is enabling interindividual differential gene expression analysis and identification of genetic variants associated with gene expression, so called expression quantitative trait loci at cell-type resolution. Current methods for power analysis and guidance of experimental design either do not account for the specific characteristics of single cell data or are not suitable to model interindividual comparisons. Results Here we present a statistical framework for experimental design and power analysis of single cell differential gene expression between groups of individuals and expression quantitative trait locus analysis. The model relates sample size, number of cells per individual and sequencing depth to the power of detecting differentially expressed genes within individual cell types. Power analysis is based on data driven priors from literature or pilot experiments across a wide range of application scenarios and single cell RNA-seq platforms. Using these priors we show that, for a fixed budget, the number of cells per individual is the major determinant of power. Conclusion Our model is general and allows for systematic comparison of alternative experimental designs and can thus be used to guide experimental design to optimize power. For a wide range of applications, shallow sequencing of high numbers of cells per individual leads to higher overall power than deep sequencing of fewer cells. The model is implemented as an R package scPower.


2021 ◽  
Author(s):  
Xiao Zheng ◽  
Jiajun Cui ◽  
Yixuan Wang ◽  
Jing Zhang ◽  
Chaochen Wang

Abstract CRISPR-based gene activation (CRISPRa) and interference (CRISPRi) are powerful and easy-to-use approaches to modify the transcription of endogenous genes in eukaryotes. Successful CRISPRa/i requires sgRNA binding and alteration of local chromatin structure, and hence largely depends on the original epigenetic status of the target. Consequently, the efficacy of CRISPRa/i varies widely across target gene loci, while a reliable prediction tool is unavailable. To address this problem, we integrated published single cell RNA-Seq data involving CRISPRa/i with epigenomic profiles from K562 cells and identified the significant epigenetic features contributing to CRISPRa/i efficacy by ranking the weight of each feature. We further established a mathematical model and built a user-friendly webtool to predict the CRISPRa/i efficacy of user-designed sgRNA in different cells. Moreover, we experimentally validated our model by employing CROP-Seq assays. Our work provides both epigenetic insights into CRISPRa/i and an effective tool for users.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Nicholas J. Eagles ◽  
Emily E. Burke ◽  
Jacob Leonard ◽  
Brianna K. Barry ◽  
Joshua M. Stolz ◽  
...  

Abstract Background RNA sequencing (RNA-seq) is a common and widespread biological assay, and an increasing amount of data is generated with it. In practice, there are a large number of individual steps a researcher must perform before raw RNA-seq reads yield directly valuable information, such as differential gene expression data. Existing software tools are typically specialized, only performing one step–such as alignment of reads to a reference genome–of a larger workflow. The demand for a more comprehensive and reproducible workflow has led to the production of a number of publicly available RNA-seq pipelines. However, we have found that most require computational expertise to set up or share among several users, are not actively maintained, or lack features we have found to be important in our own analyses. Results In response to these concerns, we have developed a Scalable Pipeline for Expression Analysis and Quantification (SPEAQeasy), which is easy to install and share, and provides a bridge towards R/Bioconductor downstream analysis solutions. SPEAQeasy is portable across computational frameworks (SGE, SLURM, local, docker integration) and different configuration files are provided (http://research.libd.org/SPEAQeasy/). Conclusions SPEAQeasy is user-friendly and lowers the computational-domain entry barrier for biologists and clinicians to RNA-seq data processing as the main input file is a table with sample names and their corresponding FASTQ files. The goal is to provide a flexible pipeline that is immediately usable by researchers, regardless of their technical background or computing environment.


2020 ◽  
Author(s):  
Zihan Zheng ◽  
Qiu Xin ◽  
Haiyang Wu ◽  
Ling Chang ◽  
Xiangyu Tang ◽  
...  

Recent advances in bioinformatics analyses have led to the development of novel tools enabling the capture and trajectory mapping of single-cell RNA sequencing (scRNAseq) data. However, there is a lack of methods to assess the contributions of biological pathways and transcription factors to an overall developmental trajectory mapped from scRNAseq data. In this manuscript, we present a simplified approach for trajectory inference of pathway significance (TIPS) that leverages existing knowledgebases of functional pathways and transcription factor targets to enable further mechanistic insights into a biological process. TIPS returns both the key pathways whose changes are associated with the process of interest, as well as the individual genes that best reflect these changes. TIPS also provides insight into the relative timing of pathway changes, as well as a suite of visualizations to enable simplified data interpretation of scRNAseq libraries generated using a wide range of techniques. The TIPS package can be run through either a web server, or downloaded as a user-friendly GUI run in R, and may serve as a useful tool to help biologists perform deeper functional analyses and visualization of their single-cell and/or large cohort RNAseq data.

