CoGe LoadExp+: A web-based suite that integrates next-gen sequencing data analysis workflows and visualization

NASQAR: A web-based platform for high-throughput sequencing data analysis and visualization

10.1101/709980

2019

Cited By ~ 1

Author(s):

Ayman Yousif

Nizar Drou

Jillian Rowe

Mohammed Khalfan

Kristin C. Gunsalus

Keyword(s):

New York

Data Analysis

Open Source

High Throughput

High Throughput Sequencing

Web Applications

Rna Seq

Sequencing Data

Web Based

Abstract

Background: As high-throughput sequencing applications continue to evolve, the rapid growth in the quantity and variety of sequence-based data calls for the development of new software libraries and tools for data analysis and visualization. Effective use of these tools, however, often requires computational skills beyond those of many researchers. To lower this computational barrier, we have created a dynamic web-based platform, NASQAR (Nucleic Acid SeQuence Analysis Resource).

Results: NASQAR offers a collection of custom and publicly available open-source web applications that make extensive use of a variety of R packages to provide interactive data analysis and visualization. The platform is publicly accessible at http://nasqar.abudhabi.nyu.edu/. The open-source code is on GitHub at https://github.com/nasqar/NASQAR, and the system is also available as a Docker image at https://hub.docker.com/r/aymanm/nasqarall. NASQAR is a collaboration between the core bioinformatics teams of the NYU Abu Dhabi and NYU New York Centers for Genomics and Systems Biology.

Conclusions: NASQAR gives researchers without programming expertise a versatile and intuitive toolbox to explore, analyze, and visualize their transcriptomics data interactively and efficiently. Popular tools for a variety of applications are currently available, including transcriptome data preprocessing, RNA-seq analysis (including single-cell RNA-seq), metagenomics, and gene enrichment.

GMPR: A robust normalization method for zero-inflated count data with application to microbiome sequencing data

10.7287/peerj.preprints.3417v2

2017

Author(s):

Li Chen

James Reeve

Lujun Zhang

Shengbing Huang

Jun Chen

Keyword(s):

Normalization Method

Rna Seq

Sequencing Data

Data Simulation

Vast Number

Number Of Zeros

Normalization Methods

Under Sampling

Microbiome Data

Sequencing Data Analysis

Normalization is the first critical step in microbiome sequencing data analysis and is used to account for variable library sizes. Current RNA-Seq-based normalization methods that have been adapted for microbiome data fail to consider the unique characteristics of such data, which contain a vast number of zeros due to the physical absence or under-sampling of microbes. Normalization methods that specifically address this zero inflation remain largely undeveloped. Here we propose GMPR, a simple but effective normalization method for zero-inflated sequencing data such as microbiome data. Simulation studies and analyses of real datasets demonstrate that the proposed method is more robust than competing methods, leading to more powerful detection of differentially abundant taxa and higher reproducibility of the relative abundances of taxa.
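The abstract does not spell out the computation, so the following is a minimal sketch of the geometric-mean-of-pairwise-ratios idea behind the name GMPR, not the authors' reference implementation. It assumes a taxa x samples count matrix: for each pair of samples, the median count ratio is taken only over taxa that are nonzero in both, which is how zeros are sidestepped, and a sample's size factor is the geometric mean of those pairwise medians. The function name and the `min_shared` filter are hypothetical illustrations.

```python
import numpy as np


def gmpr_size_factors(counts: np.ndarray, min_shared: int = 1) -> np.ndarray:
    """Sketch of GMPR-style size factors for a taxa x samples count matrix.

    For each sample pair (i, j), take the median of c_ki / c_kj over taxa k
    that are nonzero in BOTH samples (zeros are skipped, not imputed).
    The size factor of sample i is the geometric mean of those medians
    over all j != i. `min_shared` is a hypothetical minimum-overlap filter.
    """
    counts = np.asarray(counts, dtype=float)
    n_samples = counts.shape[1]
    size_factors = np.full(n_samples, np.nan)  # NaN if no usable pair exists
    for i in range(n_samples):
        log_medians = []
        for j in range(n_samples):
            if i == j:
                continue
            shared = (counts[:, i] > 0) & (counts[:, j] > 0)
            if shared.sum() < min_shared:
                continue  # too few shared nonzero taxa for a stable ratio
            ratios = counts[shared, i] / counts[shared, j]
            log_medians.append(np.log(np.median(ratios)))
        if log_medians:
            # Geometric mean = exp of the mean of the logs.
            size_factors[i] = np.exp(np.mean(log_medians))
    return size_factors
```

Dividing each sample's column by its size factor then gives normalized counts. In the two-sample example `[[10, 20], [0, 7], [5, 10], [3, 0], [8, 16]]`, the rows containing a zero are ignored and the factors come out as 0.5 and 2.0.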

Web-based bioinformatics workflows for end-to-end RNA-seq data computation and analysis in agricultural animal species

BMC Genomics

10.1186/s12864-016-3118-z

2016

Vol 17 (1)

Cited By ~ 5

Author(s):

Weizhong Li

R. Alexander Richter

Yunsup Jung

Qiyun Zhu

Robert W. Li

Keyword(s):

Animal Species

Rna Seq

Web Based

Agricultural Animal

End To End

Bioinformatics Workflows

A Review of Cloud Computing Bioinformatics Solutions for Next-Gen Sequencing Data Analysis and Research

Methods in Next Generation Sequencing

10.1515/mngs-2015-0003

2015

Vol 2 (1)

Cited By ~ 1

Author(s):

Konstantinos Krampis

Claudia Wultsch

Keyword(s):

Cloud Computing

Data Analysis

Large Scale

Storage Capacity

Biological Research

Sequencing Data

Computing Services

Informatics Infrastructure

And Storage

Sequencing Data Analysis

Abstract

Research in biology has entered a digital era in which next-generation sequencing instruments generate multiple terabytes of data but are equipped with minimal computational and storage capacity, which is insufficient for large-scale post-sequencing data analysis. Therefore, scientific value cannot be obtained from investment in a sequencing instrument unless it is combined with a significant expense for informatics infrastructure. An alternative for laboratories is to outsource their informatics infrastructure by leasing computational cycles and storage capacity from cloud computing services. Cloud-based bioinformatics tool suites can give users access to preconfigured software and on-demand computing resources for genomic data analysis while lowering the barrier to working with sequencing datasets, leading to broader adoption of genomic technologies in basic biological research. We conclude that, along with the democratization of genome sequencing through the availability of low-cost bench-top sequencers, cloud computing can in turn democratize access to the computational capacity and informatics infrastructure required for sequencing data analysis.

Transcript-Level Dysregulation of BCL2 Family Genes in Acute Myeloblastic Leukemia

Cancers

10.3390/cancers13133175

2021

Vol 13 (13)

pp. 3175

Author(s):

Luiza Handschuh

Pawel Wojciechowski

Maciej Kazmierczak

Krzysztof Lewandowski

Keyword(s):

Exome Sequencing

Family Members

Transcript Level

Rna Seq

Sequencing Data

Bcl2 Family

Kaplan Meier

Mutational Status

Apoptosis Regulation

Sequencing Data Analysis

The expression of apoptosis-related BCL2 family genes, which is finely tuned in normal cells, is dysregulated in many neoplasms. In acute myeloid leukemia (AML), this dysregulation has not been studied comprehensively. To address this, RNA-seq data were used to analyze the expression of 26 BCL2 family members in 27 patients with AML FAB M1 and M2, divided into subgroups that responded differently to chemotherapy. Correlation analysis, analysis of variance, and Kaplan-Meier analysis were applied to associate the expression of particular genes with the expression of other genes, clinical features, and the presence of mutations detected by exome sequencing. The expression of BCL2 family genes was dysregulated in AML compared with healthy controls. Anti-apoptotic genes were upregulated and pro-apoptotic genes downregulated, although only the decreases in BMF, BNIP1, and HRK were statistically significant. BCL2L1 was overexpressed in the group of patients resistant to chemotherapy. In agreement with the literature, our results indicate that BCL2L1 is one of the key players in apoptosis regulation in different types of tumors. Analysis of the exome sequencing data indicates that BCL2 family genes are not mutated in AML, but their expression correlates with the mutational status of other genes, including genes recurrently mutated in AML and splicing-related genes. High levels of some BCL2 family members, in particular BIK and BCL2L13, were associated with poor outcome.
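The outcome associations above rest on Kaplan-Meier survival curves. As a hedged illustration of the estimator itself (not the study's code or data), the sketch below computes the product-limit estimate S(t) = prod over event times t_i <= t of (1 - d_i / n_i), where n_i patients are still under observation at t_i and d_i events occur at t_i. Splitting patients into, say, high- and low-expression groups and fitting one curve per group is an assumed usage pattern; the function name and example data are hypothetical.

```python
import numpy as np


def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimate.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    Returns the distinct event times and S(t) just after each of them.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])  # sorted distinct event times
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)                  # n_i: still observed at t
        died = np.sum((times == t) & (events == 1))   # d_i: events at t
        s *= 1.0 - died / at_risk
        survival.append(s)
    return event_times, np.array(survival)
```

For example, with follow-up times `[1, 2, 2, 3, 4, 5]` and event flags `[1, 1, 0, 1, 0, 1]`, the censored patients at times 2 and 4 reduce the risk set without causing a drop, and the curve steps through 5/6, 2/3, 4/9, and finally 0.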

RNASeqR: an R package for automated two-group RNA-Seq analysis workflow

10.1101/641324

2019

Author(s):

Kuan-Hao Chao

Yi-Wen Hsiao

Yi-Fang Lee

Chien-Yueh Lee

Liang-Chuan Lai

...

Keyword(s):

R Package

Software Tools

Biological Research

Command Line

Rna Seq

Sequencing Data

Tissue Samples

Fundamental Information

Biological Interpretation

Fast Light

RNA-Seq analysis has revolutionized researchers' understanding of the transcriptome in biological research. Comparing transcriptomic profiles between tissue samples or patient groups enables researchers to explore the underlying biological impact of transcription. RNA-Seq analysis requires multiple processing steps and substantial computational resources. Many well-developed R packages exist for individual steps; however, few R/Bioconductor packages integrate existing software tools into a comprehensive RNA-Seq analysis that delivers fundamental end-to-end results in a pure R environment, so that researchers can quickly and easily extract basic information from large sequencing datasets. To address this need, we have developed the open-source R/Bioconductor package RNASeqR. It allows users to run an automated RNA-Seq analysis in only six steps, producing essential tabular and graphical results for further biological interpretation. Features of RNASeqR include the six-step analysis, comprehensive visualization, a background execution version, and integration of both R and command-line software. RNASeqR provides a fast, lightweight, and easy-to-run RNA-Seq analysis pipeline in a pure R environment. It allows users to efficiently use popular software tools, including both R/Bioconductor and command-line tools, without predefining resources or environments. RNASeqR is freely available for Linux and macOS from Bioconductor (https://bioconductor.org/packages/release/bioc/html/RNASeqR.html).

GMPR: A robust normalization method for zero-inflated count data with application to microbiome sequencing data

10.7287/peerj.preprints.3417

2018

Author(s):

Li Chen

James Reeve

Lujun Zhang

Shengbing Huang

Xuefeng Wang

...

Keyword(s):

Normalization Method

Rna Seq

Sequencing Data

Data Simulation

Vast Number

Number Of Zeros

Normalization Methods

Under Sampling

Microbiome Data

Sequencing Data Analysis

Comparison of Short-Read Sequence Aligners Indicates Strengths and Weaknesses for Biologists to Consider

Frontiers in Plant Science

10.3389/fpls.2021.657240

2021

Vol 12

Author(s):

Ryan Musich

Lance Cadle-Davidson

Michael V. Osier

Keyword(s):

Downstream Processing

Rna Seq

Sequencing Data

Powdery Mildew Fungus

Short Read

Grapevine Powdery Mildew

Limited Experience

Secondary Factor

Short Read Sequence

Gene Coverage

Aligning short-read sequences is the foundational step of most genomic and transcriptomic analyses, but not all tools perform equally, and choosing among the growing number of available tools can be daunting. Here, to increase awareness in the research community, we discuss the merits of common algorithms and programs in a way that should be approachable for biologists with limited experience in bioinformatics. We consider the effects of data cleanup, a precursor to most alignment, only in passing, and we do not consider downstream processing of the aligned fragments. To compare aligners [Bowtie2, Burrows-Wheeler Aligner (BWA), HISAT2, MUMmer4, STAR, and TopHat2], we used an RNA-seq dataset containing data from 48 geographically distinct samples of the grapevine powdery mildew fungus Erysiphe necator. Based on alignment rate and gene coverage, all aligners performed well except TopHat2, which has been superseded by HISAT2. BWA perhaps performed best on these metrics, except for longer transcripts (>500 bp), for which HISAT2 and STAR performed well. HISAT2 was ~3-fold faster than the next fastest aligner, although we consider runtime a secondary factor in most alignments. Ultimately, this direct comparison of commonly used aligners illustrates key considerations when choosing a tool for specific sequencing data and objectives. No single tool meets all needs for every user, and there are many quality aligners available.
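The abstract ranks aligners by alignment rate and gene coverage without giving formulas, so the sketch below uses common definitions as an assumption: alignment rate as mapped reads divided by input reads, and gene coverage as the fraction of a gene's bases overlapped by at least one aligned read. Read positions are taken as half-open [start, end) intervals in gene coordinates; the function names are hypothetical, and parsing real aligner output (e.g. SAM/BAM) is deliberately left out.

```python
import numpy as np


def alignment_rate(n_mapped: int, n_total: int) -> float:
    """Fraction of input reads that the aligner placed on the reference."""
    return n_mapped / n_total


def gene_coverage(gene_length: int, read_spans: list[tuple[int, int]]) -> float:
    """Fraction of a gene's bases covered by at least one aligned read.

    read_spans holds half-open [start, end) intervals in gene coordinates;
    parts of a read that fall outside the gene are clipped away.
    """
    covered = np.zeros(gene_length, dtype=bool)
    for start, end in read_spans:
        covered[max(start, 0):min(end, gene_length)] = True
    return float(covered.mean())
```

For a 10-bp gene with reads at [0, 3), [2, 5), and [8, 12), overlapping bases are counted once and the overhang past the gene end is clipped, so 7 of 10 bases are covered and the coverage is 0.7.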

GMPR: A robust normalization method for zero-inflated count data with application to microbiome sequencing data

10.7287/peerj.preprints.3417v1

2017

Author(s):

Li Chen

James Reeve

Lujun Zhang

Shengbing Huang

Jun Chen

Keyword(s):

Normalization Method

Rna Seq

Sequencing Data

Data Simulation

Vast Number

Number Of Zeros

Normalization Methods

Under Sampling

Microbiome Data

Sequencing Data Analysis

NASQAR: a web-based platform for high-throughput sequencing data analysis and visualization

BMC Bioinformatics

10.1186/s12859-020-03577-4

2020

Vol 21 (1)

Author(s):

Ayman Yousif

Nizar Drou

Jillian Rowe

Mohammed Khalfan

Kristin C. Gunsalus

Keyword(s):

Data Analysis

High Throughput

High Throughput Sequencing

Sequencing Data

Web Based

High Throughput Sequencing Data

Sequencing Data Analysis