Evaluation of Seven Different RNA-Seq Alignment Tools Based on Experimental Data from the Model Plant Arabidopsis thaliana

Stephanie Schaarschmidt; Axel Fischer; Ellen Zuther; Dirk K. Hincha

doi:10.3390/ijms21051720

International Journal of Molecular Sciences ◽

10.3390/ijms21051720 ◽

2020 ◽

Vol 21 (5) ◽

pp. 1720 ◽

Cited By ~ 5

Author(s):

Stephanie Schaarschmidt ◽

Axel Fischer ◽

Ellen Zuther ◽

Dirk K. Hincha

Keyword(s):

Gene Expression ◽

Arabidopsis Thaliana ◽

Reference Genome ◽

Physiological Data ◽

Rna Seq ◽

Genetic Perturbations ◽

Model Plant Arabidopsis Thaliana ◽

Differential Gene ◽

Highly Correlated

Quantification of gene expression is crucial to connect genome sequences with phenotypic and physiological data. RNA-Sequencing (RNA-Seq) has taken a prominent role in the study of transcriptomic reactions of plants to various environmental and genetic perturbations. However, comparative tests of different tools for RNA-Seq read mapping and quantification have been mainly performed on data from animals or humans, which necessarily neglect, for example, the large genetic variability among natural accessions within plant species. Here, we compared seven computational tools for their ability to map and quantify Illumina single-end reads from the Arabidopsis thaliana accessions Columbia-0 (Col-0) and N14. Between 92.4% and 99.5% of all reads were mapped to the reference genome or transcriptome and the raw count distributions obtained from the different mappers were highly correlated. Using the software DESeq2 to determine differential gene expression (DGE) between plants exposed to 20 °C or 4 °C from these read counts showed a large pairwise overlap between the mappers. Interestingly, when the commercial CLC software was used with its own DGE module instead of DESeq2, strongly diverging results were obtained. All tested mappers provided highly similar results for mapping Illumina reads of two polymorphic Arabidopsis accessions to the reference genome or transcriptome and for the determination of DGE when the same software was used for processing.
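
The pairwise comparison of DGE results between mappers described in the abstract above can be illustrated with a minimal sketch. The abstract does not state which overlap measure the authors used, so the measure here (shared genes divided by the size of the smaller set) is an assumption, and the mapper names and gene IDs are hypothetical placeholders.

```python
from itertools import combinations


def pairwise_overlap(deg_sets):
    """Return {(a, b): shared genes / size of the smaller set} for every pair.

    deg_sets maps a mapper name to the set of genes called differentially
    expressed from that mapper's counts. The overlap measure is an assumption
    made for illustration, not the measure used in the paper.
    """
    result = {}
    for a, b in combinations(sorted(deg_sets), 2):
        shared = deg_sets[a] & deg_sets[b]
        smaller = min(len(deg_sets[a]), len(deg_sets[b]))
        result[(a, b)] = len(shared) / smaller if smaller else 0.0
    return result


# Hypothetical DEG calls per mapper (names and gene IDs are made up).
deg_sets = {
    "mapper_A": {"g1", "g2", "g3", "g4"},
    "mapper_B": {"g2", "g3", "g4", "g5"},
    "mapper_C": {"g1", "g2", "g3"},
}
overlaps = pairwise_overlap(deg_sets)
for pair, fraction in overlaps.items():
    print(pair, round(fraction, 3))
```

Dividing by the smaller set keeps the value at 1.0 when one mapper's DEG list is fully contained in another's; a Jaccard index would be a stricter alternative.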

How well do RNA-Seq differential gene expression tools perform in a complex eukaryote? A case study in Arabidopsis thaliana

Bioinformatics ◽

10.1093/bioinformatics/btz089 ◽

2019 ◽

Vol 35 (18) ◽

pp. 3372-3377 ◽

Cited By ~ 2

Author(s):

Kimon Froussios ◽

Nick J Schurch ◽

Katarzyna Mackinnon ◽

Marek Gierliński ◽

Céline Duc ◽

...

Keyword(s):

Gene Expression ◽

Arabidopsis Thaliana ◽

Normal Distribution ◽

Differential Gene Expression ◽

Negative Binomial Distribution ◽

Binomial Distribution ◽

Negative Binomial ◽

Supplementary Information ◽

Rna Seq ◽

Differential Gene

Abstract. Motivation: RNA-seq experiments are usually carried out in three or fewer replicates. In order to work well with so few samples, differential gene expression (DGE) tools typically assume the form of the underlying gene expression distribution. In this paper, the statistical properties of gene expression from RNA-seq are investigated in the complex eukaryote Arabidopsis thaliana, extending and generalizing the results of previous work in the simple eukaryote Saccharomyces cerevisiae. Results: We show that, consistent with the results in S. cerevisiae, more gene expression measurements in A. thaliana are consistent with being drawn from an underlying negative binomial distribution than from either a log-normal distribution or a normal distribution, and that the size and complexity of the A. thaliana transcriptome do not influence the false positive rate performance of the nine widely used DGE tools tested here. We therefore recommend the use of DGE tools that are based on the negative binomial distribution. Availability and implementation: The raw data for the 17 WT Arabidopsis thaliana datasets are available from the European Nucleotide Archive (E-MTAB-5446). The processed and aligned data can be visualized in context using IGB (Freese et al., 2016), or downloaded directly, using our publicly available IGB quickload server at https://compbio.lifesci.dundee.ac.uk/arabidopsisQuickload/public_quickload/ under ‘RNAseq>Froussios2019’. All scripts and commands are available from GitHub at https://github.com/bartongroup/KF_arabidopsis-GRNA. Supplementary information: Supplementary data are available at Bioinformatics online.
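
To make the negative binomial assumption concrete: in the parameterization commonly used by count-based DGE tools, the variance of a gene's counts is mu + alpha * mu^2, where mu is the mean and alpha the dispersion, so alpha = 0 recovers the Poisson case. The sketch below estimates alpha for one gene by the method of moments. It is an illustration only, not the goodness-of-fit procedure used in the paper, and the replicate counts are hypothetical.

```python
from statistics import mean, variance


def nb_dispersion_mom(counts):
    """Method-of-moments estimate of the negative binomial dispersion alpha.

    Solves var = mu + alpha * mu**2 for alpha using the sample mean and the
    sample variance (n - 1 denominator). Negative estimates, which arise when
    the data are no more variable than Poisson, are clipped to 0.
    """
    mu = mean(counts)
    var = variance(counts)
    if mu == 0:
        return 0.0
    return max((var - mu) / mu**2, 0.0)


# Hypothetical replicate counts for a single gene.
overdispersed = nb_dispersion_mom([10, 20, 30])  # mean 20, variance 100
poisson_like = nb_dispersion_mom([4, 5, 6])      # variance below the mean
print(overdispersed, poisson_like)
```

With only three replicates, as the abstract notes is typical, a per-gene estimate like this is very noisy, which is one reason DGE tools share information across genes when estimating dispersion.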

ProkSeq for complete analysis of RNA-Seq data from prokaryotes

Bioinformatics ◽

10.1093/bioinformatics/btaa1063 ◽

2020 ◽

Author(s):

A K M Firoj Mahmud ◽

Nicolas Delhomme ◽

Soumyadeep Nandi ◽

Maria Fällman

Keyword(s):

Gene Expression ◽

Pathogenic Bacteria ◽

Supplementary Information ◽

Complete Analysis ◽

Rna Seq ◽

Differential Gene ◽

User Friendly ◽

Multiple Samples ◽

Data Analysis Pipeline

Abstract. Summary: Since its introduction, RNA-Seq technology has been used extensively in studies of pathogenic bacteria to identify and quantify differences in gene expression across multiple samples from bacteria exposed to different conditions. With some exceptions, tools for studying gene expression, determining differential gene expression, performing downstream pathway analysis, and normalizing data collected in extreme biological conditions are still lacking. Here we describe ProkSeq, a user-friendly, fully automated RNA-Seq data analysis pipeline designed for prokaryotes. ProkSeq provides a wide variety of options for analysing differential expression, normalizing expression data, and visualizing data and results. Availability and implementation: ProkSeq is implemented in Python and is published under the MIT source license. The pipeline is available as a Docker container at https://hub.docker.com/repository/docker/snandids/prokseq-v2.0, or can be used through Anaconda: https://anaconda.org/snandiDS/prokseq. The code is available on GitHub: https://github.com/snandiDS/prokseq and detailed user documentation, including a manual and tutorial, can be found at https://prokseqV20.readthedocs.io. Supplementary information: Supplementary data are available at Bioinformatics online.

ProkSeq for complete analysis of RNA-seq data from prokaryotes

10.1101/2020.06.09.135822 ◽

2020 ◽

Cited By ~ 2

Author(s):

A K M Firoj Mahmud ◽

Soumyadeep Nandi ◽

Maria Fällman

Keyword(s):

Gene Expression ◽

Pathogenic Bacteria ◽

Complete Analysis ◽

Rna Seq ◽

Link Type ◽

Eukaryotic Genes ◽

Differential Gene ◽

User Friendly ◽

Multiple Samples

Abstract. Summary: Since its introduction, RNA-seq technology has been used extensively in studies of pathogenic bacteria to identify and quantify differences in gene expression across multiple samples from bacteria exposed to different conditions. With some exceptions, the current tools for assessing gene expression have been designed around the structures of eukaryotic genes. There are a few stand-alone tools designed for prokaryotes, and they require improvement. A well-defined pipeline for prokaryotes that includes all the necessary tools for quality control, determination of differential gene expression, downstream pathway analysis, and normalization of data collected in extreme biological conditions is still lacking. Here we describe ProkSeq, a user-friendly, fully automated RNA-seq data analysis pipeline designed for prokaryotes. ProkSeq provides a wide variety of options for analysing differential expression, normalizing expression data, and visualizing data and results, and it produces publication-quality figures. Availability and implementation: ProkSeq is implemented in Python and is published under the ISC open source license. The tool and a detailed user manual are hosted at Docker: https://hub.docker.com/repository/docker/snandids/prokseq-v2.1; Anaconda: https://anaconda.org/snandiDS/prokseq; GitHub: https://github.com/snandiDS/prokseq.

Faculty Opinions recommendation of Scotty: a web tool for designing RNA-Seq experiments to measure differential gene expression.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.717971189.793469500 ◽

2013 ◽

Author(s):

Stephen Turner

Keyword(s):

Gene Expression ◽

Differential Gene Expression ◽

Rna Seq ◽

Web Tool ◽

Differential Gene

Zea mays RNA-seq estimated transcript abundances are strongly affected by read mapping bias

BMC Genomics ◽

10.1186/s12864-021-07577-3 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Shuhua Zhan ◽

Cortland Griswold ◽

Lewis Lukens

Keyword(s):

Gene Expression ◽

Zea Mays ◽

Reference Genome ◽

Transcript Abundance ◽

Gene Transcript ◽

Rna Seq ◽

Individual Genome ◽

Abundance Estimates ◽

Mapping Bias ◽

Quantify Gene Expression

Abstract. Background: Genetic variation for gene expression is a source of phenotypic variation for natural and agricultural species. The common approach to map and to quantify gene expression from genetically distinct individuals is to assign their RNA-seq reads to a single reference genome. However, RNA-seq reads from alleles dissimilar to this reference genome may fail to map correctly, causing transcript levels to be underestimated. Presently, the extent of this mapping problem is not clear, particularly in highly diverse species. We investigated whether mapping bias occurred and whether chromosomal features were associated with mapping bias. Zea mays presents a model species to assess these questions, given that it has genotypically distinct and well-studied genetic lines. Results: In Zea mays, the inbred B73 genome is the standard reference genome and template for RNA-seq read assignments. In the absence of mapping bias, B73 and a second inbred line, Mo17, would each have an approximately equal number of regulatory alleles that increase gene expression. Remarkably, Mo17 had 2–4 times fewer such positively acting alleles than did B73 when RNA-seq reads were aligned to the B73 reference genome. Reciprocally, over one-half of the B73 alleles that increased gene expression were not detected when reads were aligned to the Mo17 genome template. Genes at dissimilar chromosomal ends were strongly affected by mapping bias, and genes at more similar pericentromeric regions were less affected. Biased transcript estimates were higher in untranslated regions and lower in splice junctions. Bias occurred across software and alignment parameters. Conclusions: Mapping bias very strongly affects gene transcript abundance estimates in maize, and bias varies across chromosomal features. Individual genome or transcriptome templates are likely necessary for accurate transcript estimation across genetically variable individuals in maize and other species.
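
One simple way to picture the mapping bias described above is to compare, gene by gene, the read counts obtained when the same reads are aligned to a foreign reference versus the line's own genome. The sketch below computes a per-gene log2 ratio; it is a hedged illustration, not the allele-level analysis performed in the paper, and the function name, pseudocount, gene IDs, and counts are all hypothetical.

```python
from math import log2


def mapping_bias_log2(counts_foreign, counts_own, pseudocount=1.0):
    """Per-gene log2 ratio of counts on a foreign reference vs. the own genome.

    Both arguments map gene ID -> read count for the same sample. Values below
    zero mean fewer reads were assigned when the foreign reference was used.
    The pseudocount (an assumption) avoids division by zero for unexpressed genes.
    """
    shared_genes = counts_foreign.keys() & counts_own.keys()
    return {
        gene: log2((counts_foreign[gene] + pseudocount)
                   / (counts_own[gene] + pseudocount))
        for gene in shared_genes
    }


# Hypothetical counts for reads from one inbred line aligned to two templates.
counts_own = {"geneA": 99, "geneB": 49}
counts_foreign = {"geneA": 99, "geneB": 24}
bias = mapping_bias_log2(counts_foreign, counts_own)
print(bias)
```

In this toy input, geneA is unaffected while geneB loses half of its reads on the foreign template, the kind of underestimation the abstract attributes to alleles dissimilar to the reference.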

Differential Gene Expression by RNA-Seq Analysis of the Primo Vessel in the Rabbit Lymph

Journal of Acupuncture and Meridian Studies ◽

10.1016/j.jams.2018.10.008 ◽

2019 ◽

Vol 12 (1) ◽

pp. 11-19 ◽

Cited By ~ 1

Author(s):

Jun-Young Shin ◽

Sang-Heon Choi ◽

Da-Woon Choi ◽

Ye-Jin An ◽

Jae-Hyuk Seo ◽

...

Keyword(s):

Gene Expression ◽

Differential Gene Expression ◽

Rna Seq ◽

Differential Gene

A Unified Model for Joint Normalization and Differential Gene Expression Detection in RNA-Seq Data

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2018.2790918 ◽

2019 ◽

Vol 16 (2) ◽

pp. 442-454 ◽

Cited By ~ 5

Author(s):

Kefei Liu ◽

Jieping Ye ◽

Yang Yang ◽

Li Shen ◽

Hui Jiang

Keyword(s):

Gene Expression ◽

Differential Gene Expression ◽

Unified Model ◽

Rna Seq ◽

Differential Gene

RNAflow: An Effective and Simple RNA-Seq Differential Gene Expression Pipeline Using Nextflow

Genes ◽

10.3390/genes11121487 ◽

2020 ◽

Vol 11 (12) ◽

pp. 1487

Author(s):

Marie Lataretu ◽

Martin Hölzer

Keyword(s):

Gene Expression ◽

Homo Sapiens ◽

Standard Technique ◽

Common Species ◽

Rna Seq ◽

Rna Molecules ◽

Gene Filtering ◽

Differential Gene ◽

High Level ◽

Very High

RNA-Seq enables the identification and quantification of RNA molecules, often with the aim of detecting differentially expressed genes (DEGs). Although RNA-Seq has evolved into a standard technique, there is no universal gold standard for the computational analysis of these data. On top of that, previous studies have demonstrated the irreproducibility of RNA-Seq studies. Here, we present a portable, scalable, and parallelizable Nextflow RNA-Seq pipeline to detect DEGs, which assures a high level of reproducibility. The pipeline automatically takes care of common pitfalls, such as ribosomal RNA removal and low abundance gene filtering. Apart from various visualizations for the DEG results, we incorporated downstream pathway analysis for common species such as Homo sapiens and Mus musculus. We evaluated the DEG detection functionality using qRT-PCR data as a reference and observed a very high correlation of the logarithmized gene expression fold changes.
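
The validation step mentioned at the end of the abstract, correlating log fold changes from RNA-Seq with those from qRT-PCR, can be sketched as a Pearson correlation. The abstract does not say which correlation coefficient was used, so Pearson is an assumption here, and the fold-change values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


# Hypothetical log2 fold changes for the same four genes from both methods.
rnaseq_log2fc = [1.0, 2.0, -1.0, 0.5]
qpcr_log2fc = [2.0, 4.0, -2.0, 1.0]
r = pearson(rnaseq_log2fc, qpcr_log2fc)
print(round(r, 6))
```

Note that a high correlation shows agreement in the direction and relative size of the changes, not in their absolute magnitude: in this toy input the qRT-PCR values are twice the RNA-Seq values, yet the two series are perfectly correlated.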

Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data

Genome Biology ◽

10.1186/gb-2013-14-9-r95 ◽

2013 ◽

Vol 14 (9) ◽

pp. R95 ◽

Cited By ~ 408

Author(s):

Franck Rapaport ◽

Raya Khanin ◽

Yupu Liang ◽

Mono Pirun ◽

Azra Krek ◽

...

Keyword(s):

Gene Expression ◽

Differential Gene Expression ◽

Expression Analysis ◽

Gene Expression Analysis ◽

Comprehensive Evaluation ◽

Rna Seq ◽

Differential Gene Expression Analysis ◽

Analysis Methods ◽

Differential Gene

Identification and Comparative Analysis of Differential Gene Expression in Soybean Leaf Tissue under Drought and Flooding Stress Revealed by RNA-Seq

Frontiers in Plant Science ◽

10.3389/fpls.2016.01044 ◽

2016 ◽

Vol 7 ◽

Cited By ~ 44

Author(s):

Wei Chen ◽

Qiuming Yao ◽

Gunvant B. Patil ◽

Gaurav Agarwal ◽

Rupesh K. Deshmukh ◽

...

Keyword(s):

Gene Expression ◽

Comparative Analysis ◽

Differential Gene Expression ◽

Leaf Tissue ◽

Rna Seq ◽

Flooding Stress ◽

Soybean Leaf ◽

Differential Gene
