scholarly journals Evaluation of Seven Different RNA-Seq Alignment Tools Based on Experimental Data from the Model Plant Arabidopsis thaliana

2020 ◽  
Vol 21 (5) ◽  
pp. 1720 ◽  
Author(s):  
Stephanie Schaarschmidt ◽  
Axel Fischer ◽  
Ellen Zuther ◽  
Dirk K. Hincha

Quantification of gene expression is crucial to connect genome sequences with phenotypic and physiological data. RNA-Sequencing (RNA-Seq) has taken a prominent role in the study of transcriptomic reactions of plants to various environmental and genetic perturbations. However, comparative tests of different tools for RNA-Seq read mapping and quantification have been mainly performed on data from animals or humans, which necessarily neglect, for example, the large genetic variability among natural accessions within plant species. Here, we compared seven computational tools for their ability to map and quantify Illumina single-end reads from the Arabidopsis thaliana accessions Columbia-0 (Col-0) and N14. Between 92.4% and 99.5% of all reads were mapped to the reference genome or transcriptome and the raw count distributions obtained from the different mappers were highly correlated. Using the software DESeq2 to determine differential gene expression (DGE) between plants exposed to 20 °C or 4 °C from these read counts showed a large pairwise overlap between the mappers. Interestingly, when the commercial CLC software was used with its own DGE module instead of DESeq2, strongly diverging results were obtained. All tested mappers provided highly similar results for mapping Illumina reads of two polymorphic Arabidopsis accessions to the reference genome or transcriptome and for the determination of DGE when the same software was used for processing.

2019 ◽  
Vol 35 (18) ◽  
pp. 3372-3377 ◽  
Author(s):  
Kimon Froussios ◽  
Nick J Schurch ◽  
Katarzyna Mackinnon ◽  
Marek Gierliński ◽  
Céline Duc ◽  
...  

Abstract Motivation RNA-seq experiments are usually carried out in three or fewer replicates. In order to work well with so few samples, differential gene expression (DGE) tools typically assume the form of the underlying gene expression distribution. In this paper, the statistical properties of gene expression from RNA-seq are investigated in the complex eukaryote, Arabidopsis thaliana, extending and generalizing the results of previous work in the simple eukaryote Saccharomyces cerevisiae. Results We show that, consistent with the results in S.cerevisiae, more gene expression measurements in A.thaliana are consistent with being drawn from an underlying negative binomial distribution than either a log-normal distribution or a normal distribution, and that the size and complexity of the A.thaliana transcriptome does not influence the false positive rate performance of nine widely used DGE tools tested here. We therefore recommend the use of DGE tools that are based on the negative binomial distribution. Availability and implementation The raw data for the 17 WT Arabidopsis thaliana datasets is available from the European Nucleotide Archive (E-MTAB-5446). The processed and aligned data can be visualized in context using IGB (Freese et al., 2016), or downloaded directly, using our publicly available IGB quickload server at https://compbio.lifesci.dundee.ac.uk/arabidopsisQuickload/public_quickload/ under ‘RNAseq>Froussios2019’. All scripts and commands are available from github at https://github.com/bartongroup/KF_arabidopsis-GRNA. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
A K M Firoj Mahmud ◽  
Nicolas Delhomme ◽  
Soumyadeep Nandi ◽  
Maria Fällman

Abstract Summary Since its introduction, RNA-Seq technology has been used extensively in studies of pathogenic bacteria to identify and quantify differences in gene expression across multiple samples from bacteria exposed to different conditions. With some exceptions, the current tools for studying gene expression, determination of differential gene expression, downstream pathway analysis, and normalization of data collected in extreme biological conditions is still lacking. Here we describe ProkSeq, a user-friendly, fully automated RNA-Seq data analysis pipeline designed for prokaryotes. ProkSeq provides a wide variety of options for analysing differential expression, normalizing expression data, and visualizing data and results. Availability and implementation ProkSeq is implemented in Python and is published under the MIT source license. The pipeline is available as a Docker container https://hub.docker.com/repository/docker/snandids/prokseq-v2.0, or can be used through Anaconda: https://anaconda.org/snandiDS/prokseq. The code is available on Github: https://github.com/snandiDS/prokseq and a detailed user documentation, including a manual and tutorial can be found at https://prokseqV20.readthedocs.io Supplementary information Supplementary data are available at Bioinformatics online


Author(s):  
A K M Firoj Mahmud ◽  
Soumyadeep Nandi ◽  
Maria Fällman

AbstractSummarySince its introduction, RNA-seq technology has been used extensively in studies of pathogenic bacteria to identify and quantify differences in gene expression across multiple samples from bacteria exposed to different conditions. With some exceptions, the current tools for assessing gene expression have been designed around the structures of eukaryotic genes. There are a few stand-alone tools designed for prokaryotes, and they require improvement. A well-defined pipeline for prokaryotes that includes all the necessary tools for quality control, determination of differential gene expression, downstream pathway analysis, and normalization of data collected in extreme biological conditions is still lacking. Here we describe ProkSeq, a user-friendly, fully automated RNA-seq data analysis pipeline designed for prokaryotes. ProkSeq provides a wide variety of options for analysing differential expression, normalizing expression data, and visualizing data and results, and it produces publication-quality figures.Availability and implementationProkSeq is implemented in Python and is published under the ISC open source license. The tool and a detailed user manual are hosted at Docker: https://hub.docker.com/repository/docker/snandids/prokseq-v2.1, Anaconda: https://anaconda.org/snandiDS/prokseq; Github: https://github.com/snandiDS/prokseq.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Shuhua Zhan ◽  
Cortland Griswold ◽  
Lewis Lukens

Abstract Background Genetic variation for gene expression is a source of phenotypic variation for natural and agricultural species. The common approach to map and to quantify gene expression from genetically distinct individuals is to assign their RNA-seq reads to a single reference genome. However, RNA-seq reads from alleles dissimilar to this reference genome may fail to map correctly, causing transcript levels to be underestimated. Presently, the extent of this mapping problem is not clear, particularly in highly diverse species. We investigated if mapping bias occurred and if chromosomal features associated with mapping bias. Zea mays presents a model species to assess these questions, given it has genotypically distinct and well-studied genetic lines. Results In Zea mays, the inbred B73 genome is the standard reference genome and template for RNA-seq read assignments. In the absence of mapping bias, B73 and a second inbred line, Mo17, would each have an approximately equal number of regulatory alleles that increase gene expression. Remarkably, Mo17 had 2–4 times fewer such positively acting alleles than did B73 when RNA-seq reads were aligned to the B73 reference genome. Reciprocally, over one-half of the B73 alleles that increased gene expression were not detected when reads were aligned to the Mo17 genome template. Genes at dissimilar chromosomal ends were strongly affected by mapping bias, and genes at more similar pericentromeric regions were less affected. Biased transcript estimates were higher in untranslated regions and lower in splice junctions. Bias occurred across software and alignment parameters. Conclusions Mapping bias very strongly affects gene transcript abundance estimates in maize, and bias varies across chromosomal features. Individual genome or transcriptome templates are likely necessary for accurate transcript estimation across genetically variable individuals in maize and other species.


2019 ◽  
Vol 12 (1) ◽  
pp. 11-19 ◽  
Author(s):  
Jun-Young Shin ◽  
Sang-Heon Choi ◽  
Da-Woon Choi ◽  
Ye-Jin An ◽  
Jae-Hyuk Seo ◽  
...  

Genes ◽  
2020 ◽  
Vol 11 (12) ◽  
pp. 1487
Author(s):  
Marie Lataretu ◽  
Martin Hölzer

RNA-Seq enables the identification and quantification of RNA molecules, often with the aim of detecting differentially expressed genes (DEGs). Although RNA-Seq evolved into a standard technique, there is no universal gold standard for these data’s computational analysis. On top of that, previous studies proved the irreproducibility of RNA-Seq studies. Here, we present a portable, scalable, and parallelizable Nextflow RNA-Seq pipeline to detect DEGs, which assures a high level of reproducibility. The pipeline automatically takes care of common pitfalls, such as ribosomal RNA removal and low abundance gene filtering. Apart from various visualizations for the DEG results, we incorporated downstream pathway analysis for common species as Homo sapiens and Mus musculus. We evaluated the DEG detection functionality while using qRT-PCR data serving as a reference and observed a very high correlation of the logarithmized gene expression fold changes.


2013 ◽  
Vol 14 (9) ◽  
pp. R95 ◽  
Author(s):  
Franck Rapaport ◽  
Raya Khanin ◽  
Yupu Liang ◽  
Mono Pirun ◽  
Azra Krek ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document