Comparative evaluation of full-length isoform quantification from RNA-Seq

2019 ◽  
Author(s):  
Dimitra Sarantopoulou ◽  
Soumyashant Nayak ◽  
Thomas G. Brooks ◽  
Nicholas F. Lahens ◽  
Gregory R. Grant

Abstract Full-length isoform quantification from RNA-Seq is a key goal in transcriptomics analyses and an area of active development. The fundamental difficulty stems from the fact that RNA transcripts are long, while RNA-Seq reads are typically short. We have generated realistic benchmarking data and have performed a comprehensive comparative analysis of isoform quantification methods, including an evaluation at the level of differential expression analysis. Genome, transcriptome and pseudo alignment-based methods are included, and a naive approach is included to establish a baseline. Kallisto, RSEM, and Cufflinks exhibit the highest accuracy on idealized data, while on more realistic data they do not perform considerably better than the naive approach. We determine the effect of structural parameters, such as number of exons or number of isoforms, on accuracy. Overall, the tested methods show sufficient divergence from the truth to suggest that full-length isoform quantification should be employed selectively.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Dimitra Sarantopoulou ◽  
Thomas G. Brooks ◽  
Soumyashant Nayak ◽  
Antonijo Mrčela ◽  
Nicholas F. Lahens ◽  
...  

Abstract
Background: Full-length isoform quantification from RNA-Seq is a key goal in transcriptomics analyses and has been an area of active development since the technology was introduced. The fundamental difficulty stems from the fact that RNA transcripts are long, while RNA-Seq reads are short.
Results: Here we use simulated benchmarking data that reflect many properties of real data, including polymorphisms, intron signal and non-uniform coverage, allowing for systematic comparative analyses of isoform quantification accuracy and its impact on differential expression analysis. Genome, transcriptome and pseudo alignment-based methods are included, and a simple approach is included as a baseline control.
Conclusions: Salmon, kallisto, RSEM, and Cufflinks exhibit the highest accuracy on idealized data, while on more realistic data they do not perform dramatically better than the simple approach. We determine that the structural parameters with the greatest impact on quantification accuracy are length and sequence compression complexity rather than the number of isoforms. The effect of incomplete annotation on performance is also investigated. Overall, the tested methods show sufficient divergence from the truth to suggest that full-length isoform quantification and isoform-level DE should still be employed selectively.



2021 ◽  
Author(s):  
Hiroki Ura ◽  
Sumihito Togi ◽  
Yo Niida

Abstract
Background: mRNA sequencing is a powerful technique used to investigate the transcriptome status of a gene of interest, such as its transcription level and splicing variants. Several RNA sequencing (RNA-Seq) methods have been developed; however, the relative advantages of each method have remained unknown. Here we used three commercially available RNA-Seq library preparation kits, the traditional method (TruSeq) and two full-length double-stranded cDNA methods (SMARTer and TeloPrime), to investigate the advantages and disadvantages of these three approaches in transcriptome analysis.
Results: We observed that the number of expressed genes detected with TeloPrime was lower than that obtained with TruSeq and SMARTer. The expression patterns of TruSeq and SMARTer correlated strongly. However, the SMARTer and TeloPrime methods underestimated the expression of relatively long transcripts. Moreover, genes with low expression levels went undetected stochastically regardless of which of the three methods was used. Furthermore, although TeloPrime detected a significantly higher proportion at the transcription start site (TSS), its coverage of the gene body was not uniform. The SMARTer data are proposed to include reads arising from nonspecific genomic DNA amplification. In contrast, the number of detected splicing events was highest with TruSeq. The percent spliced in (PSI) index values of the three methods were highly correlated.
Conclusions: TruSeq detected transcripts and splicing events better than the other methods and accurately measured both gene expression levels and splicing events. Although TeloPrime detected fewer transcripts and splicing events, its coverage at the TSS was the highest. Additionally, SMARTer performed better than TeloPrime among the full-length double-stranded cDNA methods studied. In conclusion, for short-read sequencing, TruSeq has relative advantages for use in transcriptome analysis.
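The abstract above compares methods by their percent spliced in (PSI) index but does not say how PSI was computed. As a minimal sketch, assuming the common read-count definition (reads supporting inclusion of an exon divided by all reads informative for that event, with no length normalization), PSI could be calculated as follows. The function name `percent_spliced_in` and its parameters are hypothetical and are not taken from the paper.

```python
from typing import Optional


def percent_spliced_in(inclusion_reads: int, exclusion_reads: int) -> Optional[float]:
    """Return PSI as a fraction in [0, 1], or None when no reads cover the event.

    Assumption: PSI = inclusion / (inclusion + exclusion), using raw read
    counts. Real pipelines often normalize by the number of junctions or by
    effective length; that step is omitted here.
    """
    if inclusion_reads < 0 or exclusion_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = inclusion_reads + exclusion_reads
    if total == 0:
        # The event is not covered, so PSI is undefined rather than zero.
        return None
    return inclusion_reads / total
```

Returning `None` for uncovered events keeps them distinguishable from events with genuinely zero inclusion, which matters when PSI values from different library preparations are correlated.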



2019 ◽  
Author(s):  
Etienne Becht ◽  
Edward Zhao ◽  
Robert Amezquita ◽  
Raphael Gottardo

Abstract Multivariate logistic regression (mLR) has been recently proposed by Ntranos et al. to perform gene differential expression analyses of single-cell RNA-sequencing (scRNAseq) data. Herein we reproduce and extend some of their findings. We notably show that while mLR performs better in simulated datasets, these simulations do not recapitulate important features of experimental datasets. Indeed, our results suggest that MAST followed by Sidak aggregation of the p-values performs better than mLR on experimental datasets. Overall, we highlight that most of the new results obtained by Ntranos et al. are likely due to the quantification of scRNAseq data at the transcript or transcript compatibility class level, rather than the use of mLR.
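The abstract above refers to Sidak aggregation of p-values without giving the formula. A minimal sketch, assuming it means the standard Šidák adjustment of the smallest of the n p-values belonging to one gene, p_gene = 1 - (1 - min p)^n, is shown below. The function names and the (gene, p-value) input layout are hypothetical illustrations, not the authors' code.

```python
from typing import Dict, Iterable, List, Tuple


def sidak_aggregate(p_values: Iterable[float]) -> float:
    """Combine several p-values into one using the Sidak min-p adjustment."""
    ps: List[float] = list(p_values)
    if not ps:
        raise ValueError("at least one p-value is required")
    if any(p < 0.0 or p > 1.0 for p in ps):
        raise ValueError("p-values must lie in [0, 1]")
    n = len(ps)
    # Probability that the minimum of n independent uniform p-values
    # is at most min(ps).
    return 1.0 - (1.0 - min(ps)) ** n


def aggregate_by_gene(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Group (gene, p-value) pairs by gene and aggregate each group."""
    grouped: Dict[str, List[float]] = {}
    for gene, p in records:
        grouped.setdefault(gene, []).append(p)
    return {gene: sidak_aggregate(ps) for gene, ps in grouped.items()}
```

For example, a gene with transcript-level p-values 0.01 and 0.5 would receive a gene-level p-value of 1 - 0.99^2, roughly 0.0199. The adjustment assumes the individual tests are independent, which is only an approximation for transcripts of the same gene.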



2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Matthew Chung ◽  
Vincent M. Bruno ◽  
David A. Rasko ◽  
Christina A. Cuomo ◽  
José F. Muñoz ◽  
...  

Abstract Advances in transcriptome sequencing allow for simultaneous interrogation of differentially expressed genes from multiple species originating from a single RNA sample, termed dual or multi-species transcriptomics. Compared to single-species differential expression analysis, the design of multi-species differential expression experiments must account for the relative abundances of each organism of interest within the sample, often requiring enrichment methods and yielding differences in total read counts across samples. The analysis of multi-species transcriptomics datasets requires modifications to the alignment, quantification, and downstream analysis steps compared to the single-species analysis pipelines. We describe best practices for multi-species transcriptomics and differential gene expression.



Animals ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 1745
Author(s):  
Ben-Ben Miao ◽  
Su-Fang Niu ◽  
Ren-Xie Wu ◽  
Zhen-Bang Liang ◽  
Bao-Gui Tang ◽  
...  

Pearl gentian grouper (Epinephelus fuscoguttatus ♀ × Epinephelus lanceolatus ♂) is a fish of high commercial value in the aquaculture industry in Asia. However, this hybrid fish is not cold-tolerant, and the molecular regulation mechanism underlying its response to cold stress remains largely elusive. This study thus investigated the liver transcriptomic responses of pearl gentian grouper by comparing the gene expression of cold stress groups (20, 15, 12, and 12 °C for 6 h) with that of the control group (25 °C) using PacBio SMRT-Seq and Illumina RNA-Seq technologies. In SMRT-Seq analysis, a total of 11,033 full-length transcripts were generated and used as reference sequences for further RNA-Seq analysis. In RNA-Seq analysis, 3271 differentially expressed genes (DEGs), two low-temperature specific modules (tan and blue modules), and two significantly expressed gene sets (profiles 0 and 19) were screened by differential expression analysis, weighted gene co-expression network analysis (WGCNA), and short time-series expression miner (STEM), respectively. The intersection of the above analyses further revealed some key genes, such as PCK, ALDOB, FBP, G6pC, CPT1A, PPARα, SOCS3, PPP1CC, CYP2J, HMGCR, CDKN1B, and GADD45Bc. These genes were significantly enriched in carbohydrate metabolism, lipid metabolism, signal transduction, and endocrine system pathways. All these pathways were linked to biological functions relevant to cold adaptation, such as energy metabolism, stress-induced cell membrane changes, and transduction of stress signals. Taken together, our study explores an overall and complex regulation network of the functional genes in the liver of pearl gentian grouper, which could benefit the species in preventing damage caused by cold stress.



2021 ◽  
Vol 11 (8) ◽  
pp. 3562
Author(s):  
Yong Jin Lee ◽  
Sang Yong Park ◽  
Dae Yeon Kim ◽  
Jae Yoon Kim

Preharvest sprouting (PHS) is a key global issue in production and end-use quality of cereals, particularly in regions where the rainfall season overlaps the harvest. To investigate transcriptomic changes in genes affected by PHS induction and ABA treatment, RNA-seq analysis was performed in two wheat cultivars that differ in PHS tolerance. A total of 123 unigenes related to hormone metabolism and signaling for abscisic acid (ABA), gibberellic acid (GA), indole-3-acetic acid (IAA), and cytokinin were identified, and 1862 differentially expressed genes (DEGs) were identified and divided into 8 groups by transcriptomic analysis. DEG analysis showed that the majority of genes were categorized in sugar-related processes, which interact with ABA signaling in the PHS-tolerant cultivar under PHS induction. Thus, genes related to ABA are key regulators of dormancy and germination. Our results give insight into global changes in expression of plant hormone related genes in response to PHS.



2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ammar Zaghlool ◽  
Adnan Niazi ◽  
Åsa K. Björklund ◽  
Jakub Orzechowski Westholm ◽  
Adam Ameur ◽  
...  

Abstract Transcriptome analysis has mainly relied on analyzing RNA sequencing data from whole cells, overlooking the impact of subcellular RNA localization and its influence on our understanding of gene function, and interpretation of gene expression signatures in cells. Here, we separated cytosolic and nuclear RNA from human fetal and adult brain samples and performed a comprehensive analysis of cytosolic and nuclear transcriptomes. There are significant differences in RNA expression for protein-coding and lncRNA genes between cytosol and nucleus. We show that transcripts encoding the nuclear-encoded mitochondrial proteins are significantly enriched in the cytosol compared to the rest of protein-coding genes. Differential expression analysis between fetal and adult frontal cortex shows that results obtained from the cytosolic RNA differ from results using nuclear RNA both at the level of transcript types and the number of differentially expressed genes. Our data provide a resource for the subcellular localization of thousands of RNA transcripts in the human brain and highlight differences in using the cytosolic or the nuclear transcriptomes for expression analysis.



2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Haifeng Yan ◽  
Huiwen Zhou ◽  
Hanmin Luo ◽  
Yegeng Fan ◽  
Zhongfeng Zhou ◽  
...  

Abstract
Background: Although extensive breeding efforts are ongoing in sugarcane (Saccharum officinarum L.), the average yield is far below the theoretical potential. Tillering is an important component of sugarcane yield; however, the molecular mechanism underlying tiller development is still elusive. The limited genomic data in sugarcane, particularly due to its complex and large genome, has hindered in-depth molecular studies.
Results: Herein, we generated a full-length (FL) transcriptome from developing leaf and tiller bud samples based on PacBio Iso-Seq. In addition, we performed RNA-seq on tiller bud samples at three developmental stages (T0, T1 and T2) to uncover key genes and biological pathways involved in sugarcane tiller development. In total, 30,360 and 20,088 high-quality non-redundant isoforms were identified in leaf and tiller bud samples, respectively, representing 41,109 unique isoforms in sugarcane. Likewise, we identified 1063 and 1037 alternative splicing events in leaf and tiller bud samples, respectively. We predicted the presence of coding sequence for 40,343 isoforms, 98% of which were successfully annotated. Comparison with previous FL transcriptomes in sugarcane revealed 2963 unreported isoforms. In addition, we characterized 14,946 SSRs from 11,700 transcripts and 310 lncRNAs. By integrating RNA-seq with the FL transcriptome, 468 and 57 differentially expressed genes (DEGs) were identified in T1vsT0 and T2vsT0, respectively. Strong up-regulation of several pyruvate phosphate dikinase and phosphoenolpyruvate carboxylase genes suggests enhanced carbon fixation and protein synthesis to facilitate tiller growth. Similarly, up-regulation of linoleate 9S-lipoxygenase and lipoxygenase genes in the linoleic acid metabolism pathway suggests high synthesis of key oxylipins involved in tiller growth and development.
Conclusions: Collectively, we have enriched the genomic data available in sugarcane and provided candidate genes for manipulating tiller formation and development, towards productivity enhancement in sugarcane.



2006 ◽  
Vol 128 (10) ◽  
pp. 1070-1080 ◽  
Author(s):  
Debashis Pramanik ◽  
Sujoy K. Saha

The heat transfer and the pressure drop characteristics of laminar flow of viscous oil through rectangular and square ducts with internal transverse rib turbulators on two opposite surfaces of the ducts and fitted with twisted tapes have been studied experimentally. The tapes have been full length, short length, and regularly spaced types. The transverse ribs in combination with full-length twisted tapes have been found to perform better than either ribs or twisted tapes acting alone. The heat transfer and the pressure drop measurements have been taken in separate test sections. Heat transfer tests were carried out in electrically heated stainless steel ducts incorporating uniform wall heat flux boundary conditions. Pressure drop tests were carried out in acrylic ducts. The flow was periodically fully developed in the regularly spaced twisted-tape elements case and decaying swirl flow in the short-length twisted tapes case. The flow characteristics are governed by twist ratio, space ratio, and length of twisted tape, Reynolds number, Prandtl number, rod-to-tube diameter ratio, duct aspect ratio, rib height, and rib spacing. Correlations developed for friction factor and Nusselt number have predicted the experimental data satisfactorily. The performance of the geometry under investigation has been evaluated. It has been found that on the basis of both constant pumping power and constant heat duty, the regularly spaced twisted-tape elements in specific cases perform marginally better than their full-length counterparts. However, the short-length twisted-tape performance is worse than the full-length twisted tapes. Therefore, full-length twisted tapes and regularly spaced twisted-tape elements in combination with transverse ribs are recommended for laminar flows. However, the short-length twisted tapes are not recommended.


