Comprehensive analysis of RNA-seq kits for standard, low and ultra-low quantity samples

2019 ◽  
Author(s):  
Marie-Ange Palomares ◽  
Cyril Dalmasso ◽  
Eric Bonnet ◽  
Céline Derbois ◽  
Solène Brohard-Julien ◽  
...  

ABSTRACT High-throughput RNA-sequencing has become the gold standard method for whole-transcriptome gene expression analysis, and is widely used in numerous applications to study cell and tissue transcriptomes. It is also being increasingly used in a number of clinical applications, including expression profiling for diagnostics and alternative transcript detection. However, despite its many advantages, RNA sequencing can be challenging in some situations, for instance in cases of low input amounts or degraded RNA samples. Several protocols have been proposed to overcome these challenges, and many are available as commercial kits. In this study, we comprehensively test three recent commercial technologies for RNA-seq library preparation (TruSeq, SMARTer and SMARTer Ultra-Low) on human reference tissue preparations, using standard (1 μg), low (100 and 10 ng) and ultra-low (< 1 ng) input amounts, and for mRNA and total RNA, stranded or unstranded. The results are analyzed using read quality and alignment metrics, gene detection and differential gene expression metrics. Overall, we show that the TruSeq kit performs well with an input amount of 100 ng, while the SMARTer kit shows degraded performance for inputs of 100 and 10 ng, and the SMARTer Ultra-Low kit performs relatively well for input amounts < 1 ng. All the results are discussed in detail, and we provide guidelines for biologists for the selection of an RNA-seq library preparation kit.

2018 ◽  
Author(s):  
Adam McDermaid ◽  
Xin Chen ◽  
Yiran Zhang ◽  
Juan Xie ◽  
Cankun Wang ◽  
...  

Abstract Motivation One of the main benefits of using modern RNA-sequencing (RNA-Seq) technology is the more accurate gene expression estimations compared with previous generations of expression data, such as the microarray. However, numerous issues can result in the possibility that an RNA-Seq read can be mapped to multiple locations on the reference genome with the same alignment scores, which occurs in plant, animal, and metagenome samples. Such a read is called a multiple-mapping read (MMR). The impact of these MMRs is reflected in gene expression estimation and all downstream analyses, including differential gene expression, functional enrichment, etc. Current analysis pipelines lack the tools to effectively test the reliability of gene expression estimations, and are thus incapable of ensuring the validity of all downstream analyses. Results Our investigation into 95 RNA-Seq datasets from seven species (totaling 1,951 GB) indicates an average of roughly 22% of all reads are MMRs for plant and animal species. Here we present a tool called GeneQC (Gene expression Quality Control), which can accurately estimate the reliability of each gene’s expression level. The underlying algorithm is designed based on extracted genomic and transcriptomic features, which are then combined using elastic-net regularization and mixture model fitting to provide a clearer picture of mapping uncertainty for each gene. GeneQC allows researchers to determine reliable expression estimations and conduct further analysis on the gene expression that is of sufficient quality. This tool also enables researchers to investigate continued re-alignment methods to determine more accurate gene expression estimates for those with low reliability. Availability GeneQC is freely available at http://bmbl.sdstate.edu/GeneQC/ [email protected] Supplementary information Supplementary data are available at Bioinformatics online.
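
The MMR definition in the abstract above (a read whose best alignment score is shared by more than one reference location) can be illustrated with a minimal Python sketch. This is not GeneQC's algorithm; the function names, the score lists, and the toy read identifiers are all hypothetical and serve only to show how a fraction of MMRs could be computed from per-read alignment scores.

```python
from typing import Dict, List


def is_multi_mapping(scores: List[int]) -> bool:
    """Return True if the best alignment score is shared by two or more locations."""
    if not scores:
        return False  # an unmapped read is not an MMR
    best = max(scores)
    return scores.count(best) > 1


def mmr_fraction(alignments: Dict[str, List[int]]) -> float:
    """Fraction of mapped reads that are multiple-mapping reads (MMRs)."""
    mapped = [scores for scores in alignments.values() if scores]
    if not mapped:
        return 0.0
    return sum(is_multi_mapping(scores) for scores in mapped) / len(mapped)


# Hypothetical toy data: read id -> alignment score at each candidate location.
toy_alignments = {
    "read1": [60],          # unique hit
    "read2": [42, 42],      # two equally good hits: MMR
    "read3": [55, 30],      # one clearly best hit
    "read4": [],            # unmapped, ignored
    "read5": [38, 38, 38],  # three equally good hits: MMR
}

if __name__ == "__main__":
    print(f"MMR fraction: {mmr_fraction(toy_alignments):.2f}")
```

In practice the scores would come from an aligner's output rather than a hand-written dictionary, and the tie criterion may need to follow that aligner's own scoring conventions.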


2018 ◽  
Author(s):  
Adam McDermaid ◽  
Brandon Monier ◽  
Jing Zhao ◽  
Qin Ma

Abstract Differential gene expression (DGE) is one of the most common applications of RNA-sequencing (RNA-seq) data. This process allows for the elucidation of differentially expressed genes (DEGs) across two or more conditions. Interpretation of the DGE results can be non-intuitive and time-consuming due to the variety of formats based on the tool of choice and the numerous pieces of information provided in these results files. Here we present an R package, ViDGER (Visualization of Differential Gene Expression Results using R), which contains nine functions that generate information-rich visualizations for the interpretation of DGE results from three widely used tools: Cuffdiff, DESeq2, and edgeR.


2021 ◽  
Author(s):  
Rashid Saif ◽  
Aniqa Ejaz ◽  
Tania Mahmood ◽  
Saeeda Zia

ABSTRACT Advances in next-generation sequencing (NGS) technologies, their cost-effectiveness and well-developed pipelines built on computational tools have allowed researchers to make ground-breaking discoveries in multi-omics data analysis. However, uncertainty remains because of the massive upsurge in parallel tools and the difficulty of choosing a best-practice pipeline for expression profiling of RNA sequencing (RNA-seq) data. Here, we detail an optimized pipeline that runs quickly and with enhanced accuracy on a personal computer rather than on cloud or high-performance computing (HPC) clusters. The steps include quality check, base filtration, quasi-mapping, quantification of samples, estimation and counting of transcript/gene expression abundances, identification and clustering of differentially expressed features and visualization of the data. The tools FastQC, Trimmomatic, Salmon and some other scripts in the Trinity toolkit were applied to two paired-end datasets. An extension of this pipeline may also be formulated in the future for gene ontology enrichment analysis and functional annotation of the differential expression matrix to make these data more biologically significant.
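
As an illustration of the abundance estimation step listed above, the following Python sketch computes transcripts per million (TPM), one common abundance unit. The abstract does not say which unit the authors report, so the choice of TPM, the function name, and the toy counts and lengths are assumptions. This is a simplified formula on raw lengths, not the procedure Salmon itself performs.

```python
from typing import List


def tpm(counts: List[float], lengths: List[float]) -> List[float]:
    """Convert per-transcript read counts to transcripts per million (TPM).

    Each count is first divided by the transcript length to get a
    per-base rate; the rates are then rescaled so that they sum to 1e6.
    """
    if len(counts) != len(lengths):
        raise ValueError("counts and lengths must have the same number of entries")
    if any(length <= 0 for length in lengths):
        raise ValueError("transcript lengths must be positive")
    rates = [count / length for count, length in zip(counts, lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0 for _ in rates]  # no reads assigned to any transcript
    return [rate / total * 1e6 for rate in rates]


if __name__ == "__main__":
    # Hypothetical toy values: three transcripts with read counts and lengths in bases.
    toy_counts = [100.0, 200.0, 0.0]
    toy_lengths = [1000.0, 2000.0, 500.0]
    for value in tpm(toy_counts, toy_lengths):
        print(f"{value:.1f}")
```

Because the values are rescaled to a fixed total, TPM is comparable across transcripts within a sample; comparisons between samples usually need further normalization by the differential expression tool.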


2018 ◽  
Author(s):  
Eric Reed ◽  
Elizabeth Moses ◽  
Xiaohui Xiao ◽  
Gang Liu ◽  
Joshua Campbell ◽  
...  

Abstract The need to reduce per sample cost of RNA-seq profiling for scalable data generation has led to the emergence of highly multiplexed RNA-seq. These technologies utilize barcoding of cDNA sequences in order to combine samples into a single sequencing lane to be separated during data processing. In this study, we report the performance of one such technique denoted as sparse full length sequencing (SFL), a ribosomal RNA depletion-based RNA sequencing approach that allows for the simultaneous sequencing of 96 samples and higher. We offer comparisons to well established single-sample techniques, including full coverage Poly-A capture RNA-seq and microarray, as well as another low-cost highly multiplexed technique known as 3’ digital gene expression (3’DGE). Data were generated for a set of exposure experiments on immortalized human lung epithelial (AALE) cells in a two-by-two study design, in which samples received both genetic and chemical perturbations of known oncogenes/tumor suppressors and lung carcinogens. SFL demonstrated improved performance over 3’DGE in terms of coverage, power to detect differential gene expression, and biological recapitulation of patterns of differential gene expression from in vivo lung cancer mutation signatures.
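
The idea of separating pooled samples during data processing can be sketched in a few lines of Python. This is a minimal illustration, assuming a fixed-length barcode at the start of each read and exact barcode matching; the barcodes, sample names, and the `demultiplex` function are hypothetical and do not describe the SFL or 3’DGE protocols themselves.

```python
from collections import defaultdict
from typing import Dict, List


def demultiplex(
    reads: List[str],
    barcode_to_sample: Dict[str, str],
    barcode_len: int,
) -> Dict[str, List[str]]:
    """Group reads by sample using the barcode found at the start of each read.

    Reads whose barcode is not in the lookup table are collected under
    the key "undetermined". The barcode is stripped from the returned sequence.
    """
    by_sample: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


if __name__ == "__main__":
    # Hypothetical 4-base barcodes and pooled reads.
    barcodes = {"AAGG": "sample_A", "CCGG": "sample_B"}
    pooled = ["AAGGTTTT", "CCGGAAAA", "AAGGCCCC", "TTGGGGGG"]
    for sample, sequences in demultiplex(pooled, barcodes, 4).items():
        print(sample, sequences)
```

Real demultiplexing tools typically also tolerate a small number of barcode mismatches and may read the barcode from a separate index read, which this sketch leaves out.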


BMC Biology ◽  
2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Milda Mickutė ◽  
Kotryna Kvederavičiūtė ◽  
Aleksandr Osipenko ◽  
Raminta Mineikaitė ◽  
Saulius Klimašauskas ◽  
...  

Abstract Background Targeted installation of designer chemical moieties on biopolymers provides an orthogonal means for their visualisation, manipulation and sequence analysis. Although high-throughput RNA sequencing is a widely used method for transcriptome analysis, certain steps, such as 3′ adapter ligation in strand-specific RNA sequencing, remain challenging due to structure- and sequence-related biases introduced by RNA ligases, leading to misrepresentation of particular RNA species. Here, we remedy this limitation by adapting two RNA 2′-O-methyltransferases from the Hen1 family for orthogonal chemo-enzymatic click tethering of a 3′ sequencing adapter that supports cDNA production by reverse transcription of the tagged RNA. Results We showed that the ssRNA-specific DmHen1 and dsRNA-specific AtHEN1 can be used to efficiently append an oligonucleotide adapter to the 3′ end of target RNA for sequencing library preparation. Using this new chemo-enzymatic approach, we identified miRNAs and prokaryotic small non-coding sRNAs in probiotic Lactobacillus casei BL23. We found that compared to a reference conventional RNA library preparation, methyltransferase-Directed Orthogonal Tagging and RNA sequencing, mDOT-seq, avoids misdetection of unspecific highly-structured RNA species, thus providing better accuracy in identifying the groups of transcripts analysed. Our results suggest that mDOT-seq has the potential to advance analysis of eukaryotic and prokaryotic ssRNAs. Conclusions Our findings provide a valuable resource for studies of the RNA-centred regulatory networks in Lactobacilli and pave the way to developing novel transcriptome and epitranscriptome profiling approaches in vitro and inside living cells. 
As RNA methyltransferases share the structure of the AdoMet-binding domain and several specific cofactor binding features, the basic principles of our approach could be easily translated to other AdoMet-dependent enzymes for the development of modification-specific RNA-seq techniques.


2019 ◽  
Vol 12 (1) ◽  
pp. 11-19 ◽  
Author(s):  
Jun-Young Shin ◽  
Sang-Heon Choi ◽  
Da-Woon Choi ◽  
Ye-Jin An ◽  
Jae-Hyuk Seo ◽  
...  

2019 ◽  
Author(s):  
Christopher A. Hilker ◽  
Aditya V. Bhagwate ◽  
Jin Sung Jang ◽  
Jeffrey G Meyer ◽  
Asha A. Nair ◽  
...  

Abstract Formalin-fixed paraffin-embedded (FFPE) tissues are commonly used biospecimens for clinical diagnosis. However, RNA isolated from FFPE blocks is extensively degraded, which makes whole transcriptome profiling (RNA-seq) challenging. Here, we examined RNA isolation methods, quality metrics, and the performance of RNA-seq using different approaches with RNA isolated from FFPE and fresh frozen (FF) tissues. We evaluated FFPE RNA extraction methods using six different tissues and five different methods. The reproducibility and quality of the prepared libraries from these RNAs were assessed by RNA-seq. We next examined the performance and reproducibility of RNA-seq for gene expression profiling with FFPE and FF samples using targeted (Kinome capture) and whole transcriptome capture based sequencing. Finally, we assessed Agilent SureSelect All-Exon V6+UTR capture and the Illumina TruSeq RNA Access protocols for their ability to detect known gene fusions in FFPE RNA samples. Although the overall yield of RNA varied among extraction methods, gene expression profiles generated by RNA-seq were highly correlated (>90%) when the input RNA was of sufficient quality (DV200 ≥ 30%) and quantity (≥ 100 ng). Using gene capture, we observed a linear relationship between gene expression levels for shared genes that were captured using either All-Exon or Kinome kits. Gene expression correlations between the two capture-based approaches were similar using RNA from FFPE and FF samples. However, the TruSeq RNA Access protocol provided significantly higher exon and junction reads when compared to the SureSelect All-Exon capture kit and was more sensitive for fusion gene detection. Our study established pre- and post-library construction QC parameters that are essential to reproducible RNA-seq profiling using FFPE samples. We show that gene capture based NGS is an efficient and highly reproducible strategy for gene expression measurements as well as fusion gene detection.


2021 ◽  
Vol 11 ◽  
Author(s):  
Dong-Liang Lin ◽  
Li-Li Wang ◽  
Peng Zhao ◽  
Wen-Wen Ran ◽  
Wei Wang ◽  
...  

Goblet cell adenocarcinoma (GCA) is a rare amphicrine tumor that is difficult to diagnose. GCA is traditionally found in the appendix, but extra-appendiceal GCA may be underestimated. Intestinal adenocarcinoma with a signet ring cell component is also very rare, and some signet ring cell carcinomas are cohesive and share morphological features with GCAs. It is therefore necessary to differentiate GCA from intestinal adenocarcinomas with cohesive signet ring cell component (IACSRCC). The goal of this study is to determine the occurrence of extra-appendiceal GCA and to characterize the histological, immunohistochemical, transcriptional, and immune landscape of GCA. We collected 12 cases of GCA and 10 cases of IACSRCC and reviewed their clinicopathologic characteristics. Immunohistochemical stains were performed with synaptophysin, chromogranin A, CD56, somatostatin receptor (SSTR) 2, and Ki-67. Whole transcriptome RNA-sequencing was performed, and data were used to analyze differential gene expression and predict immune cell infiltration levels in GCA and IACSRCC. RNA-sequencing data for colorectal adenocarcinoma were gathered from TCGA data portal. Of the 12 patients with GCA, there were 4 women and 8 men. There were three appendiceal cases and nine extra-appendiceal cases. GCAs were immunohistochemically different from IACSRCC. GCA also had different levels of B-cell and CD8+ T-cell infiltration compared to both colorectal adenocarcinoma and cohesive IACSRCCs. Differential gene expression analysis showed distinct gene expression patterns in GCA compared to colorectal adenocarcinoma, with a number of cancer-related differentially expressed genes, including upregulation of TMEM14A, GOLT1A, DSCC1, and HSD17B8, and downregulation of KCNQ1OT1 and MXRA5. GCA also had several differentially expressed genes compared to IACSRCCs, including upregulation of PRSS21, EPPIN, RPRM, TNFRSF12A, and BZRAP1, and downregulation of HIST1H2BE, TCN1, AC069363.1, RP11-538I12.2, and REG4. 
In summary, the number of extra-appendiceal GCA was underestimated in Chinese patients. GCA can be seen as a distinct morphological, immunohistochemical, transcriptomic, and immunological entity. The classic low-grade component of GCA and the immunoreactivity for neuroendocrine markers are the key points to diagnosing GCA.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e11875
Author(s):  
Tomoko Matsuda

Large volumes of high-throughput sequencing data have been submitted to the Sequence Read Archive (SRA). The lack of experimental metadata associated with the data makes reusing the data and assessing their quality very difficult. In the case of RNA sequencing (RNA-Seq), which reveals the presence and quantity of RNA in a biological sample at any moment, it is necessary to consider that gene expression responds over a short time interval (several seconds to a few minutes) in many organisms. Therefore, to isolate RNA that accurately reflects the transcriptome at the point of harvest, raw biological samples should be processed by freezing in liquid nitrogen, immersing in RNA stabilization reagent or lysing and homogenizing in RNA lysis buffer containing guanidine thiocyanate as soon as possible. As the number of samples handled simultaneously increases, the time until the RNA is protected can increase. Here, to evaluate the effect of different lag times in RNA protection on RNA-Seq data, we harvested CHO-S cells after 3, 5, 6, and 7 days of cultivation, added RNA lysis buffer in a time course of 15, 30, 45, and 60 min after harvest, and conducted RNA-Seq. These RNA samples showed high RNA integrity number (RIN) values indicating non-degraded RNA, and sequence data from libraries prepared with these RNA samples were of high quality according to FastQC. We observed that, at the same cultivation day, global trends of gene expression were similar across the time course of addition of RNA lysis buffer; however, the expression of some genes was significantly different between the time-course samples of the same cultivation day; most of these differentially expressed genes were related to apoptosis. We conclude that the time lag between sample harvest and RNA protection influences gene expression of specific genes. 
It is, therefore, necessary to know not only RIN values of RNA and the quality of the sequence data but also how the experiment was performed when acquiring RNA-Seq data from the database.

