Trimming and Validation of Illumina Short Reads Using Trimmomatic, Trinity Assembly, and Assessment of RNA-Seq Data

Author(s):  
Steven O. Sewe ◽  
Gonçalo Silva ◽  
Paulo Sicat ◽  
Susan E. Seal ◽  
Paul Visendi
Keyword(s):  
RNA-Seq ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Grzegorz M. Boratyn ◽  
Jean Thierry-Mieg ◽  
Danielle Thierry-Mieg ◽  
Ben Busby ◽  
Thomas L. Madden
Keyword(s):  
RNA-Seq ◽  

2021 ◽  
Author(s):  
Ridvan Eksi ◽  
Daiyao Yi ◽  
Hongyang Li ◽  
Bradley Godfrey ◽  
Lisa R. Mathew ◽  
...  

Abstract Studying isoform expression at the microscopic level has always been a challenging task. A classical example is the kidney, where the glomerular and tubulo-interstitial compartments carry out drastically different physiological functions, so their isoform expression presumably differs as well. We aimed to develop an experimental and computational pipeline for identifying isoforms at the level of microscopic structures. We microdissected glomerular and tubulo-interstitial compartments from healthy human kidney tissues from two cohorts. The two compartments were sequenced separately with the PacBio RS II platform. The resulting transcripts were then validated against transcripts from the same samples obtained with the traditional Illumina RNA-Seq protocol, distinct Illumina RNA-Seq short reads from European Renal cDNA Bank (ERCB) samples, and the annotated GENCODE transcript list, thus identifying novel transcripts. We identified 14,739 and 14,259 annotated transcripts, and 17,268 and 13,118 potentially novel transcripts, in the glomerular and tubulo-interstitial compartments, respectively. Of note, relying solely on either short or long reads would have resulted in many erroneous identifications. We identified distinct pathways involved in the glomerular and tubulo-interstitial compartments at the isoform level. We demonstrated the possibility of micro-dissecting a tissue and incorporating both long- and short-read sequencing to identify isoforms for each compartment.


2020 ◽  
Author(s):  
Li Hou ◽  
Yadong Wang

Abstract Background: In recent years, with the development of sequencing technology, long reads have been widely used in many studies, including transcriptomics studies. Long reads have clear advantages over short reads, and aligning long reads also differs from aligning short reads. Many tools can now process long RNA-Seq reads, but some problems remain unsolved. Results: We developed Deep-Long, a fast and accurate tool for processing long RNA-Seq reads. Deep-Long handles the difficulties arising from complicated gene structures and sequencing errors well, and performs especially well on alternative splicing and small exons. When the sequencing error rate is low, Deep-Long rapidly produces more accurate results. As the sequencing error rate rises, Deep-Long takes more time but remains faster and more accurate than most other tools. Conclusions: Deep-Long is a useful tool for aligning long RNA-Seq reads to a genome, and it can find more exons and splices.


2019 ◽  
Vol 21 (2) ◽  
pp. 676-686 ◽  
Author(s):  
Siyuan Chen ◽  
Chengzhi Ren ◽  
Jingjing Zhai ◽  
Jiantao Yu ◽  
Xuyang Zhao ◽  
...  

Abstract A widely used approach in transcriptome analysis is the alignment of short reads to a reference genome. However, owing to the deficiencies of specially designed analytical systems, short reads unmapped to the genome sequence are usually ignored, resulting in the loss of significant biological information and insights. To fill this gap, we present Comprehensive Assembly and Functional annotation of Unmapped RNA-Seq data (CAFU), a Galaxy-based framework that can facilitate the large-scale analysis of unmapped RNA sequencing (RNA-Seq) reads from single- and mixed-species samples. By taking advantage of machine learning techniques, CAFU addresses the issue of accurately identifying the species origin of transcripts assembled using unmapped reads from mixed-species samples. CAFU also represents an innovation in that it provides a comprehensive collection of functions required for transcript confidence evaluation, coding potential calculation, sequence and expression characterization and function annotation. These functions and their dependencies have been integrated into a Galaxy framework that provides access to CAFU via a user-friendly interface, dramatically simplifying complex exploration tasks involving unmapped RNA-Seq reads. CAFU has been validated with RNA-Seq data sets from wheat and Zea mays (maize) samples. CAFU is freely available via GitHub: https://github.com/cma2015/CAFU.


2017 ◽  
pp. 123-134
Author(s):  
Jun Yao ◽  
Chen Jiang ◽  
Chao Li ◽  
Qifan Zeng ◽  
Zhanjiang Liu

2019 ◽  
Author(s):  
Camille Sessegolo ◽  
Corinne Cruaud ◽  
Corinne Da Silva ◽  
Audric Cologne ◽  
Marion Dubarry ◽  
...  

Abstract Our vision of DNA transcription and splicing has changed dramatically with the introduction of short-read sequencing. These high-throughput sequencing technologies promised to unravel the complexity of any transcriptome. Generally, gene expression levels are well captured using these technologies, but caveats remain due to the limited read length and the fact that RNA molecules had to be reverse transcribed before sequencing. Oxford Nanopore Technologies has recently launched a portable sequencer which offers the possibility of sequencing long reads and, most importantly, RNA molecules. Here we generated a full mouse transcriptome from brain and liver using the Oxford Nanopore device. As a comparison, we sequenced RNA (RNA-Seq) and cDNA (cDNA-Seq) molecules using both long- and short-read technologies and tested the TeloPrime preparation kit, dedicated to the enrichment of full-length transcripts. Using spike-in data, we confirmed that expression levels are efficiently captured by cDNA-Seq using short reads. More importantly, Oxford Nanopore RNA-Seq tends to be more efficient, while cDNA-Seq appears to be more biased. We further show that the cDNA library preparation of the Nanopore protocol induces read truncation for transcripts containing internal runs of T’s. This bias is marked for runs of at least 15 T’s, but is already detectable for runs of at least 9 T’s and therefore concerns more than 20% of expressed transcripts in mouse brain and liver. Finally, we outline that bioinformatics challenges remain ahead for quantifying at the transcript level, especially when reads are not full-length. Accurate quantification of repeat-associated genes such as processed pseudogenes also remains difficult, and we show that current mapping protocols which map reads to the genome largely over-estimate their expression, at the expense of their parent gene. The entire dataset is available from http://www.genoscope.cns.fr/externe/ONT_mouse_RNA.
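The abstract above reports read truncation for transcripts with internal runs of T's, marked at 15 or more and detectable from 9. As a rough illustration only, the following Python sketch flags such runs in a transcript sequence. The function name `internal_t_runs`, the `flank` parameter, and the definition of "internal" (the run does not touch either end of the sequence) are assumptions made for this example; this is not the authors' method.

```python
import re

def internal_t_runs(seq, min_len=9, flank=1):
    """Return (start, length) for each run of at least `min_len` T's
    that lies at least `flank` bases away from both sequence ends.

    Assumption: "internal" simply means the run does not touch the
    first or last `flank` bases of the transcript.
    """
    seq = seq.upper()
    runs = []
    # T{n,} is greedy, so each match covers one maximal run of T's.
    for match in re.finditer(r"T{%d,}" % min_len, seq):
        if match.start() >= flank and match.end() <= len(seq) - flank:
            runs.append((match.start(), match.end() - match.start()))
    return runs

if __name__ == "__main__":
    # Hypothetical toy sequences, not real transcripts.
    examples = {
        "short_run": "ACG" + "T" * 8 + "GCA",
        "run_of_9": "ACG" + "T" * 9 + "GCA",
        "run_of_15": "ACG" + "T" * 15 + "GCA",
    }
    for name, sequence in examples.items():
        print(name, internal_t_runs(sequence))
```

Calling the function with `min_len=15` restricts the result to the longer runs for which the abstract describes the bias as marked.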


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Camille Sessegolo ◽  
Corinne Cruaud ◽  
Corinne Da Silva ◽  
Audric Cologne ◽  
Marion Dubarry ◽  
...  

Abstract Our vision of DNA transcription and splicing has changed dramatically with the introduction of short-read sequencing. These high-throughput sequencing technologies promised to unravel the complexity of any transcriptome. Generally gene expression levels are well-captured using these technologies, but there are still remaining caveats due to the limited read length and the fact that RNA molecules had to be reverse transcribed before sequencing. Oxford Nanopore Technologies has recently launched a portable sequencer which offers the possibility of sequencing long reads and most importantly RNA molecules. Here we generated a full mouse transcriptome from brain and liver using the Oxford Nanopore device. As a comparison, we sequenced RNA (RNA-Seq) and cDNA (cDNA-Seq) molecules using both long and short reads technologies and tested the TeloPrime preparation kit, dedicated to the enrichment of full-length transcripts. Using spike-in data, we confirmed that expression levels are efficiently captured by cDNA-Seq using short reads. More importantly, Oxford Nanopore RNA-Seq tends to be more efficient, while cDNA-Seq appears to be more biased. We further show that the cDNA library preparation of the Nanopore protocol induces read truncation for transcripts containing internal runs of T’s. This bias is marked for runs of at least 15 T’s, but is already detectable for runs of at least 9 T’s and therefore concerns more than 20% of expressed transcripts in mouse brain and liver. Finally, we outline that bioinformatics challenges remain ahead for quantifying at the transcript level, especially when reads are not full-length. Accurate quantification of repeat-associated genes such as processed pseudogenes also remains difficult, and we show that current mapping protocols which map reads to the genome largely over-estimate their expression, at the expense of their parent gene.


2017 ◽  
Author(s):  
Wen He ◽  
Shanrong Zhao ◽  
Chi Zhang ◽  
Michael S. Vincent ◽  
Baohong Zhang

Summary/Abstract Sequencing of transcribed RNA molecules (RNA-seq) has been used widely for studying cell transcriptomes in bulk or at the single-cell level (1, 2, 3) and is becoming the de facto technology for investigating gene expression level changes in various biological conditions, over time courses, and under drug treatments. Furthermore, RNA-Seq data have helped identify fusion genes that are related to certain cancers (4). Differential gene expression before and after drug treatments provides insights into the mechanism of action, the pharmacodynamics of the drugs, and safety concerns (5). Because each RNA-seq run generates tens to hundreds of millions of short reads ranging in size from 50 bp to 200 bp, a tool that turns these short reads into an integrated and digestible analysis report is in high demand. QuickRNASeq (6) is an application for large-scale RNA-seq data analysis and real-time interactive visualization of complex data sets. This application automates the use of several of the best open-source tools to efficiently generate a user-friendly, easy-to-share, and ready-to-publish report. Figure 1 illustrates some of the interactive plots produced by QuickRNASeq. The visualization features of the application have been further improved since its first publication in early 2016. The original QuickRNASeq publication (6) provided details of background, software selection, and implementation. Here, we outline the steps required to implement QuickRNASeq in a user’s own environment, and demonstrate some basic yet powerful utilities of the advanced interactive visualization modules in the report.


2015 ◽  
Author(s):  
Nadia M Davidson ◽  
Ian J Majewski ◽  
Alicia Oshlack

Genomic instability is a hallmark of cancer and, as such, structural alterations and fusion genes are common events in the cancer landscape. RNA sequencing (RNA-Seq) is a powerful method for profiling cancers, but current methods for identifying fusion genes are optimized for short reads. JAFFA (https://code.google.com/p/jaffa-project/) is a sensitive fusion detection method that clearly out-performs other methods with reads of 100bp or greater. JAFFA compares a cancer transcriptome to the reference transcriptome, rather than the genome, where the cancer transcriptome is inferred using long reads directly or by de novo assembling short reads.

