expression quantification
Recently Published Documents



2021 ◽  
Author(s):  
Marek Svoboda ◽  
Hildreth R Frost ◽  
Giovanni Bosco

Significant advances in RNA sequencing have recently been made possible by the use of oligo(dT) primers for simultaneous mRNA enrichment and reverse transcription priming. The associated increase in efficiency has enabled more economical bulk RNA sequencing methods as well as the advent of high-throughput single cell RNA sequencing, already one of the most widely adopted new methods in the study of transcriptomics. However, the effects of off-target oligo(dT) priming on gene expression quantification have not been fully appreciated. In the present study, we describe the extent, the possible causes, and the consequences of internal oligo(dT) priming across multiple publicly available datasets obtained from a variety of bulk and single cell RNA sequencing platforms. To explore and address this issue, we developed a computational algorithm that identifies sequencing read alignments that likely resulted from internal oligo(dT) priming and removes them from the data. Directly comparing filtered datasets to those obtained by an alternative method reveals significant improvements in gene expression measurement. Finally, we infer a list of genes whose expression quantification is most likely to be affected by internal oligo(dT) priming.
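
The abstract does not spell out how internally primed alignments are recognized, so the following is only a minimal sketch of one plausible heuristic, not the authors' algorithm. It assumes that an alignment is suspicious when the genomic sequence immediately downstream of it is A-rich (a template that oligo(dT) could anneal to) and the alignment lies far from an annotated 3' end. The function names, the forward-strand-only treatment, and all thresholds (`window`, `min_a_fraction`, `max_end_distance`) are hypothetical.

```python
def downstream_a_fraction(genome_seq: str, align_end: int, window: int = 20) -> float:
    """Fraction of 'A' bases in the window just downstream of an alignment.

    align_end is the 0-based, exclusive end coordinate of a forward-strand alignment.
    """
    region = genome_seq[align_end:align_end + window].upper()
    if not region:
        return 0.0
    return region.count("A") / len(region)


def is_likely_internal_priming(
    genome_seq: str,
    align_end: int,
    distance_to_3prime_end: int,
    window: int = 20,
    min_a_fraction: float = 0.7,   # hypothetical threshold
    max_end_distance: int = 500,   # hypothetical threshold
) -> bool:
    """Flag an alignment that sits next to a genomic A-rich stretch
    but far from an annotated transcript 3' end."""
    a_rich = downstream_a_fraction(genome_seq, align_end, window) >= min_a_fraction
    far_from_end = distance_to_3prime_end > max_end_distance
    return a_rich and far_from_end


# Toy genome: 40 bases of ACGT repeats, an internal A-rich stretch, then more sequence.
genome = "ACGT" * 10 + "A" * 18 + "CG" + "ACGT" * 5

flag_internal = is_likely_internal_priming(genome, 40, distance_to_3prime_end=2000)
flag_true_end = is_likely_internal_priming(genome, 40, distance_to_3prime_end=100)
flag_not_a_rich = is_likely_internal_priming(genome, 10, distance_to_3prime_end=2000)
```

In this toy case only the first alignment is flagged: the second is close to an annotated 3' end, and the third has no A-rich sequence downstream. A real filter would also need to handle reverse-strand alignments and spliced reads.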


2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Cong Ma ◽  
Hongyu Zheng ◽  
Carl Kingsford

Abstract Background: The probability of sequencing a set of RNA-seq reads can be modeled directly using the abundances of splice junctions in splice graphs instead of the abundances of a list of transcripts. We call this model graph quantification; it was first proposed by Bernard et al. (Bioinformatics 30:2447–55, 2014). The model can be viewed as a generalization of transcript expression quantification in which every full path in the splice graph is a possible transcript. However, the previous graph quantification model assumes that the length of single-end reads or paired-end fragments is fixed. Results: We improve this model to handle variable-length reads or fragments and to incorporate bias correction. We prove that our model is equivalent to running a transcript quantifier with exactly the set of all compatible transcripts. The key to our method is constructing an extension of the splice graph based on Aho-Corasick automata. The proof of equivalence is based on a novel reparameterization of the read generation model of a state-of-the-art transcript quantification method. Conclusion: We propose a new approach to graph quantification, which is useful for modeling scenarios where the reference transcriptome is incomplete or unavailable and can be further used in transcriptome assembly or alternative splicing analysis.
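
To make the phrase "every full path in the splice graph is a possible transcript" concrete, here is a small illustrative sketch. It is not the paper's method (which relies on an Aho-Corasick-based graph extension); it only enumerates source-to-sink paths in a toy acyclic splice graph and shows how edge (junction) abundances arise as sums of path abundances. The graph, node names, and abundance values are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def enumerate_full_paths(graph: Dict[str, List[str]], source: str, sink: str) -> List[List[str]]:
    """Return every source-to-sink path of an acyclic splice graph."""
    paths: List[List[str]] = []

    def walk(node: str, prefix: List[str]) -> None:
        prefix = prefix + [node]
        if node == sink:
            paths.append(prefix)
            return
        for nxt in graph.get(node, []):
            walk(nxt, prefix)

    walk(source, [])
    return paths


def edge_abundances(paths: List[List[str]], abundances: List[float]) -> Dict[Tuple[str, str], float]:
    """Sum path (transcript) abundances over each edge the path traverses."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for path, abundance in zip(paths, abundances):
        for u, v in zip(path, path[1:]):
            totals[(u, v)] += abundance
    return dict(totals)


# Toy splice graph: exon e2 is optional (a skipped-exon event).
splice_graph = {"S": ["e1"], "e1": ["e2", "e3"], "e2": ["e3"], "e3": ["T"]}

full_paths = enumerate_full_paths(splice_graph, "S", "T")
junction_totals = edge_abundances(full_paths, [3.0, 1.0])
```

The two full paths correspond to the exon-inclusion and exon-skipping transcripts. Because the number of full paths can grow exponentially with the number of splicing events, explicit enumeration like this is only feasible on small graphs, which is one motivation for quantifying on the graph itself.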


2021 ◽  
Author(s):  
David Chisanga ◽  
Yang Liao ◽  
Wei Shi

Abstract Background: RNA sequencing is currently the method of choice for genome-wide profiling of gene expression. A popular approach to quantify expression levels of genes from RNA-seq data is to map reads to a reference genome and then count mapped reads to each gene. Gene annotation data, which include chromosomal coordinates of exons for tens of thousands of genes, are required for this quantification process. There are several major sources of gene annotations that can be used for quantification, such as Ensembl and RefSeq databases. However, there is very little understanding of the effect that the choice of annotation has on the accuracy of gene expression quantification in an RNA-seq analysis. Results: In this paper, we present results from our comparison of Ensembl and RefSeq human annotations on their impact on gene expression quantification using a benchmark RNA-seq dataset generated by the SEquencing Quality Control (SEQC) consortium. We show that the use of RefSeq gene annotation models led to better quantification accuracy, based on the correlation with ground truths including expression data from >800 real-time PCR validated genes, known titration ratios of gene expression and microarray expression data. We also found that the recent expansion of the RefSeq annotation has led to a decrease in its annotation accuracy. Finally, we demonstrated that the RNA-seq quantification differences observed between different annotations were not affected by the use of different normalization methods. Conclusion: In conclusion, our study found that the use of the conservative RefSeq gene annotation yields better RNA-seq quantification results than the more comprehensive Ensembl annotation. We also found that, surprisingly, the recent expansion of the RefSeq database, which was primarily driven by the incorporation of sequencing data into the gene annotation process, resulted in a reduction in the accuracy of RNA-seq quantification.
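
The map-then-count approach described above can be sketched in a few lines, which also shows why the choice of annotation changes the resulting counts. This is a simplified illustration, not the pipeline used in the study: reads and exons are plain half-open intervals, a read is assigned to a gene if it overlaps any of that gene's exons, and reads overlapping more than one gene are discarded. The two toy annotations and all coordinates are invented.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def count_reads_per_gene(reads: List[Interval], annotation: Dict[str, List[Interval]]) -> Dict[str, int]:
    """Count reads overlapping the exons of exactly one gene."""
    counts = {gene: 0 for gene in annotation}
    for chrom, start, end in reads:
        hits = [
            gene
            for gene, exons in annotation.items()
            if any(c == chrom and start < e_end and e_start < end for c, e_start, e_end in exons)
        ]
        if len(hits) == 1:  # skip unassigned and ambiguous reads
            counts[hits[0]] += 1
    return counts


# Two hypothetical annotations: the second adds an extra exon to geneA.
conservative = {"geneA": [("chr1", 100, 200)], "geneB": [("chr1", 500, 600)]}
comprehensive = {
    "geneA": [("chr1", 100, 200), ("chr1", 300, 400)],
    "geneB": [("chr1", 500, 600)],
}

mapped_reads = [("chr1", 150, 200), ("chr1", 320, 370), ("chr1", 550, 600), ("chr2", 150, 200)]

counts_conservative = count_reads_per_gene(mapped_reads, conservative)
counts_comprehensive = count_reads_per_gene(mapped_reads, comprehensive)
```

The same reads yield different counts for geneA under the two annotations, because the read at 320-370 only falls inside an exon in the larger annotation. Whether such an extra exon improves or degrades accuracy is exactly the empirical question the study addresses.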


2021 ◽  
Author(s):  
Jonas A. Sibbesen ◽  
Jordan M. Eizenga ◽  
Adam M. Novak ◽  
Jouni Sirén ◽  
Xian Chang ◽  
...  

Abstract: Pangenomics is emerging as a powerful computational paradigm in bioinformatics. This field uses population-level genome reference structures, typically consisting of a sequence graph, to mitigate reference bias and facilitate analyses that were challenging with previous reference-based methods. In this work, we extend these methods into transcriptomics to analyze sequencing data using the pantranscriptome: a population-level transcriptomic reference. Our novel toolchain can construct spliced pangenome graphs, map RNA-seq data to these graphs, and perform haplotype-aware expression quantification of transcripts in a pantranscriptome. This workflow improves accuracy over state-of-the-art RNA-seq mapping methods, and it can efficiently quantify haplotype-specific transcript expression without needing to characterize a sample’s haplotypes beforehand.


2021 ◽  
Author(s):  
David Chisanga ◽  
Yang Liao ◽  
Wei Shi

RNA sequencing is currently the method of choice for genome-wide profiling of gene expression. A popular approach to quantify expression levels of genes from RNA-seq data is to map reads to a reference genome and then count mapped reads to each gene. Gene annotation data, which include chromosomal coordinates of exons for tens of thousands of genes, are required for this quantification process. There are several major sources of gene annotations that can be used for quantification, such as Ensembl and RefSeq databases. However, there is very little understanding of the effect that the choice of annotation has on the accuracy of gene expression quantification in an RNA-seq analysis. In this paper, we present results from our comparison of Ensembl and RefSeq human annotations on their impact on gene expression quantification using a benchmark RNA-seq dataset generated by the SEquencing Quality Control (SEQC) consortium. We show that the use of RefSeq gene annotation models led to better quantification accuracy, based on the correlation with ground truths including expression data from >800 real-time PCR validated genes, known titration ratios of gene expression and microarray expression data. We also found that the recent expansion of the RefSeq annotation has led to a decrease in its annotation accuracy. Finally, we demonstrated that the RNA-seq quantification differences observed between different annotations were not affected by the use of different normalization methods.


Author(s):  
Silvia Casale ◽  
Chandra Bortolotto ◽  
Giulia Maria Stella ◽  
Andrea Riccardo Filippi ◽  
...  

2020 ◽  
Author(s):  
Simon Xi ◽  
Lauren Gibilisco ◽  
Markus Kummer ◽  
Knut Biber ◽  
Astrid Wachter ◽  
...  

Abstract: Single-nucleus RNA sequencing (sNuc-RNAseq) is a powerful emerging genomics technology that combines droplet microfluidics with next-generation sequencing to interrogate transcriptome changes at single-nucleus resolution. Here we developed Abacus, a flexible UMI counting tool for sNuc-RNAseq analysis. Abacus draws extra information from sequencing reads mapped to introns of pre-mRNAs (~60% of total data) that are ignored by many single-cell RNAseq analysis pipelines. When applied to our pilot human brain sNuc-RNAseq data, Abacus nearly doubled the number of nuclei identified by the CellRanger workflow, recovering a large number of nuclei from non-neuronal cells. By incorporating intronic reads into gene expression quantification, we showed that they encoded additional and valid transcription features of individual cells and could be used to improve cluster resolution of different cell types. By separately counting UMIs derived from forward and reverse intronic reads and from exonic reads, Abacus gives users flexibility in representing genes expressed at different abundance levels. In summary, Abacus represents a flexible, improved workflow for sNuc-RNAseq data processing and analysis.
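
The idea of counting UMIs separately for exonic, forward intronic, and reverse intronic reads can be illustrated with a short sketch. This is not the Abacus implementation; it assumes that each read has already been reduced to a (nucleus barcode, gene, UMI, category) record, and the record layout, category labels, and example values are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Record = Tuple[str, str, str, str]  # (nucleus barcode, gene, UMI, read category)


def count_umis(records: Iterable[Record]) -> Dict[Tuple[str, str, str], int]:
    """Count distinct UMIs per (barcode, gene, category).

    Reads sharing a UMI within the same barcode, gene, and category are
    treated as PCR duplicates and counted once.
    """
    seen: Dict[Tuple[str, str, str], Set[str]] = defaultdict(set)
    for barcode, gene, umi, category in records:
        seen[(barcode, gene, category)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Hypothetical records for one nucleus and one gene.
example_records = [
    ("N1", "GENE1", "AAAC", "exon"),
    ("N1", "GENE1", "AAAC", "exon"),        # duplicate UMI, counted once
    ("N1", "GENE1", "GGTT", "intron_fwd"),
    ("N1", "GENE1", "CCAA", "intron_fwd"),
    ("N1", "GENE1", "TTGG", "intron_rev"),
]

umi_counts = count_umis(example_records)
```

Keeping the three categories as separate counts lets a downstream analysis decide how to combine them, for example using exonic counts alone or adding intronic counts for lowly expressed genes. This sketch ignores UMI sequencing errors, which production tools typically correct before counting.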


Author(s):  
Shanwen Sun ◽  
Lei Xu ◽  
Quan Zou ◽  
Guohua Wang

Abstract Summary: Processing raw RNA-sequencing (RNA-seq) reads, whether public or newly sequenced, involves many specialized tools and technical configurations that are often unfamiliar to non-bioinformatics researchers and time-consuming to learn. Here, we develop the R package BP4RNAseq, which integrates state-of-the-art tools from both alignment-based and alignment-free quantification workflows. The BP4RNAseq package is a highly automated tool that uses an optimized pipeline to improve the sensitivity and accuracy of RNA-seq analyses. It can run with only two non-technical parameters and outputs six formatted gene expression quantifications at the gene and transcript levels. The package applies to both retrospective and newly generated bulk RNA-seq data analyses and is also applicable to single-cell RNA-seq analyses. It therefore greatly facilitates the application of RNA-seq. Availability and implementation: The BP4RNAseq package for R and its documentation are freely available at https://github.com/sunshanwen/BP4RNAseq. Supplementary information: Supplementary data are available at Bioinformatics online.

