PennSeq: accurate isoform-specific gene expression quantification in RNA-Seq by modeling non-uniform read distribution

2013 ◽  
Vol 42 (3) ◽  
pp. e20-e20 ◽  
Author(s):  
Yu Hu ◽  
Yichuan Liu ◽  
Xianyun Mao ◽  
Cheng Jia ◽  
Jane F. Ferguson ◽  
...  
2021 ◽  
Author(s):  
Weixu Wang ◽  
Jun Yao ◽  
Yi Wang ◽  
Chao Zhang ◽  
Wei Tao ◽  
...  

Cell type-specific gene expression (CSE) brings novel biological insights into both physiological and pathological processes compared with bulk tissue gene expression. Although fluorescence-activated cell sorting (FACS) and single-cell RNA sequencing (scRNA-seq) are two widely used techniques for detecting gene expression in a cell type-specific manner, constraints of cost and labor make them impractical for routine use on large patient cohorts. Here, we present ENIGMA, an algorithm that deconvolutes bulk RNA-seq into cell type-specific expression matrices and cell type fraction matrices without the need for physical sorting or sequencing of single cells. ENIGMA uses a cell type signature matrix generated from either FACS RNA-seq or scRNA-seq as a reference, and applies a matrix completion technique to achieve fast and accurate deconvolution. We demonstrated the superior performance of ENIGMA over previously published algorithms (TCA, bMIND and CIBERSORTx), with much less running time, on both simulated and realistic datasets. To demonstrate its value in biological discovery, we applied ENIGMA to bulk RNA-seq from arthritis patients and revealed a pseudo-differentiation trajectory that could reflect the monocyte-to-macrophage transition. We also applied ENIGMA to bulk RNA-seq data of pancreatic islet tissue from type 2 diabetes (T2D) patients and discovered a beta cell-specific gene co-expression module related to senescence and apoptosis that possibly contributed to the pathogenesis of T2D. Together, ENIGMA provides a new framework to improve CSE estimation by integrating FACS RNA-seq and scRNA-seq with tissue bulk RNA-seq data, and will extend our understanding of cell heterogeneity at the population level with no need for experimental tissue disaggregation.
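To make the deconvolution setting in the abstract above concrete, the following is a minimal sketch of the linear mixing model that reference-based deconvolution generally assumes: each gene's bulk expression is the fraction-weighted sum of its expression in the constituent cell types. It is not ENIGMA's matrix completion procedure. It only recovers the cell type fraction for a two-cell-type mixture by closed-form least squares, and the signature values, function names and fractions are all hypothetical.

```python
def mix(signature, fractions):
    """Forward model: bulk expression of each gene is the fraction-weighted
    sum of that gene's expression in each cell type."""
    return [sum(f * s for f, s in zip(fractions, gene_row)) for gene_row in signature]


def estimate_two_type_fraction(bulk, signature):
    """Least-squares estimate of the first cell type's fraction.

    Assumes exactly two cell types whose fractions sum to 1, so that
    bulk = f * s1 + (1 - f) * s2 for every gene. Minimizing the squared
    error over f gives f = sum((b - s2) * (s1 - s2)) / sum((s1 - s2) ** 2).
    The result is clipped to the valid range [0, 1].
    """
    num = 0.0
    den = 0.0
    for b, (s1, s2) in zip(bulk, signature):
        d = s1 - s2
        num += (b - s2) * d
        den += d * d
    if den == 0.0:
        raise ValueError("signature columns are identical; fraction is not identifiable")
    return min(1.0, max(0.0, num / den))


# Hypothetical signature matrix: rows are genes, columns are two cell types.
signature = [
    (10.0, 1.0),
    (2.0, 8.0),
    (5.0, 5.0),
    (0.5, 6.0),
]
true_fractions = (0.3, 0.7)

bulk = mix(signature, true_fractions)            # simulated, noise-free bulk sample
estimated = estimate_two_type_fraction(bulk, signature)
print("estimated fractions:", estimated, 1.0 - estimated)
```

With noise-free simulated data the estimate matches the fraction used to build the mixture. Real data would need more cell types, noise handling and a constrained solver, and estimating cell type-specific expression (the part the abstract emphasizes) is a harder problem that this sketch does not attempt.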


2021 ◽  
Author(s):  
David Chisanga ◽  
Yang Liao ◽  
Wei Shi

Abstract Background: RNA sequencing is currently the method of choice for genome-wide profiling of gene expression. A popular approach to quantify expression levels of genes from RNA-seq data is to map reads to a reference genome and then count mapped reads to each gene. Gene annotation data, which include chromosomal coordinates of exons for tens of thousands of genes, are required for this quantification process. There are several major sources of gene annotations that can be used for quantification, such as Ensembl and RefSeq databases. However, there is very little understanding of the effect that the choice of annotation has on the accuracy of gene expression quantification in an RNA-seq analysis. Results: In this paper, we present results from our comparison of Ensembl and RefSeq human annotations on their impact on gene expression quantification using a benchmark RNA-seq dataset generated by the SEquencing Quality Control (SEQC) consortium. We show that the use of RefSeq gene annotation models led to better quantification accuracy, based on the correlation with ground truths including expression data from >800 real-time PCR validated genes, known titration ratios of gene expression and microarray expression data. We also found that the recent expansion of the RefSeq annotation has led to a decrease in its annotation accuracy. Finally, we demonstrated that the RNA-seq quantification differences observed between different annotations were not affected by the use of different normalization methods. Conclusion: In conclusion, our study found that the use of the conservative RefSeq gene annotation yields better RNA-seq quantification results than the more comprehensive Ensembl annotation. We also found that, surprisingly, the recent expansion of the RefSeq database, which was primarily driven by the incorporation of sequencing data into the gene annotation process, resulted in a reduction in the accuracy of RNA-seq quantification.
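The count-based quantification described in the abstract above (map reads, then count the reads that fall on each gene's exons) can be sketched as follows. This is a toy illustration of why the choice of annotation can change gene counts, not the pipeline used in the study. The gene names, coordinates and reads are hypothetical, and leaving reads that hit more than one gene uncounted is an assumption of this sketch.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals [start, end] share at least one base."""
    return a_start <= b_end and b_start <= a_end


def count_reads(reads, annotation):
    """Count mapped reads per gene.

    reads: list of (chrom, start, end) tuples for mapped reads.
    annotation: dict mapping gene name -> list of (chrom, start, end) exons.
    A read is assigned to a gene if it overlaps any exon of that gene.
    Reads that overlap exons of more than one gene are treated as
    ambiguous and are not counted (an assumption of this sketch).
    """
    counts = {gene: 0 for gene in annotation}
    for chrom, start, end in reads:
        hits = {
            gene
            for gene, exons in annotation.items()
            if any(c == chrom and overlaps(start, end, s, e) for c, s, e in exons)
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1
    return counts


# Hypothetical mapped reads.
reads = [
    ("chr1", 100, 150),
    ("chr1", 180, 230),
    ("chr1", 400, 450),
    ("chr1", 900, 950),
]

# Two hypothetical annotations of the same locus: the second adds a gene
# whose first exon overlaps an exon of GENE_A.
conservative = {"GENE_A": [("chr1", 90, 250), ("chr1", 380, 500)]}
comprehensive = {
    "GENE_A": [("chr1", 90, 250), ("chr1", 380, 500)],
    "GENE_B": [("chr1", 200, 260), ("chr1", 880, 1000)],
}

counts_conservative = count_reads(reads, conservative)
counts_comprehensive = count_reads(reads, comprehensive)
print(counts_conservative)
print(counts_comprehensive)
```

In this toy case the same reads give GENE_A a lower count under the larger annotation, because one read becomes ambiguous once an overlapping gene model is added. Whether such effects improve or degrade accuracy on real data is an empirical question of the kind the study addresses with benchmark data.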


2021 ◽  
Author(s):  
David Chisanga ◽  
Yang Liao ◽  
Wei Shi

RNA sequencing is currently the method of choice for genome-wide profiling of gene expression. A popular approach to quantify expression levels of genes from RNA-seq data is to map reads to a reference genome and then count mapped reads to each gene. Gene annotation data, which include chromosomal coordinates of exons for tens of thousands of genes, are required for this quantification process. There are several major sources of gene annotations that can be used for quantification, such as Ensembl and RefSeq databases. However, there is very little understanding of the effect that the choice of annotation has on the accuracy of gene expression quantification in an RNA-seq analysis. In this paper, we present results from our comparison of Ensembl and RefSeq human annotations on their impact on gene expression quantification using a benchmark RNA-seq dataset generated by the SEquencing Quality Control (SEQC) consortium. We show that the use of RefSeq gene annotation models led to better quantification accuracy, based on the correlation with ground truths including expression data from >800 real-time PCR validated genes, known titration ratios of gene expression and microarray expression data. We also found that the recent expansion of the RefSeq annotation has led to a decrease in its annotation accuracy. Finally, we demonstrated that the RNA-seq quantification differences observed between different annotations were not affected by the use of different normalization methods.


2017 ◽  
Vol 7 (7) ◽  
pp. 2227-2234 ◽  
Author(s):  
Yasuaki Takada ◽  
Ryutaro Miyagi ◽  
Aya Takahashi ◽  
Toshinori Endo ◽  
Naoki Osada

Abstract Joint quantification of genetic and epigenetic effects on gene expression is important for understanding the establishment of complex gene regulation systems in living organisms. In particular, genomic imprinting and maternal effects play important roles in the developmental process of mammals and flowering plants. However, the influence of these effects on gene expression is difficult to quantify because they act simultaneously with cis-regulatory mutations. Here we propose a simple method to decompose cis-regulatory (i.e., allelic genotype), genomic imprinting [i.e., parent-of-origin (PO)], and maternal [i.e., maternal genotype (MG)] effects on allele-specific gene expression using RNA-seq data obtained from reciprocal crosses. We evaluated the efficiency of the method using a simulated dataset and applied the method to whole-body Drosophila and mouse trophoblast stem cell (TSC) and liver RNA-seq data. Consistent with previous studies, we found little evidence of PO and MG effects in adult Drosophila samples. In contrast, we identified dozens and hundreds of mouse genes with significant PO and MG effects, respectively. Interestingly, a similar number of genes with significant PO effects were detected in mouse TSCs and livers, whereas more genes with significant MG effects were observed in livers. Further application of this method will clarify how these three effects influence gene expression levels in different tissues and developmental stages, and provide novel insight into the evolution of gene expression regulation.
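The reason reciprocal crosses help separate these effects can be illustrated with a minimal sketch. It assumes a simple additive model on the log2 scale with only two terms, a cis effect and a parent-of-origin (PO) effect. It is not the method proposed in the abstract above, and it deliberately leaves out the maternal genotype (MG) effect. The function name, strain labels and ratios are hypothetical.

```python
import math


def decompose_cis_and_po(ratio_cross1, ratio_cross2):
    """Split allelic expression ratios from reciprocal crosses into two effects.

    ratio_cross1: expression of allele A divided by allele B in offspring of
                  mother A x father B (allele A is maternal).
    ratio_cross2: the same A/B ratio in offspring of mother B x father A
                  (allele A is paternal).

    Assumed model (log2 scale):
        log2(ratio_cross1) = cis + po
        log2(ratio_cross2) = cis - po
    where cis > 0 means allele A is more highly expressed regardless of
    parent, and po > 0 means the maternally inherited allele is favored.
    """
    r1 = math.log2(ratio_cross1)
    r2 = math.log2(ratio_cross2)
    cis = (r1 + r2) / 2
    po = (r1 - r2) / 2
    return cis, po


# Hypothetical genes with different patterns of allelic imbalance.
cis_only = decompose_cis_and_po(2.0, 2.0)   # A favored in both crosses
po_only = decompose_cis_and_po(2.0, 0.5)    # maternal allele favored in both
both = decompose_cis_and_po(4.0, 1.0)       # mixture of the two effects

print("cis only:", cis_only)
print("PO only:", po_only)
print("both:", both)
```

A single cross cannot distinguish the first two cases, since both show a twofold A/B ratio when allele A is maternal. The second cross breaks the tie because swapping the parents flips the sign of the PO term but not the cis term.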


2019 ◽  
Author(s):  
Reto Caldelari ◽  
Sunil Dogga ◽  
Marc W. Schmid ◽  
Blandine Franke-Fayard ◽  
Chris J Janse ◽  
...  

Summary The complex life cycle of malaria parasites requires well-orchestrated stage specific gene expression. In the vertebrate host the parasites grow and multiply by schizogony in two different environments: within erythrocytes and within hepatocytes. Whereas erythrocytic parasites are rather well-studied in this respect, relatively little is known about the exo-erythrocytic stages. In an attempt to fill this gap, we performed genome wide RNA-seq analyses of various exo-erythrocytic stages of Plasmodium berghei including sporozoites, samples from a time-course of liver stage development and detached cells, which contain infectious merozoites and represent the final step in exo-erythrocytic development. The analysis represents the completion of the transcriptome of the entire life cycle of P. berghei parasites with temporal detailed analysis of the liver stage allowing segmentation of the transcriptome across the progression of the life cycle. We have used these RNA-seq data from different developmental stages to cluster genes with similar expression profiles, in order to infer their functions. A comparison with published data of other parasite stages confirmed stage-specific gene expression and revealed numerous genes that are expressed differentially in blood and exo-erythrocytic stages. One of the most exo-erythrocytic stage-specific genes was PBANKA_1003900, which has previously been annotated as a “gametocyte specific protein”. The promoter of this gene drove high GFP expression in exo-erythrocytic stages, confirming its expression profile seen by RNA-seq. The comparative analysis of the genome wide mRNA expression profiles of erythrocytic and different exo-erythrocytic stages improves our understanding of gene regulation of Plasmodium parasites and can be used to model exo-erythrocytic stage metabolic networks and identify differences in metabolic processes during schizogony in erythrocytes and hepatocytes.


2019 ◽  
Vol 18 (1) ◽  
Author(s):  
Reto Caldelari ◽  
Sunil Dogga ◽  
Marc W. Schmid ◽  
Blandine Franke-Fayard ◽  
Chris J. Janse ◽  
...  

Abstract Background The complex life cycle of malaria parasites requires well-orchestrated stage specific gene expression. In the vertebrate host the parasites grow and multiply by schizogony in two different environments: within erythrocytes and within hepatocytes. Whereas erythrocytic parasites are well-studied in this respect, relatively little is known about the exo-erythrocytic stages. Methods In an attempt to fill this gap, genome wide RNA-seq analyses of various exo-erythrocytic stages of Plasmodium berghei including sporozoites, samples from a time-course of liver stage development and detached cells were performed. These latter contain infectious merozoites and represent the final step in exo-erythrocytic development. Results The analysis represents the complete transcriptome of the entire life cycle of P. berghei parasites with temporal detailed analysis of the liver stage allowing comparison of gene expression across the progression of the life cycle. These RNA-seq data from different developmental stages were used to cluster genes with similar expression profiles, in order to infer their functions. A comparison with published data from other parasite stages confirmed stage-specific gene expression and revealed numerous genes that are expressed differentially in blood and exo-erythrocytic stages. One of the most exo-erythrocytic stage-specific genes was PBANKA_1003900, which has previously been annotated as a “gametocyte specific protein”. The promoter of this gene drove high GFP expression in exo-erythrocytic stages, confirming its expression profile seen by RNA-seq. 
Conclusions The comparative analysis of the genome wide mRNA expression profiles of erythrocytic and different exo-erythrocytic stages could be used to improve the understanding of gene regulation in Plasmodium parasites and can be used to model exo-erythrocytic stage metabolic networks toward the identification of differences in metabolic processes during schizogony in erythrocytes and hepatocytes.

