PennSeq: accurate isoform-specific gene expression quantification in RNA-Seq by modeling non-uniform read distribution

Cell type-specific gene expression (CSE) brings novel biological insights into both physiological and pathological processes compared with bulk tissue gene expression. Although fluorescence-activated cell sorting (FACS) and single-cell RNA sequencing (scRNA-seq) are two widely used techniques to detect gene expression in a cell type-specific manner, constraints of cost and labor make them impractical for routine use on large patient cohorts. Here, we present ENIGMA, an algorithm that deconvolutes bulk RNA-seq into cell type-specific expression matrices and cell type fraction matrices without the need for physical sorting or sequencing of single cells. ENIGMA uses a cell type signature matrix generated from either FACS RNA-seq or scRNA-seq as a reference, and applies a matrix completion technique to achieve fast and accurate deconvolution. We demonstrated that ENIGMA outperformed previously published algorithms (TCA, bMIND and CIBERSORTx) while requiring much less running time on both simulated and realistic datasets. To demonstrate its value in biological discovery, we applied ENIGMA to bulk RNA-seq from arthritis patients and revealed a pseudo-differentiation trajectory that could reflect the monocyte-to-macrophage transition. We also applied ENIGMA to bulk RNA-seq data of pancreatic islet tissue from type 2 diabetes (T2D) patients and discovered a beta cell-specific gene co-expression module related to senescence and apoptosis that possibly contributed to the pathogenesis of T2D. Together, ENIGMA provides a new framework to improve CSE estimation by integrating FACS RNA-seq and scRNA-seq with tissue bulk RNA-seq data, and will extend our understanding of cell heterogeneity at the population level with no need for experimental tissue disaggregation.


Impact of Gene Annotation Choice on the Quantification of RNA-Seq Data

10.21203/rs.3.rs-421080/v1 ◽

2021 ◽

Author(s):

David Chisanga ◽

Yang Liao ◽

Wei Shi

Keyword(s):

Gene Expression ◽

Gene Annotation ◽

Expression Data ◽

Refseq Gene ◽

Rna Seq ◽

Sequencing Data ◽

Microarray Expression Data ◽

Sequencing Quality ◽

Gene Expression Quantification ◽

Expression Quantification

Abstract Background: RNA sequencing is currently the method of choice for genome-wide profiling of gene expression. A popular approach to quantify expression levels of genes from RNA-seq data is to map reads to a reference genome and then count mapped reads to each gene. Gene annotation data, which include chromosomal coordinates of exons for tens of thousands of genes, are required for this quantification process. There are several major sources of gene annotations that can be used for quantification, such as Ensembl and RefSeq databases. However, there is very little understanding of the effect that the choice of annotation has on the accuracy of gene expression quantification in an RNA-seq analysis. Results: In this paper, we present results from our comparison of Ensembl and RefSeq human annotations on their impact on gene expression quantification using a benchmark RNA-seq dataset generated by the SEquencing Quality Control (SEQC) consortium. We show that the use of RefSeq gene annotation models led to better quantification accuracy, based on the correlation with ground truths including expression data from >800 real-time PCR validated genes, known titration ratios of gene expression and microarray expression data. We also found that the recent expansion of the RefSeq annotation has led to a decrease in its annotation accuracy. Finally, we demonstrated that the RNA-seq quantification differences observed between different annotations were not affected by the use of different normalization methods. Conclusion: In conclusion, our study found that the use of the conservative RefSeq gene annotation yields better RNA-seq quantification results than the more comprehensive Ensembl annotation. We also found that, surprisingly, the recent expansion of the RefSeq database, which was primarily driven by the incorporation of sequencing data into the gene annotation process, resulted in a reduction in the accuracy of RNA-seq quantification.
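The counting step described in this abstract (assigning mapped reads to genes using exon coordinates from an annotation) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `count_reads`, the toy coordinates, and the rule that reads overlapping more than one gene are set aside as ambiguous are all assumptions made for illustration. It shows how adding a gene model to an annotation can change the count of a neighbouring gene.

```python
from collections import Counter

def count_reads(exons, reads):
    """Count reads per gene (hypothetical helper, not from the paper).

    exons: list of (gene, chrom, start, end), 1-based inclusive coordinates.
    reads: list of (chrom, start, end) for already-mapped reads.
    Assumed rule: a read is counted for a gene only if it overlaps exons
    of exactly one gene; otherwise it is unassigned or ambiguous.
    """
    counts = Counter()
    for chrom, r_start, r_end in reads:
        # Collect every gene with at least one exon overlapping the read.
        hits = {gene for gene, e_chrom, e_start, e_end in exons
                if e_chrom == chrom and r_start <= e_end and e_start <= r_end}
        if len(hits) == 1:
            counts[hits.pop()] += 1
        elif not hits:
            counts["__unassigned"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts

# Toy data (invented): annotation_b adds an overlapping gene model.
annotation_a = [("GENE1", "chr1", 100, 200), ("GENE1", "chr1", 300, 400)]
annotation_b = annotation_a + [("GENE2", "chr1", 180, 260)]
reads = [("chr1", 110, 160), ("chr1", 190, 240),
         ("chr1", 310, 360), ("chr1", 500, 550)]

counts_a = count_reads(annotation_a, reads)
counts_b = count_reads(annotation_b, reads)
print(dict(counts_a))
print(dict(counts_b))
```

In this toy case the read spanning positions 190 to 240 is counted for GENE1 under the smaller annotation but becomes ambiguous once the overlapping GENE2 model is added, so the GENE1 count depends on the annotation used.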


Impact of gene annotation choice on the quantification of RNA-seq data

10.1101/2021.01.07.425794 ◽

2021 ◽

Author(s):

David Chisanga ◽

Yang Liao ◽

Wei Shi

Keyword(s):

Gene Expression ◽

Gene Annotation ◽

Expression Data ◽

Rna Seq ◽

Microarray Expression Data ◽

Refseq Annotation ◽

Sequencing Quality ◽

Gene Expression Quantification ◽

Microarray Expression ◽

Expression Quantification

RNA sequencing is currently the method of choice for genome-wide profiling of gene expression. A popular approach to quantify expression levels of genes from RNA-seq data is to map reads to a reference genome and then count mapped reads to each gene. Gene annotation data, which include chromosomal coordinates of exons for tens of thousands of genes, are required for this quantification process. There are several major sources of gene annotations that can be used for quantification, such as Ensembl and RefSeq databases. However, there is very little understanding of the effect that the choice of annotation has on the accuracy of gene expression quantification in an RNA-seq analysis. In this paper, we present results from our comparison of Ensembl and RefSeq human annotations on their impact on gene expression quantification using a benchmark RNA-seq dataset generated by the SEquencing Quality Control (SEQC) consortium. We show that the use of RefSeq gene annotation models led to better quantification accuracy, based on the correlation with ground truths including expression data from >800 real-time PCR validated genes, known titration ratios of gene expression and microarray expression data. We also found that the recent expansion of the RefSeq annotation has led to a decrease in its annotation accuracy. Finally, we demonstrated that the RNA-seq quantification differences observed between different annotations were not affected by the use of different normalization methods.


A Generalized Linear Model for Decomposing Cis-regulatory, Parent-of-Origin, and Maternal Effects on Allele-Specific Gene Expression

G3 Genes|Genome|Genetics ◽

10.1534/g3.117.042895 ◽

2017 ◽

Vol 7 (7) ◽

pp. 2227-2234 ◽

Cited By ~ 2

Author(s):

Yasuaki Takada ◽

Ryutaro Miyagi ◽

Aya Takahashi ◽

Toshinori Endo ◽

Naoki Osada

Keyword(s):

Gene Expression ◽

Maternal Effects ◽

Genomic Imprinting ◽

Whole Body ◽

Specific Gene ◽

Rna Seq ◽

Simple Method ◽

Specific Gene Expression ◽

Allele Specific ◽

Parent Of Origin

Abstract Joint quantification of genetic and epigenetic effects on gene expression is important for understanding the establishment of complex gene regulation systems in living organisms. In particular, genomic imprinting and maternal effects play important roles in the developmental process of mammals and flowering plants. However, the influence of these effects on gene expression is difficult to quantify because they act simultaneously with cis-regulatory mutations. Here we propose a simple method to decompose cis-regulatory (i.e., allelic genotype), genomic imprinting [i.e., parent-of-origin (PO)], and maternal [i.e., maternal genotype (MG)] effects on allele-specific gene expression using RNA-seq data obtained from reciprocal crosses. We evaluated the efficiency of the method using a simulated dataset and applied the method to whole-body Drosophila and mouse trophoblast stem cell (TSC) and liver RNA-seq data. Consistent with previous studies, we found little evidence of PO and MG effects in adult Drosophila samples. In contrast, we identified dozens and hundreds of mouse genes with significant PO and MG effects, respectively. Interestingly, a similar number of genes with significant PO effect were detected in mouse TSCs and livers, whereas more genes with significant MG effect were observed in livers. Further application of this method will clarify how these three effects influence gene expression levels in different tissues and developmental stages, and provide novel insight into the evolution of gene expression regulation.


Transcriptome analysis of Plasmodium berghei during exo-erythrocytic development

10.1101/543207 ◽

2019 ◽

Author(s):

Reto Caldelari ◽

Sunil Dogga ◽

Marc W. Schmid ◽

Blandine Franke-Fayard ◽

Chris J Janse ◽

...

Keyword(s):

Gene Expression ◽

Life Cycle ◽

Plasmodium Berghei ◽

Expression Profiles ◽

Specific Gene ◽

Rna Seq ◽

Erythrocytic Stage ◽

Liver Stage ◽

Specific Gene Expression ◽

Genome Wide

Summary: The complex life cycle of malaria parasites requires well-orchestrated stage-specific gene expression. In the vertebrate host the parasites grow and multiply by schizogony in two different environments: within erythrocytes and within hepatocytes. Whereas erythrocytic parasites are rather well-studied in this respect, relatively little is known about the exo-erythrocytic stages. In an attempt to fill this gap, we performed genome-wide RNA-seq analyses of various exo-erythrocytic stages of Plasmodium berghei including sporozoites, samples from a time-course of liver stage development and detached cells, which contain infectious merozoites and represent the final step in exo-erythrocytic development. The analysis represents the completion of the transcriptome of the entire life cycle of P. berghei parasites with temporal detailed analysis of the liver stage allowing segmentation of the transcriptome across the progression of the life cycle. We have used these RNA-seq data from different developmental stages to cluster genes with similar expression profiles, in order to infer their functions. A comparison with published data of other parasite stages confirmed stage-specific gene expression and revealed numerous genes that are expressed differentially in blood and exo-erythrocytic stages. One of the most exo-erythrocytic stage-specific genes was PBANKA_1003900, which has previously been annotated as a “gametocyte specific protein”. The promoter of this gene drove high GFP expression in exo-erythrocytic stages, confirming its expression profile seen by RNA-seq. The comparative analysis of the genome-wide mRNA expression profiles of erythrocytic and different exo-erythrocytic stages improves our understanding of gene regulation of Plasmodium parasites and can be used to model exo-erythrocytic stage metabolic networks and identify differences in metabolic processes during schizogony in erythrocytes and hepatocytes.


Principles of transcriptome analysis and gene expression quantification: an RNA-seq tutorial

Molecular Ecology Resources ◽

10.1111/1755-0998.12109 ◽

2013 ◽

Vol 13 (4) ◽

pp. 559-572 ◽

Cited By ~ 104

Author(s):

Jochen B. W. Wolf

Keyword(s):

Gene Expression ◽

Transcriptome Analysis ◽

Rna Seq ◽

Gene Expression Quantification ◽

Expression Quantification


Transcriptome analysis of Plasmodium berghei during exo-erythrocytic development

Malaria Journal ◽

10.1186/s12936-019-2968-7 ◽

2019 ◽

Vol 18 (1) ◽

Cited By ~ 10

Author(s):

Reto Caldelari ◽

Sunil Dogga ◽

Marc W. Schmid ◽

Blandine Franke-Fayard ◽

Chris J. Janse ◽

...

Keyword(s):

Gene Expression ◽

Life Cycle ◽

Plasmodium Berghei ◽

Expression Profiles ◽

Specific Gene ◽

Rna Seq ◽

Erythrocytic Stage ◽

Liver Stage ◽

Specific Gene Expression ◽

Genome Wide

Abstract Background: The complex life cycle of malaria parasites requires well-orchestrated stage-specific gene expression. In the vertebrate host the parasites grow and multiply by schizogony in two different environments: within erythrocytes and within hepatocytes. Whereas erythrocytic parasites are well-studied in this respect, relatively little is known about the exo-erythrocytic stages. Methods: In an attempt to fill this gap, genome-wide RNA-seq analyses of various exo-erythrocytic stages of Plasmodium berghei including sporozoites, samples from a time-course of liver stage development and detached cells were performed. These latter contain infectious merozoites and represent the final step in exo-erythrocytic development. Results: The analysis represents the complete transcriptome of the entire life cycle of P. berghei parasites with temporal detailed analysis of the liver stage allowing comparison of gene expression across the progression of the life cycle. These RNA-seq data from different developmental stages were used to cluster genes with similar expression profiles, in order to infer their functions. A comparison with published data from other parasite stages confirmed stage-specific gene expression and revealed numerous genes that are expressed differentially in blood and exo-erythrocytic stages. One of the most exo-erythrocytic stage-specific genes was PBANKA_1003900, which has previously been annotated as a “gametocyte specific protein”. The promoter of this gene drove high GFP expression in exo-erythrocytic stages, confirming its expression profile seen by RNA-seq. Conclusions: The comparative analysis of the genome-wide mRNA expression profiles of erythrocytic and different exo-erythrocytic stages could be used to improve the understanding of gene regulation in Plasmodium parasites and can be used to model exo-erythrocytic stage metabolic networks toward the identification of differences in metabolic processes during schizogony in erythrocytes and hepatocytes.
