LiBiNorm: an htseq-count analogue with improved normalisation of Smart-seq2 data and library preparation diagnostics

Protocols for preparing RNA sequencing (RNA-seq) libraries, most prominently “Smart-seq” variations, introduce global biases that can have a significant impact on the quantification of gene expression levels. This global bias can lead to drastic over- or under-representation of RNA in a non-linear, length-dependent fashion due to enzymatic reactions during cDNA production. It is currently not corrected by any RNA-seq software; existing tools mostly focus on local bias in coverage along RNAs. This paper describes LiBiNorm, a simple command-line program that mimics the popular htseq-count software and allows diagnostics, quantification, and global bias removal. LiBiNorm outputs gene expression data that have been normalized to correct for the global bias introduced by the Smart-seq2 protocol. In addition, it produces data and several plots that give insight into the experimental history underlying library preparation. The LiBiNorm package includes an R script for visualizing the main results. LiBiNorm is the first software application to correct for this global bias. It is freely downloadable at http://www2.warwick.ac.uk/fac/sci/lifesci/research/libinorm.

A Computational Framework for Prediction and Analysis of Cancer Signaling Dynamics from RNA Sequencing Data—Application to the ErbB Receptor Signaling Pathway

Cancers ◽ 10.3390/cancers12102878 ◽ 2020 ◽ Vol 12 (10) ◽ pp. 2878

Author(s): Hiroaki Imoto ◽ Suxiang Zhang ◽ Mariko Okada

Keyword(s): Breast Cancer ◽ Gene Expression ◽ Cancer Cell ◽ Gene Expression Data ◽ Cancer Cell Line ◽ ErbB Receptor ◽ Expression Data ◽ RNA-Seq ◽ Signaling Dynamics ◽ Gene Expression Levels

A current challenge in systems biology is to predict dynamic properties of cell behaviors from public information such as gene expression data. The temporal dynamics of signaling molecules are critical for mammalian cell commitment. We hypothesized that gene expression levels are tightly linked with, and quantitatively control, the dynamics of signaling networks regardless of the cell type. Based on this idea, we developed a computational method to predict signaling dynamics from RNA sequencing (RNA-seq) gene expression data. We first constructed an ordinary differential equation model of ErbB receptor → c-Fos induction using a newly developed modeling platform, BioMASS. The kinetic parameters of the model were trained against multiple breast cancer cell lines, using autologous RNA-seq data obtained from the Cancer Cell Line Encyclopedia (CCLE) as the initial values of the model components. After parameter optimization, the model was used for prediction in another, untrained breast cancer cell line. Having learned its parameters from the other cell lines, the model accurately predicted the dynamics of the untrained cell line using only its gene expression data. Our study suggests that gene expression levels of components within the ErbB network, rather than rate constants, can explain cell-specific signaling dynamics and therefore play an important role in regulating cell fate.
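
The central idea of the abstract, shared rate constants with cell-specific initial values taken from expression data, can be illustrated with a toy model. The sketch below is not the BioMASS model and does not use its API: the single activation/deactivation reaction, the rate constants `k_act` and `k_deact`, and the expression values are all hypothetical choices made for illustration.

```python
def simulate_active_receptor(total_receptor, k_act=0.5, k_deact=0.1,
                             t_end=60.0, dt=0.01):
    """Forward-Euler integration of a one-species toy ODE:

        d(active)/dt = k_act * (total - active) - k_deact * active

    total_receptor stands in for an expression-derived initial amount;
    k_act and k_deact are hypothetical rate constants shared by all cells.
    """
    active = 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        inactive = total_receptor - active
        active += dt * (k_act * inactive - k_deact * active)
    return active


# Two hypothetical cell lines that differ only in receptor expression.
low = simulate_active_receptor(10.0)
high = simulate_active_receptor(100.0)
print(f"low-expression line:  {low:.3f}")
print(f"high-expression line: {high:.3f}")
```

With identical rate constants, the two simulated cell lines differ only because their initial amounts differ, which is the kind of behavior the study attributes to expression levels rather than to rate constants.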

High heterogeneity undermines generalization of differential expression results in RNA-Seq analysis

Human Genomics ◽ 10.1186/s40246-021-00308-5 ◽ 2021 ◽ Vol 15 (1)

Author(s): Weitong Cui ◽ Huaru Xue ◽ Lei Wei ◽ Jinghua Jin ◽ Xuewen Tian ◽ ...

Keyword(s): Gene Expression ◽ Differential Expression ◽ Small Sample ◽ Differentially Expressed ◽ Cancer Type ◽ RNA-Seq ◽ Sample Sizes ◽ Large Sample ◽ Expression Levels ◽ Gene Expression Levels

Background: RNA sequencing (RNA-Seq) has been widely applied in oncology for monitoring transcriptome changes. However, the emerging problem that high variation of gene expression levels caused by tumor heterogeneity may affect the reproducibility of differential expression (DE) results has rarely been studied. Here, we investigated the reproducibility of DE results for any given number of biological replicates between 3 and 24 and explored why a great many differentially expressed genes (DEGs) were not reproducible. Results: Our findings demonstrate that poor reproducibility of DE results exists not only for small sample sizes, but also for relatively large sample sizes. Quite a few of the DEGs detected are specific to the samples in use, rather than genuinely differentially expressed under different conditions. Poor reproducibility of DE results is mainly caused by high variation of gene expression levels for the same gene in different samples. Even though biological variation may account for much of the high variation of gene expression levels, the effect of outlier count data also needs to be treated seriously, as outlier data severely interfere with DE analysis. Conclusions: High heterogeneity exists not only in tumor tissue samples of each cancer type studied, but also in normal samples. High heterogeneity leads to poor reproducibility of DEGs, undermining generalization of differential expression results. Therefore, it is necessary to use large sample sizes (at least 10 if possible) in RNA-Seq experimental designs to reduce the impact of biological variability, and DE results should be interpreted cautiously unless soundly validated.
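
One simple way to put a number on the reproducibility of DEG lists is to compare the lists obtained from repeated subsamples of the same cohort. The sketch below uses the mean pairwise Jaccard index; this metric is an assumption chosen for illustration and is not necessarily the measure used in the study, and the gene identifiers are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene sets; defined as 1.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_overlap(deg_sets):
    """Average Jaccard index over all pairs of DEG lists."""
    pairs = list(combinations(deg_sets, 2))
    if not pairs:
        raise ValueError("need at least two DEG lists")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical DEG lists from three random subsamples of one cohort.
runs = [{"G1", "G2", "G3"}, {"G2", "G3", "G4"}, {"G3", "G5"}]
score = mean_pairwise_overlap(runs)
print(f"mean pairwise Jaccard: {score:.3f}")
```

A score near 1 means the subsamples agree on which genes are differentially expressed; a score near 0 means most DEGs are specific to the samples drawn.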

Comparing RNA-Seq and microarray gene expression data in two zones of the Arabidopsis root apex relevant to spaceflight

Applications in Plant Sciences ◽ 10.1002/aps3.1197 ◽ 2018 ◽ Vol 6 (11) ◽ pp. e01197 ◽ Cited By ~ 3

Author(s): Aparna Krishnamurthy ◽ Robert J. Ferl ◽ Anna-Lisa Paul

Keyword(s): Gene Expression ◽ Gene Expression Data ◽ Root Apex ◽ Microarray Gene Expression Data ◽ Expression Data ◽ RNA-Seq ◽ Microarray Gene Expression ◽ Arabidopsis Root ◽ Microarray Gene

IRIS-EDA: An integrated RNA-Seq interpretation system for gene expression data analysis

PLoS Computational Biology ◽ 10.1371/journal.pcbi.1006792 ◽ 2019 ◽ Vol 15 (2) ◽ pp. e1006792 ◽ Cited By ~ 11

Author(s): Brandon Monier ◽ Adam McDermaid ◽ Cankun Wang ◽ Jing Zhao ◽ Allison Miller ◽ ...

Keyword(s): Gene Expression ◽ Data Analysis ◽ Gene Expression Data ◽ Expression Data ◽ RNA-Seq ◽ Gene Expression Data Analysis ◽ Interpretation System

Exploring gene expression levels in Pancreatic Ductal Adenocarcinoma (PDAC) using RNA-Seq data

2018 International Conference on Bioinformatics and Systems Biology (BSB) ◽ 10.1109/bsb.2018.8770567 ◽ 2018

Author(s): Alokita Jaiswal ◽ Imlimaong Aier

Keyword(s): Gene Expression ◽ Pancreatic Ductal Adenocarcinoma ◽ Ductal Adenocarcinoma ◽ RNA-Seq ◽ Expression Levels ◽ Gene Expression Levels

Identifying inaccuracies in gene expression estimates from unstranded RNA-seq data

Scientific Reports ◽ 10.1038/s41598-019-52584-w ◽ 2019 ◽ Vol 9 (1) ◽ Cited By ~ 1

Author(s): Mikhail Pomaznoy ◽ Ashu Sethi ◽ Jason Greenbaum ◽ Bjoern Peters

Keyword(s): Gene Expression ◽ Differential Expression Analysis ◽ Cell Types ◽ Library Preparation ◽ RNA-Seq ◽ Protein Coding ◽ Protein Coding Genes ◽ Machine Learning Model ◽ Specific Manner ◽ Library Preparation Protocol

RNA-seq methods are widely utilized for transcriptomic profiling of biological samples. However, this technology has known caveats that can skew gene expression estimates. Specifically, if the library preparation protocol does not retain RNA strand information, then some genes can be erroneously quantitated. Although strand-specific protocols have been established, a significant portion of RNA-seq data is generated in a non-strand-specific manner. We used a comprehensive stranded RNA-seq dataset of 15 blood cell types to identify genes for which expression would be erroneously estimated if strand information was not available. We found that about 10% of all genes and 2.5% of protein-coding genes have a two-fold or higher difference in estimated expression when the strand information of the reads is ignored. We used parameters of the read alignments of these genes to construct a machine learning model that can identify which genes in an unstranded dataset might have incorrect expression estimates and which ones do not. We also show that differential expression analysis of genes with biased expression estimates in unstranded read data can be recovered by limiting the reads considered to those which span exonic boundaries. The resulting approach is implemented as a package available at https://github.com/mikpom/uslcount.
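
The two-fold criterion mentioned in the abstract can be sketched as a comparison of per-gene estimates computed with and without strand information. This is only an illustration of the thresholding step, not the uslcount package or its machine learning model; the function name, the pseudocount, and the example counts are hypothetical.

```python
import math


def flag_strand_sensitive(stranded, unstranded, min_fold=2.0, pseudocount=1.0):
    """Return genes whose unstranded estimate differs from the stranded one
    by at least min_fold in either direction.

    The pseudocount (an assumption of this sketch) keeps the ratio defined
    for genes with zero counts.
    """
    threshold = math.log2(min_fold)
    flagged = []
    for gene, s in stranded.items():
        u = unstranded.get(gene)
        if u is None:
            continue  # gene missing from the unstranded table
        log_fc = math.log2((u + pseudocount) / (s + pseudocount))
        if abs(log_fc) >= threshold:
            flagged.append(gene)
    return sorted(flagged)


# Hypothetical counts for three genes, quantified both ways.
stranded = {"geneA": 99, "geneB": 49, "geneC": 0}
unstranded = {"geneA": 109, "geneB": 99, "geneC": 3}
flagged = flag_strand_sensitive(stranded, unstranded)
print(flagged)
```

In this toy input, `geneB` and `geneC` gain reads when strand is ignored, as would happen for a gene overlapped by a transcript on the opposite strand, while `geneA` changes by only about 10%.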

Association of intronic DNA methylation and hydroxymethylation alterations in the epigenetic etiology of dilated cardiomyopathy

AJP Heart and Circulatory Physiology ◽ 10.1152/ajpheart.00758.2018 ◽ 2019 ◽ Vol 317 (1) ◽ pp. H168-H180 ◽ Cited By ~ 2

Author(s): Ali M. Tabish ◽ Mohammed Arif ◽ Taejeong Song ◽ Zaher Elbeck ◽ Richard C. Becker ◽ ...

Keyword(s): Gene Expression ◽ DNA Methylation ◽ Dilated Cardiomyopathy ◽ Cardiac Remodeling ◽ RNA-Seq ◽ Wild Type ◽ KEGG Pathways ◽ Wild Type Phenotype ◽ Gene Expression Levels

In this study, we investigated the role of DNA methylation [5-methylcytosine (5mC)] and 5-hydroxymethylcytosine (5hmC), epigenetic modifications that regulate gene activity, in dilated cardiomyopathy (DCM). A MYBPC3 mutant mouse model of DCM was compared with wild type and used to profile genomic 5mC and 5hmC changes by ChIP-seq, and gene expression levels were analyzed by RNA-seq. Both 5mC-altered genes (957) and 5hmC-altered genes (2,022) were identified in DCM hearts. Diverse gene ontology and KEGG pathways were enriched for DCM phenotypes, such as inflammation, tissue fibrosis, cell death, cardiac remodeling, cardiomyocyte growth, and differentiation, as well as sarcomere structure. Hierarchical clustering of mapped genes affected by 5mC and 5hmC clearly differentiated DCM from the wild-type phenotype. Based on these data, we propose that genome-wide 5mC and 5hmC contents may play a major role in DCM pathogenesis. NEW & NOTEWORTHY: Our data demonstrate that development of dilated cardiomyopathy in mice is associated with significant epigenetic changes, specifically in intronic regions, which, when combined with gene expression profiling data, highlight key signaling pathways involved in pathological cardiac remodeling and heart contractile dysfunction.

Abstract P3-04-10: Comparison between RNA-Seq and Affymetrix gene expression data

10.1158/0008-5472.sabcs12-p3-04-10 ◽ 2012 ◽ Cited By ~ 1

Author(s): D Fumagalli ◽ B Haibe-Kains ◽ S Michiels ◽ DN Brown ◽ D Gacquer ◽ ...

Keyword(s): Gene Expression ◽ Gene Expression Data ◽ Expression Data ◽ RNA-Seq ◽ Affymetrix Gene Expression

Transcriptome Analysis Reveals the Molecular Mechanisms Underlying Adenosine Biosynthesis in Anamorph Strain of Caterpillar Fungus

BioMed Research International ◽ 10.1155/2019/1864168 ◽ 2019 ◽ Vol 2019 ◽ pp. 1-12

Author(s): Shan Lin ◽ Zhicheng Zou ◽ Cuibing Zhou ◽ Hancheng Zhang ◽ Zhiming Cai

Keyword(s): Gene Expression ◽ Transcriptome Analysis ◽ Molecular Mechanisms ◽ Purine Metabolism ◽ Expression Data ◽ RNA-Seq ◽ Metabolism Pathway ◽ Caterpillar Fungus ◽ Regulated Gene Expression ◽ Late Stages

Caterpillar fungus is a well-known fungal Chinese medicine. To reveal molecular changes during the early and late stages of adenosine biosynthesis, transcriptome analysis was performed with the anamorph strain of caterpillar fungus. A total of 2,764 differentially expressed genes (DEGs) were identified (p ≤ 0.05, |log2 Ratio| ≥ 1), of which 1,737 were up-regulated and 1,027 were down-regulated. Gene expression profiling at 4–10 d revealed a distinct shift in expression of the purine metabolism pathway. The differential expression of 17 selected DEGs involved in purine metabolism (map00230) was validated by qPCR, and the expression trends were consistent with the RNA-Seq results. Subsequently, the predicted adenosine biosynthesis pathway, combined with the qPCR and RNA-Seq gene expression data, indicated that the increased adenosine accumulation is a result of down-regulation of the ndk, ADK, and APRT genes combined with up-regulation of the AK gene. This study will be valuable for understanding the molecular mechanisms of adenosine biosynthesis in caterpillar fungus.
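
The DEG criteria quoted in the abstract (p ≤ 0.05 and |log2 Ratio| ≥ 1) amount to a two-condition filter followed by a split on the sign of the ratio. The sketch below shows that filter on hypothetical values; the function name, the input layout, and the placeholder gene names are assumptions for illustration, not the authors' pipeline.

```python
def classify_degs(results, p_max=0.05, min_abs_log2=1.0):
    """Split genes into up- and down-regulated DEGs.

    results maps gene -> (log2_ratio, p_value). A gene is a DEG when
    p_value <= p_max and |log2_ratio| >= min_abs_log2.
    """
    up, down = [], []
    for gene, (log2_ratio, p_value) in results.items():
        if p_value <= p_max and abs(log2_ratio) >= min_abs_log2:
            if log2_ratio > 0:
                up.append(gene)
            else:
                down.append(gene)
    return sorted(up), sorted(down)


# Hypothetical test results; gene names are placeholders.
results = {
    "gene1": (1.8, 0.001),   # up-regulated DEG
    "gene2": (-1.0, 0.05),   # down-regulated DEG, exactly on both thresholds
    "gene3": (2.5, 0.20),    # large change but not significant
    "gene4": (0.4, 0.001),   # significant but small change
}
up, down = classify_degs(results)
print("up:", up)
print("down:", down)
```

Both thresholds are inclusive here, matching the "≤" and "≥" in the abstract.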

From whole-mount to single-cell spatial assessment of gene expression in 3D

Communications Biology ◽ 10.1038/s42003-020-01341-1 ◽ 2020 ◽ Vol 3 (1)

Author(s): Lisa N. Waylen ◽ Hieu T. Nim ◽ Luciano G. Martelotto ◽ Mirana Ramialison

Keyword(s): Gene Expression ◽ Cell Types ◽ Medical Diagnostics ◽ Three Dimensions ◽ Expression Data ◽ Developmental Defects ◽ Ongoing Research ◽ Spatially Resolved ◽ Spatio Temporal ◽ Gene Expression Levels

Unravelling spatio-temporal patterns of gene expression is crucial to understanding core biological principles from embryogenesis to disease. Here we review emerging technologies that provide automated, high-throughput, spatially resolved quantitative gene expression data. Novel techniques expand on current benchmark protocols, expediting their incorporation into ongoing research. These approaches digitally reconstruct patterns of embryonic expression in three dimensions, and have successfully identified novel domains of expression, cell types, and tissue features. Such technologies pave the way for unbiased and exhaustive recapitulation of gene expression levels in spatial and quantitative terms, promoting understanding of the molecular origin of developmental defects, and improving medical diagnostics.
