LiBiNorm: an htseq-count analogue with improved normalisation of Smart-seq2 data and library preparation diagnostics

PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6222
Author(s):  
Nigel P. Dyer ◽  
Vahid Shahrezaei ◽  
Daniel Hebenstreit

Protocols for preparing RNA sequencing (RNA-seq) libraries, most prominently “Smart-seq” variations, introduce global biases that can have a significant impact on the quantification of gene expression levels. This global bias can lead to drastic over- or under-representation of RNA in a non-linear, length-dependent fashion due to enzymatic reactions during cDNA production. It is currently not corrected by any RNA-seq software, which mostly focuses on local bias in coverage along RNAs. This paper describes LiBiNorm, a simple command line program that mimics the popular htseq-count software and allows diagnostics, quantification, and global bias removal. LiBiNorm outputs gene expression data that has been normalized to correct for global bias introduced by the Smart-seq2 protocol. In addition, it produces data and several plots that allow insights into the experimental history underlying library preparation. The LiBiNorm package includes an R script that allows visualization of the main results. LiBiNorm is the first software application to correct for the global bias that is introduced by the Smart-seq2 protocol. It is freely downloadable at http://www2.warwick.ac.uk/fac/sci/lifesci/research/libinorm.

Cancers ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 2878
Author(s):  
Hiroaki Imoto ◽  
Suxiang Zhang ◽  
Mariko Okada

A current challenge in systems biology is to predict dynamic properties of cell behaviors from public information such as gene expression data. The temporal dynamics of signaling molecules is critical for mammalian cell commitment. We hypothesized that gene expression levels are tightly linked with and quantitatively control the dynamics of signaling networks regardless of the cell type. Based on this idea, we developed a computational method to predict the signaling dynamics from RNA sequencing (RNA-seq) gene expression data. We first constructed an ordinary differential equation model of ErbB receptor → c-Fos induction using a newly developed modeling platform BioMASS. The model was trained with kinetic parameters against multiple breast cancer cell lines using autologous RNA-seq data obtained from the Cancer Cell Line Encyclopedia (CCLE) as the initial values of the model components. After parameter optimization, the model proceeded to prediction in another untrained breast cancer cell line. As a result, the model learned the parameters from other cells and was able to accurately predict the dynamics of the untrained cells using only the gene expression data. Our study suggests that gene expression levels of components within the ErbB network, rather than rate constants, can explain the cell-specific signaling dynamics, therefore playing an important role in regulating cell fate.


2021 ◽  
Vol 15 (1) ◽  
Author(s):  
Weitong Cui ◽  
Huaru Xue ◽  
Lei Wei ◽  
Jinghua Jin ◽  
Xuewen Tian ◽  
...  

Background: RNA sequencing (RNA-Seq) has been widely applied in oncology for monitoring transcriptome changes. However, the emerging problem that high variation of gene expression levels caused by tumor heterogeneity may affect the reproducibility of differential expression (DE) results has rarely been studied. Here, we investigated the reproducibility of DE results for any given number of biological replicates between 3 and 24 and explored why a great many differentially expressed genes (DEGs) were not reproducible. Results: Our findings demonstrate that poor reproducibility of DE results exists not only for small sample sizes, but also for relatively large sample sizes. Quite a few of the DEGs detected are specific to the samples in use, rather than genuinely differentially expressed under different conditions. Poor reproducibility of DE results is mainly caused by high variation of gene expression levels for the same gene in different samples. Even though biological variation may account for much of the high variation of gene expression levels, the effect of outlier count data also needs to be treated seriously, as outlier data severely interfere with DE analysis. Conclusions: High heterogeneity exists not only in tumor tissue samples of each cancer type studied, but also in normal samples. High heterogeneity leads to poor reproducibility of DEGs, undermining generalization of differential expression results. Therefore, it is necessary to use large sample sizes (at least 10 if possible) in RNA-Seq experimental designs to reduce the impact of biological variability, and DE results should be interpreted cautiously unless soundly validated.


2019 ◽  
Vol 15 (2) ◽  
pp. e1006792 ◽  
Author(s):  
Brandon Monier ◽  
Adam McDermaid ◽  
Cankun Wang ◽  
Jing Zhao ◽  
Allison Miller ◽  
...  

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Mikhail Pomaznoy ◽  
Ashu Sethi ◽  
Jason Greenbaum ◽  
Bjoern Peters

RNA-seq methods are widely utilized for transcriptomic profiling of biological samples. However, this technology has known caveats that can skew gene expression estimates. Specifically, if the library preparation protocol does not retain RNA strand information, then some genes can be erroneously quantitated. Although strand-specific protocols have been established, a significant portion of RNA-seq data is generated in a non-strand-specific manner. We used a comprehensive stranded RNA-seq dataset of 15 blood cell types to identify genes for which expression would be erroneously estimated if strand information was not available. We found that about 10% of all genes and 2.5% of protein coding genes have a two-fold or higher difference in estimated expression when strand information of the reads was ignored. We used parameters of read alignments of these genes to construct a machine learning model that can identify which genes in an unstranded dataset might have incorrect expression estimates and which ones do not. We also show that differential expression analysis of genes with biased expression estimates in unstranded read data can be recovered by limiting the reads considered to those which span exonic boundaries. The resulting approach is implemented as a package available at https://github.com/mikpom/uslcount.
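The two-fold criterion mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the uslcount implementation: the function name `strand_bias_flag`, the pseudocount of 1, and the example counts are all assumptions made for illustration only.

```python
import math


def strand_bias_flag(stranded, unstranded, fold=2.0, pseudocount=1.0):
    """Return True if two expression estimates differ by at least `fold`.

    `stranded` is the estimate using strand information; `unstranded` is the
    estimate for the same gene when strand information is ignored.
    The pseudocount (an assumption, not from the paper) avoids division by zero.
    """
    ratio = (unstranded + pseudocount) / (stranded + pseudocount)
    # Compare on the log2 scale so that over- and under-estimation are symmetric.
    return abs(math.log2(ratio)) >= math.log2(fold)


# Hypothetical counts, not data from the study.
flag_overlap = strand_bias_flag(stranded=100, unstranded=250)   # antisense reads inflate the estimate
flag_clean = strand_bias_flag(stranded=100, unstranded=120)     # small difference only
flag_silent = strand_bias_flag(stranded=0, unstranded=50)       # signal only from the opposite strand
```

Under these assumed inputs, the first and third genes would be flagged and the second would not.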


2019 ◽  
Vol 317 (1) ◽  
pp. H168-H180 ◽  
Author(s):  
Ali M. Tabish ◽  
Mohammed Arif ◽  
Taejeong Song ◽  
Zaher Elbeck ◽  
Richard C. Becker ◽  
...  

In this study, we investigated the role of DNA methylation [5-methylcytosine (5mC)] and 5-hydroxymethylcytosine (5hmC), epigenetic modifications that regulate gene activity, in dilated cardiomyopathy (DCM). A MYBPC3 mutant mouse model of DCM was compared with wild type and used to profile genomic 5mC and 5hmC changes by ChIP-seq, and gene expression levels were analyzed by RNA-seq. Both 5mC-altered genes (957) and 5hmC-altered genes (2,022) were identified in DCM hearts. Diverse gene ontology and KEGG pathways were enriched for DCM phenotypes, such as inflammation, tissue fibrosis, cell death, cardiac remodeling, cardiomyocyte growth, and differentiation, as well as sarcomere structure. Hierarchical clustering of mapped genes affected by 5mC and 5hmC clearly differentiated DCM from wild-type phenotype. Based on these data, we propose that genomewide 5mC and 5hmC contents may play a major role in DCM pathogenesis. NEW & NOTEWORTHY: Our data demonstrate that development of dilated cardiomyopathy in mice is associated with significant epigenetic changes, specifically in intronic regions, which, when combined with gene expression profiling data, highlight key signaling pathways involved in pathological cardiac remodeling and heart contractile dysfunction.


Author(s):  
D Fumagalli ◽  
B Haibe-Kains ◽  
S Michiels ◽  
DN Brown ◽  
D Gacquer ◽  
...  

2019 ◽  
Vol 2019 ◽  
pp. 1-12
Author(s):  
Shan Lin ◽  
Zhicheng Zou ◽  
Cuibing Zhou ◽  
Hancheng Zhang ◽  
Zhiming Cai

Caterpillar fungus is a well-known fungal Chinese medicine. To reveal molecular changes during early and late stages of adenosine biosynthesis, transcriptome analysis was performed with the anamorph strain of caterpillar fungus. A total of 2,764 differentially expressed genes (DEGs) were identified (p ≤ 0.05, |log2 Ratio| ≥ 1), of which 1,737 were up-regulated and 1,027 were down-regulated. Gene expression profiling at 4–10 d revealed a distinct shift in expression of the purine metabolism pathway. Differential expression of 17 selected DEGs involved in purine metabolism (map00230) was validated by qPCR, and the expression trends were consistent with the RNA-Seq results. Subsequently, the predicted adenosine biosynthesis pathway, combined with qPCR and RNA-Seq gene expression data, indicated that the increased adenosine accumulation is a result of down-regulation of the ndk, ADK, and APRT genes combined with up-regulation of the AK gene. This study will be valuable for understanding the molecular mechanisms of adenosine biosynthesis in caterpillar fungus.
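The DEG thresholds quoted in this abstract (p ≤ 0.05, |log2 Ratio| ≥ 1) can be expressed as a short Python sketch. This only illustrates the filtering rule; it is not the authors' pipeline, and the function name `classify_deg` and the gene values are hypothetical.

```python
def classify_deg(p_value, log2_ratio, p_cutoff=0.05, lfc_cutoff=1.0):
    """Return 'up', 'down', or None for one gene under the stated thresholds."""
    # A gene must pass both the significance and the effect-size cutoff.
    if p_value > p_cutoff or abs(log2_ratio) < lfc_cutoff:
        return None
    return "up" if log2_ratio > 0 else "down"


# Hypothetical example values, not data from the study.
genes = {
    "geneA": (0.001, 2.3),   # significant, strongly increased
    "geneB": (0.04, -1.0),   # significant, exactly at the fold-change cutoff
    "geneC": (0.20, 3.0),    # large change but not significant
    "geneD": (0.01, 0.5),    # significant but change too small
}
calls = {name: classify_deg(p, lfc) for name, (p, lfc) in genes.items()}
```

With these assumed inputs, only geneA and geneB count as DEGs, one up-regulated and one down-regulated.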


2020 ◽  
Vol 3 (1) ◽  
Author(s):  
Lisa N. Waylen ◽  
Hieu T. Nim ◽  
Luciano G. Martelotto ◽  
Mirana Ramialison

Unravelling spatio-temporal patterns of gene expression is crucial to understanding core biological principles from embryogenesis to disease. Here we review emerging technologies, providing automated, high-throughput, spatially resolved quantitative gene expression data. Novel techniques expand on current benchmark protocols, expediting their incorporation into ongoing research. These approaches digitally reconstruct patterns of embryonic expression in three dimensions, and have successfully identified novel domains of expression, cell types, and tissue features. Such technologies pave the way for unbiased and exhaustive recapitulation of gene expression levels in spatial and quantitative terms, promoting understanding of the molecular origin of developmental defects, and improving medical diagnostics.

