ABioTrans: A Biostatistical tool for Transcriptomics Analysis

2019 ◽  
Author(s):  
Zou Yutong ◽  
Bui Thuy Tien ◽  
Kumar Selvarajoo

Abstract Here we report a bio-statistical/informatics tool, ABioTrans, developed in R for gene expression analysis. The tool allows the user to directly read RNA-Seq data files deposited in the Gene Expression Omnibus (GEO) database. Operated using any web browser application, ABioTrans provides easy options for multiple statistical distribution fitting, Pearson and Spearman rank correlations, PCA, k-means and hierarchical clustering, differential expression analysis, Shannon entropy and noise (square of coefficient of variation) analyses, as well as Gene Ontology classifications.

Availability and implementation ABioTrans is available at https://github.com/buithuytien/ABioTrans
Operating system(s): Platform independent (web browser)
Programming language: R (RStudio)
Other requirements: Bioconductor genome wide annotation databases, R-packages (shiny, LSD, fitdistrplus, actuar, entropy, moments, RUVSeq, edgeR, DESeq2, NOISeq, AnnotationDbi, ComplexHeatmap, circlize, clusterProfiler, reshape2, DT, plotly, shinycssloaders, dplyr, ggplot2). These packages are installed automatically when ABioTrans.R is executed in RStudio.
There is no restriction on use by non-academics.
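The two summary statistics named in the abstract, Shannon entropy and noise (squared coefficient of variation), can be illustrated with a minimal Python sketch. This is not ABioTrans code (the tool itself is written in R). The histogram binning, the use of base-2 logarithms, the sample (n - 1) variance, and the function names `shannon_entropy` and `noise_cv2` are all assumptions made for illustration.

```python
import math


def shannon_entropy(values, bins=10):
    """Shannon entropy (in bits) of an equal-width histogram of the values.

    The number of bins is an assumption; the abstract does not state
    how expression values are discretised.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # all values identical: a single occupied bin
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value falls on the right edge, so clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def noise_cv2(values):
    """Noise as the squared coefficient of variation: variance / mean**2."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance (assumption)
    return var / mean ** 2


if __name__ == "__main__":
    expression = [2.0, 4.0, 6.0]  # hypothetical expression values
    print(noise_cv2(expression))
    print(shannon_entropy(expression, bins=3))
```

Squared CV is dimensionless, which makes it comparable across genes with very different mean expression.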

BMC Genomics ◽  
2020 ◽  
Vol 21 (S11) ◽  
Author(s):  
Yingying Cao ◽  
Simo Kitanovski ◽  
Daniel Hoffmann

Abstract Background RNA-Seq, the high-throughput sequencing (HT-Seq) of mRNAs, has become an essential tool for characterizing gene expression differences between different cell types and conditions. Gene expression is regulated by several mechanisms, including epigenetically by post-translational histone modifications, which can be assessed by ChIP-Seq (Chromatin Immuno-Precipitation Sequencing). As more and more biological samples are analyzed by the combination of ChIP-Seq and RNA-Seq, the integrated analysis of the corresponding data sets becomes, theoretically, a unique option to study gene regulation. However, technically such analyses are still in their infancy. Results Here we introduce intePareto, a computational tool for the integrative analysis of RNA-Seq and ChIP-Seq data. With intePareto we match RNA-Seq and ChIP-Seq data at the level of genes, perform differential expression analysis between biological conditions, and prioritize genes with consistent changes in RNA-Seq and ChIP-Seq data using Pareto optimization. Conclusion intePareto facilitates comprehensive understanding of high dimensional transcriptomic and epigenomic data. Its superiority to a naive differential gene expression analysis with RNA-Seq and to an available integrative approach is demonstrated by analyzing a public dataset.
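Prioritizing genes by Pareto optimization can be sketched generically as peeling off successive non-dominated fronts. The following Python sketch is not intePareto's implementation. The two objectives per gene are hypothetical scores (larger is better), since the abstract does not define the objectives actually used, and the names `dominates` and `pareto_fronts` are illustrative.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b in every objective
    and strictly better in at least one (larger is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_fronts(scores):
    """Group genes into successive Pareto fronts.

    scores: dict mapping gene name -> tuple of objective values.
    Returns a list of fronts; the first front holds the genes that no
    other gene dominates, the second front is the same after removing
    the first, and so on.
    """
    remaining = dict(scores)
    fronts = []
    while remaining:
        front = [
            gene for gene, s in remaining.items()
            if not any(dominates(t, s) for other, t in remaining.items() if other != gene)
        ]
        fronts.append(sorted(front))
        for gene in front:
            del remaining[gene]
    return fronts


if __name__ == "__main__":
    # Hypothetical (RNA-Seq score, ChIP-Seq consistency score) per gene.
    example = {
        "geneA": (3.0, 2.5),
        "geneB": (2.0, 3.0),
        "geneC": (1.0, 1.0),
        "geneD": (2.5, 2.0),
    }
    print(pareto_fronts(example))
```

Ranking by front avoids collapsing the objectives into a single weighted score, so no weighting between the data types has to be chosen in advance.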


Oncotarget ◽  
2017 ◽  
Vol 8 (65) ◽  
pp. 108392-108405 ◽  
Author(s):  
Qi-Lin Zhang ◽  
Zheng-Qing Xie ◽  
Ming-Zhong Liang ◽  
Bang Luo ◽  
Xiu-Qiang Wang ◽  
...  

2017 ◽  
Author(s):  
Alemu Takele Assefa ◽  
Katrijn De Paepe ◽  
Celine Everaert ◽  
Pieter Mestdagh ◽  
Olivier Thas ◽  
...  

Abstract Background Protein-coding RNAs (mRNA) have been the primary target of most transcriptome studies in the past, but in recent years, attention has expanded to include long non-coding RNAs (lncRNA). lncRNAs are typically expressed at low levels, and are inherently highly variable. This is a fundamental challenge for differential expression (DE) analysis. In this study, the performance of 14 popular tools for testing DE in RNA-seq data along with their normalization methods is comprehensively evaluated, with a particular focus on lncRNAs and low abundant mRNAs. Results Thirteen performance metrics were used to evaluate DE tools and normalization methods using simulations and analyses of six diverse RNA-seq datasets. Non-parametric procedures are used to simulate gene expression data in such a way that realistic levels of expression and variability are preserved in the simulated data. Throughout the assessment, we kept track of the results for mRNA and lncRNA separately. All statistical models exhibited inferior performance for lncRNAs compared to mRNAs across all simulated scenarios and analysis of benchmark RNA-seq datasets. No single tool uniformly outperformed the others. Conclusion Overall, the linear modeling with empirical Bayes moderation (limma) and the nonparametric approach (SAMSeq) showed best performance: good control of the false discovery rate (FDR) and reasonable sensitivity. However, for achieving a sensitivity of at least 50%, more than 80 samples are required when studying expression levels in realistic clinical settings such as cancer research. About half of the methods showed severe excess of false discoveries, making these methods unreliable for differential expression analysis and jeopardizing reproducible science. The detailed results of our study can be consulted through a user-friendly web application, http://statapps.ugent.be/tools/AppDGE/
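The two headline metrics, false discovery rate and sensitivity, can be made concrete for a single simulated dataset where the truly DE genes are known. The Python sketch below is an illustration only and is not taken from the study. The gene identifiers and the function name `fdp_and_sensitivity` are hypothetical. Strictly, one run yields the false discovery proportion; the FDR is its expectation over repeated simulations.

```python
def fdp_and_sensitivity(called, truly_de):
    """Return (false discovery proportion, sensitivity) for one simulation run.

    called:   genes a DE tool reported as differentially expressed
    truly_de: genes that are differentially expressed by construction
    """
    called, truly_de = set(called), set(truly_de)
    true_pos = len(called & truly_de)
    false_pos = len(called - truly_de)
    # Convention (assumption): with no calls, the proportion of false discoveries is 0.
    fdp = false_pos / len(called) if called else 0.0
    sensitivity = true_pos / len(truly_de) if truly_de else 0.0
    return fdp, sensitivity


if __name__ == "__main__":
    called = ["g1", "g2", "g3", "g4"]          # hypothetical tool output
    truth = ["g1", "g2", "g5", "g6", "g7"]     # hypothetical simulated truth
    print(fdp_and_sensitivity(called, truth))
```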


2021 ◽  
Author(s):  
Anish M.S. Shrestha ◽  
Joyce Emlyn B. Guiao ◽  
Kyle Christian R. Santiago

Abstract RNA-seq is being increasingly adopted for gene expression studies in a panoply of non-model organisms, with applications spanning the fields of agriculture, aquaculture, ecology, and environment. Conventional differential expression analysis for organisms without reference sequences requires performing computationally expensive and error-prone de novo transcriptome assembly, followed by homology search against a high-confidence protein database for functional annotation. We propose a shortcut, where we obtain counts for differential expression analysis by directly aligning RNA-seq reads to the protein database. Through experiments on simulated and real data, we show drastic reductions in run-time and memory usage, with no loss in accuracy. A Snakemake implementation of our workflow is available at: https://bitbucket.org/project_samar/samar


2015 ◽  
Vol 9 ◽  
pp. BBI.S30884 ◽  
Author(s):  
Peter R. LoVerso ◽  
Feng Cui

RNA sequencing (RNA-seq) has revolutionized transcriptome analysis through profiling the expression of thousands of genes at the same time. Systematic analysis of orthologous transcripts across species is critical for understanding the evolution of gene expression and uncovering important information in animal models of human diseases. Several computational methods have been published for analyzing gene expression between species, but they often lack crucial details and therefore cannot serve as a practical guide. Here, we present the first step-by-step protocol for cross-species RNA-seq analysis with a concise workflow that is largely based on the free open-source R language and Bioconductor packages. This protocol covers the entire process from short-read mapping, gene expression quantification, differential expression analysis to pathway enrichment. Many useful utilities for data visualization are included. This complete and easy-to-follow protocol provides hands-on guidance for users who are new to cross-species gene expression analysis.


GigaScience ◽  
2021 ◽  
Vol 10 (3) ◽  
Author(s):  
Holly C Beale ◽  
Jacquelyn M Roger ◽  
Matthew A Cattle ◽  
Liam T McKay ◽  
Drew K A Thompson ◽  
...  

Abstract Background The reproducibility of gene expression measured by RNA sequencing (RNA-Seq) is dependent on the sequencing depth. While unmapped or non-exonic reads do not contribute to gene expression quantification, duplicate reads contribute to the quantification but are not informative for reproducibility. We show that mapped, exonic, non-duplicate (MEND) reads are a useful measure of reproducibility of RNA-Seq datasets used for gene expression analysis. Findings In bulk RNA-Seq datasets from 2,179 tumors in 48 cohorts, the fraction of reads that contribute to the reproducibility of gene expression analysis varies greatly. Unmapped reads constitute 1–77% of all reads (median [IQR], 3% [3–6%]); duplicate reads constitute 3–100% of mapped reads (median [IQR], 27% [13–43%]); and non-exonic reads constitute 4–97% of mapped, non-duplicate reads (median [IQR], 25% [16–37%]). MEND reads constitute 0–79% of total reads (median [IQR], 50% [30–61%]). Conclusions Because not all reads in an RNA-Seq dataset are informative for reproducibility of gene expression measurements and the fraction of reads that are informative varies, we propose reporting a dataset's sequencing depth in MEND reads, which definitively inform the reproducibility of gene expression, rather than total, mapped, or exonic reads. We provide a Docker image containing (i) the existing required tools (RSeQC, sambamba, and samblaster) and (ii) a custom script to calculate MEND reads from RNA-Seq data files. We recommend that all RNA-Seq gene expression experiments, sensitivity studies, and depth recommendations use MEND units for sequencing depth.
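The percentages in the Findings use different denominators (all reads, mapped reads, mapped non-duplicate reads), which is easy to misread. The Python sketch below works through that arithmetic from four read counts. It is not the authors' script, which relies on RSeQC, sambamba, and samblaster. The function name `mend_summary` and the example counts are hypothetical.

```python
def mend_summary(total, mapped, mapped_non_dup, mend):
    """Express read categories as fractions, using the denominators from the abstract.

    total:          all reads in the dataset
    mapped:         reads that mapped to the reference
    mapped_non_dup: mapped reads remaining after duplicate removal
    mend:           mapped, exonic, non-duplicate reads
    """
    if not (total >= mapped >= mapped_non_dup >= mend >= 0) or total == 0:
        raise ValueError("counts must be nested: total >= mapped >= mapped_non_dup >= mend")
    return {
        # Unmapped reads as a fraction of all reads.
        "unmapped_of_total": (total - mapped) / total,
        # Duplicate reads as a fraction of mapped reads.
        "duplicate_of_mapped": (mapped - mapped_non_dup) / mapped if mapped else 0.0,
        # Non-exonic reads as a fraction of mapped, non-duplicate reads.
        "non_exonic_of_mapped_non_dup": (
            (mapped_non_dup - mend) / mapped_non_dup if mapped_non_dup else 0.0
        ),
        # MEND reads as a fraction of all reads.
        "mend_of_total": mend / total,
    }


if __name__ == "__main__":
    # Hypothetical counts for one sample.
    summary = mend_summary(
        total=100_000_000, mapped=95_000_000,
        mapped_non_dup=70_000_000, mend=50_000_000,
    )
    for name, value in summary.items():
        print(f"{name}: {value:.1%}")
```

In this hypothetical sample, a nominal depth of 100 million reads corresponds to 50 million MEND reads, which is the figure the authors propose reporting.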


2021 ◽  
Author(s):  
Aedan G. K. Roberts ◽  
Daniel R. Catchpoole ◽  
Paul J. Kennedy

Abstract Background Differential expression analysis of RNA-seq data has advanced rapidly since the introduction of the technology, and methods such as edgeR and DESeq2 have become standard parts of analysis pipelines. However, there is a growing body of research showing that differences in variability of gene expression or overall differences in the distribution of expression values (differential distribution) are also important both in normal biology and in diseases including cancer. Genes whose expression differs in distribution without a difference in mean expression level are ignored by differential expression methods. Results We have developed a Bayesian hierarchical model which improves on existing methods for identifying differential dispersion in RNA-seq data, and provides an overall test for differential distribution. We have applied these methods to investigate differential dispersion and distribution in cancer using RNA-seq datasets from The Cancer Genome Atlas. Our results show that differential dispersion and distribution are able to identify cancer-related genes. Further, we find that differential dispersion identifies cancer-related genes that are missed by differential expression analysis, and that differential expression and differential dispersion identify functionally distinct sets of genes. Conclusion This work highlights the importance of considering changes beyond differences in mean in the analysis of gene expression data, and suggests that analysis of expression variability may provide insights into genetic aspects of cancer that would not be revealed by differential expression analysis alone. For identification of cancer-related genes, differential distribution analysis allows the identification of genes whose expression is disrupted in terms of either mean or variability.
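The point that a gene can change in variability without changing in mean can be shown with a toy Python example. It is not the Bayesian hierarchical model described in the abstract. The sample variance is used here as a crude stand-in for dispersion, and the expression values and the function name `mean_and_variance_shift` are hypothetical.

```python
from statistics import mean, variance


def mean_and_variance_shift(group_a, group_b):
    """Return (difference in means, ratio of sample variances) for one gene.

    A mean difference near 0 with a variance ratio far from 1 is the
    pattern that a mean-based differential expression test would miss.
    """
    return mean(group_b) - mean(group_a), variance(group_b) / variance(group_a)


if __name__ == "__main__":
    normal = [9.0, 10.0, 10.0, 11.0]   # hypothetical expression, tightly regulated
    tumour = [2.0, 6.0, 14.0, 18.0]    # same mean, far more variable
    print(mean_and_variance_shift(normal, tumour))
```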


2021 ◽  
Author(s):  
Mengqi Zhang ◽  
Si Liu ◽  
Zhen Miao ◽  
Fang Han ◽  
Raphael Gottardo ◽  
...  

Bulk RNA-seq data quantify the expression of a gene in an individual by one number (e.g., fragment count). In contrast, single cell RNA-seq (scRNA-seq) data provide much richer information: the distribution of gene expression across many cells. To assess differential expression across individuals using scRNA-seq data, a straightforward solution is to create "pseudo" bulk RNA-seq data by adding up the fragment counts of a gene across cells for each individual, and then apply methods designed for differential expression using bulk RNA-seq data. This pseudo-bulk solution reduces the distribution of gene expression across cells to a single number and thus loses a good amount of information. We propose to assess differential expression using the gene expression distribution measured by cell level data. We find that denoising cell level data can substantially improve the power of this approach. We apply our method, named IDEAS (Individual level Differential Expression Analysis for scRNA-seq), to study the gene expression difference between autism subjects and controls. We find neurogranin-expressing neurons harbor a high proportion of differentially expressed genes, and ERBB signals in microglia are associated with autism.
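The pseudo-bulk baseline described above, summing a gene's fragment counts across all cells of each individual, can be sketched in a few lines of Python. This illustrates only the baseline and not the IDEAS method. The data layout (nested dictionaries), the identifiers, and the function name `pseudo_bulk` are assumptions for illustration.

```python
from collections import defaultdict


def pseudo_bulk(cell_counts, cell_to_individual):
    """Sum per-cell counts into one count per (individual, gene).

    cell_counts:        dict cell_id -> dict gene -> fragment count
    cell_to_individual: dict cell_id -> individual_id
    """
    bulk = defaultdict(lambda: defaultdict(int))
    for cell, counts in cell_counts.items():
        individual = cell_to_individual[cell]
        for gene, count in counts.items():
            bulk[individual][gene] += count
    # Convert nested defaultdicts to plain dicts for a clean return value.
    return {ind: dict(genes) for ind, genes in bulk.items()}


if __name__ == "__main__":
    # Hypothetical data: two individuals, two cells each, one gene.
    cells = {
        "c1": {"geneX": 0}, "c2": {"geneX": 10},   # individual 1: bimodal across cells
        "c3": {"geneX": 5}, "c4": {"geneX": 5},    # individual 2: uniform across cells
    }
    owner = {"c1": "ind1", "c2": "ind1", "c3": "ind2", "c4": "ind2"}
    print(pseudo_bulk(cells, owner))
```

In this example both individuals receive the same pseudo-bulk count of 10 even though their per-cell distributions differ, which is the information loss the abstract refers to.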

