BMDx: a graphical Shiny application to perform Benchmark Dose analysis for transcriptomics data

Angela Serra; Laura Aliisa Saarimäki; Michele Fratello; Veer Singh Marwah; Dario Greco

doi:10.1093/bioinformatics/btaa030

Bioinformatics, 2020, Vol 36 (9), pp. 2932-2933

Cited By ~ 3

Keyword(s): Gene Expression; Information Criterion; Benchmark Dose; Functional Enrichment; Supplementary Information; Gene Expression Matrix; R Shiny; Transcriptomics Data; Dose Dependent; Shiny Application

Abstract

Motivation: The analysis of dose-dependent effects on gene expression is gaining attention in toxicogenomics. Currently available computational methods are usually limited to specific omics platforms or biological annotations, and they can analyse only one experiment at a time.

Results: We developed BMDx, software with a graphical user interface for the Benchmark Dose (BMD) analysis of transcriptomics data. Our approach fits multiple models and selects the optimal one according to the Akaike Information Criterion. BMDx takes as input a gene expression matrix and a phenotype table, and computes the BMD, its related values, and IC50/EC50 estimates. It reports interactive tables and plots that the user can inspect for further details on the fitting, dose effects and functional enrichment. BMDx enables fast and convenient comparison of the BMD values of a transcriptomics experiment across time points and makes the results easy to interpret. Furthermore, BMDx can analyse and compare multiple experiments at once.

Availability and implementation: BMDx is implemented as R/Shiny software and is available at https://github.com/Greco-Lab/BMDx/.

Supplementary information: Supplementary data are available at Bioinformatics online.
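To make the "fit several models, keep the best by AIC" step concrete, here is a minimal standard-library Python sketch for a single gene. It is an illustration under stated assumptions, not BMDx's implementation: the three dose transforms, the constant "no dose effect" model, the name `fit_bmd`, and the choice of the benchmark response (BMR) as one standard deviation of the control responses are all hypothetical choices made for this example.

```python
import math
from statistics import mean, stdev

# Hypothetical candidate shapes: each maps dose d to t(d) with t(0) == 0,
# paired with its inverse. These are NOT BMDx's actual model set.
MODELS = {
    "linear": (lambda d: d, lambda v: v),
    "log": (math.log1p, math.expm1),
    "sqrt": (math.sqrt, lambda v: v * v),
}


def ols(ts, ys):
    """Least-squares fit of y = a + b * t; returns (a, b, residual sum of squares)."""
    mt, my = mean(ts), mean(ys)
    stt = sum((t - mt) ** 2 for t in ts)
    b = sum((t - mt) * (y - my) for t, y in zip(ts, ys)) / stt
    a = my - b * mt
    rss = sum((y - a - b * t) ** 2 for t, y in zip(ts, ys))
    return a, b, rss


def aic(rss, n, k):
    """Gaussian-likelihood AIC up to an additive constant: n*ln(RSS/n) + 2k."""
    return n * math.log(max(rss, 1e-12) / n) + 2 * k


def fit_bmd(doses, responses):
    """Pick the lowest-AIC model for one gene and return its BMD (None if flat)."""
    n = len(doses)
    controls = [y for d, y in zip(doses, responses) if d == 0]
    bmr = stdev(controls)  # assumption: BMR = 1 SD of the control group
    mu = mean(responses)
    flat_rss = sum((y - mu) ** 2 for y in responses)
    best = {"model": "constant", "aic": aic(flat_rss, n, 2), "bmd": None}
    for name, (t, t_inv) in MODELS.items():
        _, b, rss = ols([t(d) for d in doses], responses)
        score = aic(rss, n, 3)  # intercept, slope and residual variance
        if score < best["aic"]:
            # f(BMD) - f(0) = b * t(BMD) must reach the BMR in magnitude.
            best = {"model": name, "aic": score, "bmd": t_inv(bmr / abs(b))}
    return best


doses = [0, 0, 0, 1, 1, 1, 10, 10, 10]
resp = [0.9, 1.0, 1.1, 1.4, 1.5, 1.6, 5.9, 6.0, 6.1]
result = fit_bmd(doses, resp)
print(result["model"], round(result["bmd"], 3))  # linear 0.2
```

Here the responses lie around `1 + 0.5 * dose`, so the linear shape has the lowest AIC and the BMD is the BMR (0.1) divided by the slope (0.5). Because the three non-constant shapes share the same parameter count, AIC only changes the ranking relative to the flat model; real BMD tools typically compare models of differing complexity.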

WASP: a versatile, web-accessible single cell RNA-Seq processing platform

BMC Genomics, 2021, Vol 22 (1)

doi:10.1186/s12864-021-07469-6

Author(s): Andreas Hoek; Katharina Maibach; Ebru Özmen; Ana Ivonne Vazquez-Armendariz; Jan Philipp Mengel; ...

Keyword(s): Gene Expression; Single Cell; Modular Design; Cellular Heterogeneity; Biological Research; Post Processing; Gene Expression Matrix; R Shiny; Initial Processing; Shiny Application

Abstract

Background: Single-cell RNA sequencing (scRNA-seq) has gained enormous popularity because it allows unprecedented insight into cellular heterogeneity as well as the identification and characterization of (sub-)cellular populations. Furthermore, scRNA-seq is applicable across almost all of medical and biological research. However, these new opportunities bring additional data-analysis challenges, because the required bioinformatic software demands advanced technical expertise.

Results: Here we present WASP, software for processing Drop-Seq-based scRNA-Seq data. It performs the initial processing of raw reads generated with the ddSEQ or 10x protocol and produces demultiplexed gene expression matrices together with quality metrics. The processing pipeline is implemented as a Snakemake workflow, and an R Shiny application is provided for interactive visualization of the results. WASP supports comprehensive analysis of gene expression matrices, including detection of differentially expressed genes, clustering of cellular populations and interactive graphical visualization. The R Shiny application works with gene expression matrices generated by the WASP pipeline as well as with externally provided data from other sources.

Conclusions: With WASP we provide an intuitive, easy-to-use tool to process and explore scRNA-seq data. To the best of our knowledge, it is currently the only freely available software package that combines pre- and post-processing of ddSEQ- and 10x-based data. Owing to its modular design, any gene expression matrix can be used with WASP’s post-processing R Shiny application. To simplify usage, WASP is provided as a Docker container. Alternatively, pre-processing can be run via Conda, and a standalone Windows version that requires only a web browser is available for post-processing.
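Demultiplexing ultimately turns tagged reads into a gene-by-cell expression matrix. The sketch below shows the common idea of counting distinct UMIs per gene and cell barcode, assuming reads have already been aligned and tagged as `(cell_barcode, umi, gene)` tuples. It is a hypothetical illustration, not WASP's Snakemake pipeline: the function name and the whitelist argument are invented for this example, and barcode/UMI error correction is skipped.

```python
from collections import defaultdict


def count_matrix(records, whitelist=None):
    """Collapse (cell_barcode, umi, gene) records into {gene: {cell: UMI count}}.

    Reads sharing the same cell, UMI and gene are treated as PCR duplicates
    and counted once. Cells outside `whitelist` (if given) are discarded.
    """
    umis = defaultdict(set)
    for cell, umi, gene in records:
        if whitelist is not None and cell not in whitelist:
            continue
        umis[(gene, cell)].add(umi)
    matrix = defaultdict(dict)
    for (gene, cell), seen in umis.items():
        matrix[gene][cell] = len(seen)
    return dict(matrix)


reads = [
    ("AAA", "u1", "G1"), ("AAA", "u1", "G1"),  # duplicate read, one UMI
    ("AAA", "u2", "G1"), ("CCC", "u1", "G1"),
    ("CCC", "u3", "G2"), ("TTT", "u9", "G2"),  # TTT is not a called cell
]
print(count_matrix(reads, whitelist={"AAA", "CCC"}))
# {'G1': {'AAA': 2, 'CCC': 1}, 'G2': {'CCC': 1}}
```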

RecBic: a fast and accurate algorithm recognizing trend-preserving biclusters

Bioinformatics, 2020, Vol 36 (20), pp. 5054-5060

doi:10.1093/bioinformatics/btaa630

Author(s): Xiangyu Liu; Di Li; Juntao Liu; Zhengchang Su; Guojun Li

Keyword(s): Gene Expression; Biological Data; Supplementary Information; Gene Expression Matrix; Real Gene; Powerful Approach; Number Of Genes; Functional Patterns; Robustness To Noise; Expression Matrix

Abstract

Motivation: Biclustering has emerged as a powerful approach to identifying functional patterns in complex biological data. However, existing tools are limited in the accuracy and efficiency with which they recognize the various kinds of complex biclusters hidden in ever-larger datasets. We introduce RecBic, a novel, fast and highly accurate algorithm for identifying various forms of complex biclusters in gene expression datasets.

Results: We designed RecBic to identify various trend-preserving biclusters, particularly those with narrow shapes, i.e. clusters in which the number of genes is larger than the number of conditions/samples. Given a gene expression matrix, RecBic starts with a column seed and grows it into a full-sized bicluster by repeatedly comparing real numbers. When tested on simulated datasets in which the elements of the implanted trend-preserving biclusters and those of the background matrix have the same distribution, RecBic identified the implanted biclusters almost perfectly, outperforming all the compared tools in accuracy and in robustness to noise and to overlaps between clusters. RecBic was also superior at identifying functionally related genes in real gene expression datasets.

Availability and implementation: Code, sample input data and usage instructions are available at the following websites. Code: https://github.com/holyzews/RecBic/tree/master/RecBic/. Data: http://doi.org/10.5281/zenodo.3842717.

Supplementary information: Supplementary data are available at Bioinformatics online.
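The phrase "grows a column seed by comparing real numbers" can be pictured with a toy greedy procedure. The sketch below is an assumption-laden simplification, not RecBic's algorithm: it only looks for strictly increasing trends, extends the seed one column at a time, and keeps a column as long as at least `min_rows` rows still follow the trend. The function names and the `min_rows` parameter are hypothetical.

```python
def is_trend_preserving(row, cols):
    """True if the row's values strictly increase along the column order `cols`."""
    vals = [row[c] for c in cols]
    return all(a < b for a, b in zip(vals, vals[1:]))


def grow_bicluster(matrix, seed, min_rows=2):
    """Greedily extend a column seed while at least `min_rows` rows keep the trend."""
    cols = list(seed)
    rows = [i for i, row in enumerate(matrix) if is_trend_preserving(row, cols)]
    remaining = [c for c in range(len(matrix[0])) if c not in cols]
    grew = True
    while grew:
        grew = False
        for c in remaining:
            # One comparison against the current last column suffices, because
            # every kept row already increases along `cols`.
            keep = [i for i in rows if matrix[i][cols[-1]] < matrix[i][c]]
            if len(keep) >= min_rows:
                cols.append(c)
                rows = keep
                remaining.remove(c)
                grew = True
                break  # restart the scan over the shrunken `remaining` list
    return rows, cols


matrix = [
    [1, 2, 3, 0],
    [2, 5, 9, 1],
    [3, 1, 4, 7],
    [0, 4, 8, 2],
]
print(grow_bicluster(matrix, seed=(0, 1)))  # ([0, 1, 3], [0, 1, 2])
```

Rows 0, 1 and 3 rise across columns 0, 1, 2 even though their absolute levels differ, which is what makes the pattern "trend-preserving" rather than "constant-valued".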

V-SVA: an R Shiny application for detecting and annotating hidden sources of variation in single-cell RNA-seq data

Bioinformatics, 2020, Vol 36 (11), pp. 3582-3584

doi:10.1093/bioinformatics/btaa128

Author(s): Nathan Lawlor; Eladio J Marquez; Donghyung Lee; Duygu Ucar

Keyword(s): Single Cell; Gene Annotation; Supplementary Information; Surrogate Variable Analysis; Batch Correction; Surrogate Variable; R Shiny; Sources Of Variation; Shiny Application; Variable Analysis

Abstract

Summary: Single-cell RNA-sequencing (scRNA-seq) technology enables the study of gene expression programs in individual cells. However, these data are subject to diverse sources of variation, including ‘unwanted’ variation that must be removed in downstream analyses (e.g. batch effects) and ‘wanted’ or biological variation (e.g. variation associated with a cell type) that must be precisely described. Surrogate variable analysis (SVA)-based algorithms are commonly used for batch correction and, more recently, for studying ‘wanted’ variation in scRNA-seq data. However, determining whether these variables are biologically meaningful or stem from technical factors remains a challenge. To facilitate the interpretation of surrogate variables detected by algorithms including IA-SVA, SVA or ZINB-WaVE, we developed an R Shiny application [Visual Surrogate Variable Analysis (V-SVA)] that provides a web-browser interface for identifying and annotating hidden sources of variation in scRNA-seq data. This interactive framework includes tools for discovering genes associated with detected sources of variation, annotating genes using publicly available databases and gene sets, and visualizing data with dimension reduction methods.

Availability and implementation: The V-SVA Shiny application is publicly hosted at https://vsva.jax.org/ and the source code is freely available at https://github.com/nlawlor/V-SVA.

Supplementary information: Supplementary data are available at Bioinformatics online.
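One way to picture "discovering genes associated with a detected source of variation" is to correlate each gene's per-cell expression with the surrogate variable and keep the strongest associations. The sketch below assumes plain Pearson correlation with an arbitrary cutoff (`threshold=0.9`); both the statistic and the function names are hypothetical, and V-SVA's own gene-selection procedure may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def genes_for_surrogate(expr, sv, threshold=0.9):
    """Genes whose expression across cells tracks surrogate variable `sv`.

    `expr` maps gene -> per-cell values in the same cell order as `sv`.
    Returns {gene: r} for |r| >= threshold, strongest association first.
    """
    scores = {g: pearson(v, sv) for g, v in expr.items()}
    kept = {g: r for g, r in scores.items() if abs(r) >= threshold}
    return dict(sorted(kept.items(), key=lambda kv: -abs(kv[1])))


expr = {"A": [1, 2, 3, 4], "B": [4, 3, 2, 1], "C": [1, 1, 1, 1], "D": [1, 3, 2, 4]}
print(genes_for_surrogate(expr, sv=[1, 2, 3, 4]))  # {'A': 1.0, 'B': -1.0}
```

Gene D (r = 0.8) falls below the cutoff and the constant gene C scores 0.0; in practice a significance test with multiple-testing correction would usually replace a fixed threshold.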

Genetic Neural Networks: an artificial neural network architecture for capturing gene expression relationships

Bioinformatics, 2018, Vol 35 (13), pp. 2226-2234

doi:10.1093/bioinformatics/bty945

Cited By ~ 5

Author(s): Ameen Eetemadi; Ilias Tagkopoulos

Keyword(s): Neural Network; Gene Expression; Neural Networks; Artificial Neural Network; Network Architecture; Gene Networks; Supplementary Information; Genome Wide; Transcriptomics Data; Artificial Neural

Abstract

Motivation: Gene expression prediction is one of the grand challenges in computational biology. The availability of transcriptomics data, combined with recent advances in artificial neural networks, provides an unprecedented opportunity to create predictive models of gene expression with far-reaching applications.

Results: We present the Genetic Neural Network (GNN), an artificial neural network for predicting genome-wide gene expression given gene knockouts and master regulator perturbations. At its core, the GNN maps existing gene regulatory information into its architecture and uses cell nodes specifically designed to capture the dependencies and non-linear dynamics that exist in gene networks. These two key features allow the GNN architecture to capture complex relationships without requiring large training datasets. As a result, GNNs were on average 40% more accurate than competing architectures (MLP, RNN, BiRNN) when compared on hundreds of curated and inferred transcription modules. Our results argue that GNNs can become the architecture of choice for building predictors of gene expression from the exponentially growing corpus of genome-wide transcriptomics data.

Availability and implementation: https://github.com/IBPA/GNN

Supplementary information: Supplementary data are available at Bioinformatics online.
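A simple way to picture "mapping gene regulatory information into the architecture" is a layer whose connections exist only along known regulator-to-target edges, so each target gene sees only its regulators. The sketch below assumes a single tanh layer, hypothetical gene names and hand-set weights; it does not reproduce the GNN's specially designed cell nodes, its depth, or any training.

```python
import math


def predict(expression, regulators, biases=None):
    """One masked layer: each target sees only its listed regulators.

    `regulators` maps target -> {regulator: weight}; genes not listed as a
    regulator of a target have no connection to it at all.
    """
    biases = biases or {}
    out = {}
    for target, weights in regulators.items():
        z = biases.get(target, 0.0) + sum(w * expression[r] for r, w in weights.items())
        out[target] = math.tanh(z)
    return out


def knockout(expression, gene):
    """Simulate a knockout by clamping the gene's input expression to zero."""
    perturbed = dict(expression)
    perturbed[gene] = 0.0
    return perturbed


net = {"g3": {"g1": 1.0, "g2": -1.0}}  # g1 activates g3, g2 represses it
baseline = {"g1": 0.5, "g2": 0.5}
print(predict(baseline, net))                    # {'g3': 0.0}
print(predict(knockout(baseline, "g2"), net))    # g3 rises to tanh(0.5)
```

Because the weight structure mirrors the regulatory graph, the number of free parameters grows with the number of known edges rather than with all gene pairs, which is one intuition for why such architectures can need less training data.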

Integrating biological knowledge and gene expression data using pathway-guided random forests: a benchmarking study

Bioinformatics, 2020, Vol 36 (15), pp. 4301-4308

doi:10.1093/bioinformatics/btaa483

Author(s): Stephan Seifert; Sven Gundlach; Olaf Junge; Silke Szymczak

Keyword(s): Gene Expression; Computational Models; Hybrid Approach; Disease Status; R Package; Gene Expression Omnibus; Functional Enrichment; Supplementary Information; Biological Knowledge; Functional Relationships

Abstract

Motivation: High-throughput technologies allow comprehensive characterization of individuals on many molecular levels. However, training computational models to predict disease status from omics data is challenging. A promising solution is to integrate external knowledge about structural and functional relationships into the modeling process. We compared four published random forest-based approaches using two simulation studies and nine experimental datasets.

Results: The self-sufficient prediction error approach should be applied when large numbers of relevant pathways are expected. The competing methods ‘hunting’ and ‘learner of functional enrichment’ should be used when low numbers of relevant pathways are expected or when the most strongly associated pathways are of interest. The hybrid approach ‘synthetic features’ is not recommended because of its high false discovery rate.

Availability and implementation: An R package providing functions for data analysis and simulation is available at GitHub (https://github.com/szymczak-lab/PathwayGuidedRF). An accompanying R data package (https://github.com/szymczak-lab/DataPathwayGuidedRF) stores the processed and quality-controlled experimental datasets downloaded from Gene Expression Omnibus (GEO).

Supplementary information: Supplementary data are available at Bioinformatics online.

Inferring TF activities and activity regulators from gene expression data with constraints from TF perturbation data

Bioinformatics, 2020

doi:10.1093/bioinformatics/btaa947

Author(s): Cynthia Z Ma; Michael R Brent

Keyword(s): Gene Expression; Gene Expression Data; Large Scale; Supplementary Information; Activity Levels; Expression Data; Gene Expression Matrix; Perturbation Data; Carry Over; Necessary And Sufficient

Abstract

Motivation: The activity of a transcription factor (TF) in a sample of cells is the extent to which it is exerting its regulatory potential. Many methods for inferring TF activity from gene expression data have been described, but owing to the lack of appropriate large-scale datasets, systematic and objective validation has not been possible until now.

Results: We systematically evaluate and optimize the approach to TF activity inference in which a gene expression matrix is factored into a condition-independent matrix of control strengths and a condition-dependent matrix of TF activity levels. We find that expression data in which the activities of individual TFs have been perturbed are both necessary and sufficient for obtaining good performance. To a considerable extent, control strengths inferred from expression data in one growth condition carry over to other conditions, so the control strength matrices derived here can be reused by others. Finally, we apply these methods to gain insight into the upstream factors that regulate the activities of the yeast TFs Gcr2, Gln3, Gcn4 and Msn2.

Availability and implementation: Evaluation code and data are available at https://doi.org/10.5281/zenodo.4050573.

Supplementary information: Supplementary data are available at Bioinformatics online.
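The factorization can be read as expression ≈ C · A, where C (genes × TFs) holds control strengths and A (TFs × conditions) holds activities. If C is already known, for instance carried over from another growth condition as the abstract suggests is often possible, one sample's activity vector can be estimated by ordinary least squares. The sketch below assumes exactly that (known C, unconstrained least squares via the normal equations); it does not reproduce the paper's joint factorization or its constraints from perturbation data, and `infer_activities` is a hypothetical name.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def infer_activities(C, e):
    """TF activities a minimizing ||C a - e||^2 for one sample's expression e.

    Solves the normal equations (C^T C) a = C^T e; C is genes x TFs.
    """
    k = len(C[0])
    CtC = [[sum(row[i] * row[j] for row in C) for j in range(k)] for i in range(k)]
    Cte = [sum(row[i] * ei for row, ei in zip(C, e)) for i in range(k)]
    return solve(CtC, Cte)


# Three target genes, two TFs: gene 3 is controlled by both TFs.
C = [[1, 0], [0, 1], [1, 1]]
print(infer_activities(C, [2, -1, 1]))  # [2.0, -1.0]
```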

Adversarial generation of gene expression data

Bioinformatics, 2021

doi:10.1093/bioinformatics/btab035

Author(s): Ramon Viñas; Helena Andrés-Terré; Pietro Liò; Kevin Bryson

Keyword(s): Gene Expression; Gene Expression Data; Synthetic Data; Gene Clusters; Supplementary Information; Expression Data; Generative Adversarial Network; Adversarial Network; Wide Range; Transcriptomics Data

Abstract

Motivation: High-throughput gene expression can be used to address a wide range of fundamental biological problems, but datasets of an appropriate size are often unavailable. Moreover, existing transcriptomics simulators have been criticized because they fail to emulate key properties of gene expression data. In this article, we develop a method based on a conditional generative adversarial network to generate realistic transcriptomics data for Escherichia coli and humans. We assess the performance of our approach across several tissues and cancer types.

Results: We show that our model preserves several gene expression properties significantly better than widely used simulators such as SynTReN or GeneNetWeaver. The synthetic data preserve tissue- and cancer-specific properties of transcriptomics data. Moreover, they exhibit real gene clusters and ontologies at both local and global scales, suggesting that the model learns to approximate the gene expression manifold in a biologically meaningful way.

Availability and implementation: Code is available at https://github.com/rvinas/adversarial-gene-expression.

Supplementary information: Supplementary data are available at Bioinformatics online.

MetaFunc: Taxonomic and Functional Analyses of High Throughput Sequencing for Microbiomes

2020

doi:10.1101/2020.09.02.271098

Author(s): Arielle Kae Sulit; Tyler Kolisnik; Frank A Frizelle; Rachel Purcell; Sebastian Schmeier

Keyword(s): Gene Expression; High Throughput Sequencing; Enrichment Analysis; Gene Set Enrichment Analysis; R Shiny; One Stop; Taxonomic Studies; User Friendly; Host Genes; Shiny Application

Abstract

Background: Identifying the functional processes taking place in microbiome communities augments traditional taxonomic studies of the microbiome, giving a more complete picture of the interactions within the community. While there are applications that perform functional annotation of metagenomes or metatranscriptomes, very few of them can link taxonomic identity to function, and they are limited by their input types or the databases they use.

Results: Here we present MetaFunc, a workflow that takes input reads and from them 1) identifies the species present in the microbiome sample and 2) provides gene ontology (GO) annotations associated with the identified species. MetaFunc can also perform a differential abundance analysis comparing species between sample conditions. In addition, MetaFunc can map reads to a host genome and separate these reads before proceeding with the microbiome analyses. From the host reads, MetaFunc can identify host genes and perform differential gene expression analysis and gene-set enrichment analysis. A final correlation analysis between microbial species and host genes can also be performed. Finally, MetaFunc builds an R Shiny application that allows users to view and interact with the microbiome results. In this paper we show how MetaFunc can be applied to metatranscriptomic datasets of colorectal cancer.

Conclusion: MetaFunc is a one-stop-shop microbiome analysis pipeline that identifies taxa and their respective functional contributions in a microbiome sample through GO annotations. It can also analyse host reads in a microbiome sample, providing information on host gene expression and allowing correlations between the microbiome and host genes. MetaFunc comes with a user-friendly R Shiny application for easier visualisation and exploration of its results. MetaFunc is freely available through https://gitlab.com/schmeierlab/workflows/metafunc.git.
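The abstract does not say which correlation measure MetaFunc uses between microbial species and host genes. As an assumption for illustration, the sketch below uses Spearman's rank correlation, a common choice for skewed abundance data, computed as the Pearson correlation of tie-averaged ranks. All function names are hypothetical, and multiple-testing correction is omitted.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the two rank vectors."""
    return pearson(ranks(x), ranks(y))


def species_gene_correlations(species, genes):
    """`species` and `genes` map name -> per-sample values in the same sample order."""
    return {
        (s, g): spearman(sv, gv)
        for s, sv in species.items()
        for g, gv in genes.items()
    }


species = {"species_A": [5, 1, 3, 8]}          # hypothetical abundances
genes = {"gene_X": [9, 2, 4, 30], "gene_Y": [1, 4, 3, 0]}
print(species_gene_correlations(species, genes))
```

Because only ranks matter, the extreme value 30 for `gene_X` does not inflate its correlation the way it would with Pearson's r on raw counts.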

GeneQC: A quality control tool for gene expression estimation based on RNA-sequencing reads mapping

2018

doi:10.1101/266445

Cited By ~ 3

Author(s): Adam McDermaid; Xin Chen; Yiran Zhang; Juan Xie; Cankun Wang; ...

Keyword(s): Gene Expression; RNA Sequencing; Model Fitting; Functional Enrichment; Supplementary Information; RNA Seq; Quality Control Tool; Differential Gene; Supplementary Material; The Impact

Abstract

Motivation: One of the main benefits of modern RNA-sequencing (RNA-Seq) technology is more accurate gene expression estimation than previous generations of expression data, such as microarrays. However, numerous issues can cause an RNA-Seq read to map to multiple locations on the reference genome with the same alignment score, which occurs in plant, animal and metagenome samples. Such a read is called a multiple-mapping read (MMR). MMRs affect gene expression estimation and all downstream analyses, including differential gene expression and functional enrichment. Current analysis pipelines lack tools to effectively test the reliability of gene expression estimates and are thus unable to ensure the validity of downstream analyses.

Results: Our investigation of 95 RNA-Seq datasets from seven species (totaling 1,951 GB) indicates that, on average, roughly 22% of all reads are MMRs in plant and animal species. Here we present GeneQC (Gene expression Quality Control), a tool that can accurately estimate the reliability of each gene’s expression level. The underlying algorithm is based on extracted genomic and transcriptomic features, which are combined using elastic-net regularization and mixture model fitting to give a clearer picture of the mapping uncertainty for each gene. GeneQC allows researchers to identify reliable expression estimates and to conduct further analysis on gene expression of sufficient quality. The tool also enables researchers to investigate further re-alignment methods to obtain more accurate gene expression estimates for genes with low reliability.

Availability: GeneQC is freely available at http://bmbl.sdstate.edu/GeneQC/

Supplementary information: Supplementary data are available at Bioinformatics online.
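The MMR definition above (a read whose best alignment score is achieved at more than one location) translates directly into a fraction that can be computed from alignment records. The sketch below assumes alignments are available as `(read_id, locus, score)` tuples; this input format and the name `mmr_fraction` are hypothetical, and GeneQC's per-gene reliability model (elastic-net regularization plus mixture model fitting) is not reproduced.

```python
from collections import defaultdict


def mmr_fraction(alignments):
    """Fraction of reads that are multiple-mapping reads (MMRs).

    `alignments` is an iterable of (read_id, locus, score). A read counts as
    an MMR when its best score is reached at more than one distinct locus.
    """
    best = {}
    loci = defaultdict(set)
    for read, locus, score in alignments:
        if read not in best or score > best[read]:
            best[read] = score          # new best score: reset the locus set
            loci[read] = {locus}
        elif score == best[read]:
            loci[read].add(locus)       # tie with the best score
    if not best:
        return 0.0
    return sum(1 for r in best if len(loci[r]) > 1) / len(best)


alns = [
    ("r1", "A", 50), ("r1", "B", 50),                    # tie at the top: MMR
    ("r2", "A", 60), ("r2", "B", 40),                    # unique best hit
    ("r3", "C", 30), ("r3", "D", 45), ("r3", "E", 45),   # MMR
    ("r4", "A", 10),                                     # single hit
]
print(mmr_fraction(alns))  # 0.5
```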

cgCorrect: A method to correct for confounding cell-cell variation due to cell growth in single-cell transcriptomics

2016

doi:10.1101/057463

Author(s): Thomas Blasi; Florian Buettner; Michael K. Strasser; Carsten Marr; Fabian J. Theis

Keyword(s): Gene Expression; Steady State; Cell Growth; Single Cell; Cell Size; Computational Analysis; Simulated Data; Supplementary Information; mRNA Transcript; Transcriptomics Data

Abstract

Motivation: Measuring gene expression at the single-cell level has revealed often large heterogeneity among seemingly homogeneous cells, which remained obscured in traditional population-based approaches. The computational analysis of single-cell transcriptomics data, however, still poses unresolved challenges with respect to normalization, visualization and modeling. One such issue is differences in cell size, which introduce additional variability into the data and call for appropriate normalization techniques. Otherwise, these differences in cell size may obscure genuine heterogeneity among cell populations and lead to overdispersed steady-state distributions of mRNA transcript numbers.

Results: We present cgCorrect, a statistical framework to correct for differences in cell size that are due to cell growth in single-cell transcriptomics data. We derive the probability of the cell-growth-corrected mRNA transcript number given the measured, cell-size-dependent mRNA transcript number, based on the assumption that the average number of transcripts in a cell increases proportionally to the cell’s volume during the cell cycle. cgCorrect can be used both for data normalization and for analyzing the steady-state distributions used to infer the gene expression mechanism. We demonstrate its applicability on both simulated data and single-cell quantitative real-time PCR data from mouse blood stem and progenitor cells. We show that correcting for differences in cell size affects the interpretation of the data obtained by typically performed computational analyses.

Availability: A Matlab implementation of cgCorrect is available at http://icb.helmholtz-muenchen.de/cgCorrect

Supplementary information: Supplementary information is available online. The simulated data set is available at http://icb.helmholtz-muenchen.de/cgCorrect
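The key modeling assumption (mean transcript number proportional to cell volume) suggests a naive point correction: divide each cell's count by its volume relative to a reference. The sketch below shows only that simplification, under two labeled assumptions: volume grows exponentially and exactly doubles over one cycle, and a single rescaled value per cell is acceptable. cgCorrect itself derives a probability distribution for the corrected count rather than a point estimate, and it accounts for counting noise that this sketch ignores; the function names are hypothetical.

```python
def relative_volume(age_fraction):
    """Volume relative to a newborn cell at cell-cycle age in [0, 1].

    Assumption: exponential growth that exactly doubles volume per cycle.
    """
    return 2.0 ** age_fraction


def size_corrected_counts(counts, volumes, reference=1.0):
    """Rescale each cell's transcript count to a common reference volume,
    assuming the mean transcript number is proportional to cell volume."""
    return [c * reference / v for c, v in zip(counts, volumes)]


volumes = [relative_volume(a) for a in (0.0, 1.0)]   # newborn vs. about to divide
print(volumes)                                        # [1.0, 2.0]
print(size_corrected_counts([10, 20], volumes))       # [10.0, 10.0]
```

The two cells look twofold different in raw counts but identical after correction, which is the kind of growth-driven spread that, left uncorrected, inflates the apparent variability of steady-state transcript distributions.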
