Network- and Enrichment-based Inference of Phenotypes and Targets from large-scale Disease Maps

Disease maps have emerged as computational knowledge bases for exploring and modeling disease-specific molecular processes. By capturing molecular interactions, disease-associated processes, and phenotypes in standardized representations, disease maps provide a platform for applying bioinformatics and systems biology approaches. Applications range from simple map exploration to algorithm-driven target discovery and network perturbation. The web-based MINERVA environment for disease maps provides a platform to develop tools not only for mapping experimental data but also to identify, analyze and simulate disease-specific regulatory networks. We have developed a MINERVA plugin suite based on network topology and enrichment analyses that facilitate multi-omics data integration and enable in silico perturbation experiments on disease maps. We demonstrate workflows by analyzing two RNA-seq datasets on the Atlas of Inflammation Resolution (AIR). Our approach improves usability and increases the functionality of disease maps by providing easy access to available data and integration of self-generated data. It supports efficient and intuitive analysis of omics data, with a focus on disease maps.
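
The abstract does not say which enrichment statistic the plugin suite uses. As a generic illustration of enrichment analysis, a minimal sketch of a one-sided hypergeometric over-representation test follows. The function name, its parameters, and the numbers in the usage line are hypothetical and are not part of MINERVA or the AIR.

```python
from math import comb


def hypergeom_enrichment_p(universe: int, set_size: int, hits_total: int, overlap: int) -> float:
    """One-sided over-representation p-value, P(X >= overlap).

    universe   : number of genes that could have been selected (assumed background)
    set_size   : number of genes annotated to the process or phenotype of interest
    hits_total : number of genes selected by the experiment (e.g. differentially expressed)
    overlap    : number of selected genes that are also in the annotated set
    """
    max_overlap = min(set_size, hits_total)
    total = comb(universe, hits_total)
    # Sum the hypergeometric probability mass from the observed overlap upwards.
    tail = sum(
        comb(set_size, k) * comb(universe - set_size, hits_total - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 background genes, a 150-gene process,
# 400 differentially expressed genes, 12 of them in the process.
print(hypergeom_enrichment_p(20_000, 150, 400, 12))
```

In practice the p-values from many gene sets would also need multiple-testing correction, which this sketch leaves out.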

TIMEOR: a web-based tool to uncover temporal regulatory mechanisms from multi-omics data

10.1101/2020.09.14.296418 ◽

2020 ◽

Cited By ~ 2

Author(s):

Ashley Mae Conard ◽

Nathaniel Goodman ◽

Yanhui Hu ◽

Norbert Perrimon ◽

Ritambhara Singh ◽

...

Keyword(s):

Time Series ◽

Regulatory Networks ◽

Omics Data ◽

Rna Seq ◽

Web Based ◽

Protein Levels ◽

Binding Data ◽

Adaptive Time ◽

Gene Regulatory ◽

The Relationship

Summary: Uncovering how transcription factors (TFs) regulate their targets at the DNA, RNA and protein levels over time is critical to define gene regulatory networks (GRNs) in normal and diseased states. RNA-seq has become a standard method to measure gene regulation using an established set of analysis steps. However, none of the currently available pipeline methods for interpreting ordered genomic data (in time or space) use time series models to assign cause and effect relationships within GRNs, are adaptive to diverse experimental designs, or enable user interpretation through a web-based platform. Furthermore, methods which integrate ordered RNA-seq data with transcription factor binding data are urgently needed. Here, we present TIMEOR (Trajectory Inference and Mechanism Exploration with Omics data in R), the first web-based and adaptive time series multi-omics pipeline method which infers the relationship between gene regulatory events across time. TIMEOR addresses the critical need for methods to predict causal regulatory mechanism networks between TFs from time series multi-omics data. We used TIMEOR to identify a new link between insulin stimulation and the circadian rhythm cycle. TIMEOR is available at https://github.com/ashleymaeconard/TIMEOR.git.

Multiple Alu exonization in 3’UTR of a primate specific isoform of CYP20A1 creates a potential miRNA sponge

Genome Biology and Evolution ◽

10.1093/gbe/evaa233 ◽

2020 ◽

Author(s):

Aniket Bhattacharya ◽

Vineet Jha ◽

Khushboo Singhal ◽

Mahar Fatima ◽

Dayanidhi Singh ◽

...

Keyword(s):

Heat Shock ◽

Cortical Neurons ◽

Regulatory Networks ◽

Large Scale ◽

Neuronal Development ◽

Random Sets ◽

Rna Seq ◽

Orphan Gene ◽

Mirna Sponge ◽

Human Neurons

Abstract: Alu repeats contribute to phylogenetic novelties in conserved regulatory networks in primates. Our study highlights how exonized Alus could nucleate large-scale mRNA-miRNA interactions. Using a functional genomics approach, we characterize a transcript isoform of an orphan gene, CYP20A1 (CYP20A1_Alu-LT), that has exonization of 23 Alus in its 3’UTR. CYP20A1_Alu-LT, confirmed by 3’RACE, is an outlier in length (9 kb 3’UTR) and widely expressed. Using publicly available datasets, we demonstrate its expression in higher primates and presence in single nucleus RNA-seq of 15928 human cortical neurons. miRanda predicts ∼4700 miRNA recognition elements (MREs) for ∼1000 miRNAs, primarily originating within these 3’UTR-Alus. CYP20A1_Alu-LT could be a potential multi-miRNA sponge as it harbors ≥10 MREs for 140 miRNAs and has cytosolic localization. We further tested whether expression of CYP20A1_Alu-LT correlates with mRNAs harboring similar MRE targets. RNA-seq with conjoint miRNA-seq analysis was done in primary human neurons, where we observed CYP20A1_Alu-LT to be downregulated during heat shock response and upregulated in HIV1-Tat treatment. 380 genes were positively correlated with its expression (significantly downregulated in heat shock and upregulated in Tat) and they harbored MREs for nine expressed miRNAs which were also enriched in CYP20A1_Alu-LT. MREs were significantly enriched in these 380 genes compared to random sets of differentially expressed genes (p = 8.134e-12). Gene ontology suggested involvement of these genes in neuronal development and hemostasis pathways, thus proposing a novel component of Alu-miRNA mediated transcriptional modulation that could govern specific physiological outcomes in higher primates.
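
The abstract does not state how the comparison against random gene sets was computed. As a generic illustration only, a resampling sketch follows: it draws random sets of the same size from a background pool and asks how often their mean MRE count reaches the observed one. All names, the `mre_counts` mapping, and the toy data are hypothetical. An empirical p-value of this kind cannot fall below 1 / (n_sets + 1), so resolving a value as small as the one reported would need far more random sets than drawn here, or a parametric test.

```python
import random
from statistics import mean


def empirical_enrichment_p(mre_counts, focal_genes, background_genes, n_sets=1000, seed=0):
    """Empirical one-sided p-value for MRE enrichment in a focal gene set.

    mre_counts       : dict mapping gene name -> number of MREs (missing genes count as 0)
    focal_genes      : genes of interest (e.g. the co-expressed set)
    background_genes : pool from which random sets are drawn (e.g. other DE genes)
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    focal = list(focal_genes)
    pool = list(background_genes)
    observed = mean(mre_counts.get(g, 0) for g in focal)

    at_least_as_extreme = 0
    for _ in range(n_sets):
        sample = rng.sample(pool, len(focal))  # random set of the same size
        if mean(mre_counts.get(g, 0) for g in sample) >= observed:
            at_least_as_extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_sets + 1)


# Toy data: five focal genes rich in MREs, 95 background genes with none.
toy_counts = {f"g{i}": (10 if i < 5 else 0) for i in range(100)}
toy_focal = [f"g{i}" for i in range(5)]
toy_background = [f"g{i}" for i in range(5, 100)]
print(empirical_enrichment_p(toy_counts, toy_focal, toy_background))
```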

Leveraging high-powered RNA-Seq datasets to improve inference of regulatory activity in single-cell RNA-Seq data

10.1101/553040 ◽

2019 ◽

Cited By ~ 1

Author(s):

Ning Wang ◽

Andrew E. Teschendorff

Keyword(s):

Transcription Factors ◽

Single Cell ◽

Cell Fate ◽

Regulatory Networks ◽

Large Scale ◽

Single Cells ◽

Differential Expression Analysis ◽

Dropout Rate ◽

Rna Seq ◽

Regulatory Activity

Abstract: Inferring the activity of transcription factors in single cells is a key task to improve our understanding of development and complex genetic diseases. This task is, however, challenging due to the relatively large dropout rate and noisy nature of single-cell RNA-Seq data. Here we present a novel statistical inference framework called SCIRA (Single Cell Inference of Regulatory Activity), which leverages the power of large-scale bulk RNA-Seq datasets to infer high-quality tissue-specific regulatory networks, from which regulatory activity estimates in single cells can be subsequently obtained. We show that SCIRA can correctly infer regulatory activity of transcription factors affected by high technical dropouts. In particular, SCIRA can improve sensitivity by as much as 70% compared to differential expression analysis and current state-of-the-art methods. Importantly, SCIRA can reveal novel regulators of cell-fate in tissue-development, even for cell-types that only make up 5% of the tissue, and can identify key novel tumor suppressor genes in cancer at single cell resolution. In summary, SCIRA will be an invaluable tool for single-cell studies aiming to accurately map activity patterns of key transcription factors during development, and how these are altered in disease.

ASAP 2020 update: an open, scalable and interactive web-based portal for (single-cell) omics analyses

Nucleic Acids Research ◽

10.1093/nar/gkaa412 ◽

2020 ◽

Vol 48 (W1) ◽

pp. W403-W414

Author(s):

Fabrice P A David ◽

Maria Litovchenko ◽

Bart Deplancke ◽

Vincent Gardeux

Keyword(s):

Big Data ◽

Single Cell ◽

Single Cell Analysis ◽

Omics Data ◽

Cell Analysis ◽

Rna Seq ◽

Analysis Pipeline ◽

Web Based ◽

Data Analyses ◽

The Web

Abstract: Single-cell omics enables researchers to dissect biological systems at a resolution that was unthinkable just 10 years ago. However, this analytical revolution also triggered new demands in ‘big data’ management, forcing researchers to stay up to speed with increasingly complex analytical processes and rapidly evolving methods. To render these processes and approaches more accessible, we developed the web-based, collaborative portal ASAP (Automated Single-cell Analysis Portal). Our primary goal is thereby to democratize single-cell omics data analyses (scRNA-seq and more recently scATAC-seq). By taking advantage of a Docker system to enhance reproducibility, and novel bioinformatics approaches that were recently developed for improving scalability, ASAP meets challenging requirements set by recent cell atlasing efforts such as the Human (HCA) and Fly (FCA) Cell Atlas Projects. Specifically, ASAP can now handle datasets containing millions of cells, integrating intuitive tools that allow researchers to collaborate on the same project synchronously. ASAP tools are versioned, and researchers can create unique access IDs for storing complete analyses that can be reproduced or completed by others. Finally, ASAP does not require any installation and provides a full and modular single-cell RNA-seq analysis pipeline. ASAP is freely available at https://asap.epfl.ch.

lncRNATargets: A platform for lncRNA target prediction based on nucleic acid thermodynamics

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720016500165 ◽

2016 ◽

Vol 14 (04) ◽

pp. 1650016 ◽

Cited By ~ 17

Author(s):

Ruifeng Hu ◽

Xiaobo Sun

Keyword(s):

Free Energy ◽

Nucleic Acid ◽

Large Scale ◽

Nearest Neighbor ◽

Binding Free Energy ◽

Target Prediction ◽

Easy Access ◽

Web Based ◽

Size Limitation ◽

User Friendly

Many studies have shown that long noncoding RNAs (lncRNAs) perform diverse functions in critical biological processes. Advanced experimental and computational technologies allow access to more information on lncRNAs. Determining the functions and mechanisms of action of these RNAs on a large scale is urgently needed. We provide lncRNATargets, a web-based platform for lncRNA target prediction based on nucleic acid thermodynamics. The nearest-neighbor (NN) model is used to calculate binding free energy. The main principle of the NN model for nucleic acids is that the identity and orientation of neighboring base pairs determine the stability of a given base pair. lncRNATargets features the following options: setting a specific temperature, which allows use not only for human but also for other animals or plants; high-throughput processing of all lncRNAs without an RNA size limitation, which is superior to any other existing tool; and a web-based, user-friendly interface with colored result displays that allows easy access for users without computing expertise and provides a better understanding of results. This technique could provide accurate calculation of the binding free energy of lncRNA-target dimers to predict whether the lncRNA and its target are likely to pair. lncRNATargets provides high-accuracy calculations, and this user-friendly program is available for free at http://www.herbbol.org:8001/lrt/.
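
For orientation, the general form of a nearest-neighbor free-energy calculation can be written as below. This is the textbook shape of the model, not the specific parameterization used by lncRNATargets, which the abstract does not give. The symbols are notation introduced here: n is the number of base pairs in the duplex, T is the absolute temperature, and the enthalpy and entropy terms belong to each stack of adjacent base pairs. Terminal and symmetry corrections are omitted.

```latex
\Delta G^{\circ}(T)
  = \Delta G^{\circ}_{\mathrm{init}}(T)
  + \sum_{i=1}^{n-1} \left( \Delta H^{\circ}_{i,i+1} - T\,\Delta S^{\circ}_{i,i+1} \right)
```

Because T enters each stacking term explicitly, the same sequence pair yields a different free energy at a different temperature, which is why a user-settable temperature matters for organisms other than human.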

EchinoDB: An update to the web-based application for genomic and transcriptomic data on Echinoderms.

10.1101/2022.01.03.474134 ◽

2022 ◽

Author(s):

Varnika Mittal ◽

Robert W. Reid ◽

Denis Jacob Machado ◽

Vladimir Mashanov ◽

Dan A Janies

Keyword(s):

Regulatory Networks ◽

Sequence Similarity ◽

Lytechinus Variegatus ◽

Rna Seq ◽

Web Based ◽

Transcriptomic Data ◽

Green Sea Urchin ◽

R Shiny ◽

Gene Regulatory ◽

Keyword Searches

Here we release a new version of EchinoDB (https://echinodb.uncc.edu). EchinoDB is a database of genomic and transcriptomic data on echinoderms. The initial database consisted of groups of 749,397 orthologous and paralogous transcripts arranged in orthoclusters by sequence similarity. The new version of EchinoDB includes RNA-seq data of the brittle star Ophioderma brevispinum and high-quality genomic assembly data of the green sea urchin Lytechinus variegatus. In addition, we enabled keyword searches for annotated data and installed an updated version of Sequenceserver to allow BLAST searches. The data are downloadable in FASTA format. The first version of EchinoDB appeared in 2016 and was implemented in GO on a local server. The new version has been updated using R Shiny to include new features and improvements in the application. Furthermore, EchinoDB now runs entirely in the cloud for increased reliability and scaling. EchinoDB enjoys a user base drawn from the fields of phylogenetics, developmental biology, genomics, physiology, neurobiology, and regeneration. As use cases, we illustrate how EchinoDB is used in discovering pathways and gene regulatory networks involved in the tissue regeneration process.

Integrating –omics data into genome-scale metabolic network models: principles and challenges

Essays in Biochemistry ◽

10.1042/ebc20180011 ◽

2018 ◽

Vol 62 (4) ◽

pp. 563-574 ◽

Cited By ~ 10

Author(s):

Charlotte Ramon ◽

Mattia G. Gollub ◽

Jörg Stelling

Keyword(s):

Data Integration ◽

Large Scale ◽

Network Models ◽

Omics Data ◽

Scale Models ◽

Common Framework ◽

Genome Scale ◽

Constraint Based Models ◽

Omics Data Integration

At genome scale, it is not yet possible to devise detailed kinetic models for metabolism because data on the in vivo biochemistry are too sparse. Predictive large-scale models for metabolism most commonly use the constraint-based framework, in which network structures constrain possible metabolic phenotypes at steady state. However, these models commonly leave many possibilities open, making them less predictive than desired. With increasingly available –omics data, it is appealing to increase the predictive power of constraint-based models (CBMs) through data integration. Many corresponding methods have been developed, but data integration is still a challenge and existing methods perform less well than expected. Here, we review main approaches for the integration of different types of –omics data into CBMs focussing on the methods’ assumptions and limitations. We argue that key assumptions – often derived from single-enzyme kinetics – do not generally apply in the context of networks, thereby explaining current limitations. Emerging methods bridging CBMs and biochemical kinetics may allow for –omics data integration in a common framework to provide more accurate predictions.
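
As a reminder of what the constraint-based framework looks like in its most common instance, flux balance analysis, the steady-state problem can be written as a linear program. The symbols are notation introduced here rather than taken from the review: S is the stoichiometric matrix, v the vector of reaction fluxes, c an assumed objective such as biomass production, and the two bound vectors the lower and upper flux limits.

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0, \\
& v^{\min} \le v \le v^{\max}
\end{aligned}
```

The constraint S v = 0 encodes the steady state, and the solution set is typically large, which is the sense in which such models leave many possibilities open. Omics data can be brought in by, for example, tightening the flux bounds for reactions whose enzymes are weakly expressed.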

i2dash: Creation of Flexible, Interactive and Web-based Dashboards for Visualization of Omics-pipeline Results

10.1101/2020.07.06.189563 ◽

2020 ◽

Author(s):

Arsenij Ustjanzew ◽

Jens Preussner ◽

Mette Bentsen ◽

Carsten Kuenne ◽

Mario Looso

Keyword(s):

Single Cell ◽

Data Visualization ◽

Large Scale ◽

R Package ◽

Cloud Services ◽

Sequencing Analysis ◽

Omics Data ◽

Web Based ◽

Automated Data Processing ◽

Generic Design

Abstract: Data visualization and interactive data exploration are important aspects of illustrating complex concepts and results from analyses of omics data. A suitable visualization has to be intuitive and accessible. Web-based dashboards have become popular tools for the arrangement, consolidation and display of such visualizations. However, the combination of automated data processing pipelines handling omics data and dynamically generated, interactive dashboards is poorly solved. Here, we present i2dash, an R package intended to encapsulate functionality for programmatic creation of customized dashboards. It supports interactive and responsive (linked) visualizations across a set of predefined graphical layouts. i2dash addresses the needs of data analysts for a tool that is compatible and attachable to any R-based analysis pipeline, thereby fostering the separation of data visualization on one hand and data analysis tasks on the other hand. In addition, the generic design of i2dash enables data analysts to generate modular extensions for specific needs. As a proof of principle, we provide an extension of i2dash optimized for single-cell RNA-sequencing analysis, supporting the creation of dashboards for the visualization needs of single-cell sequencing experiments. Equipped with these features, i2dash is suitable for extensive use in large scale sequencing/bioinformatics facilities. Along this line, we provide i2dash as a containerized solution, enabling a straightforward large-scale deployment and sharing of dashboards using cloud services. i2dash is freely available via the R package archive CRAN.

Network reconstruction for trans acting genetic loci using multi-omics data and prior information

10.1101/2020.05.19.101592 ◽

2020 ◽

Cited By ~ 1

Author(s):

Johann S. Hawe ◽

Ashis Saha ◽

Melanie Waldenberger ◽

Sonja Kunze ◽

Simone Wahl ◽

...

Keyword(s):

Regulatory Networks ◽

Large Scale ◽

Prior Information ◽

Network Inference ◽

Model Systems ◽

Molecular Networks ◽

Biological Knowledge ◽

Omics Data ◽

Cohort Data ◽

Inference Methods

Abstract. Background: Molecular multi-omics data provide an in-depth view of biological systems, and their integration is crucial to gain insights into complex regulatory processes. These data can be used to explain disease-related genetic variants by linking them to intermediate molecular traits (quantitative trait loci, QTL). Molecular networks regulating cellular processes leave footprints in QTL results as so-called trans-QTL hotspots. Reconstructing these networks is a complex endeavor, and the use of biological prior information has been proposed to facilitate network inference. However, previous efforts were limited in the types of priors used or have only been applied to model systems. In this study, we reconstruct the regulatory networks underlying trans-QTL hotspots using human cohort data and data-driven prior information. Results: We devised a strategy to integrate QTL with human population-scale multi-omics data and comprehensively curated prior information from large-scale biological databases. State-of-the-art network inference methods applied to these data and priors were used to recover the regulatory networks underlying trans-QTL hotspots. We benchmarked inference methods and showed that Bayesian strategies using biologically informed priors outperform methods without prior data in simulated data and show better replication across datasets. Application of our approach to human cohort data highlighted two novel regulatory networks related to schizophrenia and lean body mass, for which we generated novel functional hypotheses. Conclusion: We demonstrate that existing biological knowledge can be leveraged for the integrative analysis of networks underlying trans associations to deduce novel hypotheses on cell regulatory mechanisms.

PRESTO, a new tool for integrating large-scale -omics data and discovering disease-specific signatures

10.1101/302604 ◽

2018 ◽

Cited By ~ 4

Author(s):

Sara McArdle ◽

Konrad Buscher ◽

Erik Ehinger ◽

Akula Bala Pramod ◽

Nicole Riley ◽

...

Keyword(s):

User Interface ◽

Dimensionality Reduction ◽

Gene Networks ◽

Large Scale ◽

Omics Data ◽

Time Points ◽

Conventional Methods ◽

Disease Specific ◽

Interactive User ◽

Interactive User Interface

Abstract. Background: Cohesive visualization and interpretation of hyperdimensional, large-scale -omics data is an ongoing challenge, particularly for biologists and clinicians involved in current highly complex sequencing studies. Multivariate studies are often better suited to non-linear network analysis than to differential expression testing. Here, we present PRESTO, a ‘PREdictive Stochastic neighbor embedding Tool for Omics’, which allows unsupervised dimensionality reduction of multivariate data matrices with thousands of subjects or conditions. PRESTO is intuitively integrated into an interactive user interface that helps to visualize the multidimensional patterns in genome-wide transcriptomic data from basic science and clinical studies. Results: PRESTO was tested with multiple input omics platforms, including microarray and proteomics from both mouse and human clinical datasets. PRESTO can analyze up to tens of thousands of genes and shows no increase in processing time with a large number of samples or patients. In complex datasets, such as those with multiple time points, several patient groups, or diverse mouse strains, PRESTO outperformed conventional methods. Core co-expressed gene networks were intuitively grouped in clusters, or gates, after dimensionality reduction and remained consistent across users. Networks were identified and assigned to physiological and pathological functions that cannot be gleaned from conventional bioinformatics analyses. PRESTO detected gene networks from the natural variations among mouse macrophages and human blood leukocytes. We applied PRESTO to clinical transcriptomic and proteomic data from large patient cohorts and detected disease-defining signatures in antibody-mediated kidney transplant rejection, renal cell carcinoma, and relapsing acute myeloid leukemia (AML). In AML, PRESTO confirmed a previously described gene signature and found a new signature of 10 genes that is highly predictive of patient outcome. Conclusions: PRESTO offers an important integration of powerful bioinformatics tools with an interactive user interface that increases data analysis accessibility beyond bioinformaticians and ‘coders’. Here, we show that PRESTO outperforms conventional methods, such as DE analysis, in multi-dimensional datasets and can identify biologically relevant co-expression gene networks. In paired samples or time points, co-expression networks could be compared for insight into longitudinal regulatory mechanisms. Additionally, PRESTO identified disease-specific signatures in clinical datasets with highly significant diagnostic and prognostic potential.
