Learning Dysregulated Pathways in Cancers from Differential Variability Analysis

Analysis of gene sets can implicate activity in signaling pathways that is responsible for cancer initiation and progression but is not discernible from the analysis of individual genes. Multiple methods and software packages have been developed to infer pathway activity from expression measurements for the set of genes targeted by that pathway. Broadly, three major methodologies have been proposed: over-representation, enrichment, and differential variability. Both over-representation and enrichment analyses are effective techniques to infer differentially regulated pathways from gene sets with relatively consistent differentially expressed (DE) genes. Specifically, these algorithms aggregate statistics from each gene in the pathway. However, they overlook multivariate patterns related to gene interactions and variations in expression. Therefore, the analysis of differential variability of multigene expression patterns can be essential to pathway inference in cancers. The corresponding methodologies and software packages for such multivariate variability analysis of pathways are reviewed here. We also introduce a new, computationally efficient algorithm, expression variation analysis (EVA), which has been implemented along with a previously proposed algorithm, Differential Rank Conservation (DIRAC), in an open source R package, gene set regulation (GSReg). EVA inferred pathways similar to those inferred by DIRAC at reduced computational cost. Moreover, EVA inferred dysregulated pathways different from those identified by enrichment analysis.
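
To make the idea of differential variability concrete, the following is a minimal Python sketch in the spirit of rank conservation: it scores how consistently the pairwise ordering of genes within a gene set is preserved across the samples of one phenotype. It is an illustration under stated assumptions, not the GSReg implementation of DIRAC or EVA; the function names and the toy data are hypothetical, and the gene set is assumed to contain at least two genes.

```python
from itertools import combinations

def pairwise_orderings(sample):
    """For every gene pair (i, j) with i < j, record whether gene i is below gene j."""
    return tuple(sample[i] < sample[j] for i, j in combinations(range(len(sample)), 2))

def rank_conservation(samples):
    """Mean agreement of each sample with the majority-vote ordering template.

    Hypothetical helper: `samples` is a list of expression vectors, one per
    sample, restricted to the genes of a single gene set.
    """
    orderings = [pairwise_orderings(s) for s in samples]
    n = len(orderings)
    # Template: the ordering observed in more than half of the samples, per pair.
    template = tuple(sum(col) * 2 > n for col in zip(*orderings))
    # Matching score per sample: fraction of gene pairs that agree with the template.
    scores = [sum(a == b for a, b in zip(o, template)) / len(template) for o in orderings]
    return sum(scores) / n

# Toy data (made up): the gene ordering is stable in one group and shuffled in the other.
stable_group = [[1.0, 2.0, 3.0], [1.1, 2.2, 2.9], [0.9, 2.1, 3.2]]
variable_group = [[1.0, 2.0, 3.0], [3.0, 1.0, 2.0], [2.0, 3.0, 1.0]]

print(rank_conservation(stable_group))
print(rank_conservation(variable_group))
```

A lower index in one phenotype than in the other suggests that the multigene ordering is less tightly regulated there, which is a pattern that per-gene aggregate statistics would not capture.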

NoRCE: non-coding RNA sets cis enrichment tool

BMC Bioinformatics ◽

10.1186/s12859-021-04112-9 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Gulden Olgun ◽

Afshan Nabi ◽

Oznur Tastan

Keyword(s):

Expression Patterns ◽

Target Prediction ◽

Enrichment Analysis ◽

Fruit Fly ◽

Relevant Information ◽

R Package ◽

Data Repository ◽

Biologically Relevant ◽

Gene Sets ◽

Data Files

Abstract. Background: While some non-coding RNAs (ncRNAs) are assigned critical regulatory roles, most remain functionally uncharacterized. This presents a challenge whenever an interesting set of ncRNAs needs to be analyzed in a functional context. Transcripts located close by on the genome are often regulated together. This genomic proximity on the sequence can hint at a functional association. Results: We present a tool, NoRCE, that performs cis enrichment analysis for a given set of ncRNAs. Enrichment is carried out using the functional annotations of the coding genes located proximal to the input ncRNAs. Other biologically relevant information such as topologically associating domain (TAD) boundaries, co-expression patterns, and miRNA target prediction information can be incorporated to conduct a richer enrichment analysis. To this end, NoRCE includes several relevant datasets as part of its data repository, including cell-line specific TAD boundaries, functional gene sets, and expression data for coding and ncRNAs specific to cancer. Additionally, users can utilize custom data files in their investigation. Enrichment results can be retrieved in a tabular format or visualized in several different ways. NoRCE is currently available for the following species: human, mouse, rat, zebrafish, fruit fly, worm, and yeast. Conclusions: NoRCE is a platform-independent, user-friendly, comprehensive R package that can be used to gain insight into the functional importance of a list of ncRNAs of any type. The tool offers flexibility to conduct the users’ preferred set of analyses by designing their own pipeline of analysis. NoRCE is available in Bioconductor and at https://github.com/guldenolgun/NoRCE.
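
The proximity step described in this abstract can be sketched in a few lines of Python. This is only an illustration of the general idea of collecting coding genes near a set of ncRNAs; it does not use or reproduce the NoRCE API. The function name, the 10 kb window, and the coordinates and gene names are hypothetical.

```python
def nearby_genes(ncrnas, genes, window=10_000):
    """Return coding genes lying within `window` bp of any input ncRNA.

    ncrnas: list of (chromosome, start, end) tuples.
    genes:  list of (name, chromosome, start, end) tuples.
    """
    found = set()
    for chrom, start, end in ncrnas:
        lo, hi = start - window, end + window
        for name, g_chrom, g_start, g_end in genes:
            # Interval-overlap test between the extended ncRNA region and the gene.
            if g_chrom == chrom and g_start <= hi and g_end >= lo:
                found.add(name)
    return sorted(found)

# Made-up coordinates for illustration only.
ncrnas = [("chr1", 50_000, 51_000)]
genes = [
    ("GENE_A", "chr1", 55_000, 60_000),   # within 10 kb of the ncRNA
    ("GENE_B", "chr1", 80_000, 90_000),   # too far away
    ("GENE_C", "chr2", 50_500, 52_000),   # different chromosome
]

print(nearby_genes(ncrnas, genes))
```

The resulting neighbor set would then be tested for enrichment against functional annotations; restricting neighbors to the same TAD, as the abstract mentions, would be an additional filter on top of this step.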

Identifying gene-specific subgroups: an alternative to biclustering

BMC Bioinformatics ◽

10.1186/s12859-019-3289-0 ◽

2019 ◽

Vol 20 (1) ◽

Author(s):

Vincent Branders ◽

Pierre Schaus ◽

Pierre Dupont

Keyword(s):

Gene Expression ◽

Expression Patterns ◽

Enrichment Analysis ◽

R Package ◽

Additional Contribution ◽

Computationally Efficient ◽

Statistical Validation ◽

Experimental Conditions ◽

Large Gene ◽

Significant Gene

Abstract. Background: Transcriptome analysis aims at gaining insight into cellular processes through discovering gene expression patterns across various experimental conditions. Biclustering is a standard approach to discover gene subsets with similar expression across subgroups of samples to be identified. The result is a set of biclusters, each forming a specific submatrix of rows (e.g. genes) and columns (e.g. samples). Relevant biclusters can, however, be missed when, due to the presence of a few outliers, they lack the assumed homogeneity of expression values among a few gene/sample combinations. The Max-Sum SubMatrix problem addresses this issue by looking at highly expressed subsets of genes and of samples, without enforcing such homogeneity. Results: We present here the algorithm to identify K relevant submatrices. Our main contribution is to show that this approach outperforms biclustering algorithms to identify several gene subsets representative of specific subgroups of samples. Experiments are conducted on 35 gene expression datasets from human tissues and yeast samples. We report comparative results with those obtained by several biclustering algorithms, including , , , , and . Gene enrichment analysis demonstrates the benefits of the proposed approach to identify more statistically significant gene subsets. The most significant Gene Ontology terms identified with are shown to be consistent with the controlled conditions of each dataset. This analysis supports the biological relevance of the identified gene subsets. An additional contribution is the statistical validation protocol proposed here to assess the relative performances of biclustering algorithms and of the proposed method. It relies on a Friedman test and Hochberg’s sequential procedure to report critical differences of ranks among all algorithms. Conclusions: We propose here the method, a computationally efficient algorithm to identify K max-sum submatrices in a large gene expression matrix. Comparisons show that it identifies more significantly enriched subsets of genes and specific subgroups of samples which are easily interpretable by biologists. Experiments also show its ability to identify more reliable GO terms. These results illustrate the benefits of the proposed approach in terms of interpretability and of biological enrichment quality. An open implementation of this algorithm is available as an R package.
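
The max-sum submatrix objective mentioned in this abstract (choose a subset of rows and a subset of columns so that the sum of the selected entries is as large as possible) can be illustrated with a simple alternating heuristic in Python. This is not the authors' algorithm, and it offers no optimality guarantee; the function name, the iteration cap, and the toy matrix are hypothetical. The matrix is assumed to contain both positive and negative values, for example after centering.

```python
def max_sum_submatrix_heuristic(matrix, max_iter=100):
    """Alternate row and column selection until the selection stops changing."""
    n_cols = len(matrix[0])
    rows, cols = set(), set(range(n_cols))  # start from all columns
    for _ in range(max_iter):
        # Keep rows whose entries sum to a positive value over the selected columns.
        new_rows = {i for i, row in enumerate(matrix) if sum(row[j] for j in cols) > 0}
        # Keep columns whose entries sum to a positive value over the selected rows.
        new_cols = {j for j in range(n_cols) if sum(matrix[i][j] for i in new_rows) > 0}
        if new_rows == rows and new_cols == cols:
            break
        rows, cols = new_rows, new_cols
    total = sum(matrix[i][j] for i in rows for j in cols)
    return sorted(rows), sorted(cols), total

# Made-up centered expression values: rows are genes, columns are samples.
expression = [
    [ 2.0,  3.0, -1.0, -2.0],
    [ 1.5, -0.2, -0.5, -1.0],   # one slightly negative entry inside the block
    [-1.0, -2.0,  0.5,  0.2],
    [ 4.0, -9.0, -1.0, -1.0],
]

rows, cols, total = max_sum_submatrix_heuristic(expression)
print(rows, cols, total)
```

In this toy example the selected block keeps the entry -0.2 because the surrounding values outweigh it, which reflects the point made in the abstract that the objective does not enforce homogeneity among the selected gene/sample combinations.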

NoRCE: Non-coding RNA Sets Cis Enrichment Tool

10.1101/663765 ◽

2019 ◽

Author(s):

Gulden Olgun ◽

Afshan Nabi ◽

Oznur Tastan

Keyword(s):

Expression Patterns ◽

Target Prediction ◽

Enrichment Analysis ◽

Fruit Fly ◽

Relevant Information ◽

R Package ◽

Spatial Proximity ◽

Biologically Relevant ◽

Gene Sets ◽

Data Files

Abstract. Summary: While some non-coding RNAs (ncRNAs) are assigned to critical regulatory roles, most remain functionally uncharacterized. This presents a challenge whenever an interesting set of ncRNAs needs to be analyzed in a functional context. Transcripts located close by on the genome are often regulated together. This genomic spatial proximity can lead to a functional association. Based on this idea, we present a tool, NoRCE, that performs cis enrichment analysis for a given set of ncRNAs. Enrichment is carried out using the functional annotations of the coding genes located proximal to the input ncRNAs. NoRCE allows incorporating other biologically relevant information such as topologically associating domain (TAD) boundaries, co-expression patterns, and miRNA target prediction information. The NoRCE repository provides several datasets for the analysis, such as cell-line specific TAD boundaries, functional gene sets, and expression data for coding and ncRNAs specific to cancer. Additionally, users can utilize their custom data files in their investigation. Enrichment results can be retrieved in a tabular format or visualized in several different ways. NoRCE is currently available for the following species: human, mouse, rat, zebrafish, fruit fly, worm, and yeast. NoRCE is a platform-independent, user-friendly, comprehensive R package that can be used to gain insight into the functional importance of a list of interesting ncRNAs of any type. Users can run the pipeline in a single function; the tool also offers the flexibility to run the users’ preferred analyses individually and to design their own pipeline. It is available in Bioconductor and at https://github.com/guldenolgun/NoRCE.

TFEA.ChIP: a tool kit for transcription factor binding site enrichment analysis capitalizing on ChIP-seq datasets

Bioinformatics ◽

10.1093/bioinformatics/btz573 ◽

2019 ◽

Vol 35 (24) ◽

pp. 5339-5340 ◽

Cited By ~ 8

Author(s):

Laura Puente-Santamaria ◽

Wyeth W Wasserman ◽

Luis del Peso

Keyword(s):

Genomic Analysis ◽

Enrichment Analysis ◽

R Package ◽

Supplementary Information ◽

Web Based ◽

Factor Binding Site ◽

Gene Sets ◽

Transcription Regulators ◽

Computational Identification ◽

On Chip

Abstract. Summary: The computational identification of the transcription factors (TFs) [more generally, transcription regulators (TRs)] responsible for the co-regulation of a specific set of genes is a common problem in genomic analysis. Herein, we describe TFEA.ChIP, a tool that makes use of ChIP-seq datasets to estimate and visualize TR enrichment in gene lists representing transcriptional profiles. We validated TFEA.ChIP using a wide variety of gene sets representing signatures of genetic and chemical perturbations as input and found that the relevant TR was correctly identified in 126 of the 174 gene sets analyzed. Comparison with other TR enrichment tools demonstrates that TFEA.ChIP is a highly customizable package with outstanding performance. Availability and implementation: TFEA.ChIP is implemented as an R package available at Bioconductor https://www.bioconductor.org/packages/devel/bioc/html/TFEA.ChIP.html and GitHub https://github.com/LauraPS1/TFEA.ChIP_downloads. A web-based GUI to the package is also available at https://www.iib.uam.es/TFEA.ChIP/ Supplementary information: Supplementary data are available at Bioinformatics online.

PhenoExam: an R package and Web application for the examination of phenotypes linked to genes and gene sets

10.1101/2021.06.29.450324 ◽

2021 ◽

Author(s):

Alejandro Cisterna García ◽

Aurora González-Vidal ◽

Daniel Ruiz Villa ◽

Jordi Ortiz Murillo ◽

Alicia Gómez-Pascual ◽

...

Keyword(s):

Web Application ◽

Enrichment Analysis ◽

R Package ◽

Web Interface ◽

Gene Set ◽

New Genes ◽

Gene Sets ◽

Phenotype Analysis ◽

New Gene ◽

Early Onset Parkinson’s Disease

Gene set based phenotype enrichment analysis (detecting phenotypic terms that emerge as significant in a set of genes) can improve the rate of genetic diagnoses, amongst other research purposes. To facilitate diverse phenotype analysis, we developed PhenoExam, a freely available R package for tool developers and a web interface for users, which (1) performs phenotype and disease enrichment analysis on a gene set; (2) measures statistically significant phenotype similarities between gene sets; and (3) detects significant differential phenotypes or disease terms across different databases. PhenoExam achieves these tasks by integrating databases or resources such as the HPO, MGD, CRISPRbrain, CTD, ClinGen, CGI, OrphaNET, UniProt, PsyGeNET, and Genomics England Panel App. PhenoExam accepts both human and mouse genes as input. We developed PhenoExam to assist a variety of users, including clinicians, computational biologists and geneticists. It can be used to support the validation of new gene-to-disease discoveries, and in the detection of differential phenotypes between two gene sets (a phenotype linked to one of the gene sets but not to the other) that are useful for differential diagnosis and to improve genetic panels. We validated PhenoExam performance through simulations and its application to real cases. We demonstrate that PhenoExam is effective in distinguishing gene sets or Mendelian diseases with very similar phenotypes through projecting the disease-causing genes into their annotation-based phenotypic spaces. We also tested the tool with early onset Parkinson's disease and dystonia genes, to show phenotype-level similarities but also potentially interesting differences. More specifically, we used PhenoExam to validate computationally predicted new genes potentially associated with epilepsy. Therefore, PhenoExam effectively discovers links between phenotypic terms across annotation databases through effective integration. The R package is available at https://github.com/alexcis95/PhenoExam and the Web tool is accessible at https://snca.atica.um.es/PhenoExamWeb/.

mCSEA: Detecting subtle differentially methylated regions

10.1101/293381 ◽

2018 ◽

Cited By ~ 2

Author(s):

Jordi Martorell-Marugán ◽

Víctor González-Rumayor ◽

Pedro Carmona-Sáez

Keyword(s):

Gene Expression ◽

Expression Patterns ◽

Enrichment Analysis ◽

R Package ◽

Gene Set Enrichment Analysis ◽

Differentially Methylated Regions ◽

Gene Set Enrichment ◽

Sibling Pairs ◽

Complex Disorders ◽

Obesity And Diabetes

Abstract. Motivation: The identification of differentially methylated regions (DMRs) among phenotypes is one of the main goals of epigenetic analysis. Although several methods have been developed to detect DMRs, most of them focus on detecting relatively large differences in methylation levels and fail to detect moderate, but consistent, methylation changes that might be associated with complex disorders. Results: We present mCSEA, an R package that implements a Gene Set Enrichment Analysis method to identify differentially methylated regions from Illumina 450K and EPIC array data. It is especially useful for detecting subtle, but consistent, methylation differences in complex phenotypes. mCSEA also implements functions to integrate gene expression data and to detect genes with significant correlations between methylation and gene expression patterns. Using simulated datasets, we show that mCSEA outperforms other tools in detecting DMRs. In addition, we applied mCSEA to a previously published dataset of sibling pairs discordant for intrauterine hyperglycemia exposure. We found several differentially methylated promoters in genes related to metabolic disorders like obesity and diabetes, demonstrating the potential of mCSEA to identify differentially methylated regions not detected by other methods. Availability: mCSEA is freely available from Bioconductor.

GSEA-InContext: Identifying novel and common patterns in expression experiments

10.1101/259440 ◽

2018 ◽

Author(s):

Rani K. Powers ◽

Andrew Goodspeed ◽

Harrison Pielke-Lombardo ◽

Aik-Choon Tan ◽

James C. Costello

Keyword(s):

Expression Patterns ◽

Null Distribution ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Enrichment Score ◽

Specific Gene ◽

Gene Set Enrichment ◽

Single Experiment ◽

Gene Set ◽

Gene Sets

Abstract. Motivation: Gene Set Enrichment Analysis (GSEA) is routinely used to analyze and interpret coordinate changes in transcriptomics experiments. For an experiment where fewer than seven samples per condition are compared, GSEA employs a competitive null hypothesis to test significance. A gene set enrichment score is tested against a null distribution of enrichment scores generated from permuted gene sets, where genes are randomly selected from the input experiment. Looking across a variety of biological conditions, however, genes are not randomly distributed, with many showing consistent patterns of up- or down-regulation. As a result, common patterns of positively and negatively enriched gene sets are observed across experiments. Placing a single experiment into the context of a relevant set of background experiments allows us to identify both the common and experiment-specific patterns of gene set enrichment. Results: We compiled a compendium of 442 small molecule transcriptomic experiments and used GSEA to characterize common patterns of positively and negatively enriched gene sets. To identify experiment-specific gene set enrichment, we developed the GSEA-InContext method, which accounts for gene expression patterns within a user-defined background set of experiments to identify statistically significantly enriched gene sets. We evaluated GSEA-InContext on experiments using small molecules with known targets and show that it successfully prioritizes gene sets that are specific to each experiment, thus providing valuable insights that complement standard GSEA analysis. Availability and Implementation: GSEA-InContext is implemented in Python. Code, the background expression compendium, and results are available at: https://github.com/CostelloLab/GSEA-InContext
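
The gene-set permutation null described in this abstract can be sketched as follows. The sketch computes a running-sum enrichment score over a ranked gene list and compares it with scores of random gene sets of the same size drawn from the same experiment. It is a simplified illustration, not the GSEA-InContext code: it omits score normalization and multiple-testing correction, it assumes the gene set is a proper subset of the ranked genes with at least one non-zero score, and all function names and the toy scores are hypothetical.

```python
import random

def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum statistic: the largest deviation from zero along the ranked list."""
    hits = [g in gene_set for g in ranked_genes]
    hit_weight = sum(abs(scores[g]) ** p for g, h in zip(ranked_genes, hits) if h)
    miss_step = 1.0 / (len(ranked_genes) - sum(hits))
    running, best = 0.0, 0.0
    for g, h in zip(ranked_genes, hits):
        # Step up (weighted by the gene's score) on a hit, step down on a miss.
        running += abs(scores[g]) ** p / hit_weight if h else -miss_step
        if abs(running) > abs(best):
            best = running
    return best

def gene_set_permutation_pvalue(scores, gene_set, n_perm=1000, seed=0):
    """Empirical p-value from random gene sets drawn from the same experiment."""
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get, reverse=True)
    observed = enrichment_score(ranked, scores, gene_set)
    null = [
        enrichment_score(ranked, scores, set(rng.sample(ranked, len(gene_set))))
        for _ in range(n_perm)
    ]
    # One-sided comparison in the direction of the observed score.
    if observed >= 0:
        extreme = sum(es >= observed for es in null)
    else:
        extreme = sum(es <= observed for es in null)
    return observed, (extreme + 1) / (n_perm + 1)

# Made-up differential expression scores for 20 genes, g0 highest and g19 lowest.
scores = {f"g{i}": 10.5 - i for i in range(20)}
observed, p_value = gene_set_permutation_pvalue(scores, {"g0", "g1", "g2"})
print(observed, p_value)
```

In this sketch the random gene sets are drawn uniformly from the single input experiment. The abstract describes GSEA-InContext as instead accounting for expression patterns in a user-defined set of background experiments, which this sketch does not attempt to reproduce.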

VAN: an R package for identifying biologically perturbed networks via differential variability analysis

BMC Research Notes ◽

10.1186/1756-0500-6-430 ◽

2013 ◽

Vol 6 (1) ◽

pp. 430 ◽

Cited By ~ 4

Author(s):

Vivek Jayaswal ◽

Sarah-Jane Schramm ◽

Graham J Mann ◽

Marc R Wilkins ◽

Yee Yang

Keyword(s):

R Package ◽

Variability Analysis ◽

Differential Variability

AEGS: identifying aberrantly expressed gene sets for differential variability analysis

Bioinformatics ◽

10.1093/bioinformatics/btx646 ◽

2017 ◽

Vol 34 (5) ◽

pp. 881-883 ◽

Cited By ~ 3

Author(s):

Jinting Guan ◽

Moliang Chen ◽

Congting Ye ◽

James J Cai ◽

Guoli Ji

Keyword(s):

Variability Analysis ◽

Gene Sets ◽

Differential Variability

GSOAP: a tool for visualization of gene set over-representation analysis

Bioinformatics ◽

10.1093/bioinformatics/btaa001 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2923-2925 ◽

Cited By ~ 1

Author(s):

Tomas Tokar ◽

Chiara Pastrello ◽

Igor Jurisica

Keyword(s):

Enrichment Analysis ◽

R Package ◽

Closeness Centrality ◽

Visual Exploration ◽

Gene Set ◽

Gene Sets ◽

Common Technique

Abstract. Motivation: Gene sets over-representation analysis (GSOA) is a common technique of enrichment analysis that measures the overlap between a gene set and selected instances (e.g. pathways). Despite its popularity, there is currently no established standard for visualization of GSOA results. Results: Here, we propose a visual exploration of the GSOA results by showing the relationships among the enriched instances, while highlighting important instance attributes, such as significance, closeness (centrality) and clustering. Availability and implementation: GSOAP is implemented as an R package and is available at https://github.com/tomastokar/gsoap.
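
The overlap between a query gene set and a pathway is commonly scored with an upper-tail hypergeometric test. The Python sketch below shows that calculation under the assumption that this is the statistic in use; the abstract does not state which test GSOAP relies on, and the function name and example numbers are hypothetical.

```python
from math import comb

def overrepresentation_pvalue(universe_size, pathway_size, query_size, overlap):
    """Probability of at least `overlap` shared genes under random sampling.

    universe_size: number of genes in the background.
    pathway_size:  number of background genes annotated to the pathway.
    query_size:    number of genes in the query gene set.
    overlap:       number of query genes that belong to the pathway.
    """
    max_overlap = min(pathway_size, query_size)
    total = comb(universe_size, query_size)
    # Sum the hypergeometric probabilities for overlap, overlap + 1, ..., max_overlap.
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total

# Made-up example: 20 background genes, a 5-gene pathway, a 5-gene query, full overlap.
print(overrepresentation_pvalue(20, 5, 5, 5))
```

A visualization such as the one proposed in the abstract would typically take p-values of this kind, one per pathway, as the significance attribute of each enriched instance.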
