Random forests-based differential analysis of gene sets for gene expression data

Background: Gene set enrichment analyses (GSEA) provide a useful and powerful approach to identifying differentially expressed gene sets using prior biological knowledge. Several GSEA algorithms have been proposed to perform enrichment analyses on groups of genes. However, many of these algorithms have focused on the identification of differentially expressed gene sets in a given phenotype. Objective: In this paper, we propose a gene set analytic framework, Gene Set Correlation Analysis (GSCoA), that simultaneously measures within- and between-gene-set variation to identify sets of genes enriched for differential expression and highly correlated pathways. Methods: We apply co-inertia analysis to cross-gene-set comparisons in gene expression data to measure the co-structure of expression profiles in pairs of gene sets. Co-inertia analysis (CIA) is a multivariate method for identifying trends or co-relationships in multiple datasets that contain the same samples. The objective of CIA is to seek ordinations (dimension reduction diagrams) of two gene sets such that the squared covariance between the projections of the gene sets on successive axes is maximized. Simulation studies illustrate that CIA offers superior performance in identifying co-relationships between gene sets in all simulation settings when compared to correlation-based gene set methods. Results and Conclusion: We also combine between-gene-set CIA and GSEA to discover the relationships between gene sets significantly associated with phenotypes. In addition, we provide a graphical technique for visualizing and simultaneously exploring between- and within-gene-set associations, their interactions, and their network. We then demonstrate the integration of within- and between-gene-set variation using CIA and GSEA, applied to the p53 gene expression data using the c2 curated gene sets. Ultimately, the GSCoA approach provides an attractive tool for the identification and visualization of novel associations between pairs of gene sets by integrating co-relationships between gene sets into gene set analysis.
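
The co-inertia objective described in this abstract can be illustrated with a minimal sketch. It assumes two expression matrices measured on the same samples, uniform sample weights, and identity gene metrics, which is a simplification and not necessarily the configuration used by the GSCoA authors. Under those assumptions, the axes that maximize the squared covariance between projections are the singular vectors of the cross-covariance matrix. The function name `co_inertia` and the returned keys are hypothetical, and the RV coefficient is included only as a commonly used summary of co-structure.

```python
import numpy as np

def co_inertia(X, Y, n_axes=2):
    """Simplified co-inertia analysis of two gene sets on the same samples.

    X: (n_samples, p) expression matrix for gene set A
    Y: (n_samples, q) expression matrix for gene set B
    Assumes uniform sample weights and identity gene metrics (illustrative only).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.shape[0] != Y.shape[0]:
        raise ValueError("X and Y must share the same samples (rows)")
    n = X.shape[0]

    # Center each gene so that products of columns are covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Cross-covariance between the genes of the two sets (p x q).
    C = Xc.T @ Yc / n

    # Singular vectors give successive pairs of axes; the singular values are
    # the covariances between the corresponding sample projections.
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    k = min(n_axes, s.size)
    scores_x = Xc @ U[:, :k]
    scores_y = Yc @ Vt[:k].T

    # RV coefficient: overall co-structure between the two sets, in [0, 1].
    Sxx = Xc.T @ Xc / n
    Syy = Yc.T @ Yc / n
    rv = np.sum(C ** 2) / np.sqrt(np.sum(Sxx ** 2) * np.sum(Syy ** 2))

    return {"sq_cov": s[:k] ** 2, "scores_x": scores_x,
            "scores_y": scores_y, "rv": rv}
```

Calling `co_inertia(X, Y)` on two gene-set matrices returns the squared covariances on the first axes together with the sample scores, which could then be plotted to compare the two ordinations.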

A FC-GSEA Approach to Identify Significant Gene-Sets Using Microarray Gene Expression Data

Advances in Computational Science and Engineering - Communications in Computer and Information Science ◽

10.1007/978-3-642-10238-7_10 ◽

2009 ◽

pp. 115-128 ◽

Cited By ~ 1

Author(s):

Jaeyoung Kim ◽

Miyoung Shin

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Microarray Gene Expression ◽

Gene Sets ◽

Significant Gene ◽

Microarray Gene

Discovering negative correlated gene sets from integrative gene expression data for cancer prognosis

2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm.2010.5706615 ◽

2010 ◽

Author(s):

Tao Zeng ◽

Xuan Guo ◽

Juan Liu

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Cancer Prognosis ◽

Expression Data ◽

Gene Sets

Differential analysis of DNA microarray gene expression data

Molecular Microbiology ◽

10.1046/j.1365-2958.2003.03298.x ◽

2003 ◽

Vol 47 (4) ◽

pp. 871-877 ◽

Cited By ~ 79

Author(s):

G. Wesley Hatfield ◽

She-pin Hung ◽

Pierre Baldi

Keyword(s):

Gene Expression ◽

Dna Microarray ◽

Gene Expression Data ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Differential Analysis ◽

Microarray Gene Expression ◽

Microarray Gene

Non-Negative Matrix Factorization for the Analysis of Complex Gene Expression Data: Identification of Clinically Relevant Tumor Subtypes

Cancer Informatics ◽

10.4137/cin.s606 ◽

2008 ◽

Vol 6 ◽

pp. CIN.S606 ◽

Cited By ~ 23

Author(s):

Attila Frigyesi ◽

Mattias Höglund

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Matrix Factorization ◽

Biological Significance ◽

Data Sets ◽

Expression Data ◽

Microarray Expression Data ◽

Tumor Subtypes ◽

Gene Sets ◽

Non Negative Matrix Factorization

Non-negative matrix factorization (NMF) is a relatively new approach to analyzing gene expression data that models data by additive combinations of non-negative basis vectors (metagenes). The non-negativity constraint makes sense biologically, as genes may either be expressed or not, but never show negative expression. We applied NMF to five different microarray data sets. We estimated the appropriate number of metagenes by comparing the residual error of NMF reconstruction of the data to that of NMF reconstruction of permuted data, thus finding when a given solution contained more information than noise. This analysis also revealed that NMF could not factorize one of the data sets in a meaningful way. We used GO categories and predefined gene sets to evaluate the biological significance of the obtained metagenes. By analyzing metagenes specific to the same GO categories, we could show that individual metagenes activated different aspects of the same biological processes. Several of the obtained metagenes correlated with tumor subtypes and with tumors with characteristic chromosomal translocations, indicating that metagenes may correspond to specific disease entities. Hence, NMF extracts biologically relevant structures from microarray expression data and may thus contribute to a deeper understanding of tumor behavior.
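
The rank-selection idea in this abstract (compare the NMF residual on the data with the residual on permuted data) can be sketched roughly as follows. The abstract does not state which NMF algorithm or permutation scheme was used, so this sketch assumes standard multiplicative updates for the Frobenius norm and an independent shuffle of each gene's values across samples. All function names (`nmf`, `residual`, `permute_rows`, `rank_scan`) are hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factorize non-negative V (genes x samples) as W @ H using
    multiplicative updates (an assumed algorithm, not taken from the paper)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps   # metagene loadings per gene
    H = rng.random((rank, m)) + eps   # metagene expression per sample
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def residual(V, rank, **kw):
    """Frobenius norm of the NMF reconstruction error at the given rank."""
    W, H = nmf(V, rank, **kw)
    return np.linalg.norm(V - W @ H)

def permute_rows(V, seed=0):
    """Shuffle each gene's values across samples independently (assumed scheme)."""
    rng = np.random.default_rng(seed)
    return np.array([rng.permutation(row) for row in V])

def rank_scan(V, ranks, seed=0):
    """Return (rank, residual on data, residual on permuted data) per rank."""
    Vp = permute_rows(V, seed)
    return [(k, residual(V, k, seed=seed), residual(Vp, k, seed=seed))
            for k in ranks]
```

One way to read the output of `rank_scan` is to keep increasing the rank while the residual on the real data stays clearly below the residual on the permuted data; how large a gap counts as informative is a judgment that this sketch does not settle.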

VarMixt: efficient variance modelling for the differential analysis of replicated gene expression data

Bioinformatics ◽

10.1093/bioinformatics/bti023 ◽

2004 ◽

Vol 21 (4) ◽

pp. 502-508 ◽

Cited By ~ 70

Author(s):

P. Delmar ◽

S. p. Robin ◽

J. J. Daudin

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Expression Data ◽

Differential Analysis

Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

BMC Medical Genomics ◽

10.1186/1755-8794-6-2 ◽

2013 ◽

Vol 6 (1) ◽

Cited By ~ 12

Author(s):

Kristina M Hettne ◽

André Boorsma ◽

Dorien A M van Dartel ◽

Jelle J Goeman ◽

Esther de Jong ◽

...

Keyword(s):

Gene Expression ◽

Text Mining ◽

Gene Expression Data ◽

Specific Gene ◽

Expression Data ◽

Next Generation ◽

Gene Sets ◽

Chemical Response

A Hybrid Approach of Gene Sets and Single Genes for the Prediction of Survival Risks with Gene Expression Data

PLoS ONE ◽

10.1371/journal.pone.0122103 ◽

2015 ◽

Vol 10 (5) ◽

pp. e0122103 ◽

Cited By ~ 3

Author(s):

Junhee Seok ◽

Ronald W. Davis ◽

Wenzhong Xiao

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Hybrid Approach ◽

Expression Data ◽

Gene Sets ◽

Prediction Of Survival

Novel gene sets improve set-level classification of prokaryotic gene expression data

BMC Bioinformatics ◽

10.1186/s12859-015-0786-7 ◽

2015 ◽

Vol 16 (1) ◽

Cited By ~ 1

Author(s):

Matěj Holec ◽

Ondřej Kuželka ◽

Filip Železný

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Expression Data ◽

Novel Gene ◽

Gene Sets

Calculating the Statistical Significance of Changes in Pathway Activity From Gene Expression Data

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1055 ◽

2004 ◽

Vol 3 (1) ◽

pp. 1-29 ◽

Cited By ~ 58

Author(s):

Jörg Rahnenführer ◽

Francisco S Domingues ◽

Jochen Maydt ◽

Thomas Lengauer

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Biological Networks ◽

Permutation Test ◽

Statistical Significance ◽

Data Sets ◽

Expression Data ◽

Biologically Relevant ◽

Gene Sets ◽

Best Fitting

We present a statistical approach to scoring changes in activity of metabolic pathways from gene expression data. The method identifies the biologically relevant pathways with corresponding statistical significance. Based on gene expression data alone, only local structures of genetic networks can be recovered. Instead of inferring such a network, we propose a hypothesis-based approach. We use given knowledge about biological networks to improve sensitivity and interpretability of findings from microarray experiments. Recently introduced methods test if members of predefined gene sets are enriched in a list of top-ranked genes in a microarray study. We improve this approach by defining scores that depend on all members of the gene set and that also take pairwise co-regulation of these genes into account. We calculate the significance of co-regulation of gene sets with a nonparametric permutation test. On two data sets the method is validated and its biological relevance is discussed. It turns out that useful measures for co-regulation of genes in a pathway can be identified adaptively. We refine our method in two aspects specific to pathways. First, to overcome the ambiguity of enzyme-to-gene mappings for a fixed pathway, we introduce algorithms for selecting the best fitting gene for a specific enzyme in a specific condition. In selected cases, functional assignment of genes to pathways is feasible. Second, the sensitivity of detecting relevant pathways is improved by integrating information about pathway topology. The distance of two enzymes is measured by the number of reactions needed to connect them, and enzyme pairs with a smaller distance receive a higher weight in the score calculation.
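
A nonparametric permutation test for gene-set co-regulation, as mentioned in this abstract, might look like the following minimal sketch. The abstract does not give the exact score or the permutation scheme, so the sketch assumes the score is the mean absolute pairwise Pearson correlation within the set and that the null distribution comes from random gene sets of the same size. It also omits the topology-based weighting of enzyme pairs. The function names are hypothetical.

```python
import numpy as np

def coregulation_score(expr, gene_idx):
    """Mean absolute pairwise Pearson correlation among the selected genes.

    expr: (n_genes, n_samples) expression matrix
    gene_idx: row indices of the genes in the set (at least two)
    """
    corr = np.corrcoef(expr[gene_idx])
    upper = np.triu_indices(len(gene_idx), k=1)  # each gene pair once
    return float(np.mean(np.abs(corr[upper])))

def permutation_pvalue(expr, gene_idx, n_perm=1000, seed=0):
    """Compare the observed score with scores of random same-size gene sets
    (an assumed null model) and return (observed score, p-value)."""
    rng = np.random.default_rng(seed)
    observed = coregulation_score(expr, gene_idx)
    n_genes = expr.shape[0]
    size = len(gene_idx)
    hits = 0
    for _ in range(n_perm):
        random_set = rng.choice(n_genes, size=size, replace=False)
        if coregulation_score(expr, random_set) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

The smallest p-value this estimate can return is 1 / (n_perm + 1), so the number of permutations bounds the resolution of the test.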
