Discovering negative correlated gene sets from integrative gene expression data for cancer prognosis

Background: Gene set enrichment analysis (GSEA) provides a powerful approach for identifying differentially expressed gene sets using prior biological knowledge. Several GSEA algorithms have been proposed to perform enrichment analyses on groups of genes. However, many of these algorithms focus on identifying gene sets that are differentially expressed for a given phenotype. Objective: In this paper, we propose a gene set analytic framework, Gene Set Correlation Analysis (GSCoA), that simultaneously measures within- and between-gene-set variation to identify sets of genes enriched for differential expression and highly co-related pathways. Methods: We apply co-inertia analysis (CIA) to cross-gene-set comparisons in gene expression data to measure the co-structure of expression profiles in pairs of gene sets. CIA is a multivariate method for identifying trends or co-relationships in multiple datasets that contain the same samples. Its objective is to seek ordinations (dimension reduction diagrams) of two gene sets such that the squared covariance between the projections of the gene sets on successive axes is maximized. Simulation studies illustrate that CIA outperforms correlation-based gene set methods in identifying co-relationships between gene sets in all simulation settings. Results and Conclusion: We also combine between-gene-set CIA and GSEA to discover the relationships between gene sets significantly associated with phenotypes. In addition, we provide a graphical technique for visualizing and simultaneously exploring between- and within-gene-set associations, their interactions, and the resulting network. We then demonstrate the integration of within- and between-gene-set variation using CIA and GSEA, applied to the p53 gene expression data with the c2 curated gene sets. Ultimately, the GSCoA approach provides an attractive tool for identifying and visualizing novel associations between pairs of gene sets by integrating co-relationships between gene sets into gene set analysis.
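
The CIA objective described in the abstract (axes that maximize the squared covariance between the projections of two gene sets measured on the same samples) can be illustrated with a minimal sketch. This is not the GSCoA implementation: it assumes uniform sample weights and unit gene weights, and the function name `co_inertia` and its return keys are hypothetical. Under those assumptions, the axes come from a singular value decomposition of the cross-covariance matrix, and the RV coefficient summarizes the overall co-structure.

```python
import numpy as np

def co_inertia(X, Y, n_axes=2):
    """Simplified co-inertia analysis of two samples-by-genes matrices.

    X: (n_samples, p) expression of gene set A.
    Y: (n_samples, q) expression of gene set B, same samples in the same order.
    Assumption: uniform sample weights and unit gene weights.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.shape[0] != Y.shape[0]:
        raise ValueError("X and Y must be measured on the same samples (rows)")
    n = X.shape[0]

    # Column-center each gene so cross-products are covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # p x q cross-covariance between the two gene sets.
    C = Xc.T @ Yc / n

    # SVD of C: the k-th pair of singular vectors gives the axes whose
    # projections have covariance s[k], the largest attainable given
    # orthogonality to the earlier axes.
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    k = min(n_axes, s.size)

    # RV coefficient: a global co-structure measure between 0 and 1.
    Sxx = Xc.T @ Xc / n
    Syy = Yc.T @ Yc / n
    rv = np.sum(C ** 2) / np.sqrt(np.sum(Sxx ** 2) * np.sum(Syy ** 2))

    return {
        "singular_values": s[:k],        # covariance of projections per axis
        "scores_x": Xc @ U[:, :k],       # sample scores for gene set A
        "scores_y": Yc @ Vt[:k].T,       # sample scores for gene set B
        "rv": float(rv),
    }
```

Plotting `scores_x` against `scores_y` for each sample gives the kind of ordination diagram the abstract refers to; an RV value near 1 suggests that the two gene sets share most of their sample structure.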

A FC-GSEA Approach to Identify Significant Gene-Sets Using Microarray Gene Expression Data

Advances in Computational Science and Engineering - Communications in Computer and Information Science ◽

10.1007/978-3-642-10238-7_10 ◽

2009 ◽

pp. 115-128 ◽

Cited By ~ 1

Author(s):

Jaeyoung Kim ◽

Miyoung Shin

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Microarray Gene Expression ◽

Gene Sets ◽

Significant Gene ◽

Microarray Gene

Non-Negative Matrix Factorization for the Analysis of Complex Gene Expression Data: Identification of Clinically Relevant Tumor Subtypes

Cancer Informatics ◽

10.4137/cin.s606 ◽

2008 ◽

Vol 6 ◽

pp. CIN.S606 ◽

Cited By ~ 23

Author(s):

Attila Frigyesi ◽

Mattias Höglund

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Matrix Factorization ◽

Biological Significance ◽

Data Sets ◽

Expression Data ◽

Microarray Expression Data ◽

Tumor Subtypes ◽

Gene Sets ◽

Non Negative Matrix Factorization

Non-negative matrix factorization (NMF) is a relatively new approach for analyzing gene expression data that models the data as additive combinations of non-negative basis vectors (metagenes). The non-negativity constraint makes sense biologically, as genes may either be expressed or not but never show negative expression. We applied NMF to five different microarray data sets. We estimated the appropriate number of metagenes by comparing the residual error of the NMF reconstruction of the data to that of the NMF reconstruction of permuted data, thus finding when a given solution contained more information than noise. This analysis also revealed that NMF could not factorize one of the data sets in a meaningful way. We used GO categories and predefined gene sets to evaluate the biological significance of the obtained metagenes. By analyzing metagenes specific for the same GO categories, we could show that individual metagenes activated different aspects of the same biological processes. Several of the obtained metagenes correlated with tumor subtypes and with tumors carrying characteristic chromosomal translocations, indicating that metagenes may correspond to specific disease entities. Hence, NMF extracts biologically relevant structures from microarray expression data and may thus contribute to a deeper understanding of tumor behavior.
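
The rank-selection idea in this abstract (compare the NMF residual on the data with the residual on permuted data) can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard multiplicative-update rules for the Frobenius objective, which may differ from the algorithm the authors used, and it permutes each gene's values across samples independently, which is one plausible permutation scheme among several. All function names are hypothetical.

```python
import numpy as np

def nmf(V, k, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative genes-by-samples matrix V as W @ H with k metagenes.

    Uses multiplicative updates for the Frobenius-norm objective
    (an assumption; the paper's exact algorithm is not stated here).
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, k)) + eps    # metagene loadings
    H = rng.random((k, n_samples)) + eps  # metagene expression per sample
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def reconstruction_error(V, k, **kwargs):
    """Frobenius norm of the residual V - W @ H for a rank-k NMF."""
    W, H = nmf(V, k, **kwargs)
    return float(np.linalg.norm(V - W @ H))

def permute_rows(V, seed=0):
    """Shuffle each gene's values across samples independently (assumed scheme)."""
    rng = np.random.default_rng(seed)
    return np.stack([rng.permutation(row) for row in V])

def residual_profile(V, ranks, seed=0):
    """Return (k, error on data, error on permuted data) for each rank k."""
    V_perm = permute_rows(V, seed)
    return [
        (k, reconstruction_error(V, k, seed=seed),
         reconstruction_error(V_perm, k, seed=seed))
        for k in ranks
    ]
```

Ranks at which the error on the real data falls well below the error on the permuted data are candidates for carrying structure rather than noise; a data set where the two curves never separate would correspond to the case the abstract describes as not factorizable in a meaningful way.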

Breast cancer prognosis by combinatorial analysis of gene expression data

Breast Cancer Research ◽

10.1186/bcr1512 ◽

2006 ◽

Vol 8 (4) ◽

Cited By ~ 44

Author(s):

Gabriela Alexe ◽

Sorin Alexe ◽

David E Axelrod ◽

Tibérius O Bonates ◽

Irina I Lozina ◽

...

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

Gene Expression Data ◽

Breast Cancer Prognosis ◽

Cancer Prognosis ◽

Combinatorial Analysis ◽

Expression Data

Random forests-based differential analysis of gene sets for gene expression data

Gene ◽

10.1016/j.gene.2012.11.034 ◽

2013 ◽

Vol 518 (1) ◽

pp. 179-186 ◽

Cited By ~ 13

Author(s):

Huey-Miin Hsueh ◽

Da-Wei Zhou ◽

Chen-An Tsai

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Random Forests ◽

Expression Data ◽

Differential Analysis ◽

Gene Sets

Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

BMC Medical Genomics ◽

10.1186/1755-8794-6-2 ◽

2013 ◽

Vol 6 (1) ◽

Cited By ~ 12

Author(s):

Kristina M Hettne ◽

André Boorsma ◽

Dorien A M van Dartel ◽

Jelle J Goeman ◽

Esther de Jong ◽

...

Keyword(s):

Gene Expression ◽

Text Mining ◽

Gene Expression Data ◽

Specific Gene ◽

Expression Data ◽

Next Generation ◽

Gene Sets ◽

Chemical Response

A Hybrid Approach of Gene Sets and Single Genes for the Prediction of Survival Risks with Gene Expression Data

PLoS ONE ◽

10.1371/journal.pone.0122103 ◽

2015 ◽

Vol 10 (5) ◽

pp. e0122103 ◽

Cited By ~ 3

Author(s):

Junhee Seok ◽

Ronald W. Davis ◽

Wenzhong Xiao

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Hybrid Approach ◽

Expression Data ◽

Gene Sets ◽

Prediction Of Survival

Novel gene sets improve set-level classification of prokaryotic gene expression data

BMC Bioinformatics ◽

10.1186/s12859-015-0786-7 ◽

2015 ◽

Vol 16 (1) ◽

Cited By ~ 1

Author(s):

Matěj Holec ◽

Ondřej Kuželka ◽

Filip Železný

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Expression Data ◽

Novel Gene ◽

Gene Sets

Calculating the Statistical Significance of Changes in Pathway Activity From Gene Expression Data

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1055 ◽

2004 ◽

Vol 3 (1) ◽

pp. 1-29 ◽

Cited By ~ 58

Author(s):

Jörg Rahnenführer ◽

Francisco S Domingues ◽

Jochen Maydt ◽

Thomas Lengauer

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Biological Networks ◽

Permutation Test ◽

Statistical Significance ◽

Data Sets ◽

Expression Data ◽

Biologically Relevant ◽

Gene Sets ◽

Best Fitting

We present a statistical approach to scoring changes in activity of metabolic pathways from gene expression data. The method identifies the biologically relevant pathways with corresponding statistical significance. Based on gene expression data alone, only local structures of genetic networks can be recovered. Instead of inferring such a network, we propose a hypothesis-based approach. We use given knowledge about biological networks to improve sensitivity and interpretability of findings from microarray experiments. Recently introduced methods test if members of predefined gene sets are enriched in a list of top-ranked genes in a microarray study. We improve this approach by defining scores that depend on all members of the gene set and that also take pairwise co-regulation of these genes into account. We calculate the significance of co-regulation of gene sets with a nonparametric permutation test. On two data sets the method is validated and its biological relevance is discussed. It turns out that useful measures for co-regulation of genes in a pathway can be identified adaptively. We refine our method in two aspects specific to pathways. First, to overcome the ambiguity of enzyme-to-gene mappings for a fixed pathway, we introduce algorithms for selecting the best fitting gene for a specific enzyme in a specific condition. In selected cases, functional assignment of genes to pathways is feasible. Second, the sensitivity of detecting relevant pathways is improved by integrating information about pathway topology. The distance of two enzymes is measured by the number of reactions needed to connect them, and enzyme pairs with a smaller distance receive a higher weight in the score calculation.
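
A rough sketch of the two ideas in this abstract, a co-regulation score assessed by a permutation test and topology-based pair weights, is given below. Several choices here are assumptions rather than the authors' definitions: the score is taken to be the weighted mean pairwise Pearson correlation, the null distribution is built from random gene sets of the same size, and pair weights are set to the reciprocal of the reaction distance. The function names are hypothetical.

```python
import numpy as np

def distance_weights(dist):
    """Turn a matrix of reaction distances into pair weights 1/d.

    Assumption: reciprocal weighting; the abstract only states that
    closer enzyme pairs receive higher weight. The diagonal is ignored.
    """
    dist = np.asarray(dist, dtype=float)
    weights = np.zeros_like(dist)
    off_diag = ~np.eye(dist.shape[0], dtype=bool)
    weights[off_diag] = 1.0 / dist[off_diag]
    return weights

def coregulation_score(expr, weights=None):
    """Weighted mean pairwise Pearson correlation among genes (rows of expr)."""
    corr = np.corrcoef(expr)
    upper = np.triu_indices(corr.shape[0], k=1)  # each gene pair once
    if weights is None:
        w = np.ones(upper[0].size)
    else:
        w = np.asarray(weights, dtype=float)[upper]
    return float(np.sum(w * corr[upper]) / np.sum(w))

def permutation_pvalue(expr_all, set_idx, weights=None, n_perm=500, seed=0):
    """Compare a pathway's score with scores of random gene sets of equal size.

    Assumed null model: gene labels are exchangeable across the array.
    """
    rng = np.random.default_rng(seed)
    set_idx = np.asarray(set_idx)
    observed = coregulation_score(expr_all[set_idx], weights)
    null = np.empty(n_perm)
    for b in range(n_perm):
        idx = rng.choice(expr_all.shape[0], size=set_idx.size, replace=False)
        null[b] = coregulation_score(expr_all[idx], weights)
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, float(p_value)
```

Passing `weights=distance_weights(dist)` for a pathway's enzyme distance matrix makes pairs that are adjacent in the pathway dominate the score, which is the intent of the topology refinement described above.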

Cluster Analysis of Gene Expression Data

Encyclopedia of Artificial Intelligence ◽

10.4018/978-1-59904-849-9.ch045 ◽

2011 ◽

pp. 289-296

Author(s):

Alan Wee-Chung Liew ◽

Ngai-Fong Law ◽

Hong Yan

Keyword(s):

Gene Expression ◽

Cluster Analysis ◽

Gene Expression Data ◽

Biological Networks ◽

Microarray Experiment ◽

Hybridization Experiment ◽

Cancer Prognosis ◽

Expression Data ◽

Functional Roles ◽

Exploratory Technique

Important insights into gene function can be gained by gene expression analysis. For example, some genes are turned on (expressed) or turned off (repressed) when there is a change in external conditions or stimuli. The expression of one gene is often regulated by the expression of other genes. A detailed analysis of gene expression information will provide an understanding of the inter-networking of different genes and their functional roles. DNA microarray technology allows massively parallel, high-throughput, genome-wide profiling of gene expression in a single hybridization experiment (Lockhart & Winzeler, 2000). It has been widely used in numerous studies over a broad range of biological disciplines, such as cancer classification (Armstrong et al., 2002), identification of genes relevant to a certain diagnosis or therapy (Muro et al., 2003), and investigation of the mechanism of drug action and cancer prognosis (Kim et al., 2000; Duggan et al., 1999). Due to the large number of genes involved in microarray experiments and the complexity of biological networks, clustering is an important exploratory technique for gene expression data analysis. In this article, we present a succinct review of some of our work in cluster analysis of gene expression data.
