rPAC: Route based Pathway Analysis for Cohorts of Gene Expression Data Sets

Analyzing Large Gene Expression Data Sets

Computational Text Analysis ◽

10.1093/oso/9780198567400.003.0014 ◽

2006 ◽

Author(s):

Soumya Raychaudhuri

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Expression Analysis ◽

Gene Expression Analysis ◽

Data Sets ◽

Expression Data ◽

Clustering Methods ◽

Biologically Relevant ◽

Large Gene ◽

Functional Coherence

The most interesting and challenging gene expression data sets to analyze are large multidimensional data sets that contain expression values for many genes across multiple conditions. In these data sets scientific text can be particularly useful, since a myriad of genes are examined under vastly different conditions, each of which may induce or repress expression of the same gene for different reasons. The data we are examining are enormously complex: each gene is associated with dozens if not hundreds of expression values, as well as multiple documents built up from vocabularies consisting of thousands of words. In Section 2.4 we reviewed common gene expression strategies, most of which revolve around defining groups of genes based on common profiles. A limitation of many gene expression analytic approaches is that they do not incorporate comprehensive background knowledge about the genes into the analysis. We present computational methods that leverage the peer-reviewed literature in the automatic analysis of gene expression data sets. Including the literature in gene expression data analysis offers an opportunity to incorporate background functional information about the genes when defining expression clusters. In Chapter 5 we saw how literature-based approaches could help in the analysis of single-condition experiments. Here we apply the strategies introduced in Chapter 6 for assessing the coherence of groups of genes to enhance gene expression analysis approaches. The methods proposed here could, in fact, be applied to any multivariate genomics data type. The key concepts discussed in this chapter are listed in the frame box. We begin with a discussion of gene groups and their role in expression analysis; we briefly discuss strategies to assign keywords to groups and strategies to assess their functional coherence.
We apply functional coherence measures to gene expression analysis; for examples we focus on a yeast expression data set. We first demonstrate how functional coherence can be used to focus on the key biologically relevant gene groups derived by clustering methods such as self-organizing maps and k-means clustering.

Genome Expression Pathway Analysis Tool – Analysis and visualization of microarray gene expression data under genomic, proteomic and metabolic context

BMC Bioinformatics ◽

10.1186/1471-2105-8-179 ◽

2007 ◽

Vol 8 (1) ◽

Cited By ~ 15

Author(s):

Markus Weniger ◽

Julia C Engelmann ◽

Jörg Schultz

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Pathway Analysis ◽

Microarray Gene Expression Data ◽

Analysis Tool ◽

Expression Data ◽

Microarray Gene Expression ◽

Genome Expression ◽

Pathway Analysis Tool ◽

Microarray Gene

ExAtlas: An interactive online tool for meta-analysis of gene expression data

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720015500195 ◽

2015 ◽

Vol 13 (06) ◽

pp. 1550019 ◽

Cited By ~ 37

Author(s):

Alexei A. Sharov ◽

David Schlessinger ◽

Minoru S. H. Ko

Keyword(s):

Gene Expression ◽

Gene Ontology ◽

Gene Expression Data ◽

Fixed Effects ◽

Expression Profiles ◽

Meta Analysis ◽

Data Sets ◽

Expression Data ◽

Gene Set ◽

Public Data

We have developed ExAtlas, an on-line software tool for meta-analysis and visualization of gene expression data. In contrast to existing software tools, ExAtlas compares multi-component data sets and generates results for all combinations (e.g. all gene expression profiles versus all Gene Ontology annotations). ExAtlas handles both users’ own data and data extracted semi-automatically from the public repository (GEO/NCBI database). ExAtlas provides a variety of tools for meta-analyses: (1) standard meta-analysis (fixed effects, random effects, z-score, and Fisher’s methods); (2) analyses of global correlations between gene expression data sets; (3) gene set enrichment; (4) gene set overlap; (5) gene association by expression profile; (6) gene specificity; and (7) statistical analysis (ANOVA, pairwise comparison, and PCA). ExAtlas produces graphical outputs, including heatmaps, scatter-plots, bar-charts, and three-dimensional images. Some of the most widely used public data sets (e.g. GNF/BioGPS, Gene Ontology, KEGG, GAD phenotypes, BrainScan, ENCODE ChIP-seq, and protein–protein interaction) are pre-loaded and can be used for functional annotations.
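Three of the standard meta-analysis methods named in this abstract (fixed effects, z-score, and Fisher's method) can be illustrated with a minimal sketch. This is not ExAtlas code: the function names and input layout are assumptions made for illustration, the z-score method is taken here to be the unweighted Stouffer combination, and the random-effects model is omitted. P-values are assumed to lie in (0, 1] and to come from independent studies.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows a chi-square
    distribution with 2k degrees of freedom under the null hypothesis."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    # Closed-form chi-square survival function for even degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def stouffer_z(z_scores):
    """Unweighted z-score combination: sum of z-scores over sqrt(k)."""
    return sum(z_scores) / math.sqrt(len(z_scores))


def fixed_effects(effects, variances):
    """Inverse-variance weighted pooled effect and its variance."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return pooled, 1.0 / sum(weights)


if __name__ == "__main__":
    # Hypothetical per-study results for one gene (illustrative values only).
    print(fisher_combined_p([0.04, 0.20, 0.01]))
    print(stouffer_z([1.8, 0.9, 2.4]))
    print(fixed_effects([0.5, 0.8, 0.3], [0.04, 0.09, 0.02]))
```

Fisher's method and the z-score method need only per-study p-values or z-scores, whereas the fixed-effects estimate also needs a variance for each effect size, which is why meta-analysis tools commonly offer several of these options side by side.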

Abstract 328: An integrated analysis of three distinct IBC/non-IBC Affymetrix gene expression data sets to study the transcriptional heterogeneity both between IBC and non-IBC and within IBC

10.1158/1538-7445.am2011-328 ◽

2011 ◽

Author(s):

Steven J. Van Laere ◽

Naoto Ueno ◽

Pascal Finetti ◽

Peter B. Vermeulen ◽

Anthony Lucci ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Integrated Analysis ◽

Data Sets ◽

Expression Data ◽

Affymetrix Gene Expression

GENOME-WIDE PATHWAY ANALYSIS AND VISUALIZATION USING GENE EXPRESSION DATA

Biocomputing 2002 ◽

10.1142/9789812799623_0043 ◽

2001 ◽

Cited By ~ 2

Author(s):

M. P. KURHEKAR ◽

S. ADAK ◽

S. JHUNJHUNWALA ◽

K. RAGHUPATHY

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Pathway Analysis ◽

Expression Data ◽

Genome Wide

HisCoM-PAGE: Hierarchical Structural Component Models for Pathway Analysis of Gene Expression Data

Genes ◽

10.3390/genes10110931 ◽

2019 ◽

Vol 10 (11) ◽

pp. 931 ◽

Cited By ~ 4

Author(s):

Mok ◽

Kim ◽

Lee ◽

Choi ◽

Lee ◽

...

Keyword(s):

Gene Expression ◽

Pancreatic Cancer ◽

Gene Expression Data ◽

Pathway Analysis ◽

Structural Component ◽

Biological Data ◽

Gene Set Enrichment Analysis ◽

Expression Data ◽

Global Test ◽

Causal Pathways

Although several analyses have sought to identify cancer-associated pathways from gene expression data, most are based on single-pathway analyses and thus do not consider correlations between pathways. In this paper, we propose a hierarchical structural component model for pathway analysis of gene expression data (HisCoM-PAGE), which accounts for the hierarchical structure of genes and pathways, as well as the correlations among pathways. Specifically, HisCoM-PAGE focuses on the survival phenotype and identifies its associated pathways. Moreover, its application to real pancreatic cancer data demonstrated that HisCoM-PAGE could successfully identify pathways associated with pancreatic cancer prognosis. Simulation studies comparing the performance of HisCoM-PAGE with competing methods such as Gene Set Enrichment Analysis (GSEA), Global Test, and Wald-type Test showed HisCoM-PAGE to have the highest power to detect causal pathways in most simulation scenarios.

CLASSIFYING TEMPORAL MICROARRAY DATA BY SELECTING INFORMATIVE GENES

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720013410060 ◽

2013 ◽

Vol 11 (03) ◽

pp. 1341006

Author(s):

QIANG LOU ◽

ZORAN OBRADOVIC

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Microarray Data ◽

Data Sets ◽

Temporal Data ◽

Expression Data ◽

Selection Methods ◽

Temporal Gene Expression ◽

Single Matrix

In order to more accurately predict an individual's health status, in clinical applications it is often important to perform analysis of high-dimensional gene expression data that varies with time. A major challenge in predicting from such temporal microarray data is that the number of biomarkers used as features is typically much larger than the number of labeled subjects. One way to address this challenge is to perform feature selection as a preprocessing step and then apply a classification method on selected features. However, traditional feature selection methods cannot handle multivariate temporal data without applying techniques that flatten temporal data into a single matrix in advance. In this study, a feature selection filter that can directly select informative features from temporal gene expression data is proposed. In our approach, we measure the distance between multivariate temporal data from two subjects. Based on this distance, we define the objective function of temporal margin based feature selection to maximize each subject's temporal margin in its own relevant subspace. The experimental results on synthetic and two real flu data sets provide evidence that our method outperforms the alternatives, which flatten the temporal data in advance.

Genome-wide Pathway Analysis Using Gene Expression Data of Colonic Mucosa in Patients with Inflammatory Bowel Disease

Inflammatory Bowel Diseases ◽

10.1097/mib.0000000000000370 ◽

2015 ◽

pp. 1 ◽

Cited By ~ 5

Author(s):

Orazio Palmieri ◽

Teresa M. Creanza ◽

Fabrizio Bossa ◽

Orazio Palumbo ◽

Rosalia Maglietta ◽

...

Keyword(s):

Gene Expression ◽

Inflammatory Bowel Disease ◽

Gene Expression Data ◽

Pathway Analysis ◽

Bowel Disease ◽

Colonic Mucosa ◽

Expression Data ◽

Genome Wide ◽

Inflammatory Bowel

INTERRELATED TWO-WAY CLUSTERING AND ITS APPLICATION ON GENE EXPRESSION DATA

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213005002272 ◽

2005 ◽

Vol 14 (04) ◽

pp. 577-597 ◽

Cited By ~ 6

Author(s):

CHUN TANG ◽

AIDONG ZHANG

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Domain Knowledge ◽

Gene Clusters ◽

Data Sets ◽

Messenger Rnas ◽

Expression Data ◽

Large Numbers ◽

Clustering Approach ◽

Mrna Expression Profiling

Microarray technologies are capable of simultaneously measuring the signals for thousands of messenger RNAs and large numbers of proteins from single samples. Arrays are now widely used in basic biomedical research for mRNA expression profiling and are increasingly being used to explore patterns of gene expression in clinical research. Most research has focused on interpreting microarray data, which are transformed into gene expression matrices in which the rows usually represent genes and the columns represent samples. Samples can be clustered by identifying and eliminating irrelevant genes. However, most methods are supervised (or assisted by domain knowledge); less attention has been paid to unsupervised approaches, which are important when little domain knowledge is available. In this paper, we present a new framework for unsupervised analysis of gene expression data, which applies an interrelated two-way clustering approach to the gene expression matrices. The goal of clustering is to identify important genes and perform cluster discovery on samples. The advantage of this approach is that we can dynamically manipulate the relationship between the gene clusters and sample groups while iteratively clustering through both of them. The performance of the proposed method on various gene expression data sets is also illustrated.

Defining Immune Response Signatures in DLBCL As Potential Predictive Biomarkers for Outcome to Immunotherapy

Blood ◽

10.1182/blood.v126.23.2663.2663 ◽

2015 ◽

Vol 126 (23) ◽

pp. 2663-2663

Author(s):

Matthew A Care ◽

Stephen M Thirdborough ◽

Andrew J Davies ◽

Peter W.M. Johnson ◽

Andrew Jack ◽

...

Keyword(s):

Gene Expression ◽

Immune Response ◽

Network Analysis ◽

Gene Expression Data ◽

Research Funding ◽

Data Sets ◽

Expression Data ◽

Data Set ◽

Gene Correlation ◽

Cancer Types

Purpose: To assess whether comparative gene network analysis can reveal characteristic immune response signatures that predict clinical response in diffuse large B-cell lymphoma (DLBCL).
Background: The wealth of available gene expression data sets for DLBCL and other cancer types provides a resource to define recurrent pathological processes at the level of gene expression and gene correlation neighbourhoods. This is of particular relevance in the context of cancer immune responses, where convergence onto common patterns may drive shared gene expression profiles. Where existing and novel immunotherapies harness the immune response for therapeutic benefit, such responses may provide predictive biomarkers.
Methods: We independently analysed publicly available DLBCL gene expression data sets and a wide compendium of gene expression data from diverse cancer types, and then asked whether common elements of cancer host response could be identified from the resulting networks. Using 10 DLBCL gene expression data sets, encompassing 2030 cases, we established pairwise gene correlation matrices per data set, which were merged to generate median correlations of gene pairs across all data sets. Gene network analysis and unsupervised clustering were then applied to define global representations of DLBCL gene expression neighbourhoods. In parallel, a diverse range of solid and lymphoid malignancies, including breast, colorectal, oesophageal, head and neck, non-small cell lung, prostate, and pancreatic cancer, Hodgkin lymphoma, follicular lymphoma and DLBCL, were independently analysed using an orthogonal weighted gene correlation network analysis (WGCNA) of gene expression data sets, from which correlated modules across diverse cancer types were identified. The biology of the resulting gene neighbourhoods was assessed by signature and ontology enrichment, and the overlap between gene correlation neighbourhoods and WGCNA-derived modules associated with immune/host responses was analysed.
Results: Amongst DLBCL data, we identified distinct gene correlation neighbourhoods associated with the immune response. These included elements of IFN-polarised responses, core T-cell and cytotoxic signatures, as well as distinct macrophage responses. Neighbourhoods linked to macrophages separated CD163 from CD68 and CD14. In the WGCNA analysis of diverse cancer types, clusters corresponding to these immune response neighbourhoods were independently identified, including a highly similar cluster related to CD163. The overlapping CD163 clusters in both analyses were linked to diverse Fc receptors, complement pathway components and patterns of scavenger receptors potentially linked to alternative macrophage activation. The relationship between the CD163 macrophage gene expression cluster and outcome was tested in DLBCL data sets, identifying a poor response in CD163-cluster-high patients, which reached statistical significance in one data set (GSE10846). Notably, the effect of the CD163-associated gene neighbourhood, which correlates with poor outcome after rituximab-containing immunochemotherapy, is distinct from the effect of IFNG-STAT1-IRF1 polarised cytotoxic responses. The latter represents the predominant immune response pattern separating cell-of-origin-unclassifiable (Type-III) DLBCL from either ABC or GCB DLBCL subsets, and is associated with a trend toward positive outcome.
Conclusion: Comparative gene expression network analysis identifies common immune response signatures shared between DLBCL and other cancer types. Gene expression clusters linked to CD163 macrophage responses and IFNG-STAT1-IRF1 polarised cytotoxic responses are common patterns with apparently divergent outcome associations.
Disclosures: Davies: CTI: Honoraria; Gilead: Consultancy, Honoraria, Research Funding; Mundipharma: Honoraria, Research Funding; Bayer: Research Funding; Takeda: Honoraria, Research Funding; Janssen: Honoraria, Research Funding; Roche: Honoraria, Research Funding; GSK: Research Funding; Pfizer: Honoraria; Celgene: Honoraria, Research Funding. Jack: Janssen: Research Funding.
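The merging step described in the Methods (one pairwise gene correlation matrix per data set, combined into a median correlation per gene pair) can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the function names and the dictionary layout are hypothetical, Pearson correlation is assumed because the abstract does not name the coefficient, and restricting to gene pairs measured in every data set is one possible way to handle differing gene coverage.

```python
import math
from itertools import combinations
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(expression):
    """expression maps gene name -> expression values across the samples of one data set."""
    return {
        (a, b): pearson(expression[a], expression[b])
        for a, b in combinations(sorted(expression), 2)
    }


def median_correlations(data_sets):
    """Median correlation per gene pair, over pairs present in every data set."""
    per_set = [pairwise_correlations(ds) for ds in data_sets]
    shared = set(per_set[0]).intersection(*per_set[1:])
    return {pair: median(c[pair] for c in per_set) for pair in shared}
```

Taking the median rather than the mean keeps a single atypical data set from dominating the merged value for a gene pair, which is one plausible reason to merge this way when combining many independent cohorts.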