rPAC: Route based Pathway Analysis for Cohorts of Gene Expression Data Sets

Methods ◽  
2021 ◽  
Author(s):  
Pujan Joshi ◽  
Brent Basso ◽  
Honglin Wang ◽  
Seung-Hyun Hong ◽  
Charles Giardina ◽  
...  
Author(s):  
Soumya Raychaudhuri

The most interesting and challenging gene expression data sets to analyze are large multidimensional data sets that contain expression values for many genes across multiple conditions. In these data sets the use of scientific text can be particularly useful, since there are a myriad of genes examined under vastly different conditions, each of which may induce or repress expression of the same gene for different reasons. There is an enormous complexity to the data that we are examining—each gene is associated with dozens if not hundreds of expression values as well as multiple documents built up from vocabularies consisting of thousands of words. In Section 2.4 we reviewed common gene expression strategies, most of which revolve around defining groups of genes based on common profiles. A limitation of many gene expression analytic approaches is that they do not incorporate comprehensive background knowledge about the genes into the analysis. We present computational methods that leverage the peer-reviewed literature in the automatic analysis of gene expression data sets. Including the literature in gene expression data analysis offers an opportunity to incorporate background functional information about the genes when defining expression clusters. In Chapter 5 we saw how literature- based approaches could help in the analysis of single condition experiments. Here we will apply the strategies introduced in Chapter 6 to assess the coherence of groups of genes to enhance gene expression analysis approaches. The methods proposed here could, in fact, be applied to any multivariate genomics data type. The key concepts discussed in this chapter are listed in the frame box. We begin with a discussion of gene groups and their role in expression analysis; we briefly discuss strategies to assign keywords to groups and strategies to assess their functional coherence. We apply functional coherence measures to gene expression analysis; for examples we focus on a yeast expression data set. We first demonstrate how functional coherence can be used to focus in on the key biologically relevant gene groups derived by clustering methods such as self-organizing maps and k-means clustering.


2015 ◽  
Vol 13 (06) ◽  
pp. 1550019 ◽  
Author(s):  
Alexei A. Sharov ◽  
David Schlessinger ◽  
Minoru S. H. Ko

We have developed ExAtlas, an on-line software tool for meta-analysis and visualization of gene expression data. In contrast to existing software tools, ExAtlas compares multi-component data sets and generates results for all combinations (e.g. all gene expression profiles versus all Gene Ontology annotations). ExAtlas handles both users’ own data and data extracted semi-automatically from the public repository (GEO/NCBI database). ExAtlas provides a variety of tools for meta-analyses: (1) standard meta-analysis (fixed effects, random effects, z-score, and Fisher’s methods); (2) analyses of global correlations between gene expression data sets; (3) gene set enrichment; (4) gene set overlap; (5) gene association by expression profile; (6) gene specificity; and (7) statistical analysis (ANOVA, pairwise comparison, and PCA). ExAtlas produces graphical outputs, including heatmaps, scatter-plots, bar-charts, and three-dimensional images. Some of the most widely used public data sets (e.g. GNF/BioGPS, Gene Ontology, KEGG, GAD phenotypes, BrainScan, ENCODE ChIP-seq, and protein–protein interaction) are pre-loaded and can be used for functional annotations.


Genes ◽  
2019 ◽  
Vol 10 (11) ◽  
pp. 931 ◽  
Author(s):  
Mok ◽  
Kim ◽  
Lee ◽  
Choi ◽  
Lee ◽  
...  

Although there have been several analyses for identifying cancer-associated pathways, based on gene expression data, most of these are based on single pathway analyses, and thus do not consider correlations between pathways. In this paper, we propose a hierarchical structural component model for pathway analysis of gene expression data (HisCoM-PAGE), which accounts for the hierarchical structure of genes and pathways, as well as the correlations among pathways. Specifically, HisCoM-PAGE focuses on the survival phenotype and identifies its associated pathways. Moreover, its application to real biological data analysis of pancreatic cancer data demonstrated that HisCoM-PAGE could successfully identify pathways associated with pancreatic cancer prognosis. Simulation studies comparing the performance of HisCoM-PAGE with other competing methods such as Gene Set Enrichment Analysis (GSEA), Global Test, and Wald-type Test showed HisCoM-PAGE to have the highest power to detect causal pathways in most simulation scenarios.


2013 ◽  
Vol 11 (03) ◽  
pp. 1341006
Author(s):  
QIANG LOU ◽  
ZORAN OBRADOVIC

In order to more accurately predict an individual's health status, in clinical applications it is often important to perform analysis of high-dimensional gene expression data that varies with time. A major challenge in predicting from such temporal microarray data is that the number of biomarkers used as features is typically much larger than the number of labeled subjects. One way to address this challenge is to perform feature selection as a preprocessing step and then apply a classification method on selected features. However, traditional feature selection methods cannot handle multivariate temporal data without applying techniques that flatten temporal data into a single matrix in advance. In this study, a feature selection filter that can directly select informative features from temporal gene expression data is proposed. In our approach, we measure the distance between multivariate temporal data from two subjects. Based on this distance, we define the objective function of temporal margin based feature selection to maximize each subject's temporal margin in its own relevant subspace. The experimental results on synthetic and two real flu data sets provide evidence that our method outperforms the alternatives, which flatten the temporal data in advance.


2005 ◽  
Vol 14 (04) ◽  
pp. 577-597 ◽  
Author(s):  
CHUN TANG ◽  
AIDONG ZHANG

Microarray technologies are capable of simultaneously measuring the signals for thousands of messenger RNAs and large numbers of proteins from single samples. Arrays are now widely used in basic biomedical research for mRNA expression profiling and are increasingly being used to explore patterns of gene expression in clinical research. Most research has focused on the interpretation of the meaning of the microarray data which are transformed into gene expression matrices where usually the rows represent genes, the columns represent various samples. Clustering samples can be done by analyzing and eliminating of irrelevant genes. However, majority methods are supervised (or assisted by domain knowledge), less attention has been paid on unsupervised approaches which are important when little domain knowledge is available. In this paper, we present a new framework for unsupervised analysis of gene expression data, which applies an interrelated two-way clustering approach on the gene expression matrices. The goal of clustering is to identify important genes and perform cluster discovery on samples. The advantage of this approach is that we can dynamically manipulate the relationship between the gene clusters and sample groups while conducting an iterative clustering through both of them. The performance of the proposed method with various gene expression data sets is also illustrated.


Blood ◽  
2015 ◽  
Vol 126 (23) ◽  
pp. 2663-2663
Author(s):  
Matthew A Care ◽  
Stephen M Thirdborough ◽  
Andrew J Davies ◽  
Peter W.M. Johnson ◽  
Andrew Jack ◽  
...  

Abstract Purpose To assess whether comparative gene network analysis can reveal characteristic immune response signatures that predict clinical response in Diffuse large B-cell lymphoma (DLBCL). Background The wealth of available gene expression data sets for DLBCL and other cancer types provides a resource to define recurrent pathological processes at the level of gene expression and gene correlation neighbourhoods. This is of particular relevance in the context of cancer immune responses, where convergence onto common patterns may drive shared gene expression profiles. Where existing and novel immunotherapies harness the immune response for therapeutic benefit such responses may provide predictive biomarkers. Methods We independently analysed publically available DLBCL gene expression data sets and a wide compendium of gene expression data from diverse cancer types, and then asked whether common elements of cancer host response could be identified from resulting networks. Using 10 DLBCL gene expression data sets, encompassing 2030 cases, we established pairwise gene correlation matrices per data set, which were merged to generate median correlations of gene pairs across all data sets. Gene network analysis and unsupervised clustering was then applied to define global representations of DLBCL gene expression neighbourhoods. In parallel a diverse range of solid and lymphoid malignancies including; breast, colorectal, oesophageal, head and neck, non-small cell lung, prostate, pancreatic cancer, Hodgkin lymphoma, Follicular lymphoma and DLBCL were independently analysed using an orthogonal weighted gene correlation network analysis of gene expression data sets from which correlated modules across diverse cancer types were identified. The biology of resulting gene neighbourhoods was assessed by signature and ontology enrichment, and the overlap between gene correlation neighbourhoods and WGCNA derived modules associated with immune/host responses was analysed. Results Amongst DLBCL data, we identified distinct gene correlation neighbourhoods associated with the immune response. These included both elements of IFN-polarised responses, core T-cell, and cytotoxic signatures as well as distinct macrophage responses. Neighbourhoods linked to macrophages separated CD163 from CD68 and CD14. In the WGCNA analysis of diverse cancer types clusters corresponding to these immune response neighbourhoods were independently identified including a highly similar cluster related to CD163. The overlapping CD163 clusters in both analyses linked to diverse Fc-Receptors, complement pathway components and patterns of scavenger receptors potentially linked to alternative macrophage activation. The relationship between the CD163 macrophage gene expression cluster and outcome was tested in DLBCL data sets, identifying a poor response in CD163 -cluster high patients, which reached statistical significance in one data set (GSE10846). Notably, the effect of the CD163-associated gene neighbourhood which correlates with poor outcome post rituximab containing immunochemotherapy is distinct from the effect of IFNG-STAT1-IRF1 polarised cytotoxic responses. The latter represents the predominant immune response pattern separating cell of origin unclassifiable (Type-III) DLBCL from either ABC or GCB DLBCL subsets, and is associated with a trend toward positive outcome. Conclusion Comparative gene expression network analysis identifies common immune response signatures shared between DLBCL and other cancer types. Gene expression clusters linked to CD163 macrophage responses and IFNG-STAT1-IRF1 polarised cytotoxic responses are common patterns with apparent divergent outcome association. Disclosures Davies: CTI: Honoraria; GIlead: Consultancy, Honoraria, Research Funding; Mundipharma: Honoraria, Research Funding; Bayer: Research Funding; Takeda: Honoraria, Research Funding; Janssen: Honoraria, Research Funding; Roche: Honoraria, Research Funding; GSK: Research Funding; Pfizer: Honoraria; Celgene: Honoraria, Research Funding. Jack:Jannsen: Research Funding.


Sign in / Sign up

Export Citation Format

Share Document