Gene Set linkage analysis: a tool for interpreting the overall functional impacts of observed transcriptomic changes

Abstract Background To date, the majority of software tools developed for high-level interpretation of transcriptomics data have been based on annotation enrichment. These tools use existing biological concepts to describe the observed omics changes. However, if an observation cannot be accurately described by an existing concept, these tools either report no term at all or report only very general terms (such as “biological process”, GO:0008150), which offer researchers little help in understanding the data and designing further investigations. Results We present the gene set linkage analysis (GSLA) tool for interpreting the collective functional impacts of a set of changed genes. The GSLA algorithm relies on a functional association network to evaluate whether the observed omics changes collectively interfere with the functions of known biological processes. Although an omics change may not be accurately described by an existing concept, its functional impact may still be described by well-established concepts. GSLA has been shown to be useful in several previous studies, where it derived novel insights into the high-level coordination of physiological processes that conventional annotation enrichment-based tools did not provide. This standalone version of the GSLA tool integrates interaction networks for four species (i.e., A. thaliana, D. melanogaster, H. sapiens and S. cerevisiae) and uses four kinds of annotation gene sets (i.e., Gene Ontology, Reactome pathway, Panther pathway and Wikipathways) for each species. Conclusions The GSLA tool is designed to interpret the collective functional impacts of a set of changed genes. Its usefulness has been demonstrated in a series of previous studies analyzing transcriptomic changes. GSLA is freely available at https://github.com/synergy-zju/gsla.
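
The abstract does not spell out the GSLA statistic, so the following is only a minimal sketch of the general idea of linkage-based interpretation: count the functional associations that connect a set of changed genes to a known biological process, then compare that count with random gene sets of the same size. The function names, the permutation scheme, and the toy gene identifiers are all assumptions made for illustration and are not taken from the GSLA tool.

```python
import random
from itertools import product


def count_links(set_a, set_b, network):
    """Count gene pairs (one gene from each set) joined by an edge in the network."""
    return sum(
        1
        for a, b in product(set_a, set_b)
        if a != b and frozenset((a, b)) in network
    )


def linkage_p_value(changed, process, network, background, n_perm=1000, seed=0):
    """Empirical p-value (an assumed scheme, not GSLA's own): how often a random
    gene set of the same size as `changed` has at least as many links to
    `process` as the observed set."""
    rng = random.Random(seed)
    observed = count_links(changed, process, network)
    pool = sorted(background)  # a list is required by random.sample
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(pool, len(changed))
        if count_links(sample, process, network) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: hypothetical gene names, not drawn from any real interactome.
edges = [("g1", "p1"), ("g1", "p2"), ("g2", "p1"), ("g3", "p3"), ("x1", "x2")]
network = {frozenset(e) for e in edges}
changed = {"g1", "g2", "g3"}
process = {"p1", "p2", "p3"}
background = {"g1", "g2", "g3", "p1", "p2", "p3", "x1", "x2", "x3", "x4"}

observed, p = linkage_p_value(changed, process, network, background)
print(f"links between changed genes and process: {observed}, empirical p = {p:.3f}")
```

Note that the changed genes and the process genes share no members in this toy example, so an overlap-based enrichment test would report nothing, while the link count still registers a relationship.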

HIR V2: a human interactome resource for the biological interpretation of differentially expressed genes via gene set linkage analysis

10.21203/rs.3.rs-26127/v1 ◽

2020 ◽

Author(s):

Yu-Tian Tao ◽

Xiao-Bao Ding ◽

Jie Jin ◽

Hai-bo Zhang ◽

Qiao-lei Yang ◽

...

Keyword(s):

Linkage Analysis ◽

Protein Interactions ◽

Biological Significance ◽

High Quality ◽

Human Interactome ◽

Gene Set ◽

Human Genes ◽

Protein Interactome ◽

Biological Interpretation ◽

Transcriptomic Changes

Abstract Background To facilitate biomedical studies of disease mechanisms, a high-quality interactome that connects functionally related genes is needed to help investigators formulate pathway hypotheses and to interpret the biological logic of a phenotype at the biological process level. Results Interactions in the updated version of the human interactome resource (HIR V2) were inferred from 36 mathematical characterizations of 6 types of data that suggest functional associations between genes. This update of the HIR consists of 88,069 pairs of genes representing functional associations that are of strengths similar to those between well-studied protein interactions. Among these functional interactions, 57.04% may represent protein interactions, which are expected to cover 32.48% of the true human protein interactome. The gene set linkage analysis (GSLA) tool is developed based on the high-quality HIR V2 to identify the potential functional impacts of observed transcriptomic changes, helping to elucidate their biological significance and complementing the currently widely used enrichment-based gene set interpretation tools. Conclusions We present the HIR V2, a high-quality functional interactome of human genes, along with the gene set linkage analysis (GSLA) webtool, which utilizes the HIR V2 to interpret the biological significance of transcriptionally changed genes. A case study shows that the annotations reported by the HIR V2/GSLA system are more comprehensive and concise compared to those obtained by widely used gene set annotation tools such as PANTHER and DAVID. The HIR V2 and GSLA are available at http://human.biomedtzc.cn .

Human interactome resource and gene set linkage analysis for the functional interpretation of biologically meaningful gene sets

Bioinformatics ◽

10.1093/bioinformatics/btt353 ◽

2013 ◽

Vol 29 (16) ◽

pp. 2024-2031 ◽

Cited By ~ 6

Author(s):

Xi Zhou ◽

Pengcheng Chen ◽

Qiang Wei ◽

Xueling Shen ◽

Xin Chen

Keyword(s):

Linkage Analysis ◽

Human Interactome ◽

Functional Interpretation ◽

Gene Set ◽

Gene Sets

Utilizing Cancer - Functional Gene Set - Compound Networks to Identify Putative Drugs for Breast Cancer

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1574888x13666180105125347 ◽

2018 ◽

Vol 21 (2) ◽

pp. 74-83

Author(s):

Tzu-Hung Hsiao ◽

Yu-Chiao Chiu ◽

Yu-Heng Chen ◽

Yu-Ching Hsu ◽

Hung-I Harry Chen ◽

...

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

Cancer Therapy ◽

Cancer Treatment ◽

Cancer Survival ◽

Expression Profiles ◽

Functional Gene ◽

Gene Set Enrichment Analysis ◽

Gene Set ◽

Gene Sets

Aim and Objective: The number of anticancer drugs available currently is limited, and some of them have low treatment response rates. Moreover, developing a new drug for cancer therapy is labor intensive and sometimes cost prohibitive. Therefore, “repositioning” of known cancer treatment compounds can shorten development time and potentially increase the response rate of cancer therapy. This study proposes a systems biology method for identifying new compound candidates for cancer treatment in two separate procedures. Materials and Methods: First, a “gene set–compound” network was constructed by conducting gene set enrichment analysis on the expression profile of responses to a compound. Second, survival analyses were applied to gene expression profiles derived from four breast cancer patient cohorts to identify gene sets that are associated with cancer survival. A “cancer–functional gene set–compound” network was constructed, and candidate anticancer compounds were identified. Using breast cancer as an example, 162 breast cancer survival-associated gene sets and 172 putative compounds were obtained. Results: We demonstrated how to utilize the clinical relevance of previous studies through gene sets and then connect it to candidate compounds by using gene expression data from the Connectivity Map. Specifically, we chose a gene set derived from a stem cell study to demonstrate its association with breast cancer prognosis and discussed six new compounds that can increase the expression of the gene set after treatment. Conclusion: Our method can effectively identify compounds with the potential to be “repositioned” for cancer treatment according to their active mechanisms and their association with patients’ survival time.
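
The two procedures described above can be read as a join between two mappings: compounds linked to the gene sets they perturb, and gene sets linked to survival. The sketch below illustrates only that joining step under simplifying assumptions; the compound names, gene set names, and the `candidate_compounds` helper are hypothetical, and the enrichment and survival analyses that would produce these inputs are not shown.

```python
from collections import defaultdict

# Hypothetical "gene set-compound" network: compound -> gene sets enriched
# in the expression response to that compound.
compound_to_gene_sets = {
    "compound_A": {"stem_cell_signature", "cell_cycle"},
    "compound_B": {"lipid_metabolism"},
    "compound_C": {"cell_cycle", "apoptosis"},
}

# Hypothetical output of the survival analyses: gene sets associated with survival.
survival_gene_sets = {"stem_cell_signature", "apoptosis"}


def candidate_compounds(compound_to_gene_sets, survival_gene_sets):
    """Keep only the compound links that pass through a survival-associated
    gene set, returning gene set -> compounds."""
    links = defaultdict(set)
    for compound, gene_sets in compound_to_gene_sets.items():
        for gene_set in gene_sets & survival_gene_sets:
            links[gene_set].add(compound)
    return dict(links)


network = candidate_compounds(compound_to_gene_sets, survival_gene_sets)
candidates = sorted(set().union(*network.values()))
print(candidates)
```

Keeping the gene set as the key, rather than returning a flat list of compounds, preserves the functional reason each candidate was selected.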

Gene Set Correlation Analysis and Visualization Using Gene Expression Data

Current Bioinformatics ◽

10.2174/1574893615999200629124444 ◽

2020 ◽

Vol 15 ◽

Author(s):

Chen-An Tsai ◽

James J. Chen

Keyword(s):

Gene Expression ◽

Correlation Analysis ◽

Gene Expression Data ◽

Differentially Expressed Gene ◽

Differentially Expressed ◽

Superior Performance ◽

Expression Data ◽

Gene Set ◽

Gene Sets ◽

Set Correlation

Background: Gene set enrichment analyses (GSEA) provide a useful and powerful approach to identify differentially expressed gene sets with prior biological knowledge. Several GSEA algorithms have been proposed to perform enrichment analyses on groups of genes. However, many of these algorithms have focused on identification of differentially expressed gene sets in a given phenotype. Objective: In this paper, we propose a gene set analytic framework, Gene Set Correlation Analysis (GSCoA), that simultaneously measures within- and between-gene-set variation to identify sets of genes enriched for differential expression and highly co-related pathways. Methods: We apply co-inertia analysis to cross-gene-set comparisons in gene expression data to measure the co-structure of expression profiles in pairs of gene sets. Co-inertia analysis (CIA) is a multivariate method for identifying trends or co-relationships in multiple datasets that contain the same samples. The objective of CIA is to seek ordinations (dimension reduction diagrams) of two gene sets such that the squared covariance between the projections of the gene sets on successive axes is maximized. Simulation studies illustrate that CIA offers superior performance in identifying co-relationships between gene sets in all simulation settings when compared to correlation-based gene set methods. Result and Conclusion: We also combine between-gene-set CIA and GSEA to discover the relationships between gene sets significantly associated with phenotypes. In addition, we provide a graphical technique for visualizing and simultaneously exploring the associations between and within gene sets and their interaction and network. We then demonstrate integration of within- and between-gene-set variation using CIA and GSEA, applied to the p53 gene expression data using the c2 curated gene sets. Ultimately, the GSCoA approach provides an attractive tool for identification and visualization of novel associations between pairs of gene sets by integrating co-relationships between gene sets into gene set analysis.
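
The CIA objective stated in the abstract can be written out explicitly. The formulation below is the standard textbook form under simplifying assumptions that the abstract does not state: uniform sample weights, identity column metrics, and column-centred expression matrices X (n samples by p genes of the first gene set) and Y (the same n samples by q genes of the second gene set). The symbols are introduced here for illustration only.

```latex
% First pair of co-inertia axes (u_1, v_1): maximize the squared covariance
% between the projections of the two gene sets.
(u_1, v_1) = \arg\max_{\lVert u \rVert = \lVert v \rVert = 1}
  \operatorname{cov}^2(Xu,\, Yv)
  = \arg\max_{\lVert u \rVert = \lVert v \rVert = 1}
  \left( \tfrac{1}{n}\, u^{\top} X^{\top} Y v \right)^{2}

% Successive axes: singular value decomposition of the cross-product matrix.
X^{\top} Y = U \Sigma V^{\top}, \qquad
u_k = U_{\cdot k}, \quad v_k = V_{\cdot k}, \quad
\operatorname{cov}^2(X u_k,\, Y v_k) = \left( \sigma_k / n \right)^{2}
```

Under these assumptions the k-th pair of axes is the k-th pair of left and right singular vectors, so each successive pair maximizes the squared covariance subject to orthogonality with the earlier pairs.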

Establishing a consensus for the hallmarks of cancer based on gene ontology and pathway annotations

BMC Bioinformatics ◽

10.1186/s12859-021-04105-8 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Yi Chen ◽

Fons J. Verbeek ◽

Katherine Wolstencroft

Keyword(s):

Gene Ontology ◽

Enrichment Analysis ◽

Biological Data ◽

Hallmarks Of Cancer ◽

High Throughput Analysis ◽

Knowledge Resources ◽

Gene Set ◽

Cancer Hallmarks ◽

Starting Point ◽

High Level

Abstract Background The hallmarks of cancer provide a highly cited and well-used conceptual framework for describing the processes involved in cancer cell development and tumourigenesis. However, methods for translating these high-level concepts into data-level associations between hallmarks and genes (for high throughput analysis), vary widely between studies. The examination of different strategies to associate and map cancer hallmarks reveals significant differences, but also consensus. Results Here we present the results of a comparative analysis of cancer hallmark mapping strategies, based on Gene Ontology and biological pathway annotation, from different studies. By analysing the semantic similarity between annotations, and the resulting gene set overlap, we identify emerging consensus knowledge. In addition, we analyse the differences between hallmark and gene set associations using Weighted Gene Co-expression Network Analysis and enrichment analysis. Conclusions Reaching a community-wide consensus on how to identify cancer hallmark activity from research data would enable more systematic data integration and comparison between studies. These results highlight the current state of the consensus and offer a starting point for further convergence. In addition, we show how a lack of consensus can lead to large differences in the biological interpretation of downstream analyses and discuss the challenges of annotating changing and accumulating biological data, using intermediate knowledge resources that are also changing over time.
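
The abstract mentions comparing the gene set overlap that results from different hallmark mapping strategies but does not name the overlap measure. As one common choice, the sketch below uses the Jaccard index; the measure, the `jaccard` helper, and the two toy gene lists are assumptions for illustration and do not reproduce any mapping from the study.

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical gene sets that two different mapping strategies assign
# to the same cancer hallmark.
strategy_1 = {"TP53", "MYC", "CCND1", "RB1"}
strategy_2 = {"TP53", "MYC", "E2F1"}

overlap = jaccard(strategy_1, strategy_2)  # 2 shared genes out of 5 distinct genes
print(f"Jaccard overlap: {overlap:.2f}")
```

A value near 1 would indicate that the two strategies agree on which genes represent the hallmark, while a value near 0 would indicate that downstream analyses are working from largely different gene sets.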

Higher Acid-Base Imbalance Associated with Respiratory Failure Could Decrease the Survival of Patients with Scrub Typhus during Intensive Care Unit Stay: A Gene Set Enrichment Analysis

Journal of Clinical Medicine ◽

10.3390/jcm8101580 ◽

2019 ◽

Vol 8 (10) ◽

pp. 1580 ◽

Cited By ~ 1

Author(s):

Kyoung Min Moon ◽

Kyueng-Whan Min ◽

Mi-Hye Kim ◽

Dong-Hoon Kim ◽

Byoung Kwan Son ◽

...

Keyword(s):

Intensive Care Unit ◽

Intensive Care ◽

Respiratory Failure ◽

Scrub Typhus ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Acid Base ◽

Gene Set Enrichment ◽

Gene Set ◽

Gene Sets

Ninety percent of patients with scrub typhus (SC) with vasculitis-like syndrome recover after mild symptoms; however, 10% can suffer serious complications, such as acute respiratory failure (ARF) and admission to the intensive care unit (ICU). Predictors for the progression of SC have not yet been established, and conventional scoring systems for ICU patients are insufficient to predict severity. We aimed to identify simple and robust indicators to predict aggressive behaviors of SC. We evaluated 91 patients with SC and 81 non-SC patients who were admitted to the ICU, and 32 cases from the public functional genomics data repository for gene expression analysis. We analyzed the relationships between several predictors and clinicopathological characteristics in patients with SC. We performed gene set enrichment analysis (GSEA) to identify SC-specific gene sets. The acid-base imbalance (ABI), measured 24 h before serious complications, was higher in patients with SC than in non-SC patients. A high ABI was associated with an increased incidence of ARF, leading to mechanical ventilation and worse survival. GSEA revealed that SC correlated to gene sets reflecting inflammation/apoptotic response and airway inflammation. ABI can be used to indicate ARF in patients with SC and assist with early detection.

Transcriptional survey of alveolar macrophages in a murine model of chronic granulomatous inflammation reveals common themes with human sarcoidosis

AJP Lung Cellular and Molecular Physiology ◽

10.1152/ajplung.00289.2017 ◽

2018 ◽

Vol 314 (4) ◽

pp. L617-L625 ◽

Cited By ~ 8

Author(s):

Arjun Mohan ◽

Anagha Malur ◽

Matthew McPeek ◽

Barbara P. Barna ◽

Lynn M. Schnapp ◽

...

Keyword(s):

Differentially Expressed Genes ◽

Murine Model ◽

Alveolar Macrophages ◽

Gene Set Enrichment Analysis ◽

Differentially Expressed ◽

Multiwall Carbon Nanotube ◽

Gene Set Enrichment ◽

Integrated Network ◽

Gene Set ◽

Gene Sets

To advance our understanding of the pathobiology of sarcoidosis, we developed a multiwall carbon nanotube (MWCNT)-based murine model that shows marked histological and inflammatory signal similarities to this disease. In this study, we compared the alveolar macrophage transcriptional signatures of our animal model with human sarcoidosis to identify overlapping molecular programs. Whole genome microarrays were used to assess gene expression of alveolar macrophages in six MWCNT-exposed and six control animals. The results were compared with the transcriptional profiles of alveolar immune cells in 15 sarcoidosis patients and 12 healthy humans. Rigorous statistical methods were used to identify differentially expressed genes. To better elucidate activated pathways, integrated network and gene set enrichment analysis (GSEA) was performed. We identified over 1,000 differentially expressed genes between control and MWCNT mice. Gene ontology functional analysis showed overrepresentation of processes primarily involved in immunity and inflammation in MWCNT mice. Applying GSEA to both mouse and human samples revealed upregulation of 92 gene sets in MWCNT mice and 142 gene sets in sarcoidosis patients. Commonly activated pathways in both MWCNT mice and sarcoidosis included adaptive immunity, T-cell signaling, IL-12/IL-17 signaling, and oxidative phosphorylation. Differences in gene set enrichment between MWCNT mice and sarcoidosis patients were also observed. We applied network analysis to differentially expressed genes common between the MWCNT model and sarcoidosis to identify key drivers of disease. In conclusion, an integrated network and transcriptomics approach revealed substantial functional similarities between a murine model and human sarcoidosis, particularly with respect to activation of immune-specific pathways.

Drug perturbation gene set enrichment analysis (dpGSEA): a new transcriptomic drug screening approach

BMC Bioinformatics ◽

10.1186/s12859-020-03929-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Mike Fang ◽

Brian Richardson ◽

Cheryl M. Cameron ◽

Jean-Eudes Dazard ◽

Mark J. Cameron

Keyword(s):

Drug Targets ◽

T Regulatory Cells ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Regulatory Cells ◽

Gene Set Enrichment ◽

Gene Set ◽

Gene Sets ◽

Gastroenteropancreatic Neuroendocrine Tumor ◽

Public Datasets

Abstract Background In this study, we demonstrate that our modified Gene Set Enrichment Analysis (GSEA) method, drug perturbation GSEA (dpGSEA), can detect phenotypically relevant drug targets through a unique transcriptomic enrichment that emphasizes biological directionality of drug-derived gene sets. Results We detail our dpGSEA method and show its effectiveness in detecting specific perturbation of drugs in independent public datasets by confirming fluvastatin, paclitaxel, and rosiglitazone perturbation in gastroenteropancreatic neuroendocrine tumor cells. In drug discovery experiments, we found that dpGSEA was able to detect phenotypically relevant drug targets in previously published differentially expressed genes of CD4+ T regulatory cells from immune responders and non-responders to antiviral therapy in HIV-infected individuals, such as those involved with virion replication, cell cycle dysfunction, and mitochondrial dysfunction. dpGSEA is publicly available at https://github.com/sxf296/drug_targeting. Conclusions dpGSEA is an approach that uniquely enriches on drug-defined gene sets while considering directionality of gene modulation. We recommend dpGSEA as an exploratory tool to screen for possible drug targeting molecules.
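
For orientation, the sketch below shows the classic unweighted GSEA running-sum statistic that methods such as dpGSEA build on. It is not the dpGSEA modification, whose details the abstract does not give; the function name and the toy ranking are assumptions for illustration. The sign of the score indicates whether the gene set concentrates at the top or the bottom of the ranked list, which is the simplest sense in which an enrichment statistic carries direction.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted GSEA-style enrichment score.

    Walk down the ranked list, stepping up by 1/n_hit at each gene in the set
    and down by 1/n_miss otherwise; return the running-sum value with the
    largest absolute deviation from zero (sign preserved)."""
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        running += 1.0 / n_hit if gene in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranking from most up-regulated to most down-regulated (hypothetical genes).
ranked = ["a", "b", "c", "d", "e", "f"]

es_top = enrichment_score(ranked, {"a", "b"})     # set sits at the top of the list
es_bottom = enrichment_score(ranked, {"e", "f"})  # set sits at the bottom of the list
print(es_top, es_bottom)
```

In practice the score is compared against a null distribution, typically obtained by permutation, before any gene set is called enriched; that step is omitted here.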

A novel method to quantify gene set functional association based on gene ontology

Journal of The Royal Society Interface ◽

10.1098/rsif.2011.0551 ◽

2011 ◽

Vol 9 (70) ◽

pp. 1063-1072 ◽

Cited By ~ 22

Author(s):

Sali Lv ◽

Yan Li ◽

Qianghu Wang ◽

Shangwei Ning ◽

Teng Huang ◽

...

Keyword(s):

Gene Ontology ◽

Genetic Basis ◽

Protein Complexes ◽

Sequence Similarity ◽

Functional Association ◽

Future Studies ◽

Gene Set ◽

Complex Disorders ◽

Gene Sets ◽

Novel Method

Numerous gene sets have been used as molecular signatures for exploring the genetic basis of complex disorders. These gene sets are distinct but related to each other in many cases; therefore, efforts have been made to compare gene sets for studies such as those evaluating the reproducibility of different experiments. Comparison in terms of biological function has been demonstrated to be helpful to biologists. We improved the measurement of semantic similarity to quantify the functional association between gene sets in the context of gene ontology and developed a web toolkit named Gene Set Functional Similarity (GSFS; http://bioinfo.hrbmu.edu.cn/GSFS ). Validation based on protein complexes for which the functional associations are known demonstrated that the GSFS scores tend to be correlated with sequence similarity scores and that complexes with high GSFS scores tend to be involved in the same functional catalogue. Compared with the pairwise method and the annotation method, the GSFS shows better discrimination and more accurately reflects the known functional catalogues shared between complexes. Case studies comparing differentially expressed genes of prostate tumour samples from different microarray platforms and identifying coronary heart disease susceptibility pathways revealed that the method could contribute to future studies exploring the molecular basis of complex disorders.

T1000: A reduced toxicogenomics gene set for improved decision making

10.7287/peerj.preprints.27839 ◽

2019 ◽

Author(s):

Othman Soufan ◽

Jessica Ewald ◽

Charles Viau ◽

Doug Crump ◽

Markus Hecker ◽

...

Keyword(s):

Dose Response ◽

Gene Clusters ◽

Gene Set ◽

Chemical Hazards ◽

Gene Sets ◽

Toxicological Research ◽

Multiple Species ◽

Response Modeling

There is growing interest within regulatory agencies and toxicological research communities to develop, test, and apply new approaches, such as toxicogenomics, to more efficiently evaluate chemical hazards. Given the complexity of analyzing thousands of genes simultaneously, there is a need to identify reduced gene sets. Though several gene sets have been defined for toxicological applications, few of these were purposefully derived using toxicogenomics data. Here, we developed and applied a systematic approach to identify 1000 genes (called Toxicogenomics-1000 or T1000) highly responsive to chemical exposures. First, a co-expression network of 11,210 genes was built by leveraging microarray data from the Open TG-GATEs program. This network was then re-weighted based on prior knowledge of their biological (KEGG, MSigDB) and toxicological (CTD) relevance. Finally, weighted correlation network analysis was applied to identify 258 gene clusters. T1000 was defined by selecting genes from each cluster that were most associated with outcome measures. For model evaluation, we compared the performance of T1000 to that of other gene sets (L1000, S1500, genes selected by Limma, and a random set) using two external datasets. Additionally, a smaller (T384) and a larger version (T1500) of T1000 were used for dose-response modeling to test the effect of gene set size. Our findings demonstrated that the T1000 gene set is predictive of apical outcomes across a range of conditions (e.g., in vitro and in vivo, dose-response, multiple species, tissues, and chemicals), and generally performs as well as, or better than, other gene sets available.
