Human interactome resource and gene set linkage analysis for the functional interpretation of biologically meaningful gene sets

2013 ◽  
Vol 29 (16) ◽  
pp. 2024-2031 ◽  
Author(s):  
Xi Zhou ◽  
Pengcheng Chen ◽  
Qiang Wei ◽  
Xueling Shen ◽  
Xin Chen
2020 ◽  
Author(s):  
Yu-Tian Tao ◽  
Xiao-Bao Ding ◽  
Jie Jin ◽  
Hai-bo Zhang ◽  
Qiao-lei Yang ◽  
...  

Abstract. Background: To facilitate biomedical studies of disease mechanisms, a high-quality interactome that connects functionally related genes is needed to help investigators formulate pathway hypotheses and to interpret the biological logic of a phenotype at the biological process level. Results: Interactions in the updated version of the human interactome resource (HIR V2) were inferred from 36 mathematical characterizations of 6 types of data that suggest functional associations between genes. This update of the HIR consists of 88,069 pairs of genes representing functional associations whose strengths are similar to those of well-studied protein interactions. Among these functional interactions, 57.04% may represent protein interactions, which are expected to cover 32.48% of the true human protein interactome. The gene set linkage analysis (GSLA) tool was developed on the basis of the high-quality HIR V2 to identify the potential functional impacts of observed transcriptomic changes, helping to elucidate their biological significance and complementing the widely used enrichment-based gene set interpretation tools. Conclusions: We present the HIR V2, a high-quality functional interactome of human genes, along with the GSLA webtool, which uses the HIR V2 to interpret the biological significance of transcriptionally changed genes. A case study shows that the annotations reported by the HIR V2/GSLA system are more comprehensive and concise than those obtained by widely used gene set annotation tools such as PANTHER and DAVID. The HIR V2 and GSLA are available at http://human.biomedtzc.cn.


2019 ◽  
Author(s):  
Pengcheng Chen ◽  
Xi Liang ◽  
Yun Li ◽  
Xiaoxuan Wang ◽  
Xin Chen

Abstract. Background: To date, most software tools developed for high-level interpretation of transcriptomics data have been based on annotation enrichment. These tools use existing biological concepts to describe the observed omics changes. However, if an observation cannot be accurately described by an existing concept, these tools either report no term or report very general terms (such as “biological process”, GO:0008150), which gives researchers little help in understanding the data and designing further investigation. Results: We present the gene set linkage analysis (GSLA) tool for interpreting the collective functional impacts of a set of changed genes. The GSLA algorithm relies on a functional association network to evaluate whether the observed omics changes collectively interfere with the functions of known biological processes. Although an omics change may not be accurately described by an existing concept, its functional impact may still be described by well-established concepts. GSLA has proved useful in several previous studies, where it derived novel insights into the high-level coordination of physiological processes that conventional annotation enrichment-based tools did not provide. This standalone version of the GSLA tool integrates interaction networks for four species (A. thaliana, D. melanogaster, H. sapiens and S. cerevisiae) and uses four kinds of annotation gene sets (Gene Ontology, Reactome pathway, Panther pathway and Wikipathways) for each species. Conclusions: The GSLA tool is designed to interpret the collective functional impacts of a set of changed genes. Its usefulness has been demonstrated in a series of previous studies analyzing transcriptomic changes. GSLA is freely available at https://github.com/synergy-zju/gsla.
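The idea of testing whether a set of changed genes is functionally linked to a known biological process through a network can be illustrated with a small permutation test. This is only a minimal sketch and not the published GSLA algorithm: the scoring rule (counting network edges between the two sets), the null model (random gene sets of the same size), and all names such as `count_links`, `linkage_p_value` and the toy genes `g0`...`g19` are assumptions made for illustration.

```python
import random
from itertools import product


def count_links(set_a, set_b, edges):
    """Count distinct network edges joining a gene in set_a to a gene in set_b."""
    linked = {
        frozenset((a, b))
        for a, b in product(set_a, set_b)
        if a != b and frozenset((a, b)) in edges
    }
    return len(linked)


def linkage_p_value(changed, process, edges, universe, n_perm=1000, seed=0):
    """Empirical p-value for the number of links between two gene sets.

    Null model (an assumption of this sketch): random gene sets of the same
    size as `changed`, drawn without replacement from `universe`.
    """
    rng = random.Random(seed)
    observed = count_links(changed, process, edges)
    background = sorted(universe)
    hits = 0
    for _ in range(n_perm):
        sampled = rng.sample(background, len(changed))
        if count_links(sampled, process, edges) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy functional association network (hypothetical genes and edges).
universe = [f"g{i}" for i in range(20)]
edges = {
    frozenset(pair)
    for pair in [("g3", "g0"), ("g3", "g1"), ("g4", "g1"),
                 ("g5", "g2"), ("g10", "g11")]
}
process = {"g0", "g1", "g2"}   # genes annotated to a known process
changed = {"g3", "g4", "g5"}   # observed changed genes, none annotated

observed, p_value = linkage_p_value(changed, process, edges, universe)
print(observed, p_value)
```

In this toy case none of the changed genes is annotated to the process, so an overlap-based enrichment test would find nothing, while the link count still indicates a functional connection.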


2019 ◽  
Author(s):  
Gregory Linkowski ◽  
Charles Blatti ◽  
Krishna Kalari ◽  
Saurabh Sinha ◽  
Shobha Vasudevan

Abstract. High-throughput assays allow researchers to identify sets of genes related to experimental conditions or phenotypes of interest. These gene sets are frequently subjected to functional interpretation using databases of gene annotations. Recent methods have extended this approach to also consider networks of gene-gene relationships and interactions when characterizing the properties of a gene set. We present here a supervised learning algorithm for gene set analysis, called ‘GeneSet MAPR’, that for the first time explicitly considers the patterns of direct as well as indirect relationships present in the network to quantify gene-gene similarities and then report shared properties of the gene set. Our extensive evaluations show that GeneSet MAPR performs better than other network-based methods at identifying genes related to a given gene set, enabling more reliable functional characterizations of the gene set. When applied to the set of response-associated genes from a triple-negative breast cancer study, GeneSet MAPR uncovers gene families such as claudins, kallikreins, and collagen type alpha chains that are related to patients’ response to treatment and are not uncovered by traditional analysis.


2008 ◽  
Vol 2 ◽  
pp. BBI.S1018 ◽  
Author(s):  
Manuela Hummel ◽  
Klaus H. Metzeler ◽  
Christian Buske ◽  
Stefan K. Bohlander ◽  
Ulrich Mansmann

Background: The development of expression-based gene signatures for predicting prognosis or class membership is a popular and challenging task. Besides their stringent validation, signatures need a functional interpretation and must be placed in a biological context. Popular tools such as Gene Set Enrichment have drawbacks because they are restricted to annotated genes and are unable to capture the information hidden in the signature's non-annotated genes. Methodology: We propose concepts to relate a signature with functional gene sets like pathways or Gene Ontology categories. The connection between single signature genes and a specific pathway is explored by hierarchical variable selection and gene association networks. The risk score derived from an individual patient's signature is related to expression patterns of pathways and Gene Ontology categories. Global tests are useful for these tasks, and they adjust for other factors. GlobalAncova is used to explore the effect on gene expression in specific functional groups from the interaction of the score and selected mutations in the patient's genome. Results: We apply the proposed methods to an expression data set and a corresponding gene signature for predicting survival in Acute Myeloid Leukemia (AML). The example demonstrates strong relations between the signature and cancer-related pathways. The signature-based risk score was found to be associated with development-related biological processes. Conclusions: Many authors interpret the functional aspects of a gene signature by linking signature genes to pathways or relevant functional gene groups. The method of gene set enrichment is preferred to annotating signature genes to specific Gene Ontology categories. The strategies proposed in this paper go beyond the restriction of annotation and deepen the insights into the biological mechanisms reflected in the information given by a signature.


2018 ◽  
Vol 21 (2) ◽  
pp. 74-83
Author(s):  
Tzu-Hung Hsiao ◽  
Yu-Chiao Chiu ◽  
Yu-Heng Chen ◽  
Yu-Ching Hsu ◽  
Hung-I Harry Chen ◽  
...  

Aim and Objective: The number of anticancer drugs currently available is limited, and some of them have low treatment response rates. Moreover, developing a new drug for cancer therapy is labor intensive and sometimes cost prohibitive. Therefore, “repositioning” known compounds for cancer treatment can shorten development time and potentially increase the response rate of cancer therapy. This study proposes a systems biology method for identifying new compound candidates for cancer treatment in two separate procedures. Materials and Methods: First, a “gene set–compound” network was constructed by conducting gene set enrichment analysis on the expression profile of responses to a compound. Second, survival analyses were applied to gene expression profiles derived from four breast cancer patient cohorts to identify gene sets that are associated with cancer survival. A “cancer–functional gene set–compound” network was constructed, and candidate anticancer compounds were identified. Using breast cancer as an example, 162 breast cancer survival-associated gene sets and 172 putative compounds were obtained. Results: We demonstrated how to utilize the clinical relevance of previous studies through gene sets and then connect it to candidate compounds by using gene expression data from the Connectivity Map. Specifically, we chose a gene set derived from a stem cell study to demonstrate its association with breast cancer prognosis and discussed six new compounds that can increase the expression of the gene set after treatment. Conclusion: Our method can effectively identify compounds with the potential to be “repositioned” for cancer treatment according to their active mechanisms and their association with patients’ survival time.


2020 ◽  
Vol 15 ◽  
Author(s):  
Chen-An Tsai ◽  
James J. Chen

Background: Gene set enrichment analyses (GSEA) provide a useful and powerful approach to identify differentially expressed gene sets with prior biological knowledge. Several GSEA algorithms have been proposed to perform enrichment analyses on groups of genes. However, many of these algorithms have focused on identification of differentially expressed gene sets in a given phenotype. Objective: In this paper, we propose a gene set analytic framework, Gene Set Correlation Analysis (GSCoA), that simultaneously measures within- and between-gene-set variation to identify sets of genes enriched for differential expression and highly correlated pathways. Methods: We apply co-inertia analysis to comparisons across gene sets in gene expression data to measure the co-structure of expression profiles in pairs of gene sets. Co-inertia analysis (CIA) is a multivariate method for identifying trends or co-relationships in multiple datasets that contain the same samples. The objective of CIA is to seek ordinations (dimension reduction diagrams) of two gene sets such that the squared covariance between the projections of the gene sets on successive axes is maximized. Simulation studies illustrate that CIA offers superior performance in identifying co-relationships between gene sets in all simulation settings when compared to correlation-based gene set methods. Result and Conclusion: We also combine between-gene-set CIA and GSEA to discover the relationships between gene sets significantly associated with phenotypes. In addition, we provide a graphical technique for visualizing and simultaneously exploring the associations between and within gene sets and their interaction and network. We then demonstrate integration of within- and between-gene-set variation using CIA and GSEA, applied to the p53 gene expression data using the c2 curated gene sets. Ultimately, the GSCoA approach provides an attractive tool for identification and visualization of novel associations between pairs of gene sets by integrating co-relationships between gene sets into gene set analysis.
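The co-inertia objective described above, finding axes that maximize the squared covariance between the projections of two gene sets measured on the same samples, can be sketched with a singular value decomposition of the cross-covariance matrix. This is a simplified illustration rather than the GSCoA implementation: it assumes uniform sample weights and plain column centering, and the function name `co_inertia` and the simulated data are hypothetical.

```python
import numpy as np


def co_inertia(x, y):
    """Simplified co-inertia analysis of two expression matrices.

    x: samples x genes of gene set A; y: samples x genes of gene set B.
    Rows must be the same samples in the same order.
    Returns the axes for A, the singular values, the axes for B, and the
    RV coefficient (a global measure of co-structure between 0 and 1).
    """
    n = x.shape[0]
    xc = x - x.mean(axis=0)              # center each gene
    yc = y - y.mean(axis=0)
    cross_cov = xc.T @ yc / (n - 1)      # genes_A x genes_B covariances
    u, s, vt = np.linalg.svd(cross_cov, full_matrices=False)
    cov_x = xc.T @ xc / (n - 1)
    cov_y = yc.T @ yc / (n - 1)
    rv = np.sum(cross_cov ** 2) / np.sqrt(np.sum(cov_x ** 2) * np.sum(cov_y ** 2))
    return u, s, vt.T, rv


# Simulated data: two gene sets driven by one shared latent factor.
rng = np.random.default_rng(0)
factor = rng.normal(size=(30, 1))
x = factor @ rng.normal(size=(1, 5)) + 0.1 * rng.normal(size=(30, 5))
y = factor @ rng.normal(size=(1, 4)) + 0.1 * rng.normal(size=(30, 4))

u, s, v, rv = co_inertia(x, y)

# Project each gene set onto its first co-inertia axis.
score_x = (x - x.mean(axis=0)) @ u[:, 0]
score_y = (y - y.mean(axis=0)) @ v[:, 0]
axis_cov = np.cov(score_x, score_y)[0, 1]
print(axis_cov, s[0], rv)
```

The covariance between the two projected scores equals the first singular value, so its square is the maximized squared covariance on the first axis; later axes follow from the remaining singular vectors.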


2019 ◽  
Vol 8 (10) ◽  
pp. 1580 ◽  
Author(s):  
Kyoung Min Moon ◽  
Kyueng-Whan Min ◽  
Mi-Hye Kim ◽  
Dong-Hoon Kim ◽  
Byoung Kwan Son ◽  
...  

Ninety percent of patients with scrub typhus (SC) with vasculitis-like syndrome recover after mild symptoms; however, 10% can suffer serious complications, such as acute respiratory failure (ARF) and admission to the intensive care unit (ICU). Predictors for the progression of SC have not yet been established, and conventional scoring systems for ICU patients are insufficient to predict severity. We aimed to identify simple and robust indicators to predict aggressive behaviors of SC. We evaluated 91 patients with SC and 81 non-SC patients who were admitted to the ICU, and 32 cases from the public functional genomics data repository for gene expression analysis. We analyzed the relationships between several predictors and clinicopathological characteristics in patients with SC. We performed gene set enrichment analysis (GSEA) to identify SC-specific gene sets. The acid-base imbalance (ABI), measured 24 h before serious complications, was higher in patients with SC than in non-SC patients. A high ABI was associated with an increased incidence of ARF, leading to mechanical ventilation and worse survival. GSEA revealed that SC correlated to gene sets reflecting inflammation/apoptotic response and airway inflammation. ABI can be used to indicate ARF in patients with SC and assist with early detection.


2018 ◽  
Vol 314 (4) ◽  
pp. L617-L625 ◽  
Author(s):  
Arjun Mohan ◽  
Anagha Malur ◽  
Matthew McPeek ◽  
Barbara P. Barna ◽  
Lynn M. Schnapp ◽  
...  

To advance our understanding of the pathobiology of sarcoidosis, we developed a multiwall carbon nanotube (MWCNT)-based murine model that shows marked histological and inflammatory signal similarities to this disease. In this study, we compared the alveolar macrophage transcriptional signatures of our animal model with human sarcoidosis to identify overlapping molecular programs. Whole genome microarrays were used to assess gene expression of alveolar macrophages in six MWCNT-exposed and six control animals. The results were compared with the transcriptional profiles of alveolar immune cells in 15 sarcoidosis patients and 12 healthy humans. Rigorous statistical methods were used to identify differentially expressed genes. To better elucidate activated pathways, integrated network and gene set enrichment analysis (GSEA) was performed. We identified over 1,000 genes differentially expressed between control and MWCNT mice. Gene ontology functional analysis showed overrepresentation of processes primarily involved in immunity and inflammation in MWCNT mice. Applying GSEA to both mouse and human samples revealed upregulation of 92 gene sets in MWCNT mice and 142 gene sets in sarcoidosis patients. Commonly activated pathways in both MWCNT mice and sarcoidosis included adaptive immunity, T-cell signaling, IL-12/IL-17 signaling, and oxidative phosphorylation. Differences in gene set enrichment between MWCNT mice and sarcoidosis patients were also observed. We applied network analysis to differentially expressed genes common between the MWCNT model and sarcoidosis to identify key drivers of disease. In conclusion, an integrated network and transcriptomics approach revealed substantial functional similarities between a murine model and human sarcoidosis, particularly with respect to activation of immune-specific pathways.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Mike Fang ◽  
Brian Richardson ◽  
Cheryl M. Cameron ◽  
Jean-Eudes Dazard ◽  
Mark J. Cameron

Abstract. Background: In this study, we demonstrate that our modified Gene Set Enrichment Analysis (GSEA) method, drug perturbation GSEA (dpGSEA), can detect phenotypically relevant drug targets through a unique transcriptomic enrichment that emphasizes biological directionality of drug-derived gene sets. Results: We detail our dpGSEA method and show its effectiveness in detecting specific perturbation of drugs in independent public datasets by confirming fluvastatin, paclitaxel, and rosiglitazone perturbation in gastroenteropancreatic neuroendocrine tumor cells. In drug discovery experiments, we found that dpGSEA was able to detect phenotypically relevant drug targets in previously published differentially expressed genes of CD4+ T regulatory cells from immune responders and non-responders to antiviral therapy in HIV-infected individuals, such as those involved with virion replication, cell cycle dysfunction, and mitochondrial dysfunction. dpGSEA is publicly available at https://github.com/sxf296/drug_targeting. Conclusions: dpGSEA is an approach that uniquely enriches on drug-defined gene sets while considering directionality of gene modulation. We recommend dpGSEA as an exploratory tool to screen for possible drug targeting molecules.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Chengshu Xie ◽  
Shaurya Jauhari ◽  
Antonio Mora

Abstract. Background: Gene Set Analysis (GSA) is arguably the method of choice for the functional interpretation of omics results. This paper explores the popularity and the performance of all the GSA methodologies and software published during the 20 years since its inception. "Popularity" is estimated according to each paper's citation counts, while "performance" is based on a comprehensive evaluation of the validation strategies used by papers in the field, as well as the consolidated results from the existing benchmark studies. Results: Regarding popularity, data are collected into an online open database ("GSARefDB") which allows browsing bibliographic and method-descriptive information from 503 GSA paper references; regarding performance, we introduce a repository of Jupyter workflows and Shiny apps for automated benchmarking of GSA methods (“GSA-BenchmarKING”). After comparing popularity versus performance, results show discrepancies between the most popular and the best performing GSA methods. Conclusions: These results draw attention to the nature of the tool selection procedures followed by researchers and raise doubts regarding the quality of the functional interpretation of biological datasets in current biomedical studies. Suggestions for the future of the functional interpretation field are made, including strategies for education and discussion of GSA tools, better validation and benchmarking practices, reproducibility, and functional re-analysis of previously reported data.

