Network properties of cancer prognostic genes and their gene sets in the human protein interactome

Abstract Background Identifying prognostic genes (PG) is crucial for estimating survival time and providing pinpoint treatments for patients with cancer. However, prognostic gene sets (PGS) reported in most existing research have low reproducibility and little overlap, even between the same cancers or their subtypes. Their common characteristics and molecular mechanisms of action remain elusive. Methods Here, we obtained nine prognostic gene sets (including 1,439 prognostic genes) of different types of cancer from 23 high-quality publications, and systematically investigated eight network topological properties for PG and PGS compared with background and four other gene sets (cancer gene set CA, essential gene set ES, housekeeping gene set HK, and metastasis-angiogenesis gene set MA) based on the HPRD and String networks. Results The results showed that PG did not occupy key positions in the human protein interactome network, and were more similar to ES than to CA. Also, PGS had significantly smaller intra-set distance (IAD) and inter-set distance (IED) than random sets. Further, we found that PGS tended to be distributed within network modules rather than between modules, and that the functional intersection of the modules enriched with PGS was closely related to cancer. Conclusions Our research reveals the common properties of cancer PG and PGS in the human protein interactome network, and can help us understand and discover cancer prognostic biomarkers.

Network Properties of Cancer Prognostic Gene Signatures in the Human Protein Interactome

Genes ◽

10.3390/genes11030247 ◽

2020 ◽

Vol 11 (3) ◽

pp. 247

Author(s):

Jifeng Zhang ◽

Shoubao Yan ◽

Cheng Jiang ◽

Zhicheng Ji ◽

Chenrun Wang ◽

...

Keyword(s):

Clustering Coefficient ◽

Human Protein ◽

Random Sets ◽

P Value ◽

Gene Signatures ◽

Prognostic Gene ◽

Gene Sets ◽

Protein Interactome ◽

Network Properties ◽

Set Distance

Prognostic gene signatures are critical in cancer prognosis assessments and their pinpoint treatments. However, their network properties remain unclear. Here, we obtained nine prognostic gene sets including 1439 prognostic genes of different cancers from related publications. Four network centralities were used to examine the network properties of prognostic genes (PG) compared with other gene sets based on the Human Protein Reference Database (HPRD) and String networks. We also proposed three novel network measures for further investigating the network properties of prognostic gene sets (PGS) besides clustering coefficient. The results showed that PG did not occupy key positions in the human protein interaction network and were more similar to essential genes rather than cancer genes. However, PGS had significantly smaller intra-set distance (IAD) and inter-set distance (IED) in comparison with random sets (p-value < 0.001). Moreover, we also found that PGS tended to be distributed within network modules rather than between modules (p-value < 0.01), and the functional intersection of the modules enriched with PGS was closely related to cancer development and progression. Our research reveals the common network properties of cancer prognostic gene signatures in the human protein interactome. We argue that these are biologically meaningful and useful for understanding their molecular mechanism.
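
The comparison of intra-set distance against random sets can be illustrated with a minimal sketch. The abstract does not define IAD precisely, so the code below assumes one plausible reading: the mean shortest-path length between all pairs of genes in a set, tested against randomly drawn sets of the same size. The toy graph, the function names, and the number of random sets are hypothetical and are not taken from the paper.

```python
import random
from collections import deque
from itertools import combinations


def bfs_distances(graph, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def mean_pairwise_distance(graph, genes):
    """Assumed IAD: mean shortest-path length over reachable gene pairs."""
    genes = list(genes)
    dist_from = {g: bfs_distances(graph, g) for g in genes}
    lengths = [dist_from[a][b] for a, b in combinations(genes, 2)
               if b in dist_from[a]]
    return sum(lengths) / len(lengths) if lengths else float("inf")


def permutation_p_value(graph, genes, n_random=1000, seed=0):
    """Empirical p-value: share of random same-size sets that are at
    least as compact as the observed set (with a +1 correction)."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(graph, genes)
    nodes = sorted(graph)
    hits = 0
    for _ in range(n_random):
        sample = rng.sample(nodes, len(genes))
        if mean_pairwise_distance(graph, sample) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_random + 1)


# Hypothetical toy interactome: two triangles joined by a path.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E"),
         ("E", "F"), ("F", "G"), ("G", "H"), ("H", "I"), ("G", "I")]
toy = {}
for u, v in edges:
    toy.setdefault(u, set()).add(v)
    toy.setdefault(v, set()).add(u)

observed, p_value = permutation_p_value(toy, ["A", "B", "C"])
```

On a real interactome such as HPRD, random sets would usually be matched on more than size (for example on node degree), which this sketch does not attempt.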

Elucidating essential genes in plant-associated Pseudomonas protegens Pf-5 using transposon insertion sequencing

Journal of Bacteriology ◽

10.1128/jb.00432-20 ◽

2020 ◽

Author(s):

Belinda K Fabian ◽

Christie Foster ◽

Amy J Asher ◽

Liam D. H. Elbourne ◽

Amy K Cain ◽

...

Keyword(s):

Essential Gene ◽

Bacterial Pathogens ◽

Essential Genes ◽

Associated Bacteria ◽

Gene Essentiality ◽

Gene Set ◽

Transposon Insertion ◽

Gene Sets ◽

Genome Wide ◽

Transposon Insertion Sequencing

Gene essentiality studies have been performed on numerous bacterial pathogens, but essential gene sets have been determined for only a few plant-associated bacteria. Pseudomonas protegens Pf-5 is a plant-commensal, biocontrol bacterium that can control disease-causing pathogens on a wide range of crops. Work on Pf-5 has mostly focused on secondary metabolism and biocontrol genes, but genome-wide approaches such as high-throughput transposon mutagenesis have not yet been used in this species. Here we generated a dense P. protegens Pf-5 transposon mutant library and used transposon-directed insertion site sequencing (TraDIS) to identify 446 genes essential for growth on rich media. Genes required for fundamental cellular machinery were enriched in the essential gene set, while genes related to nutrient biosynthesis, stress responses and transport were under-represented. The majority of Pf-5 essential genes were part of the P. protegens core genome. Comparison of the essential gene set of Pf-5 with those of two plant-associated pseudomonads, P. simiae and P. syringae, and the well-studied opportunistic human pathogen P. aeruginosa PA14 showed the four species share a large number of essential genes, but each species also had uniquely essential genes. Comparison of the Pf-5 in silico predicted and in vitro determined essential gene sets highlighted the essential cellular functions that are over- and underestimated by each method. Expanding essentiality studies into bacteria with a range of lifestyles may improve our understanding of the biological processes important for bacterial survival and growth. Importance Essential genes are those crucial for survival or normal growth rates in an organism. Essential gene sets have been identified in numerous bacterial pathogens, but only a few plant-associated bacteria. Employing genome-wide approaches, such as transposon insertion sequencing, allows for the concurrent analysis of all genes of a bacterial species and rapid determination of essential gene sets. We have used transposon insertion sequencing to systematically analyze thousands of Pseudomonas protegens Pf-5 genes and gain insights into gene functions and interactions that are not readily available using traditional methods. Comparing Pf-5 essential genes with those of three other pseudomonads highlights how gene essentiality varies between closely related species.

Utilizing Cancer - Functional Gene Set - Compound Networks to Identify Putative Drugs for Breast Cancer

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1574888x13666180105125347 ◽

2018 ◽

Vol 21 (2) ◽

pp. 74-83

Author(s):

Tzu-Hung Hsiao ◽

Yu-Chiao Chiu ◽

Yu-Heng Chen ◽

Yu-Ching Hsu ◽

Hung-I Harry Chen ◽

...

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

Cancer Therapy ◽

Cancer Treatment ◽

Cancer Survival ◽

Expression Profiles ◽

Functional Gene ◽

Gene Set Enrichment Analysis ◽

Gene Set ◽

Gene Sets

Aim and Objective: The number of anticancer drugs available currently is limited, and some of them have low treatment response rates. Moreover, developing a new drug for cancer therapy is labor intensive and sometimes cost prohibitive. Therefore, “repositioning” of known cancer treatment compounds can speed up the development time and potentially increase the response rate of cancer therapy. This study proposes a systems biology method for identifying new compound candidates for cancer treatment in two separate procedures. Materials and Methods: First, a “gene set–compound” network was constructed by conducting gene set enrichment analysis on the expression profile of responses to a compound. Second, survival analyses were applied to gene expression profiles derived from four breast cancer patient cohorts to identify gene sets that are associated with cancer survival. A “cancer–functional gene set–compound” network was constructed, and candidate anticancer compounds were identified. Through the use of breast cancer as an example, 162 breast cancer survival-associated gene sets and 172 putative compounds were obtained. Results: We demonstrated how to utilize the clinical relevance of previous studies through gene sets and then connect it to candidate compounds by using gene expression data from the Connectivity Map. Specifically, we chose a gene set derived from a stem cell study to demonstrate its association with breast cancer prognosis and discussed six new compounds that can increase the expression of the gene set after the treatment. Conclusion: Our method can effectively identify compounds with a potential to be “repositioned” for cancer treatment according to their active mechanisms and their association with patients’ survival time.
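
The two-step construction described above amounts to joining two relations on the gene set. The sketch below shows only that join, assuming the enrichment and survival analyses have already produced their results. All gene set and compound names are hypothetical placeholders, and the function is illustrative rather than the authors' implementation.

```python
from collections import defaultdict

# Hypothetical output of step 1: (gene set, compound) pairs where the
# compound's expression response was enriched for the gene set.
geneset_compound = [
    ("stem_cell_signature", "compound_A"),
    ("stem_cell_signature", "compound_B"),
    ("hypoxia_response", "compound_C"),
]

# Hypothetical output of step 2: gene sets associated with survival.
survival_genesets = {"stem_cell_signature"}


def build_cancer_network(cancer, survival_sets, set_compound_edges):
    """Return (cancer, gene set, compound) paths, keeping only gene sets
    that were associated with survival."""
    compounds_by_set = defaultdict(set)
    for gene_set, compound in set_compound_edges:
        compounds_by_set[gene_set].add(compound)
    paths = []
    for gene_set in sorted(survival_sets):
        for compound in sorted(compounds_by_set.get(gene_set, ())):
            paths.append((cancer, gene_set, compound))
    return paths


network = build_cancer_network("breast_cancer", survival_genesets,
                               geneset_compound)
```

Compounds linked only to gene sets without a survival association (here `compound_C`) drop out of the final network.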

Gene Set Correlation Analysis and Visualization Using Gene Expression Data

Current Bioinformatics ◽

10.2174/1574893615999200629124444 ◽

2020 ◽

Vol 15 ◽

Author(s):

Chen-An Tsai ◽

James J. Chen

Keyword(s):

Gene Expression ◽

Correlation Analysis ◽

Gene Expression Data ◽

Differentially Expressed Gene ◽

Differentially Expressed ◽

Superior Performance ◽

Expression Data ◽

Gene Set ◽

Gene Sets ◽

Set Correlation

Background: Gene set enrichment analyses (GSEA) provide a useful and powerful approach to identify differentially expressed gene sets with prior biological knowledge. Several GSEA algorithms have been proposed to perform enrichment analyses on groups of genes. However, many of these algorithms have focused on identification of differentially expressed gene sets in a given phenotype. Objective: In this paper, we propose a gene set analytic framework, Gene Set Correlation Analysis (GSCoA), that simultaneously measures within and between gene sets variation to identify sets of genes enriched for differential expression and highly co-related pathways. Methods: We apply co-inertia analysis to the comparisons of cross-gene sets in gene expression data to measure the co-structure of expression profiles in pairs of gene sets. Co-inertia analysis (CIA) is one multivariate method to identify trends or co-relationships in multiple datasets, which contain the same samples. The objective of CIA is to seek ordinations (dimension reduction diagrams) of two gene sets such that the squared covariance between the projections of the gene sets on successive axes is maximized. Simulation studies illustrate that CIA offers superior performance in identifying co-relationships between gene sets in all simulation settings when compared to correlation-based gene set methods. Result and Conclusion: We also combine between-gene set CIA and GSEA to discover the relationships between gene sets significantly associated with phenotypes. In addition, we provide a graphical technique for visualizing and simultaneously exploring the associations of between and within gene sets and their interaction and network. We then demonstrate integration of within and between gene sets variation using CIA and GSEA, applied to the p53 gene expression data using the c2 curated gene sets. Ultimately, the GSCoA approach provides an attractive tool for identification and visualization of novel associations between pairs of gene sets by integrating co-relationships between gene sets into gene set analysis.
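
The CIA objective stated above can be written out explicitly. The formulation below is a sketch under simplifying assumptions that the abstract does not confirm: X (n samples by p genes) and Y (n samples by q genes) hold the column-centered expression of the two gene sets, samples carry uniform weights 1/n, and both gene spaces use the identity metric. General CIA allows other row weights and column metrics.

```latex
\[
(u_1, v_1) = \arg\max_{\|u\| = \|v\| = 1} \operatorname{cov}^2(Xu,\, Yv),
\qquad
\operatorname{cov}(Xu, Yv) = \tfrac{1}{n}\, u^{\top} X^{\top} Y v .
\]
\[
\tfrac{1}{n} X^{\top} Y = U \Sigma V^{\top}
\;\Longrightarrow\;
u_k = U_{\cdot k},\quad v_k = V_{\cdot k},\quad
\operatorname{cov}^2(X u_k,\, Y v_k) = \sigma_k^2 .
\]
```

Under these assumptions the successive axes are the singular vectors of the cross-covariance matrix, and the sum of the squared singular values gives the total co-inertia between the two gene sets.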

Higher Acid-Base Imbalance Associated with Respiratory Failure Could Decrease the Survival of Patients with Scrub Typhus during Intensive Care Unit Stay: A Gene Set Enrichment Analysis

Journal of Clinical Medicine ◽

10.3390/jcm8101580 ◽

2019 ◽

Vol 8 (10) ◽

pp. 1580 ◽

Cited By ~ 1

Author(s):

Kyoung Min Moon ◽

Kyueng-Whan Min ◽

Mi-Hye Kim ◽

Dong-Hoon Kim ◽

Byoung Kwan Son ◽

...

Keyword(s):

Intensive Care Unit ◽

Intensive Care ◽

Respiratory Failure ◽

Scrub Typhus ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Acid Base ◽

Gene Set Enrichment ◽

Gene Set ◽

Gene Sets

Ninety percent of patients with scrub typhus (SC) with vasculitis-like syndrome recover after mild symptoms; however, 10% can suffer serious complications, such as acute respiratory failure (ARF) and admission to the intensive care unit (ICU). Predictors for the progression of SC have not yet been established, and conventional scoring systems for ICU patients are insufficient to predict severity. We aimed to identify simple and robust indicators to predict aggressive behaviors of SC. We evaluated 91 patients with SC and 81 non-SC patients who were admitted to the ICU, and 32 cases from the public functional genomics data repository for gene expression analysis. We analyzed the relationships between several predictors and clinicopathological characteristics in patients with SC. We performed gene set enrichment analysis (GSEA) to identify SC-specific gene sets. The acid-base imbalance (ABI), measured 24 h before serious complications, was higher in patients with SC than in non-SC patients. A high ABI was associated with an increased incidence of ARF, leading to mechanical ventilation and worse survival. GSEA revealed that SC correlated to gene sets reflecting inflammation/apoptotic response and airway inflammation. ABI can be used to indicate ARF in patients with SC and assist with early detection.

Transcriptional survey of alveolar macrophages in a murine model of chronic granulomatous inflammation reveals common themes with human sarcoidosis

AJP Lung Cellular and Molecular Physiology ◽

10.1152/ajplung.00289.2017 ◽

2018 ◽

Vol 314 (4) ◽

pp. L617-L625 ◽

Cited By ~ 8

Author(s):

Arjun Mohan ◽

Anagha Malur ◽

Matthew McPeek ◽

Barbara P. Barna ◽

Lynn M. Schnapp ◽

...

Keyword(s):

Differentially Expressed Genes ◽

Murine Model ◽

Alveolar Macrophages ◽

Gene Set Enrichment Analysis ◽

Differentially Expressed ◽

Multiwall Carbon Nanotube ◽

Gene Set Enrichment ◽

Integrated Network ◽

Gene Set ◽

Gene Sets

To advance our understanding of the pathobiology of sarcoidosis, we developed a multiwall carbon nanotube (MWCNT)-based murine model that shows marked histological and inflammatory signal similarities to this disease. In this study, we compared the alveolar macrophage transcriptional signatures of our animal model with human sarcoidosis to identify overlapping molecular programs. Whole genome microarrays were used to assess gene expression of alveolar macrophages in six MWCNT-exposed and six control animals. The results were compared with the transcriptional profiles of alveolar immune cells in 15 sarcoidosis patients and 12 healthy humans. Rigorous statistical methods were used to identify differentially expressed genes. To better elucidate activated pathways, integrated network and gene set enrichment analysis (GSEA) was performed. We identified over 1,000 differentially expressed genes between control and MWCNT mice. Gene ontology functional analysis showed overrepresentation of processes primarily involved in immunity and inflammation in MWCNT mice. Applying GSEA to both mouse and human samples revealed upregulation of 92 gene sets in MWCNT mice and 142 gene sets in sarcoidosis patients. Commonly activated pathways in both MWCNT mice and sarcoidosis included adaptive immunity, T-cell signaling, IL-12/IL-17 signaling, and oxidative phosphorylation. Differences in gene set enrichment between MWCNT mice and sarcoidosis patients were also observed. We applied network analysis to differentially expressed genes common between the MWCNT model and sarcoidosis to identify key drivers of disease. In conclusion, an integrated network and transcriptomics approach revealed substantial functional similarities between a murine model and human sarcoidosis, particularly with respect to activation of immune-specific pathways.

Mapping the human protein interactome

Cell Research ◽

10.1038/cr.2008.72 ◽

2008 ◽

Vol 18 (7) ◽

pp. 716-724 ◽

Cited By ~ 28

Author(s):

Daniel Figeys

Keyword(s):

Human Protein ◽

Protein Interactome

Drug perturbation gene set enrichment analysis (dpGSEA): a new transcriptomic drug screening approach

BMC Bioinformatics ◽

10.1186/s12859-020-03929-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Mike Fang ◽

Brian Richardson ◽

Cheryl M. Cameron ◽

Jean-Eudes Dazard ◽

Mark J. Cameron

Keyword(s):

Drug Targets ◽

T Regulatory Cells ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Regulatory Cells ◽

Gene Set Enrichment ◽

Gene Set ◽

Gene Sets ◽

Gastroenteropancreatic Neuroendocrine Tumor ◽

Public Datasets

Abstract Background In this study, we demonstrate that our modified Gene Set Enrichment Analysis (GSEA) method, drug perturbation GSEA (dpGSEA), can detect phenotypically relevant drug targets through a unique transcriptomic enrichment that emphasizes biological directionality of drug-derived gene sets. Results We detail our dpGSEA method and show its effectiveness in detecting specific perturbation of drugs in independent public datasets by confirming fluvastatin, paclitaxel, and rosiglitazone perturbation in gastroenteropancreatic neuroendocrine tumor cells. In drug discovery experiments, we found that dpGSEA was able to detect phenotypically relevant drug targets in previously published differentially expressed genes of CD4+ T regulatory cells from immune responders and non-responders to antiviral therapy in HIV-infected individuals, such as those involved with virion replication, cell cycle dysfunction, and mitochondrial dysfunction. dpGSEA is publicly available at https://github.com/sxf296/drug_targeting. Conclusions dpGSEA is an approach that uniquely enriches on drug-defined gene sets while considering directionality of gene modulation. We recommend dpGSEA as an exploratory tool to screen for possible drug targeting molecules.
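
For orientation, the sketch below shows the classic unweighted GSEA running-sum statistic that enrichment methods of this family start from. It is not dpGSEA: the abstract does not give the details of the directional modification, so none of it is reproduced here. The function name and the toy gene identifiers are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted GSEA-style enrichment score.

    Walk down the ranked list, stepping up at genes in the set and down
    at genes outside it, and return the running-sum value with the
    largest absolute deviation from zero. A positive score means the
    set is concentrated near the top of the ranking, a negative score
    near the bottom.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, genes")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking, most up-regulated gene first.
ranking = ["g1", "g2", "g3", "g4", "g5", "g6"]
top_score = enrichment_score(ranking, {"g1", "g2"})
bottom_score = enrichment_score(ranking, {"g5", "g6"})
```

In practice the score is compared against a null distribution obtained by permuting sample labels or gene sets, and the widely used weighted variant scales each upward step by the gene's ranking statistic.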

A novel method to quantify gene set functional association based on gene ontology

Journal of The Royal Society Interface ◽

10.1098/rsif.2011.0551 ◽

2011 ◽

Vol 9 (70) ◽

pp. 1063-1072 ◽

Cited By ~ 22

Author(s):

Sali Lv ◽

Yan Li ◽

Qianghu Wang ◽

Shangwei Ning ◽

Teng Huang ◽

...

Keyword(s):

Gene Ontology ◽

Genetic Basis ◽

Protein Complexes ◽

Sequence Similarity ◽

Functional Association ◽

Future Studies ◽

Gene Set ◽

Complex Disorders ◽

Gene Sets ◽

Novel Method

Numerous gene sets have been used as molecular signatures for exploring the genetic basis of complex disorders. These gene sets are distinct but related to each other in many cases; therefore, efforts have been made to compare gene sets for studies such as those evaluating the reproducibility of different experiments. Comparison in terms of biological function has been demonstrated to be helpful to biologists. We improved the measurement of semantic similarity to quantify the functional association between gene sets in the context of gene ontology and developed a web toolkit named Gene Set Functional Similarity (GSFS; http://bioinfo.hrbmu.edu.cn/GSFS ). Validation based on protein complexes for which the functional associations are known demonstrated that the GSFS scores tend to be correlated with sequence similarity scores and that complexes with high GSFS scores tend to be involved in the same functional catalogue. Compared with the pairwise method and the annotation method, the GSFS shows better discrimination and more accurately reflects the known functional catalogues shared between complexes. Case studies comparing differentially expressed genes of prostate tumour samples from different microarray platforms and identifying coronary heart disease susceptibility pathways revealed that the method could contribute to future studies exploring the molecular basis of complex disorders.

T1000: A reduced toxicogenomics gene set for improved decision making

10.7287/peerj.preprints.27839 ◽

2019 ◽

Author(s):

Othman Soufan ◽

Jessica Ewald ◽

Charles Viau ◽

Doug Crump ◽

Markus Hecker ◽

...

Keyword(s):

Dose Response ◽

Gene Clusters ◽

Gene Set ◽

Chemical Hazards ◽

Gene Sets ◽

Toxicological Research ◽

Multiple Species ◽

Response Modeling

There is growing interest within regulatory agencies and toxicological research communities to develop, test, and apply new approaches, such as toxicogenomics, to more efficiently evaluate chemical hazards. Given the complexity of analyzing thousands of genes simultaneously, there is a need to identify reduced gene sets. Though several gene sets have been defined for toxicological applications, few of these were purposefully derived using toxicogenomics data. Here, we developed and applied a systematic approach to identify 1000 genes (called Toxicogenomics-1000 or T1000) highly responsive to chemical exposures. First, a co-expression network of 11,210 genes was built by leveraging microarray data from the Open TG-GATEs program. This network was then re-weighted based on prior knowledge of their biological (KEGG, MSigDB) and toxicological (CTD) relevance. Finally, weighted correlation network analysis was applied to identify 258 gene clusters. T1000 was defined by selecting genes from each cluster that were most associated with outcome measures. For model evaluation, we compared the performance of T1000 to that of other gene sets (L1000, S1500, genes selected by Limma, and a random set) using two external datasets. Additionally, a smaller (T384) and a larger version (T1500) of T1000 were used for dose-response modeling to test the effect of gene set size. Our findings demonstrated that the T1000 gene set is predictive of apical outcomes across a range of conditions (e.g., in vitro and in vivo, dose-response, multiple species, tissues, and chemicals), and generally performs as well as, or better than, other gene sets available.
