Utilizing Cancer - Functional Gene Set - Compound Networks to Identify Putative Drugs for Breast Cancer

2018 ◽  
Vol 21 (2) ◽  
pp. 74-83
Author(s):  
Tzu-Hung Hsiao ◽  
Yu-Chiao Chiu ◽  
Yu-Heng Chen ◽  
Yu-Ching Hsu ◽  
Hung-I Harry Chen ◽  
...  

Aim and Objective: The number of anticancer drugs currently available is limited, and some of them have low treatment response rates. Moreover, developing a new drug for cancer therapy is labor intensive and sometimes cost prohibitive. Therefore, “repositioning” of known cancer treatment compounds can shorten development time and potentially increase the response rate of cancer therapy. This study proposes a systems biology method for identifying new compound candidates for cancer treatment in two separate procedures. Materials and Methods: First, a “gene set–compound” network was constructed by conducting gene set enrichment analysis on the expression profile of responses to a compound. Second, survival analyses were applied to gene expression profiles derived from four breast cancer patient cohorts to identify gene sets that are associated with cancer survival. A “cancer–functional gene set–compound” network was constructed, and candidate anticancer compounds were identified. Using breast cancer as an example, 162 breast cancer survival-associated gene sets and 172 putative compounds were obtained. Results: We demonstrated how to utilize the clinical relevance of previous studies through gene sets and then connect it to candidate compounds by using gene expression data from the Connectivity Map. Specifically, we chose a gene set derived from a stem cell study to demonstrate its association with breast cancer prognosis and discussed six new compounds that can increase the expression of the gene set after treatment. Conclusion: Our method can effectively identify compounds with the potential to be “repositioned” for cancer treatment according to their active mechanisms and their association with patients’ survival time.
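
The linking step described above, connecting survival-associated gene sets to compounds through a shared gene set layer, can be illustrated with a minimal sketch. The function name, the dictionary layout, and the toy gene set and compound names are all hypothetical and are not taken from the study; the real method relies on enrichment statistics and survival analyses that are not reproduced here.

```python
# Hypothetical sketch: join a "gene set-compound" mapping with a list of
# survival-associated gene sets to obtain candidate compounds.

def candidate_compounds(compound_to_gene_sets, survival_gene_sets):
    """Return {compound: sorted shared gene sets} for every compound whose
    enriched gene sets overlap the survival-associated gene sets."""
    survival = set(survival_gene_sets)
    candidates = {}
    for compound, gene_sets in compound_to_gene_sets.items():
        shared = set(gene_sets) & survival
        if shared:
            candidates[compound] = sorted(shared)
    return candidates


# Toy inputs invented for illustration only (not data from the study).
compound_to_gene_sets = {
    "compound_A": {"STEM_CELL_UP", "CELL_CYCLE"},
    "compound_B": {"HYPOXIA"},
    "compound_C": {"STEM_CELL_UP"},
}
survival_gene_sets = {"STEM_CELL_UP", "IMMUNE_RESPONSE"}

result = candidate_compounds(compound_to_gene_sets, survival_gene_sets)
print(result)
```

In this toy case only the compounds that share a gene set with the survival-associated list are retained; a real analysis would also need to track the direction of enrichment.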

2015 ◽  
Vol 6 ◽  
pp. 2438-2448 ◽  
Author(s):  
Andrew Williams ◽  
Sabina Halappanavar

Background: The presence of diverse types of nanomaterials (NMs) in commerce is growing at an exponential pace. As a result, human exposure to these materials in the environment is inevitable, necessitating rapid and reliable toxicity testing methods to accurately assess the potential hazards associated with NMs. In this study, we applied biclustering and gene set enrichment analysis methods to derive essential features of the altered lung transcriptome following exposure to NMs that are associated with lung-specific diseases. Several datasets from public microarray repositories describing pulmonary diseases in mouse models following exposure to a variety of substances were examined and functionally related biclusters of genes showing similar expression profiles were identified. The identified biclusters were then used to conduct a gene set enrichment analysis on pulmonary gene expression profiles derived from mice exposed to nano-titanium dioxide (nano-TiO2), carbon black (CB) or carbon nanotubes (CNTs) to determine the disease significance of these data-driven gene sets. Results: Biclusters representing inflammation (chemokine activity), DNA binding, cell cycle, apoptosis, reactive oxygen species (ROS) and fibrosis processes were identified. All of the NM studies were significant with respect to the bicluster related to chemokine activity (DAVID; FDR p-value = 0.032). The bicluster related to pulmonary fibrosis was enriched in studies investigating toxicity induced by CNTs and CB, suggesting the potential for these materials to induce lung fibrosis. The pro-fibrogenic potential of CNTs is well established. Although CB has not been shown to induce fibrosis, it induces stronger inflammatory, oxidative stress and DNA damage responses than nano-TiO2 particles. Conclusion: The results of the analysis correctly identified all NMs to be inflammogenic and only CB and CNTs as potentially fibrogenic.
In addition to identifying several previously defined, functionally relevant gene sets, the present study also identified two novel gene sets: a gene set associated with pulmonary fibrosis and a gene set associated with ROS, underlining the advantage of using a data-driven approach to identify novel, functionally related gene sets. The results can be used in future gene set enrichment analysis studies involving NMs or as features for clustering and classifying NMs of diverse properties.


Blood ◽  
2012 ◽  
Vol 120 (21) ◽  
pp. 317-317
Author(s):  
Xiao J. Yan ◽  
Wentian Li ◽  
Sophia Yancopoulos ◽  
Igor Dozmorov ◽  
Carlo Calissano ◽  
...  

Abstract 317: By using reciprocal densities of surface membrane CXCR4 and CD5, chronic lymphocytic leukemia (CLL) B cells can be divided into 3 fractions indicating time since last division (proliferative, intermediate, and resting). It has been suggested that cells in these fractions represent a continuum from resting to intermediate to proliferative. In this study, we made intraclonal gene expression profile (GEP) comparisons of these fractions from 17 CLL patients to try to confirm this notion and interclonal comparisons between U-CLL and M-CLL patients to determine if pathways involved in the actions of these fractions differed between patient subgroups. PBMCs from 8 U-CLL and 9 M-CLL patients were sorted into 3 fractions (CD19+CD3−CD5hiCXCR4lo, PROLIF), (CD19+CD3−CD5intCXCR4int, INTERM), and (CD19+CD3−CD5loCXCR4hi, REST); RNA was purified from each, and gene expression microarrays using Illumina HumanHT12 beadchips were performed. To determine differentially expressed genes in intraclonal comparisons, expression value ratios for fractions from each patient were computed and log-transformed, and Student's t-test was performed using R (www.r-project.org); for interclonal comparisons, raw GEP data between subpopulations were compared: U-PROLIF and M-PROLIF, and U-REST and M-REST. Sets of significant genes (≥1.5 fold change and P<0.01) were analyzed using Ingenuity Pathway Analysis (IPA) and Gene Set Enrichment Analysis (GSEA). Upon plotting intraclonal average log ratios of PROLIF/INTERM vs INTERM/REST, it was clear that gene expression levels changed in the same direction, i.e. PROLIF>INTERM>REST, or PROLIF<INTERM<REST, consistent with a continuum between the 3 fractions. Within this pattern, 36 genes were significant for both plotted ratios. Of these, 29 were overexpressed, along with CD5; CD68, ITGAX, CCND2, CRIP1 and LGALS1 were the highest. Functional analysis using IPA showed these genes to be related to NFkB signaling and cell trafficking.
Seven genes (ADARB1, BACH2, CNTNAP2, HRK, RHPN2, PRPML, and RXPA) were significantly downregulated, along with CXCR4. Next we characterized GEP differences between the PROLIF and REST fractions, identifying 390 genes up-regulated in PROLIF and 244 in REST. The top 5 upregulated PROLIF genes were CD68, LY96, ITGAX, CCND2 and CRIP1, and the top 5 REST genes were BACH2, CXCR4, ADARB1, RHPN2 and HRK. Functionally, the upregulated PROLIF genes were related to BCR signaling, cytokines (IFNa, IL12), NFkB, and Akt, whereas the upregulated REST genes were related to BCL2, cell death and cell movement. By GSEA, 813/881 gene sets, defined by expression neighborhoods centered on cancer associated genes, were upregulated in the PROLIF fraction, with 436 gene sets significant at a false discovery rate (FDR) <10%; 206 sets were significantly enriched with p value <0.01. For the REST fraction, 68/881 gene sets were upregulated, with none significant even at FDR <25%. Finally, we examined PROLIF and REST fractions from U-CLL vs M-CLL patients. In this interclonal analysis, 93 genes were significantly different between U-PROLIF and M-PROLIF. The top 5 in U-PROLIF were MSI2, TGFBR3, TP53I3, RGCC and IGSF3, and the top 5 in M-PROLIF were MTSS1, BACE2, BRI3BP, AP3B1 and UBE2G2. Similarly, there were 125 genes that were significantly different between U-REST and M-REST. The top 5 in U-REST were DUSP26, CLEC2B, MDK, and EGR2 and in M-REST were NAPSA, RAB24, TARDBP, KCNN4 and ADD3. Interestingly, U-PROLIF and M-PROLIF differed in pathway assignments, with upregulated genes in U-PROLIF contributing to cell signaling and activation, particularly implicating Akt, ERK and P38MAPK. The intraclonal GEP analysis on these 3 fractions confirms that CLL clones contain a spectrum of cells that transition in a sequential manner from PROLIF to INTERM to REST fractions.
Functional analyses show that genes upregulated in PROLIF correlate with cell signaling and proliferation, while genes upregulated in REST relate to cell death. Thus the PROLIF fraction is enriched in recently divided cells that likely exit from lymphoid tissue and the REST in older, less vital cells that either traffic to lymphoid tissue or die. The interclonal analysis implies that the stimuli and/or the responses of cells in the PROLIF and REST fractions differ between U-CLL and M-CLL. This last novel finding suggests either distinct cells of origin or distinct activation pathways for the IGHV-defined CLL subsets. Disclosures: Barrientos: Gilead and Pharmacyclics: Research Funding.
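
The continuum check described in this abstract, in which the average log ratios PROLIF/INTERM and INTERM/REST should change in the same direction, can be sketched as follows. This is only an illustration under stated assumptions: the function names and the toy expression values are hypothetical, log base 2 is an arbitrary choice, and the t-test and fold-change filters used in the study are not reproduced.

```python
import math
from statistics import mean


def mean_log2_ratio(numerators, denominators):
    """Average per-patient log2 ratio between two fractions for one gene."""
    return mean(math.log2(n / d) for n, d in zip(numerators, denominators))


def continuum_direction(prolif, interm, rest):
    """Classify one gene from per-patient expression values in each fraction.

    Returns "up" when PROLIF > INTERM > REST on average, "down" when
    PROLIF < INTERM < REST, and "mixed" otherwise.
    """
    first = mean_log2_ratio(prolif, interm)   # PROLIF / INTERM
    second = mean_log2_ratio(interm, rest)    # INTERM / REST
    if first > 0 and second > 0:
        return "up"
    if first < 0 and second < 0:
        return "down"
    return "mixed"


# Toy values for two patients (invented, not data from the abstract).
print(continuum_direction([8.0, 16.0], [4.0, 8.0], [2.0, 4.0]))
```

Computing ratios within each patient before averaging mirrors the intraclonal design, where each patient serves as their own control.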


2016 ◽  
Vol 34 (4_suppl) ◽  
pp. 558-558 ◽  
Author(s):  
Michael Sangmin Lee ◽  
Benjamin Garrett Vincent ◽  
Autumn Jackson McRee ◽  
Hanna Kelly Sanoff

Background: Different immune cell infiltrates into colorectal cancer (CRC) tumors are associated with different prognoses. Tumor-associated macrophages contribute to immune evasion and accelerated tumor progression. Conversely, tumor infiltrating lymphocytes at the invasive margin of CRC liver metastases are associated with improved outcomes with chemotherapy. Cetuximab is an IgG1 monoclonal antibody against epidermal growth factor receptor (EGFR) and stimulates antibody-dependent cellular cytotoxicity (ADCC) in vitro. However, it is unclear in humans if response to cetuximab is modulated by the immune response. We hypothesized that different immune patterns detected in gene expression profiles of CRC metastases are associated with different responses to cetuximab. Methods: We retrieved gene expression data from biopsies of metastases from 80 refractory CRC patients treated with cetuximab monotherapy (GEO GSE5851). Samples were dichotomized by cetuximab response as having either disease control (DC) or progressive disease (PD). We performed gene set enrichment analysis (GSEA) with GenePattern 3.9.4 using gene sets of immunologic signatures obtained from the Molecular Signatures Database v5.0. Results: Among the 68 patients with response annotated, 25 had DC and 43 had PD. In the PD cohort, 59/1910 immunologic gene sets had false discovery rate (FDR) < 0.1. Notably, multiple gene sets upregulated in monocyte signatures were associated with PD. Also, gene sets consistent with PD1-ligated T cells compared to control activated T cells (FDR = 0.052) or IL4-treated CD4 T cells compared to controls (FDR = 0.087) were associated with PD. Conclusions: Cetuximab-resistant patients tended to have baseline increased expression of gene signatures reflective of monocytic infiltrates, consistent with also having increased expression of the IL4-treated T-cell signature. Cetuximab resistance was also associated with increased expression of the PD1-ligated T cell signature.
These preliminary findings support further evaluation of the effect of differential immune infiltrates in prognosis of metastatic CRC treated with cetuximab.
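
Several of the abstracts on this page rely on gene set enrichment analysis. As a rough illustration of the idea, the sketch below computes a running-sum enrichment score over a ranked gene list, in the spirit of the weighted Kolmogorov-Smirnov-like statistic commonly associated with GSEA. It is not the GenePattern implementation: the function name, the default exponent, and the toy gene names are assumptions, and the permutation testing and FDR estimation that real GSEA performs are omitted.

```python
def enrichment_score(ranked_genes, scores, gene_set, exponent=1.0):
    """Running-sum enrichment score for one gene set.

    ranked_genes: genes ordered by decreasing ranking metric.
    scores: dict mapping each gene to its ranking metric.
    gene_set: collection of genes in the set being tested.
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    members = set(gene_set)
    hits = [gene in members for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    # Normalizer so that the hit increments sum to 1.
    hit_norm = sum(abs(scores[g]) ** exponent
                   for g, is_hit in zip(ranked_genes, hits) if is_hit)
    if hit_norm == 0:
        raise ValueError("all gene set members have a zero ranking metric")

    running = 0.0
    best = 0.0
    for gene, is_hit in zip(ranked_genes, hits):
        if is_hit:
            running += abs(scores[gene]) ** exponent / hit_norm
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranking (hypothetical gene names and metrics).
ranked = ["g1", "g2", "g3", "g4"]
metric = {"g1": 2.0, "g2": 1.0, "g3": -1.0, "g4": -2.0}
print(enrichment_score(ranked, metric, {"g1", "g2"}))
```

A set concentrated at the top of the ranking yields a positive score and one concentrated at the bottom a negative score; significance would still have to be assessed by permutation.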


2017 ◽  
Author(s):  
Jie Tan ◽  
Matthew Huyck ◽  
Dongbo Hu ◽  
René A. Zelaya ◽  
Deborah A. Hogan ◽  
...  

Abstract Background: Gene set enrichment analysis and overrepresentation analyses are commonly used methods to determine the biological processes affected by a differential expression experiment. This approach requires biologically relevant gene sets, which are currently curated manually, limiting their availability and accuracy in many organisms without extensively curated resources. New feature learning approaches can now be paired with existing data collections to directly extract functional gene sets from big data. Results: Here we introduce a method to identify perturbed processes. In contrast with methods that use curated gene sets, this approach uses signatures extracted from public expression data. We first extract expression signatures from public data using ADAGE, a neural network-based feature extraction approach. We next identify signatures that are differentially active under a given treatment. Our results demonstrate that these signatures represent biological processes that are perturbed by the experiment. Because these signatures are directly learned from data without supervision, they can identify uncurated or novel biological processes. We implemented ADAGE signature analysis for the bacterial pathogen Pseudomonas aeruginosa. For the convenience of different user groups, we implemented both an R package (ADAGEpath) and a web server (http://adage.greenelab.com) to run these analyses. Both are open-source to allow easy expansion to other organisms or signature generation methods. We applied ADAGE signature analysis to an example dataset in which wild-type and Δanr mutant cells were grown as biofilms on the Cystic Fibrosis genotype bronchial epithelial cells. We mapped active signatures in the dataset to KEGG pathways and compared with pathways identified using GSEA.
The two approaches generally return consistent results; however, ADAGE signature analysis also identified a signature that revealed the molecularly supported link between the MexT regulon and Anr. Conclusions: We designed ADAGE signature analysis to perform gene set analysis using data-defined functional gene signatures. This approach addresses an important gap for biologists studying non-traditional model organisms and those without extensive curated resources available. We built both an R package and web server to provide ADAGE signature analysis to the community.


2020 ◽  
Author(s):  
Priyanka Chakraborty ◽  
Jason T George ◽  
Wendy A Woodward ◽  
Herbert Levine ◽  
Mohit Kumar Jolly

Abstract: Inflammatory breast cancer (IBC) is a highly aggressive breast cancer that metastasizes largely via tumor emboli, and has a 5-year survival rate of less than 30%. No unique genomic signature has yet been identified for IBC nor has any specific molecular therapeutic been developed to manage the disease. Thus, identifying gene expression signatures specific to IBC remains crucial. Here, we compare various gene lists that have been proposed as molecular footprints of IBC using different clinical samples as training and validation sets and using independent training algorithms, and determine their accuracy in identifying IBC samples in three independent datasets. We show that these gene lists have little to no mutual overlap, and have limited predictive accuracy in identifying IBC samples. Despite this inconsistency, single-sample gene set enrichment analysis (ssGSEA) scores of IBC samples correlate with their position on the epithelial-hybrid-mesenchymal spectrum. This positioning, together with ssGSEA scores, improves the accuracy of IBC identification across the three independent datasets. Finally, we observed that IBC samples robustly displayed a higher coefficient of variation in terms of EMT scores, as compared to non-IBC samples. Pending verification that this patient-to-patient variability extends to intratumor heterogeneity within a single patient, these results suggest that higher heterogeneity along the epithelial-hybrid-mesenchymal spectrum can be regarded as a hallmark of IBC and a possibly useful biomarker.
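
The heterogeneity measure mentioned above, the coefficient of variation of EMT scores, is the standard deviation divided by the mean. The sketch below shows the calculation on invented numbers. It assumes EMT scores on a positive scale and uses the sample standard deviation; the scoring scale and estimator actually used in the preprint are not stated in the abstract, so both choices are assumptions.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean."""
    average = mean(values)
    if average == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return stdev(values) / abs(average)


# Invented EMT scores for two hypothetical groups of samples.
ibc_scores = [0.2, 0.9, 0.5, 0.1]
non_ibc_scores = [0.50, 0.55, 0.45, 0.50]

print(coefficient_of_variation(ibc_scores))
print(coefficient_of_variation(non_ibc_scores))
```

Because the measure is scaled by the mean, it becomes unstable when scores are centered near zero, which is worth checking before comparing groups.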


2020 ◽  
Author(s):  
Xiaomei Lei ◽  
Zhijun Feng ◽  
Xiaojun Wang ◽  
Xiaodong He

Abstract Background. Exploring alterations in the host transcriptome following SARS-CoV-2 infection is not only highly warranted to help us understand molecular mechanisms of the disease, but also provides new perspectives for screening effective antiviral drugs, finding new therapeutic targets, and evaluating the risk of systemic inflammatory response syndrome (SIRS) early. Methods. We downloaded three gene expression matrix files from the Gene Expression Omnibus (GEO) database, extracted the gene expression data of SARS-CoV-2-infected and non-infected human samples and different cell line samples, and performed gene set enrichment analysis (GSEA) on each dataset. Thereafter, we integrated the GSEA results and obtained co-enriched gene sets and co-core genes across the three microarray datasets. Finally, we constructed a protein-protein interaction (PPI) network and molecular modules for the co-core genes and performed Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis on the genes from the modules to clarify their possible biological processes and underlying signaling pathways. Results. A total of 11 co-enriched gene sets were identified across the three microarray datasets. Among them, 10 gene sets were activated and involved in immune response and inflammatory reaction, and 1 gene set was suppressed and participated in the cell cycle. The analysis of molecular modules showed that 2 modules might play a vital role in the pathogenic process of SARS-CoV-2 infection. The KEGG enrichment analysis showed that genes from module one were enriched in signaling pathways related to inflammation, whereas genes from module two were enriched in cell cycle and DNA replication signaling. Notably, necroptosis signaling, a newly identified type of programmed cell death distinct from apoptosis, was also identified.
Additionally, for patients with SARS-CoV-2 infection, genes from module one showed relatively high expression while genes from module two showed low expression. Conclusions. We identified two molecular modules that may be used to assess severity and predict the prognosis of patients with SARS-CoV-2 infection. In addition, these results provide a unique opportunity to explore more molecular pathways as potential new therapeutic targets in COVID-19.


2021 ◽  
Author(s):  
Yannian Luo ◽  
Juan Xu ◽  
Mingzhen Zhou ◽  
Xiaomei Lei ◽  
Wen Cao ◽  
...  

Abstract Background. Exploring alterations in the host transcriptome following SARS-CoV-2 infection is not only highly warranted to help us understand molecular mechanisms of the disease, but also provides new perspectives for screening effective antiviral drugs, finding new therapeutic targets, and evaluating the risk of systemic inflammatory response syndrome (SIRS) early. Methods. We downloaded three gene expression matrix files from the Gene Expression Omnibus (GEO) database, extracted the gene expression data of SARS-CoV-2-infected and non-infected human samples and different cell line samples, and performed gene set enrichment analysis (GSEA) on each dataset. Thereafter, we integrated the GSEA results and obtained co-enriched gene sets and co-core genes across the three microarray datasets. Finally, we constructed a protein-protein interaction (PPI) network and molecular modules for the co-core genes and performed Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis on the genes from the modules to clarify their possible biological processes and underlying signaling pathways. Results. A total of 11 co-enriched gene sets were identified across the three microarray datasets. Among them, 10 gene sets were activated and involved in immune response and inflammatory reaction, and 1 gene set was suppressed and participated in the cell cycle. The analysis of molecular modules showed that 2 modules might play a vital role in the pathogenic process of SARS-CoV-2 infection. The KEGG enrichment analysis showed that genes from module one were enriched in signaling pathways related to inflammation, whereas genes from module two were enriched in cell cycle and DNA replication signaling. Notably, necroptosis signaling, a newly identified type of programmed cell death distinct from apoptosis, was also identified.
Additionally, for patients with SARS-CoV-2 infection, genes from module one showed relatively high expression while genes from module two showed low expression. Conclusions. We identified two molecular modules that may be used to assess severity and predict the prognosis of patients with SARS-CoV-2 infection. In addition, these results provide a unique opportunity to explore more molecular pathways as potential new therapeutic targets in COVID-19.


Blood ◽  
2020 ◽  
Vol 136 (Supplement 1) ◽  
pp. 22-22
Author(s):  
Ellen K. Kendall ◽  
Manishkumar S. Patel ◽  
Sarah Ondrejka ◽  
Agrima Mian ◽  
Yazeed Sawalha ◽  
...  

Background: Diffuse large B-cell lymphoma (DLBCL) is the most common type of non-Hodgkin lymphoma. While 60% of DLBCL patients achieve complete remission with frontline therapy, relapsed/refractory (R/R) DLBCL patients have a poor prognosis with median overall survival below one year, necessitating investigation into the biological principles that distinguish cured from R/R DLBCL. Recent analyses have identified unfavorable molecular signatures when accounting for gene expression, copy number alterations and mutational profiles in R/R DLBCL. However, an integrative analysis of the relationship between epigenetic and transcriptomic changes has yet to be described. In this study, we compared baseline methylation and gene expression profiles of DLBCL patients with dichotomized clinical outcomes. Methods: Diagnostic DLBCL biopsies were obtained from two patient cohorts: patients who relapsed or were refractory following chemoimmunotherapy ("R/R"), and patients who entered durable clinical remission following therapy ("cured"). The median ages of the R/R and cured cohorts were 62 (range 35-86) years and 64 (range 28-83) years, respectively (P = 0.27). High-intermediate or high IPI scores were present in 14 vs. 6 patients (P = 0.08) in the R/R and cured cohorts, respectively. All patients were treated with frontline R-CHOP or R-EPOCH. DNA and RNA were extracted simultaneously from formalin-fixed, paraffin embedded biopsy samples. An Illumina 850k Methylation Array was used to identify DNA methylation levels in 29 R/R patients and 20 cured patients. RNA sequencing was performed on 9 R/R patients and 7 cured patients at diagnosis using Illumina HiSeq4000. Differentially methylated probes were identified using the DMRcate package, and differentially expressed genes were identified using the DESeq2 package. Gene set enrichment analysis was performed using canonical pathway gene sets from MSigDB.
Results: At the time of diagnosis, we found significant epigenetic and transcriptomic differences between cured and R/R patients. Comparing cured to R/R samples, there were 8,159 differentially methylated probes (FDR<0.05). Differentially methylated regions between R/R and cured cohorts overlap with genes previously identified as mutation hotspots in DLBCL. Upon comparing transcriptomic profiles between R/R and cured, 267 genes were found to be differentially expressed (|Log2FC|>1 and FDR<0.05). Gene set enrichment analysis revealed gene sets related to cell cycle, membrane trafficking, Rho and Rab family GTPase function, and transcriptional regulation were upregulated in the R/R samples. Gene sets related to innate immune signaling, Type I and II interferon signaling, fatty acid and carbohydrate metabolism were upregulated in the cured samples. To identify genes likely to be regulated by specific changes in methylation, we selected genes that were both differentially expressed and differentially methylated between the R/R and cured cohorts. In the R/R samples, 13 genes (ARMC5, ARRDC1, C12orf57, CCSER1, D2HGDH, DUOX2, FAM189B, FKBP2, KLF5, MFSD10, NEK8, NT5C, and WDR18) were significantly hypermethylated and underexpressed when compared to cured specimens, suggesting that epigenetic silencing of these genes is associated with lack of response to chemoimmunotherapy. In contrast, 12 genes (ATP2B1, C15orf41, FAM102B, FAM3C, FHOD3, FYTTD1, GPR180, KIAA1841, LRMP, MEF2A, RRAS2, and TPD52) were significantly hypermethylated and underexpressed in cured patients, suggesting that epigenetic silencing of these genes is favorable for treatment response. Many of these epigenetically modified genes have been previously implicated in cancer biology, including roles in NOTCH signaling, chromosomal instability, and biomarkers of prognosis.
Conclusions: This is the first integrative epigenetic and transcriptomic analysis of diagnostic biopsies from cured and R/R DLBCL patients following chemoimmunotherapy. At the time of diagnosis, both the methylation and gene expression profiles significantly differ between patients who enter durable remission and those who are R/R to therapy. Soon, the hypomethylating agent CC-486 (i.e. oral azacitidine) will be explored in combination with mini-R-CHOP for older DLBCL patients in whom DNA methylation is likely increased. These data support the use of hypomethylating agents to potentially restore sensitivity of DLBCL to chemoimmunotherapy. Disclosures: Hsi: Eli Lilly: Research Funding; Abbvie: Research Funding; Miltenyi: Consultancy, Honoraria; Seattle Genetics: Consultancy, Honoraria; CytomX: Consultancy, Honoraria. Hill: Celgene: Consultancy, Honoraria, Research Funding; BMS: Consultancy, Honoraria, Research Funding; Novartis: Consultancy, Honoraria; Kite, a Gilead Company: Consultancy, Honoraria, Research Funding; AstraZeneca: Consultancy, Honoraria, Research Funding; Pharmacyclics: Consultancy, Honoraria, Research Funding; Takeda: Research Funding; Beigene: Consultancy, Honoraria, Research Funding; Genentech: Consultancy, Honoraria, Research Funding; Abbvie: Consultancy, Honoraria, Research Funding; Karyopharm: Consultancy, Honoraria, Research Funding.
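
The integration step in this abstract, selecting genes that are both hypermethylated and underexpressed in one cohort relative to the other, amounts to filtering the intersection of two result tables. The sketch below is a simplified illustration: the function name, the sign conventions (methylation difference and log2 fold change both taken as R/R relative to cured), the cutoff, and the toy gene names are assumptions, and the DMRcate and DESeq2 significance testing is not reproduced.

```python
def silenced_candidates(methylation_delta, log2_fold_change, lfc_cutoff=1.0):
    """Genes with higher methylation and lower expression in the same cohort.

    methylation_delta: dict gene -> methylation difference (positive means
        hypermethylated in the cohort of interest).
    log2_fold_change: dict gene -> log2 fold change for the same comparison.
    Only genes present in both inputs are considered.
    """
    shared = methylation_delta.keys() & log2_fold_change.keys()
    return sorted(
        gene for gene in shared
        if methylation_delta[gene] > 0 and log2_fold_change[gene] < -lfc_cutoff
    )


# Toy inputs with placeholder gene names (not results from the abstract).
methylation_delta = {"GENE_A": 0.25, "GENE_B": 0.30, "GENE_C": -0.10, "GENE_D": 0.40}
log2_fold_change = {"GENE_A": -1.8, "GENE_B": 0.4, "GENE_C": -2.0, "GENE_E": -3.0}

selected = silenced_candidates(methylation_delta, log2_fold_change)
print(selected)
```

Swapping the sign convention of both inputs gives the complementary list for the other cohort.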


2021 ◽  
Author(s):  
Viola Hollestein ◽  
Geert Poelmans ◽  
Natalie Forde ◽  
Christian F Beckmann ◽  
Christine Ecker ◽  
...  

Background: The excitatory/inhibitory (E/I) imbalance hypothesis posits that an imbalance between excitatory (glutamatergic) and inhibitory (GABAergic) mechanisms underlies the behavioral characteristics of autism spectrum disorder (autism). However, how E/I imbalance arises and how it may differ across autism symptomatology and brain regions is not well understood. Methods: We used innovative analysis methods - combining competitive gene-set analysis and gene-expression profiles in relation to cortical thickness (CT)- to investigate the relationship between genetic variance, brain structure and autism symptomatology of participants from the EU-AIMS LEAP cohort (autism=360, male/female=259/101; neurotypical control participants=279, male/female=178/101) aged 6 to 30 years. Competitive gene-set analysis investigated associations between glutamatergic and GABAergic signaling pathway gene-sets and clinical measures, and CT. Additionally, we investigated expression profiles of the genes within those sets throughout the brain and how those profiles relate to differences in CT between autistic and neurotypical control participants in the same regions. Results: The glutamate gene-set was associated with all autism symptom severity scores on the Autism Diagnostic Observation Schedule-2 (ADOS-2) and the Autism Diagnostic Interview-Revised (ADI-R) within the autistic group, while the GABA set was associated with sensory processing measures (using the SSP subscales) across all participants. Brain regions with greater gene expression of both glutamate and GABA genes showed greater differences in CT between autistic and neurotypical control participants. Conclusions: Our results suggest crucial roles for glutamate and GABA genes in autism symptomatology as well as CT, where GABA is more strongly associated with sensory processing and glutamate more with autism symptom severity. 


2019 ◽  
Author(s):  
Heonjong Han ◽  
Sangyoung Lee ◽  
Insuk Lee

ABSTRACT: Gene set enrichment analysis (GSEA) is a popular tool to identify underlying biological processes in clinical samples using their gene expression phenotypes. GSEA measures the enrichment of annotated gene sets that represent biological processes for differentially expressed genes (DEGs) in clinical samples. GSEA may be suboptimal for functional gene sets, however, because DEGs from the expression dataset may not be functional genes per se but dysregulated genes perturbed by bona fide functional genes. To overcome this shortcoming, we developed network-based GSEA (NGSEA), which measures the enrichment score of functional gene sets using the expression difference of not only individual genes but also their neighbors in the functional network. We found that NGSEA outperformed GSEA in identifying pathway gene sets for matched gene expression phenotypes. We also observed that NGSEA substantially improved the ability to retrieve known anti-cancer drugs from patient-derived gene expression data using drug-target gene sets compared with another method, Connectivity Map. We also repurposed FDA-approved drugs using NGSEA and experimentally validated budesonide as a chemical with anti-cancer effects for colorectal cancer. We, therefore, expect that NGSEA will facilitate both pathway interpretation of gene expression phenotypes and anti-cancer drug repositioning. NGSEA is freely available at www.inetbio.org/ngsea.
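
The central idea of this abstract, scoring each gene by its own expression difference together with that of its network neighbors, can be sketched as below. The specific combination used here (absolute log2 ratio of the gene plus the mean absolute log2 ratio of its neighbors) is an assumption made for illustration, as are the function name and the toy network; the abstract does not spell out the formula, and the resulting scores would still need to be passed to an enrichment procedure.

```python
from statistics import mean


def network_scores(log2_ratios, neighbors):
    """Combine each gene's expression change with that of its neighbors.

    log2_ratios: dict gene -> log2 expression ratio between conditions.
    neighbors: dict gene -> iterable of neighboring genes in the network.
    Neighbors without a measured ratio are ignored.
    """
    scores = {}
    for gene, ratio in log2_ratios.items():
        neighbor_values = [
            abs(log2_ratios[n]) for n in neighbors.get(gene, ()) if n in log2_ratios
        ]
        neighbor_mean = mean(neighbor_values) if neighbor_values else 0.0
        scores[gene] = abs(ratio) + neighbor_mean
    return scores


# Toy network: gene A is unchanged itself but sits next to two changed genes.
log2_ratios = {"A": 0.0, "B": 2.0, "C": -1.0}
neighbors = {"A": ["B", "C"], "B": ["A"]}

smoothed = network_scores(log2_ratios, neighbors)
print(smoothed)
```

In the toy example gene A receives a non-zero score despite showing no change of its own, which is the behavior the abstract motivates for functional genes that perturb their neighbors.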

