simplifyEnrichment: an R/Bioconductor package for Clustering and Visualizing Functional Enrichment Results

2020 ◽  
Author(s):  
Zuguang Gu ◽  
Daniel Hübschmann

Motivation: Functional enrichment analysis, or gene set enrichment analysis, is a basic bioinformatics method that evaluates the biological importance of a list of genes of interest. However, it may produce a long list of significant terms with highly redundant information that is difficult to summarize. Current tools that simplify enrichment results by clustering them into groups either still produce redundancy between clusters or do not retain consistent term similarities within clusters. Results: We proposed a new method named binary cut for clustering similarity matrices of functional terms. Through comprehensive benchmarks on both simulated and real-world datasets, we demonstrated that binary cut can efficiently cluster functional terms into groups in which terms showed more consistent similarities within groups and were more mutually exclusive between groups. We compared binary cut clustering on similarity matrices from different similarity measurements and found that semantic similarity worked well with binary cut, whereas similarity matrices based on gene overlap showed less consistent patterns and were not recommended for use with binary cut. We implemented the binary cut algorithm in an R package, simplifyEnrichment, which additionally provides functionality for visualizing, summarizing and comparing the clusterings. Availability and implementation: The simplifyEnrichment package and its documentation are available at https://bioconductor.org/packages/simplifyEnrichment/. The reports for the analysis of all datasets benchmarked in the paper are available at https://simplifyenrichment.github.io/. The scripts that performed the analysis are available at https://github.com/jokergoo/simplifyEnrichment_manuscript.
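To make the two criteria in this abstract concrete (consistent similarity within groups, mutual exclusivity between groups), the following is a minimal Python sketch. It is not the binary cut algorithm and does not use the simplifyEnrichment package. It assumes a gene-overlap (Jaccard) similarity, one of the overlap-based measures the abstract contrasts with semantic similarity, and the term names, gene names, and cluster labels are hypothetical toy data.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between two gene sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(term_genes):
    """Return the sorted term names and their pairwise Jaccard matrix."""
    terms = sorted(term_genes)
    sim = [[jaccard(term_genes[s], term_genes[t]) for t in terms] for s in terms]
    return terms, sim


def within_between(terms, sim, labels):
    """Mean similarity of term pairs in the same cluster vs. different clusters."""
    within, between = [], []
    for i, j in combinations(range(len(terms)), 2):
        same = labels[terms[i]] == labels[terms[j]]
        (within if same else between).append(sim[i][j])

    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return mean(within), mean(between)


# Hypothetical toy input: four terms, each annotated with a few genes.
term_genes = {
    "term_A": {"g1", "g2", "g3"},
    "term_B": {"g2", "g3", "g4"},
    "term_C": {"g7", "g8"},
    "term_D": {"g8", "g9"},
}
# Hypothetical clustering to evaluate (any clustering method could supply this).
labels = {"term_A": 0, "term_B": 0, "term_C": 1, "term_D": 1}

terms, sim = similarity_matrix(term_genes)
w, b = within_between(terms, sim, labels)
print(f"mean within-cluster similarity:  {w:.3f}")
print(f"mean between-cluster similarity: {b:.3f}")
```

A clustering that satisfies both criteria gives a high within-cluster mean and a between-cluster mean close to zero; the same comparison can be run on any similarity matrix, including a semantic one.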

F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 709 ◽  
Author(s):  
Liis Kolberg ◽  
Uku Raudvere ◽  
Ivan Kuzmin ◽  
Jaak Vilo ◽  
Hedi Peterson

g:Profiler (https://biit.cs.ut.ee/gprofiler) is a widely used gene list functional profiling and namespace conversion toolset that has been contributing to reproducible biological data analysis already since 2007. Here we introduce the accompanying R package, gprofiler2, developed to facilitate programmatic access to g:Profiler computations and databases via REST API. The gprofiler2 package provides an easy-to-use functionality that enables researchers to incorporate functional enrichment analysis into automated analysis pipelines written in R. The package also implements interactive visualisation methods to help to interpret the enrichment results and to illustrate them for publications. In addition, gprofiler2 gives access to the versatile gene/protein identifier conversion functionality in g:Profiler enabling to map between hundreds of different identifier types or orthologous species. The gprofiler2 package is freely available at the CRAN repository.


2021 ◽  
Vol 8 ◽  
Author(s):  
Hongsheng Lin ◽  
Yangyi Xie ◽  
Yinzhi Kong ◽  
Li Yang ◽  
Mingfen Li

Long non-coding RNAs (lncRNAs), important regulators of gene expression, also have critical functions in immune regulation. This study identified lncRNA modulators of immune-related pathways as biomarkers for hepatocellular carcinoma (HCC). The profile of lncRNA regulation in immune pathways in HCC was comprehensively mapped. To determine lncRNAs with immunomodulatory functions specific to HCC, the enrichment of lncRNAs in a collection of 17 immune functions was calculated using gene set enrichment analysis (GSEA). Unsupervised clustering of samples was performed with the R package ConsensusClusterPlus to analyze subtype survival and immunological characteristics. The enrichment of 3,134 lncRNA–immune pathway pairs in both diseased and normal samples showed a total of 1,984 immunoregulatory functional lncRNAs specific to HCC only. In addition, 18 immune-related lncRNAs were dysregulated in HCC and were significantly associated with immune cell infiltration. Functional enrichment analysis indicated that the 18 dysregulated immune lncRNAs were enriched in cytokines, cytokine receptors, TGFb family members, TNF family members, and TNF family member receptor pathways. Two molecular subtypes of hepatocellular carcinoma were identified based on the 18 dysregulated immune lncRNAs. Immunological profiling showed that subtype 1 samples, with higher levels of cytokine response, had better survival, whereas subtype 2 samples, with higher levels of tumor proliferation, had poorer survival. This study identified 18 HCC-specific dysregulated immune lncRNAs and two HCC molecular subtypes with significant prognostic differences and immune characteristics. The current findings help to understand the function of lncRNAs and promote the identification of immunotherapy targets.
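For readers unfamiliar with the GSEA step mentioned above, the core statistic is a running-sum enrichment score over a ranked gene list. The sketch below shows that statistic in plain Python. It is an illustration under stated assumptions, not the authors' pipeline: the ranking scores, the gene names, and the gene sets are hypothetical, and the permutation-based significance testing that GSEA normally applies is omitted.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Weighted running-sum enrichment score.

    ranked   : list of (gene, score) pairs, sorted by descending score
    gene_set : set of gene names to test
    p        : weight exponent applied to |score| for genes in the set
    """
    weights = [abs(score) ** p if gene in gene_set else 0.0
               for gene, score in ranked]
    n_hit = sum(1 for gene, _ in ranked if gene in gene_set)
    n_miss = len(ranked) - n_hit
    total = sum(weights)
    if n_hit == 0 or n_miss == 0 or total == 0:
        return 0.0  # score is undefined for degenerate inputs; report 0

    running, best = 0.0, 0.0
    for (gene, _), weight in zip(ranked, weights):
        # Step up (weighted) at a set member, step down (uniform) otherwise.
        running += weight / total if gene in gene_set else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best


# Hypothetical ranked list, e.g. genes ordered by correlation with one lncRNA.
ranked = [("a", 3.0), ("b", 2.0), ("c", 1.0),
          ("d", -1.0), ("e", -2.0), ("f", -3.0)]

es_top = enrichment_score(ranked, {"a", "b"})     # set members at the top
es_bottom = enrichment_score(ranked, {"e", "f"})  # set members at the bottom
print(es_top, es_bottom)
```

A positive score indicates that the set is concentrated at the top of the ranking and a negative score that it is concentrated at the bottom.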


Biomolecules ◽  
2019 ◽  
Vol 9 (9) ◽  
pp. 429 ◽  
Author(s):  
Zou ◽  
Zheng ◽  
Deng ◽  
Yang ◽  
Xie ◽  
...  

Circular RNA CDR1as/ciRS-7 functions as an oncogenic regulator in various cancers. However, there has been a lack of systematic and comprehensive analysis to further elucidate its underlying role in cancer. In the current study, we first performed a bioinformatics analysis of CDR1as among 868 cancer samples by using RNA-seq datasets of the MiOncoCirc database. Gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), gene set enrichment analysis (GSEA), CIBERSORT, Estimating the Proportion of Immune and Cancer cells (EPIC), and the Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data (ESTIMATE) algorithm were applied to investigate the underlying functions and pathways. Functional enrichment analysis suggested that CDR1as has roles associated with angiogenesis, extracellular matrix (ECM) organization, integrin binding, and collagen binding. Moreover, pathway analysis indicated that it may regulate the TGF-β signaling pathway and ECM-receptor interaction. Therefore, we used CIBERSORT, EPIC, and the ESTIMATE algorithm to investigate the association between CDR1as expression and the tumor microenvironment. Our data strongly suggest that CDR1as may play a specific role in immune and stromal cell infiltration in tumor tissue, especially that of CD8+ T cells, activated NK cells, M2 macrophages, cancer-associated fibroblasts (CAFs) and endothelial cells. Overall, systematic and comprehensive analyses of CDR1as were conducted to shed light on its underlying pro-cancerous mechanism. CDR1as regulates the TGF-β signaling pathway and ECM-receptor interaction to serve as a mediator in the alteration of the tumor microenvironment.


2020 ◽  
Vol 2020 ◽  
pp. 1-17
Author(s):  
Xiang Qian ◽  
Zhuo Chen ◽  
Sha Sha Chen ◽  
Lu Ming Liu ◽  
Ai Qin Zhang

The study aimed to clarify the potential immune-related targets and mechanisms of Qingyihuaji Formula (QYHJ) against pancreatic cancer (PC) through network pharmacology and weighted gene co-expression network analysis (WGCNA). Active ingredients of herbs in QYHJ were identified with the TCMSP database. Then, the putative targets of the active ingredients were predicted with the SwissTargetPrediction and STITCH databases. The expression profiles of GSE32676 were downloaded from the GEO database. WGCNA was used to identify the co-expression modules. The putative targets, immune-related targets, and critical module genes were then mapped to the specific disease to select the overlapped genes (OGEs). Functional enrichment analysis of putative targets and OGEs was conducted. The overall survival (OS) of OGEs was analyzed using the Kaplan-Meier plotter. The relative expression and methylation levels of OGEs were examined using the UALCAN, Human Protein Atlas (HPA), Oncomine, DiseaseMeth version 2.0, and MEXPRESS databases. Gene set enrichment analysis (GSEA) was conducted to further elucidate the key pathways of highly expressed OGEs. OS analyses found that 12 up-regulated OGEs (CDK1, PLD1, MET, F2RL1, XDH, NEK2, TOP2A, NQO1, CCND1, PTK6, CTSE, and ERBB2) could be utilized as potential diagnostic indicators for PC. Further, methylation analyses suggested that the abnormal up-regulation of these OGEs probably resulted from hypomethylation, and GSEA revealed that the genes were markedly related to the cell cycle and proliferation of PC. This study suggested that CDK1, PLD1, MET, F2RL1, XDH, NEK2, TOP2A, NQO1, CCND1, PTK6, CTSE, and ERBB2 might be used as reliable immune-related biomarkers for the prognosis of PC and may be essential immunotherapeutic targets of QYHJ.


PLoS ONE ◽  
2021 ◽  
Vol 16 (2) ◽  
pp. e0246668
Author(s):  
Lihua Cai ◽  
Honglong Wu ◽  
Ke Zhou

Identifying biomarkers that are associated with different types of cancer is an important goal in the field of bioinformatics. Different research groups have analyzed the expression profiles of many genes and found genetic patterns that can promote the improvement of targeted therapies, but the significance of some genes remains ambiguous. More reliable and effective biomarker identification methods are therefore needed to detect candidate cancer-related genes. In this paper, we proposed a novel method that combines the infinite latent feature selection (ILFS) method with the functional interaction (FI) network to rank the biomarkers. We applied the proposed method to the expression data of five cancer types. The experiments indicated that our network-constrained ILFS (NCILFS) provides an improved prediction of the diagnosis of the samples and locates many more known oncogenes than the original ILFS and some other existing methods. We also performed functional enrichment analysis by inspecting the over-represented gene ontology (GO) biological process (BP) terms and applying the gene set enrichment analysis (GSEA) method to the selected biomarkers for each feature selection method. The enrichment analysis reports show that our network-constrained ILFS can produce more biologically significant gene sets than other methods. The results suggest that network-constrained ILFS can identify cancer-related genes with higher discriminative power and biological significance.
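Over-representation of a GO term among selected genes is commonly assessed with a one-sided hypergeometric test. The abstract does not state which test the authors used, so the sketch below is a generic Python illustration rather than their method, and all counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k).

    N : number of genes in the background
    K : number of background genes annotated to the term
    n : number of selected genes (e.g. top-ranked biomarkers)
    k : number of selected genes annotated to the term
    """
    upper = min(K, n)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)


# Hypothetical counts: 20 background genes, 5 annotated to the term,
# 4 genes selected, 2 of them annotated.
p_value = hypergeom_enrichment_p(N=20, K=5, n=4, k=2)
print(f"p = {p_value:.4f}")
```

In practice the test is repeated for every term, so the resulting p-values need a multiple-testing correction before terms are called over-represented.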


Dose-Response ◽  
2020 ◽  
Vol 18 (3) ◽  
pp. 155932582094207
Author(s):  
Yi-qi Jin ◽  
Dong-liu Miao

Background: Epigenetic alterations have been shown to lead to human carcinogenesis. The aim of this study was to perform an integrative analysis to develop an epigenetic signature to predict overall survival (OS) of esophageal cancer. Methods: DNA methylation and messenger RNA expression data of esophageal cancer samples were downloaded from The Cancer Genome Atlas database and were integrated and analyzed using the R package MethylMix. Functional enrichment analysis of the methylation-related differentially expressed genes (DEGs) was performed. An epigenetic signature and a nomogram associated with the OS of esophageal cancer were established with the multivariate Cox model. Results: A total of 71 methylation-related DEGs were identified. Kyoto Encyclopedia of Genes and Genomes pathway analysis revealed that these genes were involved in biological processes related to the initiation and progression of esophageal cancer. A two-gene (FAM24B and FAM200A) risk signature for OS was developed by multivariate Cox analysis and had high accuracy. The signature was independent of clinicopathological variables and showed better predictive power than other clinicopathological variables. Moreover, we developed a novel prognostic nomogram based on the risk score and 3 clinicopathological factors. Conclusions: Our study indicated possible methylation-related DEGs and established an epigenetic signature, which may provide novel insights for understanding the pathogenesis of esophageal cancer.


2020 ◽  
Author(s):  
Chen Xu ◽  
Ling-bing Meng ◽  
Yu Xiao ◽  
Yong Qiu ◽  
Ying-jue Du ◽  
...  

Background: Osteoarthritis (OA) is a chronic, progressive, inflammatory, degenerative disease, which has become an osteoarthropathy that seriously affects the physical health and quality of life of elderly people. However, the etiology and pathogenesis of OA remain unclear. Therefore, this study aimed to use bioinformatics technology to identify differentially expressed genes in osteoarthritis and to perform functional enrichment analysis of them. Method: The main methods of this study consisted of access to microarray data (GSE82107 and GSE55235); identification of differentially expressed genes (DEGs) between OA and normal synovium samples with GEO2R; enrichment analysis of Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) terms by Gene Set Enrichment Analysis (GSEA); and construction and analysis of a protein-protein interaction (PPI) network, its significant module, and hub genes. Result: A total of 300 DEGs were identified, consisting of 64 up-regulated genes and 11 down-regulated genes in OA samples compared to normal synovium tissues. Gene set enrichment analysis of DEGs provided a comprehensive overview of some major pathophysiological mechanisms in OA: cellular response to hydrogen peroxide, P53 signaling pathway and so on. The study also built the PPI network, and a total of 10 key genes were identified: CYR61, PENK, GOLM1, DUSP1, ATF3, STC2, FOSB, PRSS23, TF, and TNC. Conclusion: DEGs exist between OA patients and normal cartilage tissue and may be involved in the mechanisms of OA development, especially cellular response to hydrogen peroxide and CYR61.


Author(s):  
Yanxin Liu ◽  
Zhang Feng ◽  
Huaxia Chen

Background: Abnormal expression of RUNX family transcription factor 3 (RUNX3), which can act as a tumor suppressor or an oncogene, has been reported in various cancers. Introduction: This study aimed to investigate the role of RUNX3 in melanoma. Methods: The expression level of RUNX3 in melanoma tissues was analyzed by immunohistochemistry and the Oncomine database. Based on microarray datasets GSE3189 and GSE7553, differentially expressed genes (DEGs) in melanoma samples were screened, followed by functional enrichment analysis. Gene Set Enrichment Analysis (GSEA) was performed for RUNX3. DEGs that were co-expressed with RUNX3 were analyzed, and the transcription factors (TFs) of RUNX3 and its co-expressed genes were predicted. The protein-protein interactions (PPIs) for RUNX3 were analyzed utilizing the GeneMANIA database. MicroRNAs (miRNAs) that could target RUNX3 were predicted. Results: RUNX3 expression was significantly up-regulated in melanoma tissues. GSEA showed that RUNX3 expression was positively correlated with melanogenesis and melanoma pathways. Eleven DEGs showed significant co-expression with RUNX3 in melanoma; for example, TLE4 was negatively co-expressed with RUNX3. RUNX3 was identified as a TF that regulated the expression of both itself and its co-expressed genes. PPI analysis showed that 20 protein-encoding genes interacted with RUNX3, among which 9 genes were differentially expressed in melanoma, such as CBFB and SMAD3. These genes were significantly enriched in transcriptional regulation by RUNX3, RUNX3 regulates BCL2L11 (BIM) transcription, regulation of I-kappaB kinase/NF-kappaB signaling, and signaling by NOTCH. A total of 31 miRNAs could target RUNX3, such as miR-326, miR-330-5p, and miR-373-3p. Conclusion: RUNX3 expression was up-regulated in melanoma and was implicated in the development of melanoma.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Huan Deng ◽  
Qingqing Hang ◽  
Dijian Shen ◽  
Yibi Zhang ◽  
Ming Chen

Purpose: Exploring the molecular mechanisms of lung adenocarcinoma (LUAD) is beneficial for developing new therapeutic strategies and predicting prognosis. This study was performed to select core genes related to LUAD and to analyze their prognostic value. Methods: Microarray datasets from the GEO (GSE75037) and TCGA-LUAD datasets were analyzed to identify differentially coexpressed genes in LUAD using weighted gene coexpression network analysis (WGCNA) and differential gene expression analysis. Functional enrichment analysis was conducted, and a protein–protein interaction (PPI) network was established. Subsequently, hub genes were identified using the CytoHubba plug-in. Overall survival (OS) analyses of hub genes were performed. The Clinical Proteomic Tumor Analysis Consortium (CPTAC) and the Human Protein Atlas (THPA) databases were used to validate our findings. Gene set enrichment analysis (GSEA) of survival-related hub genes was conducted. Immunohistochemistry (IHC) was carried out to validate our findings. Results: We identified 486 differentially coexpressed genes. Functional enrichment analysis suggested these genes were primarily enriched in the regulation of epithelial cell proliferation, collagen-containing extracellular matrix, transforming growth factor beta binding, and signaling pathways regulating the pluripotency of stem cells. Ten hub genes were detected using the maximal clique centrality (MCC) algorithm, and four genes were closely associated with OS. The CPTAC and THPA databases revealed that CHRDL1 and SPARCL1 were downregulated at the mRNA and protein expression levels in LUAD, whereas SPP1 was upregulated. GSEA demonstrated that DNA-dependent DNA replication and catalytic activity acting on RNA were correlated with CHRDL1 and SPARCL1 expression, respectively. The IHC results suggested that CHRDL1 and SPARCL1 were significantly downregulated in LUAD. Conclusions: Our study revealed that survival-related hub genes were closely correlated with the initiation and progression of LUAD. Furthermore, CHRDL1 and SPARCL1 are potential therapeutic and prognostic indicators of LUAD.

