gprofiler2 -- an R package for gene list functional enrichment analysis and namespace conversion toolset g:Profiler

F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 709 ◽  
Author(s):  
Liis Kolberg ◽  
Uku Raudvere ◽  
Ivan Kuzmin ◽  
Jaak Vilo ◽  
Hedi Peterson

g:Profiler (https://biit.cs.ut.ee/gprofiler) is a widely used gene list functional profiling and namespace conversion toolset that has been contributing to reproducible biological data analysis since 2007. Here we introduce the accompanying R package, gprofiler2, developed to facilitate programmatic access to g:Profiler computations and databases via a REST API. The gprofiler2 package provides easy-to-use functionality that enables researchers to incorporate functional enrichment analysis into automated analysis pipelines written in R. The package also implements interactive visualisation methods that help to interpret the enrichment results and to illustrate them for publications. In addition, gprofiler2 gives access to the versatile gene/protein identifier conversion functionality in g:Profiler, which maps between hundreds of different identifier types or orthologous species. The gprofiler2 package is freely available at the CRAN repository.


2020 ◽  
Author(s):  
Zuguang Gu ◽  
Daniel Hübschmann

Abstract Motivation: Functional enrichment analysis or gene set enrichment analysis is a basic bioinformatics method that evaluates the biological importance of a list of genes of interest. However, it may produce a long list of significant terms with highly redundant information that is difficult to summarize. Current tools to simplify enrichment results by clustering them into groups either still produce redundancy between clusters or do not retain consistent term similarities within clusters. Results: We proposed a new method named binary cut for clustering similarity matrices of functional terms. Through comprehensive benchmarks on both simulated and real-world datasets, we demonstrated that binary cut can efficiently cluster functional terms into groups where terms showed more consistent similarities within groups and were more mutually exclusive between groups. We compared binary cut clustering on the similarity matrices from different similarity measurements and found that semantic similarity worked well with binary cut, while the similarity matrices based on gene overlap showed less consistent patterns and are not recommended for use with binary cut. We implemented the binary cut algorithm in an R package, simplifyEnrichment, which additionally provides functionalities for visualizing, summarizing and comparing the clusterings. Availability and implementation: The simplifyEnrichment package and the documentation are available at https://bioconductor.org/packages/simplifyEnrichment/. The reports for the analysis of all datasets benchmarked in the paper are available at https://simplifyenrichment.github.io/. The scripts that performed the analysis are available at https://github.com/jokergoo/simplifyEnrichment_manuscript.


2021 ◽  
Author(s):  
Rui Fan ◽  
Qinghua Cui

Abstract Gene functional enrichment analysis represents one of the most popular bioinformatics methods for annotating the pathways and function categories of a given gene list. Current algorithms for enrichment computation, such as Fisher's exact test and the hypergeometric test, depend entirely on the category counts of the gene list and one gene set. In this setting, all genes are treated equally, whatever they are. However, genes differ in their essentiality scores within a gene list and within a gene set. It is thus hypothesized that the essentiality scores could be important and should be considered in gene functional analysis. For this purpose, here we proposed WEAT (https://www.cuilab.cn/weat/), a weighted gene set enrichment algorithm and online tool that weights genes using essentiality scores. We confirmed the usefulness of WEAT using two case studies: the functional analysis of one aging-related gene list and of one gene list involved in Lung Squamous Cell Carcinoma (LUSC). Finally, we believe that the WEAT method and tool could provide more possibilities for further exploring the functions of given gene lists.
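The count-based test that the abstract contrasts with WEAT can be sketched as follows. This is a minimal illustration of the standard unweighted hypergeometric over-representation test, not the WEAT algorithm itself; the function name and parameter names are hypothetical and chosen only for this example.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_background, n_in_set, n_query, n_overlap):
    """Upper-tail hypergeometric p-value for over-representation.

    Probability of seeing at least `n_overlap` gene-set members when
    `n_query` genes are drawn without replacement from a background of
    `n_background` genes, of which `n_in_set` belong to the gene set.
    Only counts enter the calculation, so every gene carries equal weight.
    """
    max_overlap = min(n_in_set, n_query)
    total = comb(n_background, n_query)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_query - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Illustrative numbers only: 20,000 background genes, a 100-gene set,
# a 50-gene query list, 5 of which fall in the set.
p = hypergeom_enrichment_pvalue(20000, 100, 50, 5)
```

Because the inputs are four integers, swapping any gene in the query list for another gene with the same set membership leaves the p-value unchanged, which is the limitation the abstract points to.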


Dose-Response ◽  
2020 ◽  
Vol 18 (3) ◽  
pp. 155932582094207
Author(s):  
Yi-qi Jin ◽  
Dong-liu Miao

Background: Epigenetic alterations have been shown to lead to human carcinogenesis. The aim of this study was to perform an integrative analysis to develop an epigenetic signature to predict overall survival (OS) of esophageal cancer. Methods: DNA methylation and messenger RNA expression data of esophageal cancer samples were downloaded from The Cancer Genome Atlas database and were integrated and analyzed using the R package MethylMix. Functional enrichment analysis of the methylation-related differentially expressed genes (DEGs) was performed. An epigenetic signature and a nomogram associated with the OS of esophageal cancer were established using a multivariate Cox model. Results: A total of 71 methylation-related DEGs were identified. Kyoto Encyclopedia of Genes and Genomes pathway analysis revealed that these genes were involved in biological processes related to the initiation and progression of esophageal cancer. A two-gene (FAM24B and FAM200A) risk signature for OS, which had high accuracy, was developed by multivariate Cox analysis. The signature is independent of clinicopathological variables and showed better predictive power than other clinicopathological variables. Moreover, we developed a novel prognostic nomogram based on the risk score and 3 clinicopathological factors. Conclusions: Our study indicated possible methylation-related DEGs and established an epigenetic signature, which may provide novel insights for understanding the pathogenesis of esophageal cancer.


2021 ◽  
Author(s):  
Qinglong Wang ◽  
Zhe Zhao ◽  
Wantao Wang ◽  
Zhipeng Huang ◽  
Wenbo Wang

Abstract Background: Kashin-Beck disease (KBD) is currently an endemic form of osteoarthritis. In this study, we explored novel KBD diagnostic biomarkers. Methods: The GSE59446 dataset was used to conduct Weighted Gene Co-expression Network Analysis (WGCNA) and differentially expressed genes (DEGs) analysis with peripheral blood samples of 100 healthy individuals and 100 KBD patients. As part of the gene ontology pathway enrichment analysis, genes related to SONFH and DEGs were selected from the extraction module. Then, central DEGs were selected for LASSO analysis, and, based on SVM-RFE and DEG results, overlapping genes were identified as key KBD genes. Next, we analyzed the correlations between the selected genes and age, gender, and other factors to eliminate their influences on gene expression. Finally, we evaluated the diagnostic value of key KBD genes using case information collected by us. Results: Seven gene co-expression modules were created using WGCNA. The turquoise module was identified as a KBD key module since it showed the highest correlation to KBD. The functional enrichment analysis revealed that the genes associated with this key module were mainly involved in mitochondrial reactions, protein heterooligomerization, and negative regulation of cysteine-type endopeptidase-dependent apoptotic processes. Additionally, 12 key genes were identified using the LASSO analysis, 5 major genes using SVM-RFE analysis, and 36 DEGs were screened through the "limma" R package. The GLRX5 gene, pivotal in the DEG, LASSO, and SVM-RFE results, was selected as the key KBD gene. Correlation analyses confirmed the diagnostic value of GLRX5 for KBD and that it was not related to age, gender, and other factors. Finally, data from our patients demonstrated that GLRX5 can be a KBD diagnostic biomarker. Conclusions: We demonstrated that the target gene GLRX5 can be a non-invasive diagnostic biomarker for KBD.


2019 ◽  
Vol 47 (W1) ◽  
pp. W191-W198 ◽  
Author(s):  
Uku Raudvere ◽  
Liis Kolberg ◽  
Ivan Kuzmin ◽  
Tambet Arak ◽  
Priit Adler ◽  
...  

Abstract Biological data analysis often deals with lists of genes arising from various studies. The g:Profiler toolset is widely used for finding biological categories enriched in gene lists, conversions between gene identifiers and mappings to their orthologs. The mission of g:Profiler is to provide a reliable service based on up-to-date high quality data in a convenient manner across many evidence types, identifier spaces and organisms. g:Profiler relies on Ensembl as a primary data source and follows their quarterly release cycle while updating the other data sources simultaneously. The current update provides a better user experience due to a modern responsive web interface, standardised API and libraries. The results are delivered through an interactive and configurable web design. Results can be downloaded as publication ready visualisations or delimited text files. In the current update we have extended the support to 467 species and strains, including vertebrates, plants, fungi, insects and parasites. By supporting user uploaded custom GMT files, g:Profiler is now capable of analysing data from any organism. All past releases are maintained for reproducibility and transparency. The 2019 update introduces an extensive technical rewrite making the services faster and more flexible. g:Profiler is freely available at https://biit.cs.ut.ee/gprofiler.
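The custom GMT files mentioned above use a simple tab-delimited layout: one gene set per line, with the set name, a description, and then the member genes. A minimal parsing sketch, assuming that standard layout; the function name `parse_gmt` and the choice to skip lines with fewer than three fields are illustrative decisions, not part of g:Profiler.

```python
def parse_gmt(text):
    """Parse GMT-formatted text into {set_name: {"description", "genes"}}.

    Each non-empty line is expected to hold tab-separated fields:
    set name, description, then one or more gene identifiers.
    Lines with fewer than three fields are skipped (an assumption
    made for this sketch).
    """
    gene_sets = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        name, description, *genes = fields
        gene_sets[name] = {
            "description": description,
            "genes": [g for g in genes if g],  # drop empty trailing fields
        }
    return gene_sets


# Made-up example content, for illustration only.
example = "SET_A\tdemo set\tTP53\tBRCA1\nSET_B\tanother demo set\tEGFR\n"
sets = parse_gmt(example)
```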


2021 ◽  
Vol 22 (5) ◽  
pp. 2738
Author(s):  
Federica Morani ◽  
Luisa Bisceglia ◽  
Giulia Rosini ◽  
Luciano Mutti ◽  
Ombretta Melaiu ◽  
...  

Malignant pleural mesothelioma (MPM) is a fatal tumor lacking effective therapies. The characterization of overexpressed genes could constitute a strategy for identifying drivers of tumor progression as targets for novel therapies. Thus, we performed an integrated gene-expression analysis on RNAseq data of 85 MPM patients from TCGA dataset and reference samples from the GEO. The gene list was further refined by using published studies, a functional enrichment analysis, and the correlation between expression and patients’ overall survival. Three molecular signatures defined by 15 genes were detected. Seven genes were involved in cell adhesion and extracellular matrix organization, with the others in control of the mitotic cell division or apoptosis inhibition. Using Western blot analyses, we found that ADAMTS1, PODXL, CIT, KIF23, MAD2L1, TNNT1, and TRAF2 were overexpressed in a limited number of cell lines. On the other hand, interestingly, CTHRC1, E-selectin, SPARC, UHRF1, PRSS23, BAG2, and MDK were abundantly expressed in over 50% of the six MPM cell lines analyzed. Thus, these proteins are candidates as drivers for sustaining the tumorigenic process. More studies with small-molecule inhibitors or silencing RNAs are fully justified and need to be undertaken to better evaluate the cancer-driving role of the targets herewith identified.


2021 ◽  
Author(s):  
Kaumadi Wijesooriya ◽  
Sameer A Jadaan ◽  
Kaushalya L Perera ◽  
Tanuveer Kaur ◽  
Mark Ziemann

Gene set enrichment tests (a.k.a. functional enrichment analysis) are among the most frequently used methods in computational biology. Despite this popularity, there are concerns that these methods are being applied incorrectly and the results of some peer-reviewed publications are unreliable. These problems include the use of inappropriate background gene lists, lack of false discovery rate correction and lack of methodological detail. An example analysis of public RNA-seq reveals that these methodological errors alter enrichment results dramatically. To ascertain the frequency of these errors in the literature, we performed a screen of 186 open access research articles describing functional enrichment results. We find that 95% of analyses using over-representation tests did not implement an appropriate background gene list or did not describe this in the methods. Failure to perform p-value correction for multiple tests was identified in 43% of analyses. Many studies lacked detail in the methods section about the tools and gene sets used. Only 15% of studies avoided major flaws, which highlights the poor state of functional enrichment rigour and reporting in the contemporary literature. We provide a set of minimum standards that should act as a checklist for researchers and peer-reviewers.
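One of the errors listed above, missing p-value correction for multiple tests, can be illustrated with the Benjamini-Hochberg procedure. This is a minimal sketch of one common false discovery rate method; the abstract does not say which correction the authors recommend, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest p-value), and
    a running minimum taken from the largest rank downwards keeps the
    adjusted values monotone and capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values from four enrichment tests, for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

In an enrichment analysis, the list would hold one raw p-value per tested gene set, and terms would be reported as significant based on the adjusted values rather than the raw ones.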


2021 ◽  
Vol 8 ◽  
Author(s):  
Hongsheng Lin ◽  
Yangyi Xie ◽  
Yinzhi Kong ◽  
Li Yang ◽  
Mingfen Li

Long non-coding RNAs (lncRNAs), as important regulators of gene expression, also have critical functions in immune regulation. This study identified lncRNA modulators of immune-related pathways as biomarkers for hepatocellular carcinoma (HCC). The profile of lncRNA regulation in immune pathways in HCC was comprehensively mapped. To determine lncRNAs with immunomodulatory functions specific to HCC, the enrichment of lncRNAs in a collection of 17 immune functions was calculated by applying gene set enrichment analysis (GSEA). Unsupervised clustering of samples was performed with the R package ConsensusClusterPlus to analyze subtype survival and immunological characteristics. The enrichment of 3,134 lncRNA–immune pathway pairs in both diseased and normal samples showed a total of 1,984 immunoregulatory functional lncRNAs specific to HCC only. In addition, 18 immune-related lncRNAs were dysregulated in HCC and were significantly associated with immune cell infiltration. Functional enrichment analysis indicated that the 18 dysregulated immune lncRNAs were enriched in cytokines, cytokine receptors, TGFb family members, TNF family members, and TNF family member receptor pathways. Two molecular subtypes of hepatocellular carcinoma were identified based on the 18 dysregulated immune lncRNAs. Immunological profiling showed that subtype 1 samples, with higher levels of cytokine response, had better survival, whereas subtype 2 samples, with higher levels of tumor proliferation, had poorer survival. This study identified 18 HCC-specific dysregulated immune lncRNAs and two HCC molecular subtypes with significant prognostic differences and immune characteristics. The current findings help to understand the function of lncRNAs and promote the identification of immunotherapy targets.


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 1976 ◽  
Author(s):  
Michael J. Steinbaugh ◽  
Lorena Pantano ◽  
Rory D. Kirchner ◽  
Victor Barrera ◽  
Brad A. Chapman ◽  
...  

RNA-seq analysis involves multiple steps from processing raw sequencing data to identifying, organizing, annotating, and reporting differentially expressed genes. bcbio is an open source, community-maintained framework providing automated and scalable RNA-seq methods for identifying gene abundance counts. We have developed bcbioRNASeq, a Bioconductor package that provides ready-to-render templates and wrapper functions to post-process bcbio output data. bcbioRNASeq automates the generation of high-level RNA-seq reports, including identification of differentially expressed genes, functional enrichment analysis and quality control analysis.

