Guidelines for reliable and reproducible functional enrichment analysis

10.1101/2021.09.06.459114 ◽

2021 ◽

Author(s):

Kaumadi Wijesooriya ◽

Sameer A Jadaan ◽

Kaushalya L Perera ◽

Tanuveer Kaur ◽

Mark Ziemann

Keyword(s):

Contemporary Literature ◽

Gene List ◽

Enrichment Analysis ◽

Functional Enrichment Analysis ◽

Functional Enrichment ◽

P Value ◽

Rna Seq ◽

Gene Sets ◽

Methodological Errors ◽

Peer Reviewers

Gene set enrichment tests (a.k.a. functional enrichment analysis) are among the most frequently used methods in computational biology. Despite this popularity, there are concerns that these methods are being applied incorrectly and the results of some peer-reviewed publications are unreliable. These problems include the use of inappropriate background gene lists, lack of false discovery rate correction and lack of methodological detail. An example analysis of public RNA-seq reveals that these methodological errors alter enrichment results dramatically. To ascertain the frequency of these errors in the literature, we performed a screen of 186 open access research articles describing functional enrichment results. We find that 95% of analyses using over-representation tests did not implement an appropriate background gene list or did not describe this in the methods. Failure to perform p-value correction for multiple tests was identified in 43% of analyses. Many studies lacked detail in the methods section about the tools and gene sets used. Only 15% of studies avoided major flaws, which highlights the poor state of functional enrichment rigour and reporting in the contemporary literature. We provide a set of minimum standards that should act as a checklist for researchers and peer-reviewers.
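The two quantitative safeguards named in this abstract, an explicit background gene list and false discovery rate correction, can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy gene identifiers and the choice of a hypergeometric tail with Benjamini-Hochberg adjustment are assumptions made for illustration.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K belong to the set."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def over_representation(hits, background, gene_sets):
    """Return {set name: (raw p, adjusted p)} using an explicit background.

    Both the hit list and every gene set are restricted to the background
    (for example, the genes actually detected in the assay) before counting.
    """
    background = set(background)
    hits = set(hits) & background
    names, pvals = [], []
    for name, members in gene_sets.items():
        members = set(members) & background
        overlap = len(hits & members)
        names.append(name)
        pvals.append(hypergeom_sf(overlap, len(background),
                                  len(members), len(hits)))
    return dict(zip(names, zip(pvals, benjamini_hochberg(pvals))))
```

Passing a whole-genome list as `background` instead of the detected genes changes `N` and `K` in every test, which is one way the choice of background can shift the results.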


gprofiler2 -- an R package for gene list functional enrichment analysis and namespace conversion toolset g:Profiler

F1000Research ◽

10.12688/f1000research.24956.2 ◽

2020 ◽

Vol 9 ◽

pp. 709 ◽

Cited By ~ 1

Author(s):

Liis Kolberg ◽

Uku Raudvere ◽

Ivan Kuzmin ◽

Jaak Vilo ◽

Hedi Peterson

Keyword(s):

Gene List ◽

Enrichment Analysis ◽

Functional Enrichment Analysis ◽

Automated Analysis ◽

R Package ◽

Biological Data ◽

Functional Enrichment ◽

Link Type ◽

Functional Profiling ◽

Rest Api

g:Profiler (https://biit.cs.ut.ee/gprofiler) is a widely used gene list functional profiling and namespace conversion toolset that has been contributing to reproducible biological data analysis since 2007. Here we introduce the accompanying R package, gprofiler2, developed to facilitate programmatic access to g:Profiler computations and databases via REST API. The gprofiler2 package provides easy-to-use functionality that enables researchers to incorporate functional enrichment analysis into automated analysis pipelines written in R. The package also implements interactive visualisation methods to help interpret the enrichment results and to illustrate them for publications. In addition, gprofiler2 gives access to the versatile gene/protein identifier conversion functionality in g:Profiler, making it possible to map between hundreds of different identifier types or orthologous species. The gprofiler2 package is freely available at the CRAN repository.


Improved cancer biomarkers identification using network-constrained infinite latent feature selection

PLoS ONE ◽

10.1371/journal.pone.0246668 ◽

2021 ◽

Vol 16 (2) ◽

pp. e0246668

Author(s):

Lihua Cai ◽

Honglong Wu ◽

Ke Zhou

Keyword(s):

Feature Selection ◽

Expression Profiles ◽

Biological Significance ◽

Feature Selection Method ◽

Enrichment Analysis ◽

Functional Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Functional Enrichment ◽

Gene Sets ◽

Significant Gene

Identifying biomarkers that are associated with different types of cancer is an important goal in the field of bioinformatics. Different research groups have analyzed the expression profiles of many genes and found certain genetic patterns that can promote the improvement of targeted therapies, but the significance of some genes is still ambiguous. More reliable and effective biomarker identification methods are therefore needed to detect candidate cancer-related genes. In this paper, we propose a novel method that combines the infinite latent feature selection (ILFS) method with the functional interaction (FI) network to rank the biomarkers. We applied the proposed method to the expression data of five cancer types. The experiments indicated that our network-constrained ILFS (NCILFS) provides an improved prediction of the diagnosis of the samples and locates many more known oncogenes than the original ILFS and some other existing methods. We also performed functional enrichment analysis by inspecting the over-represented gene ontology (GO) biological process (BP) terms and applying the gene set enrichment analysis (GSEA) method to the biomarkers selected by each feature selection method. The enrichment analysis reports show that our network-constrained ILFS can produce more biologically significant gene sets than other methods. The results suggest that network-constrained ILFS can identify cancer-related genes with a higher discriminative power and biological significance.


Graph analytics for phenome-genome associations inference

10.1101/682229 ◽

2019 ◽

Author(s):

Davide Cirillo ◽

Dario Garcia-Gasulla ◽

Ulises Cortés ◽

Alfonso Valencia

Keyword(s):

Enrichment Analysis ◽

Functional Enrichment Analysis ◽

Functional Enrichment ◽

Specific Gene ◽

Proteus Syndrome ◽

Supporting Evidence ◽

Human Phenotype ◽

Biological Ontologies ◽

Statistical Framework ◽

Gene Sets

Motivation: Biological ontologies, such as the Human Phenotype Ontology (HPO) and the Gene Ontology (GO), are extensively used in biomedical research to find enrichment in the annotations of specific gene sets. However, the interpretation of the encoded information would greatly benefit from methods that effectively interoperate between multiple ontologies providing molecular details of disease-related features. Results: In this work, we present a statistical framework based on graph theory to infer direct associations between HPO and GO terms that do not share co-annotated genes. The method makes it possible to map genotypic features to phenotypic features, thus providing a valid tool for bridging functional and pathological annotations. We validated the results by (a) supporting evidence of known drug-target associations (PanDrugs), protein-protein physical and functional interactions (BioGRID and STRING), and common pathways (Reactome); (b) comparing relationships inferred from early ontology releases with knowledge contained in the latest versions. Applications: We applied our method to improve the interpretation of molecular processes involved in pathological conditions, illustrating the applicability of our predictions with a number of biological examples. In particular, we applied our method to expand the list of relevant genes from standard functional enrichment analysis of high-throughput experimental results in the context of comorbidities between Alzheimer’s disease, Lung Cancer and Glioblastoma. Moreover, we analyzed pathways linked to predicted phenotype-genotype associations, gaining insights into the molecular actors of cellular senescence in Proteus syndrome. Availability: https://github.com/dariogarcia/phenotype-genotype_graph_characterization


Toward comprehensive functional analysis of gene lists weighted by gene essentiality scores

10.1101/2021.04.26.441450 ◽

2021 ◽

Author(s):

Rui Fan ◽

Qinghua Cui

Keyword(s):

Functional Analysis ◽

Gene List ◽

Lung Squamous Cell Carcinoma ◽

Enrichment Analysis ◽

Functional Enrichment Analysis ◽

Functional Enrichment ◽

Hypergeometric Test ◽

Exact Test ◽

Gene Set ◽

Gene Functional Analysis

Gene functional enrichment analysis represents one of the most popular bioinformatics methods for annotating the pathways and function categories of a given gene list. Current algorithms for enrichment computation, such as Fisher’s exact test and the hypergeometric test, depend entirely on the category counts of the gene list and one gene set. In this case, all genes are treated equally, whatever they are. However, genes differ in their essentiality scores within a gene list and within a gene set. It is thus hypothesized that essentiality scores could be important and should be considered in gene functional analysis. For this purpose, we propose WEAT (https://www.cuilab.cn/weat/), a weighted gene set enrichment algorithm and online tool that weights genes using essentiality scores. We confirmed the usefulness of WEAT using two case studies: the functional analysis of one aging-related gene list and of one gene list involved in Lung Squamous Cell Carcinoma (LUSC). Finally, we believe that the WEAT method and tool could provide more possibilities for further exploring the functions of given gene lists.
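The claim that count-based tests treat all genes equally can be made concrete with a small sketch of a one-sided Fisher's exact test. This illustrates only the conventional unweighted test, not the WEAT algorithm, whose weighting scheme is not described here. The function names and toy gene identifiers are hypothetical.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided (enrichment) Fisher's exact p-value for the 2x2 table
        in list:      a (in set)   b (not in set)
        not in list:  c (in set)   d (not in set)
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, row1)
    # Sum the probabilities of tables with an overlap of at least `a`.
    return sum(comb(col1, x) * comb(total - col1, row1 - x)
               for x in range(a, min(row1, col1) + 1)) / denom

def enrichment_p(gene_list, gene_set, universe):
    """Build the 2x2 table from gene identifiers and return the p-value."""
    universe = set(universe)
    gene_list = set(gene_list) & universe
    gene_set = set(gene_set) & universe
    a = len(gene_list & gene_set)
    b = len(gene_list - gene_set)
    c = len(gene_set - gene_list)
    d = len(universe) - a - b - c
    return fisher_one_sided(a, b, c, d)

# Two different gene lists with identical overlap counts (toy identifiers).
universe = ["g%d" % i for i in range(20)]
gene_set = ["g0", "g1", "g2", "g3", "g4"]
p_first = enrichment_p(["g0", "g1", "g2", "g10"], gene_set, universe)
p_second = enrichment_p(["g2", "g3", "g4", "g15"], gene_set, universe)
```

Here `p_first` and `p_second` are identical because only the four table counts enter the calculation; which genes make up the overlap plays no role.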


Identification of associated molecular events in Helicobacter pylori infection and gastric cancer through integrated computational analysis of Microarray and RNA-Seq datasets

10.21203/rs.3.rs-1189424/v1 ◽

2021 ◽

Author(s):

Sabaoon Zeb ◽

Rehan Zafar Paracha ◽

Maryum Nisar ◽

Rimsha Khalid ◽

Zartasha Mustansar ◽

...

Keyword(s):

Gastric Cancer ◽

Helicobacter Pylori ◽

Cancer Progression ◽

Enrichment Analysis ◽

Functional Enrichment Analysis ◽

Functional Enrichment ◽

World Health ◽

Separate Analysis ◽

Rna Seq ◽

Molecular Events

According to the World Health Organization, gastric cancer (GC) is the third leading cause of death worldwide, and the major precursor of cancer progression is infection with Helicobacter pylori. It has been reported that 50% of the total populace is infected with H. pylori, while in 80% the ulcer emerges in later stages of the infection. Although extensive separate analysis has been performed on H. pylori infection and GC data, there is a need to perform comparative analysis to identify the cross-talk between the conditions and to find significant molecular events that occur during H. pylori induced GC. The aim of this multi-population study was to identify common molecular events and potential biomarkers of H. pylori induced GC. We performed microarray and RNA-seq analysis on publicly available H. pylori infection, gastritis, H. pylori induced GC and GC datasets to obtain Differentially Expressed Genes (DEGs). After obtaining the DEGs, integrative analysis, functional enrichment analysis and network biology approaches were utilized to identify common markers and hub genes between various disease conditions. Functional enrichment analysis revealed that the DEGs of H. pylori infection, gastritis, H. pylori induced GC and GC were strongly associated with spliceosome, adherens junction, focal adhesion and ribosome. As one of the common DEGs and a highly interactive hub protein in the networks of all the conditions, translationally controlled tumour protein (TPT1) was identified as a significant predictive biomarker for early prognosis and diagnosis of H. pylori induced GC. Therefore, the mechanisms behind TPT1 should be further studied using in vitro cell-based functional assays to determine its role in the progression of H. pylori induced GC.


Integrated In-silico Analysis to Study the Role of microRNAs in the Detection of Chronic Kidney Diseases

Current Bioinformatics ◽

10.2174/1574893614666190923115032 ◽

2020 ◽

Vol 15 (2) ◽

pp. 144-154

Author(s):

Amina Khan ◽

Andleeb Zahra ◽

Sana Mumtaz ◽

M. Qaiser Fatmi ◽

Muhammad J. Khan

Keyword(s):

Target Genes ◽

Kidney Development ◽

Kidney Diseases ◽

In Silico Analysis ◽

Enrichment Analysis ◽

Functional Enrichment Analysis ◽

Functional Enrichment ◽

Renal Diseases ◽

P Value ◽

Chronic Kidney Diseases

Background: MicroRNAs (miRNAs) play an important role in the pathogenesis of various renal diseases, including Chronic Kidney Diseases (CKD). CKD refers to the gradual loss of kidney function with declining Glomerular Filtration Rate (GFR). Objective: This study focused on the regulatory mechanism by which miRNAs control gene expression in CKD. Methods: In this context, two lists of Differentially Expressed Genes (DEGs) were obtained: one from the three selected experiments by setting a cutoff p-value of <0.05 (List A), and one from a list of target genes of miRNAs (List B). Both lists were then compared to get a common dataset of 33 miRNAs, each with a set of DEGs, i.e. both up-regulated and down-regulated genes (List C). These data were subjected to functional enrichment analysis, network illustration, and gene homology studies. Results: This study confirmed the active participation of various miRNAs, i.e. hsa-miR-15a-5p, hsa-miR-195-5p, hsa-miR-365-3p, hsa-miR-30a-5p, hsa-miR-124-3p, hsa-miR-200b-3p, and hsa-miR-429, in the dysregulation of genes involved in kidney development and function. Integrated analyses showed that miRNAs modulated renal development, homeostasis, various metabolic processes, immune responses, and ion transport activities. Furthermore, homology studies of miRNA-mRNA hybrids highlighted the effect of the partial complementary binding pattern on the regulation of genes by miRNA. Conclusion: The study highlighted the great value of miRNAs as biomarkers in kidney diseases. In addition, further miRNA-based investigations are recommended for the development of diagnostic, prognostic, and therapeutic tools for renal diseases.


A Novel Three-miRNA Signature Identified Using Bioinformatics Predicts Survival in Esophageal Carcinoma

BioMed Research International ◽

10.1155/2020/5973082 ◽

2020 ◽

Vol 2020 ◽

pp. 1-11 ◽

Cited By ~ 1

Author(s):

KunZhe Wu ◽

ChunDong Zhang ◽

Cheng Zhang ◽

DongQiu Dai

Keyword(s):

Growth Factor ◽

Esophageal Carcinoma ◽

Target Genes ◽

Enrichment Analysis ◽

Functional Enrichment Analysis ◽

Functional Enrichment ◽

Rna Seq ◽

Mirna Signature ◽

Cox Regression Analysis ◽

Ppi Networks

Objective. We identified differentially expressed microRNAs (DEMs) between esophageal carcinoma (ESCA) tissues and normal esophageal tissues. We then constructed a novel three-miRNA signature to predict the prognosis of ESCA patients using bioinformatics analysis. Materials and Methods. We combined two microarray profiling datasets from the Gene Expression Omnibus (GEO) database and RNA-seq datasets from the Cancer Genome Atlas (TCGA) database to analyze DEMs in ESCA. The clinical data from 168 ESCA patients were selected from the TCGA database to assess the prognostic role of the DEMs. The TargetScan, miRDB, miRWalk, and DIANA websites were used to predict the miRNA target genes. Functional enrichment analysis was conducted using the Database for Annotation, Visualization, and Integrated Discovery (DAVID), and protein-protein interaction (PPI) networks were obtained using the Search Tool for the Retrieval of Interacting Genes database (STRING). Results. With cut-off criteria of P<0.05 and |log2FC| > 1.0, 33 overlapping DEMs, including 27 upregulated and 6 downregulated miRNAs, were identified from GEO microarray datasets and TCGA RNA-seq count datasets. The Kaplan–Meier survival analysis indicated that a three-miRNA signature (miR-1301-3p, miR-431-5p, and miR-769-5p) was significantly associated with the overall survival of ESCA patients. The results of univariate and multivariate Cox regression analysis showed that the three-miRNA signature was a potential prognostic factor in ESCA. Furthermore, the gene functional enrichment analysis revealed that the target genes of the three miRNAs participate in various cancer-related pathways, including viral carcinogenesis, forkhead box O (FoxO), vascular endothelial growth factor (VEGF), human epidermal growth factor receptor 2 (ErbB2), and mammalian target of rapamycin (mTOR) signaling pathways. In the PPI network, three target genes (MAPK1, RB1, and CLTC) with a high degree of connectivity were selected as hub genes.
Conclusions. Our results revealed that a three-miRNA signature (miR-1301-3p, miR-431-5p, and miR-769-5p) is a potential novel prognostic biomarker for ESCA.
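The cut-off step described in the Results (P<0.05 and |log2FC| > 1.0, followed by taking the overlap between datasets) can be sketched as follows. This is only an illustration under assumed inputs: the record layout `(name, log2fc, p_value)`, the function names and the placeholder miRNA names are hypothetical and do not come from the study.

```python
def filter_differential(records, p_cutoff=0.05, lfc_cutoff=1.0):
    """Split records into up- and down-regulated names.

    Each record is a (name, log2fc, p_value) tuple. A record passes when
    p_value < p_cutoff and |log2fc| > lfc_cutoff.
    """
    up, down = [], []
    for name, log2fc, p_value in records:
        if p_value < p_cutoff and abs(log2fc) > lfc_cutoff:
            (up if log2fc > 0 else down).append(name)
    return up, down

def overlap(*name_lists):
    """Names present in every list, sorted for a stable result."""
    return sorted(set.intersection(*(set(names) for names in name_lists)))

# Placeholder values, not data from the study.
dataset_one = [("miR-A", 2.1, 0.001), ("miR-B", -1.5, 0.01),
               ("miR-C", 0.4, 0.001), ("miR-D", 3.0, 0.2)]
dataset_two = [("miR-A", 1.2, 0.03), ("miR-B", -0.8, 0.01),
               ("miR-E", 1.9, 0.001)]

up_one, down_one = filter_differential(dataset_one)
up_two, down_two = filter_differential(dataset_two)
shared_up = overlap(up_one, up_two)
shared_down = overlap(down_one, down_two)
```

Intersecting the up- and down-regulated lists separately keeps only features that change in the same direction in both datasets. Whether the study applied that restriction is not stated.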


Neutrophil-mediated immune response as a possible mechanism of acute unilateral vestibulopathy

Journal of Vestibular Research ◽

10.3233/ves-200044 ◽

2020 ◽

Vol 30 (6) ◽

pp. 363-374

Author(s):

Eun Hye Oh ◽

Je-Keun Rhee ◽

Jin-Hong Shin ◽

Jae Wook Cho ◽

Dae-Seong Kim ◽

...

Keyword(s):

Mononuclear Cells ◽

Complete Blood Count ◽

Enrichment Analysis ◽

Functional Enrichment Analysis ◽

Functional Enrichment ◽

P Value ◽

Ppi Network ◽

Immune Pathway ◽

Unilateral Vestibulopathy ◽

Acute Unilateral Vestibulopathy

OBJECTIVE: This study aimed to investigate the underlying pathogenesis of acute unilateral vestibulopathy (AUV) using gene expression profiling combined with bioinformatics analysis. METHODS: Total RNA was extracted from the peripheral blood mononuclear cells of ten AUV patients in the acute phase and from ten controls. The differentially expressed genes (DEGs) between these two groups were screened using microarray analysis with the cut-off criteria (|fold changes| > 1.5 and p-value < 0.05). Functional enrichment analysis of DEGs was performed using Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway analysis, and the protein-protein interaction (PPI) network was constructed using the STRING (Search Tool for the Retrieval of Interacting Genes) database. RESULTS: There were 57 DEGs (50 up-regulated and 7 down-regulated) identified in the AUV group. Functional enrichment analysis showed that most of the up-regulated DEGs were significantly enriched in terms related to the neutrophil-mediated immune pathway. From the PPI network, the top ten hub genes were extracted by calculating four topological properties, and most of them were related to the innate immune system, inflammatory processes and vascular disorders. The complete blood count tests showed that the neutrophil-to-lymphocyte ratio was significantly higher in the 72 AUV patients than in the age-matched controls (2.93±2.25 vs 1.54±0.61, p < 0.001). CONCLUSIONS: This study showed that the neutrophil-mediated immune pathway may contribute to the development of AUV by mediating inflammatory and thrombotic changes in the vestibular organ.


Genome-wide phylogenetic analysis, expression pattern, and transcriptional regulatory network of the pig C/EBP gene family

10.21203/rs.3.rs-96827/v1 ◽

2020 ◽

Author(s):

Chaoxin Zhang ◽

Tao Wang ◽

Shengwei Liu ◽

Bing Zhang ◽

Xue Li ◽

...

Keyword(s):

Phylogenetic Analysis ◽

Regulatory Network ◽

Target Genes ◽

Transcriptional Regulatory Network ◽

Expression Patterns ◽

Enrichment Analysis ◽

Functional Enrichment Analysis ◽

Functional Enrichment ◽

Rna Seq ◽

Group I

Background: The vertebrate C/EBP transcription factors regulate many important biological processes, such as cell proliferation, differentiation, signal transduction, inflammation, and energy metabolism. The first C/EBP protein was identified in rat liver nuclei. Development of sequencing technology resulted in identification of the C/EBP genes in various species. In this study, a bioinformatics approach was used to determine the distribution of the members of the C/EBP family in vertebrates. A phylogenetic tree was constructed to analyze the C/EBP genes in vertebrates. Based on RNA-seq data, the expression patterns of pig C/EBP members in various tissues were analyzed. In addition, a gene transcription regulatory network was constructed with pig C/EBP members as the core. Results: We identified a total of 92 C/EBP genes in 17 vertebrate genomes. Phylogenetic analysis showed that all C/EBP TFs were classified into two groups; group I contained C/EBPβ TFs, and group II contained the remaining C/EBP TFs. The C/EBPα, C/EBPβ, C/EBPδ, C/EBPγ, and C/EBPζ genes were expressed ubiquitously with inconsistent expression patterns in various tissues. Moreover, a pig C/EBP regulatory network was constructed, including C/EBP genes, TFs, and miRNAs. A total of 39 FFL motifs were detected in the pig C/EBP regulatory network. Based on the RNA-seq data, gene expression patterns related to this FFL sub-network were analyzed in 27 adult Duroc tissues. Certain FFL motifs may be tissue specific. Functional enrichment analysis indicated that C/EBP and its target genes are involved in many important biological pathways. Conclusions: These results provide valuable information that clarifies the evolutionary relationships of the C/EBP family and contributes to the understanding of the biological function of C/EBP genes.


Integrated Analysis of lncRNA and mRNA Reveals Novel Insights into Wool Bending in Zhongwei Goat

Animals ◽

10.3390/ani11113326 ◽

2021 ◽

Vol 11 (11) ◽

pp. 3326

Author(s):

Xiaobo Li ◽

Zhanfa Liu ◽

Shaohui Ye ◽

Yue Liu ◽

Qian Chen ◽

...

Keyword(s):

Signaling Pathway ◽

Target Genes ◽

Economic Value ◽

Enrichment Analysis ◽

Functional Enrichment Analysis ◽

Hair Follicles ◽

Functional Enrichment ◽

Integrated Analysis ◽

Rna Seq ◽

Camp Signaling Pathway

The Chinese Zhongwei goat is a rare and precious fur breed, and its lamb fur is a well-known fur product. Wool bending of the lamb fur is the most striking feature of the Zhongwei goat. However, the curvature of the wool decreases gradually with growth, which significantly affects its quality and economic value. The mechanism regulating the phenotypic changes of hair bending is still unclear. In the present study, the skin tissues of Zhongwei goats at 45 days (curving wool) and 108 days (slight-curving wool) after birth were taken as the research objects, and the expression profiles of long non-coding RNAs (lncRNAs) and mRNAs were analyzed based on the Ribo Zero RNA sequencing (RNA-seq) method. In total, 46,013 mRNAs and 13,549 lncRNAs were identified, of which 352 mRNAs and 60 lncRNAs were differentially expressed. Functional enrichment analysis showed that the target genes of the lncRNAs were mainly enriched in the PI3K-Akt, arachidonic acid metabolism, cAMP, Wnt, and other signaling pathways. The qRT-PCR results for eight selected lncRNAs and target genes were consistent with the sequencing results, which indicated that our data were reliable. Through weighted gene co-expression network analysis, 13 co-expression modules were identified. The turquoise module contained a large number of differentially expressed lncRNAs, which were mainly enriched in the PI3K-Akt signaling pathway and cAMP signaling pathway. The predicted LOC102172600 and LOC102191729 might affect the development of hair follicles and the curvature of wool by regulating the target genes. Our study provides novel insights into the potential roles of lncRNAs in the regulation of wool bending. In addition, the study offers a theoretical basis for further study of goat wool growth and may serve as guidance and a reference for future breeding and improvement.
