GO annotation
Recently Published Documents





2022 ◽  
Vol 23 (1) ◽  
Elena Rojano ◽  
Fernando M. Jabato ◽  
James R. Perkins ◽  
José Córdoba-Caballero ◽  
Federico García-Criado ◽  

Background: Protein function prediction remains a key challenge. Domain composition affects protein function. Here we present DomFun, a Ruby gem that uses associations between protein domains and functions, calculated using multiple indices based on tripartite network analysis. These domain-function associations are combined at the protein level to generate protein-function predictions. Results: We analysed 16 tripartite networks connecting homologous superfamily and FunFam domains from CATH-Gene3D with functional annotations from the three Gene Ontology (GO) sub-ontologies, KEGG, and Reactome. We validated the results using the CAFA 3 benchmark platform for GO annotation, finding that, of the multiple association metrics and domain datasets tested, the Simpson index for FunFam domain-function associations combined with Stouffer’s method leads to the best performance in almost all scenarios. We also found that using FunFams led to better performance than superfamilies, and that results were better for GO molecular function than for GO biological process terms. DomFun performed as well as the highest-performing method in certain CAFA 3 evaluation procedures in terms of $F_{max}$ and $S_{min}$. We also implemented our own benchmark procedure, Pathway Prediction Performance (PPP), which can be used to validate function prediction for additional annotation sources, such as KEGG and Reactome. Using PPP, we found results similar to those found with CAFA 3 for GO; moreover, we found good performance for the other annotation sources. As with CAFA 3, the Simpson index with Stouffer’s method led to the top performance in almost all scenarios. Conclusions: DomFun shows competitive performance with other methods evaluated in CAFA 3 when predicting protein function with GO, although results vary depending on the evaluation procedure. Through our own benchmark procedure, PPP, we have shown it can also make accurate predictions for KEGG and Reactome. 
It performs best when using FunFams, combining Simpson index derived domain-function associations using Stouffer’s method. The tool has been implemented so that it can be easily adapted to incorporate other protein features, such as domain data from other sources, amino acid k-mers and motifs. The DomFun Ruby gem is available from https://rubygems.org/gems/DomFun. Code maintained at https://github.com/ElenaRojano/DomFun. Validation procedure scripts can be found at https://github.com/ElenaRojano/DomFun_project.
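
The two ingredients named in this abstract can be illustrated with a minimal Python sketch. It is not DomFun's code (DomFun is a Ruby gem). It assumes that "Simpson index" means the overlap coefficient between the set of proteins carrying a domain and the set annotated with a function, and that the per-domain evidence reaches Stouffer's method as one-sided p-values. The abstract states neither detail, and all names and toy data below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def simpson_index(proteins_with_domain, proteins_with_function):
    """Overlap coefficient: shared proteins / size of the smaller set."""
    a, b = set(proteins_with_domain), set(proteins_with_function)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def stouffer_combine(p_values):
    """Combine one-sided p-values with Stouffer's unweighted Z method.

    Each p-value must lie strictly between 0 and 1.
    """
    norm = NormalDist()
    z_scores = [norm.inv_cdf(1.0 - p) for p in p_values]
    z_combined = sum(z_scores) / sqrt(len(z_scores))
    return 1.0 - norm.cdf(z_combined)


# Hypothetical toy data: proteins carrying a domain / annotated with a GO term.
domain_proteins = {"P1", "P2", "P3", "P4"}
function_proteins = {"P2", "P3", "P9"}
print(simpson_index(domain_proteins, function_proteins))  # 2 shared / min(4, 3)

# Hypothetical per-domain p-values for one protein-function pair (assumption).
print(stouffer_combine([0.05, 0.05]))
```

Two moderately significant domains combine into a stronger protein-level signal, which is the point of pooling domain-level evidence.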

2022 ◽  
Vol 12 ◽  
Lu Xia ◽  
Lu Liu ◽  
Qiang Wang ◽  
Jing Ding ◽  
Xin Wang

Purpose: This study aimed to analyse the correlation between the pyroptosis pathway and epilepsy using bioinformatics analysis technology. We analyzed the expression of gasdermin D (GSDMD) and gasdermin E (GSDME), the key molecules of pyroptosis, in kainic acid-induced epileptic mice. Methods: Weighted gene co-expression network analysis (WGCNA) was used to construct a signed co-expression network from expression data to screen gene sets closely related to epilepsy. The correlation between the module and epilepsy was verified through module conservative analysis, gene ontology (GO) annotation analysis, and correlation analysis with known epilepsy genes. We obtained currently recognized pyroptosis-related molecules through literature review, and correlation analysis was used to evaluate their correlation with epilepsy. Differentially expressed gene (DEG) analysis was used to analyse expression changes of pyroptosis-related molecules at the transcriptome level, compared to the sham group. We subsequently established a kainic acid-induced status epilepticus (SE) model in mice and validated the mRNA and protein expression of GSDMD and GSDME, the key molecules of pyroptosis, by quantitative reverse transcription PCR (qRT-PCR) and western blotting (WB). Results: Using WGCNA, module conservative analysis, and correlation analysis with known epilepsy genes, we identified a module (a gene set of interest) closely related to epilepsy that was prominently enriched in immune and inflammatory-related biological processes. Correlation analysis results suggest that pyroptosis-related molecules are closely related to this module, but have no obvious correlation with the other modules. DEG analysis of molecules associated with pyroptosis suggests that most of the pyroptosis-related molecules, such as IL1b, Casp1, Casp4, Pycard, Gsdmd, Nlrp3, Aim2, Mefv, Tlr2, Tlr3, and Tlr4, had significantly increased expression after SE. 
qRT-PCR and WB analysis confirmed that the mRNA and protein levels of GSDMD in the mouse hippocampus were significantly upregulated after SE. The mRNA expression of GSDME did not differ between the epilepsy group and the sham group. However, the WB results showed that the expression of full-length GSDME was decreased and that of the GSDME N-terminus was significantly increased after SE. Conclusions: Our study highlights that the pyroptosis pathway may be closely related to epilepsy. GSDMD and GSDME, the key executive molecules of pyroptosis, will help to understand the pathogenesis of epilepsy and aid in discovering new targets for anti-epileptic drug treatments.
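
As a rough illustration of what a "signed" co-expression network means, the following Python sketch computes a signed adjacency from Pearson correlations using the commonly cited form a_ij = ((1 + cor_ij) / 2)^beta. It is not the WGCNA R package and not the authors' pipeline. The soft-threshold power beta = 12 and the toy expression values are assumptions made only for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    r = cov / sqrt(var_x * var_y)
    return max(-1.0, min(1.0, r))  # guard against rounding just outside [-1, 1]


def signed_adjacency(expression, beta=12):
    """Signed adjacency: strong positive correlation -> near 1, negative -> near 0."""
    genes = list(expression)
    return {
        (g, h): (0.5 + 0.5 * pearson(expression[g], expression[h])) ** beta
        for g in genes
        for h in genes
        if g != h
    }


# Hypothetical expression profiles across four samples.
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],   # moves with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # moves against geneA
}
adjacency = signed_adjacency(toy)
print(adjacency[("geneA", "geneB")], adjacency[("geneA", "geneC")])
```

In a signed network, anti-correlated genes receive an adjacency close to zero instead of being treated as strongly connected, so modules group genes that rise and fall together.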

Renjun Mao ◽  
Zhenqing Bai ◽  
Jiawen Wu ◽  
Ruilian Han ◽  
Xuemin Zhang ◽  

Senna obtusifolia is a famous medicinal plant that is widely used in Asian countries. Its seed plays an important role in the treatment of many diseases because it contains various anthraquinones and flavonoids. Our previous studies have indicated that three space environment-induced S. obtusifolia lines (SP-lines) i.e., QC10, QC29, and QC46, have higher seed yield and aurantio-obtusin (AO) content. However, the underlying mechanism of higher AO content in SP-lines is still unknown. Herein, transcriptome sequencing and HPLC were employed to analyze the differences between SP-lines and ground control (GC3) and elucidate the regulatory mechanisms of AO accumulation in SP-lines. The results show that 4002 differentially expressed genes (DEGs) were identified in SP-lines versus (vs.) GC3. DEGs in the QC10 vs. GC3, QC29 vs. GC3, and QC46 vs. GC3 comparisons were classified into 28, 36, and 81 GO terms and involved in 63, 74, and 107 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. KEGG pathway and gene expression analysis revealed that DEGs involved in anthraquinone pathways were significantly elevated in QC10 and QC46. Integrating the results of GO annotation, KEGG enrichment, and gene expression analysis, we propose that the elevated genes such as DAHPS, DHQS, and MenB enhance the metabolic flux in the anthraquinone pathway and promote AO content in QC10 and QC46. Taken together, this study elucidated the mechanism of AO content in SP-lines and provides valuable genetic information for S. obtusifolia. In addition, to the best of our knowledge, this study presents the first transcriptome analysis of environment-induced medicinal plants and paves the way to select elite S. obtusifolia varieties in the future.
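
The abstract does not say which statistical test or tool was used for the GO and KEGG analysis. A common choice for this kind of enrichment is a one-sided hypergeometric test, sketched below in Python under that assumption. The function name, parameter names and counts are hypothetical and are not taken from the study.

```python
from math import comb


def enrichment_p_value(n_background, n_annotated, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn from n_background genes,
    of which n_annotated carry the GO term or belong to the pathway."""
    upper = min(n_annotated, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 annotated with a term,
# 4 differentially expressed genes, 2 of which carry the term.
print(enrichment_p_value(20, 5, 4, 2))
```

In a real analysis the p-values for all tested terms would also be corrected for multiple testing, which this sketch leaves out.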

2021 ◽  
Jörgen Östling ◽  
Marleen Van Geest ◽  
Henric K Olsson ◽  
Sven-Erik Dahlen ◽  
Emilia Viklund ◽  

Background: There is a lack of early and precise biomarkers for personalized respiratory medicine. Breath contains an aerosol of droplet particles, which are formed from the epithelial lining fluid when the small airways close and re-open during inhalation succeeding a full expiration. These particles can be collected by impaction using the PExA® method (Particles in Exhaled Air), and are derived from an area of high clinical interest previously difficult to access, making them a potential source of biomarkers reflecting pathological processes in the small airways. Research question: Our aim was to investigate whether the PExA method is useful for the discovery of biomarkers that reflect pathology of the small airways. Methods and analysis: Ten healthy controls and 20 subjects with asthma, of whom 10 had small airway involvement as indicated by a high lung clearance index (LCI ≥2.9 z-score), were examined in a cross-sectional design, using the PExA instrument. The samples were analysed with the SOMAscan proteomics platform (SomaLogic Inc). Results: Two hundred and seven proteins were detected in up to 80% of the samples. Nine proteins showed differential abundance in subjects with asthma and high LCI as compared to healthy controls. Two of these were less abundant (ALDOA4, C4), and seven more abundant (FIGF, SERPINA1, CD93, CCL18, F10, IgM, IL1RAP). sRAGE levels were lower in ex-smokers (n=14) than in never smokers (n=16). Gene Ontology (GO) annotation database analyses revealed that the PEx proteome is enriched in extracellular proteins associated with extracellular exosome-vesicles and innate immunity. Conclusion: The applied analytical method was reproducible and allowed identification of pathologically interesting proteins in PEx samples from asthmatic subjects with high LCI. The results suggest that PEx-based proteomics is a novel and promising approach to study respiratory diseases with small airway involvement.

2021 ◽  
Vol 2021 ◽  
pp. 1-11
Shangdu Zhang ◽  
Zhenliang Luo ◽  
Xiang Wu ◽  
Bangzhi Shi ◽  
Huidan Jiang ◽  

Cadmium (Cd) pollution in paddy soil is an increasingly serious issue in rice production. It has been reported that grain Cd accumulation is higher in the rice cultivar Yuzhenxiang (YZX) and lower in Xiangwanxian 12 (XWX). To better manage the Cd pollution problem, the genes that might play vital roles in governing the difference in root Cd responses between these two rice cultivars were examined. In this study, RNA sequencing (RNA-seq) showed that there were 341 and 161 differentially expressed genes in the roots of YZX and XWX, respectively, after Cd exposure. Among these genes, 7 genes, namely Os06g0196300 (OsJ_019618), Os07g0570700 (OsJ_24808), ADI1, GDCSH, HSFB2C, PEX11-4, and CLPB1, were high-degree nodes connected with each other in the interaction analysis performed with the STRING (search tool for the retrieval of interacting genes/proteins) software, suggesting that they might play vital roles in Cd response. Based on GO enrichment analysis, 41 genes differentially expressed after Cd treatment in YZX or XWX were identified as related to Cd response. Comparative transcriptomic analysis indicated that 257 genes might be associated with the difference in root Cd response between YZX and XWX. Furthermore, we propose that ADI1, CFBP1, PEX11-4, OsJ_019618, OsJ_24808, GDCSH, CLPB1, LAC6, and WNK3 might be implicated in Cd response, based on the combined results of RT-qPCR, interaction, and GO annotation analysis. In conclusion, the numerous genes that might be related to Cd stress response and to the difference in root Cd response between YZX and XWX at the booting stage may be of benefit for the development of rice varieties with low Cd consumption.

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Wei Li ◽  
Jian Li ◽  
Yong Yang

Background. According to the latest WHO research, more than 15 million people suffer a stroke every year worldwide. Of these 15 million people, 6 million die and 5 million are left permanently disabled, which places a substantial economic burden on all parts of the world. Methods. Data were obtained from the GEO database, and the GEO2R tool was used to identify differentially expressed miRNAs (DEMs) between the blood of stroke patients and that of normal subjects. FunRich and miRNet were used to find potential upstream transcription factors and downstream target genes of the candidate DEMs. Next, GO annotation and KEGG pathway enrichment analyses of the target genes were performed using R software. Then, the STRING database and Cytoscape software were used to construct PPI and DEM-hub gene networks. Finally, GSE58294 was used to assess hub gene expression. Results. Six DEMs in total were identified from the GSE95204 and GSE117064 datasets. A total of 663 DEM target genes were predicted, and NRF1, EGR1, MYC, YY1, E2F1, SP4, and SP1 were predicted as upstream transcription factors for the DEMs’ target genes. Target genes of the DEMs were primarily enriched in the PI3K-Akt signaling pathway and the p53 signaling pathway. Construction of the DEM-hub gene network showed that hub genes among the top 10 were potentially modulated by hsa-miR-3591-5p, hsa-miR-548as-3p, hsa-miR-206, and hsa-miR-4503. Among the top 10 hub genes, the expression of CTNNB1, PTEN, ESR1, CCND1, KRAS, AKT1, CCND2, CDKN1B, and MYCN was consistent with that in the GSE58294 dataset. Conclusion. In summary, our research is the first to construct the miRNA-mRNA network in stroke, which may provide new insight into the pathogenesis and treatment of stroke.
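
The abstract does not state how hub genes were ranked within the PPI network. One simple and widely used criterion is node degree, sketched below in Python as an assumption and not as the authors' procedure. The edge list uses placeholder gene names and is hypothetical.

```python
def top_hubs(edges, n=10):
    """Rank nodes of an undirected network by degree (number of distinct partners)."""
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    degree = {node: len(partners) for node, partners in neighbours.items()}
    # Highest degree first; ties broken alphabetically for a stable order.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:n]


# Hypothetical interaction pairs; a duplicated pair is counted once.
toy_edges = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneA", "geneD"),
    ("geneB", "geneC"),
    ("geneB", "geneA"),
]
print(top_hubs(toy_edges, n=2))
```

Other centrality measures, such as betweenness, can rank the same network differently, so the top-10 list depends on the measure chosen.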

2021 ◽  
Vol In Press (In Press) ◽  
Keliang Li ◽  
Yan Jiao ◽  
Jingjing Liang ◽  
Pingping Pan ◽  
Yanji Zhu ◽  

Background: The current study was done to identify key genes associated with Kawasaki disease (KD). Methods: Three datasets were collected from the Gene Expression Omnibus (GEO) database. Then, differentially expressed gene (DEG) analysis, gene ontology (GO) annotation, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment were performed. Quantitative real-time polymerase chain reaction (qRT-PCR) was used to determine the expression levels of DEGs in KD. Receiver operating characteristic (ROC) analysis was performed to assess the diagnostic value of DEGs. Results: In total, 2923 DEGs (1239 up- and 1684 down-regulated genes) were detected in KD. Ribosome, Leishmaniasis, and Tuberculosis were significantly enriched KEGG pathways of the DEGs. Six DEGs, namely ADM, S100A12, ZNF438, MYD88, FCGR2A, and FCGR3B, were selected for qRT-PCR validation. Except for MYD88, the qRT-PCR results displayed expression patterns similar to those in our integrated analysis. ROC analysis revealed the diagnostic value of the six DEGs. Conclusions: Our study is expected to provide clues toward understanding the pathophysiology of KD inflammation.
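
ROC analysis of a single marker is usually summarised by the area under the curve (AUC). A minimal Python sketch follows, using the equivalent rank formulation: the probability that a randomly chosen case scores higher than a randomly chosen control. The expression values are hypothetical, and the sketch assumes that higher expression indicates disease.

```python
def roc_auc(scores_cases, scores_controls):
    """AUC as the probability that a random case scores above a random control."""
    wins = 0.0
    for case in scores_cases:
        for control in scores_controls:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties count as half
    return wins / (len(scores_cases) * len(scores_controls))


# Hypothetical expression levels of one gene in cases and controls.
cases = [3.1, 2.4, 2.9]
controls = [1.0, 2.5, 0.7]
print(roc_auc(cases, controls))  # 8 of 9 case-control pairs are ordered correctly
```

An AUC of 0.5 means the marker separates cases from controls no better than chance, and 1.0 means perfect separation.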

2021 ◽  
Vol 2021 ◽  
pp. 1-16
Kang Zhao ◽  
Jucun Huang ◽  
Hongmei Xia ◽  
Jianjun Zhang ◽  
Liming Liu

Background. Hepatic fibrosis is a severe liver disease that has threatened human health for a long time. In order to undergo timely and adequate therapy, it is important for patients to obtain an accurate diagnosis of fibrosis. Laboratory inspection methods have been efficient in distinguishing between advanced hepatic fibrosis stages (F3, F4), but the identification of early stages of fibrosis has not been achieved. The development of proteomics may provide us with a new direction to identify the stages of fibrosis. Methods. We established serum proteomic maps for patients with hepatic fibrosis at different stages and identified differential expression of proteins between fibrosis stages through ultra-high-performance liquid chromatography tandem mass spectrometry proteomic analysis. Results. From the proteomic profiles of the serum of patients with different stages of liver fibrosis, a total of 1,338 proteins were identified. Among three early fibrosis stages (control, F1, and F2), 55 differential proteins were identified, but no proteins simultaneously exhibited differential expression between control, F1, and F2. Differential proteins were detected in the comparison between different fibrosis stages. Significant differences were found between advanced fibrosis stages (F2-vs.-F3 and F3-vs.-F4) through a series of statistical analyses, including hierarchical clustering, Gene Ontology (GO) functional annotation, Kyoto Encyclopedia of Genes and Genomes pathway, and protein-protein interaction network analysis. The differential proteins identified by GO annotation were associated with biological processes (mainly platelet degranulation and cell adhesion), molecular functions, and cellular components. Conclusions. All potential biomarkers identified between the stages of fibrosis could be key points in determining the fibrosis staging. The differences between early stages may provide a useful reference in addressing the challenge of early fibrosis staging.

2021 ◽  
Yi-Heng Zhu ◽  
Chengxin Zhang ◽  
Yan Liu ◽  
Gilbert S Omenn ◽  
Peter L Freddolino ◽  

Gene Ontology (GO) has been widely used to annotate functions of genes and gene products. We proposed a new method (TripletGO) to deduce GO terms of protein-coding and non-coding genes, through the integration of four complementary pipelines built on transcript expression profiling, genetic sequence alignment, protein sequence alignment and naive probability, respectively. TripletGO was tested on a large set of 5,754 genes from 8 species (human, mouse, Arabidopsis, rat, fly, budding yeast, fission yeast, and nematode) and 2,433 proteins with available expression data from the CAFA3 experiment and achieved function annotation accuracy significantly beyond the current state-of-the-art approaches. Detailed analyses show that the major advantage of TripletGO lies in the coupling of a new triplet-network based profiling method with the feature space mapping technique, which can accurately recognize function patterns from transcript expressions. Meanwhile, the combination of multiple complementary models, especially those from transcript expression and protein-level alignments, improves the coverage and accuracy of the final GO annotation results. The standalone package and an online server of TripletGO are freely available at https://zhanglab.ccmb.med.umich.edu/TripletGO/.

2021 ◽  
Vol 22 (1) ◽  
Jiyu Chen ◽  
Nicholas Geard ◽  
Justin Zobel ◽  
Karin Verspoor

Background: Literature-based gene ontology (GO) annotation is a process where expert curators use uniform expressions to describe gene functions reported in research papers, creating computable representations of information about biological systems. Manual assurance of consistency between GO annotations and the associated evidence texts identified by expert curators is reliable but time-consuming, and is infeasible in the context of rapidly growing biological literature. A key challenge is maintaining consistency of existing GO annotations as new studies are published and the GO vocabulary is updated. Results: In this work, we introduce a formalisation of biological database annotation inconsistencies, identifying four distinct types of inconsistency. We propose a novel and efficient method using state-of-the-art text mining models to automatically distinguish between consistent GO annotation and the different types of inconsistent GO annotation. We evaluate this method using a synthetic dataset generated by directed manipulation of instances in an existing corpus, BC4GO. We provide detailed error analysis demonstrating that the method achieves high precision on more confident predictions. Conclusions: Two models built using our method for distinct annotation consistency identification tasks achieved high precision and were robust to updates in the GO vocabulary. Our approach demonstrates clear value for human-in-the-loop curation scenarios.
