Assigning protein function from domain-function associations using DomFun

2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Elena Rojano ◽  
Fernando M. Jabato ◽  
James R. Perkins ◽  
José Córdoba-Caballero ◽  
Federico García-Criado ◽  
...  

Abstract Background: Protein function prediction remains a key challenge. Domain composition affects protein function. Here we present DomFun, a Ruby gem that uses associations between protein domains and functions, calculated using multiple indices based on tripartite network analysis. These domain-function associations are combined at the protein level to generate protein-function predictions. Results: We analysed 16 tripartite networks connecting homologous superfamily and FunFam domains from CATH-Gene3D with functional annotations from the three Gene Ontology (GO) sub-ontologies, KEGG, and Reactome. We validated the results using the CAFA 3 benchmark platform for GO annotation, finding that, out of the multiple association metrics and domain datasets tested, the Simpson index for FunFam domain-function associations combined with Stouffer’s method leads to the best performance in almost all scenarios. We also found that using FunFams led to better performance than superfamilies, and that results were better for GO molecular function than for GO biological process terms. DomFun performed as well as the highest-performing method in certain CAFA 3 evaluation procedures in terms of $F_{max}$ and $S_{min}$. We also implemented our own benchmark procedure, Pathway Prediction Performance (PPP), which can be used to validate function prediction for additional annotation sources, such as KEGG and Reactome. Using PPP, we found results similar to those found with CAFA 3 for GO; moreover, we found good performance for the other annotation sources. As with CAFA 3, the Simpson index with Stouffer’s method led to the top performance in almost all scenarios. Conclusions: DomFun shows competitive performance with other methods evaluated in CAFA 3 when predicting protein function with GO, although results vary depending on the evaluation procedure. Through our own benchmark procedure, PPP, we have shown that it can also make accurate predictions for KEGG and Reactome. It performs best when using FunFams, combining Simpson index-derived domain-function associations using Stouffer’s method. The tool has been implemented so that it can be easily adapted to incorporate other protein features, such as domain data from other sources, amino acid k-mers and motifs. The DomFun Ruby gem is available from https://rubygems.org/gems/DomFun. The code is maintained at https://github.com/ElenaRojano/DomFun. Validation procedure scripts can be found at https://github.com/ElenaRojano/DomFun_project.
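For readers unfamiliar with the $F_{max}$ measure named in the abstract above, the following is a minimal Python sketch of the protein-centric form commonly used in CAFA-style evaluations: at each score threshold, precision is averaged over proteins with at least one prediction, recall is averaged over all benchmark proteins, and the maximum F-measure over thresholds is reported. The function name, the data layout and the term identifiers are hypothetical; this is not the CAFA 3 evaluation code, and it omits steps such as propagating terms to their ontology ancestors.

```python
def f_max(predictions, truth, thresholds=None):
    """Protein-centric F_max (illustrative sketch, not the official CAFA code).

    predictions: {protein: {term: score in [0, 1]}}
    truth:       {protein: set of true terms}; assumed non-empty for each protein
    """
    if thresholds is None:
        # Assumed threshold grid: 0.01, 0.02, ..., 1.00
        thresholds = [i / 100 for i in range(1, 101)]

    best = 0.0
    for t in thresholds:
        precisions = []   # one entry per protein with >= 1 prediction at t
        recall_sum = 0.0  # summed over all benchmark proteins
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)

        if not precisions:
            continue  # nothing predicted at this threshold
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(truth)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy data with made-up protein and term identifiers.
truth = {"P1": {"T1", "T2"}, "P2": {"T3"}}
predictions = {"P1": {"T1": 0.9, "T9": 0.4}, "P2": {"T3": 0.8}}
print(round(f_max(predictions, truth), 3))
```

In this toy case the best trade-off occurs at thresholds that drop the false positive `T9` while still keeping the prediction for `P2`.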

2020 ◽  
Author(s):  
Elena Rojano ◽  
Fernando Moreno Jabato ◽  
James Richard Perkins ◽  
José Córdoba Caballero ◽  
Ian Sillitoe ◽  
...  

Abstract Background: Protein function prediction remains a key challenge. Domain composition is key to understanding protein function, and domain-based prediction methods consistently perform well in challenges such as CAFA. Here we present DomFun, a Ruby gem that uses associations between protein domains and functions, calculated using multiple indices based on tripartite network analysis. These domain-function associations are combined at the protein level to generate protein-function predictions. Results: We analysed 14 tripartite networks connecting homologous superfamily and FunFam domains from CATH-Gene3D with functional annotations from the Gene Ontology (GO), KEGG, Reactome and the Human Phenotype Ontology (HPO). We validated the results using the CAFA 2 benchmark platform for GO and HPO annotation, finding that Simpson's index combined with Stouffer's method led to the best performance in almost all scenarios. We also found that FunFams led to better performance than superfamilies, and that results were better for GO molecular function than for GO biological process terms. Results were similar to those of other high-performing domain-based methods in CAFA 2. We also implemented our own benchmark procedure, Pathway Prediction Performance (PPP), which can be used to validate function prediction for additional annotation sources, such as KEGG and Reactome. Using PPP, we found results similar to those found with CAFA 2 for GO; moreover, we found good performance for the other annotation sources. As with CAFA 2, Simpson's index with Stouffer's method led to the top performance in most scenarios. Conclusions: DomFun shows comparable performance to other methods evaluated in CAFA 2 when predicting human protein function with GO. Through our own benchmark procedure, PPP, we have shown that it can also make accurate predictions for KEGG and Reactome. It performs best when using FunFams, with Simpson's index-derived domain-function associations combined using Stouffer's method. The tool has been implemented so that it can be easily adapted to incorporate other protein features, such as domain data from other sources. The DomFun Ruby gem is available from https://rubygems.org/gems/DomFun and its code is available at https://github.com/ElenaRojano/DomFun.
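The two ingredients highlighted in the abstract above, a Simpson index for domain-function association and Stouffer's method for combining evidence at the protein level, can be illustrated with a short Python sketch. It rests on two assumptions that the abstract does not confirm: that the Simpson index is the overlap coefficient between the set of proteins carrying a domain and the set annotated with a function, and that each domain contributes a one-sided p-value to the combination. All function names are hypothetical and do not reflect the DomFun gem's API.

```python
from math import sqrt
from statistics import NormalDist


def simpson_index(proteins_with_domain, proteins_with_function):
    """Overlap coefficient |A & B| / min(|A|, |B|) between two protein sets.

    Assumption: this is the 'Simpson index' of the tripartite network,
    where proteins link domains on one side to functions on the other.
    """
    a, b = set(proteins_with_domain), set(proteins_with_function)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def stouffer_combine(p_values):
    """Unweighted Stouffer's method: Z = sum(z_i) / sqrt(k).

    Each p-value must lie strictly between 0 and 1, otherwise
    NormalDist.inv_cdf raises StatisticsError.
    """
    normal = NormalDist()
    z_scores = [normal.inv_cdf(1.0 - p) for p in p_values]
    z = sum(z_scores) / sqrt(len(z_scores))
    return 1.0 - normal.cdf(z)  # combined one-sided p-value


# Toy example with made-up protein identifiers.
domain_proteins = {"P1", "P2", "P3"}
function_proteins = {"P2", "P3", "P4", "P5"}
print(simpson_index(domain_proteins, function_proteins))

# A protein with two domains, each weakly associated with the same function.
print(stouffer_combine([0.05, 0.05]))
```

The second call shows why combining helps for multi-domain proteins: two individually borderline associations yield a smaller combined p-value than either one alone.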


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12019
Author(s):  
Thi Thuy Duong Vu ◽  
Jaehee Jung

Protein function prediction is a crucial part of genome annotation. Prediction methods have recently developed rapidly, owing to the emergence of high-throughput sequencing technologies. Among the available databases for identifying protein function terms, Gene Ontology (GO) is an important resource that describes the functional properties of proteins. Researchers are employing various approaches to predict GO terms efficiently. Meanwhile, deep learning, a fast-evolving discipline among data-driven approaches, shows impressive potential for assigning GO terms to amino acid sequences. Herein, we reviewed the currently available computational GO annotation methods for proteins, ranging from conventional to deep learning approaches. Further, we selected some suitable predictors from among the reviewed tools and conducted a mini comparison of their performance using a worldwide challenge dataset. Finally, we discussed the remaining major challenges in the field and emphasized the future directions for protein function prediction with GO.


Molecules ◽  
2017 ◽  
Vol 22 (10) ◽  
pp. 1732 ◽  
Author(s):  
Renzhi Cao ◽  
Colton Freitas ◽  
Leong Chan ◽  
Miao Sun ◽  
Haiqing Jiang ◽  
...  

2008 ◽  
Vol 9 (1) ◽  
pp. 350 ◽  
Author(s):  
Xiaoyu Jiang ◽  
Naoki Nariai ◽  
Martin Steffen ◽  
Simon Kasif ◽  
Eric D Kolaczyk
