Hierarchical multi-label classification for protein function prediction: A local approach based on neural networks

Author(s): Ricardo Cerri, Rodrigo C. Barros, Andre C. P. L. F. de Carvalho
2018
Author(s): Cen Wan, Domenico Cozzetto, Rui Fa, David T. Jones

Protein-protein interaction network data provide valuable information from which direct links between genes and their biological roles can be inferred. This information underpins a fundamental hypothesis for protein function prediction: interacting proteins tend to have similar functions. With the help of recently developed network-embedding feature generation methods and deep maxout neural networks, it is possible to extract functional representations that directly link protein-protein interaction information to protein function. Our novel method, STRING2GO, uses deep maxout neural networks to learn functional representations that simultaneously encode protein-protein interactions and information predictive of function. The experimental results show that STRING2GO outperforms other network embedding-based prediction methods and one benchmark method adopted in a recent large-scale protein function prediction competition.
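The abstract does not spell out STRING2GO's architecture, so the following is only a minimal sketch, assuming precomputed network-embedding vectors as input, of how a deep maxout network can map those vectors to one independent probability per GO term. In a maxout unit, each output is the maximum over k affine pieces. The layer sizes, k = 3, the random weights and inputs, and the helper names `maxout`, `init_params` and `predict_go_terms` are all hypothetical, and the network is untrained.

```python
import numpy as np

rng = np.random.default_rng(0)


def maxout(x, W, b):
    """Maxout layer: each output unit is the max over k affine pieces.

    x: (n, d_in), W: (d_in, d_out, k), b: (d_out, k) -> returns (n, d_out)
    """
    z = np.einsum("nd,dok->nok", x, W) + b  # all k affine pieces per unit
    return z.max(axis=-1)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(d_emb, d_hidden, n_terms, k=3):
    """Hypothetical random initialisation for a two-layer maxout network."""
    return {
        "W1": rng.normal(0, 0.1, (d_emb, d_hidden, k)),
        "b1": np.zeros((d_hidden, k)),
        "W2": rng.normal(0, 0.1, (d_hidden, d_hidden, k)),
        "b2": np.zeros((d_hidden, k)),
        "W_out": rng.normal(0, 0.1, (d_hidden, n_terms)),
        "b_out": np.zeros(n_terms),
    }


def predict_go_terms(emb, params):
    """Map protein embeddings to one independent probability per GO term."""
    h = maxout(emb, params["W1"], params["b1"])
    h = maxout(h, params["W2"], params["b2"])
    return sigmoid(h @ params["W_out"] + params["b_out"])  # (n, n_terms)


# Random stand-in for network-embedding features: 5 proteins, 64 dimensions.
emb = rng.normal(size=(5, 64))
params = init_params(d_emb=64, d_hidden=32, n_terms=10)
probs = predict_go_terms(emb, params)
print(probs.shape)  # (5, 10)
```

A sigmoid per GO term, rather than a softmax over terms, keeps the outputs independent, which suits the multi-label setting where a single protein can carry many terms.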


2019
Vol 9 (1)
Author(s): Ahmet Sureyya Rifaioglu, Tunca Doğan, Maria Jesus Martin, Rengul Cetin-Atalay, Volkan Atalay

2018
Author(s): Rui Fa, Domenico Cozzetto, Cen Wan, David T. Jones

Machine learning methods for protein function prediction are urgently needed, especially now that a substantial fraction of known sequences remains unannotated despite the extensive use of functional assignments based on sequence similarity. One major bottleneck for supervised learning in protein function prediction is the structured, multi-label nature of the problem: biological roles are represented by lists of terms from hierarchically organised controlled vocabularies such as the Gene Ontology (GO). In this work, we build on recent developments in deep learning and investigate the usefulness of multi-task deep neural networks (MTDNN), which consist of upstream shared layers on top of which are stacked, in parallel, as many independent modules (additional hidden layers with their own output units) as there are output GO terms (the tasks). MTDNN learns each task partly from shared representations and partly from task-specific characteristics. When no close homologues with experimentally validated functions can be identified, MTDNN gives more accurate predictions than baseline methods based on annotation frequencies in public databases or on homology transfer. More importantly, MTDNN achieves higher binary classification accuracy than alternative machine learning methods that do not exploit the commonalities and differences among prediction tasks. Interestingly, compared with a single-task predictor, the performance improvement does not grow linearly with the number of tasks in MTDNN; in our case, medium-sized models provide larger improvements. One advantage of MTDNN is that, given a set of features, it does not require the bootstrap feature selection procedure that traditional machine learning algorithms do. Overall, the results indicate that the proposed MTDNN algorithm improves the performance of protein function prediction. Nevertheless, there is still considerable room for deep learning techniques to further enhance prediction ability.
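A minimal NumPy sketch of the multi-task layout described in the MTDNN abstract: shared upstream layers feed one independent module (its own hidden layer and single output unit) per GO term. The layer sizes, ReLU activations, omission of biases, summed binary cross-entropy loss, random stand-in data, and the names `MultiTaskNet` and `multitask_bce` are assumptions for illustration, not the authors' implementation; the weights are random and untrained.

```python
import numpy as np

rng = np.random.default_rng(1)


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class MultiTaskNet:
    """Hypothetical MTDNN-style network (forward pass only, biases omitted):
    shared upstream layers, then one independent module per task (GO term)."""

    def __init__(self, d_in, d_shared, d_task, n_tasks):
        # Two shared hidden layers used by every task.
        self.W_shared = [rng.normal(0, 0.1, (d_in, d_shared)),
                         rng.normal(0, 0.1, (d_shared, d_shared))]
        # Per-task hidden layer (d_shared -> d_task) and per-task output unit.
        self.W_task = rng.normal(0, 0.1, (n_tasks, d_shared, d_task))
        self.w_out = rng.normal(0, 0.1, (n_tasks, d_task))

    def forward(self, x):
        h = x
        for W in self.W_shared:  # representation shared by all tasks
            h = relu(h @ W)
        # All task-specific hidden layers evaluated in parallel: (n, tasks, d_task).
        t = relu(np.einsum("ns,tsd->ntd", h, self.W_task))
        logits = np.einsum("ntd,td->nt", t, self.w_out)  # one logit per task
        return sigmoid(logits)  # independent probability per GO term


def multitask_bce(p, y, eps=1e-7):
    """Per-task binary cross-entropy summed over tasks, averaged over proteins."""
    p = np.clip(p, eps, 1 - eps)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).sum(axis=1).mean())


net = MultiTaskNet(d_in=100, d_shared=64, d_task=16, n_tasks=8)
x = rng.normal(size=(4, 100))        # 4 proteins, 100 stand-in features each
y = rng.integers(0, 2, size=(4, 8))  # random 0/1 labels for 8 GO terms
p = net.forward(x)
print(p.shape, multitask_bce(p, y))
```

Because each task's module has its own parameters, the loss for one GO term updates that term's module and the shared layers only; the shared layers are where commonalities between tasks can be learned.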


Molecules, 2017, Vol 22 (10), pp. 1732
Author(s): Renzhi Cao, Colton Freitas, Leong Chan, Miao Sun, Haiqing Jiang, ...
