Hands-on on Protein Function Prediction with Machine Learning and Interactive Analytics

DeepGOZero: Improving protein function prediction from sequence and zero-shot learning based on ontology axioms

DOI: 10.1101/2022.01.14.476325
Year: 2022
Author(s): Maxat Kulmanov, Robert Hoehndorf
Keyword(s): Machine Learning, Protein Function, Protein Function Prediction, Prediction Method, Function Prediction, Training Data, Large Set, Theoretic Approach, Machine Learning Model, Protein Functions

Motivation: Protein functions are often described using the Gene Ontology (GO), an ontology consisting of over 50,000 classes and a large set of formal axioms. Predicting the functions of proteins is one of the key challenges in computational biology, and a variety of machine learning methods have been developed for this purpose. However, these methods usually require a significant amount of training data and cannot make predictions for GO classes that have only a few or no experimental annotations.

Results: We developed DeepGOZero, a machine learning model that improves predictions for functions with no or only a small number of annotations. To achieve this goal, we rely on a model-theoretic approach for learning ontology embeddings and combine it with neural networks for protein function prediction. DeepGOZero can exploit formal axioms in the GO to make zero-shot predictions, i.e., predict protein functions even if not a single protein in the training phase was associated with that function. Furthermore, the zero-shot prediction method employed by DeepGOZero is generic and can be applied whenever associations with ontology classes need to be predicted.

Availability: http://github.com/bio-ontology-research-group/deepgozero


Human Protein Function Prediction Enhancement Using Decision Tree Based Machine Learning Approach

Source: Communications in Computer and Information Science - Information, Communication and Computing Technology
DOI: 10.1007/978-981-15-1384-8_23
Year: 2019
pp. 279-293
Author(s): Sunny Sharma, Gurvinder Singh, Rajinder Singh
Keyword(s): Machine Learning, Decision Tree, Protein Function, Protein Function Prediction, Function Prediction, Human Protein, Learning Approach, Machine Learning Approach


Assessing the Performances of Protein Function Prediction Algorithms from the Perspectives of Identification Accuracy and False Discovery Rate

DOI: 10.20944/preprints201711.0160.v1
Year: 2017
Author(s): Chunyan Yu, Xiaoxu Li, Hong Yang, Yinghong Li, Weiwei Xue, ...
Keyword(s): Machine Learning, False Discovery Rate, Protein Function, Protein Function Prediction, Function Prediction, Machine Learning Algorithms, Identification Accuracy, Homologous Proteins, Prediction Algorithms, False Discovery

Knowledge of protein function is essential for the study of biological processes, the understanding of disease mechanisms and the exploration of novel therapeutic targets. Apart from experimental methods, a number of in-silico approaches have been developed and extensively used for protein function prediction. Among these approaches, BLAST predicts functions based on protein sequence similarity, while machine learning predicts functional families from protein sequences irrespective of their similarity, which complements BLAST and other methods in predicting diverse classes of proteins, including distantly related proteins and homologous proteins of different functions. However, the identification accuracy and false discovery rate of these approaches have not yet been assessed, which greatly limits the usage of these prediction algorithms. Herein, a comprehensive comparison of the performance of four popular functional prediction algorithms (BLAST, SVM, PNN and KNN) was conducted. In particular, the performance of these algorithms was systematically assessed by four metrics (sensitivity, specificity, accuracy and Matthews correlation coefficient) based on independent test datasets generated from 93 protein families defined by UniProtKB Keywords. Moreover, the false discovery rates of these algorithms were evaluated by scanning the genomes of four representative model species (Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae and Mycobacterium tuberculosis). BLAST and SVM showed substantially higher sensitivity and stability than PNN and KNN. However, the machine learning algorithms (PNN, KNN and SVM) were found capable of significantly reducing the false discovery rate (SVM < PNN ≈ KNN). In summary, this study comprehensively assessed the performance of four popular algorithms applied to protein function prediction, which could facilitate the selection of the most appropriate method in related biomedical research.
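The four metrics named in the abstract above have standard definitions in terms of confusion-matrix counts. The following is a minimal Python sketch of those textbook formulas, not the authors' evaluation code; the function names and the toy labels are illustrative assumptions.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = family member)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Return sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Toy labels (hypothetical, not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
print(counts, metrics)
```

Unlike accuracy, the Matthews correlation coefficient uses all four counts in both numerator and denominator, so it remains informative when the positive and negative classes are very unequal in size.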


Machine learning techniques for protein function prediction

Source: Proteins Structure Function and Bioinformatics
DOI: 10.1002/prot.25832
Year: 2019
Vol 88 (3)
pp. 397-413
Cited By ~ 9
Author(s): Rosalin Bonetta, Gianluca Valentino
Keyword(s): Machine Learning, Protein Function, Protein Function Prediction, Function Prediction, Machine Learning Techniques, Learning Techniques


Machine Learning Kernel Methods for Protein Function Prediction

Source: 2019 International Conference on Smart Systems and Inventive Technology (ICSSIT)
DOI: 10.1109/icssit46314.2019.8987852
Year: 2019
Author(s): Anjna Jayant Deen, Manasi Gyanchandani
Keyword(s): Machine Learning, Kernel Methods, Protein Function, Protein Function Prediction, Function Prediction


Machine Learning Methods for Protein Function Prediction

Source: Algorithms for Intelligent Systems - Proceedings of International Conference on Computational Intelligence and Emerging Power System
DOI: 10.1007/978-981-16-4103-9_8
Year: 2021
pp. 85-98
Author(s): Saurabh Biswas, Yasha Hasija
Keyword(s): Machine Learning, Protein Function, Protein Function Prediction, Function Prediction, Learning Methods, Machine Learning Methods


Machine Learning Configurations for Enhanced Human Protein Function Prediction Accuracy

Source: Smart Computational Strategies: Theoretical and Practical Aspects
DOI: 10.1007/978-981-13-6295-8_4
Year: 2019
pp. 37-47
Author(s): Amritpal Singh, Sunny Sharma, Gurvinder Singh, Rajinder Singh
Keyword(s): Machine Learning, Protein Function, Prediction Accuracy, Protein Function Prediction, Function Prediction, Human Protein


Fast Target Set Reduction for Large-Scale Protein Function Prediction: A Multi-class Multi-label Machine Learning Approach

Source: Lecture Notes in Computer Science - Algorithms in Bioinformatics
DOI: 10.1007/978-3-540-87361-7_17
Year: 2008
pp. 198-209
Cited By ~ 2
Author(s): Thomas Lingner, Peter Meinicke
Keyword(s): Machine Learning, Protein Function, Large Scale, Protein Function Prediction, Function Prediction, Learning Approach, Machine Learning Approach, Target Set


A Review of Protein Function Prediction Under Machine Learning Perspective

Source: Recent Patents on Biotechnology
DOI: 10.2174/18722083113079990006
Year: 2013
Vol 7 (2)
pp. 122-141
Cited By ~ 23
Author(s): Juliana Bernardes, Carlos Pedreira
Keyword(s): Machine Learning, Protein Function, Protein Function Prediction, Function Prediction


Predicting Human Protein Function with Multi-task Deep Neural Networks

DOI: 10.1101/256420
Year: 2018
Cited By ~ 1
Author(s): Rui Fa, Domenico Cozzetto, Cen Wan, David T. Jones
Keyword(s): Machine Learning, Neural Networks, Deep Learning, Protein Function, Deep Neural Networks, Protein Function Prediction, Function Prediction, Machine Learning Algorithms, Medium Size, Prediction Ability

Abstract: Machine learning methods for protein function prediction are urgently needed, especially now that a substantial fraction of known sequences remains unannotated despite the extensive use of functional assignments based on sequence similarity. One major bottleneck that supervised learning faces in protein function prediction is the structured, multi-label nature of the problem, because biological roles are represented by lists of terms from hierarchically organised controlled vocabularies such as the Gene Ontology. In this work, we build on recent developments in the area of deep learning and investigate the usefulness of multi-task deep neural networks (MTDNN), which consist of upstream shared layers upon which are stacked, in parallel, as many independent modules (additional hidden layers with their own output units) as there are output GO terms (the tasks). MTDNN learns individual tasks partially using shared representations and partially from task-specific characteristics. When no close homologues with experimentally validated functions can be identified, MTDNN gives more accurate predictions than baseline methods based on annotation frequencies in public databases or homology transfers. More importantly, the results show that MTDNN binary classification accuracy is higher than that of alternative machine learning-based methods that do not exploit commonalities and differences among prediction tasks. Interestingly, compared with a single-task predictor, the performance improvement is not linearly correlated with the number of tasks in MTDNN; medium-sized models provide more improvement in our case. One advantage of MTDNN is that, given a set of features, it does not require a bootstrap feature selection procedure, as traditional machine learning algorithms do. Overall, the results indicate that the proposed MTDNN algorithm improves the performance of protein function prediction. On the other hand, there is still considerable room for deep learning techniques to further enhance prediction ability.
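The layout described in the abstract above (shared upstream layers feeding one independent module per GO term) can be illustrated with a small forward-pass sketch. This is an untrained toy in plain Python, not the authors' MTDNN: the class name `MultiTaskNet`, the layer sizes, the random initialisation and the placeholder task names are all assumptions made for illustration.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """Fully connected layer: one output unit per weight row."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def random_layer(rng, n_in, n_out):
    """Randomly initialised (weights, biases) pair; a stand-in for trained parameters."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiTaskNet:
    """Hypothetical multi-task layout: one shared layer, one head per task."""

    def __init__(self, n_features, n_shared, n_task_hidden, task_names, seed=0):
        rng = random.Random(seed)
        self.shared = random_layer(rng, n_features, n_shared)
        # Each task gets its own hidden layer and its own single output unit.
        self.heads = {
            name: (
                random_layer(rng, n_shared, n_task_hidden),
                random_layer(rng, n_task_hidden, 1),
            )
            for name in task_names
        }

    def forward(self, features):
        # The shared representation is computed once and reused by every head.
        shared_repr = dense(features, *self.shared, relu)
        scores = {}
        for name, (hidden, output) in self.heads.items():
            task_repr = dense(shared_repr, *hidden, relu)
            scores[name] = dense(task_repr, *output, sigmoid)[0]
        return scores


# Placeholder task names standing in for GO terms (hypothetical).
net = MultiTaskNet(4, 8, 3, ["term_A", "term_B", "term_C"])
scores = net.forward([0.1, 0.5, 0.2, 0.9])
print(scores)
```

Each head outputs an independent sigmoid score, so one protein can receive several terms at once, which matches the multi-label framing in the abstract. Training (for example, summing a binary cross-entropy loss over the heads) is omitted from this sketch.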
