3P282 FCANAL : a structure based protein function prediction method. Application to enzyme active sites and metal binding sites

Abstract: Metalloenzymes are 40% of all enzymes and can perform all seven classes of enzyme reactions. Because of the physicochemical similarities between the active sites of metalloenzymes and inactive metal binding sites, it is challenging to differentiate between them. Yet distinguishing these two classes is critical for the identification of both native and designed enzymes. Because of similarities between catalytic and non-catalytic metal binding sites, finding physicochemical features that distinguish these two types of metal sites can indicate aspects that are critical to enzyme function. In this work, we develop the largest structural dataset of enzymatic and non-enzymatic metalloprotein sites to date. We then use a decision-tree ensemble machine learning model to classify metals bound to proteins as enzymatic or non-enzymatic with 92.2% precision and 90.1% recall. Our model scores electrostatic and pocket lining features as more important than pocket volume, despite the fact that volume is the most quantitatively different feature between enzyme and non-enzymatic sites. Finally, we find our model has overall better performance in a side-to-side comparison against other methods that differentiate enzymatic from non-enzymatic sequences. We anticipate that our model’s ability to correctly identify which metal sites are responsible for enzymatic activity could enable identification of new enzymatic mechanisms and de novo enzyme design.


DeepGOZero: Improving protein function prediction from sequence and zero-shot learning based on ontology axioms

2022. DOI: 10.1101/2022.01.14.476325

Author(s): Maxat Kulmanov, Robert Hoehndorf

Keyword(s): Machine Learning, Protein Function, Protein Function Prediction, Prediction Method, Function Prediction, Training Data, Large Set, Theoretic Approach, Machine Learning Model, Protein Functions

Motivation: Protein functions are often described using the Gene Ontology (GO), an ontology consisting of over 50,000 classes and a large set of formal axioms. Predicting the functions of proteins is one of the key challenges in computational biology, and a variety of machine learning methods have been developed for this purpose. However, these methods usually require a significant amount of training data and cannot make predictions for GO classes that have only a few or no experimental annotations.

Results: We developed DeepGOZero, a machine learning model that improves predictions for functions with no or only a small number of annotations. To achieve this goal, we rely on a model-theoretic approach for learning ontology embeddings and combine it with neural networks for protein function prediction. DeepGOZero can exploit formal axioms in the GO to make zero-shot predictions, i.e., predict protein functions even if not a single protein in the training phase was associated with that function. Furthermore, the zero-shot prediction method employed by DeepGOZero is generic and can be applied whenever associations with ontology classes need to be predicted.

Availability: http://github.com/bio-ontology-research-group/deepgozero


Development of a structure based protein function prediction method: Calcium binding protein

Chem-Bio Informatics Journal, 2003, Vol 3, pp. 96-113. DOI: 10.1273/cbij.3.96. Cited By ~ 3

Author(s): Takeo Asaoka, Tadashi Ando, Toshiyuki Meguro, Ichiro Yamato

Keyword(s): Protein Function, Calcium Binding, Binding Protein, Protein Function Prediction, Prediction Method, Function Prediction, Calcium Binding Protein


Phylo-PFP: improved automated protein function prediction using phylogenetic distance of distantly related sequences

Bioinformatics, 2018, Vol 35 (5), pp. 753-759. DOI: 10.1093/bioinformatics/bty704. Cited By ~ 8

Author(s): Aashish Jain, Daisuke Kihara

Keyword(s): Protein Function, Transfer Functions, Sequence Similarity, Protein Function Prediction, Prediction Method, Query Protein, Function Prediction, Homology Search, Supplementary Information, Phylogenetic Distance

Abstract

Motivation: Function annotation of proteins is fundamental in contemporary biology across fields including genomics, molecular biology, biochemistry, systems biology and bioinformatics. Function prediction is indispensable in providing clues for interpreting omics-scale data as well as in assisting biologists to build hypotheses for designing experiments. As sequencing genomes is now routine due to the rapid advancement of sequencing technologies, computational protein function prediction methods have become increasingly important. A conventional method of annotating a protein sequence is to transfer functions from top hits of a homology search; however, this approach has substantial shortcomings, including low coverage in genome annotation.

Results: Here we have developed Phylo-PFP, a new sequence-based protein function prediction method, which mines functional information from a broad range of similar sequences, including those with low sequence similarity identified by a PSI-BLAST search. To evaluate functional similarity between identified sequences and the query protein more accurately, Phylo-PFP reranks retrieved sequences by considering their phylogenetic distance. Compared to its predecessor, PFP, which was among the top-ranked methods in the second round of the Critical Assessment of Functional Annotation (CAFA2), Phylo-PFP demonstrated substantial improvement in prediction accuracy. Phylo-PFP was further shown to outperform the prediction programs that were ranked top in CAFA2.

Availability and implementation: The Phylo-PFP web server is available at http://kiharalab.org/phylo_pfp.php.

Supplementary information: Supplementary data are available at Bioinformatics online.


2P303 FCANAL : A structure based protein function prediction method. Application to enzymes and binding proteins

Seibutsu Butsuri, 2005, Vol 45 (supplement), pp. S195. DOI: 10.2142/biophys.45.s195_3

Author(s): A. Suzuki, T. Ando, A. Matsumura, H. Sakao, I. Yamato, ...

Keyword(s): Protein Function, Binding Proteins, Protein Function Prediction, Prediction Method, Function Prediction


An efficient algorithm for matching protein binding sites for protein function prediction

Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine - BCB '11, 2011. DOI: 10.1145/2147805.2147837. Cited By ~ 1

Author(s): Leif Ellingson, Jinfeng Zhang

Keyword(s): Protein Binding, Binding Sites, Protein Function, Efficient Algorithm, Protein Function Prediction, Function Prediction, Protein Binding Sites


2P-242 FCANAL, structure-based protein function prediction method, applied to various types of proteins (Bioinformatics: Functional genomics, The 47th Annual Meeting of the Biophysical Society of Japan)

Seibutsu Butsuri, 2009, Vol 49 (supplement), pp. S145. DOI: 10.2142/biophys.49.s145_1

Author(s): Yuuichi Watanabe, Kousuke Kaido, Takashi Ando, Ichiro Yamato, Satoru Miyazaki

Keyword(s): Annual Meeting, Protein Function, Protein Function Prediction, Prediction Method, Function Prediction, Biophysical Society


Machine Learning Differentiates Enzymatic and Non-Enzymatic Metals in Proteins

2021. DOI: 10.1101/2021.02.01.429261

Author(s): Ryan Feehan, Meghan W. Franklin, Joanna S.G. Slusky

Keyword(s): Machine Learning, Metal Binding, Binding Sites, Active Sites, De Novo, Enzyme Design, Metal Binding Sites, Ensemble Machine Learning, Machine Learning Model, Physicochemical Features

Abstract: Metalloenzymes are 40% of all enzymes and can perform all seven classes of enzyme reactions. Because of the physicochemical similarities between the active sites of metalloenzymes and inactive metal binding sites, it is challenging to differentiate between them. Yet distinguishing these two classes is critical for the identification of both native and designed enzymes. Because of similarities between these two types of metal binding sites, finding physicochemical features that distinguish active and inactive metal sites can indicate aspects that are critical to enzyme function. In this work, we develop the largest structural dataset of enzymatic and non-enzymatic metalloprotein sites to date. We then use a decision-tree ensemble machine learning model to classify metals bound to proteins as enzymatic or non-enzymatic with 92.2% precision and 90.1% recall. Our model scores electrostatic and pocket lining features as more important than pocket volume, despite the fact that volume is the most quantitatively different feature between enzyme and non-enzymatic sites. Finally, we find our model has overall better performance in a side-to-side comparison against other methods that differentiate enzymatic from non-enzymatic sequences. We anticipate that our model’s ability to correctly identify which metal sites are responsible for enzymatic activity could enable identification of new enzymatic mechanisms and de novo enzyme design.
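The abstract above reports its classifier in terms of precision and recall. As a reminder of how those two figures are defined for a binary enzymatic versus non-enzymatic labelling, here is a minimal Python sketch. The labels are invented toy values, and the names `Counts`, `confusion_counts`, `precision` and `recall` are hypothetical helpers written for this illustration; none of it comes from the authors' code or data.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # enzymatic sites predicted enzymatic
    fp: int  # non-enzymatic sites predicted enzymatic
    fn: int  # enzymatic sites predicted non-enzymatic


def confusion_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives (1 = enzymatic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return Counts(tp, fp, fn)


def precision(c):
    """Fraction of sites predicted enzymatic that really are enzymatic."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def recall(c):
    """Fraction of truly enzymatic sites that the model recovers."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


# Toy labels only (assumption for illustration): 1 = enzymatic, 0 = non-enzymatic.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]

counts = confusion_counts(y_true, y_pred)
print(f"precision={precision(counts):.3f} recall={recall(counts):.3f}")
```

Read this way, high precision means few non-enzymatic metal sites are mislabelled as enzymatic, while high recall means few true enzymatic sites are missed.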
