scholarly journals Automatic single- and multi-label enzymatic function prediction by machine learning

PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3095 ◽  
Author(s):  
Shervine Amidi ◽  
Afshine Amidi ◽  
Dimitrios Vlachakis ◽  
Nikos Paragios ◽  
Evangelia I. Zacharaki

The number of protein structures in the PDB database has been increasing more than 15-fold since 1999. The creation of computational models predicting enzymatic function is of major importance since such models provide the means to better understand the behavior of newly discovered enzymes when catalyzing chemical reactions. Until now, single-label classification has been widely performed for predicting enzymatic function limiting the application to enzymes performing unique reactions and introducing errors when multi-functional enzymes are examined. Indeed, some enzymes may be performing different reactions and can hence be directly associated with multiple enzymatic functions. In the present work, we propose a multi-label enzymatic function classification scheme that combines structural and amino acid sequence information. We investigate two fusion approaches (in the feature level and decision level) and assess the methodology for general enzymatic function prediction indicated by the first digit of the enzyme commission (EC) code (six main classes) on 40,034 enzymes from the PDB database. The proposed single-label and multi-label models predict correctly the actual functional activities in 97.8% and 95.5% (based on Hamming-loss) of the cases, respectively. Also the multi-label model predicts all possible enzymatic reactions in 85.4% of the multi-labeled enzymes when the number of reactions is unknown. Code and datasets are available athttps://figshare.com/s/a63e0bafa9b71fc7cbd7.

2019 ◽  
Vol 20 (11) ◽  
pp. 2845 ◽  
Author(s):  
Ruibo Gao ◽  
Mengmeng Wang ◽  
Jiaoyan Zhou ◽  
Yuhang Fu ◽  
Meng Liang ◽  
...  

During the past decade, due to the number of proteins in PDB database being increased gradually, traditional methods cannot better understand the function of newly discovered enzymes in chemical reactions. Computational models and protein feature representation for predicting enzymatic function are more important. Most of existing methods for predicting enzymatic function have used protein geometric structure or protein sequence alone. In this paper, the functions of enzymes are predicted from many-sided biological information including sequence information and structure information. Firstly, we extract the mutation information from amino acids sequence by the position scoring matrix and express structure information with amino acids distance and angle. Then, we use histogram to show the extracted sequence and structural features respectively. Meanwhile, we establish a network model of three parallel Deep Convolutional Neural Networks (DCNN) to learn three features of enzyme for function prediction simultaneously, and the outputs are fused through two different architectures. Finally, The proposed model was investigated on a large dataset of 43,843 enzymes from the PDB and achieved 92.34% correct classification when sequence information is considered, demonstrating an improvement compared with the previous result.


2020 ◽  
Author(s):  
Tatjana Skrbic ◽  
Amos Maritan ◽  
Achille Giacometti ◽  
George D. Rose ◽  
Jayanth R. Banavar

The native state structures of globular proteins are stable and well-packed indicating that self-interactions are favored over protein-solvent interactions under folding conditions. We use this as a guiding principle to derive the geometry of the building blocks of protein structures, alpha-helices and strands assembled into beta-sheets, with no adjustable parameters, no amino acid sequence information, and no chemistry. There is an almost perfect fit between the dictates of mathematics and physics and the rules of quantum chemistry. Our theory establishes an energy landscape that channels protein evolution by providing sequence-independent platforms for elaborating sequence-dependent functional diversity. Our work highlights the vital role of discreteness in life and has implications for the creation of artificial life and on the nature of life elsewhere in the cosmos.


Author(s):  
Arun G. Ingale

To predict the structure of protein from a primary amino acid sequence is computationally difficult. An investigation of the methods and algorithms used to predict protein structure and a thorough knowledge of the function and structure of proteins are critical for the advancement of biology and the life sciences as well as the development of better drugs, higher-yield crops, and even synthetic bio-fuels. To that end, this chapter sheds light on the methods used for protein structure prediction. This chapter covers the applications of modeled protein structures and unravels the relationship between pure sequence information and three-dimensional structure, which continues to be one of the greatest challenges in molecular biology. With this resource, it presents an all-encompassing examination of the problems, methods, tools, servers, databases, and applications of protein structure prediction, giving unique insight into the future applications of the modeled protein structures. In this chapter, current protein structure prediction methods are reviewed for a milieu on structure prediction, the prediction of structural fundamentals, tertiary structure prediction, and functional imminent. The basic ideas and advances of these directions are discussed in detail.


2020 ◽  
Vol 49 (D1) ◽  
pp. D452-D457
Author(s):  
Lisanna Paladin ◽  
Martina Bevilacqua ◽  
Sara Errigo ◽  
Damiano Piovesan ◽  
Ivan Mičetić ◽  
...  

Abstract The RepeatsDB database (URL: https://repeatsdb.org/) provides annotations and classification for protein tandem repeat structures from the Protein Data Bank (PDB). Protein tandem repeats are ubiquitous in all branches of the tree of life. The accumulation of solved repeat structures provides new possibilities for classification and detection, but also increasing the need for annotation. Here we present RepeatsDB 3.0, which addresses these challenges and presents an extended classification scheme. The major conceptual change compared to the previous version is the hierarchical classification combining top levels based solely on structural similarity (Class > Topology > Fold) with two new levels (Clan > Family) requiring sequence similarity and describing repeat motifs in collaboration with Pfam. Data growth has been addressed with improved mechanisms for browsing the classification hierarchy. A new UniProt-centric view unifies the increasingly frequent annotation of structures from identical or similar sequences. This update of RepeatsDB aligns with our commitment to develop a resource that extracts, organizes and distributes specialized information on tandem repeat protein structures.


2002 ◽  
Vol 68 (6) ◽  
pp. 2731-2736 ◽  
Author(s):  
Hirokazu Nankai ◽  
Wataru Hashimoto ◽  
Kousaku Murata

ABSTRACT When cells of Bacillus sp. strain GL1 were grown in a medium containing xanthan as a carbon source, α-mannosidase exhibiting activity toward p-nitrophenyl-α-d-mannopyranoside (pNP-α-d-Man) was produced intracellularly. The 350-kDa α-mannosidase purified from a cell extract of the bacterium was a trimer comprising three identical subunits, each with a molecular mass of 110 kDa. The enzyme hydrolyzed pNP-α-d-Man (Km = 0.49 mM) and d-mannosyl-(α-1,3)-d-glucose most efficiently at pH 7.5 to 9.0, indicating that the enzyme catalyzes the last step of the xanthan depolymerization pathway of Bacillus sp. strain GL1. The gene for α-mannosidase cloned most by using N-terminal amino acid sequence information contained an open reading frame (3,144 bp) capable of coding for a polypeptide with a molecular weight of 119,239. The deduced amino acid sequence showed homology with the amino acid sequences of α-mannosidases belonging to glycoside hydrolase family 38.


2020 ◽  
Vol 13 (1) ◽  
Author(s):  
Wajid Arshad Abbasi ◽  
Adiba Yaseen ◽  
Fahad Ul Hassan ◽  
Saiqa Andleeb ◽  
Fayyaz Ul Amir Afsar Minhas

Abstract Background Determining binding affinity in protein-protein interactions is important in the discovery and design of novel therapeutics and mutagenesis studies. Determination of binding affinity of proteins in the formation of protein complexes requires sophisticated, expensive and time-consuming experimentation which can be replaced with computational methods. Most computational prediction techniques require protein structures that limit their applicability to protein complexes with known structures. In this work, we explore sequence-based protein binding affinity prediction using machine learning. Method We have used protein sequence information instead of protein structures along with machine learning techniques to accurately predict the protein binding affinity. Results We present our findings that the true generalization performance of even the state-of-the-art sequence-only predictor is far from satisfactory and that the development of machine learning methods for binding affinity prediction with improved generalization performance is still an open problem. We have also proposed a sequence-based novel protein binding affinity predictor called ISLAND which gives better accuracy than existing methods over the same validation set as well as on external independent test dataset. A cloud-based webserver implementation of ISLAND and its python code are available at https://sites.google.com/view/wajidarshad/software. Conclusion This paper highlights the fact that the true generalization performance of even the state-of-the-art sequence-only predictor of binding affinity is far from satisfactory and that the development of effective and practical methods in this domain is still an open problem.


Sign in / Sign up

Export Citation Format

Share Document