Automatic single- and multi-label enzymatic function prediction by machine learning

The number of protein structures in the PDB database has increased more than 15-fold since 1999. Computational models that predict enzymatic function are of major importance, since they provide the means to better understand how newly discovered enzymes behave when catalyzing chemical reactions. Until now, enzymatic function has mostly been predicted by single-label classification, which limits the application to enzymes performing a single reaction and introduces errors when multi-functional enzymes are examined. Indeed, some enzymes perform different reactions and can hence be directly associated with multiple enzymatic functions. In the present work, we propose a multi-label enzymatic function classification scheme that combines structural and amino acid sequence information. We investigate two fusion approaches (at the feature level and at the decision level) and assess the methodology for general enzymatic function prediction, indicated by the first digit of the enzyme commission (EC) code (six main classes), on 40,034 enzymes from the PDB database. The proposed single-label and multi-label models correctly predict the actual functional activities in 97.8% and 95.5% (based on Hamming loss) of the cases, respectively. In addition, the multi-label model predicts all possible enzymatic reactions in 85.4% of the multi-labeled enzymes when the number of reactions is unknown. Code and datasets are available at https://figshare.com/s/a63e0bafa9b71fc7cbd7.
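
The two multi-label figures quoted in this abstract can be read as two different metrics. The sketch below assumes that the Hamming-loss-based score is one minus the fraction of wrongly predicted label slots, and that "predicts all possible enzymatic reactions" means an exact match of the full label vector. Both readings are assumptions, and the function names and toy labels are hypothetical, not taken from the authors' code.

```python
from typing import Sequence


def hamming_loss(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of individual label slots that are predicted wrongly."""
    wrong = 0
    total = 0
    for t_row, p_row in zip(y_true, y_pred):
        if len(t_row) != len(p_row):
            raise ValueError("label vectors must have the same length")
        for t, p in zip(t_row, p_row):
            total += 1
            wrong += int(t != p)
    return wrong / total


def exact_match_ratio(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of samples whose whole label vector is predicted correctly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


# Toy data (hypothetical): 4 enzymes, one binary slot per EC main class (6 classes).
y_true = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],  # multi-functional enzyme
    [0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 1],  # multi-functional enzyme
]
y_pred = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],  # one of the two functions is missed
    [0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 1],
]

loss = hamming_loss(y_true, y_pred)
print(f"Hamming loss: {loss:.4f}, Hamming-based score: {1 - loss:.4f}")
print(f"Exact match ratio: {exact_match_ratio(y_true, y_pred):.2f}")
```

A single missed label barely moves the Hamming-based score but counts as a full miss under exact match, which is one plausible reason the second figure in the abstract is lower than the first.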

Prediction of Enzyme Function Based on Three Parallel Deep CNN and Amino Acid Mutation

International Journal of Molecular Sciences ◽

10.3390/ijms20112845 ◽

2019 ◽

Vol 20 (11) ◽

pp. 2845 ◽

Cited By ~ 6

Author(s):

Ruibo Gao ◽

Mengmeng Wang ◽

Jiaoyan Zhou ◽

Yuhang Fu ◽

Meng Liang ◽

...

Keyword(s):

Amino Acids ◽

Computational Models ◽

Structural Features ◽

Feature Representation ◽

Biological Information ◽

Sequence Information ◽

Deep Convolutional Neural Networks ◽

Structure Information ◽

Enzymatic Function ◽

Proposed Model

Over the past decade, the number of proteins in the PDB database has grown steadily, and traditional methods are no longer sufficient for understanding the function of newly discovered enzymes in chemical reactions. Computational models and protein feature representations for predicting enzymatic function have therefore become more important. Most existing methods for predicting enzymatic function use either protein geometric structure or protein sequence alone. In this paper, enzyme function is predicted from multiple kinds of biological information, including sequence information and structure information. First, we extract mutation information from the amino acid sequence using the position scoring matrix and express structure information through amino acid distances and angles. Then, we represent the extracted sequence and structural features as histograms. We also build a network model of three parallel deep convolutional neural networks (DCNNs) that learn the three enzyme features simultaneously for function prediction, and their outputs are fused through two different architectures. Finally, the proposed model was evaluated on a large dataset of 43,843 enzymes from the PDB and achieved 92.34% correct classification when sequence information is considered, an improvement over the previous result.
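
As a rough illustration of turning structural distances into a histogram feature, the sketch below bins all pairwise distances between residue coordinates into a normalized histogram. The abstract does not say which atoms, bin count, or distance range the authors use, so the C-alpha-style coordinates, `n_bins`, `max_dist`, and the function name `distance_histogram` are all assumptions made for this example.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def distance_histogram(coords: Sequence[Point], n_bins: int = 8, max_dist: float = 40.0) -> List[float]:
    """Normalized histogram of pairwise Euclidean distances (in the units of coords)."""
    counts = [0] * n_bins
    width = max_dist / n_bins
    for a, b in combinations(coords, 2):
        d = math.dist(a, b)
        # Distances at or beyond max_dist are clamped into the last bin.
        idx = min(int(d // width), n_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]


# Toy chain (hypothetical): four residues spaced 3.8 units apart along a line.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
hist = distance_histogram(toy_coords)
print(hist)
```

A fixed-length vector of this kind does not depend on protein size, which makes it a convenient input for a convolutional network. The same binning idea would apply to angles, with a different range.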

Building blocks of protein structures – Physics meets Biology

10.1101/2020.11.10.375105 ◽

2020 ◽

Author(s):

Tatjana Skrbic ◽

Amos Maritan ◽

Achille Giacometti ◽

George D. Rose ◽

Jayanth R. Banavar

Keyword(s):

Protein Structures ◽

Building Blocks ◽

Vital Role ◽

Beta Sheets ◽

Sequence Information ◽

Alpha Helices ◽

Solvent Interactions ◽

Guiding Principle ◽

Amino Acid Sequence Information

The native state structures of globular proteins are stable and well-packed indicating that self-interactions are favored over protein-solvent interactions under folding conditions. We use this as a guiding principle to derive the geometry of the building blocks of protein structures, alpha-helices and strands assembled into beta-sheets, with no adjustable parameters, no amino acid sequence information, and no chemistry. There is an almost perfect fit between the dictates of mathematics and physics and the rules of quantum chemistry. Our theory establishes an energy landscape that channels protein evolution by providing sequence-independent platforms for elaborating sequence-dependent functional diversity. Our work highlights the vital role of discreteness in life and has implications for the creation of artificial life and on the nature of life elsewhere in the cosmos.

Faculty Opinions recommendation of A study of archaeal enzymes involved in polar lipid synthesis linking amino acid sequence information, genomic contexts and lipid composition.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1028632.342399 ◽

2005 ◽

Author(s):

Robert Michell

Keyword(s):

Amino Acid ◽

Amino Acid Sequence ◽

Polar Lipid ◽

Lipid Composition ◽

Lipid Synthesis ◽

Sequence Information ◽

Amino Acid Sequence Information

Amino acid sequence information in proteins and complex proteinaceous material revealed by pyrolysis-capillary gas chromatography-low and high resolution mass spectrometry

Journal of Analytical and Applied Pyrolysis ◽

10.1016/0165-2370(87)85038-6 ◽

1987 ◽

Vol 11 ◽

pp. 313-327 ◽

Cited By ~ 75

Author(s):

Jaap J. Boon ◽

J.W. De Leeuw

Keyword(s):

Mass Spectrometry ◽

Gas Chromatography ◽

Amino Acid ◽

High Resolution ◽

Amino Acid Sequence ◽

Capillary Gas Chromatography ◽

High Resolution Mass Spectrometry ◽

Sequence Information ◽

Amino Acid Sequence Information ◽

Resolution Mass

Peer Review #2 of "Automatic single- and multi-label enzymatic function prediction by machine learning (v0.1)"

10.7287/peerj.3095v0.1/reviews/2 ◽

2017 ◽

Keyword(s):

Machine Learning ◽

Peer Review ◽

Function Prediction ◽

Enzymatic Function

Prediction of Structural and Functional Aspects of Protein

Advances in Secure Computing, Internet Services, and Applications - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-4666-4940-8.ch016 ◽

2014 ◽

pp. 317-333

Author(s):

Arun G. Ingale

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Tertiary Structure ◽

Protein Structures ◽

Three Dimensional ◽

Dimensional Structure ◽

Sequence Information ◽

Predict Protein Structure ◽

Basic Ideas

Predicting the structure of a protein from its primary amino acid sequence is computationally difficult. An investigation of the methods and algorithms used to predict protein structure and a thorough knowledge of the function and structure of proteins are critical for the advancement of biology and the life sciences as well as the development of better drugs, higher-yield crops, and even synthetic bio-fuels. To that end, this chapter sheds light on the methods used for protein structure prediction. It covers the applications of modeled protein structures and unravels the relationship between pure sequence information and three-dimensional structure, which continues to be one of the greatest challenges in molecular biology. It presents an all-encompassing examination of the problems, methods, tools, servers, databases, and applications of protein structure prediction, giving unique insight into the future applications of modeled protein structures. In this chapter, current protein structure prediction methods are reviewed for a milieu on structure prediction, the prediction of structural fundamentals, tertiary structure prediction, and functional imminent. The basic ideas and advances of these directions are discussed in detail.

RepeatsDB in 2021: improved data and extended classification for protein tandem repeat structures

Nucleic Acids Research ◽

10.1093/nar/gkaa1097 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D452-D457

Author(s):

Lisanna Paladin ◽

Martina Bevilacqua ◽

Sara Errigo ◽

Damiano Piovesan ◽

Ivan Mičetić ◽

...

Keyword(s):

Protein Data Bank ◽

Tandem Repeat ◽

Tandem Repeats ◽

Classification Scheme ◽

Sequence Similarity ◽

Protein Structures ◽

Hierarchical Classification ◽

Structural Similarity ◽

Data Bank ◽

Similarity Class

The RepeatsDB database (URL: https://repeatsdb.org/) provides annotations and classification for protein tandem repeat structures from the Protein Data Bank (PDB). Protein tandem repeats are ubiquitous in all branches of the tree of life. The accumulation of solved repeat structures provides new possibilities for classification and detection, but also increases the need for annotation. Here we present RepeatsDB 3.0, which addresses these challenges and presents an extended classification scheme. The major conceptual change compared to the previous version is the hierarchical classification combining top levels based solely on structural similarity (Class > Topology > Fold) with two new levels (Clan > Family) requiring sequence similarity and describing repeat motifs in collaboration with Pfam. Data growth has been addressed with improved mechanisms for browsing the classification hierarchy. A new UniProt-centric view unifies the increasingly frequent annotation of structures from identical or similar sequences. This update of RepeatsDB aligns with our commitment to develop a resource that extracts, organizes and distributes specialized information on tandem repeat protein structures.

Epitope mapping by a method that requires no amino acid sequence information

Analytical Biochemistry ◽

10.1016/0003-2697(92)90596-y ◽

1992 ◽

Vol 205 (1) ◽

pp. 179-182 ◽

Cited By ~ 7

Author(s):

Jie Yuan ◽

Philip S. Low

Keyword(s):

Amino Acid ◽

Amino Acid Sequence ◽

Epitope Mapping ◽

Sequence Information ◽

Amino Acid Sequence Information

Molecular Identification of Family 38 α-Mannosidase of Bacillus sp. Strain GL1, Responsible for Complete Depolymerization of Xanthan

Applied and Environmental Microbiology ◽

10.1128/aem.68.6.2731-2736.2002 ◽

2002 ◽

Vol 68 (6) ◽

pp. 2731-2736 ◽

Cited By ~ 15

Author(s):

Hirokazu Nankai ◽

Wataru Hashimoto ◽

Kousaku Murata

Keyword(s):

Amino Acid ◽

Amino Acid Sequence ◽

Cell Extract ◽

Amino Acid Sequences ◽

Glycoside Hydrolase Family ◽

Sequence Information ◽

Reading Frame ◽

A Cell ◽

Terminal Amino ◽

Amino Acid Sequence Information

When cells of Bacillus sp. strain GL1 were grown in a medium containing xanthan as a carbon source, α-mannosidase exhibiting activity toward p-nitrophenyl-α-D-mannopyranoside (pNP-α-D-Man) was produced intracellularly. The 350-kDa α-mannosidase purified from a cell extract of the bacterium was a trimer comprising three identical subunits, each with a molecular mass of 110 kDa. The enzyme hydrolyzed pNP-α-D-Man (Km = 0.49 mM) and D-mannosyl-(α-1,3)-D-glucose most efficiently at pH 7.5 to 9.0, indicating that the enzyme catalyzes the last step of the xanthan depolymerization pathway of Bacillus sp. strain GL1. The gene for α-mannosidase, cloned by using N-terminal amino acid sequence information, contained an open reading frame (3,144 bp) capable of coding for a polypeptide with a molecular weight of 119,239. The deduced amino acid sequence showed homology with the amino acid sequences of α-mannosidases belonging to glycoside hydrolase family 38.
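
The open reading frame length and the deduced molecular weight in this abstract can be cross-checked with simple arithmetic. The sketch below assumes that the 3,144 bp figure includes the stop codon and uses the common rule of thumb of roughly 110 Da per amino acid residue. Neither assumption is stated in the abstract, and the variable names are hypothetical.

```python
# Figures quoted in the abstract.
orf_length_bp = 3144
reported_mw_da = 119_239

# Assumption: the ORF length includes one stop codon, which encodes no residue.
codons = orf_length_bp // 3
residues = codons - 1

# Assumption: about 110 Da per residue, a rough average and not an exact value.
avg_residue_mass_da = 110.0
estimated_mw_da = residues * avg_residue_mass_da

# Average residue mass implied by the reported molecular weight.
implied_residue_mass_da = reported_mw_da / residues

print(f"codons: {codons}, residues: {residues}")
print(f"estimated MW: {estimated_mw_da:.0f} Da")
print(f"implied mass per residue: {implied_residue_mass_da:.1f} Da")
```

Under these assumptions the rough estimate falls within a few percent of the reported 119,239, so the ORF length and the deduced polypeptide mass are mutually consistent.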

ISLAND: in-silico proteins binding affinity prediction using sequence information

BioData Mining ◽

10.1186/s13040-020-00231-w ◽

2020 ◽

Vol 13 (1) ◽

Author(s):

Wajid Arshad Abbasi ◽

Adiba Yaseen ◽

Fahad Ul Hassan ◽

Saiqa Andleeb ◽

Fayyaz Ul Amir Afsar Minhas

Keyword(s):

Machine Learning ◽

Protein Binding ◽

Binding Affinity ◽

State Of The Art ◽

Protein Complexes ◽

Protein Structures ◽

Sequence Information ◽

Binding Affinity Prediction ◽

Generalization Performance ◽

Affinity Prediction

Background: Determining binding affinity in protein-protein interactions is important in the discovery and design of novel therapeutics and mutagenesis studies. Determination of binding affinity of proteins in the formation of protein complexes requires sophisticated, expensive and time-consuming experimentation which can be replaced with computational methods. Most computational prediction techniques require protein structures that limit their applicability to protein complexes with known structures. In this work, we explore sequence-based protein binding affinity prediction using machine learning. Method: We have used protein sequence information instead of protein structures along with machine learning techniques to accurately predict the protein binding affinity. Results: We present our findings that the true generalization performance of even the state-of-the-art sequence-only predictor is far from satisfactory and that the development of machine learning methods for binding affinity prediction with improved generalization performance is still an open problem. We have also proposed a sequence-based novel protein binding affinity predictor called ISLAND which gives better accuracy than existing methods over the same validation set as well as on external independent test dataset. A cloud-based webserver implementation of ISLAND and its python code are available at https://sites.google.com/view/wajidarshad/software. Conclusion: This paper highlights the fact that the true generalization performance of even the state-of-the-art sequence-only predictor of binding affinity is far from satisfactory and that the development of effective and practical methods in this domain is still an open problem.
