EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation

PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e4750 ◽  
Author(s):  
Afshine Amidi ◽  
Shervine Amidi ◽  
Dimitrios Vlachakis ◽  
Vasileios Megalooikonomou ◽  
Nikos Paragios ◽  
...  

During the past decade, with the significant progress of computational power as well as ever-rising data availability, deep learning techniques have become increasingly popular due to their excellent performance on computer vision problems. The size of the Protein Data Bank (PDB) has increased more than 15-fold since 1999, which has enabled the expansion of models that aim at predicting enzymatic function from amino acid composition. Amino acid sequence, however, is less conserved in nature than protein structure and is therefore considered a less reliable predictor of protein function. This paper presents EnzyNet, a novel 3D convolutional neural network classifier that predicts the Enzyme Commission number of enzymes based only on their voxel-based spatial structure. The spatial distribution of biochemical properties was also examined as complementary information. The two-layer architecture was investigated on a large dataset of 63,558 enzymes from the PDB and achieved an accuracy of 78.4% by exploiting only the binary representation of the protein shape. Code and datasets are available at https://github.com/shervinea/enzynet.
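As a rough illustration of what a binary, voxel-based representation of protein shape can look like, the sketch below maps atom coordinates to an occupancy grid. It is not taken from the EnzyNet code base: the function name `voxelize`, the default `grid_size` of 32, and the centering and scaling scheme are all assumptions made for this example.

```python
import numpy as np

def voxelize(coords, grid_size=32):
    """Map an (N, 3) array of atom coordinates to a binary occupancy grid.

    Hypothetical helper for illustration; the grid size and the
    scaling scheme are assumptions, not the published pipeline.
    """
    coords = np.asarray(coords, dtype=float)
    # Center the structure on its mean position.
    centered = coords - coords.mean(axis=0)
    # Scale so the farthest atom lands just inside the grid boundary.
    radius = np.linalg.norm(centered, axis=1).max()
    scale = (grid_size / 2 - 1) / radius if radius > 0 else 1.0
    # Convert continuous positions to integer voxel indices.
    idx = np.floor(centered * scale + grid_size / 2).astype(int)
    idx = np.clip(idx, 0, grid_size - 1)
    # Mark every voxel that contains at least one atom.
    grid = np.zeros((grid_size,) * 3, dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid
```

A grid produced this way has shape `(grid_size, grid_size, grid_size)` and contains only zeros and ones, which is the kind of input a 3D convolutional layer can consume once a channel axis is added.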


2019 ◽  
Author(s):  
A.T. Balci ◽  
C. Gumeli ◽  
A. Hakouz ◽  
D. Yuret ◽  
O. Keskin ◽  
...  

Abstract. Motivation: Protein–protein interactions are crucial in almost all biological processes. Proteins interact through their interfaces. It is important to determine how proteins interact through interfaces to understand protein binding mechanisms and to predict new protein-protein interactions. Results: We present DeepInterface, a deep learning based method which predicts, for a given protein complex, whether the interface between the proteins of the complex is a true interface or not. The model is a 3-dimensional convolutional neural network; the positive datasets are obtained from all complexes in the Protein Data Bank, and the negative datasets are the incorrect solutions of the docking decoys. The model analyzes a given interface structure and outputs the probability of the given structure being an interface. The accuracy of the model for several interface data sets, including PIFACE, PPI4DOCK, and DOCKGROUND, is approximately 88% in the validation dataset and 75% in the test dataset. The method can be used to improve the accuracy of template based PPI predictions.



2021 ◽  
Author(s):  
Anastasiya V Kulikova ◽  
Daniel J Diaz ◽  
James M Loy ◽  
Andrew D Ellington ◽  
Claus O Wilke

The fundamental problem of protein biochemistry is to predict protein structure from amino acid sequence. The inverse problem, predicting either entire sequences or individual mutations that are consistent with a given protein structure, has received much less attention even though it has important applications in both protein engineering and evolutionary biology. Here, we ask whether 3D convolutional neural networks (3D CNNs) can learn the local fitness landscape of protein structure to reliably predict either the wild-type amino acid or the consensus in a multiple sequence alignment from the local structural context surrounding a site of interest. We find that the network can predict wild type with good accuracy, and that network confidence is a reliable measure of whether a given prediction is likely going to be correct or not. Predictions of consensus are less accurate, and are primarily driven by whether or not the consensus matches the wild type. Our work suggests that high-confidence mis-predictions of the wild type may identify sites that are primed for mutation and likely targets for protein engineering.



2020 ◽  
Author(s):  
Joseph H. Lubin ◽  
Christine Zardecki ◽  
Elliott M. Dolan ◽  
Changpeng Lu ◽  
Zhuofan Shen ◽  
...  

Abstract: Three-dimensional structures of SARS-CoV-2 and other coronaviral proteins archived in the Protein Data Bank were used to analyze viral proteome evolution during the first six months of the COVID-19 pandemic. Analyses of spatial locations, chemical properties, and structural and energetic impacts of the observed amino acid changes in >48,000 viral proteome sequences showed how each one of the 29 viral study proteins has undergone amino acid changes. Structural models computed for every unique sequence variant revealed that most substitutions map to protein surfaces and boundary layers with a minority affecting hydrophobic cores. Conservative changes were observed more frequently in cores versus boundary layers/surfaces. Active sites and protein-protein interfaces showed modest numbers of substitutions. Energetics calculations showed that the impact of substitutions on the thermodynamic stability of the proteome follows a universal bi-Gaussian distribution. Detailed results are presented for six drug discovery targets and four structural proteins comprising the virion, highlighting substitutions with the potential to impact protein structure, enzyme activity, and functional interfaces. Characterizing the evolution of the virus in three dimensions provides testable insights into viral protein function and should aid in structure-based drug discovery efforts as well as the prospective identification of amino acid substitutions with potential for drug resistance.



2020 ◽  
Vol 36 (11) ◽  
pp. 3343-3349 ◽  
Author(s):  
Manaz Kaleel ◽  
Yandan Zheng ◽  
Jialiang Chen ◽  
Xuanming Feng ◽  
Jeremy C Simpson ◽  
...  

Abstract. Motivation: The subcellular location of a protein can provide useful information for protein function prediction and drug design. Experimentally determining the subcellular location of a protein is an expensive and time-consuming task. Therefore, various computer-based tools have been developed, mostly using machine learning algorithms, to predict the subcellular location of proteins. Results: Here, we present a neural network-based algorithm for protein subcellular location prediction. We introduce SCLpred-EMS, a subcellular localization predictor powered by an ensemble of Deep N-to-1 Convolutional Neural Networks. SCLpred-EMS predicts the subcellular location of a protein into two classes, the endomembrane system and secretory pathway versus all others, with a Matthews correlation coefficient of 0.75–0.86, outperforming the other state-of-the-art web servers we tested. Availability and implementation: SCLpred-EMS is freely available for academic users at http://distilldeep.ucd.ie/SCLpred2/.
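For readers unfamiliar with the Matthews correlation coefficient reported above, the following minimal sketch computes it for a two-class problem from confusion-matrix counts. The function name `mcc` and the convention of returning 0.0 when the denominator vanishes are choices made for this example, not details from the SCLpred-EMS paper.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    tp/tn/fp/fn are true positives, true negatives, false positives
    and false negatives. Returns 0.0 when the denominator is zero
    (a common convention, assumed here).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0
```

The value ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction right), and it remains informative when the two classes are imbalanced, which is one reason it is often preferred to plain accuracy in this kind of setting.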



2018 ◽  
Author(s):  
Kota Kasahara ◽  
Shintaro Minami ◽  
Yasunori Aizawa

Abstract: The principle of three-dimensional protein structure formation is a long-standing conundrum in structural biology. A globular domain of a soluble protein is formed by a network of atomic contacts among amino acid residues, but regions external to globular domains, such as loops and linkers, often do not have intramolecular contacts with globular domains. Although these regions can play key roles in protein function as interfaces for intermolecular interactions, their nature remains unclear. Here, we termed protein segments external to globular domains floating segments and searched for them in tens of thousands of entries in the Protein Data Bank. As a result, we found that 0.72% of residues are in floating segments. Regarding secondary structural elements, coil structures are enriched in floating segments, especially for long segments. Interactions with polypeptides and polynucleotides, but not small compounds, are enriched in floating segments. The amino acid preferences of floating segments are similar to those of surface residues, with exceptions: the small side chain amino acids, Gly and Ala, are preferred, and some charged side chains, Arg and His, are disfavored for floating segments compared to surface residues. Our comprehensive characterization of floating segments may provide insights into understanding protein sequence-structure-function relationships.



2018 ◽  
Author(s):  
George Symeonidis ◽  
Peter P. Groumpos ◽  
Evangelos Dermatas

