Mining folded proteomes in the era of accurate structure prediction

2021 ◽  
Author(s):  
Charles Bayly-Jones ◽  
James C. Whisstock

Protein structure fundamentally underpins the function and processes of numerous biological systems. Fold recognition algorithms offer a sensitive and robust tool to detect structural, and thereby functional, similarities between distantly related homologs. In the era of accurate structure prediction owing to advances in machine learning techniques, previously curated sequence databases have become a rich source of biological information. Here, we use bioinformatic fold recognition algorithms to scan the entire AlphaFold structure database to identify novel protein family members, infer function and group predicted protein structures. As an example of the utility of this approach, we identify novel, previously unknown members of various pore-forming protein families, including MACPFs, GSDMs and aerolysin-like proteins. Further, we explore the use of structure-based mining for functional inference.

2021 ◽  
Vol 22 (11) ◽  
pp. 6032
Author(s):  
Donghyuk Suh ◽  
Jai Woo Lee ◽  
Sun Choi ◽  
Yoonji Lee

Recent advances in deep learning methods have influenced many aspects of scientific research, including the study of proteins. The prediction of proteins’ 3D structural components now depends heavily on machine learning techniques that interpret how protein sequences and their homology govern inter-residue contacts and structural organization. In particular, methods employing deep neural networks had a significant impact on the recent CASP13 and CASP14 competitions. Here, we explore recent applications of deep learning methods in protein structure prediction. We also look at the potential for deep learning methods to identify as yet undiscovered protein structures and functions and to help guide the study of drug–target interactions. Although significant problems still need to be addressed, we expect these techniques to play crucial roles in protein structural bioinformatics and in drug discovery in the near future.


2021 ◽  
Author(s):  
Chunxiang Peng ◽  
Xiaogen Zhou ◽  
Yuhao Xia ◽  
Yang Zhang ◽  
Guijun Zhang

With the development of protein structure prediction methods and experimental structure determination techniques, the structures of single-domain proteins have become relatively easy to model or solve experimentally. However, more than 80% of eukaryotic proteins and 67% of prokaryotic proteins contain multiple domains. Constructing a unified multi-domain protein structure database will promote research on multi-domain proteins, especially the modeling of multi-domain protein structures. In this work, we develop a unified multi-domain protein structure database (MPDB). Based on MPDB, we also develop a server with two functional modules: (1) the culling module, which filters the whole MPDB according to input criteria; (2) the detection module, which identifies structural analogues of the full chain according to the structural similarity between the input domain models and the proteins in MPDB. The detection module can discover potential analogue structures, which will contribute to high-quality multi-domain protein structure modeling.


2018 ◽  
Vol 7 (4.5) ◽  
pp. 168
Author(s):  
Khatri Chandni ◽  
Prof. Mrudang Pandya ◽  
Dr. Sunil Jardosh

In recent years, machine learning techniques based on deep learning networks have shown great promise in research communities. Successful deep learning methods involve artificial neural networks and machine learning. Deep learning solves several problems in bioinformatics. Protein structure prediction is one of the most important fields that can be addressed using deep learning approaches. These proteins are categorized on the basis of the occurrence of amino acid patterns, from which features are extracted. This paper aims to review work on protein structure prediction using deep learning networks. The objective is to review, motivate and facilitate the use of these deep learning networks for predicting protein sequences.


2007 ◽  
Vol 4 (3) ◽  
pp. 208-223 ◽  
Author(s):  
José A. Reyes ◽  
David Gilbert

Summary: This research addresses the problem of predicting protein-protein interactions (PPI) when integrating diverse kinds of biological information. This task has commonly been viewed as a binary classification problem (whether any two proteins do or do not interact), and several different machine learning techniques have been employed to solve it. However, the nature of the data creates two major problems which can affect results. First, the classes are imbalanced, because the number of positive examples (pairs of proteins which really interact) is much smaller than the number of negative ones. Second, the selection of negative examples can be based on unreliable assumptions, which could introduce bias into the classification results. Here we propose the use of one-class classification (OCC) methods to deal with the task of PPI prediction. OCC methods utilise examples of just one class to generate a predictive model, which is consequently independent of the kind of negative examples selected; additionally, these approaches are known to cope with imbalanced class problems. We have designed and carried out a performance evaluation study of several OCC methods for this task, and have found that the Parzen density estimation approach outperforms the rest. We also undertook a comparative performance evaluation between the Parzen OCC method and several conventional learning techniques, considering different scenarios, for example varying the number of negative examples used for training purposes. We found that the Parzen OCC method in general performs competitively with traditional approaches and in many situations outperforms them. Finally, we evaluated the ability of the Parzen OCC approach to predict new potential PPI targets, and validated these results by searching for biological evidence in the literature.
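The idea of one-class classification with Parzen density estimation can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, the bandwidth value, the thresholding rule, the function names and the toy two-dimensional feature vectors are all assumptions made for illustration. The model is fitted on positive examples only, and a new example is accepted when its estimated density reaches a threshold derived from the training set.

```python
import math

def parzen_density(x, train, bandwidth):
    """Gaussian Parzen-window density estimate at x, built from positive examples only."""
    d = len(x)
    norm = (2.0 * math.pi * bandwidth ** 2) ** (d / 2.0)
    total = 0.0
    for t in train:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, t))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / (len(train) * norm)

def fit_threshold(train, bandwidth, reject_fraction=0.1):
    """Choose a density threshold so that about reject_fraction of the
    training examples score below it (assumed rule, not from the paper)."""
    scores = sorted(parzen_density(x, train, bandwidth) for x in train)
    return scores[int(reject_fraction * len(scores))]

def predict(x, train, bandwidth, threshold):
    """Return True if x is accepted as a member of the positive class."""
    return parzen_density(x, train, bandwidth) >= threshold

# Hypothetical feature vectors for known interacting pairs (toy data).
positives = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (0.8, 0.9)]
h = 0.3  # assumed bandwidth
thr = fit_threshold(positives, h)

print(predict((1.0, 1.05), positives, h, thr))  # candidate close to the positives
print(predict((5.0, 5.0), positives, h, thr))   # candidate far from the positives
```

No negative examples enter the model at any point, which is why the result does not depend on how negatives would have been chosen.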


2014 ◽  
Vol 11 (95) ◽  
pp. 20131147 ◽  
Author(s):  
Agnel Praveen Joseph ◽  
Alexandre G. de Brevern

Protein folding has been a major area of research for many years. Nonetheless, the mechanisms leading to the formation of an active biological fold are still not fully understood. The huge amount of available sequence and structural information provides hints to identify the putative fold for a given sequence. Indeed, protein structures prefer a limited number of local backbone conformations, some being characterized by preferences for certain amino acids. These preferences largely depend on the local structural environment. The prediction of local backbone conformations has become an important factor in correctly identifying the global protein fold. Here, we review the developments in the field of local structure prediction and especially their implications for protein fold recognition.


2021 ◽  
Vol 22 (21) ◽  
pp. 11449
Author(s):  
Gabriel Bianchin de Oliveira ◽  
Helio Pedrini ◽  
Zanoni Dias

Protein secondary structures are important in many biological processes and applications. Due to advances in sequencing methods, many proteins have been sequenced, but fewer have secondary structures defined by laboratory methods. With the development of computer technology, computational methods have started to become the most important methodologies for predicting secondary structures. Driven by the recent results obtained by computational methods in this task, we evaluated two different approaches to this problem: (i) template-free classifiers, based on machine learning techniques; and (ii) template-based classifiers, based on search tools. Both approaches are formed by different sub-classifiers (six for template-free and two for template-based), each with a specific view of the protein. Our results show that these ensembles improve the results of each approach individually.
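One simple way to combine several sub-classifiers into an ensemble is a per-residue majority vote, sketched below. The abstract does not say how the sub-classifier outputs are combined, so the voting rule, the function name `majority_vote` and the three-state labels (H, E, C) in the toy input are assumptions for illustration only.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine equal-length per-residue label strings by per-position majority vote."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    consensus = []
    for labels in zip(*predictions):
        # most_common(1) returns the most frequent label; ties go to the
        # label seen first, i.e. the earliest sub-classifier in the list.
        consensus.append(Counter(labels).most_common(1)[0][0])
    return "".join(consensus)

# Hypothetical outputs of three sub-classifiers for a four-residue protein.
sub_predictions = ["HHEC", "HEEC", "HHCC"]
print(majority_vote(sub_predictions))
```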


2020 ◽  
Author(s):  
Yechan Hong ◽  
Yongyu Deng ◽  
Haofan Cui ◽  
Jan Segert ◽  
Jianlin Cheng

Abstract: The fold classification of a protein reveals valuable information about its shape and function. It is important to find a mapping between protein structures and their folds. There are numerous machine learning techniques to predict protein folds from 1-dimensional (1D) protein sequences, but there are few machine learning methods to directly classify protein 3D (tertiary) structures into predefined folds (e.g. folds defined in the SCOP database). We develop a 2D convolutional neural network to classify any protein structure into one of 1232 folds. We extract two classes of input features for each protein: residue-residue distance matrices and persistent homology images derived from 3D protein structures. Due to restrictions in computing resources, we sample every other point in the alpha-carbon chain to generate a reduced distance map representation. We find that this does not lead to a significant loss in accuracy. Using the distance matrix, we achieve an accuracy of 95.2% on the SCOP dataset. With persistent homology images of 100 × 100 resolution, we achieve an accuracy of 56% on the SCOPe 2.07 dataset. Combining the two kinds of features further improves classification accuracy. The source code of our method (PRO3DCNN) is available at https://github.com/jianlin-cheng/PRO3DCNN.
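The reduced distance map described above can be sketched in a few lines. This is an illustrative reconstruction, not code from PRO3DCNN: the function name `distance_matrix`, the `step` parameter and the toy coordinates (six points spaced 3.8 Å apart on a line) are assumptions. Taking every other alpha-carbon halves each side of the matrix, so the map has roughly a quarter of the entries.

```python
import math

def distance_matrix(coords, step=1):
    """Pairwise Euclidean distances between alpha-carbon coordinates.

    step=2 keeps every other residue, giving the reduced representation.
    """
    pts = coords[::step]
    return [[math.dist(p, q) for q in pts] for p in pts]

# Toy alpha-carbon trace: six residues on a straight line (hypothetical data).
ca_trace = [(3.8 * i, 0.0, 0.0) for i in range(6)]

full_map = distance_matrix(ca_trace)             # 6 x 6
reduced_map = distance_matrix(ca_trace, step=2)  # 3 x 3

print(len(full_map), len(reduced_map))
```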


2018 ◽  
Author(s):  
Jack Yang ◽  
Sandip De ◽  
Joshua E Campbell ◽  
Sean Li ◽  
Michele Ceriotti ◽  
...  

Predictive computational methods have the potential to significantly accelerate the discovery of new materials with targeted properties by guiding the choice of candidate materials for synthesis. Recently, a planar pyrrole azaphenacene molecule (pyrido[2,3-b]pyrido[3′,2′:4,5]-pyrrolo[3,2-g]indole, 1) was synthesized and shown to have promising properties for charge transport, which relate to stacking of molecules in its crystal structure. Building on our methods for evaluating small molecule organic semiconductors using crystal structure prediction, we have screened a set of 27 structural isomers of 1 to assess charge mobility in their predicted crystal structures. Machine-learning techniques are used to identify structural classes across the landscapes of all molecules and we find that, despite differences in the arrangement of hydrogen bond functionality, the predicted crystal structures of the molecules studied here can be classified into a small number of packing types. We analyze the predicted property landscapes of the series of molecules and discuss several metrics that can be used to rank the molecules as promising semiconductors. The results suggest several isomers with superior predicted electron mobilities to 1 and suggest two molecules in particular that represent attractive synthetic targets.



