fold recognition
Recently Published Documents


TOTAL DOCUMENTS

379
(FIVE YEARS 29)

H-INDEX

54
(FIVE YEARS 5)

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Amelia Villegas-Morcillo ◽  
Victoria Sanchez ◽  
Angel M. Gomez

Abstract Background Current state-of-the-art deep learning approaches for protein fold recognition learn protein embeddings that improve prediction performance at the fold level. However, there still exists aperformance gap at the fold level and the (relatively easier) family level, suggesting that it might be possible to learn an embedding space that better represents the protein folds. Results In this paper, we propose the FoldHSphere method to learn a better fold embedding space through a two-stage training procedure. We first obtain prototype vectors for each fold class that are maximally separated in hyperspherical space. We then train a neural network by minimizing the angular large margin cosine loss to learn protein embeddings clustered around the corresponding hyperspherical fold prototypes. Our network architectures, ResCNN-GRU and ResCNN-BGRU, process the input protein sequences by applying several residual-convolutional blocks followed by a gated recurrent unit-based recurrent layer. Evaluation results on the LINDAHL dataset indicate that the use of our hyperspherical embeddings effectively bridges the performance gap at the family and fold levels. Furthermore, our FoldHSpherePro ensemble method yields an accuracy of 81.3% at the fold level, outperforming all the state-of-the-art methods. Conclusions Our methodology is efficient in learning discriminative and fold-representative embeddings for the protein domains. The proposed hyperspherical embeddings are effective at identifying the protein fold class by pairwise comparison, even when amino acid sequence similarities are low.


2021 ◽  
Author(s):  
Charles Bayly-Jones ◽  
James C. Whisstock

Protein structure fundamentally underpins the function and processes of numerous biological systems. Fold recognition algorithms offer a sensitive and robust tool to detect structural, and thereby functional, similarities between distantly related homologs. In the era of accurate structure prediction owing to advances in machine learning techniques, previously curated sequence databases have become a rich source of biological information. Here, we use bioinformatic fold recognition algorithms to scan the entire AlphaFold structure database to identify novel protein family members, infer function and group predicted protein structures. As an example of the utility of this approach, we identify novel, previously unknown members of various pore-forming protein families, including MACPFs, GSDMs and aerolysin-like proteins. Further, we explore the use of structure-based mining for functional inference.


2021 ◽  
Author(s):  
Ke Han ◽  
Yan Liu ◽  
Dong-Jun Yu

ABSTRACTProtein fold recognition is the key to study protein structure and function. As a representative pattern recognition task, there are two main categories of approaches to improve the protein fold recognition performance: 1) extracting more discriminative descriptors, and 2) designing more effective distance metrics. The existing protein fold recognition approaches focus on the first category to finding a robust and discriminative descriptor to represent each protein sequence as a compact feature vector, where different protein sequence is expected to be separated as much as possible in the fold space. These methods have brought huge improvements to the task of protein fold recognition. However, so far, little attention has been paid to the second category. In this paper, we focus not only on the first category, but also on the second point that how to measure the similarity between two proteins more effectively. First, we employ deep convolutional neural network techniques to extract the discriminative fold-specific features from the potential protein residue-residue relationship, we name it SSAfold. On the other hand, due to different feature representation usually subject to varying distributions, the measurement of similarity needs to vary according to different feature distributions. Before, almost all protein fold recognition methods perform the same metrics strategy on all the protein feature ignoring the differences in feature distribution. This paper presents a new protein fold recognition by employing siamese network, we named it PFRSN. The objective of PFRSN is to learns a set of hierarchical nonlinear transformations to project protein pairs into the same fold feature subspace to ensure the distance between positive protein pairs is reduced and that of negative protein pairs is enlarged as much as possible. The experimental results show that the results of SSAfold and PFRSN are highly competitive.


2021 ◽  
Vol 421 ◽  
pp. 127-139 ◽  
Author(s):  
Ke Yan ◽  
Jie Wen ◽  
Yong Xu ◽  
Bin Liu

2020 ◽  
Vol 11 (4) ◽  
pp. 11233-11243

Proteins are macromolecules that enable life. Protein function is due to its three-dimensional structure and shape. It is challenging to understand how a linear sequence of amino acid residues folds into a three-dimensional structure. Machine learning-based methods may help significantly in reducing the gap present between known protein sequence and structure. Identifying protein folds from a sequence can help predict protein tertiary structure, determine protein function, and give insights into protein-protein interactions. This work focuses on the following aspects. The kind of features such as sequential, structural, functional, and evolutionary extracted for representing protein sequence and different methods of extracting these features. This work also includes details of machine learning algorithms used with respective settings and protein fold recognition structures. Detailed performance comparison of well-known works is also given.


2020 ◽  
Vol 21 (S6) ◽  
Author(s):  
Wessam Elhefnawy ◽  
Min Li ◽  
Jianxin Wang ◽  
Yaohang Li

Abstract Background One of the most essential problems in structural bioinformatics is protein fold recognition. In this paper, we design a novel deep learning architecture, so-called DeepFrag-k, which identifies fold discriminative features at fragment level to improve the accuracy of protein fold recognition. DeepFrag-k is composed of two stages: the first stage employs a multi-modal Deep Belief Network (DBN) to predict the potential structural fragments given a sequence, represented as a fragment vector, and then the second stage uses a deep convolutional neural network (CNN) to classify the fragment vector into the corresponding fold. Results Our results show that DeepFrag-k yields 92.98% accuracy in predicting the top-100 most popular fragments, which can be used to generate discriminative fragment feature vectors to improve protein fold recognition. Conclusions There is a set of fragments that can serve as structural “keywords” distinguishing between major protein folds. The deep learning architecture in DeepFrag-k is able to accurately identify these fragments as structure features to improve protein fold recognition.


Sign in / Sign up

Export Citation Format

Share Document