RFRSN: Improving protein fold recognition by siamese network

ABSTRACT: Protein fold recognition is key to studying protein structure and function. As a representative pattern recognition task, its performance can be improved through two main categories of approaches: 1) extracting more discriminative descriptors, and 2) designing more effective distance metrics. Existing protein fold recognition approaches focus on the first category, seeking a robust and discriminative descriptor that represents each protein sequence as a compact feature vector, so that different protein sequences are separated as much as possible in the fold space. These methods have brought large improvements to protein fold recognition. So far, however, little attention has been paid to the second category. In this paper, we address not only the first category but also the second, namely how to measure the similarity between two proteins more effectively. First, we employ deep convolutional neural network techniques to extract discriminative fold-specific features from the potential protein residue-residue relationship; we name this method SSAfold. Second, because different feature representations usually follow different distributions, the similarity measure needs to vary with the feature distribution. Until now, almost all protein fold recognition methods have applied the same metric strategy to all protein features, ignoring differences in feature distribution. This paper presents a new protein fold recognition method based on a siamese network, which we name PFRSN. The objective of PFRSN is to learn a set of hierarchical nonlinear transformations that project protein pairs into the same fold feature subspace, so that the distance between positive protein pairs is reduced and that between negative protein pairs is enlarged as much as possible. The experimental results show that SSAfold and PFRSN are highly competitive.
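
The pairwise objective described in the abstract can be illustrated with a small sketch. The abstract does not state the exact loss or architecture, so everything here is an assumption made for illustration: a single shared tanh layer stands in for the "hierarchical nonlinear transformations", the loss is the standard margin-based contrastive loss, and the names `embed`, `distance`, `contrastive_loss`, and the toy vectors are hypothetical rather than taken from the paper.

```python
import math


def embed(x, weights):
    """Shared branch of the siamese pair (hypothetical single tanh layer).

    The same `weights` are applied to both proteins of a pair, which is
    what makes the two branches "siamese".
    """
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def distance(a, b):
    """Euclidean distance between two embedded feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def contrastive_loss(d, same_fold, margin=1.0):
    """Margin-based contrastive loss (an assumed choice, not the paper's).

    Positive pairs (same fold) are penalised by their squared distance,
    so training pulls them together. Negative pairs are penalised only
    while they are closer than `margin`, so training pushes them apart.
    """
    if same_fold:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


# Toy, made-up feature vectors: p1 and p2 play a positive pair,
# p1 and p3 a negative pair.
W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
p1 = [1.0, 0.0, 2.0]
p2 = [0.9, 0.1, 2.1]
p3 = [-1.0, 2.0, 0.0]

d_pos = distance(embed(p1, W), embed(p2, W))
d_neg = distance(embed(p1, W), embed(p3, W))

loss = contrastive_loss(d_pos, same_fold=True) + contrastive_loss(d_neg, same_fold=False)
```

In a real model the weights would be learned by minimising this loss over many labelled protein pairs; the sketch only evaluates the loss for fixed weights.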


Recent Trends in Machine Learning-based Protein Fold Recognition Methods

Biointerface Research in Applied Chemistry ◽ 10.33263/briac114.1123311243 ◽ 2020 ◽ Vol 11 (4) ◽ pp. 11233-11243

Keyword(s): Machine Learning ◽ Protein Sequence ◽ Protein Function ◽ Three Dimensional ◽ Fold Recognition ◽ Machine Learning Algorithms ◽ Dimensional Structure ◽ Protein Fold ◽ Three Dimensional Structure ◽ Protein Fold Recognition

Proteins are macromolecules that enable life. A protein's function is determined by its three-dimensional structure and shape. Understanding how a linear sequence of amino acid residues folds into a three-dimensional structure remains challenging. Machine learning-based methods may help significantly in narrowing the gap between known protein sequences and known structures. Identifying protein folds from a sequence can help predict protein tertiary structure, determine protein function, and give insights into protein-protein interactions. This work covers the following aspects: the kinds of features (sequential, structural, functional, and evolutionary) extracted to represent a protein sequence, and the different methods of extracting them. It also details the machine learning algorithms used, with their respective settings, and the protein fold recognition structures. A detailed performance comparison of well-known works is also given.
