A deep attention network for predicting amino acid signals in the formation of α-helices

2020 ◽  
Vol 18 (05) ◽  
pp. 2050028
Author(s):  
A. Visibelli ◽  
P. Bongini ◽  
A. Rossi ◽  
N. Niccolai ◽  
M. Bianchini

The secondary and tertiary structure of a protein has a primary role in determining its function. Even though many folding prediction algorithms have been developed in the past decades, mainly based on the assumption that folding instructions are encoded within the protein sequence, experimental techniques remain the most reliable way to establish protein structures. In this paper, we searched for signals related to the formation of α-helices. We carried out a statistical analysis on a large dataset of experimentally characterized secondary structure elements to find over- or under-occurrences of specific amino acids defining the boundaries of helical moieties. To validate our hypothesis, we trained various Machine Learning models, each equipped with an attention mechanism, to predict the occurrence of α-helices. The attention mechanism makes it possible to interpret the model’s decision by weighing the importance the predictor gives to each part of the input. The experimental results show that different models focus on the same subsequences, which can be seen as codes driving the secondary structure formation.
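As a rough illustration of how attention weights can be read as per-residue importance, the sketch below normalizes raw scores with a softmax and ranks the positions. It is not the authors' model: the peptide, the scores, and the function names `softmax` and `top_attended` are all hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax: turns raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_attended(sequence, scores, k=3):
    """Return the k positions with the largest attention weight.

    Each item is a (position, residue, weight) tuple, highest weight first.
    """
    weights = softmax(scores)
    ranked = sorted(range(len(sequence)), key=lambda i: weights[i], reverse=True)
    return [(i, sequence[i], weights[i]) for i in ranked[:k]]


# Hypothetical peptide and hypothetical raw attention scores (one per residue).
sequence = "SPEELLKKA"
scores = [2.0, 1.5, 0.1, 0.0, -0.3, 0.2, 0.1, 0.4, 1.0]

for position, residue, weight in top_attended(sequence, scores, k=3):
    print(f"position {position}: {residue} (weight {weight:.3f})")
```

Comparing such rankings across independently trained models is one plausible way to check whether they attend to the same subsequences.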

Author(s):  
Bruce A. Shapiro ◽  
Wojciech Kasprzak

Genomic information (nucleic acid and amino acid sequences) completely determines the characteristics of the nucleic acid and protein molecules that express a living organism’s function. One of the greatest challenges in which computation is playing a role is the prediction of higher order structure from the one-dimensional sequence of genes. Rules for determining macromolecule folding have been continually evolving. Specifically in the case of RNA (ribonucleic acid) there are rules and computer algorithms/systems (see below) that partially predict and can help analyze the secondary and tertiary interactions of distant parts of the polymer chain. These successes are very important for determining the structural and functional characteristics of RNA in disease processes and in the cell life cycle. It has been shown that molecules with the same function have the potential to fold into similar structures though they might differ in their primary sequences. This fact also illustrates the importance of secondary and tertiary structure in relation to function. Examples of such constancy in secondary structure exist in transfer RNAs (tRNAs), 5S RNAs, 16S RNAs, viroid RNAs, and portions of retroviruses such as HIV. The secondary and tertiary structure of tRNA Phe (Kim et al., 1974), of a hammerhead ribozyme (Pley et al., 1994), and of Tetrahymena (Cate et al., 1996a, 1996b) have been shown by their crystal structure. Currently little is known of tertiary interactions, but studies on tRNA indicate these are weaker than secondary structure interactions (Riesner and Romer, 1973; Crothers and Cole, 1978; Jaeger et al., 1989b). It is very difficult to crystallize and/or obtain nuclear magnetic resonance spectrum data for large RNA molecules. Therefore, a logical place to start in determining the 3D structure of RNA is computer prediction of the secondary structure. The sequence (primary structure) of an RNA molecule is relatively easy to produce.
Because experimental methods for determining RNA secondary and tertiary structure (when the primary sequence folds back on itself and forms base pairs) have not kept pace with the rapid discovery of RNA molecules and their function, computational methods for predicting secondary and tertiary structures have increasingly been developed and used.


2020 ◽  
Vol 36 (17) ◽  
pp. 4599-4608 ◽  
Author(s):  
Mostofa Rafid Uddin ◽  
Sazan Mahbub ◽  
M Saifur Rahman ◽  
Md Shamsuzzoha Bayzid

Motivation: Protein structures provide basic insight into how they can interact with other proteins, their functions and biological roles in an organism. Experimental methods (e.g. X-ray crystallography and nuclear magnetic resonance spectroscopy) for determining the secondary structure (SS) of proteins are very expensive and time consuming. Therefore, developing efficient computational approaches for predicting the SS of proteins is of utmost importance. Advances in developing highly accurate SS prediction methods have mostly been focused on 3-class (Q3) structure prediction. However, 8-class (Q8) resolution of SS contains more useful information and is much more challenging than the Q3 prediction. Results: We present SAINT, a highly accurate method for Q8 structure prediction, which incorporates a self-attention mechanism (a concept from natural language processing) with the Deep Inception-Inside-Inception network in order to effectively capture both the short- and long-range interactions among the amino acid residues. SAINT offers a more interpretable framework than the typical black-box deep neural network methods. Through an extensive evaluation study, we report the performance of SAINT in comparison with the existing best methods on a collection of benchmark datasets, namely, TEST2016, TEST2018, CASP12 and CASP13. Our results suggest that the self-attention mechanism improves the prediction accuracy and outperforms the existing best alternate methods. SAINT is the first of its kind and offers the best known Q8 accuracy. Thus, we believe SAINT represents a major step toward the accurate and reliable prediction of SSs of proteins. Availability and implementation: SAINT is freely available as an open-source project at https://github.com/SAINTProtein/SAINT.
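To make the Q3 versus Q8 distinction concrete, the sketch below collapses 8-state DSSP-style labels into 3 states and computes per-residue accuracy. The mapping shown (H, G, I to helix; E, B to strand; everything else to coil) is one common convention and is an assumption here; the abstract does not say which reduction SAINT's benchmarks use. The names `Q8_TO_Q3`, `reduce_q8_to_q3` and `q_accuracy` are hypothetical.

```python
# Assumed reduction from 8 DSSP-style states to 3 states (one common convention).
Q8_TO_Q3 = {
    "H": "H", "G": "H", "I": "H",   # helical states -> helix
    "E": "E", "B": "E",             # strand and bridge -> strand
    "T": "C", "S": "C", "-": "C",   # turn, bend, other -> coil
}


def reduce_q8_to_q3(labels):
    """Map a string of 8-state labels to the corresponding 3-state string."""
    return "".join(Q8_TO_Q3[label] for label in labels)


def q_accuracy(predicted, observed):
    """Fraction of residues whose predicted label equals the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical labels: the prediction confuses two helical sub-states (G vs H).
observed_q8 = "HHHE"
predicted_q8 = "HHGE"

q8 = q_accuracy(predicted_q8, observed_q8)
q3 = q_accuracy(reduce_q8_to_q3(predicted_q8), reduce_q8_to_q3(observed_q8))
print(f"Q8 accuracy: {q8:.2f}, Q3 accuracy: {q3:.2f}")
```

Because errors between sub-states of the same class disappear after the reduction, the same prediction scores at least as well on Q3 as on Q8, which is consistent with Q8 being the harder task.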


Author(s):  
Gururaj Tejeshwar ◽  
Siddesh Gaddadadevra Mat

Introduction: The primary structure of a protein is a polypeptide chain made up of a sequence of amino acids. Interactions between the atoms of the backbone cause the polypeptide to fold locally, and these folded arrangements constitute the secondary structure. These alignments can be made more accurate by including secondary structure information. Objective: It is difficult to identify the sequence information embedded in the secondary structure of the protein. However, deep learning methods can be used to identify this sequence information in protein structures. Methods: The scope of the proposed work is to increase the accuracy of identifying the sequence information in the primary structure and the tertiary structure, thereby increasing the accuracy of the predicted protein secondary structure (PSS). In this proposed work, homology is eliminated by a Recurrent Neural Network (RNN) based network that consists of three layers, namely a bidirectional Long Short-Term Memory (LSTM) layer, a time-distributed layer and a Softmax layer. Results: The proposed LDS model achieves an accuracy of approximately 86% for the prediction of the three-state secondary structure of the protein. Conclusion: The gap between the number of known protein primary structures and the number of known secondary structures is huge and increasing. Machine learning is trying to reduce this gap. In most previous attempts at predicting the secondary structure of proteins, the data are divided according to the homology of the proteins. This limits the efficiency of the predicting model and the inputs that can be given to such models. Hence, in our model homology was not considered while collecting the data for training or testing our model. As a result, our model is not affected by the homology of the protein fed to it, which removes that restriction, so any protein can be fed to it.


2019 ◽  
Author(s):  
Mostofa Rafid Uddin ◽  
Sazan Mahbub ◽  
M Saifur Rahman ◽  
Md Shamsuzzoha Bayzid

Motivation: Protein structures provide basic insight into how they can interact with other proteins, their functions and biological roles in an organism. Experimental methods (e.g., X-ray crystallography, nuclear magnetic resonance spectroscopy) for determining the secondary structure (SS) of proteins are very expensive and time consuming. Therefore, developing efficient computational approaches for predicting the secondary structure of proteins is of utmost importance. Advances in developing highly accurate SS prediction methods have mostly been focused on 3-class (Q3) structure prediction. However, 8-class (Q8) resolution of secondary structure contains more useful information and is much more challenging than the Q3 prediction. Results: We present SAINT, a highly accurate method for Q8 structure prediction, which incorporates a self-attention mechanism (a concept from natural language processing) with the Deep Inception-Inside-Inception (Deep3I) network in order to effectively capture both the short-range and long-range interactions among the amino acid residues. SAINT offers a more interpretable framework than the typical black-box deep neural network methods. Through an extensive evaluation study, we report the performance of SAINT in comparison with the existing best methods on a collection of benchmark datasets, namely, TEST2016, TEST2018, CASP12 and CASP13. Our results suggest that the self-attention mechanism improves the prediction accuracy and outperforms the existing best alternate methods. SAINT is the first of its kind and offers the best known Q8 accuracy. Thus, we believe SAINT represents a major step towards the accurate and reliable prediction of secondary structures of proteins. Availability: SAINT is freely available as an open source project at https://github.com/SAINTProtein/SAINT.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Nicholas J. Fowler ◽  
Adnan Sljoka ◽  
Mike P. Williamson

We present a method that measures the accuracy of NMR protein structures. It compares random coil index [RCI] against local rigidity predicted by mathematical rigidity theory, calculated from NMR structures [FIRST], using a correlation score (which assesses secondary structure), and an RMSD score (which measures overall rigidity). We test its performance using: structures refined in explicit solvent, which are much better than unrefined structures; decoy structures generated for 89 NMR structures; and conventional predictors of accuracy such as number of restraints per residue, restraint violations, energy of structure, ensemble RMSD, Ramachandran distribution, and clashscore. Restraint violations and RMSD are poor measures of accuracy. Comparisons of NMR to crystal structures show that secondary structure is equally accurate, but crystal structures are typically too rigid in loops, whereas NMR structures are typically too floppy overall. We show that the method is a useful addition to existing measures of accuracy.
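The two kinds of score mentioned above can be pictured with a generic comparison of two per-residue profiles. The sketch below computes a Pearson correlation and a root-mean-square deviation between two equal-length lists; it is an assumption-laden illustration, not the authors' scoring procedure, and the profile values and the names `pearson` and `rmsd` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmsd(x, y):
    """Root-mean-square deviation between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical per-residue flexibility values from two sources (arbitrary units).
measured = [0.10, 0.12, 0.30, 0.45, 0.20]
predicted = [0.08, 0.15, 0.35, 0.40, 0.25]

print(f"correlation: {pearson(measured, predicted):.3f}")
print(f"RMSD: {rmsd(measured, predicted):.3f}")
```

A correlation is sensitive to whether the two profiles rise and fall in the same places, while an RMSD is sensitive to their overall offset, which is why the two numbers can disagree.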


Biochemistry ◽  
1976 ◽  
Vol 15 (20) ◽  
pp. 4370-4377 ◽  
Author(s):  
P. H. Bolton ◽  
C. R. Jones ◽  
D. Bastedo-Lerner ◽  
K. L. Wong ◽  
D. R. Kearns

1986 ◽  
Vol 238 (2) ◽  
pp. 485-490 ◽  
Author(s):  
S R Martin ◽  
P M Bayley

Near-u.v. and far-u.v. c.d. spectra of bovine testis calmodulin and its tryptic fragments (TR1C, N-terminal half, residues 1-77, and TR2C, C-terminal half, residues 78-148) were recorded in metal-ion-free buffer and in the presence of saturating concentrations of Ca2+ or Cd2+ under a range of different solvent conditions. The results show the following: if there is any interaction between the N-terminal and C-terminal halves of calmodulin, it has no apparent effect on the secondary or tertiary structure of either half; the conformational changes induced by Ca2+ or Cd2+ are substantially greater in TR2C than they are in TR1C; the presence of Ca2+ or Cd2+ confers considerable stability with respect to urea-induced denaturation, both for the whole molecule and for either of the tryptic fragments; a thermally induced transition occurs in whole calmodulin at temperatures substantially below the temperature of major thermal unfolding, both in the presence and in the absence of added metal ion; the effects of Cd2+ are identical with those of Ca2+ under all conditions studied.

