Amino acid torsion angles enable prediction of protein fold classification

2020 ◽  
Author(s):  
Kun Tian ◽  
Xin Zhao ◽  
Xiaogeng Wan ◽  
Stephen Yau

Abstract Background Protein structure can provide insights that help biologists to predict and understand protein functions and interactions. However, the number of known protein structures has not kept pace with the number of protein sequences determined by high-throughput sequencing. Current techniques used to determine the structure of proteins, such as X-ray crystallography and Nuclear Magnetic Resonance (NMR) spectroscopy, are complex and may require a lot of time to analyze the experimental results, especially for large protein molecules. The limitations of these methods have motivated us to create a new approach for protein structure prediction.Results Here we describe a new approach that uses integration and analysis of torsion angle information from the Protein Data Bank to enable prediction of protein structures from amino acid sequences. Our prediction model performed well in comparison with previous methods when applied to the structural classification of two CATH datasets with more than 5000 protein domains. This new prediction model performs well with an average of 92.5% accuracy for structure classification, which is higher than the previous research. We also used our model to predict four known protein structures with a single amino acid sequence, while many other existing methods could only obtain one possible structure for a given sequence.Conclusions The results show that our method provides a new effective and reliable tool for protein structure prediction research.

2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Kun Tian ◽  
Xin Zhao ◽  
Xiaogeng Wan ◽  
Stephen S.-T. Yau

AbstractProtein structure can provide insights that help biologists to predict and understand protein functions and interactions. However, the number of known protein structures has not kept pace with the number of protein sequences determined by high-throughput sequencing. Current techniques used to determine the structure of proteins are complex and require a lot of time to analyze the experimental results, especially for large protein molecules. The limitations of these methods have motivated us to create a new approach for protein structure prediction. Here we describe a new approach to predict of protein structures and structure classes from amino acid sequences. Our prediction model performs well in comparison with previous methods when applied to the structural classification of two CATH datasets with more than 5000 protein domains. The average accuracy is 92.5% for structure classification, which is higher than that of previous research. We also used our model to predict four known protein structures with a single amino acid sequence, while many other existing methods could only obtain one possible structure for a given sequence. The results show that our method provides a new effective and reliable tool for protein structure prediction research.


2018 ◽  
Author(s):  
Siyuan Liu ◽  
Xilun Xiang ◽  
Haiguang Liu

ABSTRACTProtein structure prediction relies on two major components, a method to generate good models that are close to the native structure and a scoring function that can select the good models. Based on the statistics from known structures in the protein data bank, a statistical energy function is derived to reflect the amino acid neighbourhood preferences. The neighbourhood of one amino acid is defined by its contacting residues, and the energy function is determined by the neighbhoring residue types and relative positions. A scoring algorithm, Nepre, has been implemented and its performance was tested with several decoy sets. The results show that the Nepre program can be applied in model ranking to improve the success rate in structure predictions.


PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3160 ◽  
Author(s):  
Kumar Manochitra ◽  
Subhash Chandra Parija

BackgroundAmoebiasis is the third most common parasitic cause of morbidity and mortality, particularly in countries with poor hygienic settings. There exists an ambiguity in the diagnosis of amoebiasis, and hence there arises a necessity for a better diagnostic approach. Serine-richEntamoeba histolyticaprotein (SREHP), peroxiredoxin and Gal/GalNAc lectin are pivotal inE. histolyticavirulence and are extensively studied as diagnostic and vaccine targets. For elucidating the cellular function of these proteins, details regarding their respective quaternary structures are essential. However, studies in this aspect are scant. Hence, this study was carried out to predict the structure of these target proteins and characterize them structurally as well as functionally using appropriatein-silicomethods.MethodsThe amino acid sequences of the proteins were retrieved from National Centre for Biotechnology Information database and aligned using ClustalW. Bioinformatic tools were employed in the secondary structure and tertiary structure prediction. The predicted structure was validated, and final refinement was carried out.ResultsThe protein structures predicted by i-TASSER were found to be more accurate than Phyre2 based on the validation using SAVES server. The prediction suggests SREHP to be an extracellular protein, peroxiredoxin a peripheral membrane protein while Gal/GalNAc lectin was found to be a cell-wall protein. Signal peptides were found in the amino-acid sequences of SREHP and Gal/GalNAc lectin, whereas they were not present in the peroxiredoxin sequence. Gal/GalNAc lectin showed better antigenicity than the other two proteins studied. All the three proteins exhibited similarity in their structures and were mostly composed of loops.DiscussionThe structures of SREHP and peroxiredoxin were predicted successfully, while the structure of Gal/GalNAc lectin could not be predicted as it was a complex protein composed of sub-units. Also, this protein showed less similarity with the available structural homologs. The quaternary structures of SREHP and peroxiredoxin predicted from this study would provide better structural and functional insights into these proteins and may aid in development of newer diagnostic assays or enhancement of the available treatment modalities.


2021 ◽  
Author(s):  
Chris Papadopoulos ◽  
Isabelle Callebaut ◽  
Jean-Christophe Gelly ◽  
Isabelle Hatin ◽  
Olivier Namy ◽  
...  

The noncoding genome plays an important role in de novo gene birth and in the emergence of genetic novelty. Nevertheless, how noncoding sequences' properties could promote the birth of novel genes and shape the evolution and the structural diversity of proteins remains unclear. Therefore, by combining different bioinformatic approaches, we characterized the fold potential diversity of the amino acid sequences encoded by all intergenic ORFs (Open Reading Frames) of S. cerevisiae with the aim of (i) exploring whether the large structural diversity observed in proteomes is already present in noncoding sequences, and (ii) estimating the potential of the noncoding genome to produce novel protein bricks that can either give rise to novel genes or be integrated into pre-existing proteins, thus participating in protein structure diversity and evolution. We showed that amino acid sequences encoded by most yeast intergenic ORFs contain the elementary building blocks of protein structures. Moreover, they encompass the large structural diversity of canonical proteins with strikingly the majority predicted as foldable. Then, we investigated the early stages of de novo gene birth by identifying intergenic ORFs with a strong translation signal in ribosome profiling experiments and by reconstructing the ancestral sequences of 70 yeast de novo genes. This enabled us to highlight sequence and structural factors determining de novo gene emergence. Finally, we showed a strong correlation between the fold potential of de novo proteins and the one of their ancestral amino acid sequences, reflecting the relationship between the noncoding genome and the protein structure universe.


2012 ◽  
Vol 10 (03) ◽  
pp. 1242010 ◽  
Author(s):  
FILIP JAGODZINSKI ◽  
JEANNE HARDY ◽  
ILEANA STREINU

Predicting the effect of a single amino acid substitution on the stability of a protein structure is a fundamental task in macromolecular modeling. It has relevance to drug design and understanding of disease-causing protein variants. We present KINARI-Mutagen, a web server for performing in silico mutation experiments on protein structures from the Protein Data Bank. Our rigidity-theoretical approach permits fast evaluation of the effects of mutations that may not be easy to perform in vitro, because it is not always possible to express a protein with a specific amino acid substitution. We use KINARI-Mutagen to identify critical residues, and we show that our predictions correlate with destabilizing mutations to glycine. In two in-depth case studies we show that the mutated residues identified by KINARI-Mutagen as critical correlate with experimental data, and would not have been identified by other methods such as Solvent Accessible Surface Area measurements or residue ranking by contributions to stabilizing interactions. We also generate 48 mutants for 14 proteins, and compare our rigidity-based results against experimental mutation stability data. KINARI-Mutagen is available at http://kinari.cs.umass.edu .


2015 ◽  
Vol 32 (6) ◽  
pp. 843-849 ◽  
Author(s):  
Rhys Heffernan ◽  
Abdollah Dehzangi ◽  
James Lyons ◽  
Kuldip Paliwal ◽  
Alok Sharma ◽  
...  

Abstract Motivation: Solvent exposure of amino acid residues of proteins plays an important role in understanding and predicting protein structure, function and interactions. Solvent exposure can be characterized by several measures including solvent accessible surface area (ASA), residue depth (RD) and contact numbers (CN). More recently, an orientation-dependent contact number called half-sphere exposure (HSE) was introduced by separating the contacts within upper and down half spheres defined according to the Cα-Cβ (HSEβ) vector or neighboring Cα-Cα vectors (HSEα). HSEα calculated from protein structures was found to better describe the solvent exposure over ASA, CN and RD in many applications. Thus, a sequence-based prediction is desirable, as most proteins do not have experimentally determined structures. To our best knowledge, there is no method to predict HSEα and only one method to predict HSEβ. Results: This study developed a novel method for predicting both HSEα and HSEβ (SPIDER-HSE) that achieved a consistent performance for 10-fold cross validation and two independent tests. The correlation coefficients between predicted and measured HSEβ (0.73 for upper sphere, 0.69 for down sphere and 0.76 for contact numbers) for the independent test set of 1199 proteins are significantly higher than existing methods. Moreover, predicted HSEα has a higher correlation coefficient (0.46) to the stability change by residue mutants than predicted HSEβ (0.37) and ASA (0.43). The results, together with its easy Cα-atom-based calculation, highlight the potential usefulness of predicted HSEα for protein structure prediction and refinement as well as function prediction. Availability and implementation: The method is available at http://sparks-lab.org. Contact: [email protected] or [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


Author(s):  
Jiaxi Liu ◽  

The prediction of protein three-dimensional structure from amino acid sequence has been a challenge problem in bioinformatics, owing to the many potential applications for robust protein structure prediction methods. Protein structure prediction is essential to bioscience, and its research results are important for other research areas. Methods for the prediction an才d design of protein structures have advanced dramatically. The prediction of protein structure based on average hydrophobic values is discussed and an improved genetic algorithm is proposed to solve the optimization problem of hydrophobic protein structure prediction. An adjustment operator is designed with the average hydrophobic value to prevent the overlapping of amino acid positions. Finally, some numerical experiments are conducted to verify the feasibility and effectiveness of the proposed algorithm by comparing with the traditional HNN algorithm.


2019 ◽  
Author(s):  
Sean M. Cascarina ◽  
Mikaela R. Elder ◽  
Eric D. Ross

AbstractA variety of studies have suggested that low-complexity domains (LCDs) tend to be intrinsically disordered and are relatively rare within structured proteins in the protein data bank (PDB). Although LCDs are often treated as a single class, we previously found that LCDs enriched in different amino acids can exhibit substantial differences in protein metabolism and function. Therefore, we wondered whether the structural conformations of LCDs are likewise dependent on which specific amino acids are enriched within each LCD. Here, we directly examined relationships between enrichment of individual amino acids and secondary structure preferences across the entire PDB proteome. Secondary structure preferences varied as a function of the identity of the amino acid enriched and its degree of enrichment. Furthermore, divergence in secondary structure profiles often occurred for LCDs enriched in physicochemically similar amino acids (e.g. valine vs. leucine), indicating that LCDs composed of related amino acids can have distinct secondary structure preferences. Comparison of LCD secondary structure preferences with numerous pre-existing secondary structure propensity scales resulted in relatively poor correlations for certain types of LCDs, indicating that these scales may not capture secondary structure preferences as sequence complexity decreases. Collectively, these observations provide a highly resolved view of structural preferences among LCDs parsed by the nature and magnitude of single amino acid enrichment.Author SummaryThe structures that proteins adopt are directly related to their amino acid sequences. Low-complexity domains (LCDs) in protein sequences are unusual regions made up of only a few different types of amino acids. Although this is the key feature that classifies sequences as LCDs, the physical properties of LCDs will differ based on the types of amino acids that are found in each domain. For example, the sequences “AAAAAAAAAA”, “EEEEEEEEEE”, and “EEKRKEEEKE” will have very different properties, even though they would all be classified as LCDs by traditional methods. In a previous study, we developed a new method to further divide LCDs into categories that more closely reflect the differences in their physical properties. In this study, we apply that approach to examine the structures of LCDs when sorted into different categories based on their amino acids. This allowed us to define relationships between the types of amino acids in the LCDs and their corresponding structures. Since protein structure is closely related to protein function, this has important implications for understanding the basic functions and properties of LCDs in a variety of proteins.


Author(s):  
Ivan Anishchenko ◽  
Tamuka M. Chidyausiku ◽  
Sergey Ovchinnikov ◽  
Samuel J. Pellock ◽  
David Baker

AbstractThere has been considerable recent progress in protein structure prediction using deep neural networks to infer distance constraints from amino acid residue co-evolution1–3. We investigated whether the information captured by such networks is sufficiently rich to generate new folded proteins with sequences unrelated to those of the naturally occuring proteins used in training the models. We generated random amino acid sequences, and input them into the trRosetta structure prediction network to predict starting distance maps, which as expected are quite featureless. We then carried out Monte Carlo sampling in amino acid sequence space, optimizing the contrast (KL-divergence) between the distance distributions predicted by the network and the background distribution. Optimization from different random starting points resulted in a wide range of proteins with diverse sequences and all alpha, all beta sheet, and mixed alpha-beta structures. We obtained synthetic genes encoding 129 of these network hallucinated sequences, expressed and purified the proteins in E coli, and found that 27 folded to monomeric stable structures with circular dichroism spectra consistent with the hallucinated structures. Thus deep networks trained to predict native protein structures from their sequences can be inverted to design new proteins, and such networks and methods should contribute, alongside traditional physically based models, to the de novo design of proteins with new functions.


Author(s):  
CHANDRAYANI N. ROKDE ◽  
DR.MANALI KSHIRSAGAR

Protein structure prediction (PSP) from amino acid sequence is one of the high focus problems in bioinformatics today. This is due to the fact that the biological function of the protein is determined by its three dimensional structure. The understanding of protein structures is vital to determine the function of a protein and its interaction with DNA, RNA and enzyme. Thus, protein structure is a fundamental area of computational biology. Its importance is intensed by large amounts of sequence data coming from PDB (Protein Data Bank) and the fact that experimentally methods such as X-ray crystallography or Nuclear Magnetic Resonance (NMR)which are used to determining protein structures remains very expensive and time consuming. In this paper, different types of protein structures and methods for its prediction are described.


Sign in / Sign up

Export Citation Format

Share Document