NEPRE: a Scoring Function for Protein Structures based on Neighbourhood Preference

ABSTRACT Protein structure prediction relies on two major components: a method to generate good models that are close to the native structure, and a scoring function that can select the good models. Based on statistics from known structures in the Protein Data Bank, a statistical energy function is derived to reflect amino acid neighbourhood preferences. The neighbourhood of an amino acid is defined by its contacting residues, and the energy function is determined by the neighbouring residue types and their relative positions. A scoring algorithm, Nepre, has been implemented and its performance tested on several decoy sets. The results show that the Nepre program can be applied in model ranking to improve the success rate in structure prediction.
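As a rough illustration of how a neighbourhood-preference energy can be derived from counts, the sketch below builds an inverse-Boltzmann style table, E(a, b) = -ln(P_obs(a, b) / P_exp(a, b)), from a toy list of (centre residue, neighbour residue) contacts. It is not the Nepre implementation: it ignores the relative positions that the abstract mentions, and the function names, the pseudocount and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter


def neighbour_energy(contacts, pseudocount=1.0):
    """Build a residue-pair energy table from (centre, neighbour) contacts.

    Hypothetical helper: E(a, b) = -ln(observed / expected), where the
    expected frequency is the product of the two marginal frequencies.
    """
    pair_counts = Counter(contacts)
    centre_counts = Counter(c for c, _ in contacts)
    neigh_counts = Counter(n for _, n in contacts)
    types = sorted(set(centre_counts) | set(neigh_counts))
    k = len(types)
    # Smoothed total so that the observed frequencies sum to one.
    total = len(contacts) + pseudocount * k * k
    energy = {}
    for a in types:
        for b in types:
            observed = (pair_counts[(a, b)] + pseudocount) / total
            p_centre = (centre_counts[a] + pseudocount * k) / total
            p_neigh = (neigh_counts[b] + pseudocount * k) / total
            energy[(a, b)] = -math.log(observed / (p_centre * p_neigh))
    return energy


def score_model(model_contacts, energy):
    """Sum pair energies over a model's contacts; lower means more native-like."""
    return sum(energy.get(pair, 0.0) for pair in model_contacts)


# Toy "training" contacts (made up): L prefers V, K prefers E.
training = ([("L", "V")] * 8 + [("K", "E")] * 8
            + [("L", "E")] * 2 + [("K", "V")] * 2)
table = neighbour_energy(training)
good = score_model([("L", "V"), ("K", "E")], table)
bad = score_model([("L", "E"), ("K", "V")], table)
```

With this toy data, frequently observed pairs receive negative energies and rare pairs positive ones, so ranking candidate models by `score_model` favours those whose contacts resemble the training statistics.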

A novel score for highly accurate and efficient prediction of native protein structures

10.1101/2020.04.23.056945 ◽

2020 ◽

Author(s):

Lu-yun Wu ◽

Xia-yu Xia ◽

Xian-ming Pan

Keyword(s):

Protein Structure ◽

Energy Function ◽

Structure Prediction ◽

Driving Forces ◽

Protein Structures ◽

Scoring Function ◽

Computational Prediction ◽

Detailed Knowledge ◽

Peptide Bonds ◽

X Ray

Abstract Protein structure resolution has lagged far behind sequence determination, as it is often laborious and time-consuming, and in many cases impossible, to resolve individual protein structures. For computational prediction, because detailed knowledge of the driving forces of folding is lacking, how to design an energy function is still an open question. Furthermore, an effective criterion to evaluate the performance of the energy function is also lacking. Here we present a novel knowledge-based energy scoring function that simply considers the interactions of peptide bonds, rather than, as is conventional, the residues or atoms, as the most important energy contribution. This energy scoring was evaluated by selecting the X-ray structure from a large number of possibilities. It not only outperforms the best of the previously published statistical potentials, but also has very low computational expense. In addition, we suggest an alternative criterion to evaluate the performance of the energy scoring function, measured by the template modeling score of the selected rank-one structure. We argue that the comparison should allow for some deviation between the X-ray and predicted structures. Collectively, this accurate and simple energy scoring function, together with the optimized criterion, will significantly advance computational protein structure prediction.
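The proposed criterion, the template modeling score (TM-score) of the rank-one selection, can be sketched as follows. The TM-score formula used here is the standard one, TM = (1/L) Σ 1 / (1 + (d_i / d0)^2) with d0 = 1.24 (L - 15)^(1/3) - 1.8; the sketch assumes the superposition has already been done, so it only takes per-residue distances. The function names, the lower bound of 0.5 Å on d0 and the decoy values are assumptions made for illustration, not details taken from the paper.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in angstroms) of the TM-score."""
    if target_length <= 15:
        return 0.5  # guard for very short chains (assumption of this sketch)
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distances (angstroms) between aligned residue pairs.
    A full implementation would also search for the best superposition.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def rank_one_tm(decoys):
    """Return the TM-score of the decoy with the lowest energy.

    decoys: list of (energy, tm_score_to_native) tuples.
    """
    best = min(decoys, key=lambda decoy: decoy[0])
    return best[1]


# Made-up decoy set: the lowest-energy decoy is not the native (TM = 1.0),
# but it is still close to it, which this criterion gives credit for.
decoys = [(-10.2, 0.81), (-12.5, 0.93), (-3.0, 1.0)]
selected_tm = rank_one_tm(decoys)
```

Measuring the TM-score of the selected structure, rather than only asking whether the native was picked, is what allows for some deviation between the X-ray and predicted structures.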

Amino acid torsion angles enable prediction of protein fold classification

10.21203/rs.2.20475/v1 ◽

2020 ◽

Author(s):

Kun Tian ◽

Xin Zhao ◽

Xiaogeng Wan ◽

Stephen Yau

Keyword(s):

Protein Structure ◽

Amino Acid ◽

Prediction Model ◽

Structure Prediction ◽

High Throughput Sequencing ◽

Protein Structures ◽

Data Bank ◽

Amino Acid Sequences ◽

Single Amino Acid ◽

New Approach

Abstract Background Protein structure can provide insights that help biologists to predict and understand protein functions and interactions. However, the number of known protein structures has not kept pace with the number of protein sequences determined by high-throughput sequencing. Current techniques used to determine the structure of proteins, such as X-ray crystallography and Nuclear Magnetic Resonance (NMR) spectroscopy, are complex and may require a lot of time to analyze the experimental results, especially for large protein molecules. The limitations of these methods have motivated us to create a new approach for protein structure prediction. Results Here we describe a new approach that uses integration and analysis of torsion angle information from the Protein Data Bank to enable prediction of protein structures from amino acid sequences. Our prediction model performed well in comparison with previous methods when applied to the structural classification of two CATH datasets with more than 5000 protein domains. This new prediction model performs well with an average of 92.5% accuracy for structure classification, which is higher than the previous research. We also used our model to predict four known protein structures with a single amino acid sequence, while many other existing methods could only obtain one possible structure for a given sequence. Conclusions The results show that our method provides a new effective and reliable tool for protein structure prediction research.

FASPR: an open-source tool for fast and accurate protein side-chain packing

Bioinformatics ◽

10.1093/bioinformatics/btaa234 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3758-3765 ◽

Cited By ~ 6

Author(s):

Xiaoqiang Huang ◽

Robin Pearce ◽

Yang Zhang

Keyword(s):

Protein Structure ◽

Protein Design ◽

Structure Prediction ◽

Protein Structures ◽

Scoring Function ◽

Supplementary Information ◽

Side Chain ◽

Chain Packing ◽

And Function ◽

Side Chain Packing

Abstract Motivation Protein structure and function are essentially determined by how the side-chain atoms interact with each other. Thus, accurate protein side-chain packing (PSCP) is a critical step toward protein structure prediction and protein design. Despite the importance of the problem, however, the accuracy and speed of current PSCP programs are still not satisfactory. Results We present FASPR for fast and accurate PSCP by using an optimized scoring function in combination with a deterministic searching algorithm. The performance of FASPR was compared with four state-of-the-art PSCP methods (CISRR, RASP, SCATD and SCWRL4) on both native and non-native protein backbones. For the assessment on native backbones, FASPR achieved a good performance by correctly predicting 69.1% of all the side-chain dihedral angles using a stringent tolerance criterion of 20°, compared favorably with SCWRL4, CISRR, RASP and SCATD which successfully predicted 68.8%, 68.6%, 67.8% and 61.7%, respectively. Additionally, FASPR achieved the highest speed for packing the 379 test protein structures in only 34.3 s, which was significantly faster than the control methods. For the assessment on non-native backbones, FASPR showed an equivalent or better performance on I-TASSER predicted backbones and the backbones perturbed from experimental structures. Detailed analyses showed that the major advantage of FASPR lies in the optimal combination of the dead-end elimination and tree decomposition with a well optimized scoring function, which makes FASPR of practical use for both protein structure modeling and protein design studies. Availability and implementation The web server, source code and datasets are freely available at https://zhanglab.ccmb.med.umich.edu/FASPR and https://github.com/tommyhuangthu/FASPR. Supplementary information Supplementary data are available at Bioinformatics online.
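The accuracy measure quoted above, the fraction of side-chain dihedral angles predicted within a 20° tolerance, can be sketched as below. This is only an illustration of the metric, not code from FASPR: the function names and the sample angles are assumptions, and the sketch does not treat symmetric side chains specially.

```python
def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def chi_accuracy(predicted, native, tolerance=20.0):
    """Fraction of dihedral angles within `tolerance` degrees of the native value."""
    if len(predicted) != len(native):
        raise ValueError("predicted and native must have the same length")
    if not predicted:
        return 0.0
    hits = sum(
        1 for p, n in zip(predicted, native)
        if angular_difference(p, n) <= tolerance
    )
    return hits / len(predicted)


# Made-up chi angles (degrees). 179 and -179 differ by only 2 degrees
# once periodicity is taken into account.
predicted_chi = [60.0, 179.0, -65.0, 10.0]
native_chi = [65.0, -179.0, 180.0, 45.0]
accuracy = chi_accuracy(predicted_chi, native_chi)
```

Handling the wrap-around at ±180° matters here, because many side-chain rotamers sit near the trans conformation.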

Highly accurate sequence-based prediction of half-sphere exposures of amino acid residues in proteins

Bioinformatics ◽

10.1093/bioinformatics/btv665 ◽

2015 ◽

Vol 32 (6) ◽

pp. 843-849 ◽

Cited By ~ 51

Author(s):

Rhys Heffernan ◽

Abdollah Dehzangi ◽

James Lyons ◽

Kuldip Paliwal ◽

Alok Sharma ◽

...

Keyword(s):

Protein Structure ◽

Amino Acid ◽

Structure Prediction ◽

Protein Structures ◽

Correlation Coefficients ◽

Accessible Surface Area ◽

Solvent Accessible Surface Area ◽

Supplementary Information ◽

Amino Acid Residues ◽

Solvent Exposure

Abstract Motivation: Solvent exposure of amino acid residues of proteins plays an important role in understanding and predicting protein structure, function and interactions. Solvent exposure can be characterized by several measures including solvent accessible surface area (ASA), residue depth (RD) and contact numbers (CN). More recently, an orientation-dependent contact number called half-sphere exposure (HSE) was introduced by separating the contacts within upper and down half spheres defined according to the Cα-Cβ (HSEβ) vector or neighboring Cα-Cα vectors (HSEα). HSEα calculated from protein structures was found to better describe the solvent exposure over ASA, CN and RD in many applications. Thus, a sequence-based prediction is desirable, as most proteins do not have experimentally determined structures. To the best of our knowledge, there is no method to predict HSEα and only one method to predict HSEβ. Results: This study developed a novel method for predicting both HSEα and HSEβ (SPIDER-HSE) that achieved a consistent performance for 10-fold cross validation and two independent tests. The correlation coefficients between predicted and measured HSEβ (0.73 for upper sphere, 0.69 for down sphere and 0.76 for contact numbers) for the independent test set of 1199 proteins are significantly higher than existing methods. Moreover, predicted HSEα has a higher correlation coefficient (0.46) to the stability change by residue mutants than predicted HSEβ (0.37) and ASA (0.43). The results, together with its easy Cα-atom-based calculation, highlight the potential usefulness of predicted HSEα for protein structure prediction and refinement as well as function prediction. Availability and implementation: The method is available at http://sparks-lab.org. Supplementary information: Supplementary data are available at Bioinformatics online.
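To make the HSEβ definition concrete, the sketch below counts, for one residue, the Cα atoms of other residues that fall in the half sphere pointing along its Cα-Cβ vector (up) and in the opposite half sphere (down). This computes HSE from coordinates, which is the measured quantity, and is not the sequence-based SPIDER-HSE predictor. The function name, the 13 Å radius and the toy coordinates are assumptions made for this example.

```python
import math


def half_sphere_exposure(ca_coords, cb_coord, index, radius=13.0):
    """Return (up, down) C-alpha counts around residue `index`.

    ca_coords: list of (x, y, z) C-alpha positions, one per residue.
    cb_coord:  (x, y, z) C-beta position of residue `index`.
    radius:    sphere radius in angstroms (value assumed for this sketch).
    """
    centre = ca_coords[index]
    # The C-alpha -> C-beta vector defines the "up" half sphere.
    axis = tuple(b - a for a, b in zip(centre, cb_coord))
    up = down = 0
    for j, other in enumerate(ca_coords):
        if j == index:
            continue
        offset = tuple(o - c for c, o in zip(centre, other))
        if math.sqrt(sum(v * v for v in offset)) > radius:
            continue  # outside the sphere: not a contact
        if sum(a * v for a, v in zip(axis, offset)) >= 0.0:
            up += 1
        else:
            down += 1
    return up, down


# Toy coordinates: residue 0 at the origin with its C-beta along +z.
ca = [(0.0, 0.0, 0.0), (0.0, 0.0, 5.0), (0.0, 0.0, -5.0),
      (3.0, 0.0, 4.0), (0.0, 0.0, 20.0)]
hse_up, hse_down = half_sphere_exposure(ca, (0.0, 0.0, 1.5), 0)
contact_number = hse_up + hse_down
```

The sum of the two counts is the ordinary contact number (CN), which is why the abstract reports correlations for the upper sphere, the down sphere and the contact number together.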

Protein Structure Prediction Based on Improved Genetic Algorithm

International Journal of Environmental Science and Development ◽

10.18178/ijesd.2020.11.9.1289 ◽

2020 ◽

Vol 11 (9) ◽

pp. 450-454

Author(s):

Jiaxi Liu ◽

Keyword(s):

Genetic Algorithm ◽

Protein Structure ◽

Amino Acid ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Protein Structures ◽

Three Dimensional ◽

Dimensional Structure ◽

Improved Genetic Algorithm ◽

Research Areas

The prediction of protein three-dimensional structure from amino acid sequence has been a challenging problem in bioinformatics, owing to the many potential applications for robust protein structure prediction methods. Protein structure prediction is essential to bioscience, and its research results are important for other research areas. Methods for the prediction and design of protein structures have advanced dramatically. The prediction of protein structure based on average hydrophobic values is discussed and an improved genetic algorithm is proposed to solve the optimization problem of hydrophobic protein structure prediction. An adjustment operator is designed with the average hydrophobic value to prevent the overlapping of amino acid positions. Finally, some numerical experiments are conducted to verify the feasibility and effectiveness of the proposed algorithm by comparing it with the traditional HNN algorithm.

Amino acid torsion angles enable prediction of protein fold classification

Scientific Reports ◽

10.1038/s41598-020-78465-1 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Kun Tian ◽

Xin Zhao ◽

Xiaogeng Wan ◽

Stephen S.-T. Yau

Keyword(s):

Protein Structure ◽

Amino Acid ◽

Protein Structure Prediction ◽

Structure Prediction ◽

High Throughput Sequencing ◽

Protein Structures ◽

Amino Acid Sequences ◽

Single Amino Acid ◽

New Approach ◽

Protein Functions

Abstract Protein structure can provide insights that help biologists to predict and understand protein functions and interactions. However, the number of known protein structures has not kept pace with the number of protein sequences determined by high-throughput sequencing. Current techniques used to determine the structure of proteins are complex and require a lot of time to analyze the experimental results, especially for large protein molecules. The limitations of these methods have motivated us to create a new approach for protein structure prediction. Here we describe a new approach to predict protein structures and structure classes from amino acid sequences. Our prediction model performs well in comparison with previous methods when applied to the structural classification of two CATH datasets with more than 5000 protein domains. The average accuracy is 92.5% for structure classification, which is higher than that of previous research. We also used our model to predict four known protein structures with a single amino acid sequence, while many other existing methods could only obtain one possible structure for a given sequence. The results show that our method provides a new effective and reliable tool for protein structure prediction research.

A COMPARATIVE STUDY OF PROTEIN TERTIARY STRUCTURE PREDICTION METHODS

International Journal of Computer Science and Informatics ◽

10.47893/ijcsi.2014.1168 ◽

2014 ◽

pp. 15-18

Author(s):

CHANDRAYANI N. ROKDE ◽

DR.MANALI KSHIRSAGAR

Keyword(s):

Protein Structure ◽

Structure Prediction ◽

Tertiary Structure ◽

Sequence Data ◽

Protein Structures ◽

Three Dimensional ◽

Data Bank ◽

Dimensional Structure ◽

X Ray Crystallography ◽

Protein Tertiary Structure Prediction

Protein structure prediction (PSP) from amino acid sequence is one of the high-focus problems in bioinformatics today. This is because the biological function of a protein is determined by its three-dimensional structure. Understanding protein structures is vital for determining the function of a protein and its interaction with DNA, RNA and enzymes. Thus, protein structure prediction is a fundamental area of computational biology. Its importance is intensified by the large amounts of sequence data coming from the PDB (Protein Data Bank) and by the fact that experimental methods used to determine protein structures, such as X-ray crystallography or Nuclear Magnetic Resonance (NMR), remain very expensive and time-consuming. In this paper, different types of protein structures and methods for their prediction are described.

Sequence Specific Dihedral Angle Distribution: Application in Protein Structure Prediction and Evaluation

Plant Tissue Culture and Biotechnology ◽

10.3329/ptcb.v19i2.5439 ◽

2009 ◽

Vol 19 (2) ◽

pp. 217-226

Author(s):

S. M. Minhaz Ud-Dean ◽

Mahdi Muhammad Moosa

Keyword(s):

Protein Structure ◽

Dihedral Angle ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Protein Structures ◽

Angle Distribution ◽

Ramachandran Plot ◽

Specific Data ◽

Specific Distribution ◽

Structure Evaluation

Protein structure prediction and evaluation is one of the major fields of computational biology. Estimation of dihedral angle can provide information about the acceptability of both theoretically predicted and experimentally determined structures. Here we report on the sequence specific dihedral angle distribution of high resolution protein structures available in PDB and have developed Sasichandran, a tool for sequence specific dihedral angle prediction and structure evaluation. This tool will allow evaluation of a protein structure in pdb format from the sequence specific distribution of Ramachandran angles. Additionally, it will allow retrieval of the most probable Ramachandran angles for a given sequence along with the sequence specific data. Key words: Torsion angle, φ-ψ distribution, sequence specific ramachandran plot, Ramasekharan, protein structure appraisal D.O.I. 10.3329/ptcb.v19i2.5439 Plant Tissue Cult. & Biotech. 19(2): 217-226, 2009 (December)
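For readers unfamiliar with how a backbone dihedral (Ramachandran) angle is obtained from coordinates, the sketch below computes the signed torsion angle defined by four points using the usual atan2 formulation. It is a generic geometry helper, not code from the tool described above; the function names and the test points are assumptions. For φ the four atoms would be C(i-1), N(i), Cα(i), C(i), and for ψ they would be N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Four points around a central bond along the z axis.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Collecting (φ, ψ) pairs computed this way over many high-resolution structures, grouped by the local sequence, is one way to obtain the kind of sequence-specific angle distributions the abstract refers to.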

Expanding our knowledge of the protein universe: Modelling of protein structures

Acta Crystallographica Section A Foundations and Advances ◽

10.1107/s2053273314095084 ◽

2014 ◽

Vol 70 (a1) ◽

pp. C491-C491

Author(s):

Jürgen Haas ◽

Alessandro Barbato ◽

Tobias Schmidt ◽

Steven Roth ◽

Andrew Waterhouse ◽

...

Keyword(s):

Computational Modeling ◽

Structure Prediction ◽

Structural Information ◽

Protein Structures ◽

Model Organism ◽

Data Bank ◽

Continuous Model ◽

Structure Modeling ◽

Structure Comparison ◽

Modeling And Prediction

Computational modeling and prediction of three-dimensional macromolecular structures and complexes from their sequence has been a long-standing goal in structural biology. Over the last two decades, a paradigm shift has occurred: starting from a large "knowledge gap" between the huge number of protein sequences and the small number of experimentally known structures, today some form of structural information, either experimental or computational, is available for the majority of amino acids encoded by common model organism genomes. Methods for structure modeling and prediction have made substantial progress over the last decades, and template-based homology modeling techniques have matured to a point where they are now routinely used to complement experimental techniques. However, computational modeling and prediction techniques often fall short in accuracy compared to high-resolution experimental structures, and it is often difficult to convey the expected accuracy and structural variability of a specific model. Retrospectively assessing the quality of blind structure prediction in comparison to experimental reference structures allows benchmarking the state of the art in structure prediction and identifying areas which need further development. The Critical Assessment of Structure Prediction (CASP) experiment has for the last 20 years assessed the progress in the field of protein structure modeling based on predictions for ca. 100 blind prediction targets per experiment, which are carefully evaluated by human experts. The "Continuous Model EvaluatiOn" (CAMEO) project aims to provide a fully automated blind assessment for prediction servers based on weekly pre-released sequences of the Protein Data Bank (PDB). CAMEO has been made possible by the development of novel scoring methods such as lDDT, which are robust against domain movements and so allow for automated continuous structure comparison without human intervention.

AlphaFold at CASP13

Bioinformatics ◽

10.1093/bioinformatics/btz422 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4862-4865 ◽

Cited By ~ 48

Author(s):

Mohammed AlQuraishi

Keyword(s):

Protein Structure ◽

Protein Sequence ◽

Structure Prediction ◽

Computational Prediction ◽

Data Bank ◽

Academic Community ◽

Physical Contact ◽

Evolutionary Analysis ◽

History Of ◽

First Time

Abstract Summary: Computational prediction of protein structure from sequence is broadly viewed as a foundational problem of biochemistry and one of the most difficult challenges in bioinformatics. Once every two years the Critical Assessment of protein Structure Prediction (CASP) experiments are held to assess the state of the art in the field in a blind fashion, by presenting predictor groups with protein sequences whose structures have been solved but have not yet been made publicly available. The first CASP was organized in 1994, and the latest, CASP13, took place last December, when for the first time the industrial laboratory DeepMind entered the competition. DeepMind's entry, AlphaFold, placed first in the Free Modeling (FM) category, which assesses methods on their ability to predict novel protein folds (the Zhang group placed first in the Template-Based Modeling (TBM) category, which assesses methods on predicting proteins whose folds are related to ones already in the Protein Data Bank). DeepMind's success generated significant public interest. Their approach builds on two ideas developed in the academic community during the preceding decade: (i) the use of co-evolutionary analysis to map residue co-variation in protein sequence to physical contact in protein structure, and (ii) the application of deep neural networks to robustly identify patterns in protein sequence and co-evolutionary couplings and convert them into contact maps. In this Letter, we contextualize the significance of DeepMind's entry within the broader history of CASP, relate AlphaFold's methodological advances to prior work, and speculate on the future of this important problem.
