The accuracy of NMR protein structures in the Protein Data Bank

2021 ◽  
Author(s):  
Nicholas J Fowler ◽  
Adnan Sljoka ◽  
Mike P Williamson

We recently described a method, ANSURR, for measuring the accuracy of NMR protein structures. It is based on comparing residue-specific measures of rigidity from backbone chemical shifts via the random coil index, and from structures. Here, we report the use of ANSURR to analyse NMR ensembles within the Protein Data Bank (PDB). NMR structures cover a wide range of accuracy, which improved over time until about 2005, since when accuracy has not improved. Most structures have accurate secondary structure, but are too floppy, particularly in loops. There is a need for more experimental restraints in loops. The best current accuracy measures are Ramachandran distribution and number of NOE restraints per residue. The precision of structure ensembles correlates with accuracy, as does the number of hydrogen bond restraints per residue. If a structure contains additional components (such as additional polypeptide chains or ligands), then their inclusion improves accuracy. Analysis of over 7000 PDB NMR ensembles is available via our website ansurr.com.

2020 ◽  
Author(s):  
Nicholas J. Fowler ◽  
Adnan Sljoka ◽  
Mike P. Williamson

Abstract We present a method, Accuracy of NMR Structures using Random Coil Index and Rigidity (ANSURR), that measures the accuracy of NMR protein structures. It provides a residue-by-residue comparison of two measures of local rigidity: the Random Coil Index [RCI] (a measure of the extent to which backbone chemical shifts adopt random coil values); and local rigidity predicted by mathematical rigidity theory using the computational method Floppy Inclusion and Rigid Substructure Topology [FIRST], calculated from an NMR structural model. We compare RCI and FIRST using a correlation score (which assesses the location of secondary structure), and an RMSD score (which measures overall rigidity, and mainly assesses hydrogen bond correctness). We test the performance of ANSURR using: (a) structures refined in explicit solvent, which have a much better RMSD score than unrefined structures, though similar correlation; (b) decoy structures generated for 89 NMR structures, against which the experimental NMR structures usually score better, though helical and sheet structures behave differently; (c) conventional predictors of structural accuracy such as number of restraints per residue, restraint violations, energy of structure, RMSD of the ensemble (precision of the calculation), Ramachandran distribution, and clashscore. Comparisons of NMR to crystal structures show that secondary structure is equally accurate in both, but crystal structures tend to be too rigid in loops, whereas NMR structures tend to be too floppy overall.
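
The two-score comparison described above can be illustrated with a minimal sketch. It assumes two per-residue rigidity profiles of equal length that are already on a common scale, and it uses a Spearman-style rank correlation for the correlation score; that choice, the function names, and the sample values are assumptions made for illustration, not details taken from the abstract, and any normalisation ANSURR applies is not reproduced here.

```python
import math


def rank(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_correlation(x, y):
    """Spearman-style correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def rmsd(x, y):
    """Root-mean-square deviation between two per-residue profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical per-residue profiles (higher = more flexible).
shift_derived = [0.05, 0.04, 0.06, 0.30, 0.45, 0.07]
structure_derived = [0.10, 0.08, 0.12, 0.60, 0.80, 0.15]

print("correlation:", rank_correlation(shift_derived, structure_derived))
print("rmsd:", rmsd(shift_derived, structure_derived))
```

In this toy case the two profiles agree on which residues are flexible, so the rank correlation is high, while the structure-derived profile is uniformly more flexible, which shows up in the RMSD. This mirrors the distinction the abstract draws between locating secondary structure and getting overall rigidity right.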


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Nicholas J. Fowler ◽  
Adnan Sljoka ◽  
Mike P. Williamson

Abstract We present a method that measures the accuracy of NMR protein structures. It compares random coil index [RCI] against local rigidity predicted by mathematical rigidity theory, calculated from NMR structures [FIRST], using a correlation score (which assesses secondary structure), and an RMSD score (which measures overall rigidity). We test its performance using: structures refined in explicit solvent, which are much better than unrefined structures; decoy structures generated for 89 NMR structures; and conventional predictors of accuracy such as number of restraints per residue, restraint violations, energy of structure, ensemble RMSD, Ramachandran distribution, and clashscore. Restraint violations and RMSD are poor measures of accuracy. Comparisons of NMR to crystal structures show that secondary structure is equally accurate, but crystal structures are typically too rigid in loops, whereas NMR structures are typically too floppy overall. We show that the method is a useful addition to existing measures of accuracy.


2018 ◽  
Vol 19 (11) ◽  
pp. 3405 ◽  
Author(s):  
Emanuel Peter ◽  
Jiří Černý

In this article, we present a method for the enhanced molecular dynamics simulation of protein and DNA systems called potential of mean force (PMF)-enriched sampling. The method uses partitions derived from the potentials of mean force, which we determined from DNA and protein structures in the Protein Data Bank (PDB). We define a partition function from a set of PDB-derived PMFs, which efficiently compensates for the error introduced by the assumption of a homogeneous partition function from the PDB datasets. The bias based on the PDB-derived partitions is added in the form of a hybrid Hamiltonian using a renormalization method, which adds the PMF-enriched gradient to the system depending on a linear weighting factor and the underlying force field. We validated the method using simulations of dialanine, the folding of TrpCage, and the conformational sampling of the Dickerson–Drew DNA dodecamer. Our results show the potential for the PMF-enriched simulation technique to enrich the conformational space of biomolecules along their order parameters, while we also observe a considerable speed increase in the sampling by factors ranging from 13.1 to 82. The novel method can effectively be combined with enhanced sampling or coarse-graining methods to enrich conformational sampling with a partition derived from the PDB.


Author(s):  
Dominique MIAS-LUCQUIN ◽  
Isaure Chauvot de Beauchêne

We explored the Protein Data Bank (PDB) to collect protein–ssDNA structures and create a multi-conformational docking benchmark including both bound and unbound protein structures. Because ssDNA is highly flexible when not bound, no unbound ssDNA structure is included. For the 143 groups identified as bound-unbound structures of the same protein, we studied the conformational changes in the protein induced by ssDNA binding. Moreover, based on several bound or unbound protein structures in some groups, we also assessed the intrinsic conformational variability in either bound or unbound conditions, and compared it to the supposedly binding-induced modifications. This benchmark is, to our knowledge, the first attempt to survey available structures of protein–ssDNA interactions to such an extent, with the aim of improving computational docking tools dedicated to this kind of molecular interaction.


2019 ◽  
Vol 52 (6) ◽  
pp. 1422-1426
Author(s):  
Rajendran Santhosh ◽  
Namrata Bankoti ◽  
Adgonda Malgonnavar Padmashri ◽  
Daliah Michael ◽  
Jeyaraman Jeyakanthan ◽  
...  

Missing regions in protein crystal structures are those regions that cannot be resolved, mainly owing to poor electron density (if the three-dimensional structure was solved using X-ray crystallography). These missing regions are known to have high B factors and could represent loops with a possibility of being part of an active site of the protein molecule. Thus, they are likely to provide valuable information and play a crucial role in the design of inhibitors and drugs and in protein structure analysis. In view of this, an online database, Missing Regions in Polypeptide Chains (MRPC), has been developed which provides information about the missing regions in protein structures available in the Protein Data Bank. In addition, the new database has an option for users to obtain the above data for non-homologous protein structures (25 and 90%). A user-friendly graphical interface with various options has been incorporated, with a provision to view the three-dimensional structure of the protein along with the missing regions using JSmol. The MRPC database is updated regularly (currently once every three months) and can be accessed freely at the URL http://cluster.physics.iisc.ac.in/mrpc.


2020 ◽  
Vol 49 (D1) ◽  
pp. D452-D457
Author(s):  
Lisanna Paladin ◽  
Martina Bevilacqua ◽  
Sara Errigo ◽  
Damiano Piovesan ◽  
Ivan Mičetić ◽  
...  

Abstract The RepeatsDB database (URL: https://repeatsdb.org/) provides annotations and classification for protein tandem repeat structures from the Protein Data Bank (PDB). Protein tandem repeats are ubiquitous in all branches of the tree of life. The accumulation of solved repeat structures provides new possibilities for classification and detection, but also increases the need for annotation. Here we present RepeatsDB 3.0, which addresses these challenges and presents an extended classification scheme. The major conceptual change compared to the previous version is the hierarchical classification combining top levels based solely on structural similarity (Class > Topology > Fold) with two new levels (Clan > Family) requiring sequence similarity and describing repeat motifs in collaboration with Pfam. Data growth has been addressed with improved mechanisms for browsing the classification hierarchy. A new UniProt-centric view unifies the increasingly frequent annotation of structures from identical or similar sequences. This update of RepeatsDB aligns with our commitment to develop a resource that extracts, organizes and distributes specialized information on tandem repeat protein structures.


2020 ◽  
Vol 21 (6) ◽  
pp. 2243
Author(s):  
Nicolas K. Shinada ◽  
Peter Schmidtke ◽  
Alexandre G. de Brevern

The number of available protein structures in the Protein Data Bank (PDB) has considerably increased in recent years. Thanks to the growth of structures and complexes, numerous large-scale studies have been done in various research areas, e.g., protein–protein, protein–DNA, or in drug discovery. While protein redundancy has so far been managed only with a simple protein sequence identity threshold, the similarity of protein-ligand complexes should also be considered from a structural perspective. Protein-ligand duplicates in the PDB are widely known, but were never quantitatively assessed, as they are quite complex to analyze and compare. Here, we present a specific clustering of protein-ligand structures to avoid bias found in different studies. The methodology is based on binding site superposition, and a combination of weighted Root Mean Square Deviation (RMSD) assessment and hierarchical clustering. Repeated structures of proteins of interest are highlighted and only representative conformations are retained, giving a non-biased view of protein distribution. Three types of cases are described based on the number of distinct conformations identified for each complex. Defining these categories reduces the number of complexes 3.84-fold, and offers more refined results compared to a protein sequence-based method. Widely distinct conformations were analyzed using normalized B-factors. Furthermore, a non-redundant dataset was generated for future molecular interactions analysis or virtual screening studies.
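
The combination of weighted RMSD and hierarchical clustering mentioned above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the binding sites have already been superposed and share the same atom ordering, and the per-atom weights, the complete-linkage criterion, the distance cutoff, and all names are hypothetical choices made for the example.

```python
import math


def weighted_rmsd(coords_a, coords_b, weights):
    """Weighted RMSD between two pre-superposed sets of 3D coordinates."""
    total = 0.0
    for atom_a, atom_b, w in zip(coords_a, coords_b, weights):
        total += w * sum((p - q) ** 2 for p, q in zip(atom_a, atom_b))
    return math.sqrt(total / sum(weights))


def cluster_conformations(structures, weights, cutoff):
    """Naive agglomerative clustering (complete linkage) with a distance cutoff.

    Returns a list of clusters, each a list of indices into `structures`.
    """
    clusters = [[i] for i in range(len(structures))]

    def linkage(c1, c2):
        # Complete linkage: distance between the two farthest members.
        return max(
            weighted_rmsd(structures[i], structures[j], weights)
            for i in c1
            for j in c2
        )

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        dist, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if dist > cutoff:
            break  # remaining clusters are distinct conformations
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Three hypothetical two-atom binding sites; the first two are near-duplicates.
sites = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.1), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0)],
]
print(cluster_conformations(sites, weights=[1.0, 1.0], cutoff=1.0))
```

Keeping one representative per resulting cluster is one way to arrive at a reduced, less redundant set of complexes; how representatives are actually chosen in the study is not stated in the abstract.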


2007 ◽  
Vol 02 (03n04) ◽  
pp. 267-271
Author(s):  
ZOLTÁN SZABADKA ◽  
RAFAEL ÖRDÖG ◽  
VINCE GROLMUSZ

The Protein Data Bank (PDB) is the most important depository of protein structural information, containing more than 45,000 deposited entries today. Because of its inhomogeneous structure, fully automated processing of it is almost impossible. In previous work, we cleaned and restructured the entries in the Protein Data Bank and built the RS-PDB database from the result. Using the RS-PDB database, we draw a Ramachandran plot from 6,593 "perfect" polypeptide chains found in the PDB, containing 1,192,689 residues. This is a more than tenfold increase in the amount of data analyzed before this work. The density of the data points makes it possible to draw a Ramachandran map enhanced with a logarithmic heat map, showing the fine inner structure of the right-handed α-helix region.
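
A logarithmic heat map of this kind can be sketched by binning (φ, ψ) pairs into a grid and log-scaling the counts, so that sparsely populated regions remain visible next to the dense α-helix region. The bin width, the `log10(1 + count)` scaling, and the function name below are assumptions chosen for illustration; the abstract does not specify how the map was constructed.

```python
import math


def ramachandran_log_density(angles, bin_width=10):
    """Bin (phi, psi) pairs in degrees and return log10(1 + count) per bin.

    The grid is indexed as grid[psi_bin][phi_bin], with both angles mapped
    from [-180, 180) onto bins 0 .. 360 / bin_width - 1.
    """
    n_bins = 360 // bin_width
    counts = [[0] * n_bins for _ in range(n_bins)]
    for phi, psi in angles:
        col = int(((phi + 180) % 360) // bin_width)
        row = int(((psi + 180) % 360) // bin_width)
        counts[row][col] += 1
    # Adding 1 before the logarithm keeps empty bins at exactly 0.
    return [[math.log10(1 + c) for c in row] for row in counts]


# Hypothetical sample: nine helical residues and one beta-region residue.
sample = [(-60.0, -45.0)] * 9 + [(-120.0, 130.0)]
grid = ramachandran_log_density(sample)
print(grid[13][12])  # bin holding the nine helical residues
print(grid[31][6])   # bin holding the single beta-region residue
```

The resulting grid can be passed to any plotting routine that renders a 2D array as an image.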


2015 ◽  
Vol 71 (8) ◽  
pp. 1604-1614 ◽  
Author(s):  
Wouter G. Touw ◽  
Robbie P. Joosten ◽  
Gert Vriend

A coordinate-based method is presented to detect peptide bonds that need correction either by a peptide-plane flip or by a trans–cis inversion of the peptide bond. When applied to the whole Protein Data Bank, the method predicts 4617 trans–cis flips and many thousands of hitherto unknown peptide-plane flips. A few examples are highlighted for which a correction of the peptide-plane geometry leads to a correction of the understanding of the structure–function relation. All data, including 1088 manually validated cases, are freely available and the method is available from a web server, a web-service interface and through WHAT_CHECK.


2021 ◽  
Author(s):  
Jakob Toudahl Nielsen ◽  
Frans A.A. Mulder

Abstract NMR chemical shifts (CSs) are delicate reporters of local protein structure, and recent advances in random coil CS (RCCS) prediction and interpretation now offer the compelling prospect of inferring small populations of structure from small deviations from RCCSs. Here, we present CheSPI, a simple and efficient method that provides unbiased and sensitive aggregate measures of local structure and disorder. It is demonstrated that CheSPI can predict even very small amounts of residual structure and robustly delineate subtle differences into four structural classes for intrinsically disordered proteins. For structured regions and proteins, CheSPI can assign up to eight structural classes, which coincide with the well-known DSSP classification. The program is freely available, and can either be invoked from URL www.protein-nmr.org as a web implementation, or run locally from the command line as a Python program. CheSPI generates comprehensive numeric and graphical output for intuitive annotation and visualization of protein structures. A number of examples are provided.

