Finding high-quality metal ion-centric regions across the worldwide Protein Data Bank

2019 ◽  
Author(s):  
Sen Yao ◽  
Hunter N.B. Moseley

Abstract: As the number of macromolecular structures in the worldwide Protein Data Bank (wwPDB) continues to grow rapidly, more attention is being paid to the quality of its data, especially for use in aggregated structural and dynamics analyses. In this study, we systematically analyzed 3.5 Å regions around all metal ions across all PDB entries with supporting electron density maps available from the PDB in Europe. All resulting metal ion-centric regions were evaluated with respect to four quality-control criteria involving electron density resolution, atom occupancy, symmetry atom exclusion, and regional electron density discrepancy. The resulting list of metal binding sites passing all four criteria possesses high regional structural quality and should be beneficial to a wide variety of downstream analyses. This study demonstrates an approach for the pan-PDB evaluation of metal binding site structural quality with respect to underlying x-ray crystallographic experimental data represented in available electron density maps of proteins. For non-crystallographers in particular, we hope to change the focus and discussion of structural quality from a global evaluation to a regional evaluation, since all structural entries in the wwPDB appear to have both regions of high and low structural quality.

Molecules ◽  
2019 ◽  
Vol 24 (17) ◽  
pp. 3179
Author(s):  
Sen Yao ◽  
Hunter N.B. Moseley

As the number of macromolecular structures in the worldwide Protein Data Bank (wwPDB) continues to grow rapidly, more attention is being paid to the quality of its data, especially for use in aggregated structural and dynamics analyses. In this study, we systematically analyzed 3.5 Å regions around all metal ions across all PDB entries with supporting electron density maps available from the PDB in Europe. All resulting metal ion-centric regions were evaluated with respect to four quality-control criteria involving electron density resolution, atom occupancy, symmetry atom exclusion, and regional electron density discrepancy. The resulting list of metal binding sites passing all four criteria possesses high regional structural quality and should be beneficial to a wide variety of downstream analyses. This study demonstrates an approach for the pan-PDB evaluation of metal binding site structural quality with respect to underlying X-ray crystallographic experimental data represented in the available electron density maps of proteins. For non-crystallographers in particular, we hope to change the focus and discussion of structural quality from a global evaluation to a regional evaluation, since all structural entries in the wwPDB appear to have both regions of high and low structural quality.
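The four-criteria screen described in this abstract can be pictured with a minimal Python sketch. Only the 3.5 Å radius and the names of the four criteria come from the abstract; the `Atom` class, the function names, and all threshold parameters are hypothetical and must be supplied by the caller, since the abstract does not state the cutoffs the authors used.

```python
from dataclasses import dataclass
from math import dist

RADIUS = 3.5  # Å; region radius stated in the abstract


@dataclass
class Atom:
    """Hypothetical minimal atom record used only for this illustration."""
    name: str
    xyz: tuple
    occupancy: float = 1.0
    from_symmetry: bool = False  # True if generated by a symmetry operation


def region(metal, atoms, radius=RADIUS):
    """Return the atoms (other than the metal itself) within `radius` of the metal."""
    return [a for a in atoms if a is not metal and dist(a.xyz, metal.xyz) <= radius]


def passes_all(metal, atoms, resolution, discrepancy, *,
               max_resolution, min_occupancy, max_discrepancy):
    """Apply the four criteria; every threshold is an assumption set by the caller."""
    nearby = region(metal, atoms)
    checks = (
        resolution <= max_resolution,                                 # resolution
        all(a.occupancy >= min_occupancy for a in [metal, *nearby]),  # occupancy
        not any(a.from_symmetry for a in nearby),                     # symmetry atoms
        discrepancy <= max_discrepancy,                               # regional discrepancy
    )
    return all(checks)
```

A region is kept only if all four checks succeed, which matches the abstract's description of sites "passing all four criteria"; how the regional discrepancy value itself is computed is outside the scope of this sketch.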


2019 ◽  
Author(s):  
Sen Yao ◽  
Hunter N.B. Moseley

Abstract: High-quality three-dimensional structural data is of great value for the functional interpretation of biomacromolecules, especially proteins; however, structural quality varies greatly across the entries in the worldwide Protein Data Bank (wwPDB). Since 2008, the wwPDB has required the inclusion of structure factors with the deposition of x-ray crystallographic structures to support the independent evaluation of structures with respect to the underlying experimental data used to derive those structures. However, interpreting the discrepancies between the structural model and its underlying electron density data is difficult, since derived electron density maps use arbitrary electron density units which are inconsistent between maps from different wwPDB entries. Therefore, we have developed a method that converts electron density values into units of electrons. With this conversion, we have developed new methods that can evaluate specific regions of an x-ray crystallographic structure with respect to a physicochemical interpretation of its corresponding electron density map. We have systematically compared all deposited x-ray crystallographic protein models in the wwPDB with their underlying electron density maps, if available, and characterized the electron density in terms of expected numbers of electrons based on the structural model. The methods generated coherent evaluation metrics throughout all PDB entries with associated electron density data, which are consistent with visualization software that would normally be used for manual quality assessment. To our knowledge, this is the first attempt to derive units of electrons directly from electron density maps without the aid of the underlying structure factors. These new metrics are biochemically informative and can be extremely useful for filtering out low-quality structural regions from inclusion into systematic analyses that span large numbers of PDB entries. 
Furthermore, these new metrics will improve the ability of non-crystallographers to evaluate regions of interest within PDB entries, since only the PDB structure and the associated electron density maps are needed. These new methods are available as a well-documented Python package on GitHub and the Python Package Index under a modified Clear BSD open source license.

Author summary: Electron density maps are very useful for validating the x-ray structure models in the Protein Data Bank (PDB). However, it is often daunting for non-crystallographers to use electron density maps, as it requires a lot of prior knowledge. This study provides methods that can infer chemical information solely from the electron density maps available from the PDB to interpret the electron density and electron density discrepancy values in terms of units of electrons. It also provides methods to evaluate regions of interest in terms of the number of missing or excess electrons, so that a broader audience, such as biologists or bioinformaticians, can also make better use of the electron density information available in the PDB, especially for quality control purposes.

Software and full results are available at:
https://github.com/MoseleyBioinformaticsLab/pdb_eda (software on GitHub)
https://pypi.org/project/pdb-eda/ (software on PyPI)
https://pdb-eda.readthedocs.io/en/latest/ (documentation on ReadTheDocs)
https://doi.org/10.6084/m9.figshare.7994294 (code and results on FigShare)


2014 ◽  
Vol 70 (a1) ◽  
pp. C1481-C1481
Author(s):  
Jon Agirre ◽  
Kevin Cowtan

Despite the key implications carbohydrates have in a multitude of pathological processes, a large number of the sugar-containing structures deposited into the Protein Data Bank (PDB) show nomenclature errors [1] that persist even after the remediation of the PDB archive [2]. Here we present the results from a systematic study of the conformation and ring distortion of cyclic carbohydrate models for which structure factors have been deposited into the PDB. These models have also been scored using a real-space correlation coefficient calculated between model and experimental electron density. The results have enabled us to produce a database of well-refined carbohydrate structures for use in the framework of an automated sugar-detecting software, to be announced shortly.
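The abstract does not spell out how its real-space correlation coefficient is computed. As a rough illustration, the sketch below assumes the commonly used form: a Pearson correlation between model-calculated and experimental density values sampled at the same grid points. The function name `rscc` and the flat-list input format are assumptions made for this example, not the authors' implementation.

```python
from math import sqrt


def rscc(model, observed):
    """Pearson correlation between two equally long sequences of density values.

    `model` and `observed` are assumed to hold density sampled at the same
    grid points around the residue of interest.
    """
    if len(model) != len(observed) or len(model) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    n = len(model)
    mean_m = sum(model) / n
    mean_o = sum(observed) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, observed))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_m == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant density")
    return cov / sqrt(var_m * var_o)
```

Values near 1 indicate that the model density follows the experimental density closely; values near 0 or below suggest the modeled sugar is poorly supported by the map.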


2006 ◽  
Vol 39 (5) ◽  
pp. 728-734 ◽  
Author(s):  
Maria Cristina Burla ◽  
Rocco Caliandro ◽  
Benedetta Carrozzini ◽  
Giovanni Luca Cascarano ◽  
Liberato De Caro ◽  
...  

The Patterson superposition methods described by Burla et al. [J. Appl. Cryst. (2006), 39, 527–535], based on the use of the 'multiple implication functions', have been enriched by supplementary filtering techniques based on some general (resolution-dependent) features of both the Patterson and the electron density maps. The method has been implemented in a modified version of the program SIR2004 and tested using a set of 20 crystal structures selected from the Protein Data Bank, having a number of non-hydrogen atoms in the asymmetric unit larger than 2000, atomic resolution data and some heavy atoms (equal to or heavier than Ca). The new phasing procedure is able to solve most of the test structures, among which there are two proteins with more than 6000 non-hydrogen atoms in the asymmetric unit, so extending by far the complexity today commonly considered as the limit for Patterson-based methods (i.e. about 2000 non-hydrogen atoms).


2019 ◽  
Author(s):  
Dmytro Guzenko ◽  
Stephen K. Burley ◽  
Jose M. Duarte

Abstract: Detection of protein structure similarity is a central challenge in structural bioinformatics. Comparisons are usually performed at the polypeptide chain level; however, the functional form of a protein within the cell is often an oligomer. This fact, together with recent growth of oligomeric structures in the Protein Data Bank (PDB), demands more efficient approaches to oligomeric assembly alignment/retrieval. Traditional methods use atom-level information, which can be complicated by the presence of topological permutations within a polypeptide chain and/or subunit rearrangements. These challenges can be overcome by comparing electron density volumes directly. However, brute-force alignment of 3D data is a compute-intensive search problem. We developed a 3D Zernike moment normalization procedure to orient electron density volumes and assess similarity with unprecedented speed. Similarity searching with this approach enables real-time retrieval of proteins/protein assemblies resembling a target, from PDB or user input, together with resulting alignments (http://shape.rcsb.org).

Author Summary: Protein structures possess wildly varied shapes, but patterns at different levels are frequently reused by nature. Finding and classifying these similarities is fundamental to understanding evolution. Given the continued growth in the number of known protein structures in the Protein Data Bank, the task of comparing them to find the common patterns is becoming increasingly complicated. This is especially true when considering complete protein assemblies with several polypeptide chains, where the large sizes further complicate the issue. Here we present a novel method that can detect similarity between protein shapes and that works equally fast for any size of proteins or assemblies. The method looks at proteins as volumes of density distribution, departing from what is more usual in the field: similarity assessment based on atomic coordinates and chain connectivity. 
A volumetric function can be decomposed with a mathematical tool known as 3D Zernike polynomials, resulting in a compact description as vectors of Zernike moments. The tool was introduced in the 1990s, when it was suggested that the moments could be normalized to be invariant to rotations without losing information. Here we demonstrate that this normalization is in fact possible and that it offers a much more accurate method for assessing similarity between shapes than previous attempts.


2021 ◽  
Author(s):  
Pavel V. Afonine ◽  
Paul D. Adams ◽  
Oleg V Sobolev ◽  
Alexandre Urzhumtsev

Bulk solvent is a major component of bio-macromolecular crystals and therefore contributes significantly to diffraction intensities. Accurate modeling of the bulk-solvent region has been recognized as important for many crystallographic calculations, from computing of R-factors and density maps to model building and refinement. Owing to its simplicity and computational and modeling power, the flat (mask-based) bulk-solvent model introduced by Jiang & Brunger (1994) is used by most modern crystallographic software packages to account for disordered solvent. In this manuscript we describe further developments of the mask-based model that improve the fit between the model and the data and aid in map interpretation. The new algorithm, here referred to as the mosaic bulk-solvent model, considers solvent variation across the unit cell. The mosaic model is implemented in the Computational Crystallography Toolbox and can be used in Phenix in most contexts where accounting for bulk solvent is required. It has been optimized and validated using a sufficiently large subset of the Protein Data Bank entries that have crystallographic data available.


2008 ◽  
Vol 73 (5) ◽  
pp. 608-615 ◽  
Author(s):  
Petr Kolenko ◽  
Tereza Skálová ◽  
Jan Dohnálek ◽  
Jindřich Hašek

Glycosylation of IgG-Fc plays an important role in the activation of the immune system response. Effector functions are modulated by different degrees of deglycosylation of IgG-Fc. However, the geometry of oligosaccharides covalently bound to IgG-Fc does not seem to be in good agreement with electron density in most of the structures deposited in the Protein Data Bank. Our study of correlation between the oligosaccharide geometry, connectivity, and electron density shows several discrepancies, mainly for L-fucose. Revision of refinement of two structures containing the Fc-fragment solved at the highest resolution brings clear evidence for α-L-fucosylation instead of β-L-fucosylation as it was claimed in most of the deposited structures in the Protein Data Bank containing the Fc-fragment, and also in the original structures selected for re-refinement. Our revision refinement results in a decrease in R factors, better agreement with electron density, meaningful contacts, and acceptable geometry of L-fucose.


2014 ◽  
Vol 70 (a1) ◽  
pp. C1483-C1483
Author(s):  
Heping Zheng ◽  
Mahendra Chordia ◽  
David Cooper ◽  
Ivan Shabalin ◽  
Maksymilian Chruszcz ◽  
...  

Metals play vital roles in both the mechanism and architecture of biological macromolecules, and are the most frequently encountered ligands (i.e. non-solvent heterogeneous chemical atoms) in the determination of macromolecular crystal structures. However, metal coordinating environments in protein structures are not always easy to check in routine validation procedures, resulting in an abundance of misidentified and/or suboptimally modeled metal ions in the Protein Data Bank (PDB). We present a solution to identify these problems in three distinct yet related aspects: (1) coordination chemistry; (2) agreement of experimental B-factors and occupancy; and (3) the composition and motif of the metal binding environment. Due to additional strain introduced by macromolecular backbones, the patterns of coordination of metal binding sites in metal-containing macromolecules are more complex and diverse than those found in inorganic or organometallic chemistry. These complications make a comprehensive library of "permitted" coordination chemistry in protein structures less feasible, and the usage of global parameters such as the bond valence method more practical, in the determination and validation of metal binding environments. Although they are relatively infrequent, there are also cases where the experimental B-factor or occupancy of a metal ion suggests careful examination. We have developed a web-based tool called CheckMyMetal [1] (http://csgid.org/csgid/metal_sites/) for the quick validation of metal binding sites. Moreover, the acquired knowledge of the composition and spatial arrangement (motif) of the coordinating atoms around the metal ion may also help in the modeling of metal binding sites in macromolecular structures. 
All of the studies described herein were performed using the NEIGHBORHOOD SQL database [2], which connects information about all modeled non-solvent heterogeneous chemical motifs in PDB structures by vectors describing all contacts to neighboring residues and atoms. NEIGHBORHOOD has broad applications for the validation and data mining of ligand binding environments in the PDB.
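The bond valence method mentioned in this abstract can be illustrated with a short Python sketch. It assumes the widely used exponential form, in which each metal-ligand bond of length R contributes exp((R0 - R) / b) and the contributions are summed. The function name is hypothetical, the default b = 0.37 Å is the conventional choice rather than a value taken from the abstract, and R0 must be looked up in published tables for the specific ion pair. This is not a description of how CheckMyMetal is implemented.

```python
from math import exp


def bond_valence_sum(bond_lengths, r0, b=0.37):
    """Sum of exp((r0 - R) / b) over the metal-ligand bond lengths R (in Å).

    r0: tabulated reference length for the ion pair (caller-supplied assumption).
    b:  empirical constant; 0.37 Å is the conventional default.
    """
    return sum(exp((r0 - r) / b) for r in bond_lengths)


def valence_deviation(bond_lengths, r0, expected_valence, b=0.37):
    """Absolute difference between the computed sum and the expected oxidation state."""
    return abs(bond_valence_sum(bond_lengths, r0, b) - expected_valence)
```

A large deviation between the summed valence and the expected oxidation state is the kind of global signal that can flag a misidentified or poorly modeled metal ion without requiring a full library of permitted coordination geometries.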


2017 ◽  
Vol 73 (3) ◽  
pp. 223-233 ◽  
Author(s):  
Heping Zheng ◽  
David R. Cooper ◽  
Przemyslaw J. Porebski ◽  
Ivan G. Shabalin ◽  
Katarzyna B. Handing ◽  
...  

Metals are essential in many biological processes, and metal ions are modeled in roughly 40% of the macromolecular structures in the Protein Data Bank (PDB). However, a significant fraction of these structures contain poorly modeled metal-binding sites. CheckMyMetal (CMM) is an easy-to-use metal-binding site validation server for macromolecules that is freely available at http://csgid.org/csgid/metal_sites. The CMM server can detect incorrect metal assignments as well as geometrical and other irregularities in the metal-binding sites. Guidelines for metal-site modeling and validation in macromolecules are illustrated by several practical examples grouped by the type of metal. These examples show CMM users (and crystallographers in general) problems they may encounter during the modeling of a specific metal ion.

