A chemical interpretation of protein electron density maps in the worldwide protein data bank

Abstract: High-quality three-dimensional structural data is of great value for the functional interpretation of biomacromolecules, especially proteins; however, structural quality varies greatly across the entries in the worldwide Protein Data Bank (wwPDB). Since 2008, the wwPDB has required the inclusion of structure factors with the deposition of x-ray crystallographic structures to support the independent evaluation of structures with respect to the underlying experimental data used to derive those structures. However, interpreting the discrepancies between a structural model and its underlying electron density data is difficult, since derived electron density maps use arbitrary electron density units that are inconsistent between maps from different wwPDB entries. Therefore, we have developed a method that converts electron density values into units of electrons. With this conversion, we have developed new methods that can evaluate specific regions of an x-ray crystallographic structure with respect to a physicochemical interpretation of its corresponding electron density map. We have systematically compared all deposited x-ray crystallographic protein models in the wwPDB with their underlying electron density maps, where available, and characterized the electron density in terms of the expected numbers of electrons based on the structural model. The methods generated coherent evaluation metrics across all PDB entries with associated electron density data, and these metrics are consistent with the visualization software normally used for manual quality assessment. To our knowledge, this is the first attempt to derive units of electrons directly from electron density maps without the aid of the underlying structure factors. These new metrics are biochemically informative and can be extremely useful for excluding low-quality structural regions from systematic analyses that span large numbers of PDB entries.
Furthermore, these new metrics will improve the ability of non-crystallographers to evaluate regions of interest within PDB entries, since only the PDB structure and the associated electron density maps are needed. These new methods are available as a well-documented Python package on GitHub and the Python Package Index under a modified Clear BSD open source license.

Author summary: Electron density maps are very useful for validating the x-ray structure models in the Protein Data Bank (PDB). However, non-crystallographers often find electron density maps daunting to use, since doing so requires substantial prior knowledge. This study provides methods that infer chemical information solely from the electron density maps available from the PDB, interpreting electron density and electron density discrepancy values in units of electrons. It also provides methods to evaluate regions of interest in terms of the number of missing or excess electrons, so that a broader audience, such as biologists or bioinformaticians, can make better use of the electron density information available in the PDB, especially for quality control purposes.

Software and full results available at:
https://github.com/MoseleyBioinformaticsLab/pdb_eda (software on GitHub)
https://pypi.org/project/pdb-eda/ (software on PyPI)
https://pdb-eda.readthedocs.io/en/latest/ (documentation on ReadTheDocs)
https://doi.org/10.6084/m9.figshare.7994294 (code and results on FigShare)
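
The idea of expressing map density in electrons can be illustrated with a deliberately simplified sketch. It assumes that summed map density is proportional to electron count, so a single scale factor, fitted by least squares against the electrons expected from the model's atoms, converts density in arbitrary units into electrons. The function names, the element table, and the single global scale factor are hypothetical illustrations, not the pdb_eda API or its actual estimation procedure.

```python
"""Hypothetical sketch: express map density in electrons via one fitted scale factor."""

# Assumed electron counts (atomic numbers) for common protein heavy atoms;
# hydrogens are ignored, as they are usually absent from x-ray models.
ELECTRONS = {"C": 6, "N": 7, "O": 8, "S": 16}


def fit_electron_scale(density_sums, elements):
    """Fit k so that k * density_sum approximates the expected electron count.

    density_sums: summed map density (arbitrary units) assigned to each model atom.
    elements: element symbol of each corresponding model atom.
    Uses ordinary least squares through the origin: k = sum(d*e) / sum(d*d).
    """
    if len(density_sums) != len(elements):
        raise ValueError("density_sums and elements must have equal length")
    expected = [ELECTRONS[element] for element in elements]
    numerator = sum(d * e for d, e in zip(density_sums, expected))
    denominator = sum(d * d for d in density_sums)
    if denominator == 0:
        raise ValueError("density sums must not all be zero")
    return numerator / denominator


def discrepancy_in_electrons(difference_density_sum, scale):
    """Convert a regional sum of difference-map density into electrons.

    Under this sketch's sign convention, a positive result suggests electrons
    missing from the model and a negative result suggests excess modelled electrons.
    """
    return difference_density_sum * scale
```

For instance, density sums of 3.0, 3.5 and 4.0 assigned to a C, N and O atom give a scale factor of 2.0, so a regional difference-density sum of -1.5 would correspond to roughly 3 excess electrons.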

Finding high-quality metal ion-centric regions across the worldwide Protein Data Bank

10.1101/619809 ◽

2019 ◽

Author(s):

Sen Yao ◽

Hunter N.B. Moseley

Keyword(s):

Protein Data Bank ◽

Electron Density ◽

Metal Binding ◽

Metal Ion ◽

Data Bank ◽

Structural Quality ◽

Global Evaluation ◽

Density Maps ◽

Quality Control Criteria ◽

Control Criteria

Abstract: As the number of macromolecular structures in the worldwide Protein Data Bank (wwPDB) continues to grow rapidly, more attention is being paid to the quality of its data, especially for use in aggregated structural and dynamics analyses. In this study, we systematically analyzed the regions within 3.5 Å of all metal ions across all PDB entries with supporting electron density maps available from the PDB in Europe. All resulting metal ion-centric regions were evaluated with respect to four quality-control criteria involving electron density resolution, atom occupancy, symmetry atom exclusion, and regional electron density discrepancy. The metal binding sites passing all four criteria possess high regional structural quality and should be beneficial to a wide variety of downstream analyses. This study demonstrates an approach for the pan-PDB evaluation of metal binding site structural quality with respect to the underlying x-ray crystallographic experimental data represented in the available electron density maps of proteins. For non-crystallographers in particular, we hope to shift the focus and discussion of structural quality from a global evaluation to a regional evaluation, since all structural entries in the wwPDB appear to have regions of both high and low structural quality.
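
The first step, collecting the atoms within 3.5 Å of each metal ion, can be sketched as a simple distance search. This is a minimal illustration assuming Cartesian coordinates in Å for a single set of atoms; the `Atom` type, the `METALS` set, and `metal_centric_regions` are hypothetical names, and crystal-symmetry mates are not generated here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    name: str      # unique atom label, e.g. "ZN1" (hypothetical naming)
    element: str   # element symbol
    xyz: tuple     # Cartesian coordinates in Å


# Hypothetical subset of element symbols treated as metal ions.
METALS = {"NA", "MG", "K", "CA", "MN", "FE", "CO", "NI", "CU", "ZN"}


def metal_centric_regions(atoms, radius=3.5):
    """Map each metal atom's name to the other atoms within `radius` Å of it.

    Brute-force O(n^2) search over one set of coordinates; crystal-symmetry
    mates are not generated here.
    """
    regions = {}
    for center in atoms:
        if center.element.upper() not in METALS:
            continue
        regions[center.name] = [
            atom for atom in atoms
            if atom is not center and math.dist(atom.xyz, center.xyz) <= radius
        ]
    return regions
```

A zinc ion at the origin with a cysteine SG at 2.3 Å and a carbon at 5.0 Å yields a region containing only the SG atom. For large entries, a spatial index (for example a k-d tree) would replace the quadratic loop.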

Finding High-Quality Metal Ion-Centric Regions Across the Worldwide Protein Data Bank

Molecules ◽

10.3390/molecules24173179 ◽

2019 ◽

Vol 24 (17) ◽

pp. 3179

Author(s):

Sen Yao ◽

Hunter N.B. Moseley

Keyword(s):

Protein Data Bank ◽

Electron Density ◽

Metal Binding ◽

Metal Ion ◽

Data Bank ◽

Structural Quality ◽

Global Evaluation ◽

Density Maps ◽

Quality Control Criteria ◽

Control Criteria

As the number of macromolecular structures in the worldwide Protein Data Bank (wwPDB) continues to grow rapidly, more attention is being paid to the quality of its data, especially for use in aggregated structural and dynamics analyses. In this study, we systematically analyzed the regions within 3.5 Å of all metal ions across all PDB entries with supporting electron density maps available from the PDB in Europe. All resulting metal ion-centric regions were evaluated with respect to four quality-control criteria involving electron density resolution, atom occupancy, symmetry atom exclusion, and regional electron density discrepancy. The metal binding sites passing all four criteria possess high regional structural quality and should be beneficial to a wide variety of downstream analyses. This study demonstrates an approach for the pan-PDB evaluation of metal binding site structural quality with respect to the underlying X-ray crystallographic experimental data represented in the available electron density maps of proteins. For non-crystallographers in particular, we hope to shift the focus and discussion of structural quality from a global evaluation to a regional evaluation, since all structural entries in the wwPDB appear to have regions of both high and low structural quality.
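
Applying the four quality-control criteria amounts to a conjunction of per-region tests. The sketch below assumes each region has already been summarized into four values; the `RegionStats` fields, the threshold values, and the use of a summed absolute discrepancy in electrons are hypothetical placeholders, not the cut-offs used in the study.

```python
from dataclasses import dataclass


@dataclass
class RegionStats:
    resolution: float             # data resolution of the entry, in Å
    min_occupancy: float          # lowest atom occupancy within the region
    has_symmetry_atoms: bool      # region contacts crystal-symmetry-related atoms
    discrepancy_electrons: float  # hypothetical summed |difference density|, in electrons


# Hypothetical thresholds chosen for illustration only.
MAX_RESOLUTION = 2.5
MIN_OCCUPANCY = 0.9
MAX_DISCREPANCY = 5.0


def failed_criteria(region):
    """Return the names of the criteria a region fails, in a fixed order."""
    failures = []
    if region.resolution > MAX_RESOLUTION:
        failures.append("resolution")
    if region.min_occupancy < MIN_OCCUPANCY:
        failures.append("occupancy")
    if region.has_symmetry_atoms:
        failures.append("symmetry")
    if region.discrepancy_electrons > MAX_DISCREPANCY:
        failures.append("discrepancy")
    return failures


def passes_all(region):
    """A region is retained only if it fails none of the four criteria."""
    return not failed_criteria(region)
```

Returning the list of failed criteria, rather than a single boolean, makes it easy to tabulate which criterion removes the most regions across the archive.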

Validation of carbohydrate structures: not just nomenclature

Acta Crystallographica Section A Foundations and Advances ◽

10.1107/s2053273314085180 ◽

2014 ◽

Vol 70 (a1) ◽

pp. C1481-C1481

Author(s):

Jon Agirre ◽

Kevin Cowtan

Keyword(s):

Protein Data Bank ◽

Correlation Coefficient ◽

Electron Density ◽

Systematic Study ◽

Data Bank ◽

Real Space ◽

Space Correlation ◽

Structure Factors ◽

Ring Distortion ◽

Experimental Electron Density

Despite the key roles carbohydrates play in a multitude of pathological processes, a large number of the sugar-containing structures deposited in the Protein Data Bank (PDB) show nomenclature errors [1] that persist even after the remediation of the PDB archive [2]. Here we present the results of a systematic study of the conformation and ring distortion of cyclic carbohydrate models for which structure factors have been deposited in the PDB. These models have also been scored using a real-space correlation coefficient calculated between the model and the experimental electron density. The results have enabled us to produce a database of well-refined carbohydrate structures for use within automated sugar-detecting software, to be announced shortly.
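
A real-space correlation coefficient is commonly computed as the Pearson correlation between model-calculated and experimental density sampled at the same grid points around the model. The minimal sketch below assumes those paired samples are already available as two lists; how the grid points were selected or masked in this study is not stated, and `rscc` is a hypothetical name.

```python
import math


def rscc(model_density, observed_density):
    """Pearson correlation between paired model and observed density samples."""
    n = len(model_density)
    if n != len(observed_density) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_m = sum(model_density) / n
    mean_o = sum(observed_density) / n
    cov = sum((m - mean_m) * (o - mean_o)
              for m, o in zip(model_density, observed_density))
    var_m = sum((m - mean_m) ** 2 for m in model_density)
    var_o = sum((o - mean_o) ** 2 for o in observed_density)
    if var_m == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant density")
    return cov / math.sqrt(var_m * var_o)
```

Because the coefficient is invariant to linear rescaling of either map, it can compare maps on arbitrary density scales, but it says nothing about absolute density levels.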

Use of Patterson-based methods automatically to determine the structures of heavy-atom-containing proteins with up to 6000 non-hydrogen atoms in the asymmetric unit

Journal of Applied Crystallography ◽

10.1107/s0021889806028548 ◽

2006 ◽

Vol 39 (5) ◽

pp. 728-734 ◽

Cited By ~ 6

Author(s):

Maria Cristina Burla ◽

Rocco Caliandro ◽

Benedetta Carrozzini ◽

Giovanni Luca Cascarano ◽

Liberato De Caro ◽

...

Keyword(s):

Crystal Structures ◽

Protein Data Bank ◽

Heavy Atom ◽

Data Bank ◽

Atomic Resolution ◽

Asymmetric Unit ◽

Hydrogen Atoms ◽

Heavy Atoms ◽

Density Maps ◽

Resolution Data

The Patterson superposition methods described by Burla et al. [J. Appl. Cryst. (2006), 39, 527–535], based on the use of 'multiple implication functions', have been enriched by supplementary filtering techniques based on some general (resolution-dependent) features of both the Patterson and the electron density maps. The method has been implemented in a modified version of the program SIR2004 and tested on a set of 20 crystal structures selected from the Protein Data Bank, each having more than 2000 non-hydrogen atoms in the asymmetric unit, atomic-resolution data and some heavy atoms (equal to or heavier than Ca). The new phasing procedure is able to solve most of the test structures, including two proteins with more than 6000 non-hydrogen atoms in the asymmetric unit, thereby far exceeding the complexity commonly considered today as the limit for Patterson-based methods (i.e. about 2000 non-hydrogen atoms).
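
The Patterson function underlying these methods is the autocorrelation of the electron density, P(u) = Σ_x ρ(x)ρ(x + u); it equals the Fourier transform of |F|², which is why it can be computed from measured intensities without phases. The toy one-dimensional sketch below evaluates it directly from a known periodic density to show where peaks appear; it does not implement the superposition or implication functions of the paper, and `patterson_1d` is a hypothetical name.

```python
def patterson_1d(density):
    """Periodic autocorrelation P(u) = sum_x rho(x) * rho(x + u) on a 1D grid.

    Peaks appear at interatomic vectors u, with heights proportional to the
    product of the two atoms' densities, which is why heavy atoms dominate.
    """
    n = len(density)
    return [
        sum(density[x] * density[(x + u) % n] for x in range(n))
        for u in range(n)
    ]


# Two "atoms" of density 5 and 3 at grid positions 2 and 7 (period 10).
print(patterson_1d([0, 0, 5, 0, 0, 0, 0, 3, 0, 0]))
# → [34, 0, 0, 0, 0, 30, 0, 0, 0, 0]
```

The origin peak (25 + 9 = 34) sums the self-vectors, and the single cross peak at u = 5 encodes the separation 7 − 2 with height 5 × 3 + 3 × 5; in real structures with thousands of atoms these cross peaks overlap heavily, which is the difficulty superposition methods address.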

Real time structural search of the Protein Data Bank

10.1101/845123 ◽

2019 ◽

Cited By ~ 1

Author(s):

Dmytro Guzenko ◽

Stephen K. Burley ◽

Jose M. Duarte

Keyword(s):

Real Time ◽

Protein Data Bank ◽

Electron Density ◽

Polypeptide Chain ◽

Protein Structures ◽

Data Bank ◽

Zernike Moment ◽

Search Problem ◽

Mathematical Tool ◽

Protein Assemblies

Abstract: Detection of protein structure similarity is a central challenge in structural bioinformatics. Comparisons are usually performed at the polypeptide chain level; however, the functional form of a protein within the cell is often an oligomer. This fact, together with the recent growth of oligomeric structures in the Protein Data Bank (PDB), demands more efficient approaches to oligomeric assembly alignment and retrieval. Traditional methods use atom-level information, which can be complicated by the presence of topological permutations within a polypeptide chain and/or subunit rearrangements. These challenges can be overcome by comparing electron density volumes directly. However, brute-force alignment of 3D data is a compute-intensive search problem. We developed a 3D Zernike moment normalization procedure to orient electron density volumes and assess similarity with unprecedented speed. Similarity searching with this approach enables real-time retrieval of proteins and protein assemblies resembling a target, from the PDB or user input, together with the resulting alignments (http://shape.rcsb.org).

Author Summary: Protein structures possess wildly varied shapes, but patterns at different levels are frequently reused by nature. Finding and classifying these similarities is fundamental to understanding evolution. Given the continued growth in the number of known protein structures in the Protein Data Bank, the task of comparing them to find common patterns is becoming increasingly complicated. This is especially true when considering complete protein assemblies with several polypeptide chains, whose large sizes further complicate the issue. Here we present a novel method that can detect similarity between protein shapes and that works equally fast for proteins or assemblies of any size. The method treats proteins as volumes of density distribution, departing from the more usual approach in the field: similarity assessment based on atomic coordinates and chain connectivity.
A volumetric function can be decomposed with a mathematical tool known as 3D Zernike polynomials, resulting in a compact description as a vector of Zernike moments. The tool was introduced in the 1990s, when it was suggested that the moments could be normalized to be invariant to rotations without losing information. Here we demonstrate that this normalization is indeed possible and that it offers a much more accurate method for assessing similarity between shapes than previous attempts.
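
For context, the widely used earlier way to obtain rotation invariance from 3D Zernike moments Ω_nl^m is to collapse each (n, l) group into its norm, F_nl = sqrt(Σ_m |Ω_nl^m|²), which discards the orientation information that a full normalization would retain. The sketch below shows only that baseline descriptor and a Euclidean comparison; it is not the normalization procedure described above. It assumes the complex moments are already computed and stored in a dictionary keyed by (n, l, m), a hypothetical layout.

```python
import math


def zernike_invariants(moments):
    """Collapse complex moments {(n, l, m): value} into rotation-invariant norms.

    A rotation mixes the m-components within each (n, l) group unitarily,
    so F_nl = sqrt(sum_m |Omega_nl^m|^2) is unchanged by rotation.
    """
    squared = {}
    for (n, l, _m), value in moments.items():
        squared[(n, l)] = squared.get((n, l), 0.0) + abs(value) ** 2
    return {key: math.sqrt(total) for key, total in sorted(squared.items())}


def descriptor_distance(a, b):
    """Euclidean distance between two invariant descriptors (missing keys = 0)."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))
```

Rotating a shape about the z axis multiplies each moment by a phase e^(-imα), which leaves every F_nl, and hence the descriptor distance, unchanged.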

A mosaic bulk-solvent model improves density maps and the fit between model and data

10.1101/2021.12.09.471976 ◽

2021 ◽

Author(s):

Pavel V. Afonine ◽

Paul D. Adams ◽

Oleg V Sobolev ◽

Alexandre Urzhumtsev

Keyword(s):

Protein Data Bank ◽

Model Building ◽

Data Bank ◽

Crystallographic Data ◽

Software Packages ◽

Map Interpretation ◽

Density Maps ◽

Solvent Model ◽

Modeling Power ◽

R Factors

Bulk solvent is a major component of bio-macromolecular crystals and therefore contributes significantly to diffraction intensities. Accurate modeling of the bulk-solvent region has been recognized as important for many crystallographic calculations, from computing R-factors and density maps to model building and refinement. Owing to its simplicity, computational efficiency, and modeling power, the flat (mask-based) bulk-solvent model introduced by Jiang & Brunger (1994) is used by most modern crystallographic software packages to account for disordered solvent. In this manuscript we describe further developments of the mask-based model that improve the fit between the model and the data and aid in map interpretation. The new algorithm, referred to here as the mosaic bulk-solvent model, considers solvent variation across the unit cell. The mosaic model is implemented in the Computational Crystallography Toolbox and can be used in Phenix in most contexts where accounting for bulk solvent is required. It has been optimized and validated using a sufficiently large subset of the Protein Data Bank entries that have crystallographic data available.
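
The contrast between the flat and mosaic models can be sketched at the level of a single reflection. The flat model is commonly written as F_model = k_total (F_calc + k_sol exp(−B_sol s²/4) F_mask) with s = 1/d; the mosaic variant is assumed here to replace the single mask term with a sum of per-region mask contributions, each with its own scale. The function names and the per-region scale inputs are hypothetical; this is a simplified illustration, not the CCTBX implementation.

```python
import math


def k_mask(k_sol, b_sol, d_spacing):
    """Flat-model solvent scale k_sol * exp(-B_sol * s^2 / 4), with s = 1/d (Å^-1)."""
    s = 1.0 / d_spacing
    return k_sol * math.exp(-b_sol * s * s / 4.0)


def f_model_flat(f_calc, f_mask, k_sol, b_sol, d_spacing, k_total=1.0):
    """Complex model structure factor with one flat bulk-solvent mask."""
    return k_total * (f_calc + k_mask(k_sol, b_sol, d_spacing) * f_mask)


def f_model_mosaic(f_calc, f_masks, k_masks, k_total=1.0):
    """Assumed mosaic form: separate scale for each solvent region's mask term."""
    return k_total * (f_calc + sum(k * f for k, f in zip(k_masks, f_masks)))


def r_factor(f_obs, f_model):
    """R = sum | |F_obs| - |F_model| | / sum |F_obs| over matching reflections."""
    numerator = sum(abs(fo - abs(fm)) for fo, fm in zip(f_obs, f_model))
    return numerator / sum(f_obs)
```

The mosaic form has more adjustable scales, so its fit to the data can only match or improve on a single flat mask; whether that improvement is meaningful is what cross-validated R-factors and map quality are used to judge.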

L-Fucose in Crystal Structures of IgG-Fc: Reinterpretation of Experimental Data

Collection of Czechoslovak Chemical Communications ◽

10.1135/cccc20080608 ◽

2008 ◽

Vol 73 (5) ◽

pp. 608-615 ◽

Cited By ~ 2

Author(s):

Petr Kolenko ◽

Tereza Skálová ◽

Jan Dohnálek ◽

Jindřich Hašek

Keyword(s):

Experimental Data ◽

Immune System ◽

Protein Data Bank ◽

Electron Density ◽

Data Bank ◽

System Response ◽

Fc Fragment ◽

Effector Functions ◽

Immune System Response ◽

Good Agreement

Glycosylation of IgG-Fc plays an important role in the activation of the immune system response. Effector functions are modulated by different degrees of deglycosylation of IgG-Fc. However, the geometry of the oligosaccharides covalently bound to IgG-Fc does not appear to agree well with the electron density in most of the structures deposited in the Protein Data Bank. Our study of the correlation between oligosaccharide geometry, connectivity, and electron density reveals several discrepancies, mainly for L-fucose. Re-refinement of the two Fc-fragment structures solved at the highest resolution provides clear evidence for α-L-fucosylation rather than the β-L-fucosylation claimed in most of the deposited Protein Data Bank structures containing the Fc fragment, including the original structures selected for re-refinement. Our re-refinement results in lower R factors, better agreement with the electron density, meaningful contacts, and acceptable geometry of L-fucose.