Implementing an X-ray validation pipeline for the Protein Data Bank

There is an increasing realisation that the quality of the biomacromolecular structures deposited in the Protein Data Bank (PDB) archive needs to be assessed critically using established and powerful validation methods. The Worldwide Protein Data Bank (wwPDB) organization has convened several Validation Task Forces (VTFs) to advise on the methods and standards that should be used to validate all of the entries already in the PDB as well as all structures that will be deposited in the future. The recommendations of the X-ray VTF are currently being implemented in a software pipeline. Here, ongoing work on this pipeline is briefly described as well as ways in which validation-related information could be presented to users of structural data.

Worldwide Protein Data Bank validation information: usage and trends

Acta Crystallographica Section D Structural Biology ◽

10.1107/s2059798318003303 ◽

2018 ◽

Vol 74 (3) ◽

pp. 237-244 ◽

Cited By ~ 5

Author(s):

Oliver S. Smart ◽

Vladimír Horský ◽

Swanand Gore ◽

Radka Svobodová Vařeková ◽

Veronika Bendová ◽

...

Keyword(s):

Protein Data Bank ◽

Three Dimensional ◽

Data Bank ◽

Resonance Spectroscopy ◽

X Ray ◽

Validation Metrics ◽

Task Forces ◽

X Ray Crystallography ◽

Structure Properties

Realising the importance of assessing the quality of the biomolecular structures deposited in the Protein Data Bank (PDB), the Worldwide Protein Data Bank (wwPDB) partners established Validation Task Forces to obtain advice on the methods and standards to be used to validate structures determined by X-ray crystallography, nuclear magnetic resonance spectroscopy and three-dimensional electron cryo-microscopy. The resulting wwPDB validation pipeline is an integral part of the wwPDB OneDep deposition, biocuration and validation system. The wwPDB Validation Service webserver (https://validate.wwpdb.org) can be used to perform checks prior to deposition. Here, it is shown how validation metrics can be combined to produce an overall score that allows the ranking of macromolecular structures and domains in search results. The ValTrendsDB database provides users with a convenient way to access and analyse validation information and other properties of X-ray crystal structures in the PDB, including investigating trends in and correlations between different structure properties and validation metrics.
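The idea of combining validation metrics into one ranking score can be illustrated with a minimal sketch. It assumes each metric has already been converted to a percentile rank (0 = worst, 100 = best) and simply averages them; the abstract does not state the formula used, so the averaging rule, the metric names and the entry identifiers below are all hypothetical.

```python
from statistics import mean


def overall_score(percentiles):
    """Average a dict of per-metric percentile ranks (0 = worst, 100 = best).

    Plain averaging is an assumption made for illustration only.
    """
    if not percentiles:
        raise ValueError("at least one metric is required")
    return mean(percentiles.values())


def rank_entries(entries):
    """Return entry identifiers sorted from best to worst overall score."""
    return sorted(entries, key=lambda name: overall_score(entries[name]), reverse=True)


# Hypothetical entries and metric names, invented for this example.
entries = {
    "entry_a": {"clashscore": 90.0, "ramachandran": 80.0, "rfree": 70.0},
    "entry_b": {"clashscore": 40.0, "ramachandran": 60.0, "rfree": 50.0},
    "entry_c": {"clashscore": 95.0, "ramachandran": 85.0, "rfree": 90.0},
}
ranking = rank_entries(entries)
print(ranking)
```

A weighted mean or a minimum over metrics would be equally plausible choices; the point is only that a single scalar per entry makes search results sortable.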

New wwPDB validation pipelines for X-ray, NMR and 3DEM structures

Acta Crystallographica Section A Foundations and Advances ◽

10.1107/s2053273314085210 ◽

2014 ◽

Vol 70 (a1) ◽

pp. C1478-C1478

Author(s):

Swanand Gore ◽

Pieter Hendrickx ◽

Eduardo Sanz-Garcia ◽

Sameer Velankar ◽

Gerard Kleywegt

Keyword(s):

Electron Microscopy ◽

Protein Data Bank ◽

Data Bank ◽

X Ray ◽

Task Forces ◽

Annotation System ◽

Electron Microscopy Data ◽

Microscopy Data ◽

Machine Readable

The Protein Data Bank (PDB) is the single global archive of 3D biomacromolecular structure data. The archive is managed by the Worldwide Protein Data Bank (wwPDB; wwpdb.org) organisation through its partners, the Research Collaboratory for Structural Bioinformatics (RCSB PDB), the Protein Data Bank Japan (PDBj), the Protein Data Bank in Europe (PDBe) and the Biological Magnetic Resonance Bank (BMRB). Analogously, the Electron Microscopy Data Bank (EMDB) is managed by the EMDataBank (emdatabank.org) organisation. A few years ago, realising the needs and opportunities to assess the quality of biomacromolecular structures deposited in the PDB, the wwPDB and EMDataBank partners established Validation Task Forces (VTFs) to advise them on up-to-date and community-agreed methods and standards to validate X-ray, NMR and 3DEM structures and data. All three VTFs have now published their recommendations (1, 2, 3) and these are being implemented as validation-software pipelines. The pipelines are integrated in the new joint wwPDB deposition and annotation system (http://deposit.wwpdb.org/deposition/). In addition, stand-alone servers are provided to allow practising structural biologists to validate models prior to publication and deposition (http://wwpdb.org/validation-servers.html). The validation pipelines and the output they produce (human-readable PDF reports and machine-readable XML files) will be described.
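As a rough illustration of why machine-readable XML output is useful, the sketch below reads entry-level metrics from a small XML string using Python's standard library. The element name `Entry`, the root element and the attribute names are assumptions chosen for the example and are not guaranteed to match the real report schema; the values are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified report; not a real validation file.
SAMPLE_XML = """<validation-report>
  <Entry clashscore="4.2" percent-rama-outliers="0.5" DCC_Rfree="0.23"/>
</validation-report>"""


def read_entry_metrics(xml_text, names):
    """Return the requested numeric attributes of the first Entry element.

    Attributes that are absent from the report are skipped silently.
    """
    root = ET.fromstring(xml_text)
    entry = root.find("Entry")
    if entry is None:
        raise ValueError("no Entry element found")
    metrics = {}
    for name in names:
        raw = entry.get(name)
        if raw is not None:
            metrics[name] = float(raw)
    return metrics


metrics = read_entry_metrics(
    SAMPLE_XML, ["clashscore", "percent-rama-outliers", "DCC_Rfree", "missing"]
)
print(metrics)
```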

Rapid response to emerging biomedical challenges and threats

IUCrJ ◽

10.1107/s2052252521003018 ◽

2021 ◽

Vol 8 (3) ◽

Author(s):

Marek Grabowski ◽

Joanna M. Macnar ◽

Marcin Cymborowski ◽

David R. Cooper ◽

Ivan G. Shabalin ◽

...

Keyword(s):

Protein Data Bank ◽

Biomedical Research ◽

Large Scale ◽

Structural Data ◽

Data Bank ◽

Rapid Response ◽

Multiple Resources

As part of the global mobilization to combat the present pandemic, almost 100 000 COVID-19-related papers have been published and nearly a thousand models of macromolecules encoded by SARS-CoV-2 have been deposited in the Protein Data Bank within less than a year. The avalanche of new structural data has given rise to multiple resources dedicated to assessing the correctness and quality of structural data and models. Here, an approach to evaluate the massive amounts of such data using the resource https://covid19.bioreproducibility.org is described, which offers a template that could be used in large-scale initiatives undertaken in response to future biomedical crises. Broader use of the described methodology could considerably curtail information noise and significantly improve the reproducibility of biomedical research.

Timely deposition of macromolecular structures is necessary for peer review

Acta Crystallographica Section D Biological Crystallography ◽

10.1107/s0907444913024621 ◽

2013 ◽

Vol 69 (12) ◽

pp. 2293-2295 ◽

Cited By ~ 3

Author(s):

Robbie P. Joosten ◽

Hayssam Soueidan ◽

Lodewyk F. A. Wessels ◽

Anastassis Perrakis

Keyword(s):

Peer Review ◽

Protein Data Bank ◽

Significant Proportion ◽

Data Bank ◽

X Ray ◽

X Ray Crystallography ◽

Journal Policies ◽

Clear Description

Most of the macromolecular structures in the Protein Data Bank (PDB), which are used daily by thousands of educators and scientists alike, are determined by X-ray crystallography. It was examined whether the crystallographic models and data were deposited to the PDB at the same time as the publications that describe them were submitted for peer review. This condition is necessary to ensure pre-publication validation and the quality of the PDB public archive. It was found that a significant proportion of PDB entries were submitted to the PDB after peer review of the corresponding publication started, and many were only submitted after peer review had ended. It is argued that clear description of journal policies and effective policing is important for pre-publication validation, which is key in ensuring the quality of the PDB and of peer-reviewed literature.

PDBrenum: a webserver and program providing Protein Data Bank files renumbered according to their UniProt sequences

10.1101/2021.02.14.431128 ◽

2021 ◽

Author(s):

Bulat Faezov ◽

Roland L. Dunbrack

Keyword(s):

Protein Data Bank ◽

Data Bank ◽

Post Translational Modifications ◽

X Ray ◽

X Ray Crystallography ◽

Link Type ◽

Binding Partners ◽

Cryo Electron Microscopy ◽

Comparative Structure ◽

In The Beginning

Abstract: The Protein Data Bank (PDB) was established at Brookhaven National Laboratories in 1971 as an archive for biological macromolecular crystal structures. In the beginning the archive held only seven structures, but as of early 2021 the database holds more than 170,000 structures solved by X-ray crystallography, nuclear magnetic resonance, cryo-electron microscopy, and other methods. Many proteins have been studied under different conditions (e.g., binding partners such as ligands, nucleic acids, or other proteins; mutations and post-translational modifications), thus enabling comparative structure-function studies. However, these studies are made more difficult because authors are allowed by the PDB to number the amino acids in each protein sequence in any manner they wish. This results in the same protein being numbered differently in the available PDB entries. In addition to the coordinates, there are many fields that contain information regarding specific residues in the sequence of each protein in the entry. Here we provide a webserver and Python3 application that fixes the PDB sequence numbering problem by replacing the author numbering with numbering derived from the corresponding UniProt sequences. We obtain this correspondence from the SIFTS database from PDBe. The server and program can take a list of PDB entries and provide renumbered files in mmCIF format and the legacy PDB format for both asymmetric unit files and biological assembly files provided by PDBe. The server can also take a list of UniProt identifiers (“P04637” or “P53_HUMAN”) and return the desired files. Availability: Source code is freely available at https://github.com/Faezov/PDBrenum. The webserver is located at: http://dunbrack3.fccc.edu/[email protected] or [email protected].
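The renumbering step described above can be sketched in a few lines. This is not PDBrenum's code: it assumes the SIFTS correspondence has already been loaded into a plain dictionary keyed by (chain, author residue number), and the function name, data layout and sample values are hypothetical. Residues with no UniProt counterpart are marked with None here; how the real tool handles them is not stated in the abstract.

```python
def renumber(residues, mapping):
    """Replace author residue numbers with UniProt-derived numbers.

    residues: list of (chain, author_number, residue_name) tuples.
    mapping:  dict from (chain, author_number) to UniProt number
              (assumed to have been built from SIFTS beforehand).
    Residues without a mapping get None as their new number.
    """
    renumbered = []
    for chain, author_number, name in residues:
        uniprot_number = mapping.get((chain, author_number))
        renumbered.append((chain, uniprot_number, name))
    return renumbered


# Invented example: the author numbered the chain from 1, while the
# construct is assumed to start at UniProt position 94.
residues = [("A", 1, "MET"), ("A", 2, "GLU"), ("A", 3, "HIS")]
mapping = {("A", 1): 94, ("A", 2): 95}
result = renumber(residues, mapping)
print(result)
```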

Automated and accurate deposition of structures solved by X-ray diffraction to the Protein Data Bank

Acta Crystallographica Section D Biological Crystallography ◽

10.1107/s0907444904019419 ◽

2004 ◽

Vol 60 (10) ◽

pp. 1833-1839 ◽

Cited By ~ 179

Author(s):

Huanwang Yang ◽

Vladimir Guranovic ◽

Shuchismita Dutta ◽

Zukang Feng ◽

Helen M. Berman ◽

...

Keyword(s):

Protein Data Bank ◽

Data Bank ◽

X Ray Diffraction ◽

X Ray

Analyzing Motion Properties of Proteins Affected by Localized Structures From a Robot Kinematics Perspective

Volume 5A: 39th Mechanisms and Robotics Conference ◽

10.1115/detc2015-47010 ◽

2015 ◽

Author(s):

Keisuke Arikawa

Keyword(s):

Protein Data Bank ◽

Complex Shape ◽

Structural Data ◽

Data Bank ◽

Robot Kinematics ◽

Motion Prediction ◽

Serial Manipulators ◽

Localized Structures ◽

Motion Modes ◽

Structural Compliance

On the basis of robot kinematics, we have thus far developed a method for predicting the motion of proteins from their 3D structural data given in the Protein Data Bank (PDB data). In this method, proteins are modeled as serial manipulators constrained by springs and the structural compliance properties of the models are evaluated. We focus on localized instead of whole structures of proteins. Employing the same model used in our method of motion prediction, the motion properties of the localized structures and the relation between the motion properties of localized and whole structures are analyzed. First, we present a method for graphically expressing the deformation of objects with a complex shape, such as proteins, by approximating the shape as a rectangular prism with a mesh on its surface. We then formulate a method for comparing the motion properties of localized structures cleaved from the whole structure and those remaining in it by expressing the motion of the latter using the decomposed motion modes of the former according to the structural compliance. Finally, we show a method for evaluating the effect of a localized structure on the motion properties of proteins by applying forces to localized structures. In the formulations, we demonstrate applications as illustrative examples using the PDB data of a real protein.

Protein Data Bank Japan (PDBj): maintaining a structural data archive and resource description framework format

Nucleic Acids Research ◽

10.1093/nar/gkr811 ◽

2011 ◽

Vol 40 (D1) ◽

pp. D453-D460 ◽

Cited By ~ 88

Author(s):

A. R. Kinjo ◽

H. Suzuki ◽

R. Yamashita ◽

Y. Ikegawa ◽

T. Kudou ◽

...

Keyword(s):

Protein Data Bank ◽

Resource Description Framework ◽

Structural Data ◽

Data Bank ◽

Data Archive ◽

Description Framework ◽

Resource Description

A chemical interpretation of protein electron density maps in the worldwide protein data bank

10.1101/613109 ◽

2019 ◽

Cited By ~ 3

Author(s):

Sen Yao ◽

Hunter N.B. Moseley

Keyword(s):

Protein Data Bank ◽

Electron Density ◽

Structural Model ◽

Data Bank ◽

X Ray ◽

New Methods ◽

Link Type ◽

Density Maps ◽

Structure Factors ◽

Python Package

Abstract: High-quality three-dimensional structural data is of great value for the functional interpretation of biomacromolecules, especially proteins; however, structural quality varies greatly across the entries in the worldwide Protein Data Bank (wwPDB). Since 2008, the wwPDB has required the inclusion of structure factors with the deposition of x-ray crystallographic structures to support the independent evaluation of structures with respect to the underlying experimental data used to derive those structures. However, interpreting the discrepancies between the structural model and its underlying electron density data is difficult, since derived electron density maps use arbitrary electron density units which are inconsistent between maps from different wwPDB entries. Therefore, we have developed a method that converts electron density values into units of electrons. With this conversion, we have developed new methods that can evaluate specific regions of an x-ray crystallographic structure with respect to a physicochemical interpretation of its corresponding electron density map. We have systematically compared all deposited x-ray crystallographic protein models in the wwPDB with their underlying electron density maps, if available, and characterized the electron density in terms of expected numbers of electrons based on the structural model. The methods generated coherent evaluation metrics throughout all PDB entries with associated electron density data, which are consistent with visualization software that would normally be used for manual quality assessment. To our knowledge, this is the first attempt to derive units of electrons directly from electron density maps without the aid of the underlying structure factors. These new metrics are biochemically informative and can be extremely useful for filtering out low-quality structural regions from inclusion into systematic analyses that span large numbers of PDB entries. Furthermore, these new metrics will improve the ability of non-crystallographers to evaluate regions of interest within PDB entries, since only the PDB structure and the associated electron density maps are needed. These new methods are available as a well-documented Python package on GitHub and the Python Package Index under a modified Clear BSD open source license.

Author summary: Electron density maps are very useful for validating the x-ray structure models in the Protein Data Bank (PDB). However, it is often daunting for non-crystallographers to use electron density maps, as it requires a lot of prior knowledge. This study provides methods that can infer chemical information solely from the electron density maps available from the PDB to interpret the electron density and electron density discrepancy values in terms of units of electrons. It also provides methods to evaluate regions of interest in terms of the number of missing or excess electrons, so that a broader audience, such as biologists or bioinformaticians, can also make better use of the electron density information available in the PDB, especially for quality control purposes.

Software and full results available at: https://github.com/MoseleyBioinformaticsLab/pdb_eda (software on GitHub); https://pypi.org/project/pdb-eda/ (software on PyPI); https://pdb-eda.readthedocs.io/en/latest/ (documentation on ReadTheDocs); https://doi.org/10.6084/m9.figshare.7994294 (code and results on FigShare)

GeoMine: interactive pattern mining of protein–ligand interfaces in the Protein Data Bank

Bioinformatics ◽

10.1093/bioinformatics/btaa693 ◽

2020 ◽

Author(s):

Konrad Diedrich ◽

Joel Graef ◽

Katrin Schöning-Stierand ◽

Matthias Rarey

Keyword(s):

Protein Data Bank ◽

Web Application ◽

Pattern Mining ◽

Structural Data ◽

Data Bank ◽

Supplementary Information ◽

User Friendliness ◽

Iterative Search ◽

Potential Applications ◽

Query Generation

Summary: The search for user-defined 3D queries in molecular interfaces is a computationally challenging problem that has not been satisfactorily solved so far. Most of the few existing tools focused on that purpose are desktop based and not openly available. Besides that, they show a lack of query versatility, search efficiency and user-friendliness. We address this issue with GeoMine, a publicly available web application that provides textual, numerical and geometrical search functionality for protein–ligand binding sites derived from structural data contained in the Protein Data Bank (PDB). The query generation is supported by a 3D representation of a start structure that provides interactively selectable elements like atoms, bonds and interactions. GeoMine gives full control over geometric variability in the query while performing a deterministic, precise search. Reasonably selective queries are processed on the entire set of protein–ligand complexes in the PDB within a few minutes. GeoMine offers an interactive and iterative search process of successive result analyses and query adaptations. From the numerous potential applications, we picked two from the field of side-effect analysis to showcase the usefulness of GeoMine. Availability and implementation: GeoMine is part of the ProteinsPlus web application suite and freely available at https://proteins.plus. Supplementary information: Supplementary data are available at Bioinformatics online.
