PDBrenum: a webserver and program providing Protein Data Bank files renumbered according to their UniProt sequences

10.1101/2021.02.14.431128 ◽

2021 ◽

Author(s):

Bulat Faezov ◽

Roland L. Dunbrack

Keyword(s):

Protein Data Bank ◽

Data Bank ◽

Post Translational Modifications ◽

X Ray ◽

X Ray Crystallography ◽

Link Type ◽

Binding Partners ◽

Cryo Electron Microscopy ◽

Comparative Structure ◽

In The Beginning

Abstract: The Protein Data Bank (PDB) was established at Brookhaven National Laboratories in 1971 as an archive for biological macromolecular crystal structures. In the beginning the archive held only seven structures, but as of early 2021 the database holds more than 170,000 structures solved by X-ray crystallography, nuclear magnetic resonance, cryo-electron microscopy, and other methods. Many proteins have been studied under different conditions (e.g., binding partners such as ligands, nucleic acids, or other proteins; mutations and post-translational modifications), thus enabling comparative structure-function studies. However, these studies are made more difficult because authors are allowed by the PDB to number the amino acids in each protein sequence in any manner they wish. As a result, the same protein is numbered differently across the available PDB entries. In addition to the coordinates, there are many fields that contain information regarding specific residues in the sequence of each protein in the entry. Here we provide a webserver and Python3 application that fixes the PDB sequence numbering problem by replacing the author numbering with numbering derived from the corresponding UniProt sequences. We obtain this correspondence from the SIFTS database from PDBe. The server and program can take a list of PDB entries and provide renumbered files in mmCIF format and the legacy PDB format for both asymmetric unit files and biological assembly files provided by PDBe. The server can also take a list of UniProt identifiers (“P04637” or “P53_HUMAN”) and return the desired files.

Availability: Source code is freely available at https://github.com/Faezov/PDBrenum. The webserver is located at: http://dunbrack3.fccc.edu/[email protected] or [email protected].

PDBrenum: A webserver and program providing Protein Data Bank files renumbered according to their UniProt sequences

PLoS ONE ◽

10.1371/journal.pone.0253411 ◽

2021 ◽

Vol 16 (7) ◽

pp. e0253411

Author(s):

Bulat Faezov ◽

Roland L. Dunbrack

Keyword(s):

Protein Data Bank ◽

Data Bank ◽

Signal Peptides ◽

Asymmetric Unit ◽

Post Translational Modifications ◽

X Ray ◽

X Ray Crystallography ◽

Binding Partners ◽

Cryo Electron Microscopy ◽

Comparative Structure

The Protein Data Bank (PDB) was established at Brookhaven National Laboratories in 1971 as an archive for biological macromolecular crystal structures. As of mid-2021, the database holds almost 180,000 structures solved by X-ray crystallography, nuclear magnetic resonance, cryo-electron microscopy, and other methods. Many proteins have been studied under different conditions, including binding partners such as ligands, nucleic acids, or other proteins; mutations; and post-translational modifications, thus enabling extensive comparative structure-function studies. However, these studies are made more difficult because authors are allowed by the PDB to number the amino acids in each protein sequence in any manner they wish. As a result, the same protein is numbered differently across the available PDB entries. For instance, some authors may include N-terminal signal peptides or the N-terminal methionine in the sequence numbering and others may not. In addition to the coordinates, there are many fields that contain structural and functional information regarding specific residues numbered according to the author. Here we provide a webserver and Python3 application that fixes the PDB sequence numbering problem by replacing the author numbering with numbering derived from the corresponding UniProt sequences. We obtain this correspondence from the SIFTS database from PDBe. The server and program can take a list of PDB entries or a list of UniProt identifiers (e.g., “P04637” or “P53_HUMAN”) and provide renumbered files in mmCIF format and the legacy PDB format for both asymmetric unit files and biological assembly files provided by PDBe.
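
The renumbering idea described in this abstract can be illustrated with a minimal Python sketch. It is not the PDBrenum implementation: the `Residue` and `Renumbered` types, the `renumber` function, and the toy mapping are all hypothetical, and the choice to leave unmapped residues at their author number (flagged as unmapped) is an assumption made only for illustration. In practice the mapping would come from a residue-level correspondence such as the one SIFTS provides.

```python
from typing import Dict, List, NamedTuple, Tuple


class Residue(NamedTuple):
    chain: str       # chain identifier
    author_num: int  # residue number chosen by the depositing author
    name: str        # three-letter residue name


class Renumbered(NamedTuple):
    chain: str
    number: int      # UniProt number if mapped, author number otherwise
    name: str
    mapped: bool     # True when a UniProt position was found


def renumber(
    residues: List[Residue],
    mapping: Dict[Tuple[str, int], int],
) -> List[Renumbered]:
    """Replace author numbers with UniProt numbers where a mapping exists.

    `mapping` is a hypothetical (chain, author number) -> UniProt number
    lookup. Residues without an entry keep their author number and are
    flagged as unmapped (an assumption of this sketch).
    """
    out = []
    for res in residues:
        uniprot_num = mapping.get((res.chain, res.author_num))
        if uniprot_num is None:
            out.append(Renumbered(res.chain, res.author_num, res.name, False))
        else:
            out.append(Renumbered(res.chain, uniprot_num, res.name, True))
    return out


# Toy case: the author numbered from 1, but the UniProt sequence has
# 20 extra N-terminal residues (for example, a signal peptide).
residues = [Residue("A", 1, "MET"), Residue("A", 2, "GLU"), Residue("A", 3, "HIS")]
mapping = {("A", 1): 21, ("A", 2): 22}  # residue 3 deliberately left unmapped
result = renumber(residues, mapping)
```

Keeping the `mapped` flag separate from the number makes it easy to see which residues (for example, expression tags) have no UniProt counterpart.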

Macromolecular X-ray crystallography: soon to be a road less travelled?

Acta Crystallographica Section D Structural Biology ◽

10.1107/s2059798320004660 ◽

2020 ◽

Vol 76 (5) ◽

pp. 400-405 ◽

Cited By ~ 1

Author(s):

John H. Beale

Keyword(s):

Electron Microscopy ◽

Life Expectancy ◽

Protein Data Bank ◽

Structural Biology ◽

Data Bank ◽

New Labour ◽

X Ray ◽

X Ray Crystallography ◽

Cryo Electron Microscopy

The number of new X-ray crystallography-based submissions to the Protein Data Bank appears to be at the beginning of a decline, perhaps signalling an end to the era of the dominance of X-ray crystallography within structural biology. This letter, from the viewpoint of a young structural biologist, applies the Copernican method to the life expectancy of crystallography and asks whether the technique is still the mainstay of structural biology. A study of the rate of Protein Data Bank depositions allows a more nuanced analysis of the fortunes of macromolecular X-ray crystallography and shows that cryo-electron microscopy might now be outcompeting crystallography for new labour and talent, perhaps heralding a change in the landscape of the field.
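
For readers unfamiliar with the Copernican method mentioned in this abstract, the sketch below shows the usual form of the argument, commonly attributed to J. Richard Gott: if the present moment is assumed to be a random point in the total lifetime of something, its past duration bounds its future duration at a chosen confidence level. The function name and the 100-year input are hypothetical; the abstract does not state the figures used in the letter.

```python
from typing import Tuple


def copernican_interval(past_duration: float, confidence: float = 0.95) -> Tuple[float, float]:
    """Return (lower, upper) bounds on the remaining lifetime.

    Assumes the observation falls uniformly at random within the total
    lifetime, so the elapsed fraction lies in the central `confidence`
    portion of [0, 1] with probability `confidence`.
    """
    if past_duration <= 0:
        raise ValueError("past_duration must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie strictly between 0 and 1")
    lower = past_duration * (1 - confidence) / (1 + confidence)
    upper = past_duration * (1 + confidence) / (1 - confidence)
    return lower, upper


# Hypothetical input: a technique that has existed for 100 years.
low, high = copernican_interval(100.0, confidence=0.95)
# At 95% confidence the bounds are past/39 and 39 * past.
```

The interval is very wide by construction, which is why such estimates are usually read alongside other evidence such as deposition rates.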

A chemical interpretation of protein electron density maps in the worldwide protein data bank

10.1101/613109 ◽

2019 ◽

Cited By ~ 3

Author(s):

Sen Yao ◽

Hunter N.B. Moseley

Keyword(s):

Protein Data Bank ◽

Electron Density ◽

Structural Model ◽

Data Bank ◽

X Ray ◽

New Methods ◽

Link Type ◽

Density Maps ◽

Structure Factors ◽

Python Package

Abstract: High-quality three-dimensional structural data is of great value for the functional interpretation of biomacromolecules, especially proteins; however, structural quality varies greatly across the entries in the worldwide Protein Data Bank (wwPDB). Since 2008, the wwPDB has required the inclusion of structure factors with the deposition of x-ray crystallographic structures to support the independent evaluation of structures with respect to the underlying experimental data used to derive those structures. However, interpreting the discrepancies between the structural model and its underlying electron density data is difficult, since derived electron density maps use arbitrary electron density units which are inconsistent between maps from different wwPDB entries. Therefore, we have developed a method that converts electron density values into units of electrons. With this conversion, we have developed new methods that can evaluate specific regions of an x-ray crystallographic structure with respect to a physicochemical interpretation of its corresponding electron density map. We have systematically compared all deposited x-ray crystallographic protein models in the wwPDB with their underlying electron density maps, if available, and characterized the electron density in terms of expected numbers of electrons based on the structural model. The methods generated coherent evaluation metrics throughout all PDB entries with associated electron density data, which are consistent with visualization software that would normally be used for manual quality assessment. To our knowledge, this is the first attempt to derive units of electrons directly from electron density maps without the aid of the underlying structure factors. These new metrics are biochemically informative and can be extremely useful for filtering out low-quality structural regions from inclusion into systematic analyses that span large numbers of PDB entries. Furthermore, these new metrics will improve the ability of non-crystallographers to evaluate regions of interest within PDB entries, since only the PDB structure and the associated electron density maps are needed. These new methods are available as a well-documented Python package on GitHub and the Python Package Index under a modified Clear BSD open source license.

Author summary: Electron density maps are very useful for validating the x-ray structure models in the Protein Data Bank (PDB). However, it is often daunting for non-crystallographers to use electron density maps, as it requires a lot of prior knowledge. This study provides methods that can infer chemical information solely from the electron density maps available from the PDB to interpret the electron density and electron density discrepancy values in terms of units of electrons. It also provides methods to evaluate regions of interest in terms of the number of missing or excess electrons, so that a broader audience, such as biologists or bioinformaticians, can also make better use of the electron density information available in the PDB, especially for quality control purposes.

Software and full results are available at:
https://github.com/MoseleyBioinformaticsLab/pdb_eda (software on GitHub)
https://pypi.org/project/pdb-eda/ (software on PyPI)
https://pdb-eda.readthedocs.io/en/latest/ (documentation on ReadTheDocs)
https://doi.org/10.6084/m9.figshare.7994294 (code and results on FigShare)

Worldwide Protein Data Bank validation information: usage and trends

Acta Crystallographica Section D Structural Biology ◽

10.1107/s2059798318003303 ◽

2018 ◽

Vol 74 (3) ◽

pp. 237-244 ◽

Cited By ~ 5

Author(s):

Oliver S. Smart ◽

Vladimír Horský ◽

Swanand Gore ◽

Radka Svobodová Vařeková ◽

Veronika Bendová ◽

...

Keyword(s):

Protein Data Bank ◽

Three Dimensional ◽

Data Bank ◽

Resonance Spectroscopy ◽

X Ray ◽

Validation Metrics ◽

Task Forces ◽

X Ray Crystallography ◽

Structure Properties

Realising the importance of assessing the quality of the biomolecular structures deposited in the Protein Data Bank (PDB), the Worldwide Protein Data Bank (wwPDB) partners established Validation Task Forces to obtain advice on the methods and standards to be used to validate structures determined by X-ray crystallography, nuclear magnetic resonance spectroscopy and three-dimensional electron cryo-microscopy. The resulting wwPDB validation pipeline is an integral part of the wwPDB OneDep deposition, biocuration and validation system. The wwPDB Validation Service webserver (https://validate.wwpdb.org) can be used to perform checks prior to deposition. Here, it is shown how validation metrics can be combined to produce an overall score that allows the ranking of macromolecular structures and domains in search results. The ValTrendsDB database provides users with a convenient way to access and analyse validation information and other properties of X-ray crystal structures in the PDB, including investigating trends in and correlations between different structure properties and validation metrics.

In vivo protein crystallization in combination with highly brilliant radiation sources offers novel opportunities for the structural analysis of post-translationally modified eukaryotic proteins

Acta Crystallographica Section F Structural Biology Communications ◽

10.1107/s2053230x15011450 ◽

2015 ◽

Vol 71 (8) ◽

pp. 929-937 ◽

Cited By ~ 17

Author(s):

Michael Duszenko ◽

Lars Redecke ◽

Celestin Nzanzu Mudogo ◽

Benjamin Philip Sommer ◽

Stefan Mogk ◽

...

Keyword(s):

Structural Analysis ◽

Three Dimensional ◽

Data Bank ◽

Post Translational Modifications ◽

X Ray ◽

X Ray Crystallography ◽

Radiation Sources ◽

Eukaryotic Proteins ◽

Serial Crystallography

During the last decade, the number of three-dimensional structures solved by X-ray crystallography has increased dramatically. By 2014, it had crossed the landmark of 100 000 biomolecular structures deposited in the Protein Data Bank. This tremendous increase in successfully crystallized proteins is primarily owing to improvements in cloning strategies, the automation of the crystallization process and new innovative approaches to monitor crystallization. However, these improvements are mainly restricted to soluble proteins, while the crystallization and structural analysis of membrane proteins or proteins that undergo major post-translational modifications remains challenging. In addition, the need for relatively large crystals for conventional X-ray crystallography usually prevents the analysis of dynamic processes within cells. Thus, the advent of high-brilliance synchrotron and X-ray free-electron laser (XFEL) sources and the establishment of serial crystallography (SFX) have opened new avenues in structural analysis using crystals that were formerly unusable. The successful structure elucidation of cathepsin B, accomplished by the use of microcrystals obtained by in vivo crystallization in baculovirus-infected Sf9 insect cells, clearly proved that crystals grown intracellularly are very well suited for X-ray analysis. Here, methods by which in vivo crystals can be obtained, isolated and used for structural analysis by novel highly brilliant XFEL and synchrotron-radiation sources are summarized and discussed.

Timely deposition of macromolecular structures is necessary for peer review

Acta Crystallographica Section D Biological Crystallography ◽

10.1107/s0907444913024621 ◽

2013 ◽

Vol 69 (12) ◽

pp. 2293-2295 ◽

Cited By ~ 3

Author(s):

Robbie P. Joosten ◽

Hayssam Soueidan ◽

Lodewyk F. A. Wessels ◽

Anastassis Perrakis

Keyword(s):

Peer Review ◽

Protein Data Bank ◽

Significant Proportion ◽

Data Bank ◽

X Ray ◽

X Ray Crystallography ◽

Journal Policies ◽

Clear Description

Most of the macromolecular structures in the Protein Data Bank (PDB), which are used daily by thousands of educators and scientists alike, are determined by X-ray crystallography. It was examined whether the crystallographic models and data were deposited to the PDB at the same time as the publications that describe them were submitted for peer review. This condition is necessary to ensure pre-publication validation and the quality of the PDB public archive. It was found that a significant proportion of PDB entries were submitted to the PDB after peer review of the corresponding publication started, and many were only submitted after peer review had ended. It is argued that clear description of journal policies and effective policing is important for pre-publication validation, which is key in ensuring the quality of the PDB and of peer-reviewed literature.

High-resolution cryo-EM structure of photosystem II reveals damage from high-dose electron beams

Communications Biology ◽

10.1038/s42003-021-01919-3 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Koji Kato ◽

Naoyuki Miyazaki ◽

Tasuku Hamaguchi ◽

Yoshiki Nakajima ◽

Fusamichi Akita ◽

...

Keyword(s):

Photosystem Ii ◽

Physiological State ◽

High Dose ◽

Final State ◽

X Ray ◽

Redox States ◽

X Ray Crystallography ◽

Cryo Electron Microscopy ◽

Redox Active ◽

Beam Damage

Abstract: Photosystem II (PSII) plays a key role in water-splitting and oxygen evolution. X-ray crystallography has revealed its atomic structure and some intermediate structures. However, these structures are in the crystalline state and its final state structure has not been solved. Here we analyzed the structure of PSII in solution at 1.95 Å resolution by single-particle cryo-electron microscopy (cryo-EM). The structure obtained is similar to the crystal structure, but a PsbY subunit was visible in the cryo-EM structure, indicating that it represents its physiological state more closely. Electron beam damage was observed at a high dose in the regions that were easily affected by redox states, and reducing the beam dosage by reducing frames from 50 to 2 yielded a similar resolution but reduced the damage remarkably. This study will serve as a good indicator for determining damage-free cryo-EM structures of not only PSII but also all biological samples, especially redox-active metalloproteins.

Examining the structure of the mature amyloid fibril

Biochemical Society Transactions ◽

10.1042/bst0300521 ◽

2002 ◽

Vol 30 (4) ◽

pp. 521-525 ◽

Cited By ~ 43

Author(s):

O. S. Makin ◽

L. C. Serpell

Keyword(s):

Self Assembly ◽

Rational Design ◽

Amyloid Fibrils ◽

Structural Information ◽

X Ray ◽

X Ray Crystallography ◽

Cryo Electron Microscopy ◽

Fibre Diffraction ◽

Complementary Techniques ◽

Mature Fibril

The pathogenesis of the group of diseases known collectively as the amyloidoses is characterized by the deposition of insoluble amyloid fibrils. These are straight, unbranching structures about 70–120 Å (1 Å = 0.1 nm) in diameter and of indeterminate length formed by the self-assembly of a diverse group of normally soluble proteins. Knowledge of the structure of these fibrils is necessary for the understanding of their abnormal assembly and deposition, possibly leading to the rational design of therapeutic agents for their prevention or disaggregation. Structural elucidation is impeded by fibril insolubility and inability to crystallize, thus preventing the use of X-ray crystallography and solution NMR. CD, Fourier-transform infrared spectroscopy and light scattering have been used in the study of the mechanism of fibril formation. This review concentrates on the structural information about the final, mature fibril and in particular the complementary techniques of cryo-electron microscopy, solid-state NMR and X-ray fibre diffraction.

Structural biology techniques: X-ray crystallography, cryo-electron microscopy, and small-angle X-ray scattering

Practical Approaches to Biological Inorganic Chemistry ◽

10.1016/b978-0-444-64225-7.00010-9 ◽

2020 ◽

pp. 375-416

Author(s):

José A. Brito ◽

Margarida Archer

Keyword(s):

Electron Microscopy ◽

Structural Biology ◽

Small Angle ◽

X Ray ◽

X Ray Crystallography ◽

X Ray Scattering ◽

Cryo Electron Microscopy ◽

Ray Scattering

Predicting X-ray diffuse scattering from translation–libration–screw structural ensembles

Acta Crystallographica Section D Biological Crystallography ◽

10.1107/s1399004715007415 ◽

2015 ◽

Vol 71 (8) ◽

pp. 1657-1667 ◽

Cited By ~ 11

Author(s):

Andrew H. Van Benschoten ◽

Pavel V. Afonine ◽

Thomas C. Terwilliger ◽

Michael E. Wall ◽

Colin J. Jackson ◽

...

Keyword(s):

Diffuse Scattering ◽

Positional Distribution ◽

Data Bank ◽

Bragg Diffraction ◽

Structural Refinement ◽

X Ray ◽

X Ray Crystallography ◽

Atomic Displacements ◽

Technological Advances ◽

Structural Ensembles

Identifying the intramolecular motions of proteins and nucleic acids is a major challenge in macromolecular X-ray crystallography. Because Bragg diffraction describes the average positional distribution of crystalline atoms with imperfect precision, the resulting electron density can be compatible with multiple models of motion. Diffuse X-ray scattering can reduce this degeneracy by reporting on correlated atomic displacements. Although recent technological advances are increasing the potential to accurately measure diffuse scattering, computational modeling and validation tools are still needed to quantify the agreement between experimental data and different parameterizations of crystalline disorder. A new tool, phenix.diffuse, addresses this need by employing Guinier's equation to calculate diffuse scattering from Protein Data Bank (PDB)-formatted structural ensembles. As an example case, phenix.diffuse is applied to translation–libration–screw (TLS) refinement, which models rigid-body displacement for segments of the macromolecule. To enable the calculation of diffuse scattering from TLS-refined structures, phenix.tls_as_xyz builds multi-model PDB files that sample the underlying T, L and S tensors. In the glycerophosphodiesterase GpdQ, alternative TLS-group partitioning and different motional correlations between groups yield markedly dissimilar diffuse scattering maps with distinct implications for molecular mechanism and allostery. These methods demonstrate how, in principle, X-ray diffuse scattering could extend macromolecular structural refinement, validation and analysis.
