PDBrenum: a webserver and program providing Protein Data Bank files renumbered according to their UniProt sequences

2021 ◽  
Author(s):  
Bulat Faezov ◽  
Roland L. Dunbrack

Abstract: The Protein Data Bank (PDB) was established at Brookhaven National Laboratories in 1971 as an archive for biological macromolecular crystal structures. In the beginning the archive held only seven structures but in early 2021, the database has more than 170,000 structures solved by X-ray crystallography, nuclear magnetic resonance, cryo-electron microscopy, and other methods. Many proteins have been studied under different conditions (e.g., binding partners such as ligands, nucleic acids, or other proteins; mutations and post-translational modifications), thus enabling comparative structure-function studies. However, these studies are made more difficult because authors are allowed by the PDB to number the amino acids in each protein sequence in any manner they wish. This results in the same protein being numbered differently in the available PDB entries. In addition to the coordinates, there are many fields that contain information regarding specific residues in the sequence of each protein in the entry. Here we provide a webserver and Python3 application that fixes the PDB sequence numbering problem by replacing the author numbering with numbering derived from the corresponding UniProt sequences. We obtain this correspondence from the SIFTS database from PDBe. The server and program can take a list of PDB entries and provide renumbered files in mmCIF format and the legacy PDB format for both asymmetric unit files and biological assembly files provided by PDBe. The server can also take a list of UniProt identifiers (“P04637” or “P53_HUMAN”) and return the desired files.

Availability: Source code is freely available at https://github.com/Faezov/PDBrenum. The webserver is located at: http://dunbrack3.fccc.edu/[email protected] or [email protected].

PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0253411
Author(s):  
Bulat Faezov ◽  
Roland L. Dunbrack

The Protein Data Bank (PDB) was established at Brookhaven National Laboratories in 1971 as an archive for biological macromolecular crystal structures. In mid 2021, the database has almost 180,000 structures solved by X-ray crystallography, nuclear magnetic resonance, cryo-electron microscopy, and other methods. Many proteins have been studied under different conditions, including binding partners such as ligands, nucleic acids, or other proteins; mutations, and post-translational modifications, thus enabling extensive comparative structure-function studies. However, these studies are made more difficult because authors are allowed by the PDB to number the amino acids in each protein sequence in any manner they wish. This results in the same protein being numbered differently in the available PDB entries. For instance, some authors may include N-terminal signal peptides or the N-terminal methionine in the sequence numbering and others may not. In addition to the coordinates, there are many fields that contain structural and functional information regarding specific residues numbered according to the author. Here we provide a webserver and Python3 application that fixes the PDB sequence numbering problem by replacing the author numbering with numbering derived from the corresponding UniProt sequences. We obtain this correspondence from the SIFTS database from PDBe. The server and program can take a list of PDB entries or a list of UniProt identifiers (e.g., “P04637” or “P53_HUMAN”) and provide renumbered files in mmCIF format and the legacy PDB format for both asymmetric unit files and biological assembly files provided by PDBe.
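The renumbering idea described above can be illustrated with a minimal Python sketch. It is not the PDBrenum implementation: the function name `renumber_residues`, the dictionary layout, and the example numbers are all assumptions made for illustration. In practice the author-to-UniProt correspondence would come from SIFTS, and how unmapped residues (for example expression tags) are handled is a design choice; here they are simply reported as `None`.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: (chain ID, author residue number) -> UniProt residue number.
ResidueKey = Tuple[str, int]


def renumber_residues(
    residues: List[ResidueKey],
    author_to_uniprot: Dict[ResidueKey, int],
) -> List[Tuple[str, int, Optional[int]]]:
    """Pair each author-numbered residue with its UniProt number.

    Residues with no entry in the mapping get None instead of a number.
    """
    return [
        (chain, author_num, author_to_uniprot.get((chain, author_num)))
        for chain, author_num in residues
    ]


if __name__ == "__main__":
    # Invented example: a construct numbered from 1 by the authors that
    # actually starts at UniProt residue 94, preceded by one tag residue (0).
    mapping = {("A", i): i + 93 for i in range(1, 4)}
    chain_a = [("A", 0), ("A", 1), ("A", 2), ("A", 3)]
    for chain, old, new in renumber_residues(chain_a, mapping):
        print(chain, old, "->", new)
```

Keeping the mapping keyed by chain as well as residue number matters because the same author number can refer to different UniProt positions in different chains of one entry.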


2020 ◽  
Vol 76 (5) ◽  
pp. 400-405 ◽  
Author(s):  
John H. Beale

The number of new X-ray crystallography-based submissions to the Protein Data Bank appears to be at the beginning of a decline, perhaps signalling an end to the era of the dominance of X-ray crystallography within structural biology. This letter, from the viewpoint of a young structural biologist, applies the Copernican method to the life expectancy of crystallography and asks whether the technique is still the mainstay of structural biology. A study of the rate of Protein Data Bank depositions allows a more nuanced analysis of the fortunes of macromolecular X-ray crystallography and shows that cryo-electron microscopy might now be outcompeting crystallography for new labour and talent, perhaps heralding a change in the landscape of the field.
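For readers unfamiliar with the Copernican method, the following Python sketch shows the argument as it is commonly stated (after J. R. Gott): if a phenomenon is observed at a random moment of its lifetime and has already lasted `t_past`, then with confidence `c` its remaining duration lies between `t_past * (1 - c) / (1 + c)` and `t_past * (1 + c) / (1 - c)`. The function name and the example figures are illustrative assumptions and are not taken from the letter.

```python
from typing import Tuple


def copernican_interval(t_past: float, confidence: float = 0.95) -> Tuple[float, float]:
    """Return (lower, upper) bounds on the remaining duration.

    Assumes the observation time is uniformly distributed over the
    total lifetime of the phenomenon.
    """
    if t_past <= 0:
        raise ValueError("t_past must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    lower = t_past * (1.0 - confidence) / (1.0 + confidence)
    upper = t_past * (1.0 + confidence) / (1.0 - confidence)
    return lower, upper


if __name__ == "__main__":
    # Invented example: something that has existed for 60 years,
    # evaluated at 50% confidence.
    low, high = copernican_interval(60.0, confidence=0.5)
    print(f"remaining lifetime between {low:.1f} and {high:.1f} years")
```

At 95% confidence the two factors are 1/39 and 39, which is why the interval is so wide; the method trades precision for requiring almost no prior information.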


2019 ◽  
Author(s):  
Sen Yao ◽  
Hunter N.B. Moseley

Abstract: High-quality three-dimensional structural data is of great value for the functional interpretation of biomacromolecules, especially proteins; however, structural quality varies greatly across the entries in the worldwide Protein Data Bank (wwPDB). Since 2008, the wwPDB has required the inclusion of structure factors with the deposition of x-ray crystallographic structures to support the independent evaluation of structures with respect to the underlying experimental data used to derive those structures. However, interpreting the discrepancies between the structural model and its underlying electron density data is difficult, since derived electron density maps use arbitrary electron density units which are inconsistent between maps from different wwPDB entries. Therefore, we have developed a method that converts electron density values into units of electrons. With this conversion, we have developed new methods that can evaluate specific regions of an x-ray crystallographic structure with respect to a physicochemical interpretation of its corresponding electron density map. We have systematically compared all deposited x-ray crystallographic protein models in the wwPDB with their underlying electron density maps, if available, and characterized the electron density in terms of expected numbers of electrons based on the structural model. The methods generated coherent evaluation metrics throughout all PDB entries with associated electron density data, which are consistent with visualization software that would normally be used for manual quality assessment. To our knowledge, this is the first attempt to derive units of electrons directly from electron density maps without the aid of the underlying structure factors. These new metrics are biochemically informative and can be extremely useful for filtering out low-quality structural regions from inclusion into systematic analyses that span large numbers of PDB entries. Furthermore, these new metrics will improve the ability of non-crystallographers to evaluate regions of interest within PDB entries, since only the PDB structure and the associated electron density maps are needed. These new methods are available as a well-documented Python package on GitHub and the Python Package Index under a modified Clear BSD open source license.

Author summary: Electron density maps are very useful for validating the x-ray structure models in the Protein Data Bank (PDB). However, it is often daunting for non-crystallographers to use electron density maps, as it requires a lot of prior knowledge. This study provides methods that can infer chemical information solely from the electron density maps available from the PDB to interpret the electron density and electron density discrepancy values in terms of units of electrons. It also provides methods to evaluate regions of interest in terms of the number of missing or excess electrons, so that a broader audience, such as biologists or bioinformaticians, can also make better use of the electron density information available in the PDB, especially for quality control purposes.

Software and full results available at:
https://github.com/MoseleyBioinformaticsLab/pdb_eda (software on GitHub)
https://pypi.org/project/pdb-eda/ (software on PyPI)
https://pdb-eda.readthedocs.io/en/latest/ (documentation on ReadTheDocs)
https://doi.org/10.6084/m9.figshare.7994294 (code and results on FigShare)


2018 ◽  
Vol 74 (3) ◽  
pp. 237-244 ◽  
Author(s):  
Oliver S. Smart ◽  
Vladimír Horský ◽  
Swanand Gore ◽  
Radka Svobodová Vařeková ◽  
Veronika Bendová ◽  
...  

Realising the importance of assessing the quality of the biomolecular structures deposited in the Protein Data Bank (PDB), the Worldwide Protein Data Bank (wwPDB) partners established Validation Task Forces to obtain advice on the methods and standards to be used to validate structures determined by X-ray crystallography, nuclear magnetic resonance spectroscopy and three-dimensional electron cryo-microscopy. The resulting wwPDB validation pipeline is an integral part of the wwPDB OneDep deposition, biocuration and validation system. The wwPDB Validation Service webserver (https://validate.wwpdb.org) can be used to perform checks prior to deposition. Here, it is shown how validation metrics can be combined to produce an overall score that allows the ranking of macromolecular structures and domains in search results. The ValTrendsDB database provides users with a convenient way to access and analyse validation information and other properties of X-ray crystal structures in the PDB, including investigating trends in and correlations between different structure properties and validation metrics.


Author(s):  
Michael Duszenko ◽  
Lars Redecke ◽  
Celestin Nzanzu Mudogo ◽  
Benjamin Philip Sommer ◽  
Stefan Mogk ◽  
...  

During the last decade, the number of three-dimensional structures solved by X-ray crystallography has increased dramatically. By 2014, it had crossed the landmark of 100 000 biomolecular structures deposited in the Protein Data Bank. This tremendous increase in successfully crystallized proteins is primarily owing to improvements in cloning strategies, the automation of the crystallization process and new innovative approaches to monitor crystallization. However, these improvements are mainly restricted to soluble proteins, while the crystallization and structural analysis of membrane proteins or proteins that undergo major post-translational modifications remains challenging. In addition, the need for relatively large crystals for conventional X-ray crystallography usually prevents the analysis of dynamic processes within cells. Thus, the advent of high-brilliance synchrotron and X-ray free-electron laser (XFEL) sources and the establishment of serial crystallography (SFX) have opened new avenues in structural analysis using crystals that were formerly unusable. The successful structure elucidation of cathepsin B, accomplished by the use of microcrystals obtained by in vivo crystallization in baculovirus-infected Sf9 insect cells, clearly proved that crystals grown intracellularly are very well suited for X-ray analysis. Here, methods by which in vivo crystals can be obtained, isolated and used for structural analysis by novel highly brilliant XFEL and synchrotron-radiation sources are summarized and discussed.


2013 ◽  
Vol 69 (12) ◽  
pp. 2293-2295 ◽  
Author(s):  
Robbie P. Joosten ◽  
Hayssam Soueidan ◽  
Lodewyk F. A. Wessels ◽  
Anastassis Perrakis

Most of the macromolecular structures in the Protein Data Bank (PDB), which are used daily by thousands of educators and scientists alike, are determined by X-ray crystallography. It was examined whether the crystallographic models and data were deposited to the PDB at the same time as the publications that describe them were submitted for peer review. This condition is necessary to ensure pre-publication validation and the quality of the PDB public archive. It was found that a significant proportion of PDB entries were submitted to the PDB after peer review of the corresponding publication started, and many were only submitted after peer review had ended. It is argued that clear description of journal policies and effective policing is important for pre-publication validation, which is key in ensuring the quality of the PDB and of peer-reviewed literature.


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Koji Kato ◽  
Naoyuki Miyazaki ◽  
Tasuku Hamaguchi ◽  
Yoshiki Nakajima ◽  
Fusamichi Akita ◽  
...  

Abstract: Photosystem II (PSII) plays a key role in water-splitting and oxygen evolution. X-ray crystallography has revealed its atomic structure and some intermediate structures. However, these structures are in the crystalline state and its final state structure has not been solved. Here we analyzed the structure of PSII in solution at 1.95 Å resolution by single-particle cryo-electron microscopy (cryo-EM). The structure obtained is similar to the crystal structure, but a PsbY subunit was visible in the cryo-EM structure, indicating that it represents its physiological state more closely. Electron beam damage was observed at a high dose in the regions that were easily affected by redox states, and reducing the beam dosage by reducing frames from 50 to 2 yielded a similar resolution but reduced the damage remarkably. This study will serve as a good indicator for determining damage-free cryo-EM structures of not only PSII but also all biological samples, especially redox-active metalloproteins.


2002 ◽  
Vol 30 (4) ◽  
pp. 521-525 ◽  
Author(s):  
O. S. Makin ◽  
L. C. Serpell

The pathogenesis of the group of diseases known collectively as the amyloidoses is characterized by the deposition of insoluble amyloid fibrils. These are straight, unbranching structures about 70–120 Å (1 Å = 0.1 nm) in diameter and of indeterminate length formed by the self-assembly of a diverse group of normally soluble proteins. Knowledge of the structure of these fibrils is necessary for the understanding of their abnormal assembly and deposition, possibly leading to the rational design of therapeutic agents for their prevention or disaggregation. Structural elucidation is impeded by fibril insolubility and inability to crystallize, thus preventing the use of X-ray crystallography and solution NMR. CD, Fourier-transform infrared spectroscopy and light scattering have been used in the study of the mechanism of fibril formation. This review concentrates on the structural information about the final, mature fibril and in particular the complementary techniques of cryo-electron microscopy, solid-state NMR and X-ray fibre diffraction.


2015 ◽  
Vol 71 (8) ◽  
pp. 1657-1667 ◽  
Author(s):  
Andrew H. Van Benschoten ◽  
Pavel V. Afonine ◽  
Thomas C. Terwilliger ◽  
Michael E. Wall ◽  
Colin J. Jackson ◽  
...  

Identifying the intramolecular motions of proteins and nucleic acids is a major challenge in macromolecular X-ray crystallography. Because Bragg diffraction describes the average positional distribution of crystalline atoms with imperfect precision, the resulting electron density can be compatible with multiple models of motion. Diffuse X-ray scattering can reduce this degeneracy by reporting on correlated atomic displacements. Although recent technological advances are increasing the potential to accurately measure diffuse scattering, computational modeling and validation tools are still needed to quantify the agreement between experimental data and different parameterizations of crystalline disorder. A new tool, phenix.diffuse, addresses this need by employing Guinier's equation to calculate diffuse scattering from Protein Data Bank (PDB)-formatted structural ensembles. As an example case, phenix.diffuse is applied to translation–libration–screw (TLS) refinement, which models rigid-body displacement for segments of the macromolecule. To enable the calculation of diffuse scattering from TLS-refined structures, phenix.tls_as_xyz builds multi-model PDB files that sample the underlying T, L and S tensors. In the glycerophosphodiesterase GpdQ, alternative TLS-group partitioning and different motional correlations between groups yield markedly dissimilar diffuse scattering maps with distinct implications for molecular mechanism and allostery. These methods demonstrate how, in principle, X-ray diffuse scattering could extend macromolecular structural refinement, validation and analysis.
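Guinier's equation is usually written as I_D(q) = <|F_n(q)|^2> - |<F_n(q)>|^2, where F_n(q) is the structure factor of ensemble member n and the averages run over the ensemble. The Python sketch below evaluates this for a toy ensemble. It is not the phenix.diffuse code: the function names are hypothetical, every atom is given a unit scattering factor, and the phase convention exp(i q·r) is an assumption made to keep the example short.

```python
import cmath
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def structure_factor(coords: Sequence[Vec], q: Vec) -> complex:
    """F(q) = sum over atoms of exp(i q.r), with unit scattering factors."""
    return sum(
        (cmath.exp(1j * (q[0] * x + q[1] * y + q[2] * z)) for x, y, z in coords),
        0j,
    )


def diffuse_intensity(ensemble: Sequence[Sequence[Vec]], q: Vec) -> float:
    """Variance of the structure factor over the ensemble at scattering vector q."""
    if not ensemble:
        raise ValueError("ensemble must contain at least one model")
    factors = [structure_factor(model, q) for model in ensemble]
    n = len(factors)
    mean_of_squares = sum(abs(f) ** 2 for f in factors) / n
    square_of_mean = abs(sum(factors, 0j) / n) ** 2
    return mean_of_squares - square_of_mean


if __name__ == "__main__":
    q = (1.0, 0.0, 0.0)
    rigid = [[(0.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)]]        # identical models
    mobile = [[(0.0, 0.0, 0.0)], [(cmath.pi, 0.0, 0.0)]]  # one atom displaced
    print(diffuse_intensity(rigid, q), diffuse_intensity(mobile, q))
```

An ensemble of identical models gives zero diffuse intensity, which matches the physical picture: diffuse scattering arises only from deviations of individual models from the ensemble average.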

