THE RAMACHANDRAN MAP OF MORE THAN 6,500 PERFECT POLYPEPTIDE CHAINS

The Protein Data Bank (PDB) is the most important depository of protein structural information, containing more than 45,000 deposited entries today. Because of its inhomogeneous structure, fully automated processing of the PDB is almost impossible. In a previous work, we cleaned and re-structured the entries in the Protein Data Bank, and from the result we built the RS-PDB database. Using the RS-PDB database, we draw a Ramachandran plot from 6,593 "perfect" polypeptide chains found in the PDB, which together contain 1,192,689 residues. This is a more than tenfold increase over the data sets analyzed before this work. The density of the data points makes it possible to draw a Ramachandran map enhanced with a logarithmic heat map, showing the fine inner structure of the right-handed α-helix region.
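As a rough illustration of the idea in this abstract, the sketch below computes a backbone dihedral angle from four atom positions and bins (φ, ψ) pairs into a grid with log-scaled counts. It is not the authors' code: the function names `dihedral` and `log_density`, the 10-degree bin width, and the `log10(1 + n)` scaling are all assumptions made for this example. By the usual convention, φ is the dihedral of C(i-1), N(i), CA(i), C(i) and ψ is the dihedral of N(i), CA(i), C(i), N(i+1).

```python
import math
from collections import Counter


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = _dot(b2, _cross(n1, n2))
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def log_density(phi_psi_pairs, bin_width=10):
    """Bin (phi, psi) pairs in degrees and return log10(1 + count) per cell.

    The bin width and the log10(1 + n) scaling are illustrative assumptions.
    """
    n_bins = 360 // bin_width
    counts = Counter()
    for phi, psi in phi_psi_pairs:
        # Shift angles from [-180, 180] to [0, 360]; clamp 180 into the last bin.
        i = min(int((phi + 180) // bin_width), n_bins - 1)
        j = min(int((psi + 180) // bin_width), n_bins - 1)
        counts[(i, j)] += 1
    return {cell: math.log10(1 + n) for cell, n in counts.items()}
```

A logarithmic scale is useful here because the populated regions of the map differ in density by orders of magnitude, so linear shading would hide the sparsely populated cells.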

High throughput processing of the structural information in the protein data bank

Journal of Molecular Graphics and Modelling, doi:10.1016/j.jmgm.2006.08.004, 2007, Vol 25 (6), pp. 831-836. Cited By ~ 10

Author(s): Zoltan Szabadka, Vince Grolmusz

Keyword(s): Protein Data Bank, High Throughput, Structural Information, Data Bank

Protein Data Bank (PDB): Database of Three-Dimensional Structural Information of Biological Macromolecules

Acta Crystallographica Section D Biological Crystallography, doi:10.1107/s0907444998009378, 1998, Vol 54 (6), pp. 1078-1084. Cited By ~ 250

Author(s): Joel L. Sussman, Dawei Lin, Jiansheng Jiang, Nancy O. Manning, Jaime Prilusky, ...

Keyword(s): Nucleic Acids, Protein Data Bank, Structural Information, National Laboratory, Three Dimensional, Data Bank, Brookhaven National Laboratory, Biological Macromolecules

The Protein Data Bank (PDB) at Brookhaven National Laboratory is a database containing experimentally determined three-dimensional structures of proteins, nucleic acids and other biological macromolecules, with approximately 8000 entries. Data are easily submitted via PDB's WWW-based tool AutoDep, in either mmCIF or PDB format, and are most conveniently examined via PDB's WWW-based tool 3DB Browser.

Is the growth rate of Protein Data Bank sufficient to solve the protein structure prediction problem using template-based modeling?

Bio-Algorithms and Med-Systems, doi:10.1515/bams-2014-0024, 2015, Vol 11 (1), pp. 1-7. Cited By ~ 4

Author(s): Michal Brylinski

Keyword(s): Protein Structure, Protein Data Bank, Protein Structure Prediction, Structure Prediction, Structural Information, Three Dimensional, Data Bank, Prediction Problem, Three Dimensional Models, Protein Structure Prediction Problem

Abstract: The Protein Data Bank (PDB) undergoes an exponential expansion in terms of the number of macromolecular structures deposited every year. A pivotal question is how this rapid growth of structural information improves the quality of three-dimensional models constructed by contemporary bioinformatics approaches. To address this problem, we performed a retrospective analysis of the structural coverage of a representative set of proteins using remote homology detected by COMPASS and HHpred. We show that the number of proteins whose structures can be confidently predicted increased during a 9-year period between 2005 and 2014 on account of the PDB growth alone. Nevertheless, this encouraging trend slowed down noticeably around the year 2008 and has yielded insignificant improvements ever since. At the current pace, it is unlikely that the protein structure prediction problem will be solved in the near future using existing template-based modeling techniques. Therefore, further advances in experimental structure determination, qualitatively better approaches in fold recognition, and more accurate template-free structure prediction methods are desperately needed.

Interchain interaction of 13C NMR chemical shift and electronic structure of polypeptide chains in the solid state as studied by tight-binding MO theory: poly(l-alanine) with the right-handed and left-handed α-helix forms

Journal of Molecular Structure THEOCHEM, doi:10.1016/0166-1280(91)85222-s, 1991, Vol 231, pp. 231-242. Cited By ~ 9

Author(s): Hiromichi Kurosu, Isao Ando

Keyword(s): Electronic Structure, Solid State, Chemical Shift, Tight Binding, Interchain Interaction, Left Handed, α Helix, The Right, Polypeptide Chains, MO Theory

Lemon: a framework for rapidly mining structural information from the Protein Data Bank

Bioinformatics, doi:10.1093/bioinformatics/btz178, 2019, Vol 35 (20), pp. 4165-4167. Cited By ~ 1

Author(s): Jonathan Fine, Gaurav Chopra

Keyword(s): Protein Data Bank, Structural Information, Computational Cost, Data Bank, Structural Features, Supplementary Information, Develop Software, Reading Text, Essential Resource, 3D Descriptors

Abstract. Motivation: The Protein Data Bank (PDB) currently holds over 140 000 biomolecular structures and continues to release new structures on a weekly basis. The PDB is an essential resource for the structural bioinformatics community to develop software that mines, uses, categorizes and analyzes such data. New computational biology methods are evaluated using custom benchmarking sets derived as subsets of 3D experimentally determined structures and structural features from the PDB. Currently, such benchmarking features are manually curated with custom scripts in a non-standardized manner that results in slow distribution and updates with new experimental structures. Finally, there is a scarcity of standardized tools to rapidly query 3D descriptors of the entire PDB. Results: Our solution is the Lemon framework, a C++11 library with Python bindings, which provides a consistent workflow methodology for selecting biomolecular interactions based on user criteria and computing desired 3D structural features. This framework can parse and characterize the entire PDB in <10 min on modern, multithreaded hardware. The speed in parsing is obtained by using the recently developed MacroMolecule Transmission Format to reduce the computational cost of reading text-based PDB files. The use of C++ lambda functions and Python bindings provides extensive flexibility for analysis and categorization of the PDB by allowing the user to write custom functions to suit their objective. We think Lemon will become a one-stop-shop to quickly mine the entire PDB to generate desired structural biology features. Availability and implementation: The Lemon software is available as a C++ header library along with a PyPI package and example functions at https://github.com/chopralab/lemon. Supplementary information: Supplementary data are available at Bioinformatics online.

Lemon: a framework for rapidly mining structural information from the Protein Data Bank

doi:10.1101/379891, 2018

Author(s): Jonathan Fine, Gaurav Chopra

Keyword(s): Protein Data Bank, Structural Information, Computational Cost, Data Bank, Structural Features, Develop Software, Reading Text, One Stop, Essential Resource, 3D Descriptors

Abstract. Motivation: The Protein Data Bank (PDB) currently holds over 140,000 biomolecular structures and continues to release new structures on a weekly basis. The PDB is an essential resource for the structural bioinformatics community to develop software that mines, uses, categorizes, and analyzes such data. New computational biology methods are evaluated using custom benchmarking sets derived as subsets of 3D experimentally determined structures and structural features from the PDB. Currently, such benchmarking features are manually curated with custom scripts in a non-standardized manner that results in slow distribution and updates with new experimental structures. Finally, there is a scarcity of standardized tools to rapidly query 3D descriptors of the entire PDB. Approach: Our solution is the Lemon framework, a C++11 library with Python bindings, which provides a consistent workflow methodology for selecting biomolecular interactions based on user criteria and computing desired 3D structural features. This framework can parse and characterize the entire PDB in less than ten minutes on modern, multithreaded hardware. The speed in parsing is obtained by using the recently developed MacroMolecule Transmission Format (MMTF) to reduce the computational cost of reading text-based PDB files. The use of C++ lambda functions and Python bindings provides extensive flexibility for analysis and categorization of the PDB by allowing the user to write custom functions to suit their objective. We think Lemon will become a one-stop-shop to quickly mine the entire PDB to generate desired structural biology features. The Lemon software is available as a C++ header library along with example functions at https://github.com/chopralab/lemon.

Protein Data Bank (PDB): A Database of 3D Structural Information of Biological Macromolecules

Encyclopedia of Computational Chemistry, doi:10.1002/0470845015.cpa022f, 2002

Author(s): Joel L. Sussman, Frances C. Bernstein, Jiansheng Jiang, Michael Libeson, Dawei Lin, ...

Keyword(s): Protein Data Bank, Structural Information, Data Bank, Biological Macromolecules

Lemon: a framework for rapidly mining structural information from the protein data bank for the development of virtual screening benchmarking sets

doi:10.1021/scimeetings.0c06740, 2020

Author(s): Gaurav Chopra, Matthew Muhoberac, Jonathan Fine

Keyword(s): Virtual Screening, Protein Data Bank, Structural Information, Data Bank

The accuracy of NMR protein structures in the Protein Data Bank

doi:10.1101/2021.04.05.438442, 2021

Author(s): Nicholas J Fowler, Adnan Sljoka, Mike P Williamson

Keyword(s): Protein Data Bank, Chemical Shifts, Protein Structures, Data Bank, Random Coil, Accuracy Analysis, NMR Structures, Wide Range, Polypeptide Chains, Current Accuracy

We recently described a method, ANSURR, for measuring the accuracy of NMR protein structures. It is based on comparing residue-specific measures of rigidity from backbone chemical shifts via the random coil index, and from structures. Here, we report the use of ANSURR to analyse NMR ensembles within the Protein Data Bank (PDB). NMR structures cover a wide range of accuracy, which improved over time until about 2005 and has not improved since. Most structures have accurate secondary structure, but are too floppy, particularly in loops. There is a need for more experimental restraints in loops. The best current accuracy measures are Ramachandran distribution and number of NOE restraints per residue. The precision of structure ensembles correlates with accuracy, as does the number of hydrogen bond restraints per residue. If a structure contains additional components (such as additional polypeptide chains or ligands), then their inclusion improves accuracy. Analysis of over 7000 PDB NMR ensembles is available via our website ansurr.com.

Lithium-Protein Interactions: Analysis of Lithium-Containing Protein Crystal Structures Deposited in the Protein Data Bank

Protein and Peptide Letters, doi:10.2174/0929866527666200305144447, 2020, Vol 27 (8), pp. 763-769

Author(s): Oliviero Carugo

Keyword(s): Small Molecules, Crystal Structures, Protein Data Bank, Protein Interactions, Structural Information, Protein Complexes, Aerospace Industry, Data Bank, Protein Crystal, Side Chain

Background: Although lithium is not a biologically essential metallic element, its pharmacological properties are well known, and human exposure to lithium is increasingly possible because of its use in the aerospace industry and in batteries. Objective: Lithium-protein interactions are therefore of interest, and a survey of the structures of lithium-protein complexes is described in this paper. Methods: A high-quality non-redundant set of lithium-containing protein crystal structures was extracted from the Protein Data Bank, and the stereochemistry of the lithium first coordination sphere was examined in detail. Results: Four main observations were reported: (i) lithium interacts preferably with oxygen atoms; (ii) preferably with side-chain atoms; (iii) preferably with Asp or Glu carboxylates; (iv) the coordination number tends to be four, with stereochemical parameters similar to those observed in small molecules containing lithium. Conclusion: Although structural information on lithium-protein complexes available from the Protein Data Bank is relatively scarce, these trends appear to be so clear that one may suppose that they will be confirmed by further data that will join the Protein Data Bank in the future.
