PAP: a protein analysis package

A program package has been assembled for the analysis of protein coordinates which are in the Brookhaven Protein Data Bank (PDB) format. These programs can be used to make two types of φ–ψ plots: a Ramachandran-style scatter plot, and a plot of φ and ψ values as a function of the linear sequence. Programs are also available for the display of distance diagonal plots for proteins. Two protein structures can be compared and the resulting r.m.s. differences in the structures plotted as a function of sequence. Temperature factors can be analyzed and plotted as a function of the linear sequence. In addition, various utilities are supplied for splitting PDB files which contain multiple subunits into individual files and also for renumbering PDB files. A utility is also provided for converting Amber-style PDB files into standard PDB files. Priestle's program RIBBON [J. Appl. Cryst. (1988), 21, 572–576] has been converted to run in a stand-alone mode with interactive rotation of the three-dimensional ribbon picture. The programs were written for Silicon Graphics 4D-series workstations and have been tested on 4D70/GT and Personal Iris machines; those that produce PostScript output have also been converted to run on Digital Equipment Corporation VAX computers and Sun workstations.
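
As an illustration of the φ–ψ analysis described above, the following minimal Python sketch computes a signed dihedral angle from four atomic positions. It is not taken from PAP: the function name `dihedral` and the tuple-based coordinates are assumptions made for this example. For backbone atoms, φ is the dihedral C(i−1), N(i), Cα(i), C(i) and ψ is the dihedral N(i), Cα(i), C(i), N(i+1).

```python
import math


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four 3D points (hypothetical helper).

    The sign follows the usual convention: 0 for cis, +/-180 for trans.
    """
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    # Three consecutive bond vectors along the chain of four atoms.
    b1 = sub(p1, p0)
    b2 = sub(p2, p1)
    b3 = sub(p3, p2)

    # Normals to the two planes that share the central bond b2.
    n1 = cross(b1, b2)
    n2 = cross(b2, b3)

    # atan2 form keeps the sign and is stable near 0 and 180 degrees.
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Applying such a function along a chain gives one φ and one ψ value per residue, which can then be shown as a scatter plot or plotted against residue number.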

Online_DPI: a web server to calculate the diffraction precision index for a protein structure

Journal of Applied Crystallography, 2015, Vol 48 (3), pp. 939-942. DOI: 10.1107/s1600576715006287. Cited by ~39.

Author(s): K. S. Dinesh Kumar, M. Gurusaran, S. N. Satheesh, P. Radha, S. Pavithra, ...

Keyword(s): Protein Data Bank, Three Dimensional, Web Server, Data Bank, Atomic Displacement, Precision Index, Client Machine, Atomic Displacement Parameters, Temperature Factors, Coordinate Error

An online computing server, Online_DPI (where DPI denotes the diffraction precision index), has been created to calculate the 'Cruickshank DPI' value for a given three-dimensional protein or macromolecular structure. It also estimates the atomic coordinate error for all the atoms available in the structure. It is an easy-to-use web server that enables users to visualize the computed values dynamically on the client machine. Users can provide the Protein Data Bank (PDB) identification code or upload the three-dimensional atomic coordinates from the client machine. The computed DPI value for the structure and the atomic coordinate errors for all the atoms are included in the revised PDB file. Further, users can graphically view the atomic coordinate error along with 'temperature factors' (i.e. atomic displacement parameters). In addition, the computing engine is interfaced with an up-to-date local copy of the Protein Data Bank. New entries are updated every week, and thus users can access all the structures available in the Protein Data Bank. The computing engine is freely accessible online at http://cluster.physics.iisc.ernet.in/dpi/.
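
For orientation, Cruickshank's DPI is commonly written as σ(r, B_avg) = √3 · (N_i / p)^(1/2) · C^(−1/3) · R · d_min, where N_i is the number of fully occupied atoms, p = n_obs − n_params, C is the fractional completeness of the data, R is the R factor and d_min is the resolution limit. The sketch below evaluates that expression in Python. It is an assumption that this is the exact variant Online_DPI implements, and the function and parameter names are hypothetical.

```python
import math


def cruickshank_dpi(n_atoms, n_obs, n_params, completeness, r_factor, d_min):
    """Evaluate sqrt(3) * (N_i/p)^(1/2) * C^(-1/3) * R * d_min (result in angstroms).

    Hypothetical helper for illustration; not the Online_DPI code.
    """
    p = n_obs - n_params  # degrees of freedom: observations minus refined parameters
    if p <= 0:
        # The expression is undefined when there are no excess observations.
        raise ValueError("n_obs must exceed n_params")
    if not 0.0 < completeness <= 1.0:
        raise ValueError("completeness must be a fraction in (0, 1]")
    return (math.sqrt(3.0)
            * math.sqrt(n_atoms / p)
            * completeness ** (-1.0 / 3.0)
            * r_factor
            * d_min)
```

The inputs would normally be read from the refinement statistics reported with a structure; the per-atom coordinate errors mentioned in the abstract are not covered by this sketch.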

Searching techniques for databases of protein secondary structures

Journal of Information Science, 1989, Vol 15 (4-5), pp. 287-298. DOI: 10.1177/016555158901500411. Cited by ~11.

Author(s): Peter J. Artymiuk, David W. Rice, Eleanor M. Mitchell, Peter Willett

Keyword(s): Secondary Structure, Protein Data Bank, Protein Structures, Three Dimensional, Data Bank, Secondary Structures, British Library, Geometric Information, Protein Secondary Structures, Funded Research Project

This paper summarizes the findings of a recent, British Library-funded research project into computer techniques for searching the three-dimensional protein structures that occur in the Protein Data Bank. The work focuses on the secondary structures of proteins and utilizes both angular and distance geometric information. Algorithms are presented for the automatic identification of secondary structure elements, of secondary structure motifs and of proteins with similar secondary structures.

Enriched Conformational Sampling of DNA and Proteins with a Hybrid Hamiltonian Derived from the Protein Data Bank

International Journal of Molecular Sciences, 2018, Vol 19 (11), pp. 3405. DOI: 10.3390/ijms19113405. Cited by ~3.

Author(s): Emanuel Peter, Jiří Černý

Keyword(s): Partition Function, Protein Data Bank, Protein Structures, Data Bank, Weighting Factor, Potential Of Mean Force, Conformational Space, Dynamics Simulation, Conformational Sampling, Speed Increase

In this article, we present a method for the enhanced molecular dynamics simulation of protein and DNA systems called potential of mean force (PMF)-enriched sampling. The method uses partitions derived from the potentials of mean force, which we determined from DNA and protein structures in the Protein Data Bank (PDB). We define a partition function from a set of PDB-derived PMFs, which efficiently compensates for the error introduced by the assumption of a homogeneous partition function from the PDB datasets. The bias based on the PDB-derived partitions is added in the form of a hybrid Hamiltonian using a renormalization method, which adds the PMF-enriched gradient to the system depending on a linear weighting factor and the underlying force field. We validated the method using simulations of dialanine, the folding of TrpCage, and the conformational sampling of the Dickerson–Drew DNA dodecamer. Our results show the potential for the PMF-enriched simulation technique to enrich the conformational space of biomolecules along their order parameters, while we also observe a considerable speed increase in the sampling by factors ranging from 13.1 to 82. The novel method can effectively be combined with enhanced sampling or coarse-graining methods to enrich conformational sampling with a partition derived from the PDB.

Conformational variability in proteins bound to single-stranded DNA: a new benchmark for new docking perspectives

2021. DOI: 10.22541/au.162040366.69255354/v1.

Author(s): Dominique MIAS-LUCQUIN, Isaure Chauvot de Beauchêne

Keyword(s): Protein Data Bank, Conformational Changes, Molecular Interactions, Protein Structures, Data Bank, Computational Docking, ssDNA Binding, Conformational Variability, High Flexibility, Docking Benchmark

We explored the Protein Data Bank (PDB) to collect protein-ssDNA structures and create a multi-conformational docking benchmark including both bound and unbound protein structures. Owing to the high flexibility of ssDNA when not bound, no unbound ssDNA structure is included. For the 143 groups identified as bound-unbound structures of the same protein, we studied the conformational changes in the protein induced by ssDNA binding. Moreover, based on several bound or unbound protein structures in some groups, we also assessed the intrinsic conformational variability in either bound or unbound conditions, and compared it to the supposedly binding-induced modifications. This benchmark is, to our knowledge, the first attempt to survey the available structures of protein-ssDNA interactions to such an extent, with the aim of improving computational docking tools dedicated to this kind of molecular interaction.

[29] Protein data bank archives of three-dimensional macromolecular structures

Methods in Enzymology - Macromolecular Crystallography Part B, 1997, pp. 556-571. DOI: 10.1016/s0076-6879(97)77031-9. Cited by ~118.

Author(s): Enrique E. Abola, Joel L. Sussman, Jaime Prilusky, Nancy O. Manning

Keyword(s): Protein Data Bank, Three Dimensional, Data Bank

MRPC (Missing Regions in Polypeptide Chains): a knowledgebase

Journal of Applied Crystallography, 2019, Vol 52 (6), pp. 1422-1426. DOI: 10.1107/s1600576719012330.

Author(s): Rajendran Santhosh, Namrata Bankoti, Adgonda Malgonnavar Padmashri, Daliah Michael, Jeyaraman Jeyakanthan, ...

Keyword(s): Protein Structures, Three Dimensional, Protein Molecule, Data Bank, Protein Crystal, Dimensional Structure, Protein Structure Analysis, Three Dimensional Structure, X Ray Crystallography, Polypeptide Chains

Missing regions in protein crystal structures are those regions that cannot be resolved, mainly owing to poor electron density (if the three-dimensional structure was solved using X-ray crystallography). These missing regions are known to have high B factors and could represent loops with a possibility of being part of an active site of the protein molecule. Thus, they are likely to provide valuable information and play a crucial role in the design of inhibitors and drugs and in protein structure analysis. In view of this, an online database, Missing Regions in Polypeptide Chains (MRPC), has been developed which provides information about the missing regions in protein structures available in the Protein Data Bank. In addition, the new database has an option for users to obtain the above data for non-homologous protein structures (25 and 90%). A user-friendly graphical interface with various options has been incorporated, with a provision to view the three-dimensional structure of the protein along with the missing regions using JSmol. The MRPC database is updated regularly (currently once every three months) and can be accessed freely at the URL http://cluster.physics.iisc.ac.in/mrpc.
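
One simple way to spot candidate missing regions in a coordinate file is to look for breaks in the residue numbering of a chain. The Python sketch below does this for PDB-format ATOM records. It is a heuristic written for illustration, not the procedure used by MRPC: the function names are hypothetical, insertion codes are ignored, and numbering gaps that are not true missing regions will also be reported.

```python
def observed_residues(pdb_lines, chain_id):
    """Collect residue sequence numbers seen in ATOM records of one chain."""
    numbers = set()
    for line in pdb_lines:
        # PDB fixed columns (1-based): record name 1-6, chain ID 22, resSeq 23-26.
        if line.startswith("ATOM") and len(line) >= 26 and line[21] == chain_id:
            numbers.add(int(line[22:26]))
    return sorted(numbers)


def find_gaps(residue_numbers):
    """Return (first_missing, last_missing) ranges between observed residues."""
    gaps = []
    nums = sorted(set(residue_numbers))
    for prev, cur in zip(nums, nums[1:]):
        if cur - prev > 1:
            # Residues prev+1 .. cur-1 have no coordinates in the file.
            gaps.append((prev + 1, cur - 1))
    return gaps
```

For example, `find_gaps(observed_residues(lines, "A"))` lists the numbering breaks in chain A of a file whose lines have been read into `lines`. Unresolved residues at the chain termini cannot be found this way, because there is no observed residue on the far side of the gap.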

RepeatsDB in 2021: improved data and extended classification for protein tandem repeat structures

Nucleic Acids Research, 2020, Vol 49 (D1), pp. D452-D457. DOI: 10.1093/nar/gkaa1097.

Author(s): Lisanna Paladin, Martina Bevilacqua, Sara Errigo, Damiano Piovesan, Ivan Mičetić, ...

Keyword(s): Protein Data Bank, Tandem Repeat, Tandem Repeats, Classification Scheme, Sequence Similarity, Protein Structures, Hierarchical Classification, Structural Similarity, Data Bank, Similarity Class

The RepeatsDB database (URL: https://repeatsdb.org/) provides annotations and classification for protein tandem repeat structures from the Protein Data Bank (PDB). Protein tandem repeats are ubiquitous in all branches of the tree of life. The accumulation of solved repeat structures provides new possibilities for classification and detection, but also increases the need for annotation. Here we present RepeatsDB 3.0, which addresses these challenges and presents an extended classification scheme. The major conceptual change compared to the previous version is the hierarchical classification combining top levels based solely on structural similarity (Class > Topology > Fold) with two new levels (Clan > Family) requiring sequence similarity and describing repeat motifs in collaboration with Pfam. Data growth has been addressed with improved mechanisms for browsing the classification hierarchy. A new UniProt-centric view unifies the increasingly frequent annotation of structures from identical or similar sequences. This update of RepeatsDB aligns with our commitment to develop a resource that extracts, organizes and distributes specialized information on tandem repeat protein structures.

BLASTing away preconceptions in crystallization trials

Acta Crystallographica Section F Structural Biology Communications, 2019, Vol 75 (3), pp. 184-192. DOI: 10.1107/s2053230x19000141. Cited by ~9.

Author(s): Gabriel Jan Abrahams, Janet Newman

Keyword(s): Protein Data Bank, Sequence Similarity, Three Dimensional, Protein Sequences, Protein Molecule, Data Bank, Dimensional Structure, Critical Step, Quantitative Verification, Better Than

Crystallization is in many cases a critical step for solving the three-dimensional structure of a protein molecule. Determining which set of chemicals to use in the initial screen is typically agnostic of the protein under investigation; however, crystallization efficiency could potentially be improved if this were not the case. Previous work has assumed that sequence similarity may provide useful information about appropriate crystallization cocktails; however, the authors are not aware of any quantitative verification of this assumption. This research investigates whether, given current information, one can detect any correlation between sequence similarity and crystallization cocktails. BLAST was used to quantitate the similarity between protein sequences in the Protein Data Bank, and this was compared with three estimations of the chemical similarities of the respective crystallization cocktails. No correlation was detected between proteins of similar (but not identical) sequence and their crystallization cocktails, suggesting that methods of determining screens based on this assumption are unlikely to result in screens that are better than those currently in use.

Accurate Representation of Protein-Ligand Structural Diversity in the Protein Data Bank (PDB)

International Journal of Molecular Sciences, 2020, Vol 21 (6), pp. 2243. DOI: 10.3390/ijms21062243.

Author(s): Nicolas K. Shinada, Peter Schmidtke, Alexandre G. de Brevern

Keyword(s): Protein Data Bank, Protein Sequence, Large Scale, Protein Structures, Structural Diversity, Data Bank, Protein Distribution, Research Areas, Identity Threshold, Protein Sequence Identity

The number of available protein structures in the Protein Data Bank (PDB) has considerably increased in recent years. Thanks to the growth of structures and complexes, numerous large-scale studies have been done in various research areas, e.g., protein–protein, protein–DNA, or in drug discovery. While protein redundancy has so far been managed only with a simple protein sequence identity threshold, the similarity of protein-ligand complexes should also be considered from a structural perspective. Protein-ligand duplicates in the PDB are widely known, but have never been quantitatively assessed, as they are quite complex to analyze and compare. Here, we present a specific clustering of protein-ligand structures to avoid bias found in different studies. The methodology is based on binding site superposition, and a combination of weighted Root Mean Square Deviation (RMSD) assessment and hierarchical clustering. Repeated structures of proteins of interest are highlighted and only representative conformations are retained for a non-biased view of protein distribution. Three types of cases are described based on the number of distinct conformations identified for each complex. Defining these categories reduces the number of complexes 3.84-fold, and offers more refined results compared to a protein sequence-based method. Widely distinct conformations were analyzed using normalized B-factors. Furthermore, a non-redundant dataset was generated for future molecular interactions analysis or virtual screening studies.
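
To make the weighted RMSD step concrete, the sketch below computes sqrt(Σ w_i d_i² / Σ w_i) over paired atoms in Python. The abstract does not say how the weights are chosen or which atoms are compared, so the weights, the function name and the assumption that both coordinate sets are already superposed are all choices made for this illustration.

```python
import math


def weighted_rmsd(coords_a, coords_b, weights):
    """Weighted RMSD between two equally long lists of 3D points.

    Assumes the two structures have already been superposed; no fitting is done.
    """
    if not weights or not (len(coords_a) == len(coords_b) == len(weights)):
        raise ValueError("inputs must be non-empty and of equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    acc = 0.0
    for a, b, w in zip(coords_a, coords_b, weights):
        # Squared distance between the paired atoms, scaled by the atom's weight.
        acc += w * sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(acc / total_weight)
```

The pairwise values from such a function could then feed a hierarchical clustering of the conformations of one complex, which is the second half of the approach the abstract describes.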

TOP: a new method for protein structure comparisons and similarity searches

Journal of Applied Crystallography, 2000, Vol 33 (1), pp. 176-183. DOI: 10.1107/s0021889899012339. Cited by ~149.

Author(s): Guoguang Lu

Keyword(s): User Interface, Protein Structure, Protein Structures, Three Dimensional, Data Bank, Structure Alignment, Dimensional Structure, Protein Structure Alignment, Protein Structure Analysis, Structure Comparison

In order to facilitate the three-dimensional structure comparison of proteins, software for making comparisons and searching for similarities to protein structures in databases has been developed. The program identifies the residues that share similar positions of both main-chain and side-chain atoms between two proteins. The unique functions of the software also include database processing via Internet- and Web-based servers for different types of users. The developed method and its friendly user interface cope with many of the problems that frequently occur in protein structure comparisons, such as detecting structurally equivalent residues, misalignment caused by coincident match of Cα atoms, circular sequence permutations, tedious repetition of access, maintenance of the most recent database, and inconvenience of user interface. The program is also designed to cooperate with other tools in structural bioinformatics, such as the 3DB Browser software [Prilusky (1998). Protein Data Bank Q. Newslett. 84, 3–4] and the SCOP database [Murzin, Brenner, Hubbard & Chothia (1995). J. Mol. Biol. 247, 536–540], for convenient molecular modelling and protein structure analysis. A similarity ranking score of 'structure diversity' is proposed in order to estimate the evolutionary distance between proteins based on the comparisons of their three-dimensional structures. The function of the program has been utilized as a part of an automated program for multiple protein structure alignment. In this paper, the algorithm of the program and results of systematic tests are presented and discussed.
