Disordered Residues and Patterns in the Protein Data Bank

Molecules ◽  
2020 ◽  
Vol 25 (7) ◽  
pp. 1522 ◽  
Author(s):  
Mikhail Yu. Lobanov ◽  
Ilya V. Likhachev ◽  
Oxana V. Galzitskaya

We created a new library of disordered patterns and disordered residues in the Protein Data Bank (PDB). To obtain these datasets, we clustered the PDB into groups of chains at different levels of sequence identity and marked the disordered residues. We developed a new procedure for finding disordered patterns and created a new version of the library. This library includes three sets of patterns: unique patterns, patterns consisting of two kinds of amino acids, and homo-repeats. Using this database, the user can: (1) find homologues in the entire Protein Data Bank; (2) perform a statistical analysis of disordered residues in protein structures; (3) search for disordered patterns and homo-repeats; (4) search for disordered regions in different chains of the same protein; (5) download, from our database, clusters of protein chains at different identity levels and the library of disordered patterns; and (6) view 3D structures interactively using MView. The new library of disordered patterns will help improve the accuracy of predicting whether the residues in a given region will be structured or unstructured.
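As an illustration of the idea of marking disordered residues and sorting disordered stretches into pattern types, the sketch below treats a residue as disordered when it is present in the deposited sequence but has no modelled coordinates, then labels each disordered run as a homo-repeat, a two-kind pattern, or other. This is a minimal sketch under assumptions: the function names, the `min_len` cutoff, and the example sequence are hypothetical and not taken from the paper's procedure.

```python
# Hypothetical sketch: a residue counts as "disordered" if it appears in the
# deposited sequence but has no modelled coordinates. Contiguous disordered
# runs are then labelled by how many distinct amino-acid types they contain.

def disordered_mask(sequence, observed):
    """Return a list of booleans: True where residue i has no coordinates."""
    return [i not in observed for i in range(len(sequence))]


def disordered_segments(sequence, observed, min_len=3):
    """Yield (start, end, segment) for runs of disordered residues.

    `min_len` is an illustrative cutoff, not a value from the paper.
    """
    mask = disordered_mask(sequence, observed)
    start = None
    for i, missing in enumerate(mask + [False]):  # sentinel closes a final run
        if missing and start is None:
            start = i
        elif not missing and start is not None:
            if i - start >= min_len:
                yield start, i, sequence[start:i]
            start = None


def classify_pattern(segment):
    """Label a segment as a homo-repeat, a two-residue-type pattern, or other."""
    kinds = len(set(segment))
    if kinds == 1:
        return "homo-repeat"
    if kinds == 2:
        return "two-kind pattern"
    return "other"


if __name__ == "__main__":
    seq = "MSQQQQQAGSGSGSLKVE"           # made-up example sequence
    observed = {0, 1, 7, 14, 15, 16, 17}  # indices that have coordinates
    for s, e, seg in disordered_segments(seq, observed):
        print(s, e, seg, classify_pattern(seg))
```

For the made-up example, the run of glutamines is labelled a homo-repeat and the GS run a two-kind pattern; a real pipeline would read the sequence and observed residues from mmCIF files.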

Author(s):  
Luciano Andres Abriata

Protein X-ray structures with non-corrin cobalt(II)-containing sites, either natural or substituting for another native ion, were downloaded from the Protein Data Bank and explored to (i) describe which amino acids are involved in their first ligand shells and (ii) analyze cobalt(II)–donor bond lengths in comparison with previously reported target distances, CSD data and EXAFS data. The set of amino acids involved in Co(II) binding is similar to that observed for catalytic Zn(II) sites, i.e. with a large fraction of carboxylate O atoms from aspartate and glutamate and aromatic N atoms from histidine. The computed Co(II)–donor bond lengths were found to depend strongly on structure resolution, an artifact previously detected for other metal–donor distances. Small corrections are suggested for the target bond lengths to the aromatic N atoms of histidines and the O atoms of water and hydroxide. The available target distance for cysteine S (S(Cys)) is confirmed; those for backbone O and other donors remain uncertain and should be handled with caution in refinement and modeling protocols. Finally, the relationship between the two Co(II)–O bond lengths in bidentate carboxylates is quantified.
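The first step of such an analysis, collecting candidate first-shell donors and their metal–donor distances from atomic coordinates, can be sketched as below. This is a simplified illustration, not the study's protocol: the 2.8 Å cutoff, the donor element set, and the atom labels in the example are assumptions.

```python
import math

# Illustrative sketch: list candidate first-shell donor atoms around a Co(II)
# ion by distance. The cutoff and the donor element set are assumptions for
# this example, not values taken from the study.
DONOR_ELEMENTS = {"N", "O", "S"}


def first_shell(metal_xyz, atoms, cutoff=2.8):
    """Return (name, element, distance) for donors within `cutoff` Å, nearest first.

    `atoms` is an iterable of (name, element, (x, y, z)) tuples.
    """
    shell = []
    for name, element, xyz in atoms:
        if element in DONOR_ELEMENTS:
            d = math.dist(metal_xyz, xyz)  # Euclidean metal-donor distance
            if d <= cutoff:
                shell.append((name, element, round(d, 3)))
    return sorted(shell, key=lambda t: t[2])


if __name__ == "__main__":
    atoms = [
        ("HIS NE2", "N", (2.1, 0.0, 0.0)),
        ("ASP OD1", "O", (0.0, 2.0, 0.0)),
        ("CYS SG", "S", (0.0, 0.0, 3.5)),  # beyond the assumed cutoff
        ("HIS CE1", "C", (1.0, 1.0, 0.0)),  # not a donor element
    ]
    print(first_shell((0.0, 0.0, 0.0), atoms))
```

Distances gathered this way across many structures could then be binned by resolution, which is where the resolution dependence described above would become visible.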


Author(s):  
Д.А. Тихонов ◽  
D.A. Tikhonov

In this paper, a statistical analysis of the distributions of inter-helical angles in pairs of consecutive, connected α-helices in protein spatial structures is presented. A set of rules was developed for selecting helical pairs from protein structures obtained from the Protein Data Bank (PDB). The set of helical pairs was analyzed in order to classify them and to identify features of protein structural organization. All pairs of connected helices were divided into three subsets according to a criterion based on the crossing of the helices' projections onto parallel planes passing through the helix axes. It is shown that the distribution for all types of helical pairs whose projections do not cross each other covers almost the entire range of inter-helical angles. This distribution has a single maximum, close to a right angle. Most pairs in this set consist of α- and 3₁₀-helices, whereas most pairs with crossing helix projections are formed by two α-helices. It is also shown that a large number of pairs of connected α-helices have an acute angle 20° ≤ φ ≤ 60° between the helix axes. The distribution of all types of helical pairs as a function of the length of the inter-helical connection was also analyzed. Structures with short connections are the most frequent in all subsets.
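A minimal sketch of how an inter-helical angle can be computed from Cα coordinates is shown below. It assumes each helix axis is approximated by the vector from the centroid of its first four Cα atoms to the centroid of its last four; this is a common shortcut and not necessarily the axis definition used in the paper, which also does not state whether its angle is signed. The sketch returns the unsigned angle between axes.

```python
import math

# Sketch under simplifying assumptions: a helix axis is approximated by the
# vector between the centroids of its first and last `window` C-alpha atoms.
# The paper's own axis definition and angle sign convention may differ.


def centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def helix_axis(ca_coords, window=4):
    """Approximate axis direction from N-terminal to C-terminal end."""
    start = centroid(ca_coords[:window])
    end = centroid(ca_coords[-window:])
    return tuple(e - s for s, e in zip(start, end))


def interhelical_angle(axis1, axis2):
    """Unsigned angle in degrees (0-180) between two helix axis vectors."""
    dot = sum(a * b for a, b in zip(axis1, axis2))
    norm = math.hypot(*axis1) * math.hypot(*axis2)
    cos_phi = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.degrees(math.acos(cos_phi))


if __name__ == "__main__":
    helix_a = [(1.5 * i, 0.0, 0.0) for i in range(10)]  # toy axis along x
    helix_b = [(0.0, 1.5 * i, 0.0) for i in range(10)]  # toy axis along y
    print(interhelical_angle(helix_axis(helix_a), helix_axis(helix_b)))
```

With real helices the Cα atoms spiral around the axis, so averaging over a window of four residues (about one helical turn) is what makes the centroid approximation reasonable.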


2018 ◽  
Vol 19 (11) ◽  
pp. 3405 ◽  
Author(s):  
Emanuel Peter ◽  
Jiří Černý

In this article, we present a method for the enhanced molecular dynamics simulation of protein and DNA systems called potential of mean force (PMF)-enriched sampling. The method uses partitions derived from the potentials of mean force, which we determined from DNA and protein structures in the Protein Data Bank (PDB). We define a partition function from a set of PDB-derived PMFs, which efficiently compensates for the error introduced by the assumption of a homogeneous partition function from the PDB datasets. The bias based on the PDB-derived partitions is added in the form of a hybrid Hamiltonian using a renormalization method, which adds the PMF-enriched gradient to the system depending on a linear weighting factor and the underlying force field. We validated the method using simulations of dialanine, the folding of TrpCage, and the conformational sampling of the Dickerson–Drew DNA dodecamer. Our results show the potential for the PMF-enriched simulation technique to enrich the conformational space of biomolecules along their order parameters, while we also observe a considerable speed increase in the sampling by factors ranging from 13.1 to 82. The novel method can effectively be combined with enhanced sampling or coarse-graining methods to enrich conformational sampling with a partition derived from the PDB.
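The two generic ingredients named above, a knowledge-based PMF derived from PDB statistics and a linearly weighted bias added to a force field, can be sketched as follows. This is not the authors' implementation: it uses plain Boltzmann inversion, W(x) = -kT ln(P_obs(x) / P_ref(x)), on a one-dimensional histogram and assumes one possible linear mixing rule, F = (1 - λ)F_ff - λ dW/dx. The function names, the kT value (kJ/mol at about 300 K), and the mixing form are assumptions.

```python
import math

# Hedged sketch of generic building blocks, not the published method:
# (1) a PMF from PDB-derived histograms by Boltzmann inversion,
#     W(x) = -kT ln(P_obs(x) / P_ref(x));
# (2) an assumed linear mix of force-field force and PMF force with weight lam.
KT = 2.494  # kJ/mol at ~300 K (assumed units)


def boltzmann_pmf(p_obs, p_ref, kt=KT, eps=1e-12):
    """PMF per histogram bin; eps avoids log(0) for empty bins."""
    return [-kt * math.log((po + eps) / (pr + eps)) for po, pr in zip(p_obs, p_ref)]


def pmf_gradient(pmf, dx):
    """dW/dx on a uniform grid: central differences inside, one-sided at the ends."""
    n = len(pmf)
    grad = []
    for i in range(n):
        if i == 0:
            grad.append((pmf[1] - pmf[0]) / dx)
        elif i == n - 1:
            grad.append((pmf[-1] - pmf[-2]) / dx)
        else:
            grad.append((pmf[i + 1] - pmf[i - 1]) / (2 * dx))
    return grad


def hybrid_force(f_ff, grad_pmf, lam):
    """Assumed linear weighting: F = (1 - lam) * F_ff - lam * dW/dx."""
    return (1 - lam) * f_ff - lam * grad_pmf


if __name__ == "__main__":
    pmf = boltzmann_pmf([0.1, 0.4, 0.1], [0.2, 0.2, 0.2])  # made-up histograms
    grad = pmf_gradient(pmf, dx=0.1)
    print([hybrid_force(0.0, g, lam=0.5) for g in grad])
```

Setting `lam` to 0 recovers the unbiased force field, which is the usual reason for choosing a linear weight: the bias can be switched on gradually and checked against plain simulations.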


Author(s):  
Dominique MIAS-LUCQUIN ◽  
Isaure Chauvot de Beauchêne

We explored the Protein Data Bank (PDB) to collect protein–ssDNA structures and create a multi-conformational docking benchmark including both bound and unbound protein structures. Because of the high flexibility of ssDNA when not bound, no unbound ssDNA structure is included. For the 143 groups identified as bound–unbound structures of the same protein, we studied the conformational changes in the protein induced by ssDNA binding. Moreover, because some groups contain several bound or unbound protein structures, we also assessed the intrinsic conformational variability under either bound or unbound conditions and compared it with the supposedly binding-induced modifications. This benchmark is, to our knowledge, the first attempt to survey available structures of protein–ssDNA interactions to such an extent, aiming to improve computational docking tools dedicated to this kind of molecular interaction.
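The grouping step behind such a benchmark can be sketched as below: structures of the same protein are gathered together, and only groups containing at least one ssDNA-bound and one unbound structure are kept. Keying proteins by a UniProt accession is an assumption made for this sketch, and the PDB IDs and accessions in the example are made up.

```python
from collections import defaultdict

# Hypothetical sketch: group PDB entries of the same protein (keyed here by a
# UniProt accession, an assumption) and keep only groups that contain at least
# one ssDNA-bound and one unbound structure.


def bound_unbound_groups(entries):
    """entries: iterable of (pdb_id, protein_key, has_ssdna) tuples."""
    groups = defaultdict(lambda: {"bound": [], "unbound": []})
    for pdb_id, protein_key, has_ssdna in entries:
        groups[protein_key]["bound" if has_ssdna else "unbound"].append(pdb_id)
    return {key: g for key, g in groups.items() if g["bound"] and g["unbound"]}


if __name__ == "__main__":
    entries = [  # made-up identifiers
        ("1AAA", "P00001", True),
        ("1AAB", "P00001", False),
        ("1AAC", "P00002", True),  # no unbound partner, so dropped
    ]
    print(bound_unbound_groups(entries))
```

Groups with several entries on one side are exactly the ones where intrinsic bound or unbound variability can be compared with the bound-versus-unbound change.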


2020 ◽  
Vol 49 (D1) ◽  
pp. D452-D457
Author(s):  
Lisanna Paladin ◽  
Martina Bevilacqua ◽  
Sara Errigo ◽  
Damiano Piovesan ◽  
Ivan Mičetić ◽  
...  

Abstract The RepeatsDB database (URL: https://repeatsdb.org/) provides annotations and classification for protein tandem repeat structures from the Protein Data Bank (PDB). Protein tandem repeats are ubiquitous in all branches of the tree of life. The accumulation of solved repeat structures provides new possibilities for classification and detection, but also increases the need for annotation. Here we present RepeatsDB 3.0, which addresses these challenges and presents an extended classification scheme. The major conceptual change compared to the previous version is the hierarchical classification combining top levels based solely on structural similarity (Class > Topology > Fold) with two new levels (Clan > Family) requiring sequence similarity and describing repeat motifs in collaboration with Pfam. Data growth has been addressed with improved mechanisms for browsing the classification hierarchy. A new UniProt-centric view unifies the increasingly frequent annotation of structures from identical or similar sequences. This update of RepeatsDB aligns with our commitment to develop a resource that extracts, organizes and distributes specialized information on tandem repeat protein structures.
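The five-level hierarchy described above (Class > Topology > Fold, then Clan > Family) maps naturally onto a small record type. The sketch below is illustrative only: the field names and example labels are assumptions and do not reflect RepeatsDB's actual schema or identifiers. The two sequence-based levels are optional because a structure can be classified structurally without a Clan or Family assignment.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative data structure for the five-level hierarchy. Field names and
# example labels are assumptions, not RepeatsDB's schema.


@dataclass(frozen=True)
class RepeatClassification:
    cls: str                      # Class (structural similarity only)
    topology: str                 # Topology (structural similarity only)
    fold: str                     # Fold (structural similarity only)
    clan: Optional[str] = None    # Clan (requires sequence similarity)
    family: Optional[str] = None  # Family (requires sequence similarity)

    def path(self) -> str:
        """Render the assigned levels as 'Class > Topology > Fold > ...'."""
        levels = [self.cls, self.topology, self.fold, self.clan, self.family]
        return " > ".join(level for level in levels if level is not None)


if __name__ == "__main__":
    structural_only = RepeatClassification("III", "III.1", "III.1.1")  # made-up labels
    full = RepeatClassification("III", "III.1", "III.1.1", clan="C1", family="F1")
    print(structural_only.path())
    print(full.path())
```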


Oncogene ◽  
2020 ◽  
Vol 39 (43) ◽  
pp. 6623-6632
Author(s):  
David S. Goodsell ◽  
Stephen K. Burley

Abstract Atomic-level three-dimensional (3D) structure data for biological macromolecules often prove critical to dissecting and understanding the precise mechanisms of action of cancer-related proteins and their diverse roles in oncogenic transformation, proliferation, and metastasis. They are also used extensively to identify potentially druggable targets and facilitate discovery and development of both small-molecule and biologic drugs that are today benefiting individuals diagnosed with cancer around the world. 3D structures of biomolecules (including proteins, DNA, RNA, and their complexes with one another, drugs, and other small molecules) are freely distributed by the open-access Protein Data Bank (PDB). This global data repository is used by millions of scientists and educators working in the areas of drug discovery, vaccine design, and biomedical and biotechnology research. The US Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) provides an integrated portal to the PDB archive that streamlines access for millions of PDB data consumers worldwide. Herein, we review online resources made available free of charge by the RCSB PDB to basic and applied researchers, healthcare providers, educators and their students, patients and their families, and the curious public. We exemplify the value of understanding cancer-related proteins in 3D with a case study focused on human papillomavirus.
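For programmatic access to the archive, a minimal sketch using only the Python standard library is shown below. It assumes the RCSB file-download URL pattern and the RCSB Data API entry endpoint written in the code; both should be checked against the current RCSB PDB documentation, and the helper names are hypothetical.

```python
import json
import urllib.request

# Sketch assuming these RCSB PDB URL patterns (verify against current RCSB docs).
FILES_URL = "https://files.rcsb.org/download/{pdb_id}.cif"
ENTRY_API_URL = "https://data.rcsb.org/rest/v1/core/entry/{pdb_id}"


def structure_url(pdb_id: str) -> str:
    """URL of the mmCIF coordinate file for a four-character PDB ID."""
    return FILES_URL.format(pdb_id=pdb_id.upper())


def entry_api_url(pdb_id: str) -> str:
    """URL of the entry-level metadata record in the Data API."""
    return ENTRY_API_URL.format(pdb_id=pdb_id.upper())


def fetch_entry_metadata(pdb_id: str) -> dict:
    """Download entry-level metadata as JSON (requires network access)."""
    with urllib.request.urlopen(entry_api_url(pdb_id), timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(structure_url("4hhb"))
    print(entry_api_url("4hhb"))
```

Keeping URL construction separate from the network call makes the helpers easy to reuse with any HTTP client or in offline tests.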


2020 ◽  
Vol 21 (6) ◽  
pp. 2243
Author(s):  
Nicolas K. Shinada ◽  
Peter Schmidtke ◽  
Alexandre G. de Brevern

The number of available protein structures in the Protein Data Bank (PDB) has increased considerably in recent years. Thanks to this growth in structures and complexes, numerous large-scale studies have been carried out in various research areas, e.g., protein–protein and protein–DNA interactions, or drug discovery. While protein redundancy has usually been handled simply with a protein sequence identity threshold, the similarity of protein-ligand complexes should also be considered from a structural perspective. Protein-ligand duplicates in the PDB are widely known, but they had never been quantitatively assessed, as they are complex to analyze and compare. Here, we present a specific clustering of protein-ligand structures to avoid the bias found in different studies. The methodology is based on binding-site superposition and a combination of weighted Root Mean Square Deviation (RMSD) assessment and hierarchical clustering. Repeated structures of proteins of interest are highlighted, and only representative conformations were retained for a non-biased view of protein distribution. Three types of cases are described based on the number of distinct conformations identified for each complex. Defining these categories reduces the number of complexes 3.84-fold and offers more refined results than a protein sequence-based method. Widely distinct conformations were analyzed using normalized B-factors. Furthermore, a non-redundant dataset was generated for future molecular interaction analyses or virtual screening studies.
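The combination of weighted RMSD and hierarchical clustering can be sketched as follows. The sketch assumes the binding sites have already been superposed and have matched atoms; the per-atom weights, the average-linkage rule, and the distance threshold are hypothetical choices, and the paper's exact weighting and linkage may differ.

```python
import math

# Sketch under stated assumptions: binding sites are already superposed with
# matched atoms; weights and threshold are hypothetical. Average linkage is one
# possible choice of agglomerative clustering rule.


def weighted_rmsd(a, b, w):
    """sqrt(sum_i w_i * |a_i - b_i|^2 / sum_i w_i) for matched coordinates."""
    num = sum(wi * math.dist(p, q) ** 2 for p, q, wi in zip(a, b, w))
    return math.sqrt(num / sum(w))


def average_linkage(dist, threshold):
    """Merge clusters while the closest pair's mean distance is <= threshold."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[x] for j in clusters[y])
                d /= len(clusters[x]) * len(clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break  # remaining clusters are distinct conformations
        merged = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
        clusters[x] = merged
    return clusters


if __name__ == "__main__":
    # Made-up pairwise weighted-RMSD matrix (Å) for three complexes.
    dist = [[0.0, 0.5, 3.0],
            [0.5, 0.0, 3.2],
            [3.0, 3.2, 0.0]]
    print(average_linkage(dist, threshold=1.0))
```

Each resulting cluster would then contribute one representative conformation to a non-redundant set.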


2009 ◽  
Vol 07 (05) ◽  
pp. 755-771 ◽  
Author(s):  
ZAIXIN LU ◽  
ZHIYU ZHAO ◽  
SERGIO GARCIA ◽  
KRISHNAKUMAR KRISHNASWAMY ◽  
BIN FU

We have developed an algorithm and web tool to search for similar protein structures in the PDB (Protein Data Bank). The algorithm combines a series of methods, including protein classification, geometric feature extraction, sequence alignment, and 3D structure alignment. Given a protein structure, the tool can efficiently discover similar structures among the hundreds of thousands of structures stored in the PDB. Our experimental results show that it is more accurate than other well-known protein search systems, including PSI-BLAST, 3D-BLAST, and SSM, in finding proteins that are structurally similar to the query protein, and its speed is also competitive with those systems. The algorithm has been fully implemented and is accessible online at the address , which is supported by a cluster of computers.
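The general pattern of combining a cheap geometric filter with a more expensive comparison step can be sketched as below. This is not the published algorithm: radius of gyration stands in for the geometric features, a caller-supplied `score` function stands in for the alignment stage, and the tolerance and names are hypothetical.

```python
import math

# Hypothetical two-stage search: a cheap geometric prefilter (radius of
# gyration) followed by ranking with a caller-supplied scoring function that
# stands in for sequence/structure alignment. Not the published algorithm.


def radius_of_gyration(coords):
    """Root-mean-square distance of the coordinates from their centroid."""
    n = len(coords)
    c = [sum(p[k] for p in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in coords) / n)


def search(query, database, score, rg_tol=2.0, top_k=5):
    """Keep entries whose Rg is within rg_tol Å of the query, rank by score (higher is better)."""
    q_rg = radius_of_gyration(query)
    candidates = [
        (name, coords) for name, coords in database.items()
        if abs(radius_of_gyration(coords) - q_rg) <= rg_tol
    ]
    ranked = sorted(candidates, key=lambda item: score(query, item[1]), reverse=True)
    return [name for name, _ in ranked[:top_k]]


if __name__ == "__main__":
    database = {  # made-up toy "structures"
        "a": [(0, 0, 0), (2, 0, 0)],
        "b": [(0, 0, 0), (20, 0, 0)],
    }
    query = [(0, 0, 0), (2, 0, 0)]
    print(search(query, database, score=lambda q, t: -abs(len(q) - len(t))))
```

The design point is that the prefilter is computed once per database entry and discards most candidates before the costly alignment is run.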


2004 ◽  
Vol 1 (1) ◽  
pp. 80-89
Author(s):  
Guido Dieterich ◽  
Dirk W. Heinz ◽  
Joachim Reichelt

Abstract The 3D structures of biomacromolecules stored in the Protein Data Bank [1] were correlated with different external, biological information from public databases. We have matched the feature table of SWISS-PROT [2] entries, as well as InterPro [3] domains and functional sites, with the corresponding 3D structures. OMIM [4] (Online Mendelian Inheritance in Man) records, containing information on genetic disorders, were extracted and linked to the structures. The exhaustive all-against-all 3D structure comparison of protein structures stored in DALI [5] was condensed into single files for each PDB entry. Results are stored in XML format, facilitating their incorporation into related software. The resulting annotation of the protein structures allows functional sites to be identified upon visualization.
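Writing one XML annotation file per PDB entry can be sketched with the standard library as follows. The element and attribute names (`entry`, `feature`, `source`, `type`, `start`, `end`) are hypothetical; the original work's XML schema is not described here, and the feature in the example is made up.

```python
import xml.etree.ElementTree as ET

# Sketch of a per-entry annotation file. Element and attribute names are
# hypothetical; the original XML schema is not given in the abstract.


def entry_to_xml(pdb_id, features):
    """features: list of (source, feature_type, start, end, description)."""
    root = ET.Element("entry", id=pdb_id)
    for source, ftype, start, end, desc in features:
        feat = ET.SubElement(root, "feature", source=source, type=ftype,
                             start=str(start), end=str(end))
        feat.text = desc  # free-text description of the annotated site
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(entry_to_xml("1ABC", [("SWISS-PROT", "ACT_SITE", 57, 57, "Example site")]))
```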


2015 ◽  
Vol 71 (8) ◽  
pp. 1604-1614 ◽  
Author(s):  
Wouter G. Touw ◽  
Robbie P. Joosten ◽  
Gert Vriend

A coordinate-based method is presented to detect peptide bonds that need correction either by a peptide-plane flip or by a trans–cis inversion of the peptide bond. When applied to the whole Protein Data Bank, the method predicts 4617 trans–cis flips and many thousands of hitherto unknown peptide-plane flips. A few examples are highlighted for which a correction of the peptide-plane geometry leads to a correction of the understanding of the structure–function relation. All data, including 1088 manually validated cases, are freely available and the method is available from a web server, a web-service interface and through WHAT_CHECK.
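A basic coordinate-based check related to this problem is the peptide ω dihedral, CA(i)–C(i)–N(i+1)–CA(i+1), which is near 180° for trans and near 0° for cis bonds. The sketch below computes ω and labels the bond; it is a generic check, not the authors' method, and it only reports the current cis/trans state rather than predicting whether a flip is needed. The 30° and 150° cutoffs are illustrative assumptions.

```python
import math

# Generic omega-dihedral check, not the published detection method.
# Cutoffs are illustrative assumptions.


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [c / norm for c in b1]
    v = _sub(b0, [_dot(b0, b1) * c for c in b1])  # component of b0 perpendicular to b1
    w = _sub(b2, [_dot(b2, b1) * c for c in b1])  # component of b2 perpendicular to b1
    x_comp = _dot(v, w)
    y_comp = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y_comp, x_comp))


def peptide_state(ca_i, c_i, n_next, ca_next, cis_cutoff=30.0, trans_cutoff=150.0):
    """Classify a peptide bond from its omega dihedral."""
    omega = dihedral(ca_i, c_i, n_next, ca_next)
    if abs(omega) < cis_cutoff:
        return "cis"
    if abs(omega) > trans_cutoff:
        return "trans"
    return "non-planar"


if __name__ == "__main__":
    # Toy geometry: C at the origin, N along x; CA(i+1) placed trans to CA(i).
    print(peptide_state((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))
```

A bond flagged as cis by such a check, in a region where the density and neighbouring geometry favour trans, is the kind of case that a dedicated method would then examine more carefully.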

