PDB-2-PBv3.0: An updated protein block database

Our protein block (PB) sequence database PDB-2-PBv1.0 provides PB sequences and dihedral angles for 74,297 protein structures, comprising 103,252 protein chains, from the Protein Data Bank (PDB) as of 2011. Because PBs have many practical applications and the PDB keeps growing, PB sequences need to be provided for all PDB protein structures. The updated PDB-2-PBv3.0 contains PB sequences for 147,602 PDB structures comprising 400,355 protein chains as of October 2019. Compared with our previous version, PDB-2-PBv1.0, this is a 2-fold increase in the number of protein structures and a 4-fold increase in the number of chains. Notably, it provides PB information for any protein chain, regardless of missing atom records in the PDB structure data. It includes protein interaction information with DNA and RNA, along with the corresponding functional classes from the Nucleic Acid Database (NDB) and the PDB. The updated version also allows users to download multiple PB records by parameter search and/or from a given list. The database is freely accessible at http://bioinfo.bdu.ac.in/pb3.
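As a quick arithmetic check on the growth figures, the ratios of the quoted counts can be computed directly. The snippet uses only the numbers stated in the v1.0 and v3.0 descriptions. The function name is illustrative.

```python
# Entry counts quoted in the PDB-2-PB v1.0 and v3.0 descriptions.
V1 = {"structures": 74_297, "chains": 103_252}
V3 = {"structures": 147_602, "chains": 400_355}


def fold_increase(old: int, new: int) -> float:
    """Ratio new/old, rounded to two decimals."""
    return round(new / old, 2)


for key in V1:
    # Prints "structures: 1.99-fold" and then "chains: 3.88-fold".
    print(f"{key}: {fold_increase(V1[key], V3[key])}-fold")
```

Rounded, these ratios give the 2- and 4-fold figures.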

PDB-2-PB: a curated online protein block sequence database

Journal of Applied Crystallography ◽

10.1107/s0021889811052356 ◽

2011 ◽

Vol 45 (1) ◽

pp. 127-129 ◽

Cited By ~ 4

Author(s):

V. Suresh ◽

K. Ganesan ◽

S. Parthasarathy

Keyword(s):

Amino Acid ◽

World Wide ◽

Protein Structures ◽

Primary Source ◽

Data Bank ◽

Sequence Database ◽

Block Sequence ◽

X Ray ◽

The World ◽

Protein Block

This article describes the development of a curated online protein block sequence database, PDB-2-PB. The protein block sequences for protein structures with complete backbone coordinates have been encoded using the encoding procedure of de Brevern, Etchebest & Hazout [Proteins (2000), 41, 271–287]. In the current release of the PDB-2-PB database (version 1.0), the protein entries from a recent release of the Worldwide Protein Data Bank (wwPDB) have been used as the primary source; that release had 74 297 solved PDB entries as of 7 July 2011. The PDB-2-PB database stores the protein block sequences for all the chains present in a protein structure. PDB-2-PB version 1.0 has curated protein block sequences for 103 252 PDB chain entries (93 547 X-ray, 7033 NMR and 2672 other experimental chain entries). From the PDB-2-PB database, users can extract the curated protein block sequence and its corresponding amino acid sequence, which is derived from the PDB ATOM records. Users can download these sequences either by PDB code or by various parameters listed in the database. The PDB-2-PB database is freely available at http://bioinfo.bdu.ac.in/~pb/.
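PB encoding starts from backbone dihedral angles. The Python below is a minimal sketch, not the PDB-2-PB code. It computes a dihedral angle from four atom positions using the standard atan2 formula. It then assigns a window of angles to the nearest reference profile by root mean square deviation on angular values (RMSDA), the distance used in de Brevern et al.'s scheme. The function names are hypothetical. The 16 published reference profiles are not reproduced here, so the caller must supply them.

```python
import math
from typing import Dict, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle in degrees for four points (e.g. C(i-1), N, CA, C for phi)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def angular_diff(a: float, b: float) -> float:
    """Signed difference a - b wrapped into (-180, 180]."""
    d = (a - b) % 360.0
    return d - 360.0 if d > 180.0 else d


def rmsda(window: Sequence[float], ref: Sequence[float]) -> float:
    """Root mean square deviation on angular values between two angle vectors."""
    if len(window) != len(ref):
        raise ValueError("angle vectors must have equal length")
    return math.sqrt(sum(angular_diff(a, r) ** 2 for a, r in zip(window, ref)) / len(ref))


def assign_block(window: Sequence[float], refs: Dict[str, Sequence[float]]) -> str:
    """Label of the reference angle profile closest to `window` by RMSDA."""
    return min(refs, key=lambda label: rmsda(window, refs[label]))
```

For a five-residue window centred on residue i, the eight angles would be ψ(i−2), φ(i−1), ψ(i−1), φ(i), ψ(i), φ(i+1), ψ(i+1) and φ(i+2). Wrapping angular differences matters here: 170° and −170° are only 20° apart.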

PRIGSA2: Improved version of Protein Repeat Identification by Graph Spectral Analysis

10.1101/803304 ◽

2019 ◽

Author(s):

Broto Chakrabarty ◽

Nita Parekh

Keyword(s):

Tertiary Structure ◽

De Novo ◽

Protein Complexes ◽

Repeat Unit ◽

Protein Structures ◽

Fold Increase ◽

Data Bank ◽

Topological Features ◽

Repeat Proteins ◽

Complete Protein

Tandemly repeated structural motifs in proteins form highly stable structural folds and provide multiple binding sites associated with diverse functional roles. The tertiary structure and function of these proteins are determined by the type and copy number of the repeating units. Each repeat type exhibits a unique pattern of intra- and inter-repeat unit interactions that is well captured by the topological features of the network representation of protein structures. Here we present an improved version of our graph-based algorithm, PRIGSA, which incorporates structure-based validation and filtering steps for accurate detection of tandem structural repeats. The algorithm integrates available knowledge on repeat families with de novo prediction to detect repeats in single monomer chains as well as in multimeric protein complexes. Three levels of performance evaluation are presented. The first is comparison with state-of-the-art algorithms on a benchmark dataset of repeat and non-repeat proteins. The second is accuracy in detecting members of 13 known repeat families reported in UniProt. The third is execution on the complete Protein Data Bank, to show the algorithm's ability to identify previously uncharacterized proteins. Executing it on the PDB yields a ∼3-fold increase in coverage of members of the 13 known families and identifies 3,408 novel, previously uncharacterized structural repeat proteins. URL: http://bioinf.iiit.ac.in/PRIGSA2/.
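The network representation mentioned above can be illustrated with a generic residue contact graph. The sketch below is not PRIGSA's implementation. It links residues whose Cα atoms lie within an assumed 6.5 Å cutoff. It then computes the leading eigenvector of the adjacency matrix by power iteration, which is the kind of spectral quantity graph-based methods inspect. The cutoff and the function names are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_graph(ca: Sequence[Coord], cutoff: float = 6.5) -> List[List[int]]:
    """Adjacency lists linking residues whose C-alpha atoms are within `cutoff` angstroms."""
    n = len(ca)
    adj: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca[i], ca[j]) <= cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def principal_eigenvector(adj: List[List[int]], iters: int = 1000,
                          tol: float = 1e-12) -> List[float]:
    """Leading eigenvector of the adjacency matrix A by power iteration on A + I.

    The identity shift keeps the dominant eigenvalue strictly largest in magnitude,
    so the iteration does not oscillate on bipartite graphs such as a linear chain.
    """
    n = len(adj)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        # One multiplication by (A + I): own value plus the neighbours' values.
        w = [v[i] + sum(v[j] for j in adj[i]) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(v, w)) < tol
        v = w
        if converged:
            break
    return v


# Toy usage: five C-alpha atoms on a straight line, 3.8 angstroms apart.
chain = [(3.8 * i, 0.0, 0.0) for i in range(5)]
print(principal_eigenvector(contact_graph(chain)))
```

Repeat-detection methods look for periodic patterns in quantities like these along the sequence. This toy chain only shows the mechanics.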

TactViz: A VMD Plugin for Tactile Visualization of Protein Structures

Journal of Science Education for Students with Disabilities ◽

10.14448/jsesd.12.0015 ◽

2020 ◽

Vol 23 (1) ◽

pp. 1-8 ◽

Cited By ~ 1

Author(s):

Olivia R. Shaw ◽

Jodi A. Hadden-Perilla ◽

Keyword(s):

Visually Impaired ◽

Protein Structures ◽

Data Bank ◽

Tactile Display ◽

Display Devices ◽

Diversity And Inclusion ◽

Scientific Disciplines ◽

Visualization Software ◽

Academic Laboratory ◽

Protein Structure Data

Scientific disciplines spanning biology, biochemistry, and biophysics involve the study of proteins and their functions. Visualization of protein structures represents a barrier to education and research in these disciplines for students who are blind or visually impaired. Here, we present a software plugin for readily producing variable-height tactile graphics of proteins using the free biomolecular visualization software Visual Molecular Dynamics (VMD) and protein structure data that is publicly available through the Protein Data Bank. Our method also supports interactive tactile visualization of proteins with VMD on electronic refreshable tactile display devices. Employing our method in an academic laboratory has enabled an undergraduate student who is blind to carry out research alongside her sighted peers. By making the study of protein structures accessible to students who are blind or visually impaired, we aim to promote diversity and inclusion in STEM education and research.

Macromolecular Structure Databases: Past Progress and Future Challenges

Acta Crystallographica Section D Biological Crystallography ◽

10.1107/s0907444998009846 ◽

1998 ◽

Vol 54 (6) ◽

pp. 1085-1094 ◽

Cited By ~ 2

Author(s):

Helge Weissig ◽

Ilya N. Shindyalov ◽

Philip E. Bourne

Keyword(s):

Management Practices ◽

Temporal Trends ◽

Protein Structures ◽

Data Bank ◽

Macromolecular Structure ◽

Paper Briefly ◽

Future Challenges ◽

Structure Databases ◽

Full Discussion ◽

Self Consistency

Databases containing macromolecular structure data provide a crystallographer with important tools for use in solving, refining and understanding the functional significance of their protein structures. Given this importance, this paper briefly summarizes past progress by outlining the features of the significant number of relevant databases developed to date. One recent database, PDB+, containing all current and obsolete structures deposited with the Protein Data Bank (PDB) is discussed in more detail. PDB+ has been used to analyze the self-consistency of the current (1 January 1998) corpus of over 7000 structures. A summary of those findings is presented (a full discussion will appear elsewhere) in the form of global and temporal trends within the data. These trends indicate that challenges exist if crystallographers are to provide the community with complete and consistent structural results in the future. It is argued that better information management practices are required to meet these challenges.

Enriched Conformational Sampling of DNA and Proteins with a Hybrid Hamiltonian Derived from the Protein Data Bank

International Journal of Molecular Sciences ◽

10.3390/ijms19113405 ◽

2018 ◽

Vol 19 (11) ◽

pp. 3405 ◽

Cited By ~ 3

Author(s):

Emanuel Peter ◽

Jiří Černý

Keyword(s):

Partition Function ◽

Protein Data Bank ◽

Protein Structures ◽

Data Bank ◽

Weighting Factor ◽

Potential Of Mean Force ◽

Conformational Space ◽

Dynamics Simulation ◽

Conformational Sampling ◽

Speed Increase

In this article, we present a method for the enhanced molecular dynamics simulation of protein and DNA systems called potential of mean force (PMF)-enriched sampling. The method uses partitions derived from the potentials of mean force, which we determined from DNA and protein structures in the Protein Data Bank (PDB). We define a partition function from a set of PDB-derived PMFs, which efficiently compensates for the error introduced by the assumption of a homogeneous partition function from the PDB datasets. The bias based on the PDB-derived partitions is added in the form of a hybrid Hamiltonian using a renormalization method, which adds the PMF-enriched gradient to the system depending on a linear weighting factor and the underlying force field. We validated the method using simulations of dialanine, the folding of TrpCage, and the conformational sampling of the Dickerson–Drew DNA dodecamer. Our results show the potential for the PMF-enriched simulation technique to enrich the conformational space of biomolecules along their order parameters, while we also observe a considerable speed increase in the sampling by factors ranging from 13.1 to 82. The novel method can effectively be combined with enhanced sampling or coarse-graining methods to enrich conformational sampling with a partition derived from the PDB.
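The first step of the method turns PDB statistics into a potential of mean force. It can be sketched with standard Boltzmann inversion, W(x) = −k_B T ln p(x). The Python below histograms a set of dihedral angles into a PMF. It then combines the PMF with a force-field energy through a linear weighting factor λ. The bin width, the empty-bin floor and the additive form E_ff + λW are assumptions for illustration. The paper's renormalization scheme is not reproduced here.

```python
import math
from collections import Counter
from typing import Dict, Sequence

KB_KCAL_PER_MOL_K = 0.0019872  # Boltzmann constant in kcal/(mol*K)


def pmf_from_angles(angles: Sequence[float], bin_width: float = 10.0,
                    temperature: float = 300.0, floor: float = 1e-6) -> Dict[int, float]:
    """Boltzmann-inverted PMF, W(bin) = -kT ln p(bin), shifted so its minimum is zero.

    `floor` caps the free energy of empty bins instead of returning infinity.
    """
    if not angles:
        raise ValueError("need at least one sample")
    n_bins = int(round(360.0 / bin_width))
    counts = Counter(int((a % 360.0) // bin_width) for a in angles)
    kt = KB_KCAL_PER_MOL_K * temperature
    w = {b: -kt * math.log(max(counts.get(b, 0) / len(angles), floor))
         for b in range(n_bins)}
    w_min = min(w.values())
    return {b: val - w_min for b, val in w.items()}


def hybrid_energy(e_force_field: float, w_pmf: float, lam: float) -> float:
    """Assumed linear mixing: E = E_ff + lam * W; lam = 0 recovers the plain force field."""
    return e_force_field + lam * w_pmf
```

With 90 samples in one bin and 10 in another, the less populated bin sits kT ln 9 (about 1.3 kcal/mol at 300 K) above the minimum.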

Expanding our knowledge of the protein universe: Modelling of protein structures

Acta Crystallographica Section A Foundations and Advances ◽

10.1107/s2053273314095084 ◽

2014 ◽

Vol 70 (a1) ◽

pp. C491-C491

Author(s):

Jürgen Haas ◽

Alessandro Barbato ◽

Tobias Schmidt ◽

Steven Roth ◽

Andrew Waterhouse ◽

...

Keyword(s):

Computational Modeling ◽

Structure Prediction ◽

Structural Information ◽

Protein Structures ◽

Model Organism ◽

Data Bank ◽

Continuous Model ◽

Structure Modeling ◽

Structure Comparison ◽

Modeling And Prediction

Computational modeling and prediction of three-dimensional macromolecular structures and complexes from their sequence has been a long-standing goal in structural biology. Over the last two decades, a paradigm shift has occurred. The field started from a large "knowledge gap" between the huge number of protein sequences and the small number of experimentally determined structures. Today, some form of structural information, either experimental or computational, is available for the majority of amino acids encoded by common model organism genomes. Methods for structure modeling and prediction have made substantial progress over the last decades. Template-based homology modeling techniques have matured to the point where they are now routinely used to complement experimental techniques. However, computational models often fall short in accuracy compared with high-resolution experimental structures, and it is often difficult to convey the expected accuracy and structural variability of a specific model. Retrospectively assessing the quality of blind structure predictions against experimental reference structures makes it possible to benchmark the state of the art in structure prediction and to identify areas that need further development. For the last 20 years, the Critical Assessment of Structure Prediction (CASP) experiment has assessed progress in protein structure modeling. It is based on predictions for ca. 100 blind prediction targets per experiment, which are carefully evaluated by human experts. The "Continuous Model EvaluatiOn" (CAMEO) project aims to provide a fully automated blind assessment of prediction servers based on the weekly pre-released sequences of the Protein Data Bank (PDB). CAMEO has been made possible by the development of novel scoring methods such as lDDT, which are robust against domain movements and thus allow automated, continuous structure comparison without human intervention.
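lDDT does not require superposition. It compares interatomic distances in the model with the corresponding distances in the reference, so a rigid domain movement only affects distances between domains. The sketch below is a simplified, Cα-only, global variant written for illustration. It uses a 15 Å inclusion radius and tolerance thresholds of 0.5, 1, 2 and 4 Å. The reference implementation works on all atoms and also reports per-residue scores.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def lddt_ca(model: Sequence[Coord], reference: Sequence[Coord],
            inclusion_radius: float = 15.0,
            thresholds: Tuple[float, ...] = (0.5, 1.0, 2.0, 4.0)) -> float:
    """Simplified global C-alpha lDDT: mean fraction of preserved reference distances."""
    if len(model) != len(reference):
        raise ValueError("model and reference must have the same number of residues")
    diffs = []
    for i, j in combinations(range(len(reference)), 2):
        d_ref = math.dist(reference[i], reference[j])
        if d_ref < inclusion_radius:  # only pairs that are local in the reference
            diffs.append(abs(math.dist(model[i], model[j]) - d_ref))
    if not diffs:
        return 0.0
    fractions = [sum(d < t for d in diffs) / len(diffs) for t in thresholds]
    return sum(fractions) / len(thresholds)
```

Because only distances are compared, a rigidly translated or rotated copy of the reference scores 1.0 without any fitting step.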

Conformational variability in proteins bound to single-stranded DNA: a new benchmark for new docking perspectives

10.22541/au.162040366.69255354/v1 ◽

2021 ◽

Author(s):

Dominique MIAS-LUCQUIN ◽

Isaure Chauvot de Beauchêne

Keyword(s):

Protein Data Bank ◽

Conformational Changes ◽

Molecular Interactions ◽

Protein Structures ◽

Data Bank ◽

Computational Docking ◽

Ssdna Binding ◽

Conformational Variability ◽

High Flexibility ◽

Docking Benchmark

We explored the Protein Data Bank (PDB) to collect protein–ssDNA structures and create a multi-conformational docking benchmark that includes both bound and unbound protein structures. Because ssDNA is highly flexible when not bound, no unbound ssDNA structure is included. For the 143 groups identified as bound–unbound structures of the same protein, we studied the conformational changes induced in the protein by ssDNA binding. Some groups contain several bound or unbound protein structures. For these, we also assessed the intrinsic conformational variability under bound or unbound conditions and compared it with the supposedly binding-induced modifications. To our knowledge, this benchmark is the first attempt to survey available structures of protein–ssDNA interactions to such an extent. Its aim is to improve computational docking tools dedicated to this kind of molecular interaction.
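The abstract does not say which measure of conformational change was used. One hedged option is a superposition-free distance-matrix RMSD (dRMSD) between bound and unbound structures of the same protein. Averaging it within groups and across groups then separates intrinsic variability from binding-induced change. The sketch assumes residues are already matched one-to-one between structures. The function names are hypothetical.

```python
import math
from itertools import combinations, product
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]
Conformation = Sequence[Coord]


def drmsd(a: Conformation, b: Conformation) -> float:
    """Distance-matrix RMSD between two conformations with matched residues."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two matched conformations with at least two residues")
    sq = [(math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
          for i, j in combinations(range(len(a)), 2)]
    return math.sqrt(sum(sq) / len(sq))


def mean_drmsd(group_a: List[Conformation],
               group_b: Optional[List[Conformation]] = None) -> float:
    """Mean dRMSD within one group (group_b is None) or between two groups."""
    if group_b is None:
        pairs = list(combinations(group_a, 2))  # intrinsic variability
    else:
        pairs = list(product(group_a, group_b))  # e.g. bound versus unbound
    if not pairs:
        raise ValueError("no conformation pairs to compare")
    return sum(drmsd(x, y) for x, y in pairs) / len(pairs)
```

A between-group value well above both within-group values would point to a binding-induced change rather than ordinary flexibility.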

Structure-Guided Computational Approaches to Unravel Druggable Proteomic Landscape of Mycobacterium leprae

Frontiers in Molecular Biosciences ◽

10.3389/fmolb.2021.663301 ◽

2021 ◽

Vol 8 ◽

Author(s):

Sundeep Chaitanya Vedithi ◽

Sony Malhotra ◽

Marta Acebrón-García-de-Eulate ◽

Modestas Matusevicius ◽

Pedro Henrique Monteiro Torres ◽

...

Keyword(s):

Drug Discovery ◽

Schwann Cells ◽

Protein Structures ◽

Mycobacterium Leprae ◽

Data Bank ◽

Nerve Damage ◽

Structural Proteomics ◽

Bacterial Survival ◽

Functional Sites

Leprosy, caused by Mycobacterium leprae (M. leprae), is treated with a multidrug regimen comprising Dapsone, Rifampicin, and Clofazimine. These drugs exhibit bacteriostatic, bactericidal and anti-inflammatory properties, respectively, and control the dissemination of infection in the host. However, the current treatment has three drawbacks. It is not cost-effective, its long duration (12 months) discourages patient compliance, and it does not protect against the ensuing nerve damage, which is a severe complication of leprosy. The chronic infectious peripheral neuropathy associated with the disease is primarily due to bacterial components infiltrating the Schwann cells that protect neuronal axons, thereby inducing a demyelinating phenotype. Novel or repurposed drugs are needed that can act as short-duration, effective alternatives to the existing treatment regimens and prevent the nerve damage and consequent disability associated with the disease. Because M. leprae is an obligate pathogen, the bacillus is experimentally intractable to cultivation in vitro. This limits drug discovery efforts to repositioning screens in mouse footpad models. The dearth of knowledge about the structural proteomics of M. leprae, coupled with emerging antimicrobial resistance to all three drugs in the multidrug therapy, calls for concerted efforts to discover novel drugs. A comprehensive understanding of the proteomic landscape of M. leprae is indispensable for identifying druggable targets that are essential for bacterial survival and for its predilection for human neuronal Schwann cells. Of the 1,614 protein-coding genes in the genome of M. leprae, only 17 protein structures are available in the Protein Data Bank. In this review, we discuss efforts made to model the proteome of M. leprae using a suite of protein modeling software developed in the Blundell laboratory.
The hallmarks of the modeling process are precise template selection using sequence–structure homology recognition software, multi-template modeling of the monomeric models and accurate quality assessment. Tools that map interfaces and enable the building of homo-oligomers are discussed in the context of interface stability. Other software is described that determines the druggable proteome. It draws on chokepoint analysis of the metabolic pathways, gene essentiality, homology to human proteins, functional sites, druggable pockets and fragment hotspot maps.
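Chokepoint analysis is commonly defined on a metabolic network: a reaction is a chokepoint if it is the only reaction that consumes some metabolite, or the only one that produces it. The sketch below applies that definition. It then keeps essential genes without a human homologue, as a toy version of the filtering described above. The data layout and the field names (`reactions`, `essential`, `human_homolog`) are hypothetical. Real networks also need currency metabolites such as water or ATP removed first.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

# (substrates, products) for one reaction
Reaction = Tuple[FrozenSet[str], FrozenSet[str]]


def chokepoints(reactions: Dict[str, Reaction]) -> Set[str]:
    """Reactions that are the only consumer or the only producer of some metabolite."""
    consumers: Dict[str, Set[str]] = defaultdict(set)
    producers: Dict[str, Set[str]] = defaultdict(set)
    for rid, (substrates, products) in reactions.items():
        for m in substrates:
            consumers[m].add(rid)
        for m in products:
            producers[m].add(rid)
    result: Set[str] = set()
    for index in (consumers, producers):
        for rids in index.values():
            if len(rids) == 1:
                result |= rids
    return result


def prioritise(genes: Dict[str, dict], choke: Set[str]) -> List[str]:
    """Essential genes with no human homologue that catalyse at least one chokepoint."""
    return sorted(
        name for name, info in genes.items()
        if info["essential"] and not info["human_homolog"]
        and choke & set(info["reactions"])
    )
```

In a toy network where R1 and R2 both convert A to B and only R3 converts B to C, neither R1 nor R2 is a chokepoint because each has a redundant partner. R3 is a chokepoint because it is the sole consumer of B and the sole producer of C.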

Improving Prediction Accuracy via Subspace Modeling in a Statistical Geometry Based Computational Protein Mutagenesis

International Journal of Knowledge Discovery in Bioinformatics ◽

10.4018/jkdb.2010100103 ◽

2010 ◽

Vol 1 (4) ◽

pp. 54-68

Author(s):

Majid Masso

Keyword(s):

Bacteriophage T4 ◽

Protein Structures ◽

Dimensional Subspace ◽

Sequence Length ◽

Protein Chain ◽

Contact Potential ◽

Knowledge Based ◽

Environmental Perturbations ◽

Subspace Modeling ◽

Residue Substitution

A computational mutagenesis is detailed whereby each single residue substitution in a protein chain of primary sequence length N is represented as a sparse N-dimensional feature vector, whose M << N nonzero components locally quantify environmental perturbations occurring at the mutated position and its neighbors in the protein structure. The methodology makes use of both the Delaunay tessellation algorithm for representing protein structures, as well as a four-body, knowledge based, statistical contact potential. Feature vectors for each subset of mutants due to all possible residue substitutions at a particular position cohabit the same M-dimensional subspace, where the value of M and the identities of the M nonzero components are similarly position dependent. The approach is used to characterize a large experimental dataset of single residue substitutions in bacteriophage T4 lysozyme, each categorized as either unaffected or affected based on the measured level of mutant activity relative to that of the native protein. Performance of a single classifier trained with the collective set of mutants in N-space is compared to that of an ensemble of position-specific classifiers trained using disjoint mutant subsets residing in significantly smaller subspaces. Results suggest that significant improvements can be achieved through subspace modeling.
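The sparse feature vector described above can be sketched as follows. The sketch assumes the Delaunay tetrahedra are already computed. It also assumes the four-body potential is given as a lookup from a sorted residue-type quadruplet to a score. Only tetrahedra containing the mutated position change, so the entries of the vector are confined to that position and its tessellation neighbors. The potential values are hypothetical. Adding each tetrahedron's score change to all four of its vertices is an assumed aggregation rule, not necessarily the published one.

```python
from typing import Dict, Mapping, Sequence, Tuple

Simplex = Tuple[int, int, int, int]  # residue indices forming one Delaunay tetrahedron


def quad_score(residue_types: Sequence[str],
               potential: Mapping[Tuple[str, ...], float]) -> float:
    """Score of one tetrahedron; order-independent, unknown compositions score 0."""
    return potential.get(tuple(sorted(residue_types)), 0.0)


def perturbation_vector(seq: str, simplices: Sequence[Simplex],
                        potential: Mapping[Tuple[str, ...], float],
                        pos: int, new_aa: str) -> Dict[int, float]:
    """Sparse feature vector {position: change} for the substitution seq[pos] -> new_aa."""
    mutant = seq[:pos] + new_aa + seq[pos + 1:]
    vec: Dict[int, float] = {}
    for simplex in simplices:
        if pos not in simplex:
            continue  # tetrahedra not touching the mutated residue are unchanged
        delta = (quad_score([mutant[i] for i in simplex], potential)
                 - quad_score([seq[i] for i in simplex], potential))
        for i in simplex:
            vec[i] = vec.get(i, 0.0) + delta
    return vec
```

Storing the vector as a dictionary keeps only the M positions that are touched, out of the N positions in the chain.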

MRPC (Missing Regions in Polypeptide Chains): a knowledgebase

Journal of Applied Crystallography ◽

10.1107/s1600576719012330 ◽

2019 ◽

Vol 52 (6) ◽

pp. 1422-1426

Author(s):

Rajendran Santhosh ◽

Namrata Bankoti ◽

Adgonda Malgonnavar Padmashri ◽

Daliah Michael ◽

Jeyaraman Jeyakanthan ◽

...

Keyword(s):

Protein Structures ◽

Three Dimensional ◽

Protein Molecule ◽

Data Bank ◽

Protein Crystal ◽

Dimensional Structure ◽

Protein Structure Analysis ◽

Three Dimensional Structure ◽

X Ray Crystallography ◽

Polypeptide Chains

Missing regions in protein crystal structures are those regions that cannot be resolved, mainly owing to poor electron density (if the three-dimensional structure was solved using X-ray crystallography). These missing regions are known to have high B factors and could represent loops with a possibility of being part of an active site of the protein molecule. Thus, they are likely to provide valuable information and play a crucial role in the design of inhibitors and drugs and in protein structure analysis. In view of this, an online database, Missing Regions in Polypeptide Chains (MRPC), has been developed which provides information about the missing regions in protein structures available in the Protein Data Bank. In addition, the new database has an option for users to obtain the above data for non-homologous protein structures (25 and 90%). A user-friendly graphical interface with various options has been incorporated, with a provision to view the three-dimensional structure of the protein along with the missing regions using JSmol. The MRPC database is updated regularly (currently once every three months) and can be accessed freely at the URL http://cluster.physics.iisc.ac.in/mrpc.
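Missing regions can be located by comparing the residue numbers observed in the coordinate records with the expected range for the chain. This is a minimal sketch. It assumes plain integer residue numbering aligned with the full sequence. Real PDB entries may use insertion codes or numbering offsets, which the sketch ignores. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def missing_regions(observed: Sequence[int], first: int, last: int) -> List[Tuple[int, int]]:
    """Inclusive (start, end) ranges within [first, last] that are absent from `observed`."""
    present = set(observed)
    gaps: List[Tuple[int, int]] = []
    start = None
    for n in range(first, last + 1):
        if n not in present:
            if start is None:
                start = n  # a new gap begins here
        elif start is not None:
            gaps.append((start, n - 1))  # the gap ended at the previous residue
            start = None
    if start is not None:
        gaps.append((start, last))  # the gap runs to the end of the chain
    return gaps


# Residues 1-3, 7-8 and 10 observed in a 12-residue chain:
print(missing_regions([1, 2, 3, 7, 8, 10], 1, 12))  # [(4, 6), (9, 9), (11, 12)]
```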