Enriched Conformational Sampling of DNA and Proteins with a Hybrid Hamiltonian Derived from the Protein Data Bank

In this article, we present a method for the enhanced molecular dynamics simulation of protein and DNA systems called potential of mean force (PMF)-enriched sampling. The method uses partitions derived from the potentials of mean force, which we determined from DNA and protein structures in the Protein Data Bank (PDB). We define a partition function from a set of PDB-derived PMFs, which efficiently compensates for the error introduced by the assumption of a homogeneous partition function from the PDB datasets. The bias based on the PDB-derived partitions is added in the form of a hybrid Hamiltonian using a renormalization method, which adds the PMF-enriched gradient to the system depending on a linear weighting factor and the underlying force field. We validated the method using simulations of dialanine, the folding of TrpCage, and the conformational sampling of the Dickerson–Drew DNA dodecamer. Our results show the potential for the PMF-enriched simulation technique to enrich the conformational space of biomolecules along their order parameters, while we also observe a considerable speed increase in the sampling by factors ranging from 13.1 to 82. The novel method can effectively be combined with enhanced sampling or coarse-graining methods to enrich conformational sampling with a partition derived from the PDB.
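As a rough illustration of the idea in this abstract, the sketch below derives a PMF from histogram counts by Boltzmann inversion and mixes its force into a force-field force with a linear weight. The abstract does not give the functional form, so the relation W = -kT ln(p/p_max), the additive mixing rule, and all function names here are assumptions made for illustration, not the authors' renormalization scheme.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def pmf_from_counts(counts, temperature=300.0, pseudocount=1.0):
    """Boltzmann inversion of histogram counts (assumed form, not from the paper).

    Returns W_i = -kT * ln(p_i / p_max), so the most populated bin has W = 0.
    A pseudocount keeps empty bins finite.
    """
    smoothed = [c + pseudocount for c in counts]
    top = max(smoothed)
    kt = KB * temperature
    return [-kt * math.log(s / top) for s in smoothed]

def pmf_force(pmf, bin_width, index):
    """Force -dW/dx at a bin by central finite difference (periodic coordinate)."""
    n = len(pmf)
    return -(pmf[(index + 1) % n] - pmf[(index - 1) % n]) / (2.0 * bin_width)

def hybrid_force(ff_force, pmf_force_value, weight):
    """Hypothetical linear mixing: force-field force plus weighted PMF force."""
    return ff_force + weight * pmf_force_value

# Toy histogram of a dihedral angle observed in a structure database
counts = [5, 50, 200, 50, 5, 1]
pmf = pmf_from_counts(counts)
bias = pmf_force(pmf, bin_width=60.0, index=1)
total = hybrid_force(ff_force=0.0, pmf_force_value=bias, weight=0.5)
```

With a weight of zero the force field is recovered unchanged, and increasing the weight pulls the coordinate toward the most populated database bin.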


Improved Sampling Strategies for Protein Model Refinement based on Molecular Dynamics Simulation

10.26434/chemrxiv.13299197.v1 ◽

2020 ◽

Author(s):

Lim Heo ◽

Collin Arbour ◽

Michael Feig

Keyword(s):

Molecular Dynamics ◽

Molecular Dynamics Simulation ◽

Structure Prediction ◽

Protein Structures ◽

Conformational Space ◽

Dynamics Simulation ◽

Model Refinement ◽

Protein Model ◽

Lower Accuracy ◽

Simulation Based

Protein structures provide valuable information for understanding biological processes. Protein structures can be determined by experimental methods such as X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, or cryogenic electron microscopy. As an alternative, in silico methods can be used to predict protein structures. Those methods utilize protein structure databases for structure prediction via template-based modeling or for training machine-learning models to generate predictions. Structure prediction for proteins distant from proteins with known structures often results in lower accuracy with respect to the true physiological structures. Physics-based protein model refinement methods can be applied to improve the accuracy of the predicted models. Refinement methods rely on conformational sampling around the predicted structures, and if structures closer to the native states are sampled, improvements in model quality become possible. Molecular dynamics simulations have been especially successful at improving model quality, but although consistent refinement can be achieved, the improvements are still moderate. To extend the refinement performance of a simulation-based protocol, we explored new schemes that focus on an optimized use of biasing functions and the application of increased simulation temperatures. In addition, we tested the use of alternative initial models so that the simulations can explore conformational space more broadly. Based on the insights from this analysis, we propose a new refinement protocol that significantly outperformed previous state-of-the-art molecular dynamics simulation-based protocols in the benchmark tests described here.


Conformational variability in proteins bound to single-stranded DNA: a new benchmark for new docking perspectives

10.22541/au.162040366.69255354/v1 ◽

2021 ◽

Author(s):

Dominique MIAS-LUCQUIN ◽

Isaure Chauvot de Beauchêne

Keyword(s):

Protein Data Bank ◽

Conformational Changes ◽

Molecular Interactions ◽

Protein Structures ◽

Data Bank ◽

Computational Docking ◽

ssDNA Binding ◽

Conformational Variability ◽

High Flexibility ◽

Docking Benchmark

We explored the Protein Data Bank (PDB) to collect protein-ssDNA structures and create a multi-conformational docking benchmark including both bound and unbound protein structures. Owing to the high flexibility of ssDNA when not bound, no unbound ssDNA structure is included. For the 143 groups identified as bound-unbound structures of the same protein, we studied the conformational changes in the protein induced by ssDNA binding. Moreover, using the several bound or unbound protein structures available in some groups, we also assessed the intrinsic conformational variability in either bound or unbound conditions, and compared it to the supposedly binding-induced modifications. This benchmark is, to our knowledge, the first attempt to survey available structures of protein-ssDNA interactions to such an extent, with the aim of improving computational docking tools dedicated to this kind of molecular interaction.


RepeatsDB in 2021: improved data and extended classification for protein tandem repeat structures

Nucleic Acids Research ◽

10.1093/nar/gkaa1097 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D452-D457

Author(s):

Lisanna Paladin ◽

Martina Bevilacqua ◽

Sara Errigo ◽

Damiano Piovesan ◽

Ivan Mičetić ◽

...

Keyword(s):

Protein Data Bank ◽

Tandem Repeat ◽

Tandem Repeats ◽

Classification Scheme ◽

Sequence Similarity ◽

Protein Structures ◽

Hierarchical Classification ◽

Structural Similarity ◽

Data Bank ◽

Similarity Class

The RepeatsDB database (URL: https://repeatsdb.org/) provides annotations and classification for protein tandem repeat structures from the Protein Data Bank (PDB). Protein tandem repeats are ubiquitous in all branches of the tree of life. The accumulation of solved repeat structures provides new possibilities for classification and detection, but also increases the need for annotation. Here we present RepeatsDB 3.0, which addresses these challenges and presents an extended classification scheme. The major conceptual change compared to the previous version is the hierarchical classification combining top levels based solely on structural similarity (Class > Topology > Fold) with two new levels (Clan > Family) requiring sequence similarity and describing repeat motifs in collaboration with Pfam. Data growth has been addressed with improved mechanisms for browsing the classification hierarchy. A new UniProt-centric view unifies the increasingly frequent annotation of structures from identical or similar sequences. This update of RepeatsDB aligns with our commitment to develop a resource that extracts, organizes and distributes specialized information on tandem repeat protein structures.


Accurate Representation of Protein-Ligand Structural Diversity in the Protein Data Bank (PDB)

International Journal of Molecular Sciences ◽

10.3390/ijms21062243 ◽

2020 ◽

Vol 21 (6) ◽

pp. 2243

Author(s):

Nicolas K. Shinada ◽

Peter Schmidtke ◽

Alexandre G. de Brevern

Keyword(s):

Protein Data Bank ◽

Protein Sequence ◽

Large Scale ◽

Protein Structures ◽

Structural Diversity ◽

Data Bank ◽

Protein Distribution ◽

Research Areas ◽

Identity Threshold ◽

Protein Sequence Identity

The number of available protein structures in the Protein Data Bank (PDB) has considerably increased in recent years. Thanks to the growth of structures and complexes, numerous large-scale studies have been done in various research areas, e.g., protein–protein, protein–DNA, or in drug discovery. While protein redundancy has typically been handled with a simple protein sequence identity threshold, the similarity of protein-ligand complexes should also be considered from a structural perspective. Protein-ligand duplicates in the PDB are widely known but have never been quantitatively assessed, as they are quite complex to analyze and compare. Here, we present a specific clustering of protein-ligand structures to avoid bias found in different studies. The methodology is based on binding site superposition and a combination of weighted Root Mean Square Deviation (RMSD) assessment and hierarchical clustering. Repeated structures of proteins of interest are highlighted, and only representative conformations are retained for a non-biased view of protein distribution. Three types of cases are described based on the number of distinct conformations identified for each complex. Defining these categories reduces the number of complexes 3.84-fold and offers more refined results compared to a protein sequence-based method. Widely distinct conformations were analyzed using normalized B-factors. Furthermore, a non-redundant dataset was generated for future molecular interactions analysis or virtual screening studies.
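The combination of weighted RMSD and hierarchical clustering mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming the binding sites are already superposed and share atom ordering; the complete-linkage rule, the cutoff value, the per-atom weights, and the function names are illustrative choices rather than the published protocol.

```python
import math

def weighted_rmsd(a, b, weights):
    """Weighted RMSD between two pre-superposed coordinate lists of equal length."""
    num = sum(
        w * sum((p - q) ** 2 for p, q in zip(x, y))
        for x, y, w in zip(a, b, weights)
    )
    return math.sqrt(num / sum(weights))

def cluster_structures(structures, weights, cutoff):
    """Agglomerative complete-linkage clustering, stopped at an RMSD cutoff."""
    clusters = [[i] for i in range(len(structures))]

    def linkage(c1, c2):
        # Complete linkage: the largest pairwise distance between two clusters
        return max(
            weighted_rmsd(structures[i], structures[j], weights)
            for i in c1 for j in c2
        )

    while len(clusters) > 1:
        dist, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if dist > cutoff:
            break  # remaining clusters are all further apart than the cutoff
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Three toy binding sites with two atoms each
s0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
s1 = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0)]
s2 = [(0.0, 0.0, 0.0), (1.0, 3.0, 0.0)]
clusters = cluster_structures([s0, s1, s2], weights=[1.0, 1.0], cutoff=0.5)
representatives = [members[0] for members in clusters]
```

Keeping one representative per cluster is what removes near-duplicate complexes; how the representative is chosen (here simply the first member) is another assumption of the sketch.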


Detection of trans–cis flips and peptide-plane flips in protein structures

Acta Crystallographica Section D Biological Crystallography ◽

10.1107/s1399004715008263 ◽

2015 ◽

Vol 71 (8) ◽

pp. 1604-1614 ◽

Cited By ~ 16

Author(s):

Wouter G. Touw ◽

Robbie P. Joosten ◽

Gert Vriend

Keyword(s):

Structure Function ◽

Web Service ◽

Protein Data Bank ◽

Protein Structures ◽

Data Bank ◽

Peptide Bond ◽

Peptide Bonds ◽

Unknown Peptide ◽

Peptide Plane ◽

Service Interface

A coordinate-based method is presented to detect peptide bonds that need correction either by a peptide-plane flip or by a trans–cis inversion of the peptide bond. When applied to the whole Protein Data Bank, the method predicts 4617 trans–cis flips and many thousands of hitherto unknown peptide-plane flips. A few examples are highlighted for which a correction of the peptide-plane geometry leads to a correction of the understanding of the structure–function relation. All data, including 1088 manually validated cases, are freely available and the method is available from a web server, a web-service interface and through WHAT_CHECK.
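For orientation, the cis or trans state of a peptide bond can be read from coordinates through the omega dihedral, defined by the atoms CA(i), C(i), N(i+1), CA(i+1). The sketch below only computes and classifies omega; it is not the detection method of the paper, which predicts bonds that need correction, and the 30 degree tolerance and function names are assumptions.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees for four points, in the range [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond
    s0 = _dot(b0, b1)
    s2 = _dot(b2, b1)
    v = (b0[0] - s0 * b1[0], b0[1] - s0 * b1[1], b0[2] - s0 * b1[2])
    w = (b2[0] - s2 * b1[0], b2[1] - s2 * b1[1], b2[2] - s2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def classify_omega(omega_deg, tolerance=30.0):
    """Label a peptide bond from its omega angle (tolerance is an assumed value)."""
    if abs(omega_deg) <= tolerance:
        return "cis"
    if abs(omega_deg) >= 180.0 - tolerance:
        return "trans"
    return "distorted"

# Idealized planar geometries: CA(i), C(i), N(i+1), CA(i+1)
trans_omega = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
cis_omega = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
```

A bond labeled "distorted" by this simple test would be a natural candidate for closer inspection, though deciding whether a flip is warranted requires more evidence than omega alone.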


Effects of N-glycosylation on protein conformation and dynamics: Protein Data Bank analysis and molecular dynamics simulation study

Scientific Reports ◽

10.1038/srep08926 ◽

2015 ◽

Vol 5 (1) ◽

Cited By ~ 70

Author(s):

Hui Sun Lee ◽

Yifei Qi ◽

Wonpil Im

Keyword(s):

Molecular Dynamics ◽

Molecular Dynamics Simulation ◽

Protein Data Bank ◽

Simulation Study ◽

Protein Conformation ◽

Data Bank ◽

Dynamics Simulation


Real time structural search of the Protein Data Bank

10.1101/845123 ◽

2019 ◽

Cited By ~ 1

Author(s):

Dmytro Guzenko ◽

Stephen K. Burley ◽

Jose M. Duarte

Keyword(s):

Real Time ◽

Protein Data Bank ◽

Electron Density ◽

Polypeptide Chain ◽

Protein Structures ◽

Data Bank ◽

Zernike Moment ◽

Search Problem ◽

Mathematical Tool ◽

Protein Assemblies

Detection of protein structure similarity is a central challenge in structural bioinformatics. Comparisons are usually performed at the polypeptide chain level; however, the functional form of a protein within the cell is often an oligomer. This fact, together with recent growth of oligomeric structures in the Protein Data Bank (PDB), demands more efficient approaches to oligomeric assembly alignment/retrieval. Traditional methods use atom-level information, which can be complicated by the presence of topological permutations within a polypeptide chain and/or subunit rearrangements. These challenges can be overcome by comparing electron density volumes directly. However, brute-force alignment of 3D data is a compute-intensive search problem. We developed a 3D Zernike moment normalization procedure to orient electron density volumes and assess similarity with unprecedented speed. Similarity searching with this approach enables real-time retrieval of proteins/protein assemblies resembling a target, from PDB or user input, together with resulting alignments (http://shape.rcsb.org).

Author Summary: Protein structures possess wildly varied shapes, but patterns at different levels are frequently reused by nature. Finding and classifying these similarities is fundamental to understanding evolution. Given the continued growth in the number of known protein structures in the Protein Data Bank, the task of comparing them to find the common patterns is becoming increasingly complicated. This is especially true when considering complete protein assemblies with several polypeptide chains, where the large sizes further complicate the issue. Here we present a novel method that can detect similarity between protein shapes and that works equally fast for any size of proteins or assemblies. The method looks at proteins as volumes of density distribution, departing from what is more usual in the field: similarity assessment based on atomic coordinates and chain connectivity. A volumetric function can be decomposed with a mathematical tool known as 3D Zernike polynomials, resulting in a compact description as vectors of Zernike moments. The tool was introduced in the 1990s, when it was suggested that the moments could be normalized to be invariant to rotations without losing information. Here we demonstrate that this normalization is in fact possible and that it offers a much more accurate method for assessing similarity between shapes, when compared to previous attempts.


Disordered Residues and Patterns in the Protein Data Bank

Molecules ◽

10.3390/molecules25071522 ◽

2020 ◽

Vol 25 (7) ◽

pp. 1522 ◽

Cited By ~ 2

Author(s):

Mikhail Yu. Lobanov ◽

Ilya V. Likhachev ◽

Oxana V. Galzitskaya

Keyword(s):

Amino Acids ◽

Statistical Analysis ◽

Protein Data Bank ◽

Protein Structures ◽

3D Structure ◽

Data Bank ◽

Disordered Regions

We created a new library of disordered patterns and disordered residues in the Protein Data Bank (PDB). To obtain such datasets, we clustered the PDB and obtained the groups of chains with different identities and marked disordered residues. We elaborated a new procedure for finding disordered patterns and created a new version of the library. This library includes three sets of patterns: unique patterns, patterns consisting of two kinds of amino acids, and homo-repeats. Using this database, the user can: (1) find homologues in the entire Protein Data Bank; (2) perform a statistical analysis of disordered residues in protein structures; (3) search for disordered patterns and homo-repeats; (4) search for disordered regions in different chains of the same protein; (5) download clusters of protein chains with different identity from our database and library of disordered patterns; and (6) observe 3D structure interactively using MView. A new library of disordered patterns will help improve the accuracy of predictions for residues that will be structured or unstructured in a given region.


Conformational Boosting

Australian Journal of Chemistry ◽

10.1071/ch06217 ◽

2006 ◽

Vol 59 (12) ◽

pp. 874 ◽

Cited By ~ 5

Author(s):

Dimitris K. Agrafiotis ◽

Alan Gibbs ◽

Fangqiang Zhu ◽

Sergei Izrailev ◽

Eric Martin

Keyword(s):

Full Range ◽

Data Bank ◽

Geometric Constraints ◽

Conformational Space ◽

Conformational Search ◽

Conformational Sampling ◽

Sampling Techniques ◽

Molecular Connectivity ◽

Reference Set ◽

Bioactive Ligands

Stochastic proximity embedding (SPE) is a novel self-organizing algorithm for sampling conformational space using geometric constraints derived from the molecular connectivity table. Here, we describe a simple heuristic that can be used in conjunction with SPE to bias the conformational search towards more extended or compact conformations, and thus greatly expand the range of geometries sampled during the search. The method uses a boosting strategy to generate a series of conformations, each of which is at least as extended (or compact) as the previous one. The approach is compared to several popular conformational sampling techniques using a reference set of 59 bioactive ligands extracted from the Protein Data Bank, and is shown to be significantly more effective in sampling the full range of molecular radii, with the exception of the Catalyst program, which was equally effective.
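The sketch below illustrates the two ingredients named in this abstract: an SPE-style loop that repeatedly picks a random atom pair and nudges it toward its distance bounds, and a boosting loop that produces a series of conformations. The abstract does not say how the boost is enforced, so raising the lower bound of a chosen atom pair to the distance reached in the previous conformation is an assumption, as are the learning-rate schedule, the step count, and the function names.

```python
import math
import random

def spe_embed(n_atoms, bounds, n_steps=20000, seed=0):
    """SPE-style embedding. bounds maps (i, j) -> (lower, upper) distance limits."""
    rng = random.Random(seed)
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n_atoms)]
    pairs = list(bounds)
    for step in range(n_steps):
        lam = 1.0 - 0.99 * step / n_steps  # learning rate annealed from 1.0 to ~0.01
        i, j = rng.choice(pairs)
        lower, upper = bounds[(i, j)]
        d = math.dist(coords[i], coords[j])
        if lower <= d <= upper:
            continue  # constraint already satisfied, no update
        target = lower if d < lower else upper
        scale = 0.5 * lam * (target - d) / (d + 1e-9)
        for k in range(3):
            delta = scale * (coords[i][k] - coords[j][k])
            coords[i][k] += delta  # move both atoms symmetrically along the pair axis
            coords[j][k] -= delta
    return coords

def boosted_series(n_atoms, bounds, pair, n_conf=3):
    """Assumed boosting rule: each run must keep `pair` at least as far apart
    as in the previous conformation (lower bound raised after every run)."""
    current = dict(bounds)
    conformations = []
    for c in range(n_conf):
        coords = spe_embed(n_atoms, current, seed=c)
        conformations.append(coords)
        d = math.dist(coords[pair[0]], coords[pair[1]])
        lower, upper = current[pair]
        current[pair] = (min(max(lower, d), upper), upper)
    return conformations

# Four-atom chain: unit bond lengths, loose bound on the end-to-end distance
bounds = {(0, 1): (1.0, 1.0), (1, 2): (1.0, 1.0), (2, 3): (1.0, 1.0), (0, 3): (0.5, 3.0)}
confs = boosted_series(4, bounds, pair=(0, 3))
end_to_end = [math.dist(c[0], c[3]) for c in confs]
```

Biasing toward compact rather than extended conformations would follow the same pattern with the upper bound lowered instead of the lower bound raised.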


Revisiting Chameleon Sequences in the Protein Data Bank

Algorithms ◽

10.3390/a11080114 ◽

2018 ◽

Vol 11 (8) ◽

pp. 114 ◽

Cited By ~ 3

Author(s):

Mihaly Mezei

Keyword(s):

Protein Data Bank ◽

Protein Structures ◽

Data Bank ◽

Secondary Structures ◽

Steady Growth ◽

Periodic Repetition

The steady growth of the Protein Data Bank (PDB) suggests the periodic repetition of searches for sequences that form different secondary structures in different protein structures; these are called chameleon sequences. This paper presents a fast (n log(n)) algorithm for such searches and presents the results on all protein structures in the PDB. The longest such sequence found consists of 20 residues.
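One way to reach n log(n) behavior for this kind of search is to sort all fixed-length sequence windows and then scan the sorted list for identical sequences carrying different secondary structures. The sketch below follows that idea; it is not the algorithm from the paper, and the restriction to windows that are uniformly helix ("H") or strand ("E"), the input format, and the function name are assumptions.

```python
def find_chameleons(chains, k):
    """Return k-residue sequences seen both as all-helix and as all-strand.

    chains: list of (sequence, secondary_structure) string pairs of equal length,
    with 'H' for helix and 'E' for strand (other letters are ignored).
    """
    windows = []
    for seq, ss in chains:
        for i in range(len(seq) - k + 1):
            state = ss[i:i + k]
            if state == "H" * k or state == "E" * k:
                windows.append((seq[i:i + k], state[0]))
    windows.sort()  # sorting dominates the cost: O(n log n) comparisons

    chameleons = []
    start = 0
    for end in range(1, len(windows) + 1):
        # A group ends when the sequence changes or the list is exhausted
        if end == len(windows) or windows[end][0] != windows[start][0]:
            states = {state for _, state in windows[start:end]}
            if states == {"H", "E"}:
                chameleons.append(windows[start][0])
            start = end
    return chameleons

chains = [
    ("AKLVEAG", "CHHHHHC"),
    ("GKLVEAT", "CEEEEEC"),
]
hits = find_chameleons(chains, k=5)
```

Running the search for decreasing values of k, starting from a large window, would give the longest chameleon sequence in the input set.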
