Enriched Conformational Sampling of DNA and Proteins with a Hybrid Hamiltonian Derived from the Protein Data Bank

2018 ◽  
Vol 19 (11) ◽  
pp. 3405 ◽  
Author(s):  
Emanuel Peter ◽  
Jiří Černý

In this article, we present a method for the enhanced molecular dynamics simulation of protein and DNA systems called potential of mean force (PMF)-enriched sampling. The method uses partitions derived from the potentials of mean force, which we determined from DNA and protein structures in the Protein Data Bank (PDB). We define a partition function from a set of PDB-derived PMFs, which efficiently compensates for the error introduced by the assumption of a homogeneous partition function from the PDB datasets. The bias based on the PDB-derived partitions is added in the form of a hybrid Hamiltonian using a renormalization method, which adds the PMF-enriched gradient to the system depending on a linear weighting factor and the underlying force field. We validated the method using simulations of dialanine, the folding of TrpCage, and the conformational sampling of the Dickerson–Drew DNA dodecamer. Our results show the potential for the PMF-enriched simulation technique to enrich the conformational space of biomolecules along their order parameters, while we also observe a considerable speed increase in the sampling by factors ranging from 13.1 to 82. The novel method can effectively be combined with enhanced sampling or coarse-graining methods to enrich conformational sampling with a partition derived from the PDB.
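As a rough orientation for the abstract above, the following sketch shows the standard Boltzmann-inversion relation between a distribution observed along an order parameter and a potential of mean force, followed by one simple form a linearly weighted bias could take. The symbols $\xi$, $P(\xi)$, $\lambda$, $\mathbf{F}_{\mathrm{FF}}$, and $\mathbf{F}_{\mathrm{hyb}}$ are notation introduced here for illustration, and the second equation is an assumption; the article's actual partition and renormalization scheme may differ.

```latex
% Boltzmann inversion: PMF along an order parameter \xi from the
% probability density P(\xi) observed in a structure set (C is an
% arbitrary additive constant).
W(\xi) = -k_{\mathrm{B}} T \,\ln P(\xi) + C

% Illustrative assumption (not taken from the article): the PMF gradient
% is added to the force-field force with a linear weight \lambda.
\mathbf{F}_{\mathrm{hyb}}(\mathbf{r})
  = \mathbf{F}_{\mathrm{FF}}(\mathbf{r})
  - \lambda \,\nabla_{\mathbf{r}} W\bigl(\xi(\mathbf{r})\bigr),
  \qquad 0 \le \lambda \le 1
```

With $\lambda = 0$ this form reduces to the unbiased force field, which is one way to read "depending on a linear weighting factor and the underlying force field".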

2020 ◽  
Author(s):  
Lim Heo ◽  
Collin Arbour ◽  
Michael Feig

Protein structures provide valuable information for understanding biological processes. Protein structures can be determined by experimental methods such as X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, or cryogenic electron microscopy. As an alternative, in silico methods can be used to predict protein structures. Those methods utilize protein structure databases for structure prediction via template-based modeling or for training machine-learning models to generate predictions. Structure prediction for proteins distant from proteins with known structures often results in lower accuracy with respect to the true physiological structures. Physics-based protein model refinement methods can be applied to improve the accuracy of the predicted models. Refinement methods rely on conformational sampling around the predicted structures, and if structures closer to the native states are sampled, improvements in the model quality become possible. Molecular dynamics simulations have been especially successful at improving model quality, but although consistent refinement can be achieved, the improvements are still moderate. To extend the refinement performance of a simulation-based protocol, we explored new schemes that focus on an optimized use of biasing functions and the application of increased simulation temperatures. In addition, we tested the use of alternative initial models so that the simulations can explore conformational space more broadly. Based on the insights from this analysis, we propose a new refinement protocol that significantly outperformed previous state-of-the-art molecular dynamics simulation-based protocols in the benchmark tests described here.


Author(s):  
Dominique MIAS-LUCQUIN ◽  
Isaure Chauvot de Beauchêne

We explored the Protein Data Bank (PDB) to collect protein-ssDNA structures and create a multi-conformational docking benchmark including both bound and unbound protein structures. Because ssDNA is highly flexible when not bound, no unbound ssDNA structure is included. For the 143 groups identified as bound-unbound structures of the same protein, we studied the conformational changes in the protein induced by ssDNA binding. Moreover, based on several bound or unbound protein structures in some groups, we also assessed the intrinsic conformational variability in either bound or unbound conditions, and compared it to the supposedly binding-induced modifications. This benchmark is, to our knowledge, the first attempt to survey available structures of protein-ssDNA interactions to such an extent, with the aim of improving computational docking tools dedicated to this kind of molecular interaction.


2020 ◽  
Vol 49 (D1) ◽  
pp. D452-D457
Author(s):  
Lisanna Paladin ◽  
Martina Bevilacqua ◽  
Sara Errigo ◽  
Damiano Piovesan ◽  
Ivan Mičetić ◽  
...  

The RepeatsDB database (URL: https://repeatsdb.org/) provides annotations and classification for protein tandem repeat structures from the Protein Data Bank (PDB). Protein tandem repeats are ubiquitous in all branches of the tree of life. The accumulation of solved repeat structures provides new possibilities for classification and detection, but also increases the need for annotation. Here we present RepeatsDB 3.0, which addresses these challenges and presents an extended classification scheme. The major conceptual change compared to the previous version is the hierarchical classification combining top levels based solely on structural similarity (Class > Topology > Fold) with two new levels (Clan > Family) requiring sequence similarity and describing repeat motifs in collaboration with Pfam. Data growth has been addressed with improved mechanisms for browsing the classification hierarchy. A new UniProt-centric view unifies the increasingly frequent annotation of structures from identical or similar sequences. This update of RepeatsDB aligns with our commitment to develop a resource that extracts, organizes and distributes specialized information on tandem repeat protein structures.


2020 ◽  
Vol 21 (6) ◽  
pp. 2243
Author(s):  
Nicolas K. Shinada ◽  
Peter Schmidtke ◽  
Alexandre G. de Brevern

The number of available protein structures in the Protein Data Bank (PDB) has considerably increased in recent years. Thanks to the growth of structures and complexes, numerous large-scale studies have been done in various research areas, e.g., protein–protein interactions, protein–DNA interactions, or drug discovery. While protein redundancy has so far been managed only with a simple protein sequence identity threshold, the similarity of protein-ligand complexes should also be considered from a structural perspective. The protein-ligand duplicates in the PDB are widely known, but were never quantitatively assessed, as they are quite complex to analyze and compare. Here, we present a specific clustering of protein-ligand structures to avoid bias found in different studies. The methodology is based on binding site superposition, and a combination of weighted Root Mean Square Deviation (RMSD) assessment and hierarchical clustering. Repeated structures of proteins of interest are highlighted and only representative conformations are retained for a non-biased view of protein distribution. Three types of cases are described based on the number of distinct conformations identified for each complex. Defining these categories decreases the number of complexes by 3.84-fold, and offers more refined results compared to a protein sequence-based method. Widely distinct conformations were analyzed using normalized B-factors. Furthermore, a non-redundant dataset was generated for future molecular interactions analysis or virtual screening studies.
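The combination of weighted RMSD and hierarchical clustering mentioned in this abstract can be illustrated with a minimal, standard-library sketch. It assumes the binding sites are already superposed and have matching atom order; the per-atom weights, the average-linkage criterion, the cutoff, and the function names `weighted_rmsd` and `cluster_sites` are all hypothetical choices made here, not details taken from the article.

```python
import math

def weighted_rmsd(a, b, w):
    """Weighted RMSD between two pre-superposed coordinate lists.

    a, b: lists of (x, y, z) tuples in the same atom order.
    w:    per-atom weights (assumed, e.g. higher for atoms near the ligand).
    """
    num = sum(
        wi * sum((p - q) ** 2 for p, q in zip(x, y))
        for x, y, wi in zip(a, b, w)
    )
    return math.sqrt(num / sum(w))

def cluster_sites(sites, weights, cutoff):
    """Naive average-linkage agglomerative clustering.

    Merges the two closest clusters until the smallest average
    inter-cluster distance exceeds `cutoff`. Returns lists of indices.
    """
    n = len(sites)
    d = [[weighted_rmsd(sites[i], sites[j], weights) for j in range(n)]
         for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = sum(d[i][j] for i in clusters[a] for j in clusters[b])
                link /= len(clusters[a]) * len(clusters[b])
                if best is None or link < best[0]:
                    best = (link, a, b)
        if best[0] > cutoff:
            break
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid
    return clusters

# Toy data: two near-identical sites and one clearly different site.
s0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
s1 = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)]
s2 = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
w = [1.0, 2.0]
groups = cluster_sites([s0, s1, s2], w, cutoff=0.5)
```

Keeping one representative per group (for example, the member with the lowest mean distance to the others) would then give a non-redundant set; that selection rule is likewise an assumption.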


2015 ◽  
Vol 71 (8) ◽  
pp. 1604-1614 ◽  
Author(s):  
Wouter G. Touw ◽  
Robbie P. Joosten ◽  
Gert Vriend

A coordinate-based method is presented to detect peptide bonds that need correction either by a peptide-plane flip or by a trans–cis inversion of the peptide bond. When applied to the whole Protein Data Bank, the method predicts 4617 trans–cis flips and many thousands of hitherto unknown peptide-plane flips. A few examples are highlighted for which a correction of the peptide-plane geometry leads to a correction of the understanding of the structure–function relation. All data, including 1088 manually validated cases, are freely available and the method is available from a web server, a web-service interface and through WHAT_CHECK.


2019 ◽  
Author(s):  
Dmytro Guzenko ◽  
Stephen K. Burley ◽  
Jose M. Duarte

Detection of protein structure similarity is a central challenge in structural bioinformatics. Comparisons are usually performed at the polypeptide chain level; however, the functional form of a protein within the cell is often an oligomer. This fact, together with recent growth of oligomeric structures in the Protein Data Bank (PDB), demands more efficient approaches to oligomeric assembly alignment/retrieval. Traditional methods use atom-level information, which can be complicated by the presence of topological permutations within a polypeptide chain and/or subunit rearrangements. These challenges can be overcome by comparing electron density volumes directly, but brute-force alignment of 3D data is a compute-intensive search problem. We developed a 3D Zernike moment normalization procedure to orient electron density volumes and assess similarity with unprecedented speed. Similarity searching with this approach enables real-time retrieval of proteins/protein assemblies resembling a target, from PDB or user input, together with resulting alignments (http://shape.rcsb.org).

Author Summary

Protein structures possess wildly varied shapes, but patterns at different levels are frequently reused by nature. Finding and classifying these similarities is fundamental to understanding evolution. Given the continued growth in the number of known protein structures in the Protein Data Bank, the task of comparing them to find the common patterns is becoming increasingly complicated. This is especially true when considering complete protein assemblies with several polypeptide chains, where the large sizes further complicate the issue. Here we present a novel method that can detect similarity between protein shapes and that works equally fast for any size of proteins or assemblies. The method looks at proteins as volumes of density distribution, departing from what is more usual in the field: similarity assessment based on atomic coordinates and chain connectivity.
A volumetric function can be decomposed with a mathematical tool known as 3D Zernike polynomials, resulting in a compact description as vectors of Zernike moments. The tool was introduced in the 1990s, when it was suggested that the moments could be normalized to be invariant to rotations without losing information. Here we demonstrate that this normalization is in fact possible and that it offers a much more accurate method for assessing similarity between shapes, when compared to previous attempts.


Molecules ◽  
2020 ◽  
Vol 25 (7) ◽  
pp. 1522 ◽  
Author(s):  
Mikhail Yu. Lobanov ◽  
Ilya V. Likhachev ◽  
Oxana V. Galzitskaya

We created a new library of disordered patterns and disordered residues in the Protein Data Bank (PDB). To obtain such datasets, we clustered the PDB and obtained the groups of chains with different identities and marked disordered residues. We elaborated a new procedure for finding disordered patterns and created a new version of the library. This library includes three sets of patterns: unique patterns, patterns consisting of two kinds of amino acids, and homo-repeats. Using this database, the user can: (1) find homologues in the entire Protein Data Bank; (2) perform a statistical analysis of disordered residues in protein structures; (3) search for disordered patterns and homo-repeats; (4) search for disordered regions in different chains of the same protein; (5) download clusters of protein chains with different identity from our database and library of disordered patterns; and (6) observe 3D structure interactively using MView. A new library of disordered patterns will help improve the accuracy of predictions for residues that will be structured or unstructured in a given region.


2006 ◽  
Vol 59 (12) ◽  
pp. 874 ◽  
Author(s):  
Dimitris K. Agrafiotis ◽  
Alan Gibbs ◽  
Fangqiang Zhu ◽  
Sergei Izrailev ◽  
Eric Martin

Stochastic proximity embedding (SPE) is a novel self-organizing algorithm for sampling conformational space using geometric constraints derived from the molecular connectivity table. Here, we describe a simple heuristic that can be used in conjunction with SPE to bias the conformational search towards more extended or compact conformations, and thus greatly expand the range of geometries sampled during the search. The method uses a boosting strategy to generate a series of conformations, each of which is at least as extended (or compact) as the previous one. The approach is compared to several popular conformational sampling techniques using a reference set of 59 bioactive ligands extracted from the Protein Data Bank, and is shown to be significantly more effective in sampling the full range of molecular radii, with the exception of the Catalyst program, which was equally effective.
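For readers unfamiliar with SPE, the basic pairwise refinement step it builds on can be sketched as follows. This covers only the underlying embedding loop with lower and upper distance bounds, not the boosting heuristic described in the abstract; the function name `spe_embed`, the linear learning-rate schedule, and the toy bounds are assumptions made here for illustration.

```python
import math
import random

def spe_embed(n_atoms, bounds, steps=20000, lam=1.0, seed=0, eps=1e-9):
    """Basic stochastic proximity embedding in 3D.

    bounds: dict mapping (i, j) -> (lower, upper) distance bounds.
    Repeatedly picks a random constrained pair and, if its distance
    violates the bounds, moves both points along their connecting line
    toward the nearest bound, with a learning rate that decays to zero.
    """
    rng = random.Random(seed)
    x = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n_atoms)]
    pairs = list(bounds)
    for step in range(steps):
        rate = lam * (1.0 - step / steps)
        i, j = rng.choice(pairs)
        lo, hi = bounds[(i, j)]
        d = math.dist(x[i], x[j])
        if lo <= d <= hi:
            continue  # constraint already satisfied
        target = lo if d < lo else hi
        scale = rate * 0.5 * (target - d) / (d + eps)
        for k in range(3):
            delta = scale * (x[i][k] - x[j][k])
            x[i][k] += delta
            x[j][k] -= delta
    return x

# Toy example: three points whose pairwise distances must lie in [1.0, 1.2].
toy_bounds = {(0, 1): (1.0, 1.2), (0, 2): (1.0, 1.2), (1, 2): (1.0, 1.2)}
coords = spe_embed(3, toy_bounds)
```

In a real application the bounds would come from the molecular connectivity table, and additional terms (for example, chirality checks) would be needed; none of that is shown here.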


Algorithms ◽  
2018 ◽  
Vol 11 (8) ◽  
pp. 114 ◽  
Author(s):  
Mihaly Mezei

The steady growth of the Protein Data Bank (PDB) suggests the periodic repetition of searches for sequences that form different secondary structures in different protein structures; these are called chameleon sequences. This paper presents a fast (n log(n)) algorithm for such searches and presents the results on all protein structures in the PDB. The longest such sequence found consists of 20 residues.
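One way to reach sort-dominated cost for such a search is to collect all fixed-length windows, sort them by sequence, and scan adjacent runs of identical sequences. The sketch below follows that idea; it is not the paper's algorithm. The criterion used here (a window counts only if its secondary structure is uniform, and a sequence is a chameleon if it occurs with more than one such state), the one-letter state codes, and the function name `find_chameleons` are assumptions for illustration.

```python
def find_chameleons(chains, k):
    """Find length-k sequences seen with different uniform secondary structure.

    chains: list of (sequence, secondary_structure) string pairs of equal
            length, one state letter per residue (e.g. 'H', 'E', 'C').
    Returns a dict mapping each chameleon sequence to its sorted states.
    """
    windows = []
    for seq, ss in chains:
        for i in range(len(seq) - k + 1):
            states = set(ss[i:i + k])
            if len(states) == 1:  # keep only windows with a single state
                windows.append((seq[i:i + k], ss[i]))
    windows.sort()  # the sort dominates the running time

    result = {}
    i = 0
    while i < len(windows):
        j = i
        seen = set()
        # Identical sequences are adjacent after sorting.
        while j < len(windows) and windows[j][0] == windows[i][0]:
            seen.add(windows[j][1])
            j += 1
        if len(seen) > 1:
            result[windows[i][0]] = sorted(seen)
        i = j
    return result

# Toy input: "KLVEA" is helical in one chain and a strand in the other.
toy_chains = [
    ("AAKLVEALG", "HHHHHHHHH"),
    ("GGKLVEATT", "EEEEEEECC"),
]
hits = find_chameleons(toy_chains, 5)
```

Repeating the search for decreasing window lengths would locate the longest chameleon sequence in a data set.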

