Molecular replacement using structure predictions from databases

Molecular replacement (MR) is the predominant route to solution of the phase problem in macromolecular crystallography. Where the lack of a suitable homologue precludes conventional MR, one option is to predict the target structure using bioinformatics. Such modelling, in the absence of homologous templates, is called ab initio or de novo modelling. Recently, the accuracy of such models has improved significantly as a result of the availability, in many cases, of residue-contact predictions derived from evolutionary covariance analysis. Covariance-assisted ab initio models representing structurally uncharacterized Pfam families are now available on a large scale in databases, potentially representing a valuable and easily accessible supplement to the PDB as a source of search models. Here, the unconventional MR pipeline AMPLE is employed to explore the value of structure predictions in the GREMLIN and PconsFam databases. It was tested whether these deposited predictions, processed in various ways, could solve the structures of PDB entries that were subsequently deposited. The results were encouraging: nine of 27 GREMLIN cases were solved, covering target lengths of 109–355 residues and a resolution range of 1.4–2.9 Å, and with target–model shared sequence identity as low as 20%. The cluster-and-truncate approach in AMPLE proved to be essential for most successes. For the overall lower quality structure predictions in the PconsFam database, remodelling with Rosetta within the AMPLE pipeline proved to be the best approach, generating ensemble search models from single-structure deposits. Finally, it is shown that the AMPLE-obtained search models deriving from GREMLIN deposits are of sufficiently high quality to be selected by the sequence-independent MR pipeline SIMBAD. Overall, the results help to point the way towards the optimal use of the expanding databases of ab initio structure predictions.


Crystallographic molecular replacement using an in silico-generated search model of SARS-CoV-2 ORF8

10.1101/2021.01.05.425441 ◽

2021 ◽

Author(s):

Thomas G. Flower ◽

James H. Hurley

Keyword(s):

Ab Initio ◽

Structural Parameters ◽

Search Model ◽

Problem Solution ◽

Molecular Replacement ◽

Phase Problem ◽

Search Models ◽

Sad Phasing ◽

Predicted Model ◽

Β Sheet

The majority of crystal structures are determined by the method of molecular replacement (MR). The range of application of MR is limited mainly by the need for an accurate search model. In most cases, pre-existing experimentally determined structures are used as search models. In favorable cases, ab initio predicted structures have yielded search models adequate for molecular replacement. The ORF8 protein of SARS-CoV-2 represents a challenging case for MR using an ab initio prediction because ORF8 has an all β-sheet fold and few orthologs. We previously determined the structure of ORF8 experimentally using the single-wavelength anomalous dispersion (SAD) phasing method, having been unable to find an MR solution to the crystallographic phase problem. Following a report of an accurate prediction of the ORF8 structure, we assessed whether the predicted model would have succeeded as an MR search model. A phase problem solution was found, and the resulting structure was refined, yielding structural parameters equivalent to the original experimental solution.


Error-estimation-guided rebuilding of de novo models increases the success rate of ab initio phasing

Acta Crystallographica Section D Biological Crystallography ◽

10.1107/s0907444912037961 ◽

2012 ◽

Vol 68 (11) ◽

pp. 1522-1534 ◽

Cited By ~ 5

Author(s):

Rojan Shrestha ◽

David Simoncini ◽

Kam Y. J. Zhang

Keyword(s):

Protein Structure ◽

Ab Initio ◽

Diffraction Data ◽

Structure Prediction ◽

De Novo ◽

Coarse Grained ◽

Data Sets ◽

Molecular Replacement ◽

High Quality ◽

Protein Targets

Recent advancements in computational methods for protein-structure prediction have made it possible to generate the high-quality de novo models required for ab initio phasing of crystallographic diffraction data using molecular replacement. Despite those encouraging achievements in ab initio phasing using de novo models, its success is limited only to those targets for which high-quality de novo models can be generated. In order to increase the scope of targets to which ab initio phasing with de novo models can be successfully applied, it is necessary to reduce the errors in the de novo models that are used as templates for molecular replacement. Here, an approach is introduced that can identify and rebuild the residues with larger errors, which subsequently reduces the overall Cα root-mean-square deviation (CA-RMSD) from the native protein structure. The error in a predicted model is estimated from the average pairwise geometric distance per residue computed among selected lowest energy coarse-grained models. This score is subsequently employed to guide a rebuilding process that focuses on more error-prone residues in the coarse-grained models. This rebuilding methodology has been tested on ten protein targets that were unsuccessful using previous methods. The average CA-RMSD of the coarse-grained models was improved from 4.93 to 4.06 Å. For those models with CA-RMSD less than 3.0 Å, the average CA-RMSD was improved from 3.38 to 2.60 Å. These rebuilt coarse-grained models were then converted into all-atom models and refined to produce improved de novo models for molecular replacement. Seven diffraction data sets were successfully phased using rebuilt de novo models, indicating the improved quality of these rebuilt de novo models and the effectiveness of the rebuilding process. Software implementing this method, called MORPHEUS, can be downloaded from http://www.riken.jp/zhangiru/software.html.
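The error estimate described in this abstract (average pairwise distance per residue among selected low-energy models) can be illustrated with a minimal sketch. This is not the MORPHEUS implementation: the function names, the rebuild cutoff, and the assumption that all models are already superposed in a common frame with one Cα coordinate per residue are hypothetical choices made for illustration.

```python
from itertools import combinations
from math import dist


def per_residue_error(models):
    """Estimate per-residue error as the mean pairwise distance across models.

    models: list of models, each a list of (x, y, z) C-alpha coordinates.
    Assumption: all models have equal length and are already superposed.
    """
    if len(models) < 2:
        raise ValueError("need at least two models")
    n_res = len(models[0])
    if any(len(m) != n_res for m in models):
        raise ValueError("models must have the same number of residues")
    pairs = list(combinations(range(len(models)), 2))
    scores = []
    for i in range(n_res):
        # Sum the distance between residue i in every pair of models.
        total = sum(dist(models[a][i], models[b][i]) for a, b in pairs)
        scores.append(total / len(pairs))
    return scores


def residues_to_rebuild(scores, cutoff):
    """Return indices of residues whose estimated error exceeds the cutoff.

    The cutoff is a hypothetical parameter, not a value from the paper.
    """
    return [i for i, s in enumerate(scores) if s > cutoff]
```

Residues on which the low-energy models disagree receive a high score and would be the ones targeted for rebuilding; residues on which they agree are left alone.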


Uniqueness of the macromolecular crystallographic phase problem

Acta Crystallographica Section A Foundations and Advances ◽

10.1107/s2053273315015387 ◽

2015 ◽

Vol 71 (6) ◽

pp. 592-598 ◽

Cited By ~ 15

Author(s):

Rick P. Millane ◽

Romain D. Arnal

Keyword(s):

Ab Initio ◽

Single Particle ◽

Real Space ◽

Unit Cell ◽

Macromolecular Crystallography ◽

Phase Problem ◽

Particle Imaging ◽

Single Particle Imaging ◽

Molecular Envelope ◽

Constraint Ratio

Uniqueness of the phase problem in macromolecular crystallography, and its relationship to the case of single particle imaging, is considered. The crystallographic problem is characterized by a constraint ratio that depends only on the size and symmetry of the molecule and the unit cell. The results are used to evaluate the effect of various real-space constraints. The case of an unknown molecular envelope is considered in detail. The results indicate the quite wide circumstances under which ab initio phasing should be possible.


Uniqueness and the ab initio phase problem in macromolecular crystallography

Acta Crystallographica Section D Biological Crystallography ◽

10.1107/s0907444992008801 ◽

1993 ◽

Vol 49 (1) ◽

pp. 186-192 ◽

Cited By ~ 11

Author(s):

D. Baker ◽

A. E. Krukowski ◽

D. A. Agard

Keyword(s):

Ab Initio ◽

Macromolecular Crystallography ◽

Phase Problem


A fragmentation and reassembly method for ab initio phasing

Acta Crystallographica Section D Biological Crystallography ◽

10.1107/s1399004714025449 ◽

2015 ◽

Vol 71 (2) ◽

pp. 304-312 ◽

Cited By ~ 14

Author(s):

Rojan Shrestha ◽

Kam Y. J. Zhang

Keyword(s):

Ab Initio ◽

Structure Determination ◽

Model Building ◽

De Novo ◽

Sequence Information ◽

Molecular Replacement ◽

Protein Targets ◽

Current State ◽

Viable Approach ◽

Automated Model Building

Ab initio phasing with de novo models has become a viable approach for structural solution from protein crystallographic diffraction data. This approach takes advantage of the known protein sequence information, predicts de novo models and uses them for structure determination by molecular replacement. However, even the current state-of-the-art de novo modelling method has a limit as to the accuracy of the model predicted, which is sometimes insufficient to be used as a template for successful molecular replacement. A fragment-assembly phasing method has been developed that starts from an ensemble of low-accuracy de novo models, disassembles them into fragments, places them independently in the crystallographic unit cell by molecular replacement and then reassembles them into a whole structure that can provide sufficient phase information to enable complete structure determination by automated model building. Tests on ten protein targets showed that the method could solve structures for eight of these targets, although the predicted de novo models cannot be used as templates for successful molecular replacement since the best model for each target is on average more than 4.0 Å away from the native structure. The method has extended the applicability of the ab initio phasing by de novo models approach. The method can be used to solve structures when the best de novo models are still of low accuracy.


Exploring the speed and performance of molecular replacement with AMPLE using QUARK ab initio protein models

Acta Crystallographica Section D Biological Crystallography ◽

10.1107/s1399004714025784 ◽

2015 ◽

Vol 71 (2) ◽

pp. 338-343 ◽

Cited By ~ 20

Author(s):

Ronan M. Keegan ◽

Jaclyn Bibby ◽

Jens Thomas ◽

Dong Xu ◽

Yang Zhang ◽

...

Keyword(s):

Protein Structure ◽

Ab Initio ◽

Molecular Replacement ◽

Test Set ◽

Search Models ◽

Structure Solution ◽

And Performance ◽

Almost All

AMPLE clusters and truncates ab initio protein structure predictions, producing search models for molecular replacement. Here, an interesting degree of complementarity is shown between targets solved using the different ab initio modelling programs QUARK and ROSETTA. Search models derived from either program collectively solve almost all of the all-helical targets in the test set. Initial solutions produced by Phaser after only 5 min perform surprisingly well, improving the prospects for in situ structure solution by AMPLE during synchrotron visits. Taken together, the results show the potential for AMPLE to run more quickly and successfully solve more targets than previously suspected.


Rapid molecular replacement of coiled-coil and transmembrane proteins with AMPLE

Acta Crystallographica Section A Foundations and Advances ◽

10.1107/s2053273314096521 ◽

2014 ◽

Vol 70 (a1) ◽

pp. C347-C347

Author(s):

Jens Thomas ◽

Ronan Keegan ◽

Jaclyn Bibby ◽

Martyn Winn ◽

Olga Mayans ◽

...

Keyword(s):

Ab Initio ◽

Protein Structures ◽

Coiled Coil ◽

Molecular Replacement ◽

Helical Structures ◽

Desktop Computer ◽

Search Models ◽

Nmr Structures ◽

Structure Solution ◽

Helical Geometry

Molecular replacement (MR) is an increasingly popular route to protein structure solution. AMPLE [1] is a software pipeline that uses either cheaply obtained ab initio protein models or NMR structures to extend the scope of MR, allowing it to solve entirely novel protein structures in a completely automated pipeline on a standard desktop computer. AMPLE employs a cluster-and-truncate approach, combined with multiple modes of side-chain treatment, to analyse the candidate models and extract the consensus features most likely to solve the structure. The search models generated in this way are screened by MrBump using Phaser and Molrep, and correct solutions are detected using main-chain tracing and phase modification with Shelxe. AMPLE proved capable of processing rapidly obtained ab initio structure predictions into successful search models and more recently proved effective in assembling NMR structures for MR [2]. Coiled-coil proteins are a distinct class of protein fold whose structure solution by MR is not typically straightforward. We show here that AMPLE can quickly and routinely solve most coiled-coil structures using ab initio predictions from Rosetta. The predictions are generally not globally accurate, but by encompassing different degrees of truncation of clustered models, AMPLE succeeds by sampling across a range of search models. These sometimes succeed through capturing locally well-modelled conformations, but often simply contain small helical units. Remarkably, the latter regularly succeed despite out-of-register placement and poor MR statistics. We demonstrate that single structures derived from successful ensembles perform less well, and comparable ideal helices solve few targets. Thus, both modelling of distortions from ideal helical geometry and the ensemble nature of the search models contribute to success. AMPLE is a framework applicable to any set of input structures in which variability is correlated with inaccuracy.
We also present preliminary data demonstrating structure solution of transmembrane helical structures using Rosetta modelling. We finally consider future sources of starting models which offer the hope that MR with AMPLE, in the absence of close homology between a known structure and the target, may soon be possible with larger proteins.
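The truncation idea in this abstract, that variability within a cluster of models is correlated with inaccuracy, can be sketched as follows. This is an illustrative sketch only, not AMPLE's actual code: the function names, the definition of variance as mean squared distance from the per-residue centroid, the kept fractions, and the assumption that the clustered models are already superposed are all hypothetical.

```python
def residue_variances(models):
    """Per-residue positional variance across a cluster of superposed models.

    models: list of models, each a list of (x, y, z) coordinates per residue.
    Variance here means the mean squared distance from the residue centroid.
    """
    n_models = len(models)
    n_res = len(models[0])
    variances = []
    for i in range(n_res):
        centroid = [sum(m[i][k] for m in models) / n_models for k in range(3)]
        sq = sum(
            sum((m[i][k] - centroid[k]) ** 2 for k in range(3)) for m in models
        )
        variances.append(sq / n_models)
    return variances


def truncation_levels(variances, fractions=(1.0, 0.75, 0.5, 0.25)):
    """Map each kept fraction to the residue indices with the lowest variance.

    The fractions are hypothetical defaults chosen for illustration.
    """
    order = sorted(range(len(variances)), key=lambda i: variances[i])
    levels = {}
    for f in fractions:
        keep = max(1, round(f * len(order)))
        levels[f] = sorted(order[:keep])
    return levels
```

Each truncation level would yield one candidate search model containing only the retained residues, so that a range of progressively more conservative models is sampled.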


Residue contacts predicted by evolutionary covariance extend the application of ab initio molecular replacement to larger and more challenging protein folds

IUCrJ ◽

10.1107/s2052252516008113 ◽

2016 ◽

Vol 3 (4) ◽

pp. 259-270 ◽

Cited By ~ 11

Author(s):

Felix Simkovic ◽

Jens M. H. Thomas ◽

Ronan M. Keegan ◽

Martyn D. Winn ◽

Olga Mayans ◽

...

Keyword(s):

Ab Initio ◽

Structure Prediction ◽

Sequence Information ◽

Protein Targets ◽

Residue Contact ◽

Residue Contacts ◽

Structure Solution ◽

Improved Performance ◽

Model Ensembles ◽

Contact Predictions

For many protein families, the deluge of new sequence information together with new statistical protocols now allow the accurate prediction of contacting residues from sequence information alone. This offers the possibility of more accurate ab initio (non-homology-based) structure prediction. Such models can be used in structure solution by molecular replacement (MR) where the target fold is novel or is only distantly related to known structures. Here, AMPLE, an MR pipeline that assembles search-model ensembles from ab initio structure predictions ('decoys'), is employed to assess the value of contact-assisted ab initio models to the crystallographer. It is demonstrated that evolutionary covariance-derived residue–residue contact predictions improve the quality of ab initio models and, consequently, the success rate of MR using search models derived from them. For targets containing β-structure, decoy quality and MR performance were further improved by the use of a β-strand contact-filtering protocol. Such contact-guided decoys achieved 14 structure solutions from 21 attempted protein targets, compared with nine for simple Rosetta decoys. Previously encountered limitations were superseded in two key respects. Firstly, much larger targets of up to 221 residues in length were solved, which is far larger than the previously benchmarked threshold of 120 residues. Secondly, contact-guided decoys significantly improved success with β-sheet-rich proteins. Overall, the improved performance of contact-guided decoys suggests that MR is now applicable to a significantly wider range of protein targets than were previously tractable, and points to a direct benefit to structural biology from the recent remarkable advances in sequencing.
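One simple way to see how predicted contacts relate to decoy quality is to count how many predicted residue pairs are actually close together in each decoy. The sketch below is an assumption-laden illustration, not the protocol used in the paper: the 8 Å distance cutoff, the use of one representative coordinate per residue, and the function names are all hypothetical.

```python
from math import dist


def satisfied_fraction(coords, predicted_contacts, cutoff=8.0):
    """Fraction of predicted contacts satisfied by one decoy.

    coords: list of (x, y, z), one representative atom per residue.
    predicted_contacts: iterable of (i, j) zero-based residue index pairs.
    cutoff: hypothetical distance threshold in angstroms.
    """
    contacts = list(predicted_contacts)
    if not contacts:
        raise ValueError("no predicted contacts supplied")
    hits = sum(1 for i, j in contacts if dist(coords[i], coords[j]) <= cutoff)
    return hits / len(contacts)


def rank_decoys(decoys, predicted_contacts, cutoff=8.0):
    """Return decoy names ordered from most to fewest contacts satisfied.

    decoys: dict mapping a decoy name to its coordinate list.
    Ties are broken alphabetically by name.
    """
    scored = [
        (satisfied_fraction(coords, predicted_contacts, cutoff), name)
        for name, coords in decoys.items()
    ]
    return [name for score, name in sorted(scored, key=lambda t: (-t[0], t[1]))]
```

A ranking of this kind could be used to prefer decoys that agree with the covariance signal before they are passed on for ensemble construction.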


Applications of contact predictions to structural biology

IUCrJ ◽

10.1107/s2052252517005115 ◽

2017 ◽

Vol 4 (3) ◽

pp. 291-300 ◽

Cited By ~ 27

Author(s):

Felix Simkovic ◽

Sergey Ovchinnikov ◽

David Baker ◽

Daniel J. Rigden

Keyword(s):

Large Scale ◽

Structural Bioinformatics ◽

Structural Domains ◽

Biologically Relevant ◽

Search Models ◽

X Ray Crystallography ◽

Contact Information ◽

Structure Solution ◽

Contact Predictions

Evolutionary pressure on residue interactions, intramolecular or intermolecular, that are important for protein structure or function can lead to covariance between the two positions. Recent methodological advances allow much more accurate contact predictions to be derived from this evolutionary covariance signal. The practical application of contact predictions has largely been confined to structural bioinformatics, yet, as this work seeks to demonstrate, the data can be of enormous value to the structural biologist working in X-ray crystallography, cryo-EM or NMR. Integrative structural bioinformatics packages such as Rosetta can already exploit contact predictions in a variety of ways. The contribution of contact predictions begins at construct design, where structural domains may need to be expressed separately and contact predictions can help to predict domain limits. Structure solution by molecular replacement (MR) benefits from contact predictions in diverse ways: in difficult cases, more accurate search models can be constructed using ab initio modelling when predictions are available, while intermolecular contact predictions can allow the construction of larger, oligomeric search models. Furthermore, MR using supersecondary motifs or large-scale screens against the PDB can exploit information, such as the parallel or antiparallel nature of any β-strand pairing in the target, that can be inferred from contact predictions. Contact information will be particularly valuable in the determination of lower resolution structures by helping to assign sequence register. In large complexes, contact information may allow the identity of a protein responsible for a certain region of density to be determined and then assist in the orientation of an available model within that density. In NMR, predicted contacts can provide long-range information to extend the upper size limit of the technique in a manner analogous but complementary to experimental methods.
Finally, predicted contacts can distinguish between biologically relevant interfaces and mere lattice contacts in a final crystal structure, and have potential in the identification of functionally important regions and in foreseeing the consequences of mutations.


Application of the AMPLE cluster-and-truncate approach to NMR structures for molecular replacement

Acta Crystallographica Section D Biological Crystallography ◽

10.1107/s0907444913018453 ◽

2013 ◽

Vol 69 (11) ◽

pp. 2194-2201 ◽

Cited By ~ 11

Author(s):

Jaclyn Bibby ◽

Ronan M. Keegan ◽

Olga Mayans ◽

Martyn D. Winn ◽

Daniel J. Rigden

Keyword(s):

Protein Structure ◽

Ab Initio ◽

Target Protein ◽

Molecular Replacement ◽

Search Models ◽

Sequence Identity ◽

Nmr Structures ◽

Core Cluster ◽

Structural Divergence ◽

Improved Performance

AMPLE is a program developed for clustering and truncating ab initio protein structure predictions into search models for molecular replacement. Here, it is shown that its core cluster-and-truncate methods also work well for processing NMR ensembles into search models. Rosetta remodelling helps to extend success to NMR structures bearing low sequence identity or high structural divergence from the target protein. Potential future routes to improved performance are considered and practical, general guidelines on using AMPLE are provided.
