New protein main-chain conformational descriptors on the validation and improvement of automatic protein model building

Manually identifying and correcting errors in protein models can be a slow process, but improvements in validation tools and automated model-building software can contribute to reducing this burden. This article presents a new correctness score that is produced by combining multiple sources of information using a neural network. The residues in 639 automatically built models were marked as correct or incorrect by comparing them with the coordinates deposited in the PDB. A number of features were also calculated for each residue using Coot, including map-to-model correlation, density values, B factors, clashes, Ramachandran scores, rotamer scores and resolution. Two neural networks were created using these features as inputs: one to predict the correctness of main-chain atoms and the other for side chains. The 639 structures were split into 511 that were used to train the neural networks and 128 that were used to test performance. The predicted correctness scores could correctly categorize 92.3% of the main-chain atoms and 87.6% of the side chains. A Coot ML Correctness script was written to display the scores in a graphical user interface and to automatically prune chains, residues and side chains with low scores. The automatic pruning function was added to the CCP4i2 Buccaneer automated model-building pipeline, leading to significant improvements, especially for high-resolution structures.
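The pruning step described in this abstract can be illustrated with a minimal sketch. It is not the Coot ML Correctness script: the `ResidueScore` layout, the `prune_plan` function, the three pruning rules and the 0.5 cut-offs are all assumptions made for illustration, and the scores are invented.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ResidueScore:
    chain: str
    number: int
    main_chain: float  # predicted main-chain correctness, 0..1 (hypothetical)
    side_chain: float  # predicted side-chain correctness, 0..1 (hypothetical)


def prune_plan(residues, chain_cut=0.5, residue_cut=0.5, side_cut=0.5):
    """Return (chains, residues, side_chains) that would be deleted.

    Assumed rules, not taken from the paper:
      1. drop a whole chain if its mean main-chain score is below chain_cut;
      2. otherwise drop residues whose main-chain score is below residue_cut;
      3. otherwise drop only the side chain if its score is below side_cut.
    """
    by_chain = {}
    for r in residues:
        by_chain.setdefault(r.chain, []).append(r)

    chains, whole_residues, side_chains = [], [], []
    for chain_id, members in by_chain.items():
        if mean(m.main_chain for m in members) < chain_cut:
            chains.append(chain_id)
            continue
        for m in members:
            if m.main_chain < residue_cut:
                whole_residues.append((m.chain, m.number))
            elif m.side_chain < side_cut:
                side_chains.append((m.chain, m.number))
    return chains, whole_residues, side_chains


# Invented scores for two short chains.
scores = [
    ResidueScore("A", 1, 0.95, 0.90),
    ResidueScore("A", 2, 0.20, 0.80),
    ResidueScore("A", 3, 0.90, 0.10),
    ResidueScore("B", 1, 0.10, 0.30),
    ResidueScore("B", 2, 0.30, 0.20),
]
print(prune_plan(scores))
```

Returning a plan rather than editing a model in place keeps the decision logic separate from whatever program applies the deletions.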

When good ligands go bad

Acta Crystallographica Section A Foundations and Advances ◽ 10.1107/s2053273314087300 ◽ 2014 ◽ Vol 70 (a1) ◽ pp. C1269-C1269

Author(s): Ethan Merritt

Keyword(s): Task Force ◽ Structure Refinement ◽ Binding Pocket ◽ Software Packages ◽ Protein Model ◽ Protein Components ◽ First Time ◽ Standard Software ◽ New Protein

Tools for validating structural models of proteins are relatively mature and widely implemented. New protein crystallographers are introduced early on to the importance of monitoring conformance with expected φ/ψ values, favored rotamers, and local stereochemistry. The protein model is validated by the PDB at the time of deposition using criteria that are also available in the standard software packages used to refine the model being deposited. By contrast, crystallographers are typically much less familiar with procedures to validate key non-protein components of the model: cofactors, substrates, inhibitors, etc. It has been estimated that as many as a third of all ligands in the PDB exhibit preventable errors of some sort, ranging from minor deviations in expected bond angles to wholly implausible placement in the binding pocket. Following recommendations from the wwPDB Validation Task Force, the PDB recently began validating ligand geometry as an integral part of deposition processing. This means that many crystallographers will soon receive for the first time a "grade" on the quality of ligands in the structure they have just deposited. Some will be surprised, as I was following my first PDB deposition of 2014, at how easily bad ligand geometry can slip through the cracks in supposedly robust structure refinement protocols that their lab has used for many years. I will illustrate use of current tools for generating ligand restraints to guide model refinement. One is the jligand+coot+cprodrg pipeline integrated into the CCP4 suite. Another is the Grade web server provided as a community resource by Global Phasing Ltd. Furthermore, I will show examples from recent in-house refinements of how things can still go wrong even if you do use these tools, and how we recovered. The new PDB deposition checks may expose errors in your ligand descriptions after the fact. This presentation may help you avoid introducing those errors in the first place.

Cryo‐EM map interpretation and protein model‐building using iterative map segmentation

Protein Science ◽ 10.1002/pro.3740 ◽ 2019 ◽ Vol 29 (1) ◽ pp. 87-99 ◽ Cited By ~ 12

Author(s): Thomas C. Terwilliger ◽ Paul D. Adams ◽ Pavel V. Afonine ◽ Oleg V. Sobolev

Keyword(s): Model Building ◽ Map Interpretation ◽ Protein Model ◽ Map Segmentation

Rapid model building of α-helices in electron-density maps

Acta Crystallographica Section D Biological Crystallography ◽ 10.1107/s0907444910000314 ◽ 2010 ◽ Vol 66 (3) ◽ pp. 268-275 ◽ Cited By ~ 18

Author(s): Thomas C. Terwilliger

Keyword(s): High Resolution ◽ Electron Density ◽ Model Building ◽ Main Chain ◽ Rapid Identification ◽ Side Chains ◽ Low Resolution ◽ Density Maps ◽ Experimental Electron Density

A method for the identification of α-helices in electron-density maps at low resolution followed by interpretation at moderate to high resolution is presented. Rapid identification is achieved at low resolution, where α-helices appear as tubes of density. The positioning and direction of the α-helices is obtained at moderate to high resolution, where the positions of side chains can be seen. The method was tested on a set of 42 experimental electron-density maps at resolutions ranging from 1.5 to 3.8 Å. An average of 63% of the α-helical residues in these proteins were built and an average of 76% of the residues built matched helical residues in the refined models of the proteins. The overall average r.m.s.d. between main-chain atoms in the modeled α-helices and the nearest atom with the same name in the refined models of the proteins was 1.3 Å.
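The r.m.s.d. quoted in this abstract pairs each modeled atom with the nearest refined-model atom of the same name. A minimal sketch of that measure follows. The function name, the `(atom_name, (x, y, z))` input layout and the coordinates are assumptions for illustration, and the brute-force search stands in for whatever the published method actually uses.

```python
import math


def nearest_same_name_rmsd(built, reference):
    """R.m.s.d. (Å) between each built atom and the nearest reference atom
    that carries the same atom name.

    Both arguments are lists of (atom_name, (x, y, z)) tuples.
    Built atoms whose name never occurs in the reference are skipped.
    """
    by_name = {}
    for name, xyz in reference:
        by_name.setdefault(name, []).append(xyz)

    squared = []
    for name, xyz in built:
        candidates = by_name.get(name)
        if not candidates:
            continue
        nearest = min(math.dist(xyz, c) for c in candidates)
        squared.append(nearest ** 2)

    if not squared:
        raise ValueError("no built atom has a same-named reference atom")
    return math.sqrt(sum(squared) / len(squared))


# Invented coordinates: the CA atom is 1 Å from its nearest CA, the N atom matches exactly.
built = [("CA", (0.0, 0.0, 0.0)), ("N", (1.0, 0.0, 0.0))]
reference = [
    ("CA", (0.0, 0.0, 1.0)),
    ("CA", (5.0, 5.0, 5.0)),
    ("N", (1.0, 0.0, 0.0)),
    ("C", (2.0, 0.0, 0.0)),
]
print(round(nearest_same_name_rmsd(built, reference), 3))
```

The search compares every built atom with every same-named reference atom, which is adequate for a sketch; a spatial index would be the usual choice for whole structures.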

Pairwise running of automated crystallographic model-building pipelines

Acta Crystallographica Section D Structural Biology ◽ 10.1107/s2059798320010542 ◽ 2020 ◽ Vol 76 (9) ◽ pp. 814-823 ◽ Cited By ~ 1

Author(s): Emad Alharbi ◽ Radu Calinescu ◽ Kevin Cowtan

Keyword(s): Model Building ◽ Protein Structures ◽ Data Sets ◽ Protein Model

For the last two decades, researchers have worked independently to automate protein model building, and four widely used software pipelines have been developed for this purpose: ARP/wARP, Buccaneer, Phenix AutoBuild and SHELXE. Here, the usefulness of combining these pipelines to improve the built protein structures by running them in pairwise combinations is examined. The results show that integrating these pipelines can lead to significant improvements in structure completeness and Rfree. In particular, running Phenix AutoBuild after Buccaneer improved structure completeness for 29% and 75% of the data sets that were examined at the original resolution and at a simulated lower resolution, respectively, compared with running Phenix AutoBuild on its own. In contrast, Phenix AutoBuild alone produced better structure completeness than the two pipelines combined for only 7% and 3% of these data sets.
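The percentages in this abstract count the data sets on which one run beats the other. A minimal sketch of that tally is given below. The function name, the dictionary layout, the `tolerance` parameter and all completeness values are assumptions for illustration, not data or code from the study.

```python
def compare_completeness(combined, alone, tolerance=0.0):
    """Percentage of data sets where the combined run is better, worse or equal.

    `combined` and `alone` map a data-set ID to structure completeness (%).
    Only data sets present in both mappings are compared. A difference counts
    only if it exceeds `tolerance` (a hypothetical knob, default 0).
    """
    shared = sorted(set(combined) & set(alone))
    if not shared:
        raise ValueError("no data sets in common")

    better = sum(combined[k] - alone[k] > tolerance for k in shared)
    worse = sum(alone[k] - combined[k] > tolerance for k in shared)
    n = len(shared)
    return {
        "better": 100.0 * better / n,
        "worse": 100.0 * worse / n,
        "equal": 100.0 * (n - better - worse) / n,
    }


# Invented completeness values for four hypothetical data sets.
combined = {"d1": 95.0, "d2": 80.0, "d3": 90.0, "d4": 70.0}
alone = {"d1": 90.0, "d2": 80.0, "d3": 92.0, "d4": 60.0}
print(compare_completeness(combined, alone))
```

Reporting "better" and "worse" separately matters because the two figures need not sum to 100%: data sets with equal completeness fall into neither group.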

Using known substructures in protein model building and crystallography.

The EMBO Journal ◽ 10.1002/j.1460-2075.1986.tb04287.x ◽ 1986 ◽ Vol 5 (4) ◽ pp. 819-822 ◽ Cited By ~ 595

Author(s): T.A. Jones ◽ S. Thirup

Keyword(s): Model Building ◽ Protein Model

Crystallographic protein model-building on the web

Bioinformatics ◽ 10.1093/bioinformatics/btl584 ◽ 2006 ◽ Vol 23 (3) ◽ pp. 375-377 ◽ Cited By ~ 2

Author(s): K. Gopal ◽ E. McKee ◽ T. Romo ◽ R. Pai ◽ J. Smith ◽ ...

Keyword(s): Model Building ◽ Protein Model ◽ The Web

Comparison of automated crystallographic model-building pipelines

Acta Crystallographica Section D Structural Biology ◽ 10.1107/s2059798319014918 ◽ 2019 ◽ Vol 75 (12) ◽ pp. 1119-1128 ◽ Cited By ~ 3

Author(s): Emad Alharbi ◽ Paul S. Bond ◽ Radu Calinescu ◽ Kevin Cowtan

Keyword(s): Model Building ◽ Protein Structures ◽ Data Sets ◽ Protein Model ◽ Using Data ◽ Complete Protein

A comparison of four protein model-building pipelines (ARP/wARP, Buccaneer, PHENIX AutoBuild and SHELXE) was performed using data sets from 202 experimentally phased cases, both with the data as observed and truncated to simulate lower resolutions. All pipelines were run using default parameters. Additionally, an ARP/wARP run was completed using models from Buccaneer. All pipelines achieved nearly complete protein structures and low Rwork/Rfree at resolutions between 1.2 and 1.9 Å, with PHENIX AutoBuild and ARP/wARP producing slightly lower R factors. At lower resolutions, Buccaneer leads to significantly more complete models.
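Truncating data to simulate a lower resolution amounts to discarding reflections whose d-spacing is below a chosen limit. The abstract does not say how the truncation was performed, so the following is only a generic sketch. It assumes an orthorhombic unit cell, where 1/d² = h²/a² + k²/b² + l²/c², and the function names, cell dimensions and reflections are invented.

```python
import math


def d_spacing(hkl, cell):
    """Resolution d (Å) of reflection (h, k, l) for an orthorhombic cell (a, b, c)."""
    h, k, l = hkl
    a, b, c = cell
    inv_d_sq = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    if inv_d_sq == 0:
        raise ValueError("the (0, 0, 0) reflection has no finite d-spacing")
    return 1.0 / math.sqrt(inv_d_sq)


def truncate(reflections, cell, d_min):
    """Keep only reflections with d >= d_min, i.e. within the target resolution."""
    return [hkl for hkl in reflections if d_spacing(hkl, cell) >= d_min]


# Invented orthorhombic cell (Å) and a handful of Miller indices.
cell = (40.0, 50.0, 60.0)
reflections = [(10, 0, 0), (20, 0, 0), (0, 25, 0), (0, 0, 40), (1, 1, 1)]

# Simulate a 3.0 Å data set: only the 4.0 Å and low-angle reflections survive.
print(truncate(reflections, cell, d_min=3.0))
```

A real data set would carry intensities and uncertainties alongside each index and would need the general triclinic formula for non-orthogonal cells; both are left out here to keep the sketch short.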
