Evolutionary couplings detect side-chain interactions

Patterns of amino acid covariation in large protein sequence alignments can inform de novo prediction of protein structures, binding interfaces, and mutational effects. While algorithms that detect these so-called evolutionary couplings between residues have proven useful for practical applications, less is known about how and why these methods perform so well, and what insights into biological processes can be gained from their application. Evolutionary coupling algorithms are commonly benchmarked by comparison to true structural contacts derived from solved protein structures. However, the methods used to determine true structural contacts are not standardized, and different definitions of structural contacts may have important consequences for interpreting the results of evolutionary coupling analyses and understanding their overall utility. Here, we show that evolutionary coupling analyses are significantly more likely to identify structural contacts between side-chain atoms than between backbone atoms. We use both simulations and empirical analyses to show that purely backbone-based definitions of true residue–residue contacts (i.e., based on the distance between Cα atoms) may underestimate the accuracy of evolutionary coupling algorithms by as much as 40%, and that a commonly used reference point (Cβ atoms) underestimates the accuracy by 10–15%. These findings show that co-evolutionary outcomes differ according to which atoms participate in residue–residue interactions, and they suggest that accounting for different interaction types may further improve contact-prediction methods.

Significance Statement

Evolutionary couplings between residues within a protein can provide valuable information about protein structures, protein–protein interactions, and the mutability of individual residues. However, the mechanistic factors that determine whether two residues co-evolve remain unknown. We show that structural proximity by itself is not sufficient for co-evolution to occur between residues. Rather, evolutionary couplings between residues are specifically governed by interactions between side-chain atoms. By contrast, intramolecular contacts between atoms in the protein backbone display only a weak signature of evolutionary coupling. These findings highlight that different types of stabilizing contacts exist within protein structures and that these types affect the evolution of protein structures differently, which should be considered in co-evolutionary applications.
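The abstract turns on how a "true contact" is defined. The sketch below is a minimal illustration, not the authors' pipeline: it scores the same predicted coupling against three definitions, namely Cα–Cα distance, Cβ–Cβ distance (Cα for glycine), and the minimum distance between side-chain heavy atoms. The cutoffs (8 Å for Cα/Cβ, 5 Å for side-chain atoms), the minimum sequence separation of 6, and the input layout are assumptions, not values taken from the paper.

```python
import math
from typing import Callable, Dict, Iterable, Set, Tuple

Coord = Tuple[float, float, float]
Residue = Dict[str, Coord]          # atom name -> (x, y, z), heavy atoms only (assumed layout)
Structure = Dict[int, Residue]      # residue number -> atoms
BACKBONE = {"N", "CA", "C", "O"}

def ca_distance(r1: Residue, r2: Residue) -> float:
    return math.dist(r1["CA"], r2["CA"])

def cb_distance(r1: Residue, r2: Residue) -> float:
    # Glycine has no CB, so fall back to CA (a common convention).
    return math.dist(r1.get("CB", r1["CA"]), r2.get("CB", r2["CA"]))

def sidechain_distance(r1: Residue, r2: Residue) -> float:
    # Minimum distance over all side-chain heavy-atom pairs (CB onward);
    # infinite when either residue has no side-chain atoms (e.g. glycine).
    sc1 = [xyz for name, xyz in r1.items() if name not in BACKBONE]
    sc2 = [xyz for name, xyz in r2.items() if name not in BACKBONE]
    if not sc1 or not sc2:
        return math.inf
    return min(math.dist(a, b) for a in sc1 for b in sc2)

def true_contacts(structure: Structure, metric: Callable[[Residue, Residue], float],
                  cutoff: float, min_sep: int = 6) -> Set[Tuple[int, int]]:
    # All pairs (i, j), i < j, at least min_sep apart in sequence and within cutoff.
    idx = sorted(structure)
    return {(i, j) for k, i in enumerate(idx) for j in idx[k + 1:]
            if j - i >= min_sep and metric(structure[i], structure[j]) <= cutoff}

def precision(predicted: Iterable[Tuple[int, int]], contacts: Set[Tuple[int, int]]) -> float:
    # Fraction of predicted pairs (order-insensitive) that are true contacts.
    preds = [tuple(sorted(p)) for p in predicted]
    return sum(p in contacts for p in preds) / len(preds) if preds else 0.0

# Toy pair: CA atoms 10 A apart, side chains pointing toward each other.
structure: Structure = {
    1: {"CA": (0.0, 0.0, 0.0), "CB": (1.5, 0.0, 0.0), "CG": (3.0, 0.0, 0.0)},
    10: {"CA": (10.0, 0.0, 0.0), "CB": (8.5, 0.0, 0.0), "CG": (7.0, 0.0, 0.0)},
}
coupling = [(10, 1)]  # a hypothetical top-ranked evolutionary coupling
print(precision(coupling, true_contacts(structure, ca_distance, 8.0)))         # 0.0
print(precision(coupling, true_contacts(structure, cb_distance, 8.0)))         # 1.0
print(precision(coupling, true_contacts(structure, sidechain_distance, 5.0)))  # 1.0
```

In this toy geometry the side chains reach toward each other, so the coupling counts as correct under the Cβ and side-chain definitions but as a false positive under the Cα definition, which is the kind of undercounting the abstract describes.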

All-atom 3D structure prediction of transmembrane β-barrel proteins from sequences

Proceedings of the National Academy of Sciences, 2015, Vol 112 (17), pp. 5413-5418
DOI: 10.1073/pnas.1419956112
Cited By ~ 41

Author(s): Sikander Hayat, Chris Sander, Debora S. Marks, Arne Elofsson

Keyword(s): Structure Prediction; De Novo; 3D Structure; 3D Models; Sequence Information; Sequence Alignments; Residue Contacts; Machine Learning Approach; 3D Structure Prediction; Structure Accuracy

Transmembrane β-barrels (TMBs) carry out major functions in substrate transport and protein biogenesis, but experimental determination of their 3D structure is challenging. Encouraged by successful de novo 3D structure prediction of globular and α-helical membrane proteins from sequence alignments alone, we developed an approach to predict the 3D structure of TMBs. The approach combines the maximum-entropy evolutionary coupling method for predicting residue contacts (EVfold) with a machine-learning approach (boctopus2) for predicting β-strands in the barrel. In a blinded test on 19 TMB proteins of known structure that have a sufficient number of diverse homologous sequences available, this combined method (EVfold_bb) predicts hydrogen-bonded residue pairs between adjacent β-strands with an accuracy of ∼70%. This accuracy is sufficient for generating all-atom 3D models. In the transmembrane barrel region, the average 3D structure accuracy [template-modeling (TM) score] of top-ranked models is 0.54 (ranging from 0.36 to 0.85), with a higher fraction (44%) of residue pairs in correct strand–strand registration than earlier methods achieved (18%). Although the nonbarrel regions are predicted less accurately overall, the evolutionary couplings identify some highly constrained loop residues, and for the FecA protein the barrel, including the structure of a plug domain, can be accurately modeled (TM score = 0.68). Lower prediction accuracy tends to be associated with insufficient sequence information, and we therefore expect increasing numbers of β-barrel families to become accessible to accurate 3D structure prediction as the number of available sequences increases.
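For reference, the TM score of a model against a target of length L is TM = (1/L) Σ 1/(1 + (d_i/d0)²), maximized over superpositions, where d_i is the deviation of aligned residue i and d0 = 1.24 ∛(L − 15) − 1.8 Å. The sketch below evaluates that sum for one fixed, already-applied superposition, so it returns a lower bound on the TM score rather than the score itself; it is not the TM-score program, and the 0.5 Å floor on d0 is an assumed convention for short chains.

```python
from typing import Sequence

def tm_score_fixed(distances: Sequence[float], target_length: int) -> float:
    """TM-score sum for an already-superposed model (lower bound on the TM score).

    distances: deviations (A) of the aligned residue pairs; unaligned target
    residues are simply absent and contribute nothing.
    target_length: number of residues L in the reference (target) structure.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    # Length-dependent distance scale; guard against a negative cube-root
    # argument and clamp at 0.5 A for short chains (assumed convention).
    d0 = max(0.5, 1.24 * max(target_length - 15, 0) ** (1.0 / 3.0) - 1.8)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

# Half of a 100-residue target reproduced exactly, the rest unaligned.
print(tm_score_fixed([0.0] * 50, 100))  # 0.5
```

Because the true TM score takes the best superposition, a full implementation would wrap this sum in a search over rotations and translations.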

Co-evolutionary Distance Prediction for Flexibility Prediction

2020
DOI: 10.1101/2020.10.15.340752

Author(s): Dominik Schwarz, Guy Georges, Sebastian Kelm, Jiye Shi, Anna Vangone, ...

Keyword(s): De Novo; Protein Structures; Distance Distribution; Residue Pair; Machine Learning Techniques; Reference Structure; Sequence Alignments; Static Structure; Local Maxima; Distance Distributions

Co-evolution analysis can be used to accurately predict residue-residue contacts from multiple sequence alignments. The introduction of machine-learning techniques has enabled substantial improvements in precision and a shift from predicting binary contacts to predicting distances between pairs of residues. These developments have significantly improved the accuracy of de novo prediction of static protein structures. Here we examine the potential of these residue-residue distance predictions to predict protein flexibility rather than static structure. We used DMPfold to predict distance distributions for every residue pair in a set of proteins that showed both rigid and flexible behaviour. Residue pairs that were in contact in at least one reference structure were considered and classified as rigid, flexible, or neither. The predicted distance distribution of each residue pair was analysed for local maxima of probability, each indicating a most likely distance between the pair of residues. The average number of local maxima per residue pair differed between the sets of rigid and flexible residue pairs. Flexible residue pairs more often had multiple local maxima in their predicted distance distributions than rigid residue pairs, suggesting that the shape of a predicted distance distribution is predictive of the rigidity or flexibility of a residue pair.
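The core measurement here, counting local maxima in a predicted distance distribution, can be sketched as follows. The bin values are invented, and both the strict-inequality rule (flat plateaus are not counted) and the optional `min_prob` noise floor are assumptions for illustration; the abstract does not say how DMPfold's output was binned or filtered.

```python
from typing import Sequence

def count_local_maxima(probs: Sequence[float], min_prob: float = 0.0) -> int:
    """Count strict local maxima in a binned distance distribution.

    A bin counts if its probability exceeds min_prob (hypothetical noise floor)
    and is strictly greater than each neighbour that exists, so the first and
    last bins can also count.
    """
    n = len(probs)
    count = 0
    for k, p in enumerate(probs):
        if p <= min_prob:
            continue
        left_ok = k == 0 or p > probs[k - 1]
        right_ok = k == n - 1 or p > probs[k + 1]
        if left_ok and right_ok:
            count += 1
    return count

rigid_like = [0.02, 0.10, 0.70, 0.15, 0.03]     # one dominant distance
flexible_like = [0.05, 0.40, 0.05, 0.35, 0.15]  # two candidate distances
print(count_local_maxima(rigid_like))     # 1
print(count_local_maxima(flexible_like))  # 2
```

Averaging this count over the rigid and flexible sets gives the per-set statistic the abstract compares.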

De novo design of a reversible phosphorylation-dependent switch for membrane targeting

Nature Communications, 2021, Vol 12 (1)
DOI: 10.1038/s41467-021-21622-5

Author(s): Leon Harrington, Jordan M. Fletcher, Tamara Heermann, Derek N. Woolfson, Petra Schwille

Keyword(s): Protein Interactions; Lipid Membrane; De Novo; Protein Localization; Protein Structures; Spatiotemporal Pattern; Membrane Targeting; Protein Protein Interactions; Reversible Phosphorylation; Potential Applications

Modules that switch protein-protein interactions on and off are essential for developing synthetic biology, for example, to construct orthogonal signaling pathways, to control artificial protein structures dynamically, and to localize proteins in cells or protocells. In nature, the E. coli MinCDE system couples nucleotide-dependent switching of MinD dimerization to membrane targeting to trigger spatiotemporal pattern formation. Here we present a de novo peptide-based molecular switch that toggles reversibly between monomer and dimer in response to phosphorylation and dephosphorylation. In combination with other modules, we construct fusion proteins that couple switching to lipid-membrane targeting by (i) tethering a ‘cargo’ molecule reversibly to a permanent membrane ‘anchor’, and (ii) creating a ‘membrane-avidity switch’ that mimics the MinD system but operates by reversible phosphorylation. These minimal, de novo molecular switches have potential applications for introducing dynamic processes into designed and engineered proteins to augment functions in living cells and add functionality to protocells.
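The monomer–dimer toggle can be made quantitative with a simple two-state equilibrium, 2M ⇌ D with K_d = [M]²/[D]. Combining this with mass balance, c_tot = [M] + 2[D], gives 2[M]² + K_d[M] − K_d·c_tot = 0, whose positive root is [M] = (−K_d + √(K_d² + 8K_d·c_tot))/4. The sketch assumes that phosphorylation acts only by shifting K_d; the K_d values and the 10 µM concentration below are illustrative and are not measurements from the paper.

```python
import math

def dimer_fraction(total_conc: float, kd: float) -> float:
    """Fraction of subunits in the dimer for 2M <=> D, Kd = [M]^2 / [D].

    Mass balance total = [M] + 2[D] gives 2[M]^2 + Kd[M] - Kd*total = 0;
    the positive root is taken for [M]. Units of total_conc and kd must match.
    """
    if total_conc <= 0 or kd <= 0:
        raise ValueError("total_conc and kd must be positive")
    monomer = (-kd + math.sqrt(kd * kd + 8.0 * kd * total_conc)) / 4.0
    return 1.0 - monomer / total_conc

# Hypothetical switch at 10 uM total peptide: tight dimer vs. weakened dimer.
print(round(dimer_fraction(10.0, 0.01), 2))   # 0.98
print(round(dimer_fraction(10.0, 100.0), 2))  # 0.15
```

A four-order-of-magnitude shift in K_d is enough here to move the population from mostly dimeric to mostly monomeric, which is the qualitative behaviour a reversible switch needs.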

DNCON2_Inter: predicting interchain contacts for homodimeric and homomultimeric protein complexes using multiple sequence alignments of monomers and deep learning

Scientific Reports, 2021, Vol 11 (1)
DOI: 10.1038/s41598-021-91827-7

Author(s): Farhan Quadir, Raj S. Roy, Randal Halfmann, Jianlin Cheng

Keyword(s): Deep Learning; Tertiary Structure; Protein Complexes; Complex Structure; Great Success; Sequence Alignments; Multiple Sequence; Multiple Sequence Alignments; Residue Contacts; Evolutionary Features

Deep learning methods that achieved great success in predicting intrachain residue-residue contacts have been applied to predict interchain contacts between proteins. However, these methods require multiple sequence alignments (MSAs) of a pair of interacting proteins (dimers) as input, which are often difficult to obtain because there are not many known protein complexes available to generate MSAs of sufficient depth for a pair of proteins. Recognizing that the MSA of a monomer that forms homomultimers contains the co-evolutionary signals of both intrachain and interchain residue pairs in contact, we applied DNCON2 (a deep learning-based protein intrachain residue-residue contact predictor) to predict both intrachain and interchain contacts for homomultimers using the MSA and other co-evolutionary features of a single monomer, followed by discrimination of interchain and intrachain contacts according to the tertiary structure of the monomer. We name this tool DNCON2_Inter. Allowing true-positive predictions within two residue shifts, the best average precision, obtained for the top-L/10 predictions, was 22.9% for homodimers and 17.0% for higher-order homomultimers. In some instances, especially where interchain contact densities are high, DNCON2_Inter predicted interchain contacts with 100% precision. We also developed Con_Complex, a complex structure reconstruction tool that uses predicted contacts to produce the structure of the complex. Using Con_Complex, we show that the predicted contacts can be used to accurately construct the structure of some complexes. Our experiments demonstrate that monomeric multiple sequence alignments can be used with deep learning to predict interchain contacts of homomeric proteins.
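The evaluation metric, top-L/10 precision with a tolerance of two residue shifts, can be sketched as below. The abstract does not spell out the tolerance rule, so this sketch assumes a prediction (i, j) is correct when some true contact lies within ±2 residues in both indices; it also treats interchain pairs in a homodimer as unordered, since (i, j) and (j, i) describe the same symmetric contact. This is an illustration, not DNCON2_Inter's evaluation code.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[int, int]

def _norm(i: int, j: int) -> Pair:
    # Homodimer interchain contacts are symmetric, so (i, j) ~ (j, i).
    return (i, j) if i <= j else (j, i)

def top_l_precision(predictions: Iterable[Tuple[int, int, float]],
                    true_contacts: Iterable[Pair], length: int,
                    fraction: int = 10, shift: int = 2) -> float:
    """Precision of the top-L/fraction predictions; a prediction is a hit if a
    true contact lies within `shift` residues in both indices (assumed rule)."""
    truth: Set[Pair] = {_norm(i, j) for i, j in true_contacts}
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)
    k = max(1, length // fraction)
    top: List[Pair] = [_norm(i, j) for i, j, _ in ranked[:k]]
    if not top:
        return 0.0

    def hit(p: Pair) -> bool:
        i, j = p
        return any(_norm(i + di, j + dj) in truth
                   for di in range(-shift, shift + 1)
                   for dj in range(-shift, shift + 1))

    return sum(hit(p) for p in top) / len(top)

# L = 20, so only the top 2 predictions are scored.
preds = [(6, 14, 0.9), (1, 2, 0.8), (5, 15, 0.1)]
print(top_l_precision(preds, [(5, 15)], length=20))           # 0.5
print(top_l_precision(preds, [(5, 15)], length=20, shift=0))  # 0.0
```

Setting `shift=0` recovers exact-match precision, which shows how much the tolerance contributes to the reported numbers.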

Optimization in S-SAD phasing - difference between solved and unsolved structure

Acta Crystallographica Section A: Foundations and Advances, 2014, Vol 70 (a1), pp. C613-C613
DOI: 10.1107/s2053273314093863

Author(s): Jan Stránský, Tomáš Kovaľ, Lars Østergaard, Jarmila Dušková, Tereza Skálová, ...

Keyword(s): Data Processing; De Novo; Protein Structures; X Ray Diffraction; X Ray; Structure Solution; Sad Phasing; Optimal Values; Grant Agency; Intensity Reading

Developments in X-ray diffraction technology have made de novo phasing of protein structures by single-wavelength anomalous dispersion of sulphur (S-SAD) more common. Because the anomalous differences arising from the sulphur atomic scattering factors are on the order of the measurement errors, careful intensity reading and data processing are crucial. S-SAD was used for de novo phasing of a small 12 kDa protein with 4 sulphur atoms per molecule at 2.3 Å resolution, where the data did not allow a straightforward structure solution. Data were processed with XDS [1] and scaled with XSCALE. The sulphur substructure was determined with SHELXD [2], and phases were obtained with SHELXE [2]. Both programs depend strongly on their input parameters, and the default values did not lead to the correct phases. Therefore, a systematic search over several parameters was used to find their optimal values. This method helped to confirm the sulphur substructure and to distinguish the handedness of the solutions. Moreover, a script for convenient conversion of SHELX outputs to MTZ format was developed, using programs included in the CCP4 package [3]. The previously unsolvable protein structure was successfully solved with the described procedure. This work was supported by the Grant Agency of the Czech Technical University in Prague (SGS13/219/OHK4/3T/14), the Czech Science Foundation (P302/11/0855), and project BIOCEV CZ.1.05/1.1.00/02.0109 from the ERDF.
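To see why sulphur anomalous differences sit near the measurement noise, a commonly quoted estimate of the expected Bijvoet ratio is ⟨|ΔF±|⟩/⟨|F|⟩ ≈ (N_A / 2N_T)^(1/2) · 2f''/Z_eff, with N_A anomalous scatterers, N_T non-hydrogen atoms, and Z_eff ≈ 6.7 e. The inputs below are assumptions, not values from the abstract: about 850 non-hydrogen atoms for a 12 kDa protein (roughly 110 residues at ~7.8 heavy atoms each) and f'' ≈ 0.56 e for sulphur at the Cu Kα wavelength, since the wavelength used is not stated.

```python
import math

def bijvoet_ratio(n_anomalous: int, n_total: int, f_double_prime: float,
                  z_eff: float = 6.7) -> float:
    """Expected <|dF+-|>/<|F|> from the estimate sqrt(N_A / (2 N_T)) * 2 f'' / Z_eff.

    n_anomalous: anomalous scatterers per molecule (here, sulphur atoms).
    n_total: non-hydrogen atoms per molecule (assumed estimate).
    f_double_prime: imaginary anomalous scattering factor in electrons.
    """
    if n_anomalous <= 0 or n_total <= 0:
        raise ValueError("atom counts must be positive")
    return math.sqrt(n_anomalous / (2.0 * n_total)) * 2.0 * f_double_prime / z_eff

# 4 S atoms, ~850 non-H atoms (assumed), f''(S) ~ 0.56 e at Cu K-alpha (assumed).
print(f"{bijvoet_ratio(4, 850, 0.56):.4f}")  # 0.0081
```

An expected signal below 1% of the structure-factor amplitude is comparable to typical measurement errors, which is consistent with the abstract's emphasis on careful intensity reading and parameter optimization.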

Get Phases from Arsenic Anomalous Scattering: de novo SAD Phasing of Two Protein Structures Crystallized in Cacodylate Buffer

PLoS ONE, 2011, Vol 6 (9), pp. e24227
DOI: 10.1371/journal.pone.0024227
Cited By ~ 20

Author(s): Xiang Liu, Heng Zhang, Xiao-Jun Wang, Lan-Fen Li, Xiao-Dong Su

Keyword(s): De Novo; Protein Structures; Anomalous Scattering; Cacodylate Buffer; Sad Phasing

CLP-based protein fragment assembly

Theory and Practice of Logic Programming, 2010, Vol 10 (4-6), pp. 709-724
DOI: 10.1017/s1471068410000372
Cited By ~ 3

Author(s): Alessandro Dal Palù, Agostino Dovier, Federico Fogolari, Enrico Pontelli

Keyword(s): Search Strategy; Protein Structures; Constraint Solving; Side Chain; Space Filling; Protein Fragment; Fragment Assembly; Energy Models; Novel Approach; Protein Model

The paper investigates a novel approach, based on Constraint Logic Programming (CLP), to predict the 3D conformation of a protein via fragment assembly. The fragments are extracted from a database of known protein structures by a preprocessor, also developed for this work, which clusters and classifies the fragments according to similarity and frequency. The problem of assembling fragments into a complete conformation is mapped to a constraint solving problem and solved using CLP. The constraint-based model uses a medium-discretization Cα–side-chain centroid protein model that offers efficiency and a good approximation of space filling. The approach adapts existing energy models to the protein representation used and applies a large neighborhood search strategy. The results show the feasibility and efficiency of the method. The declarative nature of the solution allows future extensions to be included, e.g., fragments of different sizes for better accuracy.
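The Cα–side-chain centroid representation reduces each residue to two beads. A minimal sketch of that reduction, plus a crude space-filling check between centroids, is shown below; it assumes the centroid is the mean of all side-chain heavy atoms (Cβ onward) with glycine's Cα as a stand-in, and the clash distance is a hypothetical parameter. The paper's exact centroid definition, discretization, and energy model are not given in the abstract, and this is not its CLP solver.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]
BACKBONE = {"N", "CA", "C", "O"}

def two_bead(residue: Dict[str, Coord]) -> Tuple[Coord, Coord]:
    """Reduce one residue to (CA, side-chain centroid).

    The centroid averages the side-chain heavy atoms (CB onward); glycine,
    which has none, gets its CA as a stand-in centroid.
    """
    ca = residue["CA"]
    side = [xyz for name, xyz in residue.items() if name not in BACKBONE]
    if not side:
        return ca, ca
    n = len(side)
    centroid = (sum(a[0] for a in side) / n,
                sum(a[1] for a in side) / n,
                sum(a[2] for a in side) / n)
    return ca, centroid

def centroid_clashes(beads: List[Tuple[Coord, Coord]], min_dist: float) -> List[Tuple[int, int]]:
    """Residue pairs (i, j), j >= i + 2, whose centroids are closer than min_dist."""
    return [(i, j)
            for i in range(len(beads))
            for j in range(i + 2, len(beads))
            if math.dist(beads[i][1], beads[j][1]) < min_dist]

chain = [
    {"N": (-1.0, 0.0, 0.0), "CA": (0.0, 0.0, 0.0), "CB": (1.0, 0.0, 0.0), "CG": (3.0, 0.0, 0.0)},
    {"N": (0.0, 3.0, 0.0), "CA": (0.0, 3.8, 0.0)},  # glycine
    {"CA": (3.0, 3.0, 0.0), "CB": (2.5, 1.5, 0.0)},
]
beads = [two_bead(r) for r in chain]
print(beads[0][1])                   # (2.0, 0.0, 0.0)
print(centroid_clashes(beads, 3.0))  # [(0, 2)]
```

In a constraint-based assembler, a check like `centroid_clashes` would be posted as a constraint that prunes candidate fragment placements rather than evaluated after the fact.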

De Novo Protein Design of Photochemical Reaction Centers

2021
DOI: 10.21203/rs.3.rs-932621/v1

Author(s): Nathan Ennist, Zhenyu Zhao, Steven Stayrook, Bohdana Discher, P Leslie 'Les' Dutton, ...

Keyword(s): Reaction Center; Charge Separation; Rational Design; De Novo; Metal Cluster; Protein Structures; Essential Elements; Reaction Centers; Transient Absorption Spectroscopy

Natural photosynthetic protein complexes capture sunlight to power the energetic catalysis that supports life on Earth. Yet these natural protein structures carry an evolutionary legacy of complexity and fragility that encumbers protein reengineering efforts and obfuscates the underlying design rules for light-driven charge separation. De novo development of a simplified photosynthetic reaction center protein can clarify practical engineering principles needed to build new enzymes for efficient solar-to-fuel energy conversion. Here we report the rational design, X-ray crystal structure, and electron transfer activity of a multi-cofactor protein that incorporates essential elements of photosynthetic reaction centers. This highly stable, modular artificial protein framework can be reconstituted in vitro with interchangeable redox centers for nanometer-scale photochemical charge separation. Transient absorption spectroscopy demonstrates Photosystem II-like tyrosine and metal cluster oxidation, and we measure charge separation lifetimes exceeding 100 ms, ideal for light-activated catalysis. This de novo-designed reaction center builds upon engineering guidelines established for charge separation in earlier synthetic photochemical triads and modified natural proteins, and it shows how synthetic biology may lead to a new generation of genetically encoded, light-powered catalysts for solar fuel production.

FingerprintContacts: Predicting Alternative Conformations of Proteins from Coevolution

2020
DOI: 10.1101/2020.04.13.037234

Author(s): Jiangyan Feng, Diwakar Shukla

Keyword(s): Ligand Binding; Structure Prediction; De Novo; Three Dimensional; Sequence Information; Structural Constraints; Complex Signals; Residue Contacts; Small Clusters; Functional Mechanisms

Proteins are dynamic molecules that perform diverse molecular functions by adopting different three-dimensional structures. Recent progress in residue-residue contact prediction opens up new avenues for de novo protein structure prediction from sequence information. However, it is still difficult to predict more than one conformation from residue-residue contacts alone, because the complex signals carried by predicted contacts (i.e., spatial contacts relevant to protein folding, conformational diversity, and ligand binding) cannot yet be deconvolved. Here, we introduce a machine learning-based method, called FingerprintContacts, that extends the capabilities of residue-residue contacts. The algorithm leverages two features of residue-residue contacts: (1) when all top-ranking contacts are used as structural constraints, a single conformation outperforms the others in structure prediction, and (2) conformation-specific contacts rank lower and constitute a small fraction of all residue-residue contacts. We demonstrate the capabilities of FingerprintContacts on eight ligand-binding proteins with varying conformational motions. Furthermore, FingerprintContacts identifies small clusters of residue-residue contacts that are preferentially located in dynamically fluctuating regions. With the rapid growth in protein sequence information, we expect FingerprintContacts to be a powerful first step toward a structural understanding of protein functional mechanisms.
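The two features the method leverages can be checked directly when reference structures of two conformations are available. The sketch below is not the FingerprintContacts algorithm, which is machine-learning based; it only measures (1) how well each conformation is supported by the top-ranked predicted contacts and (2) at which ranks conformation-specific contacts appear. The conformation names, the toy contact sets, and the convention that pairs are stored as (i, j) with i < j are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]  # assumed normalized so that i < j

def conformation_support(ranked: List[Pair], contact_maps: Dict[str, Set[Pair]],
                         top_k: int) -> Dict[str, float]:
    """Fraction of the top_k ranked predicted contacts present in each conformation."""
    top = ranked[:top_k]
    if not top:
        return {name: 0.0 for name in contact_maps}
    return {name: sum(p in cmap for p in top) / len(top)
            for name, cmap in contact_maps.items()}

def specific_contact_ranks(ranked: List[Pair],
                           contact_maps: Dict[str, Set[Pair]]) -> Dict[str, List[int]]:
    """1-based ranks of predicted contacts that occur in exactly one conformation."""
    out: Dict[str, List[int]] = {name: [] for name in contact_maps}
    for rank, p in enumerate(ranked, start=1):
        owners = [name for name, cmap in contact_maps.items() if p in cmap]
        if len(owners) == 1:
            out[owners[0]].append(rank)
    return out

# Hypothetical ranked predictions and two reference conformations.
ranked = [(1, 10), (2, 12), (3, 15), (4, 20), (5, 25)]
maps = {"open": {(1, 10), (2, 12), (3, 15)},
        "closed": {(1, 10), (2, 12), (5, 25)}}
print(conformation_support(ranked, maps, top_k=3))  # open 1.0, closed ~0.67
print(specific_contact_ranks(ranked, maps))         # {'open': [3], 'closed': [5]}
```

In this toy case the "open" state is fully supported by the top contacts, while the contact specific to "closed" appears only near the bottom of the ranking, mirroring the two features stated in the abstract.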
