Phylogenetic correlations have limited effect on coevolution-based contact prediction in proteins

Abstract: Coevolution-based contact prediction, either directly by coevolutionary couplings resulting from global statistical sequence models or using structural supervision and deep learning, has found widespread application in protein-structure prediction from sequence. However, one of the basic assumptions in global statistical modeling is that sequences form an at least approximately independent sample of an unknown probability distribution, which is to be learned from data. In the case of protein families, this assumption is obviously violated by phylogenetic relations between protein sequences. It has turned out to be notoriously difficult to take phylogenetic correlations into account in coevolutionary model learning. Here, we propose a complementary approach: we develop two strategies to randomize or resample sequence data, such that conservation patterns and phylogenetic relations are preserved, while intrinsic (i.e. structure- or function-based) coevolutionary couplings are removed. An analysis of these data shows that the strongest coevolutionary couplings, i.e. those used by Direct Coupling Analysis to predict contacts, are only weakly influenced by phylogeny. However, phylogeny-induced spurious couplings are of similar size to the bulk of coevolutionary couplings, and dissecting functional from phylogeny-induced couplings might lead to more accurate contact predictions in the range of intermediate-size couplings.

The code is available at https://github.com/ed-rodh/Null_models_I_and_II.

Author summary: Many homologous protein families contain thousands of highly diverged amino-acid sequences, which fold into close-to-identical three-dimensional structures and fulfill almost identical biological tasks. Global coevolutionary models, like those inferred by the Direct Coupling Analysis (DCA), assume that families can be considered as samples of some unknown statistical model, and that the parameters of these models represent evolutionary constraints acting on protein sequences. To learn these models from data, DCA and related approaches also have to assume that the distinct sequences in a protein family are close to independent, while in reality they are characterized by involved hierarchical phylogenetic relationships. Here we propose null models for sequence alignments, which maintain the patterns of amino-acid conservation and phylogeny contained in the data, but destroy the coevolutionary couplings that are frequently used in protein structure prediction. We find that phylogeny actually induces spurious non-zero couplings. These are, however, significantly smaller than the largest couplings derived from natural sequences, and therefore have only a small influence on the first predicted contacts. However, in the range of intermediate couplings, they may lead to statistically significant effects. Dissecting phylogenetic from functional couplings might therefore extend the range of accurately predicted structural contacts down to smaller coupling strengths than those currently used.
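
To illustrate what a null model for an alignment does, the following minimal Python sketch implements a much simpler randomization than the two strategies described in the abstract: it shuffles residues independently within each alignment column. This keeps the per-column amino-acid frequencies (conservation) and removes couplings between columns, but, unlike the authors' null models, it also destroys the phylogenetic relations between sequences. The function names and the toy alignment are illustrative assumptions and are not taken from the linked repository.

```python
import random
from collections import Counter


def shuffle_columns(msa, seed=0):
    """Permute residues independently within each column of an alignment.

    Assumption: `msa` is a list of equal-length strings (aligned sequences).
    Column composition is preserved; correlations between columns and any
    phylogenetic structure among rows are destroyed.
    """
    rng = random.Random(seed)
    n_cols = len(msa[0])
    columns = []
    for j in range(n_cols):
        col = [seq[j] for seq in msa]
        rng.shuffle(col)  # in-place permutation of this column only
        columns.append(col)
    # Reassemble rows from the independently shuffled columns.
    return ["".join(columns[j][i] for j in range(n_cols)) for i in range(len(msa))]


def column_counts(msa):
    """Return one Counter of residue frequencies per alignment column."""
    return [Counter(seq[j] for seq in msa) for j in range(len(msa[0]))]


# Hypothetical toy alignment with four sequences of length four.
toy_msa = ["ACDA", "ACDA", "GCEA", "GTEK"]
null_msa = shuffle_columns(toy_msa, seed=1)

# Conservation (per-column residue counts) is unchanged by construction.
print(column_counts(null_msa) == column_counts(toy_msa))
```

Couplings inferred from such shuffled data can only arise from finite sampling, so comparing them with couplings from a phylogeny-preserving null model would, under these assumptions, separate the finite-size effect from the phylogenetic one.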

Liquid-theory analogy of direct-coupling analysis of multiple-sequence alignment and its implications for protein structure prediction

Biophysics and Physicobiology ◽

10.2142/biophysico.12.0_117 ◽

2015 ◽

Vol 12 (0) ◽

pp. 117-119 ◽

Cited By ~ 1

Author(s):

Akira R. Kinjo

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Structure Prediction ◽

Direct Coupling ◽

Coupling Analysis ◽

Multiple Sequence ◽

Liquid Theory ◽

Direct Coupling Analysis

Protein structure prediction and design in a biologically-realistic implicit membrane

10.1101/630715 ◽

2019 ◽

Author(s):

Rebecca F. Alford ◽

Patrick J. Fleming ◽

Karen G. Fleming ◽

Jeffrey J. Gray

Keyword(s):

Protein Structure ◽

Amino Acid ◽

Membrane Proteins ◽

Membrane Protein ◽

Protein Structure Prediction ◽

Protein Design ◽

Structure Prediction ◽

De Novo ◽

Computational Design ◽

Amino Acid Distribution

Abstract: Protein design is a powerful tool for elucidating mechanisms of function and engineering new therapeutics and nanotechnologies. While soluble protein design has advanced, membrane protein design remains challenging due to difficulties in modeling the lipid bilayer. In this work, we developed an implicit approach that captures the anisotropic structure, shape of water-filled pores, and nanoscale dimensions of membranes with different lipid compositions. The model improves performance in computational benchmarks against experimental targets including prediction of protein orientations in the bilayer, ΔΔG calculations, native structure discrimination, and native sequence recovery. When applied to de novo protein design, this approach designs sequences with an amino acid distribution near the native amino acid distribution in membrane proteins, overcoming a critical flaw in previous membrane models that were prone to generating leucine-rich designs. Further, the proteins designed in the new membrane model exhibit native-like features including interfacial aromatic side chains, hydrophobic lengths compatible with bilayer thickness, and polar pores. Our method advances high-resolution membrane protein structure prediction and design toward tackling key biological questions and engineering challenges.

Significance Statement: Membrane proteins participate in many life processes including transport, signaling, and catalysis. They constitute over 30% of all proteins and are targets for over 60% of pharmaceuticals. Computational design tools for membrane proteins will transform the interrogation of basic science questions such as membrane protein thermodynamics and the pipeline for engineering new therapeutics and nanotechnologies. Existing tools are either too expensive to compute or rely on manual design strategies. In this work, we developed a fast and accurate method for membrane protein design. The tool is available to the public and will accelerate the experimental design pipeline for membrane proteins.

Protein Structure Prediction: Recognition of Primary, Secondary, and Tertiary Structural Features from Amino Acid Sequence

Critical Reviews in Biochemistry and Molecular Biology ◽

10.3109/10409239509085139 ◽

1995 ◽

Vol 30 (1) ◽

pp. 1-94 ◽

Cited By ~ 105

Author(s):

Frank Eisenhaber ◽

Bengt Persson ◽

Patrick Argos

Keyword(s):

Protein Structure ◽

Amino Acid ◽

Amino Acid Sequence ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Structural Features

LZerD Protein-Protein Docking Webserver Enhanced With de novo Structure Prediction

Frontiers in Molecular Biosciences ◽

10.3389/fmolb.2021.724947 ◽

2021 ◽

Vol 8 ◽

Author(s):

Charles Christoffer ◽

Vijay Bharadwaj ◽

Ryan Luu ◽

Daisuke Kihara

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

De Novo ◽

Protein Complexes ◽

Protein Sequences ◽

Data Bank ◽

Protein Docking ◽

Functional Mechanisms ◽

Established Technique

Protein-protein docking is a useful tool for modeling the structures of protein complexes that have yet to be experimentally determined. Understanding the structures of protein complexes is a key component for formulating hypotheses in biophysics regarding the functional mechanisms of complexes. Protein-protein docking is an established technique for cases where the structures of the subunits have been determined. While the number of known structures deposited in the Protein Data Bank is increasing, there are still many cases where the structures of individual proteins that users want to dock are not determined yet. Here, we have integrated the AttentiveDist method for protein structure prediction into our LZerD webserver for protein-protein docking, which enables users to simply submit protein sequences and obtain full-complex atomic models, without having to supply any structure themselves. We have further extended the LZerD docking interface with a symmetrical homodimer mode. The LZerD server is available at https://lzerd.kiharalab.org/.

Study of Real-Valued Distance Prediction For Protein Structure Prediction with Deep Learning

10.1101/2020.11.26.400523 ◽

2020 ◽

Author(s):

Jin Li ◽

Jinbo Xu

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

3D Structure ◽

Prediction Method ◽

Structure Modeling ◽

Contact Prediction ◽

Real Value ◽

3D Structure Modeling ◽

Distance Prediction

Abstract: Inter-residue distance prediction by deep ResNet (convolutional residual neural network) has greatly advanced protein structure prediction. Currently the most successful structure prediction methods predict distance by discretizing it into dozens of bins. Here we study how well real-valued distance can be predicted and how useful it is for 3D structure modeling by comparing it with discrete-valued prediction based upon the same deep ResNet. Unlike recent methods that predict only a single real value for the distance of an atom pair, we predict both the mean and standard deviation of a distance and then employ a novel method to fold a protein by the predicted mean and deviation. Our findings include: 1) tested on the CASP13 FM (free-modeling) targets, our real-valued distance prediction obtains 81% precision on top L/5 long-range contact prediction, much better than the best CASP13 results (70%); 2) our real-valued prediction can predict correct folds for the same number of CASP13 FM targets as the best CASP13 group, despite generating only 20 decoys for each target; 3) our method greatly outperforms a very recent real-valued prediction method, DeepDist, in both contact prediction and 3D structure modeling; and 4) when the same deep ResNet is used, our real-valued distance prediction has 1-6% higher contact and distance accuracy than our own discrete-valued prediction, but yields less accurate 3D structure models.
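
The "top L/5 long-range" precision quoted in the abstract can be sketched as follows. This is a minimal Python illustration, not the authors' evaluation code. It assumes the commonly used conventions that a pair is in contact when its true distance is below 8 Å and that "long-range" means a sequence separation of at least 24 residues; neither threshold is stated in the abstract. It also assumes that predicted contacts are ranked by smallest predicted mean distance. The function name and the toy matrices are hypothetical.

```python
def top_l5_long_range_precision(pred_dist, true_dist, min_sep=24, cutoff=8.0):
    """Precision of the L/5 long-range pairs with the smallest predicted distance.

    Assumptions: `pred_dist` and `true_dist` are L x L nested lists of
    distances in angstroms; `min_sep` and `cutoff` follow common conventions
    and are not taken from the abstract.
    """
    L = len(true_dist)
    pairs = [
        (pred_dist[i][j], i, j)
        for i in range(L)
        for j in range(i + min_sep, L)
    ]
    if not pairs:
        raise ValueError("sequence too short for any long-range pair")
    pairs.sort()  # smallest predicted distance first
    top = pairs[: max(1, L // 5)]
    hits = sum(1 for _, i, j in top if true_dist[i][j] < cutoff)
    return hits / len(top)


# Hypothetical toy protein of length 30, so L/5 = 6 pairs are evaluated.
L = 30
true_dist = [[20.0] * L for _ in range(L)]
pred_dist = [[20.0] * L for _ in range(L)]
for i, j in [(0, 24), (1, 25), (2, 26)]:   # three true long-range contacts
    true_dist[i][j] = 5.0
    pred_dist[i][j] = 4.0                  # correctly predicted as close
for i, j in [(0, 25), (0, 26), (0, 27)]:   # three false positives
    pred_dist[i][j] = 6.0

precision = top_l5_long_range_precision(pred_dist, true_dist)
print(precision)  # 3 of the 6 top-ranked pairs are true contacts: 0.5
```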

Using AlphaFold for Rapid and Accurate Fixed Backbone Protein Design

10.1101/2021.08.24.457549 ◽

2021 ◽

Cited By ~ 1

Author(s):

Lewis Moffat ◽

Joe G. Greener ◽

David T. Jones

Keyword(s):

Protein Structure ◽

Ab Initio ◽

Protein Structure Prediction ◽

Protein Design ◽

Structure Prediction ◽

Predictive Power ◽

Protein Sequences ◽

Supervised Methods ◽

New Generation ◽

Novel Protein

Abstract: The prediction of protein structure and the design of novel protein sequences and structures have long been intertwined. The recently released AlphaFold has heralded a new generation of accurate protein structure prediction, but the extent to which this affects protein design remains unexplored. Here we develop a rapid and effective approach for fixed backbone computational protein design, leveraging the predictive power of AlphaFold. For several designs we demonstrate that not only are the AlphaFold predicted structures in agreement with the desired backbones, but they are also supported by the structure predictions of other supervised methods as well as ab initio folding. These results suggest that AlphaFold, and methods like it, are able to facilitate the development of a new range of novel and accurate protein design methodologies.

Accurate contact-based modelling of repeat proteins predicts the structure of new repeats protein families

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008798 ◽

2021 ◽

Vol 17 (4) ◽

pp. e1008798

Author(s):

Claudio Bassot ◽

Arne Elofsson

Keyword(s):

Deep Learning ◽

Protein Structure ◽

High Accuracy ◽

Unique Sequence ◽

Direct Coupling ◽

Protein Families ◽

Coupling Analysis ◽

Repeat Proteins ◽

Eukaryotic Proteomes ◽

Direct Coupling Analysis

Repeat proteins are abundant in eukaryotic proteomes. They are involved in many eukaryotic-specific functions, including signalling. For many of these proteins, the structure is not known, as they are difficult to crystallise. Today, using direct coupling analysis and deep learning, it is often possible to predict a protein’s structure. However, the unique sequence features present in repeat proteins have made it challenging to use direct coupling analysis for predicting contacts. Here, we show that deep learning-based methods (trRosetta, DeepMetaPsicov (DMP) and PconsC4) overcome this problem and can predict intra- and inter-unit contacts in repeat proteins. In a benchmark dataset of 815 repeat proteins, about 90% can be correctly modelled. Further, among 48 PFAM families lacking a protein structure, we produce models of 41 families with estimated high accuracy.

Fast pseudolikelihood maximization for direct-coupling analysis of protein structure from many homologous amino-acid sequences

Journal of Computational Physics ◽

10.1016/j.jcp.2014.07.024 ◽

2014 ◽

Vol 276 ◽

pp. 341-356 ◽

Cited By ~ 85

Author(s):

Magnus Ekeberg ◽

Tuomo Hartonen ◽

Erik Aurell

Keyword(s):

Protein Structure ◽

Amino Acid ◽

Amino Acid Sequences ◽

Direct Coupling ◽

Coupling Analysis ◽

Direct Coupling Analysis ◽

Homologous Amino Acid

Using scores derived from statistical coupling analysis to distinguish correct and incorrect folds in de-novo protein structure prediction

Proteins Structure Function and Bioinformatics ◽

10.1002/prot.21779 ◽

2007 ◽

Vol 71 (2) ◽

pp. 950-959 ◽

Cited By ~ 14

Author(s):

Gail J. Bartlett ◽

William R. Taylor

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

De Novo ◽

Coupling Analysis ◽

Statistical Coupling ◽

Statistical Coupling Analysis

Protein Structure Prediction Based on Improved Genetic Algorithm

International Journal of Environmental Science and Development ◽

10.18178/ijesd.2020.11.9.1289 ◽

2020 ◽

Vol 11 (9) ◽

pp. 450-454

Author(s):

Jiaxi Liu ◽

Keyword(s):

Genetic Algorithm ◽

Protein Structure ◽

Amino Acid ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Protein Structures ◽

Three Dimensional ◽

Dimensional Structure ◽

Improved Genetic Algorithm ◽

Research Areas

The prediction of protein three-dimensional structure from amino acid sequence has been a challenging problem in bioinformatics, owing to the many potential applications for robust protein structure prediction methods. Protein structure prediction is essential to bioscience, and its research results are important for other research areas. Methods for the prediction and design of protein structures have advanced dramatically. The prediction of protein structure based on average hydrophobic values is discussed, and an improved genetic algorithm is proposed to solve the optimization problem of hydrophobic protein structure prediction. An adjustment operator is designed with the average hydrophobic value to prevent the overlapping of amino acid positions. Finally, some numerical experiments are conducted to verify the feasibility and effectiveness of the proposed algorithm by comparing it with the traditional HNN algorithm.