Protein contact prediction by integrating joint evolutionary coupling analysis and supervised learning

Abstract:

Coevolution-based contact prediction, either directly by coevolutionary couplings resulting from global statistical sequence models or using structural supervision and deep learning, has found widespread application in protein-structure prediction from sequence. However, one of the basic assumptions in global statistical modeling is that sequences form an at least approximately independent sample of an unknown probability distribution, which is to be learned from data. In the case of protein families, this assumption is obviously violated by phylogenetic relations between protein sequences. It has turned out to be notoriously difficult to take phylogenetic correlations into account in coevolutionary model learning. Here, we propose a complementary approach: we develop two strategies to randomize or resample sequence data, such that conservation patterns and phylogenetic relations are preserved, while intrinsic (i.e. structure- or function-based) coevolutionary couplings are removed. An analysis of these data shows that the strongest coevolutionary couplings, i.e. those used by Direct Coupling Analysis to predict contacts, are only weakly influenced by phylogeny. However, phylogeny-induced spurious couplings are of similar size to the bulk of coevolutionary couplings, and dissecting functional from phylogeny-induced couplings might lead to more accurate contact predictions in the range of intermediate-size couplings.

The code is available at https://github.com/ed-rodh/Null_models_I_and_II.

Author summary:

Many homologous protein families contain thousands of highly diverged amino-acid sequences, which fold into close-to-identical three-dimensional structures and fulfill almost identical biological tasks. Global coevolutionary models, like those inferred by the Direct Coupling Analysis (DCA), assume that families can be considered as samples of some unknown statistical model, and that the parameters of these models represent evolutionary constraints acting on protein sequences. To learn these models from data, DCA and related approaches also have to assume that the distinct sequences in a protein family are close to independent, while in reality they are characterized by involved hierarchical phylogenetic relationships. Here we propose null models for sequence alignments, which maintain the patterns of amino-acid conservation and phylogeny contained in the data, but destroy the coevolutionary couplings that are frequently used in protein structure prediction. We find that phylogeny actually induces spurious non-zero couplings. These are, however, significantly smaller than the largest couplings derived from natural sequences, and therefore have little influence on the first predicted contacts. However, in the range of intermediate couplings, they may lead to statistically significant effects. Dissecting phylogenetic from functional couplings might therefore extend the range of accurately predicted structural contacts down to smaller coupling strengths than those currently used.
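To make the idea of a randomized alignment concrete, here is a minimal sketch of a much simpler null model than either of the two strategies described in the abstract. It shuffles each alignment column independently, which keeps every column's amino-acid counts (conservation) but destroys couplings between columns. Unlike the paper's null models, it does not preserve phylogenetic relations. The function name `shuffle_columns` and the toy alignment are illustrative assumptions, not taken from the linked repository.

```python
import random
from collections import Counter


def shuffle_columns(msa, seed=0):
    """Return a copy of the alignment with each column permuted independently.

    Assumption: `msa` is a list of equal-length strings (one per sequence).
    Per-column residue counts are preserved; inter-column couplings and
    phylogenetic structure are NOT preserved.
    """
    rng = random.Random(seed)
    # Transpose rows (sequences) into columns (alignment positions).
    columns = [list(col) for col in zip(*msa)]
    for col in columns:
        rng.shuffle(col)  # in-place permutation of one column
    # Transpose back into sequences.
    return ["".join(row) for row in zip(*columns)]


# Hypothetical toy alignment: columns 0 and 1 covary perfectly.
toy_msa = ["ACDA", "ACDC", "GTDA", "GTDC"]
null_msa = shuffle_columns(toy_msa)

# Conservation check: every column keeps the same residue counts.
for original, shuffled in zip(zip(*toy_msa), zip(*null_msa)):
    assert Counter(original) == Counter(shuffled)
```

Couplings inferred from such a shuffled alignment reflect only finite-sample noise, so this baseline cannot separate phylogeny-induced couplings from functional ones, which is the question the paper addresses.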


Deep Neural Network for Protein Contact Prediction by Weighting Sequences in a Multiple Sequence Alignment

10.1101/331926 ◽

2018 ◽

Author(s):

Hiroyuki Fukuda ◽

Kentaro Tomii

Keyword(s):

Neural Network ◽

Supervised Learning ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Structure Prediction ◽

Deep Neural Network ◽

Multiple Sequence ◽

Contact Prediction ◽

Meta Learning ◽

Correlation Information

Abstract:

Protein contact prediction is a crucially important step for protein structure prediction. To predict a contact, approaches of two types are used: evolutionary coupling analysis (ECA) and supervised learning. ECA uses a large multiple sequence alignment (MSA) of homologous sequences and extracts correlation information between residues. Supervised learning uses ECA results as input features and can produce higher accuracy. Here, we present a new approach to contact prediction which can both extract correlation information and predict contacts in a supervised manner directly from an MSA using a deep neural network (DNN). Using the DNN, we obtain higher accuracy than with earlier ECA methods. Simultaneously, we can weight each sequence in the MSA to eliminate noisy sequences automatically in a supervised way. It is expected that the combination of our method and other meta-learning methods can provide much higher accuracy of contact prediction.
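As a rough illustration of what "correlation information between residues" in an MSA can mean, the sketch below computes the mutual information between two alignment columns. This is a plain pairwise statistic, not the DNN described in the abstract and not a full evolutionary coupling analysis. The function name `column_mutual_information` and the toy alignment are assumptions made for the example.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j.

    Assumption: `msa` is a list of equal-length strings and every
    sequence carries equal weight (no sequence reweighting).
    """
    n = len(msa)
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    mi = 0.0
    for (a, b), n_ab in pair_counts.items():
        # p_ab * log(p_ab / (p_a * p_b)), expressed with raw counts.
        mi += (n_ab / n) * math.log(n_ab * n / (counts_i[a] * counts_j[b]))
    return mi


# Hypothetical toy alignment: columns 0 and 1 covary, column 2 is conserved,
# and column 3 varies independently of column 0.
toy_msa = ["ACDA", "ACDC", "GTDA", "GTDC"]

covarying = column_mutual_information(toy_msa, 0, 1)
independent = column_mutual_information(toy_msa, 0, 3)
```

On this toy input the covarying pair scores log 2 and the independent pair scores 0. Real alignments need corrections that this sketch omits, such as sequence weighting and separating direct from indirect correlations.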


Assessing the accuracy of direct-coupling analysis for RNA contact prediction

RNA ◽

10.1261/rna.074179.119 ◽

2020 ◽

Vol 26 (5) ◽

pp. 637-647 ◽

Cited By ~ 4

Author(s):

Francesca Cuturello ◽

Guido Tiana ◽

Giovanni Bussi

Keyword(s):

Direct Coupling ◽

Coupling Analysis ◽

Contact Prediction ◽

Direct Coupling Analysis


Faculty Opinions recommendation of Assessing the accuracy of direct-coupling analysis for RNA contact prediction.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.737468653.793577502 ◽

2020 ◽

Author(s):

Janusz Bujnicki ◽

Pritha Ghosh

Keyword(s):

Direct Coupling ◽

Coupling Analysis ◽

Contact Prediction ◽

Direct Coupling Analysis


PconsC4: fast, free, easy, and accurate contact predictions

10.1101/383133 ◽

2018 ◽

Cited By ~ 2

Author(s):

Mirco Michel ◽

David Menéndez Hurtado ◽

Arne Elofsson

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Prediction Methods ◽

Coupling Analysis ◽

Learning Methods ◽

Contact Prediction ◽

Residue Contact ◽

Direct Coupling Analysis ◽

Computationally Expensive ◽

Contact Predictions

Abstract:

Motivation: Residue contact prediction was revolutionized recently by the introduction of direct coupling analysis (DCA). Further improvements, in particular for small families, have been obtained by the combination of DCA and deep learning methods. However, existing deep learning contact prediction methods often rely on a number of external programs and are therefore computationally expensive.

Results: Here, we introduce a novel contact predictor, PconsC4, which performs on par with state-of-the-art methods. PconsC4 is heavily optimized, does not use any external programs and therefore is significantly faster and easier to use than other methods.

Availability: PconsC4 is freely available under the GPL license from https://github.com/ElofssonLab/PconsC4. Installation is easy using the pip command and works on any system with Python 3.5 or later and a modern GCC.


Evolutionary coupling analysis identifies the impact of disease-associated variants at less-conserved sites

Nucleic Acids Research ◽

10.1093/nar/gkz536 ◽

2019 ◽

Vol 47 (16) ◽

pp. e94-e94

Author(s):

Donghyo Kim ◽

Seong Kyu Han ◽

Kwanghwan Lee ◽

Inhae Kim ◽

JungHo Kong ◽

...

Keyword(s):

Association Studies ◽

Genome Wide Association Studies ◽

Loss Of Function ◽

Coupling Analysis ◽

Mutation Site ◽

Genome Wide ◽

Species Specific ◽

Evolutionary Coupling ◽

The Impact ◽

The Relationship

Abstract:

Genome-wide association studies have discovered a large number of genetic variants in human patients with disease. Predicting the impact of these variants is therefore important for sorting disease-associated variants (DVs) from neutral variants. Current methods to predict mutational impact depend on evolutionary conservation at the mutation site, which is determined using homologous sequences and based on the assumption that variants at well-conserved sites have high impacts. However, many DVs at less-conserved but functionally important sites cannot be predicted by the current methods. Here, we present a method to find DVs at less-conserved sites by predicting mutational impact using evolutionary coupling analysis. Functionally important and evolutionarily coupled sites often have compensatory variants at cooperating sites to avoid loss of function. We found that our method identified known intolerant variants in a diverse group of proteins. Furthermore, at less-conserved sites, we identified DVs that were not identified using conservation-based methods. These newly identified DVs were frequently found at protein interaction interfaces, where species-specific mutations often alter interaction specificity. This work presents a means to identify less-conserved DVs and provides insight into the relationship between evolutionarily coupled sites and human DVs.


Three-body interactions improve contact prediction within direct-coupling analysis

Physical Review E ◽

10.1103/physreve.96.052405 ◽

2017 ◽

Vol 96 (5) ◽

Cited By ~ 6

Author(s):

Michael Schmidt ◽

Kay Hamacher

Keyword(s):

Direct Coupling ◽

Coupling Analysis ◽

Contact Prediction ◽

Direct Coupling Analysis ◽

Three Body
