Multiple Sequence Alignments as Tools for Protein Structure and Function Prediction

2003 ◽  
Vol 4 (4) ◽  
pp. 424-427 ◽  
Author(s):  
Alfonso Valencia

Multiple sequence alignments have much to offer to the understanding of protein structure, evolution and function. We are developing approaches to use this information in predicting protein-binding specificity, intra-protein and protein-protein interactions, and in reconstructing protein interaction networks.


2015 ◽  
Author(s):  
Hugo Jacquin ◽  
Amy Gilson ◽  
Eugene Shakhnovich ◽  
Simona Cocco ◽  
Rémi Monasson

Inverse statistical approaches to determine protein structure and function from Multiple Sequence Alignments (MSAs) are emerging as powerful tools in computational biology. However, the underlying assumptions about the relationship between the inferred effective Potts Hamiltonian and real protein structure and energetics remain untested so far. Here we use lattice protein (LP) models to benchmark these inverse statistical approaches. We build MSAs of highly stable sequences in target LP structures and infer the effective pairwise Potts Hamiltonians from those MSAs. We find that the inferred Potts Hamiltonians reproduce many important aspects of 'true' LP structures and energetics. Careful analysis reveals that the effective pairwise couplings in inferred Potts Hamiltonians depend not only on the energetics of the native structure but also on competing folds; in particular, the coupling values reflect both positive design (stabilization of the native conformation) and negative design (destabilization of competing folds). In addition to providing detailed structural information, the inferred Potts models, when used as protein Hamiltonians for the design of new sequences, are able to generate with high probability completely new sequences with the desired folds, which is not possible using independent-site models. These are remarkable results, as the effective LP Hamiltonians used to generate the MSAs are not simple pairwise models, owing to the competition between folds. Our findings elucidate the reasons for the power of inverse approaches to the modelling of proteins from sequence data, as well as their limitations; we show, in particular, that their success crucially depends on the accurate inference of the Potts pairwise couplings.
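To make the contrast between pairwise Potts models and independent-site models concrete, the sketch below evaluates the standard pairwise Potts energy E(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j) for a sequence s. It assumes this common sign convention (sequence probability proportional to exp(-E)); the toy fields `h` and couplings `J` are hypothetical values chosen for illustration, not parameters inferred from any lattice-protein MSA. An independent-site model corresponds to the same fields with no couplings.

```python
from itertools import combinations

def potts_energy(seq, h, J):
    """Pairwise Potts energy E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).

    seq: integer states (0..q-1), one per alignment column.
    h:   per-site fields, h[i][a].
    J:   dict mapping a site pair (i, j), i < j, to a q x q coupling matrix;
         pairs absent from J are treated as uncoupled.
    """
    energy = -sum(h[i][a] for i, a in enumerate(seq))
    for i, j in combinations(range(len(seq)), 2):
        if (i, j) in J:
            energy -= J[(i, j)][seq[i]][seq[j]]
    return energy

# Hypothetical toy parameters: q = 2 states, L = 3 sites.
h = [[0.0, 0.5], [0.0, 0.0], [0.2, 0.0]]
J = {(0, 2): [[1.0, 0.0], [0.0, 1.0]]}  # rewards equal states at sites 0 and 2

for s in ([1, 0, 1], [1, 0, 0]):
    # Second value: independent-site model (same fields, no couplings).
    print(s, potts_energy(s, h, J), potts_energy(s, h, {}))
```

With fields alone, [1, 0, 0] has the lower energy (higher probability); adding the single coupling between sites 0 and 2 reverses the preference in favour of [1, 0, 1]. This is the kind of correlated constraint that an independent-site model cannot express.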



2021 ◽  
Author(s):  
Liang Hong ◽  
Siqi Sun ◽  
Liangzhen Zheng ◽  
Qingxiong Tan ◽  
Yu Li

Evolutionarily related sequences provide information about protein structure and function. Multiple sequence alignment, which involves searching large databases for homologs and aligning them, is an efficient way to extract this information and assist protein structure and function prediction, as the success of AlphaFold has demonstrated. Despite the existing tools for multiple sequence alignment, searching for homologs across the entire UniProt is still time-consuming. Given the success of AlphaFold, large-scale multiple sequence alignments against massive databases are likely to become a trend in the field, so it is highly desirable to accelerate this step. Here, we propose a novel method, fastMSA, that improves the speed significantly. Our idea is orthogonal to all previous acceleration methods. Taking advantage of a protein language model based on BERT, we propose a novel dual-encoder architecture that embeds protein sequences into a low-dimensional space and efficiently filters out unrelated sequences before running BLAST. Extensive experimental results suggest that we can recall most of the homologs with a 34-fold speed-up. Moreover, our method is compatible with downstream tasks such as structure prediction using AlphaFold: with multiple sequence alignments generated by our method, structure prediction performance is only slightly compromised while running time is much lower. fastMSA will effectively assist protein sequence, structure, and function analysis based on homologs and multiple sequence alignments.
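The filtering idea can be sketched as a nearest-neighbour pre-screen in embedding space. This is a minimal illustration, not fastMSA's implementation: it assumes embeddings have already been computed (the 2-D vectors below are hypothetical stand-ins for language-model outputs), uses cosine similarity as the assumed scoring function, and the helper names `cosine` and `prefilter` are hypothetical. Only the returned candidates would then be passed to BLAST.

```python
import heapq
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def prefilter(query_vec, db_vecs, k):
    """Return the ids of the k database entries whose embeddings are most
    similar to the query; only these would be passed on to BLAST."""
    return heapq.nlargest(k, db_vecs, key=lambda seq_id: cosine(query_vec, db_vecs[seq_id]))

# Hypothetical 2-D embeddings standing in for language-model outputs.
db = {"A": [0.9, 0.1], "B": [0.0, 1.0], "C": [0.7, 0.7]}
print(prefilter([1.0, 0.0], db, k=2))
```

`heapq.nlargest` avoids sorting the whole database. At UniProt scale a practical system would more likely rely on an approximate nearest-neighbour index than on this linear scan.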



2016 ◽  
Vol 113 (52) ◽  
pp. 15018-15023 ◽  
Author(s):  
Juan Rodriguez-Rivas ◽  
Simone Marsili ◽  
David Juan ◽  
Alfonso Valencia

Protein–protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein–protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein–protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein–protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach.
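As a simple illustration of how residue coevolution can be read from an alignment, the sketch below scores column pairs by mutual information (MI). MI is a classic, easily stated covariation measure swapped in here for illustration. The abstract does not specify the statistical method used, and MI is known to be confounded by phylogeny and indirect correlations, which is why global models are generally preferred. For heterodimer interfaces, the alignment would typically be a concatenation of paired sequences from both partners, with inter-protein column pairs of interest. The toy alignment is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def column_mi(msa, i, j):
    """Mutual information (nats) between columns i and j of an alignment."""
    n = len(msa)
    p_i = Counter(seq[i] for seq in msa)
    p_j = Counter(seq[j] for seq in msa)
    p_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi

def rank_pairs(msa):
    """All column pairs as (score, i, j), sorted by decreasing mutual information."""
    length = len(msa[0])
    scores = [(column_mi(msa, i, j), i, j) for i, j in combinations(range(length), 2)]
    return sorted(scores, reverse=True)

# Hypothetical toy alignment: columns 0 and 1 co-vary perfectly (A<->K, G<->R).
msa = ["AKE", "AKD", "GRE", "GRD"]
print(rank_pairs(msa)[0])
```

Columns 0 and 1 score log 2 nats (perfect covariation), while column 2 varies independently of the others and scores zero with both.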



2004 ◽  
Vol 01 (04) ◽  
pp. 711-741 ◽  
Author(s):  
SEE-KIONG NG ◽  
SOON-HENG TAN

The ongoing genomics and proteomics efforts have helped identify many new genes and proteins in living organisms. However, simply knowing the existence of genes and proteins does not tell us much about the biological processes in which they participate. Many major biological processes are controlled by protein interaction networks. A comprehensive description of protein–protein interactions is therefore necessary to understand the genetic program of life. In this tutorial, we provide an overview of the various current high-throughput methods for discovering protein–protein interactions, covering both the conventional experimental methods and new computational approaches.



2007 ◽  
Vol 189 (14) ◽  
pp. 5130-5141 ◽  
Author(s):  
Damon S. Anderson ◽  
Pratima Adhikari ◽  
Katherine D. Weaver ◽  
Alvin L. Crumbliss ◽  
Timothy A. Mietzner

The obligate human pathogen Haemophilus influenzae utilizes a siderophore-independent (free) Fe³⁺ transport system to obtain this essential element from the host iron-binding protein transferrin. The hFbpABC transporter is a binding protein-dependent ABC transporter that functions to shuttle (free) Fe³⁺ through the periplasm and across the inner membrane of H. influenzae. This investigation focuses on the structure and function of the hFbpB membrane permease component of the transporter, a protein that has eluded prior characterization. Based on multiple-sequence alignments between permease orthologs, a series of site-directed mutations targeted at residues within the two conserved permease motifs were generated. The hFbpABC transporter was expressed in a siderophore-deficient Escherichia coli background, and effects of mutations were analyzed using growth rescue and radiolabeled ⁵⁵Fe³⁺ transport assays. Results demonstrate that mutation of the invariant glycine (G418A) within motif 2 led to attenuated transport activity, while mutation of the invariant glycine (G155A/V/E) within motif 1 had no discernible effect on activity. Individual mutations of well-conserved leucines (L154D and L417D) led to attenuated and null transport activities, respectively. As a complement to site-directed methods, a mutant screen based on resistance to the toxic iron analog gallium, an hFbpABC inhibitor, was devised. The screen led to the identification of several significant hFbpB mutations; V497I, I174F, and S475I led to null transport activities, while S146Y resulted in attenuated activity. Significant residues were mapped to a topological model of the hFbpB permease, and the implications of mutations are discussed in light of structural and functional data from related ABC transporters.
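Choosing mutation targets from an alignment starts with locating invariant columns and translating them into residue numbers of a reference sequence. The sketch below does this for a hypothetical toy alignment, not the hFbpB ortholog alignment. The helper `invariant_positions` is a hypothetical name, and it assumes '-' as the gap character and 1-based numbering along the ungapped reference. The labels follow the usual wild-type/position/mutant convention (as in G418A); alanine is used only as an example substitution.

```python
def invariant_positions(msa, ref_index=0):
    """Return (residue_number, residue) for every alignment column in which all
    sequences carry the same non-gap residue. Residue numbers are 1-based and
    follow the ungapped reference sequence msa[ref_index]."""
    ref = msa[ref_index]
    hits = []
    resnum = 0
    for col, ref_res in enumerate(ref):
        if ref_res != "-":
            resnum += 1
        residues = {seq[col] for seq in msa}
        if len(residues) == 1 and ref_res != "-":
            hits.append((resnum, ref_res))
    return hits

# Hypothetical toy alignment of three orthologs (gaps as '-').
msa = ["MLG-K", "MLGAK", "ILGSK"]
invariant = invariant_positions(msa)
print(invariant)
print([f"{res}{num}A" for num, res in invariant])
```

The gap column is skipped when numbering, so the invariant lysine is reported as residue 4 of the reference even though it sits in alignment column 5.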



2021 ◽  
Author(s):  
A. Alcalá ◽  
G. Riera ◽  
I. García ◽  
R. Alberich ◽  
M. Llabrés

Motivation: Several protein-protein interaction network (PPIN) aligners have been developed during the last 15 years. One of their goals is to help the functional annotation of proteins and the prediction of protein-protein interactions. A correct aligner must preserve the network’s topology as well as its biological coherence. However, this is a trade-off that is hard to achieve. In addition, most aligners require considerable effort to use in practice, and many researchers must choose an aligner without the opportunity to compare the performance of different aligners beforehand. Results: We developed PINAWeb, a user-friendly web-based tool to obtain and compare the results produced by the aligners AligNet, HubAlign, L-GRAAL, PINALOG and SPINAL. PPINs can be uploaded either from the STRING database or from a user database. The source code of PINAWeb is freely available on GitHub to enable researchers to add other aligners, network databases or alignment score metrics. In addition, PINAWeb provides a report analysing every alignment in terms of topological and functional information scores, as well as a visualization of the comparison between alignments (agreement/differences) when more than one aligner is considered. Availability: https://bioinfo.uib.es/~recerca/PINAWeb
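As an example of a topological alignment score, the sketch below computes edge correctness (EC): the fraction of edges in the first network whose mapped endpoints are also connected in the second. EC is one widely used topological measure in the network-alignment literature. The abstract does not list which scores PINAWeb reports, so its use here is an assumption. The networks and the node mapping are hypothetical, and edges are treated as undirected.

```python
def edge_correctness(edges1, edges2, mapping):
    """Fraction of edges in network 1 whose endpoints are mapped onto an edge
    of network 2 (undirected edges; unmapped nodes count as not conserved)."""
    e2 = {frozenset(e) for e in edges2}
    conserved = sum(
        1
        for u, v in edges1
        if u in mapping and v in mapping and frozenset((mapping[u], mapping[v])) in e2
    )
    return conserved / len(edges1) if edges1 else 0.0

# Hypothetical networks and a hypothetical alignment (node mapping).
edges1 = [("a", "b"), ("b", "c"), ("c", "d")]
edges2 = [("x", "y"), ("y", "z")]
mapping = {"a": "x", "b": "y", "c": "z", "d": "w"}
print(edge_correctness(edges1, edges2, mapping))
```

Two of the three edges are conserved under this mapping, giving EC = 2/3. A biological-coherence score would instead compare annotations (for example, shared functional terms) of the aligned node pairs.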



2016 ◽  
Author(s):  
Juan Rodriguez-Rivas ◽  
Simone Marsili ◽  
David Juan ◽  
Alfonso Valencia

Protein-protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue co-evolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that co-evolution points to structurally conserved contacts at protein-protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a novel domain-centred protocol to study the interplay between residue co-evolution and structural conservation of protein-protein interfaces. We show that sequence-based co-evolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein-protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence, where standard homology modelling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic co-evolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this novel approach.

Significance statement: Interacting proteins tend to co-evolve through interdependent changes at the interaction interface. This phenomenon leads to patterns of coordinated mutations that can be exploited to systematically predict contacts between interacting proteins in prokaryotes. We explore the hypothesis that co-evolving contacts at protein interfaces are preferentially conserved through long evolutionary periods. We demonstrate that co-evolving residues in prokaryotes identify inter-protein contacts that are particularly well conserved in the corresponding structure of their eukaryotic homologues. Therefore, these contacts have likely been important to maintain protein-protein interactions during evolution. We show that this property can be used to reliably predict interacting residues between eukaryotic proteins with homologues in prokaryotes even if they are very distantly related in sequence.
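Projecting predicted contacts from a prokaryotic complex onto a eukaryotic homologue requires, at minimum, translating residue numbers through a sequence alignment of each partner. The sketch below shows only that bookkeeping step under simple assumptions: one pairwise alignment per protein, '-' for gaps, 1-based numbering, and contacts given as (residue in A, residue in B). It is not the domain-centred protocol described above. The helper names and toy sequences are hypothetical.

```python
def residue_map(aligned_src, aligned_tgt):
    """Map 1-based residue numbers of the source sequence to the target sequence,
    using two rows of a pairwise alignment (gaps as '-'). Residues aligned to a
    gap are left out of the map."""
    mapping = {}
    i = j = 0
    for a, b in zip(aligned_src, aligned_tgt):
        if a != "-":
            i += 1
        if b != "-":
            j += 1
        if a != "-" and b != "-":
            mapping[i] = j
    return mapping

def project_contacts(contacts, map_a, map_b):
    """Project inter-protein contacts (residue in A, residue in B) onto the
    homologous complex, dropping contacts with an unaligned residue."""
    return [(map_a[r1], map_b[r2]) for r1, r2 in contacts if r1 in map_a and r2 in map_b]

# Hypothetical source/target alignments for partners A and B.
map_a = residue_map("MK-LV", "MRALI")
map_b = residue_map("GST", "G-T")
contacts = [(3, 1), (4, 2), (2, 3)]
print(project_contacts(contacts, map_a, map_b))
```

The contact (4, 2) is dropped because residue 2 of partner B is aligned to a gap in the target. In the distantly related regime the abstract describes, the quality of these pairwise alignments directly limits how many contacts can be transferred.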



2015 ◽  
Vol 32 (6) ◽  
pp. 814-820 ◽  
Author(s):  
Gearóid Fox ◽  
Fabian Sievers ◽  
Desmond G. Higgins

Motivation: Multiple sequence alignments (MSAs) with large numbers of sequences are now commonplace. However, current multiple alignment benchmarks are ill-suited for testing these types of alignments, as test cases either contain a very small number of sequences or are based purely on simulation rather than empirical data. Results: We take advantage of recent developments in protein structure prediction methods to create a benchmark (ContTest) for protein MSAs containing many thousands of sequences in each test case and which is based on empirical biological data. We rank popular MSA methods using this benchmark and verify a recent result showing that chained guide trees increase the accuracy of progressive alignment packages on datasets with thousands of proteins. Availability and implementation: Benchmark data and scripts are available for download at http://www.bioinf.ucd.ie/download/ContTest.tar.gz. Supplementary information: Supplementary data are available at Bioinformatics online.



2009 ◽  
Vol 37 (4) ◽  
pp. 768-771 ◽  
Author(s):  
David L. Robertson ◽  
Simon C. Lovell

Molecular function is the result of proteins working together, mediated by highly specific interactions. Maintenance and change of protein interactions can thus be considered one of the main links between molecular function and mutation. As a consequence, protein interaction datasets can be used to study functional evolution directly. In terms of constraining change, the co-evolution of interacting molecules is a very subtle process. This has implications for the signal being used to predict protein–protein interactions. In terms of functional change (the ‘rewiring’ of interaction networks), gene duplication is critically important. Interestingly, once duplication has occurred, the genes involved have different probabilities of being retained depending on how they were generated. In the present paper, we discuss some of our recent work in this area.



2012 ◽  
Vol 13 (1) ◽  
pp. 55 ◽  
Author(s):  
Jan-Oliver Janda ◽  
Markus Busch ◽  
Fabian Kück ◽  
Mikhail Porfenenko ◽  
Rainer Merkl

