HOMSTRAD: adding sequence information to structure-based alignments of homologous protein families

2001 ◽  
Vol 17 (8) ◽  
pp. 748-749 ◽  
Author(s):  
P. I. W. de Bakker ◽  
A. Bateman ◽  
D. F. Burke ◽  
R. N. Miguel ◽  
K. Mizuguchi ◽  
...  
2014 ◽  
Vol 23 (9) ◽  
pp. 1220-1234 ◽  
Author(s):  
Jimin Pei ◽  
Wenlin Li ◽  
Lisa N. Kinch ◽  
Nick V. Grishin

eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
Jérôme Tubiana ◽  
Simona Cocco ◽  
Rémi Monasson

Statistical analysis of evolutionarily related protein sequences provides information about their structure, function, and history. We show that Restricted Boltzmann Machines (RBM), designed to learn complex high-dimensional data and their statistical features, can efficiently model protein families from sequence information. Here we apply RBM to 20 protein families, and present detailed results for two short protein domains (Kunitz and WW), one long chaperone protein (Hsp70), and synthetic lattice proteins for benchmarking. The features inferred by the RBM are biologically interpretable: they are related to structure (residue-residue tertiary contacts, extended secondary motifs (α-helices and β-sheets) and intrinsically disordered regions), to function (activity and ligand specificity), or to phylogenetic identity. In addition, we use RBM to design new protein sequences with putative properties by composing and 'turning up' or 'turning down' the different modes at will. Our work therefore shows that RBM are versatile and practical tools that can be used to unveil and exploit the genotype–phenotype relationship for protein families.
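As a rough illustration of how an RBM hidden unit can act as an interpretable "feature", the sketch below computes the input one hidden unit receives from a sequence, assuming the common formulation in which the unit holds one weight per site and per symbol. The three-letter alphabet, the weight values, and the function name `hidden_input` are invented for illustration; none of them come from the paper, and this is not the authors' implementation.

```python
def hidden_input(sequence, weights, alphabet):
    """Return the total input a single hidden unit receives from a sequence.

    weights[i][a] is the (hypothetical) weight linking site i, symbol a
    to this hidden unit; the input is the sum over sites of the weight
    selected by the observed symbol.
    """
    index = {symbol: k for k, symbol in enumerate(alphabet)}
    return sum(weights[i][index[symbol]] for i, symbol in enumerate(sequence))


# Toy setup (assumed values): 2 sites, alphabet of 3 symbols.
ALPHABET = "AC-"
WEIGHTS = [
    [1.0, -1.0, 0.0],  # site 0: favours 'A', disfavours 'C'
    [0.5, 0.0, 0.0],   # site 1: mildly favours 'A'
]

for seq in ("AA", "AC", "CA"):
    print(seq, hidden_input(seq, WEIGHTS, ALPHABET))
```

Sequences that drive the unit strongly positive or negative can then be read as carrying, or lacking, the feature that unit encodes. If each "mode" in the abstract is taken to correspond to one such hidden unit, this input is the quantity that would distinguish sequences in which the mode is "turned up" from those in which it is "turned down".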


1988 ◽  
Vol 2 (3) ◽  
pp. 193-199 ◽  
Author(s):  
D. Altschuh ◽  
T. Vernet ◽  
P. Berti ◽  
D. Moras ◽  
K. Nagai

2019 ◽  
Vol 13 ◽  
pp. 117793221882136 ◽  
Author(s):  
Atul Kumar Upadhyay ◽  
Ramanathan Sowdhamini

Computational approaches to high-throughput data are gaining importance because of the explosion of sequence data in the post-genomic era. This explosion creates a huge gap among sequence, structure, and function, since the experimental techniques for determining structure and function are expensive, time-consuming, and laborious. There is therefore an urgent need to develop computational approaches for biological systems. Engagement of proteins in quaternary arrangements, such as domain swapping, might be relevant for higher compatibility of such genes under stress conditions. In this study, the capacity to engage in domain swapping was predicted from sequence information alone across the whole genome of holy basil (Ocimum tenuiflorum), which is well known to be an anti-stress agent. Approximately one-fourth of the proteins of O tenuiflorum are predicted to undergo three-dimensional (3D) domain swapping. Furthermore, functional annotation was carried out on all the predicted domain-swap sequences from O tenuiflorum and Arabidopsis thaliana to determine their distribution across Pfam protein families and gene ontology (GO) terms. These domain-swapped protein sequences are associated with many Pfam protein families and a wide range of GO annotation terms. A comparative analysis of domain-swap-predicted sequences in O tenuiflorum with gene products in A thaliana reveals that around 26% (2522 sequences) are close homologues across the 2 genomes. Functional annotation suggests that the predicted domain-swapped sequences are involved in diverse molecular functions, such as gene regulation under abiotic stress conditions and adaptation to different environmental niches. Finally, the positively predicted sequences of A thaliana and O tenuiflorum were also examined for their presence in the stress regulome, as recorded in our STIFDB database, to check the involvement of these proteins in different abiotic stresses.


2017 ◽  
Vol 114 (13) ◽  
pp. E2662-E2671 ◽  
Author(s):  
Guido Uguzzoni ◽  
Shalini John Lovis ◽  
Francesco Oteri ◽  
Alexander Schug ◽  
Hendrik Szurmant ◽  
...  

Proteins have evolved to perform diverse cellular functions, from serving as reaction catalysts to coordinating cellular propagation and development. Frequently, proteins do not exert their full potential as monomers but rather undergo concerted interactions as either homo-oligomers or with other proteins as hetero-oligomers. The experimental study of such protein complexes and interactions has been arduous. Theoretical structure prediction methods are an attractive alternative. Here, we investigate homo-oligomeric interfaces by tracing residue coevolution via the global statistical direct coupling analysis (DCA). DCA can accurately infer spatial adjacencies between residues. These adjacencies can be included as constraints in structure prediction techniques to predict high-resolution models. By taking advantage of the ongoing exponential growth of sequence databases, we go significantly beyond anecdotal cases of a few protein families and apply DCA to a systematic large-scale study of nearly 2,000 Pfam protein families with sufficient sequence information and structurally resolved homo-oligomeric interfaces. We find that large interfaces are commonly identified by DCA. We further demonstrate that DCA can differentiate between subfamilies with different binding modes within one large Pfam family. Sequence-derived contact information for the subfamilies proves sufficient to assemble accurate structural models of the diverse protein-oligomers. Thus, we provide an approach to investigate oligomerization for arbitrary protein families leading to structural models complementary to often-difficult experimental methods. Combined with ever more abundant sequential data, we anticipate that this study will be instrumental to allow the structural description of many heteroprotein complexes in the future.
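To make the step from coupling scores to structural constraints concrete, here is a minimal sketch of one common post-processing convention: rank residue pairs by score and keep the strongest ones that are far enough apart in sequence. The function name `top_contacts`, the minimum separation of 5, and the toy scores are assumptions for illustration. The abstract does not state the selection rule used in the study.

```python
def top_contacts(scores, k, min_separation=5):
    """Return the k highest-scoring residue pairs.

    scores maps (i, j) index pairs to a coupling score. Pairs closer than
    min_separation along the chain are skipped, since neighbours in
    sequence are trivially close in space (an assumed, conventional filter).
    """
    candidates = [
        (score, i, j)
        for (i, j), score in scores.items()
        if abs(i - j) >= min_separation
    ]
    candidates.sort(reverse=True)  # strongest coupling first
    return [(i, j) for _, i, j in candidates[:k]]


# Toy scores (invented values).
toy_scores = {(0, 1): 0.9, (0, 7): 0.8, (3, 9): 0.7, (2, 10): 0.5}
print(top_contacts(toy_scores, k=2))
```

The selected pairs are what would be handed to a structure prediction method as distance constraints. In this toy example the highest-scoring pair, (0, 1), is dropped because the two residues are adjacent in sequence.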


2016 ◽  
Author(s):  
Alice Coucke ◽  
Guido Uguzzoni ◽  
Francesco Oteri ◽  
Simona Cocco ◽  
Remi Monasson ◽  
...  

Coevolution of residues in contact imposes strong statistical constraints on the sequence variability between homologous proteins. Direct-Coupling Analysis (DCA), a global statistical inference method, successfully models this variability across homologous protein families to infer structural information about proteins. For each residue pair, DCA infers 21×21 matrices describing the coevolutionary coupling for each pair of amino acids (or gaps). To achieve the residue-residue contact prediction, these matrices are mapped onto simple scalar parameters; the full information they contain gets lost. Here, we perform a detailed spectral analysis of the coupling matrices resulting from 70 protein families, to show that they contain quantitative information about the physico-chemical properties of amino-acid interactions. Results for protein families are corroborated by the analysis of synthetic data from lattice-protein models, which emphasizes the critical effect of sampling quality and regularization on the biochemical features of the statistical coupling matrices.
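The loss of information described above can be seen in a small sketch. It assumes the scalar is the Frobenius norm of the coupling matrix, a widely used choice in the DCA literature, though the abstract does not name the mapping. The 2×2 matrices stand in for the full 21×21 case, and their values are invented.

```python
import math


def frobenius_score(coupling):
    """Collapse one coupling matrix (a list of rows) to a single scalar."""
    return math.sqrt(sum(value * value for row in coupling for value in row))


# Two toy coupling matrices with opposite meaning (assumed values):
# one favours the symbol pair (0, 0), the other penalises it.
attractive = [[2.0, 0.0], [0.0, 0.0]]
repulsive = [[-2.0, 0.0], [0.0, 0.0]]

print(frobenius_score(attractive), frobenius_score(repulsive))
```

Both matrices receive the same score, so the scalar indicates that the two positions are coupled but not which symbol pairs are favoured or disfavoured. That per-pair detail is the information a spectral analysis of the full matrices can recover.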


2015 ◽  
Vol 16 (1) ◽  
Author(s):  
Juliana S Bernardes ◽  
Fabio RJ Vieira ◽  
Lygia MM Costa ◽  
Gerson Zaverucha
