Efficient alternatives to PSI-BLAST

2012 ◽  
Vol 60 (3) ◽  
pp. 495-505
Author(s):  
M. Startek ◽  
S. Lasota ◽  
M. Sykulski ◽  
A. Bułak ◽  
L. Noé ◽  
...  

Abstract: In this paper we present two algorithms that may serve as efficient alternatives to the well-known PSI-BLAST tool: SeedBLAST and CTX-PSI-BLAST. Both can exploit knowledge of the amino acid composition specific to a given protein family: SeedBLAST uses purpose-designed seeds, while CTX-PSI-BLAST extends PSI-BLAST with a context-specific substitution model. The seeding technique has become central to the theory of sequence alignment. Several efficient tools apply seeds to DNA homology search, but none apply them to protein homology search. In this paper we fill this gap. We advocate the use of multiple subset seeds derived from a hierarchical tree of amino acid residues. Our method uses an evolutionary algorithm to compute seeds that are specifically designed for a given protein family. The seeds are represented by deterministic finite automata (DFAs) and built into the NCBI-BLAST software. This extended tool, named SeedBLAST, is compared to the original BLAST and PSI-BLAST on several protein families. Our results demonstrate the superiority of SeedBLAST in terms of efficiency, especially in the case of twilight zone hits. The contextual substitution model has been shown to increase the sensitivity of protein alignment. In this paper we take the next step in the contextual alignment program: we present a contextual version of PSI-BLAST, the iterative version of the NCBI-BLAST tool. An experimental evaluation demonstrates a significantly higher sensitivity compared to the ordinary PSI-BLAST algorithm.
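The idea of a subset seed mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes that each seed position is a partition of the amino acid alphabet and that two words form a hit when, at every position, both residues fall into the same class. The grouping `GROUPS_COARSE`, the seed `SEED`, and the function names are hypothetical illustrations; they are not the hierarchical tree, the evolved seeds, or the DFA representation used in SeedBLAST.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical coarse grouping of residues (an assumption for illustration,
# not the hierarchical tree from the paper).
GROUPS_COARSE = ["LVIM", "C", "AG", "ST", "P", "FYW", "EDNQ", "KR", "H"]


def make_partition(groups):
    """Map each residue to the index of the group that contains it."""
    return {aa: i for i, group in enumerate(groups) for aa in group}


EXACT = {aa: aa for aa in AMINO_ACIDS}   # every residue is its own class
COARSE = make_partition(GROUPS_COARSE)   # residues grouped by similarity
ANY = {aa: 0 for aa in AMINO_ACIDS}      # "don't care" position


def seed_hit(seed, a, b):
    """Return True if words a and b agree at every seed position,
    i.e. both residues belong to the same class of that position."""
    if not (len(a) == len(b) == len(seed)):
        return False
    return all(part[x] == part[y] for part, x, y in zip(seed, a, b))


# A hypothetical seed of span 4: exact, coarse, don't-care, exact.
SEED = [EXACT, COARSE, ANY, EXACT]

print(seed_hit(SEED, "KLAW", "KIGW"))  # L and I share a coarse class
print(seed_hit(SEED, "KLAW", "RLAW"))  # K and R differ at an exact position
```

In this sketch, a seed that is less strict at some positions admits more candidate hits (higher sensitivity) at the cost of more spurious ones, which is the trade-off that family-specific seed design aims to tune.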

2003 ◽  
Vol 69 (4) ◽  
pp. 2349-2355 ◽  
Author(s):  
Yuji Nagata ◽  
Zbyněk Prokop ◽  
Soňa Marvanová ◽  
Jana Sýkorová ◽  
Marta Monincová ◽  
...  

ABSTRACT The homology model of protein Rv2579 from Mycobacterium tuberculosis H37Rv was compared with the crystal structure of haloalkane dehalogenase LinB from Sphingomonas paucimobilis UT26, and this analysis revealed that 6 of 19 amino acid residues which form an active site and entrance tunnel are different in LinB and Rv2579. To characterize the effect of replacement of these six amino acid residues, mutations were introduced cumulatively into the six amino acid residues of LinB. The sixfold mutant, which was supposed to have the active site of Rv2579, exhibited haloalkane dehalogenase activity with the haloalkanes tested, confirming that Rv2579 is a member of the haloalkane dehalogenase protein family.


Genes ◽  
2021 ◽  
Vol 12 (9) ◽  
pp. 1455
Author(s):  
Kazuki Takabatake ◽  
Kazuki Izawa ◽  
Motohiro Akikawa ◽  
Keisuke Yanagisawa ◽  
Masahito Ohue ◽  
...  

Metagenomic analysis, a technique used to comprehensively analyze microorganisms present in the environment, requires performing high-precision homology searches on large amounts of sequencing data, the size of which has increased dramatically with the development of next-generation sequencing. NCBI BLAST is the most widely used software for performing homology searches, but its speed is insufficient for the throughput of current DNA sequencers. In this paper, we propose a new, high-performance homology search algorithm that employs a two-step seed search strategy using multiple reduced amino acid alphabets to identify highly similar subsequences. Additionally, we evaluated the validity of the proposed method against several existing tools. Our method was faster than any other existing program for ≤120,000 queries, while DIAMOND, an existing tool, was the fastest method for >120,000 queries.
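As a rough illustration of how a seed search over reduced amino acid alphabets might be organized in two steps, the following Python sketch first looks up k-mers under a coarse alphabet and then keeps only candidates that also agree under a finer one. This is a sketch under stated assumptions and not the authors' algorithm: the alphabets `COARSE_GROUPS` and `FINE_GROUPS`, the value of `k`, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical reduced alphabets (assumptions, not those of the paper).
# Each fine group is contained in one coarse group.
COARSE_GROUPS = ["LVIMC", "AGSTP", "FYW", "EDNQKRH"]
FINE_GROUPS = ["LVIM", "C", "AG", "ST", "P", "FYW", "EDNQ", "KR", "H"]


def encoder(groups):
    """Build a function that rewrites a sequence in a reduced alphabet."""
    table = {aa: chr(ord("a") + i) for i, group in enumerate(groups) for aa in group}
    return lambda seq: "".join(table[aa] for aa in seq)


encode_coarse = encoder(COARSE_GROUPS)
encode_fine = encoder(FINE_GROUPS)


def build_index(db_seq, k):
    """Index every k-mer of the database sequence under the coarse alphabet."""
    index = defaultdict(list)
    coarse = encode_coarse(db_seq)
    for i in range(len(coarse) - k + 1):
        index[coarse[i:i + k]].append(i)
    return index


def two_step_seeds(query, db_seq, k=4):
    """Step 1: find coarse-alphabet k-mer matches via the index.
    Step 2: keep only matches that also agree under the fine alphabet."""
    index = build_index(db_seq, k)
    q_coarse = encode_coarse(query)
    hits = []
    for qi in range(len(query) - k + 1):
        for di in index.get(q_coarse[qi:qi + k], []):
            if encode_fine(query[qi:qi + k]) == encode_fine(db_seq[di:di + k]):
                hits.append((qi, di))
    return hits


print(two_step_seeds("KLAW", "MRIGWP"))  # [(0, 1)]
print(two_step_seeds("HLAW", "MRIGWP"))  # []
```

The coarse step keeps the index small and the lookup cheap, while the fine step discards weak candidates before any more expensive alignment would be attempted.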


2015 ◽  
Author(s):  
Olivier Rivoire ◽  
Kimberly A. Reynolds ◽  
Rama Ranganathan

The essential biological properties of proteins - folding, biochemical activities, and the capacity to adapt - arise from the global pattern of interactions between amino acid residues. The statistical coupling analysis (SCA) is an approach to defining this pattern that involves the study of amino acid coevolution in an ensemble of sequences comprising a protein family. This approach indicates a functional architecture within proteins in which the basic units are coupled networks of amino acids termed sectors. This evolution-based decomposition has potential for new understandings of the structural basis for protein function, but requires broad further testing by the scientific community. To facilitate this, we present here the principles and practice of the SCA and introduce new methods for sector analysis in a Python-based software package. We show that the pattern of amino acid interactions within sectors is linked to the divergence of functional lineages in a multiple sequence alignment - a model for how sector properties might be differentially tuned in members of a protein family. This work provides new tools for understanding the structural basis for protein function and for generally testing the concept of sectors as the principal functional units of proteins.


2021 ◽  
Vol 12 ◽  
Author(s):  
Sergio Diez-Hermano ◽  
Maria D. Ganfornina ◽  
Arne Skerra ◽  
Gabriel Gutiérrez ◽  
Diego Sanchez

The protein family of Lipocalins is ubiquitously present throughout the tree of life, with the exception of the phylum Archaea. Phylogenetic relationships of chordate Lipocalins have been proposed in the past based on protein sequence similarities, but their highly divergent primary structures and a shortage of experimental annotations in genome projects have precluded a well-supported hypothesis for their evolution. In this work we propose a novel topology for the phylogenetic tree of chordate Lipocalins, inferred from multiple amino acid sequence alignments. Sixteen jawed vertebrates with fair coverage by genomic sequencing were compared. The selected species span an evolutionary range of ∼400 million years, allowing for a balanced representation of all major vertebrate clades. A consensus phylogenetic tree is proposed following a comparison of sequence-based maximum-likelihood trees and protein structure dendrograms. This new phylogeny suggests an APOD-like common ancestor in early chordates, which gave rise, via whole-genome or tandem duplications, to the six Lipocalins currently present in fish (APOD, RBP4, PTGDS, AMBP, C8G, and APOM). Further gene duplications of APOM and PTGDS resulted in the altogether 15 Lipocalins found in contemporary mammals. Insights into the functional impact of relevant amino acid residues in early diverging Lipocalins are also discussed. These results should foster the experimental exploration of novel functions alongside the identification of new members of the Lipocalin family.


1987 ◽  
Vol 57 (01) ◽  
pp. 017-019 ◽  
Author(s):  
Magda M W Ulrich ◽  
Berry A M Soute ◽  
L Johan M van Haarlem ◽  
Cees Vermeer

Summary: Decarboxylated osteocalcins were prepared and purified from bovine, chicken, human and monkey bones and assayed for their ability to serve as a substrate for vitamin K-dependent carboxylase from bovine liver. Substantial differences were observed, especially between bovine and monkey d-osteocalcin. Since these substrates differ only in their amino acid residues 3 and 4, it seems that these residues play a role in the recognition of a substrate by hepatic carboxylase.


2018 ◽  
Author(s):  
Allan J. R. Ferrari ◽  
Fabio C. Gozzo ◽  
Leandro Martinez

Chemical cross-linking/Mass Spectrometry (XLMS) is an experimental method to obtain distance constraints between amino acid residues, which can be applied to structural modeling of tertiary and quaternary biomolecular structures. These constraints provide, in principle, only upper limits to the distance between amino acid residues along the surface of the biomolecule. In practice, attempts to use XLMS constraints for tertiary protein structure determination have not been widely successful. This indicates the need for specifically designed strategies for the representation of these constraints within modeling algorithms. Here, a force-field designed to represent XLMS-derived constraints is proposed. The potential energy functions are obtained by computing, in the database of known protein structures, the probability of satisfaction of a topological cross-linking distance as a function of the Euclidean distance between amino acid residues. The force-field can be easily incorporated into current modeling methods and software. In this work, the force-field was implemented within the Rosetta ab initio relax protocol. We show a significant improvement in the quality of the models obtained relative to current strategies for constraint representation. This force-field contributes to the long-desired goal of obtaining the tertiary structures of proteins using XLMS data. Force-field parameters and usage instructions are freely available at http://m3g.iqm.unicamp.br/topolink/xlff

