Efficient alternatives to PSI-BLAST

Abstract: In this paper we present two algorithms that may serve as efficient alternatives to the well-known PSI-BLAST tool: SeedBLAST and CTX-PSI BLAST. Both can exploit knowledge of the amino acid composition specific to a given protein family: SeedBLAST uses a deliberately designed seed, while CTX-PSI BLAST extends PSI-BLAST with a context-specific substitution model. The seeding technique has become central to the theory of sequence alignment. Several efficient tools apply seeds to DNA homology search, but not to protein homology search. In this paper we fill this gap. We advocate the use of multiple subset seeds derived from a hierarchical tree of amino acid residues. Our method uses an evolutionary algorithm to compute seeds that are specifically designed for a given protein family. The seeds are represented by deterministic finite automata (DFAs) and built into the NCBI-BLAST software. This extended tool, named SeedBLAST, is compared to the original BLAST and PSI-BLAST on several protein families. Our results demonstrate the superiority of SeedBLAST in terms of efficiency, especially for twilight-zone hits. The contextual substitution model has been shown to increase the sensitivity of protein alignment. In this paper we take the next step in the contextual alignment program. We present a contextual version of PSI-BLAST, the iterative version of the NCBI-BLAST tool. An experimental evaluation demonstrates significantly higher sensitivity than the ordinary PSI-BLAST algorithm.
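
The idea of a subset seed can be illustrated with a minimal sketch. It assumes that each seed position is a partition of the amino acid alphabet and that two residues match at that position when they fall into the same group. The partitions `EXACT`, `COARSE`, and `ANY`, and the function names, are hypothetical and are not taken from the paper. The scan below is a naive double loop, not the DFA-based implementation the abstract describes.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical partitions of the 20 amino acids, from finest to coarsest.
# They only illustrate levels of a residue hierarchy; the paper's tree may differ.
EXACT: List[str] = list("ACDEFGHIKLMNPQRSTVWY")   # every residue is its own group
COARSE: List[str] = ["AGPST", "C", "DENQ", "FWY", "HKR", "ILMV"]
ANY: List[str] = ["ACDEFGHIKLMNPQRSTVWY"]         # "don't care" position


def group_map(partition: Sequence[str]) -> Dict[str, int]:
    """Map each residue to the index of the group that contains it."""
    return {res: idx for idx, block in enumerate(partition) for res in block}


def seed_hits(query: str, subject: str,
              seed: Sequence[Sequence[str]]) -> List[Tuple[int, int]]:
    """Return all (query_pos, subject_pos) pairs where the seed matches."""
    maps = [group_map(p) for p in seed]
    span = len(seed)
    hits = []
    for i in range(len(query) - span + 1):
        for j in range(len(subject) - span + 1):
            # Every seed position must place both residues in the same group.
            if all(query[i + k] in m and subject[j + k] in m
                   and m[query[i + k]] == m[subject[j + k]]
                   for k, m in enumerate(maps)):
                hits.append((i, j))
    return hits


# A seed of span 4: exact match, same coarse group, anything, exact match.
example_seed = [EXACT, COARSE, ANY, EXACT]
example_hits = seed_hits("MKVLA", "MRTLG", example_seed)
```

In this sketch the seed tolerates a conservative substitution (K/R) at the second position and any residue at the third, which is how a seed could trade strictness for sensitivity at chosen positions.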

Reconstruction of Mycobacterial Dehalogenase Rv2579 by Cumulative Mutagenesis of Haloalkane Dehalogenase LinB

Applied and Environmental Microbiology ◽

10.1128/aem.69.4.2349-2355.2003 ◽

2003 ◽

Vol 69 (4) ◽

pp. 2349-2355 ◽

Cited By ~ 18

Author(s):

Yuji Nagata ◽

Zbyněk Prokop ◽

Soňa Marvanová ◽

Jana Sýkorová ◽

Marta Monincová ◽

...

Keyword(s):

Crystal Structure ◽

Mycobacterium Tuberculosis ◽

Amino Acid ◽

Active Site ◽

Homology Model ◽

Protein Family ◽

Sphingomonas Paucimobilis ◽

Amino Acid Residues ◽

Haloalkane Dehalogenase

ABSTRACT The homology model of protein Rv2579 from Mycobacterium tuberculosis H37Rv was compared with the crystal structure of haloalkane dehalogenase LinB from Sphingomonas paucimobilis UT26, and this analysis revealed that 6 of 19 amino acid residues which form an active site and entrance tunnel are different in LinB and Rv2579. To characterize the effect of replacement of these six amino acid residues, mutations were introduced cumulatively into the six amino acid residues of LinB. The sixfold mutant, which was supposed to have the active site of Rv2579, exhibited haloalkane dehalogenase activity with the haloalkanes tested, confirming that Rv2579 is a member of the haloalkane dehalogenase protein family.

Improved Large-Scale Homology Search by Two-Step Seed Search Using Multiple Reduced Amino Acid Alphabets

Genes ◽

10.3390/genes12091455 ◽

2021 ◽

Vol 12 (9) ◽

pp. 1455

Author(s):

Kazuki Takabatake ◽

Kazuki Izawa ◽

Motohiro Akikawa ◽

Keisuke Yanagisawa ◽

Masahito Ohue ◽

...

Keyword(s):

Next Generation Sequencing ◽

Amino Acid ◽

High Performance ◽

Large Scale ◽

Search Strategy ◽

Search Algorithm ◽

Homology Search ◽

Sequencing Data ◽

NCBI BLAST ◽

Generation Sequencing

Metagenomic analysis, a technique used to comprehensively analyze microorganisms present in the environment, requires performing high-precision homology searches on large amounts of sequencing data, the size of which has increased dramatically with the development of next-generation sequencing. NCBI BLAST is the most widely used software for performing homology searches, but its speed is insufficient for the throughput of current DNA sequencers. In this paper, we propose a new, high-performance homology search algorithm that employs a two-step seed search strategy using multiple reduced amino acid alphabets to identify highly similar subsequences. Additionally, we evaluated the validity of the proposed method against several existing tools. Our method was faster than any other existing program for ≤120,000 queries, while DIAMOND, an existing tool, was the fastest method for >120,000 queries.
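
A two-stage filter over reduced alphabets might look like the sketch below. The abstract does not describe the details of the method, so this rests on assumptions: candidate seed hits are first found with a coarse alphabet and a k-mer index, and each candidate is then confirmed with a finer alphabet. The groupings `COARSE_GROUPS` and `FINE_GROUPS`, the seed length `k`, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical reduced alphabets; each string is one group of interchangeable residues.
COARSE_GROUPS = ["AGPST", "C", "DENQ", "FWY", "HKR", "ILMV"]
FINE_GROUPS = ["A", "G", "P", "ST", "C", "DE", "NQ", "FY", "W", "H", "KR", "IV", "LM"]


def encode(seq: str, groups: Sequence[str]) -> str:
    """Rewrite a sequence using the first residue of each group as its symbol."""
    table: Dict[str, str] = {res: block[0] for block in groups for res in block}
    return "".join(table.get(res, "X") for res in seq)  # "X" marks unknown residues


def two_step_hits(query: str, subject: str, k: int = 4
                  ) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Return (candidates, confirmed) seed hits as (query_pos, subject_pos) pairs."""
    coarse_q, coarse_s = encode(query, COARSE_GROUPS), encode(subject, COARSE_GROUPS)
    fine_q, fine_s = encode(query, FINE_GROUPS), encode(subject, FINE_GROUPS)

    # Step 1: index subject k-mers under the coarse alphabet and look up query k-mers.
    index: Dict[str, List[int]] = defaultdict(list)
    for j in range(len(coarse_s) - k + 1):
        index[coarse_s[j:j + k]].append(j)
    candidates = [(i, j)
                  for i in range(len(coarse_q) - k + 1)
                  for j in index.get(coarse_q[i:i + k], [])]

    # Step 2: keep only candidates that also match under the finer alphabet.
    confirmed = [(i, j) for i, j in candidates
                 if fine_q[i:i + k] == fine_s[j:j + k]]
    return candidates, confirmed


candidates, confirmed = two_step_hits("MKVLAT", "LRIMGS")
```

The coarse pass is cheap and permissive, and the fine pass discards most false candidates before any expensive alignment step would run.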

Evolution-Based Functional Decomposition of Proteins

10.1101/022525 ◽

2015 ◽

Cited By ~ 11

Author(s):

Olivier Rivoire ◽

Kimberly A. Reynolds ◽

Rama Ranganathan

Keyword(s):

Amino Acid ◽

Protein Function ◽

Biological Properties ◽

Protein Family ◽

Structural Basis ◽

Amino Acid Residues ◽

Multiple Sequence ◽

Global Pattern ◽

Sector Analysis ◽

Statistical Coupling

The essential biological properties of proteins - folding, biochemical activities, and the capacity to adapt - arise from the global pattern of interactions between amino acid residues. The statistical coupling analysis (SCA) is an approach to defining this pattern that involves the study of amino acid coevolution in an ensemble of sequences comprising a protein family. This approach indicates a functional architecture within proteins in which the basic units are coupled networks of amino acids termed sectors. This evolution-based decomposition has potential for new understandings of the structural basis for protein function, but requires broad further testing by the scientific community. To facilitate this, we present here the principles and practice of the SCA and introduce new methods for sector analysis in a python-based software package. We show that the pattern of amino acid interactions within sectors is linked to the divergence of functional lineages in a multiple sequence alignment - a model for how sector properties might be differentially tuned in members of a protein family. This work provides new tools for understanding the structural basis for protein function and for generally testing the concept of sectors as the principal functional units of proteins.
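
Coevolution between two alignment columns can be illustrated with a minimal sketch. It computes plain mutual information between columns, which is a simpler measure than the conservation-weighted statistics of SCA and is not the method of the software package mentioned in the abstract. The toy alignment and the function name are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Sequence


def mutual_information(alignment: Sequence[str], i: int, j: int) -> float:
    """Mutual information (in bits) between columns i and j of an alignment."""
    n = len(alignment)
    count_i = Counter(seq[i] for seq in alignment)
    count_j = Counter(seq[j] for seq in alignment)
    count_ij = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        # Compare the joint frequency with the product of the marginals.
        mi += p_ab * log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


# Toy alignment: columns 0 and 1 vary together, column 2 is fully conserved.
toy_alignment = ["AKL", "AKL", "GRL", "GRL"]
mi_coupled = mutual_information(toy_alignment, 0, 1)
mi_conserved = mutual_information(toy_alignment, 0, 2)
```

In the toy alignment the first two columns carry one bit of shared information, while the conserved third column carries none. Real analyses need corrections for phylogenetic bias and finite sampling that this sketch omits.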

An Evolutionary Perspective of the Lipocalin Protein Family

Frontiers in Physiology ◽

10.3389/fphys.2021.718983 ◽

2021 ◽

Vol 12 ◽

Cited By ~ 1

Author(s):

Sergio Diez-Hermano ◽

Maria D. Ganfornina ◽

Arne Skerra ◽

Gabriel Gutiérrez ◽

Diego Sanchez

Keyword(s):

Amino Acid ◽

Phylogenetic Tree ◽

Protein Family ◽

Gene Duplications ◽

Amino Acid Residues ◽

Sequence Alignments ◽

New Members ◽

Multiple Amino Acid ◽

Sequence Similarities ◽

Tandem Duplications

The protein family of Lipocalins is ubiquitously present throughout the tree of life, with the exception of the phylum Archaea. Phylogenetic relationships of chordate Lipocalins have been proposed in the past based on protein sequence similarities, but their highly divergent primary structures and a shortage of experimental annotations in genome projects have precluded a well-supported hypothesis for their evolution. In this work we propose a novel topology for the phylogenetic tree of chordate Lipocalins, inferred from multiple amino acid sequence alignments. Sixteen jawed vertebrates with fair coverage by genomic sequencing were compared. The selected species span an evolutionary range of ∼400 million years, allowing for a balanced representation of all major vertebrate clades. A consensus phylogenetic tree is proposed following a comparison of sequence-based maximum-likelihood trees and protein structure dendrograms. This new phylogeny suggests an APOD-like common ancestor in early chordates, which gave rise, via whole-genome or tandem duplications, to the six Lipocalins currently present in fish (APOD, RBP4, PTGDS, AMBP, C8G, and APOM). Further gene duplications of APOM and PTGDS resulted in the altogether 15 Lipocalins found in contemporary mammals. Insights into the functional impact of relevant amino acid residues in early diverging Lipocalins are also discussed. These results should foster the experimental exploration of novel functions alongside the identification of new members of the Lipocalin family.

Identification of amino acid residues in the C-terminal domain of the natriuretic peptide clearance receptor (NPR-C) that determine G protein coupling by site-directed mutagenesis

Gastroenterology ◽

10.1016/s0016-5085(01)82530-0 ◽

2001 ◽

Vol 120 (5) ◽

pp. A510-A510

Author(s):

H Zhou ◽

K Murthy

Keyword(s):

Amino Acid ◽

G Protein ◽

Natriuretic Peptide ◽

Site Directed Mutagenesis ◽

Directed Mutagenesis ◽

Amino Acid Residues ◽

G Protein Coupling ◽

Protein Coupling ◽

Terminal Domain ◽

Clearance Receptor

Identification of amino acid residues within the E3/19K protein of adenovirus type 2 critically involved in HLA binding

Immunology Letters ◽

10.1016/s0165-2478(97)88605-1 ◽

1997 ◽

Vol 56 (1-3) ◽

pp. 435

Author(s):

M Sester

Keyword(s):

Amino Acid ◽

Adenovirus Type ◽

Amino Acid Residues ◽

HLA Binding ◽

Adenovirus Type 2

Construction of a single chain Fv (scFv195) antibody fragment against the human acetylcholine receptor. Contribution of light chain amino acid residues in receptor recognition

Immunology Letters ◽

10.1016/s0165-2478(97)88354-x ◽

1997 ◽

Vol 56 (1-3) ◽

pp. 375-376

Author(s):

P Tsantili

Keyword(s):

Amino Acid ◽

Acetylcholine Receptor ◽

Light Chain ◽

Antibody Fragment ◽

Amino Acid Residues ◽

Single Chain ◽

Single Chain Fv ◽

Receptor Recognition

Substrate Recognition by Vitamin K-Dependent Carboxylase

Thrombosis and Haemostasis ◽

10.1055/s-0038-1651053 ◽

1987 ◽

Vol 57 (01) ◽

pp. 017-019 ◽

Cited By ~ 2

Author(s):

Magda M W Ulrich ◽

Berry A M Soute ◽

L Johan M van Haarlem ◽

Cees Vermeer

Keyword(s):

Amino Acid ◽

Vitamin K ◽

Substrate Recognition ◽

Bovine Liver ◽

Amino Acid Residues

Summary: Decarboxylated osteocalcins were prepared and purified from bovine, chicken, human and monkey bones and assayed for their ability to serve as a substrate for vitamin K-dependent carboxylase from bovine liver. Substantial differences were observed, especially between bovine and monkey d-osteocalcin. Since these substrates differ only in their amino acid residues 3 and 4, it seems that these residues play a role in the recognition of a substrate by hepatic carboxylase.

Spectral Changes Induced by Alkaline pH and Specific Chemical Modification of Amino Acid Residues in the Light-Harvesting II Antenna Complex from Ectothiorhodospira sp.

Photochemistry and Photobiology ◽

10.1562/0031-8655(1999)069<0275:scibap>2.3.co;2 ◽

1999 ◽

Vol 69 (3) ◽

pp. 275 ◽

Cited By ~ 2

Author(s):

André Buche ◽

Rafael Picorel

Keyword(s):

Amino Acid ◽

Chemical Modification ◽

Light Harvesting ◽

Amino Acid Residues ◽

Alkaline pH ◽

Specific Chemical ◽

Antenna Complex ◽

Spectral Changes ◽

Specific Chemical Modification

Statistical Force-Field for Structural Modeling Using Chemical Cross-Linking/Mass Spectrometry Distance Constraints

10.26434/chemrxiv.6030563 ◽

2018 ◽

Author(s):

Allan J. R. Ferrari ◽

Fabio C. Gozzo ◽

Leandro Martinez

Keyword(s):

Mass Spectrometry ◽

Amino Acid ◽

Force Field ◽

Protein Structures ◽

Protein Structure Determination ◽

Structural Modeling ◽

Cross Linking ◽

Amino Acid Residues ◽

Distance Constraints ◽

Chemical Cross Linking

Chemical cross-linking/Mass Spectrometry (XLMS) is an experimental method to obtain distance constraints between amino acid residues, which can be applied to structural modeling of tertiary and quaternary biomolecular structures. These constraints provide, in principle, only upper limits to the distance between amino acid residues along the surface of the biomolecule. In practice, attempts to use XLMS constraints for tertiary protein structure determination have not been widely successful. This indicates the need for specifically designed strategies for the representation of these constraints within modeling algorithms. Here, a force-field designed to represent XLMS-derived constraints is proposed. The potential energy functions are obtained by computing, in the database of known protein structures, the probability of satisfaction of a topological cross-linking distance as a function of the Euclidean distance between amino acid residues. The force-field can be easily incorporated into current modeling methods and software. In this work, the force-field was implemented within the Rosetta ab initio relax protocol. We show a significant improvement in the quality of the models obtained relative to current strategies for constraint representation. This force-field contributes to the long-desired goal of obtaining the tertiary structures of proteins using XLMS data. Force-field parameters and usage instructions are freely available at http://m3g.iqm.unicamp.br/topolink/xlff