Fast and sensitive protein sequence homology searches using hierarchical cluster BLAST

Mapping Intimacies ◽

10.1101/426098 ◽

2018 ◽

Author(s):

Daniel J. Nasko ◽

K. Eric Wommack ◽

Barbra D. Ferrell ◽

Shawn W. Polson

Keyword(s):

Sequence Homology ◽

Protein Sequence ◽

Hierarchical Cluster ◽

Unintended Consequence ◽

Amino Acid Sequence Homology ◽

Homology Searching ◽

Large Databases ◽

Protein Sequence Homology ◽

Reference Databases ◽

Sequence Similarities

Abstract: The throughput of DNA sequencing continues to increase, allowing researchers to analyze genomes of interest at greater depths. An unintended consequence of this data deluge is the increased cost of analyzing these datasets. As a result, genome and metagenome annotation pipelines are left with a few options: (i) search against smaller reference databases, (ii) use faster, but less sensitive, algorithms to assess sequence similarities, or (iii) invest in computing hardware specifically designed to improve BLAST searches, such as GPGPU systems and/or large CPU-rich clusters.

We present a pipeline that improves the speed of amino acid sequence homology searches with a minimal decrease in sensitivity and specificity by searching against hierarchical clusters. Briefly, the pipeline requires two homology searches: the first search is against a clustered version of the database, and the second is against the sequences belonging to clusters with a hit from the first search. We tested this method using two assembled viral metagenomes and three databases (Swiss-Prot, Metagenomes Online, and UniRef100). Hierarchical cluster homology searching proved to be 12 times faster than BLASTp and produced alignments that were nearly identical to those of BLASTp (precision = 0.99; recall = 0.97). This approach is ideal when searching large collections of sequences against large databases.
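The two-search idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the k-mer Jaccard score is a toy stand-in for a BLASTp alignment score, and the function names (`search`, `hierarchical_search`), the thresholds, and the example sequences and clusters are all hypothetical, chosen only to show how the second search is restricted to members of clusters hit in the first.

```python
from typing import Dict, List, Set, Tuple

Hit = Tuple[str, float]


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Toy stand-in for an alignment score: Jaccard index of k-mer sets."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def search(query: str, targets: Dict[str, str], threshold: float) -> List[Hit]:
    """Score the query against every target; keep hits at or above threshold."""
    scored = [(tid, similarity(query, seq)) for tid, seq in targets.items()]
    hits = [(tid, score) for tid, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: -hit[1])


def hierarchical_search(
    query: str,
    sequences: Dict[str, str],
    clusters: Dict[str, List[str]],
    rep_threshold: float,
    member_threshold: float,
) -> List[Hit]:
    """Two-stage search: cluster representatives first, then cluster members."""
    # Stage 1: search only the representative of each cluster.
    reps = {rep: sequences[rep] for rep in clusters}
    rep_hits = search(query, reps, rep_threshold)
    # Stage 2: search only members of clusters whose representative was hit.
    candidates = {
        member: sequences[member]
        for rep, _ in rep_hits
        for member in clusters[rep]
    }
    return search(query, candidates, member_threshold)


# Hypothetical toy database: two clusters, keyed by representative ID.
sequences = {
    "A1": "MKTAYIAKQRQISFVKSHFSRQ",
    "A2": "MKTAYIAKQRQISFVKAHFSRQ",
    "B1": "GSSGSSGLVPRGSHMASMTGGQ",
    "B2": "GSSGSSGLVPRGSHMASMTGGN",
}
clusters = {"A1": ["A1", "A2"], "B1": ["B1", "B2"]}
query = "MKTAYIAKQRQISFVKSHFSRQ"

hits = hierarchical_search(
    query, sequences, clusters, rep_threshold=0.3, member_threshold=0.5
)
print([seq_id for seq_id, _ in hits])
```

In this toy case the second stage scores two sequences instead of four, because cluster B is never expanded. The looser first-stage threshold is an assumption of the sketch, intended to reduce the chance that a cluster containing a true hit is discarded because its representative scores lower than one of its members.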

SEQUENCES OF INTEREST: Molecular Cloning of Complementary Deoxyribonucleic Acid for an Androgen-Regulated Epididymal Protein: Sequence Homology with Metalloproteins

Molecular Endocrinology ◽

10.1210/mend-2-10-999 ◽

1988 ◽

Vol 2 (10) ◽

pp. 999-1004 ◽

Cited By ~ 47

Author(s):

Nancy J. Charest ◽

David R. Joseph ◽

Elizabeth M. Wilson ◽

Frank S. French

Keyword(s):

Molecular Cloning ◽

Sequence Homology ◽

Protein Sequence ◽

Deoxyribonucleic Acid ◽

Protein Sequence Homology ◽

Complementary Deoxyribonucleic Acid

Application of Query-Based Qualitative Descriptors in Conjunction with Protein Sequence Homology for Prediction of Residue Solvent Accessibility

10.31979/etd.em6c-qn2b ◽

2013 ◽

Author(s):

Reecha Nepal

Keyword(s):

Sequence Homology ◽

Protein Sequence ◽

Solvent Accessibility ◽

Protein Sequence Homology

Human prostatic acid phosphatase: cDNA cloning, gene mapping and protein sequence homology with lysosomal acid phosphatase

Biochemical and Biophysical Research Communications ◽

10.1016/0006-291x(89)91623-9 ◽

1989 ◽

Vol 160 (1) ◽

pp. 79-86 ◽

Cited By ~ 31

Author(s):

Farida S. Sharief ◽

Hansoo Lee ◽

Mary M. Leuderman ◽

Ake Lundwall ◽

Larry L. Deaven ◽

...

Keyword(s):

Acid Phosphatase ◽

Gene Mapping ◽

cDNA Cloning ◽

Sequence Homology ◽

Protein Sequence ◽

Prostatic Acid Phosphatase ◽

Lysosomal Acid ◽

Protein Sequence Homology

Chlamydia trachomatis 60 kDa cysteine rich outer membrane protein: sequence homology between trachoma and LGV biovars

FEMS Microbiology Letters ◽

10.1111/j.1574-6968.1989.tb03676.x ◽

1989 ◽

Vol 65 (3) ◽

pp. 293-297 ◽

Cited By ~ 8

Author(s):

M.W. Watson ◽

P.R. Lambden ◽

M.E. Ward ◽

I.N. Clarke

Keyword(s):

Membrane Protein ◽

Outer Membrane ◽

Outer Membrane Protein ◽

Sequence Homology ◽

Protein Sequence ◽

Protein Sequence Homology

Variation in Resistance to Benzimidizole in Different Biocontrol Agents Based on Protein Sequence Homology

Protein and Peptide Letters ◽

10.2174/092986607780782821 ◽

2007 ◽

Vol 14 (5) ◽

pp. 461-464

Author(s):

B. Jarullah ◽

R. Subramanian ◽

M. Jummanah

Keyword(s):

Sequence Homology ◽

Protein Sequence ◽

Biocontrol Agents ◽

Protein Sequence Homology

Cloning of cDNA coding for the beta chain of human complement component C4b-binding protein: sequence homology with the alpha chain.

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.87.3.1183 ◽

1990 ◽

Vol 87 (3) ◽

pp. 1183-1187 ◽

Cited By ~ 63

Author(s):

A. Hillarp ◽

B. Dahlback

Keyword(s):

Sequence Homology ◽

Protein Sequence ◽

Binding Protein ◽

Complement Component ◽

Human Complement ◽

Beta Chain ◽

Alpha Chain ◽

Protein Sequence Homology ◽

Human Complement Component

Protein sequence homology between plant 4-coumarate:CoA ligase and firefly luciferase

Nucleic Acids Research ◽

10.1093/nar/17.1.460 ◽

1989 ◽

Vol 17 (1) ◽

pp. 460-460 ◽

Cited By ~ 25

Author(s):

Joachim Schröder

Keyword(s):

Sequence Homology ◽

Protein Sequence ◽

Firefly Luciferase ◽

Protein Sequence Homology

Identification of protein sequence homology by consensus template alignment

Journal of Molecular Biology ◽

10.1016/0022-2836(86)90308-6 ◽

1986 ◽

Vol 188 (2) ◽

pp. 233-258 ◽

Cited By ~ 213

Author(s):

William Ramsay Taylor

Keyword(s):

Sequence Homology ◽

Protein Sequence ◽

Protein Sequence Homology

Dirichlet mixtures: a method for improved detection of weak but significant protein sequence homology

Bioinformatics ◽

10.1093/bioinformatics/12.4.327 ◽

1996 ◽

Vol 12 (4) ◽

pp. 327-345 ◽

Cited By ~ 55

Author(s):

Kimmen Sjölander ◽

Kevin Karplus ◽

Michael Brown ◽

Richard Hughey ◽

Anders Krogh ◽

...

Keyword(s):

Sequence Homology ◽

Protein Sequence ◽

Dirichlet Mixtures ◽

Protein Sequence Homology

Novel Application of Query-Based Qualitative Predictors for Characterization of Solvent Accessible Residues in Conjunction with Protein Sequence Homology

2011 22nd International Workshop on Database and Expert Systems Applications ◽

10.1109/dexa.2011.57 ◽

2011 ◽

Author(s):

Daniel A. Rose ◽

Reecha Nepal ◽

Radhika Mishra ◽

Robert Lau ◽

Shabnam Gholizadeh ◽

...

Keyword(s):

Sequence Homology ◽

Protein Sequence ◽

Protein Sequence Homology