reciprocal best hit
Recently Published Documents


TOTAL DOCUMENTS: 7 (FIVE YEARS: 1)

H-INDEX: 3 (FIVE YEARS: 0)

2021 ◽  
Author(s):  
Baoxing Song ◽  
Santiago Marco-Sola ◽  
Miquel Moreto ◽  
Lynn Johnson ◽  
Edward S. Buckler ◽  
...  

Millions of species are currently being sequenced, and their genomes are being compared. Many of them have more complex genomes than model systems and raise novel challenges for genome alignment. Widely used local alignment strategies often produce limited or incongruous results when applied to genomes with dispersed repeats, long indels, and highly diverse sequences. Moreover, alignment using many-to-many or reciprocal best hit approaches conflicts with well-studied patterns between species with different rounds of whole-genome duplication or polyploidy levels. Here we introduce AnchorWave, which performs whole-genome-duplication-informed collinear anchor identification between genomes and performs base-pair-resolution global alignments for collinear blocks using the wavefront algorithm and a 2-piece affine gap cost strategy. This strategy enables AnchorWave to precisely identify multi-kilobase indels generated by transposable element (TE) presence/absence variants (PAVs). When aligning two maize genomes, AnchorWave successfully recalled 87% of previously reported TE PAVs between the two lines. By contrast, other genome alignment tools showed almost zero power for TE PAV recall. When comparing diverse genomes, AnchorWave precisely aligns up to three times more of the genome than the closest competitive approach. Moreover, AnchorWave recalls transcription factor binding sites (TFBSs) at a rate 1.05- to 74.85-fold higher than other tools, with significantly fewer false-positive alignments. AnchorWave shows clear improvement when applied to genomes with dispersed repeats, active transposable elements, high sequence diversity, and whole-genome duplication variation.
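
To illustrate why a 2-piece affine gap cost helps with multi-kilobase indels, the sketch below scores a gap as the cheaper of two affine pieces: one that is cheap to open but costly to extend, and one that is costly to open but cheap to extend. The function name and all four penalty values are hypothetical placeholders chosen for illustration; they are not AnchorWave's parameters, and the wavefront algorithm itself is not shown.

```python
def two_piece_affine_gap_cost(length, open1=4, extend1=2, open2=80, extend2=1):
    """Cost of a gap of `length` bases under a 2-piece affine model.

    The penalty values are illustrative assumptions, not tool defaults.
    """
    if length <= 0:
        return 0
    # Piece 1: low opening penalty, high per-base extension (favours short gaps).
    short_piece = open1 + extend1 * length
    # Piece 2: high opening penalty, low per-base extension (favours long gaps).
    long_piece = open2 + extend2 * length
    return min(short_piece, long_piece)


if __name__ == "__main__":
    for gap_length in (10, 76, 1000):
        print(gap_length, two_piece_affine_gap_cost(gap_length))
```

With these placeholder values, the second piece takes over for gaps longer than 76 bases, so a long TE-sized indel is penalized far less than it would be under a single affine cost.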


2017 ◽  
Vol 73 (2) ◽  
pp. 159-176 ◽  
Author(s):  
Yi-Dan Mo ◽  
Si-Xia Yang ◽  
Jing-Yu Zhao ◽  
Peng-Yu Jin ◽  
Xiao-Yue Hong

2008 ◽  
Vol 06 (04) ◽  
pp. 811-824 ◽  
Author(s):  
ALEXANDER E. IVLIEV ◽  
MARINA G. SERGEEVA

The identification of orthologs to a set of known genes is often the starting point for evolutionary studies focused on gene families of interest. To date, the existing orthology detection tools (COG, InParanoid, OrthoMCL, etc.) are aimed at genome-wide ortholog identification and lack flexibility for the purposes of case studies. We developed a program, OrthoFocus, which employs an extended reciprocal best hit approach to quickly search for orthologs in a pair of genomes. A group of paralogs from the input genome is used as the start for the forward search and as the criterion for the reverse search, which allows handling of many-to-one and many-to-many relationships. By pairwise comparison of genomes with the input species genome, OrthoFocus enables quick identification of orthologs in multiple genomes and generates a multiple alignment of orthologs that can then be used in phylogenetic analysis. The program is available at .
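
The idea of using a paralog group as both the start of the forward search and the criterion for the reverse search can be sketched as follows. This is a minimal illustration under stated assumptions, not OrthoFocus's implementation: similarity scores are assumed to be precomputed and stored in nested dictionaries, and the function names, gene identifiers, and scores are all hypothetical.

```python
def best_hit(query, scores):
    """Return the highest-scoring subject for `query`, or None if it has no hits."""
    hits = scores.get(query, {})
    if not hits:
        return None
    return max(hits, key=hits.get)


def orthologs_of_group(paralog_group, forward_scores, reverse_scores):
    """Map each accepted target-genome gene to the input paralogs that found it.

    A forward best hit is accepted when its reverse best hit falls anywhere
    inside the paralog group, rather than being required to equal the query.
    """
    group = set(paralog_group)
    orthologs = {}
    for gene in paralog_group:
        candidate = best_hit(gene, forward_scores)
        if candidate is None:
            continue
        back = best_hit(candidate, reverse_scores)
        if back in group:
            orthologs.setdefault(candidate, []).append(gene)
    return orthologs


# Hypothetical scores: input genome genes A*, target genome genes B*.
forward = {"A1": {"B1": 200, "B2": 50}, "A2": {"B1": 180}, "A3": {"B3": 90}}
reverse = {"B1": {"A1": 200, "A2": 180}, "B3": {"A9": 120, "A3": 90}}

if __name__ == "__main__":
    print(orthologs_of_group(["A1", "A2", "A3"], forward, reverse))
```

In this toy data, A2 would be rejected by a strict one-to-one reciprocal best hit test because the reverse best hit of B1 is A1, but it is retained here because A1 belongs to the same paralog group. A3 is rejected because the reverse best hit of B3 lies outside the group.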


2006 ◽  
Vol 72 (10) ◽  
pp. 6841-6844 ◽  
Author(s):  
Clara A. Fuchsman ◽  
Gabrielle Rocap

ABSTRACT The genome sequences of Rhodopirellula baltica, formerly Pirellula sp. strain 1, Blastopirellula marina, Gemmata obscuriglobus, and Kuenenia stuttgartiensis were used in a series of pairwise reciprocal best-hit analyses to evaluate the contested evolutionary position of Planctomycetes. Contrary to previous reports which suggested that R. baltica had a high percentage of genes with closest matches to Archaea and Eukarya, we show here that these Planctomycetes do not share an unusually large number of genes with the Archaea or Eukarya, compared with other Bacteria. Thus, best-hit analyses may assign phylogenetic affinities incorrectly if close relatives are absent from the sequence database.


2005 ◽  
Vol 187 (18) ◽  
pp. 6488-6498 ◽  
Author(s):  
Vinita Joardar ◽  
Magdalen Lindeberg ◽  
Robert W. Jackson ◽  
Jeremy Selengut ◽  
Robert Dodson ◽  
...  

ABSTRACT Pseudomonas syringae pv. phaseolicola, a gram-negative bacterial plant pathogen, is the causal agent of halo blight of bean. In this study, we report on the genome sequence of P. syringae pv. phaseolicola isolate 1448A, which encodes 5,353 open reading frames (ORFs) on one circular chromosome (5,928,787 bp) and two plasmids (131,950 bp and 51,711 bp). Comparative analyses with a phylogenetically divergent pathovar, P. syringae pv. tomato DC3000, revealed a strong degree of conservation at the gene and genome levels. In total, 4,133 ORFs were identified as putative orthologs in these two pathovars using a reciprocal best-hit method, with 3,941 ORFs present in conserved, syntenic blocks. Although these two pathovars are highly similar at the physiological level, they have distinct host ranges; 1448A causes disease in beans, and DC3000 is pathogenic on tomato and Arabidopsis. Examination of the complement of ORFs encoding virulence, fitness, and survival factors revealed a substantial, but not complete, overlap between these two pathovars. Another distinguishing feature between the two pathovars is their distinctive sets of transposable elements. With access to a fifth complete pseudomonad genome sequence, we were able to identify 3,567 ORFs that likely comprise the core Pseudomonas genome and 365 ORFs that are P. syringae specific.
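
A strict one-to-one reciprocal best-hit test of the kind used to call putative orthologs between two genomes can be sketched as below. The abstract does not specify the search tool or thresholds, so this sketch assumes precomputed similarity scores in nested dictionaries; the function name, ORF identifiers, and scores are hypothetical.

```python
def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    pairs = []
    for gene_a, hits in a_vs_b.items():
        if not hits:
            continue
        # Forward search: best-scoring gene in genome B for this gene in genome A.
        gene_b = max(hits, key=hits.get)
        # Reverse search: the best hit of gene_b must be gene_a itself.
        back_hits = b_vs_a.get(gene_b, {})
        if back_hits and max(back_hits, key=back_hits.get) == gene_a:
            pairs.append((gene_a, gene_b))
    return sorted(pairs)


# Hypothetical scores between ORFs of two genomes, X and Y.
x_vs_y = {"x1": {"y1": 300, "y2": 40}, "x2": {"y1": 250}, "x3": {"y3": 120}}
y_vs_x = {"y1": {"x1": 300, "x2": 250}, "y3": {"x3": 120}}

if __name__ == "__main__":
    print(reciprocal_best_hits(x_vs_y, y_vs_x))
```

Here x2 is excluded because its best hit, y1, points back to x1 instead. This strictness is what makes the method conservative for one-to-one orthologs.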

