Predicting Alignment Distances via Continuous Sequence Matching

2020 ◽  
Author(s):  
Jian Chen ◽  
Le Yang ◽  
Lu Li ◽  
Yijun Sun

ABSTRACT Sequence comparison is the basis of various applications in bioinformatics. Recently, the increase in the number and length of sequences has allowed us to extract more, and more accurate, information from the data. However, obtaining such information requires that we can compare a large number of long sequences accurately and quickly. Neither the traditional dynamic programming-based algorithms nor the alignment-free algorithms proposed in recent years can satisfy the requirements of both accuracy and speed. To meet these requirements, researchers have recently proposed a data-dependent approach that learns sequence embeddings, but its capability is limited by the structure of its embedding function. In this paper, we propose a new embedding function specifically designed for biological sequences to map sequences into embedding vectors. Combined with a neural network structure, this embedding function can be adjusted so that it quickly and reliably predicts the alignment distance between sequences. We illustrate the effectiveness and efficiency of the proposed method on various types of amplicon sequences. More importantly, our experiment on full-length 16S rRNA sequences shows that our approach would lead to a general model that can quickly and reliably predict the pairwise alignment distance of any pair of full-length 16S rRNA sequences with high accuracy. We believe such a model can greatly facilitate large-scale sequence analysis.
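The trade-off this abstract describes can be illustrated with a small Python sketch. It is not the authors' learned embedding function, which the abstract does not specify. As stand-ins, it assumes a plain Levenshtein edit distance for the dynamic programming side and a fixed k-mer frequency vector for the embedding side. The function names `edit_distance`, `kmer_embedding`, and `embedding_distance` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product
import math


def edit_distance(a, b):
    """Levenshtein distance by dynamic programming, O(len(a) * len(b)) time."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def kmer_embedding(seq, k=3, alphabet="ACGT"):
    """Map a sequence to a fixed-length vector of k-mer frequencies.

    Assumption: this fixed, non-learned embedding only illustrates the idea
    of comparing vectors instead of aligning sequences.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(sum(counts.values()), 1)  # avoid division by zero
    return [counts["".join(p)] / total for p in product(alphabet, repeat=k)]


def embedding_distance(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


if __name__ == "__main__":
    s1, s2 = "ACGTACGTAC", "ACGTTCGTAC"
    print("edit distance:", edit_distance(s1, s2))
    print("embedding distance:",
          embedding_distance(kmer_embedding(s1), kmer_embedding(s2)))
```

Once the vectors are computed, each pairwise comparison costs time proportional to the vector length rather than to the product of the sequence lengths, which is the speed advantage the abstract refers to. A fixed k-mer vector carries no guarantee of tracking the alignment distance, and this is the gap a learned embedding is meant to close.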

Microbiome ◽  
2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Ju-Hyeong Park ◽  
Angela R. Lemons ◽  
Jerry Roseman ◽  
Brett J. Green ◽  
Jean M. Cox-Ganser

An amendment to this paper has been published and can be accessed via the original article.


1990 ◽  
Vol 75 (2-3) ◽  
pp. 105-115 ◽  
Author(s):  
David M. Ward ◽  
Roland Weller ◽  
Mary M. Bateson

2004 ◽  
Vol 186 (9) ◽  
pp. 2629-2635 ◽  
Author(s):  
Silvia G. Acinas ◽  
Luisa A. Marcelino ◽  
Vanja Klepac-Ceraj ◽  
Martin F. Polz

ABSTRACT The level of sequence heterogeneity among rrn operons within genomes determines the accuracy of diversity estimation by 16S rRNA-based methods. Furthermore, the occurrence of widespread horizontal gene transfer (HGT) between distantly related rrn operons casts doubt on reconstructions of phylogenetic relationships. For this study, patterns of distribution of rrn copy numbers, interoperonic divergence, and redundancy of 16S rRNA sequences were evaluated. Bacterial genomes display up to 15 operons and operon numbers up to 7 are commonly found, but ∼40% of the organisms analyzed have either one or two operons. Among the Archaea, a single operon appears to dominate and the highest number of operons is five. About 40% of sequences among 380 operons in 76 bacterial genomes with multiple operons were identical to at least one other 16S rRNA sequence in the same genome, and in 38% of the genomes all 16S rRNAs were invariant. For Archaea, the number of identical operons was only 25%, but only five genomes with 21 operons are currently available. These considerations suggest an upper bound of roughly threefold overestimation of bacterial diversity resulting from cloning and sequencing of 16S rRNA genes from the environment; however, the inclusion of genomes with a single rrn operon may lower this correction factor to ∼2.5. Divergence among operons appears to be small overall for both Bacteria and Archaea, with the vast majority of 16S rRNA sequences showing <1% nucleotide differences. Only five genomes with operons with a higher level of nucleotide divergence were detected, and Thermoanaerobacter tengcongensis exhibited the highest level of divergence (11.6%) noted to date. Overall, four of the five extreme cases of operon differences occurred among thermophilic bacteria, suggesting a much higher incidence of HGT in these bacteria than in other groups.


2007 ◽  
Vol 64 (3) ◽  
pp. 303-304 ◽  
Author(s):  
Rafaela de Fátima Neroni ◽  
Elke Jurandy Bran Nogueira Cardoso

Araucaria angustifolia is an environmentally threatened tree and the whole biota of the Araucaria Forest should be investigated with the aim of its preservation. Diazotrophic bacteria are extremely important for the maintenance of ecosystems, but they have never been studied in Araucaria Forests. In this study, diazotrophic bacteria were isolated from Araucaria roots and soil, when grown in semi-specific, semi-solid media. The diazotrophic character of some recovered isolates could be confirmed using the acetylene reduction assay. According to their 16S rRNA sequences, most of these isolates belong to the genus Burkholderia.


2020 ◽  
Vol 9 (29) ◽  
Author(s):  
Joseph Wambui ◽  
Marina Morach ◽  
Nicole Cernela ◽  
Marc J. A. Stevens ◽  
Giovanni Ghielmetti ◽  
...  

ABSTRACT We present the draft genome sequence of Psychrobacter okhotskensis strain 5179-1A, which was isolated from a raw cured ham storage crate. Its size and GC content are 3.4 Mb and 43.4%, respectively. The 16S rRNA sequences of strain 5179-1A and P. okhotskensis MD17T are 100% identical.

