LAF: Logic Alignment Free and its application to bacterial genomes classification

2015 ◽  
Vol 8 (1) ◽  
Author(s):  
Emanuel Weitschek ◽  
Fabio Cunial ◽  
Giovanni Felici
2021 ◽  
Author(s):  
Oliver Schwengers ◽  
Lukas Jelonek ◽  
Marius Dieckmann ◽  
Sebastian Beyvers ◽  
Jochen Blom ◽  
...  

Abstract
Command-line annotation software tools have continuously gained popularity compared to centralized online services due to the worldwide increase of sequenced bacterial genomes. However, results of existing command-line software pipelines heavily depend on taxon-specific databases or sufficiently well annotated reference genomes. Here, we introduce Bakta, a new command-line software tool for the robust, taxon-independent, thorough and, nonetheless, fast annotation of bacterial genomes. Bakta conducts a comprehensive annotation workflow including the detection of small proteins taking into account replicon metadata. The annotation of coding sequences is accelerated via an alignment-free sequence identification approach that in addition facilitates the precise assignment of public database cross-references. Annotation results are exported in GFF3 and INSDC-compliant flat files as well as comprehensive JSON files, facilitating automated downstream analysis. We compared Bakta to other rapid contemporary command-line annotation software tools in both targeted and taxonomically broad benchmarks including isolates and metagenome-assembled genomes. We demonstrated that Bakta outperforms other tools in terms of functional annotations, the assignment of functional categories and database cross-references, whilst providing comparable wall-clock runtimes. Bakta is implemented in Python 3 and runs on MacOS and Linux systems. It is freely available under a GPLv3 license at https://github.com/oschwengers/bakta. An accompanying web version is available at https://bakta.computational.bio.
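The abstract does not say how the alignment-free sequence identification works. A minimal sketch, assuming exact-identity lookup: each predicted protein is reduced to a hash digest (MD5 here, an assumption) and matched against a precomputed digest index of reference proteins, so identical sequences are recognised by a dictionary lookup instead of an alignment. The reference identifiers (`REF_0001`, ...) and all function names are hypothetical.

```python
import hashlib


def sequence_digest(protein):
    """MD5 hex digest of a normalised protein (upper case, trailing stop '*' removed).

    The choice of MD5 and of this normalisation is an assumption for illustration.
    """
    normalised = protein.strip().upper().rstrip("*")
    return hashlib.md5(normalised.encode("ascii")).hexdigest()


def build_index(records):
    """Map digest -> reference identifier for (ref_id, sequence) pairs (hypothetical IDs)."""
    return {sequence_digest(seq): ref_id for ref_id, seq in records}


def annotate(proteins, index):
    """Return {query_id: ref_id or None}; only exact sequence identity is detected."""
    return {qid: index.get(sequence_digest(seq)) for qid, seq in proteins}


reference = [("REF_0001", "MKTAYIAKQR"), ("REF_0002", "MSLLTEVETP")]
index = build_index(reference)
print(annotate([("cds_1", "mktayiakqr*"), ("cds_2", "MSLLTEVETX")], index))
```

Because the lookup is a hash-table access, its cost per protein does not grow with the size of the reference set; proteins that are not identical to any reference (like `cds_2` above) would still need a slower, similarity-based search.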


2019 ◽  
Vol 20 (S20) ◽  
Author(s):  
Anna-Katharina Lau ◽  
Svenja Dörrer ◽  
Chris-André Leimeister ◽  
Christoph Bleidorn ◽  
Burkhard Morgenstern

Abstract
Background: In many fields of biomedical research, it is important to estimate phylogenetic distances between taxa based on low-coverage sequencing reads. Major applications are, for example, phylogeny reconstruction, species identification from small sequencing samples, or bacterial strain typing in medical diagnostics.
Results: We adapted our previously developed software program Filtered Spaced-Word Matches (FSWM) for alignment-free phylogeny reconstruction to take unassembled reads as input; we call this implementation Read-SpaM.
Conclusions: Test runs on simulated reads from semi-artificial and real-world bacterial genomes show that our approach can estimate phylogenetic distances with high accuracy, even for large evolutionary distances and for very low sequencing coverage.
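Read-SpaM builds on FSWM. A simplified sketch of the spaced-word idea, under stated assumptions: a binary pattern marks match positions (`1`) and don't-care positions (`0`); two windows form a spaced-word match when they agree at all match positions; matches are filtered by a score over the don't-care positions; and the mismatch rate at the retained don't-care positions is converted to a distance with the Jukes-Cantor correction. The +1/-1 scoring, the threshold `min_score`, and the function names are illustrative assumptions. FSWM uses its own substitution scores and longer patterns, and Read-SpaM works on reads rather than on two whole sequences.

```python
import math
from collections import defaultdict


def spaced_words(seq, pattern):
    """Yield (start, spaced word) pairs, keeping only the pattern's '1' positions."""
    care = [i for i, c in enumerate(pattern) if c == "1"]
    for start in range(len(seq) - len(pattern) + 1):
        yield start, "".join(seq[start + i] for i in care)


def spaced_word_distance(a, b, pattern, min_score=0):
    """Jukes-Cantor distance from mismatches at don't-care positions of filtered matches.

    Returns None if no spaced-word match passes the filter.
    """
    dont_care = [i for i, c in enumerate(pattern) if c == "0"]
    # Index every spaced word of b by its start positions.
    index = defaultdict(list)
    for pos, word in spaced_words(b, pattern):
        index[word].append(pos)
    compared = mismatches = 0
    for pa, word in spaced_words(a, pattern):
        for pb in index.get(word, []):
            pairs = [(a[pa + i], b[pb + i]) for i in dont_care]
            mm = sum(x != y for x, y in pairs)
            # Simplified filter (assumption): +1 per match, -1 per mismatch.
            score = (len(pairs) - mm) - mm
            if score >= min_score:
                compared += len(pairs)
                mismatches += mm
    if compared == 0:
        return None
    p = mismatches / compared
    if p >= 0.75:
        return math.inf  # Jukes-Cantor is undefined at or beyond saturation
    return -0.75 * math.log(1 - 4 * p / 3)
```

The filter is what keeps random spaced-word matches, which have many mismatches at don't-care positions, from inflating the estimated distance.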


2013 ◽  
Vol 11 (06) ◽  
pp. 1343005 ◽  
Author(s):  
SEBASTIAN MAURER-STROH ◽  
VITHIAGARAN GUNALAN ◽  
WING-CHEONG WONG ◽  
FRANK EISENHABER

We propose an extension to alignment-free approaches that can produce reasonably accurate phylogenetic groupings starting from unaligned genomes, for example, in as little as 1 min on a standard desktop computer for 25 bacterial genomes. A 6-fold speed-up and an 11-fold reduction in memory requirements compared with previous alignment-free methods are achieved by reducing the comparison space to a representative sample of k-mers of optimal length and with specific tag motifs. This approach was applied to the test case of fitting the enterohemorrhagic O104:H4 E. coli strain from the 2011 outbreak in Germany into the phylogenetic network of previously known E. coli-related strains, and the method was extended to allow assigning any new strain to the correct phylogenetic group, even directly from unassembled short sequence reads from next-generation sequencing data. Hence, this approach is also useful for quickly identifying the most suitable reference genome for subsequent assembly steps.
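A minimal sketch of the sampling idea, assuming that a k-mer is kept only if it begins with a fixed tag motif and that genomes are compared by Jaccard distance on the sampled sets. The source specifies neither the tag, the value of k, nor the distance measure, so `tag`, `k`, and the helper names are hypothetical. The last function mirrors the reference-selection use case: it returns the reference genome whose sampled k-mer set is closest to the query.

```python
def tagged_kmers(genome, k, tag):
    """Set of k-mers of genome that start with the tag motif (a subsample of all k-mers)."""
    return {
        genome[i:i + k]
        for i in range(len(genome) - k + 1)
        if genome.startswith(tag, i)
    }


def jaccard_distance(s1, s2):
    """1 - |intersection| / |union|; two empty sets are treated as identical."""
    if not s1 and not s2:
        return 0.0
    return 1.0 - len(s1 & s2) / len(s1 | s2)


def closest_reference(query, references, k, tag):
    """Name of the reference genome whose tagged k-mer set is closest to the query."""
    q = tagged_kmers(query, k, tag)
    return min(
        references,
        key=lambda name: jaccard_distance(q, tagged_kmers(references[name], k, tag)),
    )


refs = {"A": "ACGTACCA", "B": "ACTTACTT"}
print(closest_reference("GACGTACCAG", refs, 4, "AC"))
```

Keeping only tagged k-mers shrinks each set roughly by the frequency of the tag, which is where savings in time and memory of the kind reported above would come from; the trade-off is that fewer shared k-mers are available to separate closely related strains.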


2021 ◽  
Vol 7 (11) ◽  
Author(s):  
Oliver Schwengers ◽  
Lukas Jelonek ◽  
Marius Alfred Dieckmann ◽  
Sebastian Beyvers ◽  
Jochen Blom ◽  
...  

Command-line annotation software tools have continuously gained popularity compared to centralized online services due to the worldwide increase of sequenced bacterial genomes. However, results of existing command-line software pipelines heavily depend on taxon-specific databases or sufficiently well annotated reference genomes. Here, we introduce Bakta, a new command-line software tool for the robust, taxon-independent, thorough and, nonetheless, fast annotation of bacterial genomes. Bakta conducts a comprehensive annotation workflow including the detection of small proteins taking into account replicon metadata. The annotation of coding sequences is accelerated via an alignment-free sequence identification approach that in addition facilitates the precise assignment of public database cross-references. Annotation results are exported in GFF3 and International Nucleotide Sequence Database Collaboration (INSDC)-compliant flat files, as well as comprehensive JSON files, facilitating automated downstream analysis. We compared Bakta to other rapid contemporary command-line annotation software tools in both targeted and taxonomically broad benchmarks including isolates and metagenome-assembled genomes. We demonstrated that Bakta outperforms other tools in terms of functional annotations, the assignment of functional categories and database cross-references, whilst providing comparable wall-clock runtimes. Bakta is implemented in Python 3 and runs on MacOS and Linux systems. It is freely available under a GPLv3 license at https://github.com/oschwengers/bakta. An accompanying web version is available at https://bakta.computational.bio.


2019 ◽  
Author(s):  
Anna Katharina Lau ◽  
Chris-André Leimeister ◽  
Burkhard Morgenstern

Abstract
In many fields of biomedical research, it is important to estimate phylogenetic distances between taxa based on low-coverage sequencing reads. Major applications are, for example, phylogeny reconstruction, species identification from small sequencing samples, or bacterial strain typing in medical diagnostics. Herein, we adapt our previously developed software program Filtered Spaced-Word Matches (FSWM) for alignment-free phylogeny reconstruction to work on unassembled reads; we call this implementation Read-SpaM. Test runs on simulated reads from bacterial genomes show that our approach can estimate phylogenetic distances with high accuracy, even for large evolutionary distances and for very low sequencing coverage.


2019 ◽  
Author(s):  
Chengyuan Wu ◽  
Shiquan Ren ◽  
Jie Wu ◽  
Kelin Xia

Abstract
We introduce an alignment-free method, the Magnus Representation, to analyze genome sequences. The Magnus Representation captures higher-order information in genome sequences. We combine our approach with the idea of k-mers to define an effectively computable Mean Magnus Vector. We perform phylogenetic analysis on three datasets: mosquito-borne viruses, filoviruses, and bacterial genomes. Our results on ebolaviruses are consistent with previous phylogenetic analyses, and confirm the modern viewpoint that the 2014 West African Ebola outbreak likely originated from Central Africa. Our analysis also confirms the close relationship between Bundibugyo ebolavirus and Taï Forest ebolavirus. For bacterial genomes, our method is able to classify relatively well at the family and genus levels, as well as at higher levels such as the phylum level. The bacterial genomes are also separated well into Gram-positive and Gram-negative subgroups.
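The abstract does not define the Magnus Representation. As a hedged illustration of the "higher-order information" it refers to, the sketch below computes the truncated Magnus expansion of a sequence: each base x is mapped to 1 + X_x in noncommuting variables, and these factors are multiplied along the sequence. In that product, the coefficient of a word w equals the number of times w occurs in the sequence as a not-necessarily-contiguous subsequence, so the degree-2 terms already record the order of bases, not only their counts. This is not the authors' Mean Magnus Vector, whose combination with k-mers is not described here; `max_degree` and the function name are assumptions.

```python
from itertools import product


def magnus_coefficients(seq, max_degree=2, alphabet="ACGT"):
    """Coefficients of prod_i (1 + X_{s_i}) up to total degree max_degree.

    The coefficient of the word w1...wk counts its occurrences in seq as a
    (not necessarily contiguous) subsequence; the empty word has coefficient 1.
    """
    counts = {"": 1}
    for length in range(1, max_degree + 1):
        for letters in product(alphabet, repeat=length):
            counts["".join(letters)] = 0
    words_by_length = [[w for w in counts if len(w) == n] for n in range(max_degree)]
    for c in seq:
        if c not in alphabet:
            continue  # skip ambiguous bases (assumption)
        # Longest words first, so each update uses counts from before this base.
        for n in range(max_degree - 1, -1, -1):
            for w in words_by_length[n]:
                counts[w + c] += counts[w]
    return counts


coeffs = magnus_coefficients("ACA")
print(coeffs["A"], coeffs["AC"], coeffs["CA"], coeffs["AA"])
```

Multiplying by 1 + X_c either skips the new base or appends it to every word counted so far, which is exactly the subsequence-counting update in the inner loop.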


2011 ◽  
Vol 27 (11) ◽  
pp. 1466-1472 ◽  
Author(s):  
Mirjana Domazet-Lošo ◽  
Bernhard Haubold

eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Michael Sheinman ◽  
Ksenia Arkhipova ◽  
Peter F Arndt ◽  
Bas Dutilh ◽  
Rutger Hermsen ◽  
...  

Horizontal Gene Transfer (HGT) is an essential force in microbial evolution. Despite detailed studies on a variety of systems, a global picture of HGT in the microbial world is still missing. Here, we exploit the fact that HGT creates long identical DNA sequences in the genomes of distant species, which can be found efficiently using alignment-free methods. Our pairwise analysis of 93 481 bacterial genomes identified 138 273 HGT events. We developed a model to explain their statistical properties as well as estimate the transfer rate between pairs of taxa. This reveals that long-distance HGT is frequent: our results indicate that HGT between species from different phyla has occurred in at least 8% of the species. Finally, our results confirm that the function of sequences strongly impacts their transfer rate, which varies by more than 3 orders of magnitude between different functional categories. Overall, we provide a comprehensive view of HGT, illuminating a fundamental process driving bacterial evolution.
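The study relies on the observation that HGT leaves long identical stretches shared by distant genomes. A minimal, generic seed-and-extend sketch for finding such stretches between two sequences, assuming forward-strand comparison only: every exact match of length at least `min_len` begins with a `min_len`-mer shared by both sequences, so those k-mers serve as seeds, and each left-maximal seed hit is extended to the right. This is not the authors' pipeline; the function name and threshold are hypothetical, and a real analysis would also scan the reverse complement.

```python
from collections import defaultdict


def long_identical_segments(a, b, min_len):
    """Maximal exact matches (start_a, start_b, length) with length >= min_len.

    Forward strand only; a generic seed-and-extend sketch.
    """
    k = min_len
    # Index every k-mer of b by its start positions.
    seeds = defaultdict(list)
    for j in range(len(b) - k + 1):
        seeds[b[j:j + k]].append(j)
    matches = []
    for i in range(len(a) - k + 1):
        for j in seeds.get(a[i:i + k], []):
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue  # not left-maximal: reported from an earlier seed on this diagonal
            length = k
            while i + length < len(a) and j + length < len(b) and a[i + length] == b[j + length]:
                length += 1
            matches.append((i, j, length))
    return matches


print(long_identical_segments("CCCAGTCAGGTTT", "GGAGTCAGGAA", 5))
```

Skipping seeds that are not left-maximal ensures each identical stretch is reported once, from its leftmost seed, instead of once per overlapping k-mer.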


Planta Medica ◽  
2014 ◽  
Vol 80 (10) ◽  
Author(s):  
IJ Miller ◽  
T Weyna ◽  
C Mlot ◽  
SS Fong ◽  
K McPhail ◽  
...  
