Rapid Core-Genome Alignment and Visualization for Thousands of Intraspecific Microbial Genomes

Though many microbial species or clades now have hundreds of sequenced genomes, existing whole-genome alignment methods do not efficiently handle comparisons on this scale. Here we present the Harvest suite of core-genome alignment and visualization tools for quickly analyzing thousands of intraspecific microbial strains. Harvest includes Parsnp, a fast core-genome multi-aligner, and Gingr, a dynamic visual platform. Combined, they provide interactive core-genome alignments, variant calls, recombination detection, and phylogenetic trees. Using simulated and real data, we demonstrate that our approach exhibits unrivaled speed while maintaining the accuracy of existing methods. The Harvest suite is open-source and freely available from http://github.com/marbl/harvest.
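As a rough illustration of what "variant calls" from a core-genome alignment can mean, the sketch below scans an already-aligned set of sequences and reports the columns where strains differ. It is a minimal example under stated assumptions, not Parsnp's method: the function name `snp_columns`, the toy alignment, and the rule of skipping columns with gaps or `N` are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def snp_columns(alignment: Dict[str, str]) -> List[int]:
    """Return 0-based alignment columns where ungapped bases differ."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    snps = []
    for i in range(length):
        column = {alignment[name][i].upper() for name in names}
        # Assumption: columns with gaps or ambiguous bases are not core sites.
        if "-" in column or "N" in column:
            continue
        if len(column) > 1:
            snps.append(i)
    return snps


# Hypothetical three-strain toy alignment.
toy_alignment = {
    "strain1": "ACGTAC",
    "strain2": "ACGAAC",
    "strain3": "AC-TAT",
}
snps = snp_columns(toy_alignment)
```

Here column 2 is skipped because one strain has a gap, so only columns 3 and 5 are reported.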


Discrimination of hospital isolates of Acinetobacter baumannii using repeated sequences and whole genome alignment differential analysis

Journal of Applied Genetics ◽

10.1007/s13353-021-00640-5 ◽

2021 ◽

Author(s):

Roman Kotłowski ◽

Alicja Nowak-Zaleska ◽

Grzegorz Węgrzyn

Keyword(s):

Acinetobacter Baumannii ◽

Time Frame ◽

Repeated Sequences ◽

Hospital Environment ◽

Genome Alignment ◽

Whole Genome ◽

Differential Analysis ◽

Gene Encoding ◽

Resistance Patterns ◽

Whole Genome Alignment

Abstract: An optimized method for bacterial strain differentiation, based on a combination of Repeated Sequences and Whole Genome Alignment Differential Analysis (RS&WGADA), is presented in this report. In this analysis, 51 multidrug-resistant Acinetobacter baumannii strains from one hospital environment and patients from 14 hospital wards were classified on the basis of polymorphisms of repeated sequences located in the CRISPR region, variation in the gene encoding the EmrA-homologue of E. coli, and antibiotic resistance patterns, in combination with three newly identified polymorphic regions in the genomes of A. baumannii clinical isolates. Differential analysis of two similarity matrices between different genotypes and resistance patterns made it possible to distinguish three significant correlations (p < 0.05) between a 172 bp DNA insertion combined with resistance to chloramphenicol and gentamycin. Interestingly, 45 and 55 bp DNA insertions within the CRISPR region were identified and combined during analyses with resistance/susceptibility to trimethoprim/sulfamethoxazole. Moreover, 184 or 1374 bp DNA length polymorphisms in the genomic region located upstream of the GTP cyclohydrolase I gene, associated mainly with imipenem susceptibility, were identified. In addition, considerable nucleotide polymorphism of the gene encoding the gamma/tau subunit of DNA polymerase III, an enzyme crucial for bacterial DNA replication, was discovered. The differentiation analysis performed using the approach described above allowed us to monitor the distribution of A. baumannii isolates in different wards of the hospital over a time frame of several years, indicating that the optimized method may be useful in hospital epidemiological studies, particularly in identifying the source of primary infections.


MUM&Co: accurate detection of all SV types through whole-genome alignment

Bioinformatics ◽

10.1093/bioinformatics/btaa115 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3242-3243 ◽

Cited By ~ 2

Author(s):

Samuel O’Donnell ◽

Gilles Fischer

Keyword(s):

De Novo ◽

Supplementary Information ◽

Genome Alignment ◽

Whole Genome ◽

Structural Variations ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Human Genomes ◽

Whole Genome Alignment ◽

Primary Output

Abstract

Summary: MUM&Co is a single bash script to detect structural variations (SVs) utilizing whole-genome alignment (WGA). Using MUMmer’s nucmer alignment, MUM&Co can detect insertions, deletions, tandem duplications, inversions and translocations greater than 50 bp. Its versatility depends upon the WGA and therefore benefits from contiguous de-novo assemblies generated by third generation sequencing technologies. Benchmarked against five WGA SV-calling tools, MUM&Co outperforms all tools on simulated SVs in yeast, plant and human genomes and performs similarly in two real human datasets. Additionally, MUM&Co is particularly unique in its ability to find inversions in both simulated and real datasets. Lastly, MUM&Co’s primary output is an intuitive tabulated file containing a list of SVs with only necessary genomic details.

Availability and implementation: https://github.com/SAMtoBAM/MUMandCo.

Supplementary information: Supplementary data are available at Bioinformatics online.
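To make the idea of calling SVs from a whole-genome alignment more concrete, the sketch below infers insertions and deletions from the gaps between consecutive collinear alignment blocks and keeps only events larger than 50 bp, the size cutoff quoted in the summary. This is a simplified illustration and not MUM&Co's actual logic: the `Block` and `SV` types, the `call_indels` function, and the toy coordinates are hypothetical, and duplications, inversions and translocations are deliberately out of scope.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    """One forward-strand alignment block (0-based, end-exclusive)."""
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int


class SV(NamedTuple):
    kind: str     # "insertion" or "deletion", relative to the reference
    ref_pos: int  # reference coordinate where the event starts
    length: int   # net size of the event in bp


def call_indels(blocks: List[Block], min_len: int = 50) -> List[SV]:
    """Call indels from gaps between consecutive collinear blocks."""
    calls = []
    ordered = sorted(blocks, key=lambda b: b.ref_start)
    for left, right in zip(ordered, ordered[1:]):
        ref_gap = right.ref_start - left.ref_end
        qry_gap = right.qry_start - left.qry_end
        if ref_gap < 0 or qry_gap < 0:
            continue  # overlapping or rearranged blocks are ignored here
        net = qry_gap - ref_gap
        if net > min_len:
            calls.append(SV("insertion", left.ref_end, net))
        elif -net > min_len:
            calls.append(SV("deletion", left.ref_end, -net))
    return calls


# Hypothetical blocks: a 300 bp insertion, a 120 bp deletion, a 10 bp gap.
blocks = [
    Block(0, 1000, 0, 1000),
    Block(1000, 2000, 1300, 2300),
    Block(2120, 3000, 2300, 3180),
    Block(3010, 4000, 3180, 4170),
]
calls = call_indels(blocks)
```

The 10 bp gap before the last block falls below the cutoff and is not reported, which mirrors how a size threshold separates SVs from small indels.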


Whole Genome Alignment with BLAST on Grid Environment

The Sixth IEEE International Conference on Computer and Information Technology (CIT'06) ◽

10.1109/cit.2006.196 ◽

2006 ◽

Author(s):

Min-sung Kim ◽

Choong-hyun Sun ◽

Jin-ki Kim ◽

Gwan-su Yi

Keyword(s):

Genome Alignment ◽

Whole Genome ◽

Grid Environment ◽

Whole Genome Alignment


chewBBACA: A complete suite for gene-by-gene schema creation and strain identification

10.1101/173146 ◽

2017 ◽

Cited By ~ 5

Author(s):

Mickael Silva ◽

Miguel Machado ◽

Diogo N. Silva ◽

Mirko Rossi ◽

Jacob Moran-Gilad ◽

...

Keyword(s):

Open Source ◽

Core Genome ◽

Bacterial Species ◽

Outbreak Detection ◽

Strain Identification ◽

List Type ◽

Whole Genome ◽

Link Type ◽

The Creation ◽

Allele Calling

Abstract: Gene-by-gene approaches are becoming increasingly popular in bacterial genomic epidemiology and outbreak detection. However, there is a lack of open-source scalable software for schema definition and allele calling for these methodologies. The chewBBACA suite was designed to assist users in the creation and evaluation of novel whole-genome or core-genome gene-by-gene typing schemas and subsequent allele calling in bacterial strains of interest. The software can run on a laptop or on high-performance clusters, making it useful for both small laboratories and large reference centers. ChewBBACA is available at https://github.com/B-UMMI/chewBBACA or as a docker image at https://hub.docker.com/r/ummidock/chewbbaca/.

Data Summary: Assembled genomes used for the tutorial were downloaded from NCBI in August 2016 by selecting those submitted as Streptococcus agalactiae taxon or sub-taxa. All the assemblies have been deposited as a zip file in FigShare (https://figshare.com/s/9cbe1d422805db54cd52), where a file with the original ftp link for each NCBI directory is also available. Code for the chewBBACA suite is available at https://github.com/B-UMMI/chewBBACA while the tutorial example is found at https://github.com/B-UMMI/chewBBACA_tutorial. The authors confirm that all supporting data, code and protocols have been provided within the article or through supplementary data files.

Impact Statement: The chewBBACA software offers a computational solution for the creation, evaluation and use of whole genome (wg) and core genome (cg) multilocus sequence typing (MLST) schemas. It allows researchers to develop wg/cgMLST schemes for any bacterial species from a set of genomes of interest. The alleles identified by chewBBACA correspond to potential coding sequences, possibly offering insights into the correspondence between the genetic variability identified and phenotypic variability. The software performs allele calling in a matter of seconds to minutes per strain on a laptop but is easily scalable for the analysis of large datasets of hundreds of thousands of strains using multiprocessing options. The chewBBACA software thus provides an efficient and freely available open source solution for gene-by-gene methods. Moreover, the ability to perform these tasks locally is desirable when the submission of raw data to a central repository or web services is hindered by data protection policies or ethical or legal concerns.
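The gene-by-gene idea can be sketched in a few lines: each locus in a schema maps known allele sequences to integer identifiers, a strain's sequence at that locus is looked up (or registered as a new allele), and strains are compared by counting loci whose identifiers differ. The code below is a minimal sketch under those assumptions and does not reproduce chewBBACA's implementation; the names `Schema`, `call_alleles`, `allele_distance`, and the toy sequences are hypothetical, and exact string matching stands in for whatever sequence comparison a real tool performs.

```python
from typing import Dict, Optional

# Hypothetical schema layout: locus name -> {allele sequence: allele id}.
Schema = Dict[str, Dict[str, int]]
Profile = Dict[str, Optional[int]]


def call_alleles(schema: Schema, strain_loci: Dict[str, str]) -> Profile:
    """Assign an allele id per locus, registering unseen sequences."""
    profile: Profile = {}
    for locus, known in schema.items():
        seq = strain_loci.get(locus)
        if seq is None:
            profile[locus] = None  # locus not found in this strain
            continue
        if seq not in known:
            known[seq] = len(known) + 1  # novel allele gets the next id
        profile[locus] = known[seq]
    return profile


def allele_distance(a: Profile, b: Profile) -> int:
    """Count loci called in both profiles whose allele ids differ."""
    return sum(
        1
        for locus, allele in a.items()
        if allele is not None
        and b.get(locus) is not None
        and allele != b[locus]
    )


schema: Schema = {
    "locusA": {"ATGAAATAA": 1},
    "locusB": {"ATGCCCTAA": 1},
}
p1 = call_alleles(schema, {"locusA": "ATGAAATAA", "locusB": "ATGCCCTAA"})
p2 = call_alleles(schema, {"locusA": "ATGAAGTAA", "locusB": "ATGCCCTAA"})
p3 = call_alleles(schema, {"locusA": "ATGAAGTAA"})
```

In this toy run the second strain introduces a new allele at `locusA`, so it differs from the first strain at one locus, while the third strain is missing `locusB` and that locus is excluded from its comparisons.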


Efficient Algorithms for Optimizing Whole Genome Alignment with Noise

Algorithms and Computation - Lecture Notes in Computer Science ◽

10.1007/978-3-540-24587-2_38 ◽

2003 ◽

pp. 364-374 ◽

Cited By ~ 1

Author(s):

T. W. Lam ◽

N. Lu ◽

H. F. Ting ◽

Prudence W. H. Wong ◽

S. M. Yiu

Keyword(s):

Efficient Algorithms ◽

Genome Alignment ◽

Whole Genome ◽

Whole Genome Alignment


Whole Genome Alignment

Statistics for Bioinformatics ◽

10.1016/b978-1-78548-216-8.50008-7 ◽

2016 ◽

pp. 75-86

Author(s):

Julie Dawn Thompson

Keyword(s):

Genome Alignment ◽

Whole Genome ◽

Whole Genome Alignment


ALLOWING MISMATCHES IN ANCHORS FOR WHOLE GENOME ALIGNMENT: GENERATION AND EFFECTIVENESS

Proceedings of the 3rd Asia-Pacific Bioinformatics Conference ◽

10.1142/9781860947322_0001 ◽

2005 ◽

Author(s):

SM YIU ◽

PY CHAN ◽

TW LAM ◽

WK SUNG ◽

HF TING ◽

...

Keyword(s):

Genome Alignment ◽

Whole Genome ◽

Whole Genome Alignment


Scalable multiple whole-genome alignment and locally collinear block construction with SibeliaZ

10.1101/548123 ◽

2019 ◽

Cited By ~ 5

Author(s):

Ilia Minkin ◽

Paul Medvedev

Keyword(s):

Single Machine ◽

De Bruijn Graph ◽

Genome Alignment ◽

Whole Genome ◽

Reconstruction Algorithms ◽

De Bruijn Graphs ◽

Significant Step ◽

De Bruijn ◽

Whole Genome Alignment ◽

Computational Resources

Abstract: Multiple whole-genome alignment is a challenging problem in bioinformatics. Despite many successes, current methods are not able to keep up with the growing number, length, and complexity of assembled genomes, especially when computational resources are limited. Approaches based on compacted de Bruijn graphs to identify and extend anchors into locally collinear blocks have potential for scalability, but current methods do not scale to mammalian genomes. We present an algorithm, SibeliaZ-LCB, for identifying collinear blocks in closely related genomes based on analysis of the de Bruijn graph. We further incorporate this into a multiple whole-genome alignment pipeline called SibeliaZ. SibeliaZ shows run-time improvements over other methods while maintaining accuracy. On sixteen recently-assembled strains of mice, SibeliaZ runs in under 16 hours on a single machine, while other tools did not run to completion for eight mice within a week. SibeliaZ makes a significant step towards improving scalability of multiple whole-genome alignment and collinear block reconstruction algorithms on a single machine.
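For readers unfamiliar with the data structure, the sketch below builds a plain (uncompacted) de Bruijn graph from several sequences and lists the k-mers present in every genome, which is one simple way to think about candidate anchors. It is only an illustration and not the SibeliaZ-LCB algorithm: the function names, the toy sequences, and the choice of k = 4 are hypothetical, and reverse complements and graph compaction are ignored.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_de_bruijn(
    genomes: Dict[str, str], k: int
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers."""
    graph = defaultdict(set)   # (k-1)-mer -> successor (k-1)-mers
    owners = defaultdict(set)  # k-mer -> names of genomes containing it
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
            owners[kmer].add(name)
    return dict(graph), dict(owners)


def shared_kmers(owners: Dict[str, Set[str]], n_genomes: int) -> List[str]:
    """Return k-mers found in every genome, as candidate anchors."""
    return sorted(k for k, names in owners.items() if len(names) == n_genomes)


# Hypothetical toy genomes sharing the substring CGTACGG.
genomes = {"g1": "ACGTACGGA", "g2": "TTCGTACGG"}
graph, owners = build_de_bruijn(genomes, k=4)
anchors = shared_kmers(owners, len(genomes))
```

The node `ACG` has two successors in this toy graph, which shows how branches appear wherever a (k-1)-mer is followed by different bases.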


Efficient masking of plant genomes by combining kmer counting and curated repeats

10.1101/2021.03.22.436504 ◽

2021 ◽

Author(s):

Bruno Contreras-Moreira ◽

Carla V Filippi ◽

Guy Naamati ◽

Carlos García Girón ◽

James E Allen ◽

...

Keyword(s):

Repetitive Sequences ◽

Specific Protein ◽

Genome Alignment ◽

Whole Genome ◽

Overlapping Genes ◽

Plant Genomes ◽

Repeat Masking ◽

Computationally Expensive ◽

Whole Genome Alignment ◽

Repeat Library

Summary/Abstract: The annotation of repetitive sequences within plant genomes can help in the interpretation of observed phenotypes. Moreover, repeat masking is required for tasks such as whole-genome alignment, promoter analysis or pangenome exploration. While homology-based annotation methods are computationally expensive, k-mer strategies for masking are orders of magnitude faster. Here we benchmark a two-step approach, where repeats are first called by k-mer counting and then annotated by comparison to curated libraries. This hybrid protocol was tested on 20 plant genomes from Ensembl, using the k-mer-based Repeat Detector (Red) and two repeat libraries (REdat and nrTEplants, curated for this work). We obtained repeated genome fractions that match those reported in the literature, but with shorter repeated elements than those produced with conventional annotators. Inspection of masked regions overlapping genes revealed no preference for specific protein domains. Half of Red-masked sequences can be successfully classified with nrTEplants, with the complete protocol taking less than 2 h on a desktop Linux box. The repeat library and the scripts to mask and annotate plant genomes can be obtained at https://github.com/Ensembl/plant-scripts.
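The first step of such a protocol, calling repeats by k-mer counting, can be illustrated with a toy soft-masker: count every k-mer in a sequence and lowercase the bases covered by k-mers that occur at least a chosen number of times. This is a minimal sketch and not how Red works internally: the function `soft_mask`, the toy sequence, and the values of `k` and `min_count` are hypothetical and chosen only to keep the example readable.

```python
from collections import Counter


def soft_mask(seq: str, k: int, min_count: int) -> str:
    """Lowercase every base covered by a k-mer seen >= min_count times."""
    seq = seq.upper()
    n_kmers = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n_kmers))
    masked = [False] * len(seq)
    for i in range(n_kmers):
        if counts[seq[i:i + k]] >= min_count:
            for j in range(i, i + k):
                masked[j] = True
    return "".join(
        base.lower() if flag else base for base, flag in zip(seq, masked)
    )


# Hypothetical sequence: three copies of ACGT followed by unique sequence.
toy_seq = "ACGTACGTACGTTTGCA"
masked_seq = soft_mask(toy_seq, k=4, min_count=3)
masked_fraction = sum(c.islower() for c in masked_seq) / len(masked_seq)
```

Soft-masking (lowercase) keeps the original bases available to downstream tools, whereas hard-masking would replace them with `N`; which one is appropriate depends on the aligner or annotator used afterwards.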
