Testing assembly strategies of Francisella tularensis genomes to infer an evolutionary conservation analysis of genomic structures

Kerstin Neubert; Eric Zuchantke; Robert Maximilian Leidenfrost; Roebbe Wuenschiers; Josephine Grützke; Burkhard Malorny; Holger Brendebach; Sascha Al Dahouk; Timo Homeier; Helmut Hotzel; Knut Reinert; Herbert Tomaso; Anne Busch

doi:10.1186/s12864-021-08115-x

BMC Genomics ◽

10.1186/s12864-021-08115-x ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Kerstin Neubert ◽

Eric Zuchantke ◽

Robert Maximilian Leidenfrost ◽

Roebbe Wuenschiers ◽

Josephine Grützke ◽

...

Keyword(s):

Francisella Tularensis ◽

High Throughput Sequencing ◽

Genomic Structure ◽

Evolutionary Conservation ◽

Pathogenicity Islands ◽

Insertion Sequences ◽

Hybrid Assembly ◽

Short Read ◽

Conservation Analysis ◽

Long Read

Abstract. Background: We benchmarked sequencing technologies and assembly strategies for short-read, long-read, and hybrid assemblers with respect to correctness, contiguity, and completeness of assemblies of Francisella tularensis genomes. Benchmarking allowed in-depth analyses of the genomic structures of the Francisella pathogenicity islands and insertion sequences. Five major high-throughput sequencing technologies were applied, including next-generation “short-read” and third-generation “long-read” sequencing methods. Results: We focused on short-read assemblers, hybrid assemblers, and analysis of the genomic structure, with particular emphasis on insertion sequences and the Francisella pathogenicity island. Among the eight short-read assembly methods tested, the A5-miseq pipeline performed best for MiSeq data, Mira for Ion Torrent data, and ABySS for HiSeq data. Two approaches were applied to benchmark long-read and hybrid assembly strategies: long-read-first assembly followed by correction with short reads (Canu/Pilon, Flye/Pilon) and short-read-first assembly along with scaffolding based on long reads (Unicycler, SPAdes). Hybrid assembly can resolve large repetitive regions best with a “long-read first” approach. Conclusions: Genomic structures of the Francisella pathogenicity islands frequently showed misassembly. Insertion sequences (IS) could be used to perform an evolutionary conservation analysis. A phylogenetic structure of insertion sequences and the evolution within the clades elucidated the clade structure of the highly conservative F. tularensis.
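Contiguity, one of the three criteria named in this abstract, is commonly summarised by the N50 statistic. The sketch below is an illustration only and is not taken from the paper: the function name `n50` and the contig lengths are hypothetical, and the paper's actual evaluation tooling is not specified here.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (bp) for two assemblies of the same genome.
fragmented = [400, 300, 200, 100]
contiguous = [900, 100]

print(n50(fragmented), n50(contiguous))
```

Both toy assemblies contain the same number of bases, yet the second has the higher N50 because most of its sequence sits in one long contig. N50 says nothing about correctness, which is why the abstract lists correctness and completeness as separate criteria.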

Complete Genome Resequencing of Thermus thermophilus Strain TMY by Hybrid Assembly of Long- and Short-Read Sequencing Technologies

Microbiology Resource Announcements ◽

10.1128/mra.00979-21 ◽

2021 ◽

Vol 10 (46) ◽

Author(s):

Kentaro Miyazaki ◽

Natsuko Tokito

Keyword(s):

Complete Genome ◽

Thermus Thermophilus ◽

Genomic Analysis ◽

Comparative Genomic ◽

Hybrid Assembly ◽

Genome Resequencing ◽

Short Read ◽

Content Type ◽

Sequencing Technologies ◽

Long Read

Complete genome resequencing was conducted for Thermus thermophilus strain TMY by hybrid assembly of Oxford Nanopore Technologies long-read and MGI short-read data. Errors in the previously reported genome sequence determined by PacBio technology alone were corrected, allowing for high-quality comparative genomic analysis of closely related T. thermophilus genomes.

Fast-SG: An alignment-free algorithm for hybrid assembly

10.1101/209122 ◽

2017 ◽

Author(s):

Alex Di Genova ◽

Gonzalo A. Ruz ◽

Marie-France Sagot ◽

Alejandro Maass

Keyword(s):

De Novo ◽

Reference Level ◽

Hybrid Assembly ◽

Short Read ◽

Sequencing Technologies ◽

Alignment Free ◽

Long Reads ◽

Long Read ◽

Definition Of ◽

Large Genomes

Abstract: Long read sequencing technologies are the ultimate solution for genome repeats, allowing near reference level reconstructions of large genomes. However, long read de novo assembly pipelines are computationally intense and require a considerable amount of coverage, thereby hindering their broad application to the assembly of large genomes. Alternatively, hybrid assembly methods which combine short and long read sequencing technologies can reduce the time and cost required to produce de novo assemblies of large genomes. In this paper, we propose a new method, called FAST-SG, which uses a new ultra-fast alignment-free algorithm specifically designed for constructing a scaffolding graph using light-weight data structures. FAST-SG can construct the graph from either short or long reads. This allows the reuse of efficient algorithms designed for short read data and permits the definition of novel modular hybrid assembly pipelines. Using comprehensive standard datasets and benchmarks, we show how FAST-SG outperforms the state-of-the-art short read aligners when building the scaffolding graph, and can be used to extract linking information from either raw or error-corrected long reads. We also show how a hybrid assembly approach using FAST-SG with shallow long read coverage (5X) and moderate computational resources can produce long-range and accurate reconstructions of the genomes of Arabidopsis thaliana (Ler-0) and human (NA12878).
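The general idea of building a scaffolding graph without alignment can be sketched with exact k-mer lookups. The code below is a toy illustration under strong assumptions and is not the FAST-SG algorithm: all function names, sequences, and the choice of k are hypothetical, and reverse complements, read orientation, and insert-size estimates are ignored.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_unique_kmer_index(contigs, k):
    """Map every k-mer that occurs exactly once across all contigs to its contig."""
    counts = Counter()
    owner = {}
    for name, seq in contigs.items():
        for km in kmers(seq, k):
            counts[km] += 1
            owner[km] = name
    return {km: owner[km] for km, c in counts.items() if c == 1}


def place(read, index, k):
    """Return the contig supported by the most unique k-mers of the read, or None."""
    hits = Counter(index[km] for km in kmers(read, k) if km in index)
    return hits.most_common(1)[0][0] if hits else None


def scaffold_edges(pairs, index, k):
    """Count read pairs whose two ends are placed on different contigs."""
    edges = Counter()
    for left, right in pairs:
        a, b = place(left, index, k), place(right, index, k)
        if a is not None and b is not None and a != b:
            edges[tuple(sorted((a, b)))] += 1
    return edges


# Hypothetical toy data: two contigs and one read pair linking them.
contigs = {"c1": "ACGTACGGTCA", "c2": "TTGACCATGGA"}
pairs = [("ACGTACGG", "ACCATGGA")]
index = build_unique_kmer_index(contigs, k=5)
print(scaffold_edges(pairs, index, k=5))
```

Each edge weight is the number of read pairs supporting a link between two contigs. A real scaffolder would also need the missing pieces listed above before these links could be used to order and orient contigs.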

rRNA Analysis Based on Long-Read High-Throughput Sequencing Reveals a More Accurate Diagnostic for the Bacterial Infection of Ascites

BioMed Research International ◽

10.1155/2021/6287280 ◽

2021 ◽

Vol 2021 ◽

pp. 1-8

Author(s):

Xiaoling Yu ◽

Wenqian Jiang ◽

Xinhui Huang ◽

Jun Lin ◽

Hanhui Ye ◽

...

Keyword(s):

Pathogenic Bacteria ◽

High Throughput Sequencing ◽

Second Generation ◽

Subsequent Treatment ◽

Microbial Culture ◽

Third Generation ◽

Short Read ◽

Short Read Sequencing ◽

Long Read ◽

Generation Sequencing

Traditional pathogenic diagnosis presents defects such as a low positivity rate, inability to identify uncultured microorganisms, and time-consuming nature. Clinical metagenomics next-generation sequencing can be used to detect any pathogen, compensating for the shortcomings of traditional pathogenic diagnosis. We report third-generation long-read sequencing results and second-generation short-read sequencing results for ascitic fluid from a patient with liver ascites and compare the two types of sequencing results with the results of traditional clinical microbial culture. The distribution of pathogenic microbial species revealed by the two types of sequencing results was quite different, and the third-generation sequencing results were consistent with the results of traditional microbial culture, which can effectively guide subsequent treatment. Short reads, the lack of amplification and enrichment to amplify signals from trace pathogens, and host background noise may be the reasons for the high error in the second-generation short-read sequencing results. Therefore, we propose that long-read-based rRNA analysis technology is superior to the short-read shotgun-based metagenomics method in the identification of pathogenic bacteria.

Genomic evaluation of Bordetella spp. originating from Australia

10.1101/2021.03.02.433639 ◽

2021 ◽

Author(s):

Winkie Fong ◽

Verlaine Timms ◽

Eby Sim ◽

Vitali Sintchenko

Keyword(s):

Public Health ◽

Genomic Structure ◽

Vaccine Uptake ◽

Snp Analysis ◽

Whole Genome ◽

Molecular Monitoring ◽

Insertion Element ◽

Short Read ◽

Short Read Sequencing ◽

Long Read

Abstract: Bordetella pertussis is the primary causative agent of pertussis, a highly infectious respiratory disease associated with prolonged coughing episodes. Pertussis infections are typically mild in adults; however, in neonates, infections can be fatal. Despite successful vaccine uptake, the disease is re-emerging across the globe; it is therefore critical to determine the mechanism by which B. pertussis is escaping vaccination control. Studies have suggested that significant changes have occurred in B. pertussis genomes in response to whole cell and acellular vaccines. Continued molecular monitoring is therefore crucial for public health surveillance. High-resolution molecular surveillance of B. pertussis can be achieved through the sequencing of the whole genome. In public health laboratories, whole genome sequencing is primarily performed by short-read sequencing technologies as they are most cost-effective. However, short-read sequencing does not resolve the extensive genomic rearrangement evident in Bordetella genomes. This is because repeat regions present in Bordetella genomes are collapsed by downstream analysis. For example, the B. pertussis genome contains more than 200 copies of the IS481 insertion element, hence assemblies generally consist of >200 contigs. Advancements in long-read technologies, however, increase the potential to circularise and close genomes by bridging the locations of the IS481 insertion element. In this study, we aimed to contextualise the Bordetella spp. circulating in NSW, Australia and assess their relationship with global isolates utilising core genome, SNP and structural clustering analysis using long read technology. We report five closed genomes of Bordetella spp. isolated from Australian patients. Two of the three closed B. pertussis isolates were unique, each with its own genomic structure, while the other structurally clustered with global isolates. We found that Australian B. holmesii and B. parapertussis strains cluster with global isolates and do not appear to be unique to Australia. Australian draft B. holmesii SNP analysis showed that between 1999 and 2007, isolates were relatively similar; however, post-2012, isolates were distinct from each other. The closed isolates can also be used as high-quality reference sequences for both surveillance and other investigations into pertussis spread.

rRNA analysis based on long-read high-throughput sequencing reveals a more accurate diagnostic for the bacterial infection of ascites

10.1101/2020.09.14.20194134 ◽

2020 ◽

Author(s):

Xiaoling Yu ◽

Wenqian Jiang ◽

Xinhui Huang ◽

Jun Lin ◽

Hanhui Ye ◽

...

Keyword(s):

Pathogenic Bacteria ◽

High Throughput Sequencing ◽

Second Generation ◽

Subsequent Treatment ◽

Microbial Culture ◽

Third Generation ◽

Short Read ◽

Short Read Sequencing ◽

Long Read ◽

Generation Sequencing

Traditional pathogenic diagnosis presents defects such as a low positivity rate, inability to identify uncultured microorganisms, and time-consuming nature. Clinical metagenomics next-generation sequencing can be used to detect any pathogen, compensating for the shortcomings of traditional pathogenic diagnosis. We report third-generation long-read sequencing results and second-generation short-read sequencing results for ascitic fluid from a patient with liver ascites and compare the two types of sequencing results with the results of traditional clinical microbial culture. The distribution of pathogenic microbial species revealed by the two types of sequencing results was quite different, and the third-generation sequencing results were consistent with the results of traditional microbial culture, which can effectively guide subsequent treatment. Short reads, the lack of amplification and enrichment to amplify signals from trace pathogens, and host background noise may be the reasons for the high error in the second-generation short-read sequencing results. Therefore, we propose that long-read-based rRNA analysis technology is superior to the short-read shotgun-based metagenomics method in the identification of pathogenic bacteria.

Near-Complete Genome Sequence of a Human Norovirus GII.1[Pg] Strain Associated with Acute Gastroenteritis, Determined Using Long-Read Sequencing

Microbiology Resource Announcements ◽

10.1128/mra.00401-21 ◽

2021 ◽

Vol 10 (28) ◽

Author(s):

Zhihui Yang ◽

Mark Mammel ◽

Samantha Q. Wales

Keyword(s):

Foodborne Pathogens ◽

High Throughput Sequencing ◽

Read Length ◽

Human Norovirus ◽

Short Read ◽

Fast Detection ◽

Short Read Sequencing ◽

Detection And Identification ◽

Long Read ◽

Norovirus Gii

High-throughput sequencing is one of the approaches used for the detection of foodborne pathogens such as noroviruses. Long-read sequencing has advantages over short-read sequencing in speed, read length, and lower fragmentation bias, which make it a potentially powerful tool for the fast detection and identification of viruses.

Perspectives and benefits of high-throughput long-read sequencing in microbial ecology

Applied and Environmental Microbiology ◽

10.1128/aem.00626-21 ◽

2021 ◽

Author(s):

Leho Tedersoo ◽

Mads Albertsen ◽

Sten Anslan ◽

Benjamin Callahan

Keyword(s):

Microbial Ecology ◽

High Throughput ◽

Single Molecule ◽

High Throughput Sequencing ◽

Environmental Dna ◽

Nanopore Sequencing ◽

High Quality ◽

Short Read ◽

Sequencing Technologies ◽

Long Read

Short-read, high-throughput sequencing (HTS) methods have yielded numerous important insights into microbial ecology and function. Yet, in many instances short-read HTS techniques are suboptimal, for example by providing insufficient phylogenetic resolution or low integrity of assembled genomes. Single-molecule and synthetic long-read (SLR) HTS methods have successfully ameliorated these limitations. In addition, nanopore sequencing has generated a number of unique analysis opportunities such as rapid molecular diagnostics and direct RNA sequencing, and both PacBio and nanopore sequencing support detection of epigenetic modifications. Although initially suffering from relatively low sequence quality, recent advances have greatly improved the accuracy of long read sequencing technologies. In spite of great technological progress in recent years, the long-read HTS methods (PacBio and nanopore sequencing) are still relatively costly, require large amounts of high-quality starting material, and commonly need specific solutions in various analysis steps. Despite these challenges, long-read sequencing technologies offer high-quality, cutting-edge alternatives for testing hypotheses about microbiome structure and functioning as well as assembly of eukaryote genomes from complex environmental DNA samples.

Correction to: Testing assembly strategies of Francisella tularensis genomes to infer an evolutionary conservation analysis of genomic structures

BMC Genomics ◽

10.1186/s12864-021-08208-7 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Kerstin Neubert ◽

Eric Zuchantke ◽

Robert Maximilian Leidenfrost ◽

Röbbe Wünschiers ◽

Josephine Grützke ◽

...

Keyword(s):

Francisella Tularensis ◽

Evolutionary Conservation ◽

Conservation Analysis ◽

Testing Assembly

Comparison of long-read sequencing technologies in the hybrid assembly of complex bacterial genomes

10.1101/530824 ◽

2019 ◽

Cited By ~ 3

Author(s):

Nicola De Maio ◽

Liam P. Shaw ◽

Alasdair Hubbard ◽

Sophie George ◽

Nick Sanderson ◽

...

Keyword(s):

Bacterial Genome ◽

Hybrid Assembly ◽

Bacterial Genomes ◽

Short Read ◽

Short Reads ◽

Genome Reconstruction ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read ◽

Sequencing Platforms

Abstract: Illumina sequencing allows rapid, cheap and accurate whole genome bacterial analyses, but short reads (<300 bp) do not usually enable complete genome assembly. Long read sequencing greatly assists with resolving complex bacterial genomes, particularly when combined with short-read Illumina data (hybrid assembly). However, it is not clear how different long-read sequencing methods impact on assembly accuracy. Relative automation of the assembly process is also crucial to facilitating high-throughput complete bacterial genome reconstruction, avoiding multiple bespoke filtering and data manipulation steps. In this study, we compared hybrid assemblies for 20 bacterial isolates, including two reference strains, using Illumina sequencing and long reads from either Oxford Nanopore Technologies (ONT) or from SMRT Pacific Biosciences (PacBio) sequencing platforms. We chose isolates from the Enterobacteriaceae family, as these frequently have highly plastic, repetitive genetic structures and complete genome reconstruction for these species is relevant for a precise understanding of the epidemiology of antimicrobial resistance. We de novo assembled genomes using the hybrid assembler Unicycler and compared different read processing strategies. Both strategies facilitate high-quality genome reconstruction. Combining ONT and Illumina reads fully resolved most genomes without additional manual steps, and at a lower consumables cost per isolate in our setting. Automated hybrid assembly is a powerful tool for complete and accurate bacterial genome assembly.

Impact statement: Illumina short-read sequencing is frequently used for tasks in bacterial genomics, such as assessing which species are present within samples, checking if specific genes of interest are present within individual isolates, and reconstructing the evolutionary relationships between strains. However, while short-read sequencing can reveal significant detail about the genomic content of bacterial isolates, it is often insufficient for assessing genomic structure: how different genes are arranged within genomes, and particularly which genes are on plasmids – potentially highly mobile components of the genome frequently carrying antimicrobial resistance elements. This is because Illumina short reads are typically too short to span repetitive structures in the genome, making it impossible to accurately reconstruct these repetitive regions. One solution is to complement Illumina short reads with long reads generated with SMRT Pacific Biosciences (PacBio) or Oxford Nanopore Technologies (ONT) sequencing platforms. Using this approach, called ‘hybrid assembly’, we show that we can automatically fully reconstruct complex bacterial genomes of Enterobacteriaceae isolates in the majority of cases (best-performing method: 17/20 isolates). In particular, by comparing different methods we find that using the assembler Unicycler with Illumina and ONT reads represents a low-cost, high-quality approach for reconstructing bacterial genomes using publicly available software.

Data summary: Raw sequencing data and assemblies have been deposited in NCBI under BioProject Accession PRJNA422511 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA422511). We confirm all supporting data, code and protocols have been provided within the article or through supplementary data files.
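The statement that short reads are too short to span repetitive structures can be made concrete with a small length check. This is a minimal sketch under assumed values: the function names, the 20 bp anchor requirement, and the 1,000 bp repeat length are hypothetical illustrations, not figures from the study. Only the 300 bp read length echoes the abstract.

```python
def can_ever_span(read_len, repeat_len, min_anchor=20):
    """True if a read of this length could cover a whole repeat plus
    min_anchor bases of unique flanking sequence on each side."""
    return read_len >= repeat_len + 2 * min_anchor


def spans_repeat(read_start, read_len, repeat_start, repeat_len, min_anchor=20):
    """True if this particular read placement bridges the repeat with
    min_anchor unique bases on both sides (0-based, half-open coordinates)."""
    read_end = read_start + read_len
    repeat_end = repeat_start + repeat_len
    return (read_start <= repeat_start - min_anchor
            and read_end >= repeat_end + min_anchor)


# Hypothetical repeat of 1,000 bp: a 300 bp read can never bridge it,
# while a 10,000 bp read can, provided it starts far enough upstream.
print(can_ever_span(300, 1000))          # short read
print(can_ever_span(10_000, 1000))       # long read
print(spans_repeat(0, 10_000, 500, 1000))
```

A read that lies entirely inside a repeat, or overhangs it on one side only, cannot tell the assembler which unique flank connects to which, so the assembly breaks or collapses at that point.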

Outcome of Different Sequencing and Assembly Approaches on the Detection of Plasmids and Localization of Antimicrobial Resistance Genes in Commensal Escherichia coli

Microorganisms ◽

10.3390/microorganisms9030598 ◽

2021 ◽

Vol 9 (3) ◽

pp. 598

Author(s):

Katharina Juraschek ◽

Maria Borowiak ◽

Simon H. Tausch ◽

Burkhard Malorny ◽

Annemarie Käsbohrer ◽

...

Keyword(s):

Escherichia Coli ◽

Antimicrobial Resistance ◽

Mobile Genetic Elements ◽

Hybrid Assembly ◽

Small Plasmid ◽

Short Read ◽

Short Read Sequencing ◽

Genetic Elements ◽

Antimicrobial Resistance Genes ◽

Long Read

Antimicrobial resistance (AMR) is a major threat to public health worldwide. Currently, AMR typing is shifting from phenotypic testing to whole-genome sequence (WGS)-based detection of resistance determinants for a better understanding of the isolate diversity and elements involved in gene transmission (e.g., plasmids, bacteriophages, transposons). However, the use of WGS data for monitoring purposes requires suitable techniques, standardized parameters and approved guidelines for reliable AMR gene detection and prediction of their association with mobile genetic elements (plasmids). In this study, different sequencing and assembly strategies were tested for their suitability in AMR monitoring in Escherichia coli in the routines of the German National Reference Laboratory for Antimicrobial Resistances. To assess the outcomes of the different approaches, results from in silico predictions were compared with conventional phenotypic- and genotypic-typing data. With the focus on (fluoro)quinolone-resistant E. coli, five qnrS-positive isolates with multiple extrachromosomal elements were subjected to WGS with NextSeq (Illumina), PacBio (Pacific BioSciences) and ONT (Oxford Nanopore) for in-depth characterization of the qnrS1-carrying plasmids. Raw reads from short- and long-read sequencing were assembled individually by Unicycler or Flye or a combination of both (hybrid assembly). The generated contigs were subjected to bioinformatics analysis. Based on the generated data, assembly of long-read sequences is error-prone and can result in the loss of small plasmid genomes. In contrast, short-read sequencing was shown to be insufficient for the prediction of a linkage of AMR genes (e.g., qnrS1) to specific plasmid sequences. Furthermore, short-read sequencing failed to detect certain duplications and was unsuitable for genome finishing. Overall, the hybrid assembly led to the most comprehensive typing results, especially in predicting associations of AMR genes and mobile genetic elements. Thus, the use of different sequencing technologies and hybrid assemblies currently represents the best approach for reliable AMR typing and risk assessment.
