Robust and rapid algorithms facilitate large-scale whole genome sequencing downstream analysis in an integrative framework

Abstract

Introduction: Improving the surveillance of tuberculosis (TB) is especially important for multidrug-resistant (MDR) and extensively drug-resistant (XDR)-TB. The large amount of publicly available whole-genome sequencing (WGS) data for TB gives us the chance to re-use data and to perform additional analysis at a large scale.

Aim: We assessed the usefulness of raw WGS data of global MDR/XDR-TB isolates available from public repositories to improve TB surveillance.

Methods: We extracted raw WGS data and the related metadata of Mycobacterium tuberculosis isolates available from the Sequence Read Archive. We compared this public dataset with WGS data and metadata of 131 MDR- and XDR-TB isolates from Germany in 2012-2013.

Results: We aggregated a dataset that includes 1,081 MDR and 250 XDR isolates among which we identified 133 molecular clusters. In 16 clusters, the isolates were from at least two different countries. For example, cluster2 included 56 MDR/XDR isolates from Moldova, Georgia, and Germany. By comparing the WGS data from Germany and the public dataset, we found that 11 clusters contained at least one isolate from Germany and at least one isolate from another country. We could, therefore, connect TB cases despite missing epidemiological information.

Conclusion: We demonstrated the added value of using WGS raw data from public repositories to contribute to TB surveillance. By comparing the German and the public dataset, we identified potential international transmission events. Thus, using this approach might support the interpretation of national surveillance results in an international context.
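The abstract does not say how the molecular clusters were defined. As a rough illustration only, the sketch below assumes single-linkage clustering on pairwise SNP distances with a hypothetical threshold (`max_snps`), then flags clusters whose isolates come from at least two countries. The isolate names, distances, and threshold are invented toy values, not data from the study.

```python
from collections import defaultdict
from itertools import combinations


def single_linkage_clusters(isolates, distances, max_snps):
    """Group isolates linked by a chain of pairwise distances <= max_snps.

    `distances` maps (isolate_a, isolate_b) to a SNP distance; pairs that
    are absent are treated as unlinked. `max_snps` is an assumed cutoff.
    """
    parent = {iso: iso for iso in isolates}

    def find(x):
        # Follow parent pointers to the set representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(isolates, 2):
        d = distances.get((a, b), distances.get((b, a)))
        if d is not None and d <= max_snps:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for iso in isolates:
        groups[find(iso)].add(iso)
    # A cluster needs at least two isolates.
    return [members for members in groups.values() if len(members) > 1]


def cross_border_clusters(clusters, country_of):
    """Keep clusters whose isolates come from two or more countries."""
    return [c for c in clusters if len({country_of[i] for i in c}) >= 2]


# Toy example (all values hypothetical).
country_of = {"A": "Germany", "B": "Moldova", "C": "Georgia",
              "D": "Germany", "E": "Germany"}
distances = {("A", "B"): 3, ("B", "C"): 4, ("A", "C"): 9,
             ("D", "E"): 2, ("A", "D"): 40}

clusters = single_linkage_clusters(list(country_of), distances, max_snps=5)
international = cross_border_clusters(clusters, country_of)
```

With single linkage, A and C end up in the same cluster through B even though their direct distance exceeds the cutoff, which is why the choice of threshold and linkage rule matters when interpreting cluster counts.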

Recombinational Switching of the Clostridium difficile S-Layer and a Novel Glycosylation Gene Cluster Revealed by Large-Scale Whole-Genome Sequencing

The Journal of Infectious Diseases ◽

10.1093/infdis/jis734 ◽

2012 ◽

Vol 207 (4) ◽

pp. 675-686 ◽

Cited By ~ 58

Author(s):

Kate E. Dingle ◽

Xavier Didelot ◽

M. Azim Ansari ◽

David W. Eyre ◽

Alison Vaughan ◽

...

Keyword(s):

Clostridium Difficile ◽

Whole Genome Sequencing ◽

Gene Cluster ◽

Genome Sequencing ◽

Large Scale ◽

Whole Genome

Whole-Genome Sequencing for Routine Pathogen Surveillance in Public Health: a Population Snapshot of Invasive Staphylococcus aureus in Europe

mBio ◽

10.1128/mbio.00444-16 ◽

2016 ◽

Vol 7 (3) ◽

Cited By ~ 123

Author(s):

David M. Aanensen ◽

Edward J. Feil ◽

Matthew T. G. Holden ◽

Janina Dordel ◽

Corin A. Yeats ◽

...

Keyword(s):

Public Health ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Bacterial Pathogens ◽

Epidemiological Surveillance ◽

Data Sets ◽

Whole Genome ◽

Bioinformatic Tools ◽

Road Map

ABSTRACT: The implementation of routine whole-genome sequencing (WGS) promises to transform our ability to monitor the emergence and spread of bacterial pathogens. Here we combined WGS data from 308 invasive Staphylococcus aureus isolates corresponding to a pan-European population snapshot, with epidemiological and resistance data. Geospatial visualization of the data is made possible by a generic software tool designed for public health purposes that is available at the project URL (http://www.microreact.org/project/EkUvg9uY?tt=rc). Our analysis demonstrates that high-risk clones can be identified on the basis of population level properties such as clonal relatedness, abundance, and spatial structuring and by inferring virulence and resistance properties on the basis of gene content. We also show that in silico predictions of antibiotic resistance profiles are at least as reliable as phenotypic testing. We argue that this work provides a comprehensive road map illustrating the three vital components for future molecular epidemiological surveillance: (i) large-scale structured surveys, (ii) WGS, and (iii) community-oriented database infrastructure and analysis tools.

IMPORTANCE: The spread of antibiotic-resistant bacteria is a public health emergency of global concern, threatening medical intervention at every level of health care delivery. Several recent studies have demonstrated the promise of routine whole-genome sequencing (WGS) of bacterial pathogens for epidemiological surveillance, outbreak detection, and infection control. However, as this technology becomes more widely adopted, the key challenges of generating representative national and international data sets and the development of bioinformatic tools to manage and interpret the data become increasingly pertinent. This study provides a road map for the integration of WGS data into routine pathogen surveillance. We emphasize the importance of large-scale routine surveys to provide the population context for more targeted or localized investigation and the development of open-access bioinformatic tools to provide the means to combine and compare independently generated data with publicly available data sets.

ReadFilter - Filtering reads of interest for quicker downstream analysis

10.1101/266080 ◽

2018 ◽

Author(s):

Kim Lee Ng ◽

Thor Bech Johannesen ◽

Mark Østerlund ◽

Kristoffer Kiil ◽

Paal Skytt Andersen ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

De Novo ◽

Whole Genome ◽

Assembly Time ◽

Link Type ◽

Redundant Data ◽

Run Time ◽

Downstream Analysis

Abstract: Whole-genome sequencing is becoming the method of choice but provides redundant data for many tasks. ReadFilter (https://github.com/ssi-dk/serum_readfilter) is offered as a way to improve run time of these tasks by rapidly filtering reads against user-specified sequences in order to work with a small fraction of original reads while maintaining accuracy. This can noticeably reduce mapping time and substantially reduce de novo assembly time.
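To make the idea of filtering reads against user-specified sequences concrete, here is a minimal sketch based on exact k-mer matching. It is not ReadFilter's implementation, which the abstract does not describe; the function names, the value of `k`, the `min_hits` cutoff, and the sequences are all assumptions chosen for illustration.

```python
# Translation table used to complement A/C/G/T bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(targets, k):
    """Collect k-mers from both strands of every target sequence."""
    index = set()
    for target in targets:
        target = target.upper()
        index |= kmers(target, k)
        index |= kmers(reverse_complement(target), k)
    return index


def filter_reads(reads, index, k, min_hits=1):
    """Yield (name, sequence) for reads sharing >= min_hits k-mers with the index."""
    for name, seq in reads:
        hits = sum(1 for kmer in kmers(seq.upper(), k) if kmer in index)
        if hits >= min_hits:
            yield name, seq


# Toy example (hypothetical sequences).
targets = ["ACGTTGCATGCCGTA"]
reads = [
    ("read1", "TTGCATGCC"),    # forward-strand fragment of the target
    ("read2", "TACGGCATGC"),   # reverse-strand fragment of the target
    ("read3", "GGGGGGGGGG"),   # unrelated read
]

index = build_index(targets, k=5)
kept = list(filter_reads(reads, index, k=5))
```

Only the retained reads would then be passed to mapping or assembly, which is where the run-time savings described in the abstract would come from. Real FASTQ parsing, quality handling, and paired-end logic are left out of this sketch.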

Fast and inexpensive whole genome sequencing library preparation from intact yeast cells

10.1101/2020.09.03.280990 ◽

2020 ◽

Author(s):

Sibylle C Vonesch ◽

Shengdi Li ◽

Chelsea Szu Tu ◽

Bianca P Hennig ◽

Nikolay Dobrev ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genomic DNA ◽

Large Scale ◽

Massively Parallel Sequencing ◽

Yeast Cells ◽

Whole Genome ◽

High Quality ◽

Rapid Preparation ◽

Yeast Cultures

ABSTRACT: Through the increase in the capacity of sequencing machines, massively parallel sequencing of thousands of samples in a single run is now possible. With the improved throughput and resulting drop in the price of sequencing, the cost and time for preparation of sequencing libraries have become the major bottleneck in large-scale experiments. Methods using a hyperactive variant of the Tn5 transposase efficiently generate libraries starting from cDNA or genomic DNA in a few hours and are highly scalable. For genome sequencing, however, the time and effort spent on genomic DNA isolation limits the practicability of sequencing large numbers of samples. Here, we describe a highly scalable method for preparing high quality whole-genome sequencing libraries directly from yeast cultures in less than three hours at 34 cents per sample. We skip the rate-limiting step of genomic DNA extraction by directly tagmenting yeast spheroplasts and add a nucleosome release step prior to enrichment PCR to improve the evenness of genomic coverage. Resulting libraries do not show any GC-bias and are comparable in quality to libraries processed from genomic DNA with a commercially available Tn5-based kit. We use our protocol to investigate CRISPR/Cas9 on- and off-target edits and reliably detect edited variants and shared polymorphisms between strains. Our protocol enables rapid preparation of unbiased and high-quality, sequencing-ready indexed libraries for hundreds of yeast strains in a single day at a low price. By adjusting individual steps of our workflow we expect that our protocol can be adapted to other organisms.

Whole genome sequencing and metabolomics analyses reveal the biosynthesis of nerol in a multi-stress-tolerant Meyerozyma guilliermondii GXDK6

10.21203/rs.3.rs-91176/v1 ◽

2020 ◽

Author(s):

Xueyan Mo ◽

Xinghua Cai ◽

Qinyan Hui ◽

Huijie Sun ◽

Ran Yu ◽

...

Keyword(s):

Heavy Metal ◽

Carbon Source ◽

Chemical Synthesis ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Gas Chromatography Mass Spectrometry ◽

Whole Genome ◽

Scale Production ◽

Meyerozyma Guilliermondii

Abstract

Background: Nerol (C10H18O), an acyclic monoterpene, occurs naturally in plant essential oils and is widely used in food, cosmetics and pharmaceuticals as a valuable fragrance. Chemical synthesis is currently the only strategy for large-scale production of nerol, and its disadvantages greatly limit the production and application of nerol. These drawbacks have shifted researchers' interest toward producing nerol by eco-friendly biosynthesis methods. However, the main technical bottleneck restricting the biosynthesis of nerol is the lack of corresponding natural aroma-producing microorganisms.

Results: In this study, a novel multi-stress-tolerant probiotic, Meyerozyma guilliermondii GXDK6, with aroma-producing properties was identified by whole genome sequencing and metabolomics technology. GXDK6 showed a broad pH tolerance in the range of 2.5–10.0. The strain also showed salt tolerance, with up to 12% NaCl and up to 18% KCl or MgCl2. GXDK6 exhibited tolerance to the heavy metal Mn2+ of up to 5494 ppm. GXDK6 could also ferment with each of a total of 21 kinds of organic matter as the single carbon source and produce abundant aromatic metabolites. Results from gas chromatography–mass spectrometry indicated the production of 8–14 types of aromatic metabolites (isopentanol, nerol, geraniol, phenylethanol, isobutanol, etc.) when GXDK6 was fermented for up to 72 h with glucose, sucrose, fructose, or xylose as the single carbon source. Among them, nerol was found as a novel aromatic metabolite of GXDK6 fermentation, and its biosynthesis mechanism was further revealed.

Conclusion: A novel aroma-producing M. guilliermondii GXDK6 was identified successfully by whole genome sequencing and metabolomics technology. GXDK6 showed high tolerance to multiple stresses, including acid–base, salty, and heavy-metal environments. The aroma-producing mechanism of nerol in GXDK6 was also revealed. These findings indicate that the aroma-producing M. guilliermondii GXDK6, with its multi-stress-tolerant properties, has great potential value in the fermentation industry.

Comparing serotyping with whole-genome sequencing for subtyping of non-typhoidal Salmonella enterica: a large-scale analysis of 37 serotypes with a public health impact in the USA

Microbial Genomics ◽

10.1099/mgen.0.000425 ◽

2020 ◽

Vol 6 (9) ◽

Cited By ~ 1

Author(s):

Ehud Elnekave ◽

Samuel L. Hong ◽

Seunghyun Lim ◽

Timothy J. Johnson ◽

Andres Perez ◽

...

Keyword(s):

Public Health ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Health Impact ◽

Public Health Impact ◽

Whole Genome ◽

Scale Analysis ◽

Large Scale Analysis ◽

The USA

Serotyping has traditionally been used for subtyping of non-typhoidal Salmonella (NTS) isolates. However, its discriminatory power is limited, which impairs its use for epidemiological investigations of source attribution. Whole-genome sequencing (WGS) analysis allows more accurate subtyping of strains. However, because of the relative newness and cost of routine WGS, large-scale studies involving NTS WGS are still rare. We aimed to revisit the big picture of subtyping NTS with a public health impact by using traditional serotyping (i.e. reaction between antisera and surface antigens) and comparing the results with those obtained using WGS. For this purpose, we analysed 18 282 sequences of isolates belonging to 37 serotypes with a public health impact that were recovered in the USA between 2006 and 2017 from multiple sources, and were available at the National Center for Biotechnology Information (NCBI). Phylogenetic trees were reconstructed for each serotype using the core genome for the identification of genetic subpopulations. We demonstrated that WGS-based subtyping allows better identification of sources potentially linked with human infection and emerging subpopulations, along with providing information on the risk of dissemination of plasmids and acquired antimicrobial resistance genes (AARGs). In addition, by reconstructing a phylogenetic tree with representative isolates from all serotypes (n=370), we demonstrated genetic variability within and between serotypes, which formed monophyletic, polyphyletic and paraphyletic clades. Moreover, we found (in the entire data set) an increased detection rate for AARGs linked to key antimicrobials (such as quinolones and extended-spectrum cephalosporins) over time. The outputs of this large-scale analysis reveal new insights into the genetic diversity within and between serotypes; the polyphyly and paraphyly of certain serotypes may suggest that the subtyping of NTS to serotypes may not be sufficient. 
Moreover, the results and the methods presented here, leading to differentiation between genetic subpopulations based on their potential risk to public health, as well as narrowing down the possible sources of these infections, may be used as a baseline for subtyping of future NTS infections and help efforts to mitigate and prevent infections in the USA and globally.
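The distinction between monophyletic and non-monophyletic serotypes can be illustrated with a small check on a rooted tree. The sketch below assumes a tree written as nested tuples with tip names as strings, and a hypothetical mapping from tips to serotype labels; it only tests monophyly and does not separate paraphyly from polyphyly, which would need further assumptions about ancestral states. The tree and labels are toy values, not results from the study.

```python
def leaves(node):
    """Return the set of tip names under a node (a tip is a plain string)."""
    if isinstance(node, str):
        return {node}
    tips = set()
    for child in node:
        tips |= leaves(child)
    return tips


def smallest_clade(node, targets):
    """Return the tips of the smallest clade that contains all targets."""
    if not isinstance(node, str):
        for child in node:
            if targets <= leaves(child):
                # All targets sit inside this child, so descend into it.
                return smallest_clade(child, targets)
    return leaves(node)


def is_monophyletic(tree, serotype_of, serotype):
    """True if the serotype's tips form a clade with no other serotype inside."""
    targets = {tip for tip in leaves(tree) if serotype_of[tip] == serotype}
    if not targets:
        raise ValueError(f"no tips labelled {serotype!r}")
    return smallest_clade(tree, targets) == targets


# Toy rooted tree and labels (hypothetical).
tree = ((("t1", "t2"), "e1"), ("t3", ("n1", "n2")))
serotype_of = {
    "t1": "serotype_A", "t2": "serotype_A", "t3": "serotype_A",
    "e1": "serotype_B",
    "n1": "serotype_C", "n2": "serotype_C",
}

result = {s: is_monophyletic(tree, serotype_of, s)
          for s in ("serotype_A", "serotype_B", "serotype_C")}
```

In this toy tree the tips labelled serotype_A are scattered across both halves of the tree, so the smallest clade containing them also contains other serotypes, which is the kind of pattern the abstract describes as evidence that serotype alone may not be a sufficient subtyping unit.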
