Reproducibly sampling SARS-CoV-2 genomes across time, geography, and viral diversity

The COVID-19 pandemic has led to a rapid accumulation of SARS-CoV-2 genomes, enabling genomic epidemiology on local and global scales. Collections of genomes from resources such as GISAID must be subsampled to enable computationally feasible phylogenetic and other analyses. We present genome-sampler, a software package that supports sampling collections of viral genomes across multiple axes including time of genome isolation, location of genome isolation, and viral diversity. The software is modular in design so that these or future sampling approaches can be applied independently and combined (or replaced with a random sampling approach) to facilitate custom workflows and benchmarking. genome-sampler is written as a QIIME 2 plugin, ensuring that its application is fully reproducible through QIIME 2’s unique retrospective data provenance tracking system. genome-sampler can be installed in a conda environment on macOS or Linux systems. A complete default pipeline is available through a Snakemake workflow, so subsampling can be achieved using a single command. genome-sampler is open source, free for all to use, and available at https://caporasolab.us/genome-sampler. We hope that this will facilitate SARS-CoV-2 research and support evaluation of viral genome sampling approaches for genomic epidemiology.
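The idea of sampling across time of genome isolation can be illustrated with a minimal sketch. This is not genome-sampler's API or its actual algorithm: the function name `sample_by_week`, the `(genome_id, collection_date)` record layout, the ISO-week binning, and the `per_week` cap are all assumptions made for illustration. A fixed seed keeps the draw repeatable, in the spirit of the reproducibility goal described above.

```python
import random
from collections import defaultdict
from datetime import date


def sample_by_week(records, per_week, seed=0):
    """Pick up to `per_week` genome ids from each ISO week.

    `records` is a list of (genome_id, collection_date) pairs.
    This layout is hypothetical, not genome-sampler's input format.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    bins = defaultdict(list)
    for genome_id, collected in records:
        iso = collected.isocalendar()
        bins[(iso[0], iso[1])].append(genome_id)  # key: (ISO year, ISO week)

    selected = []
    for week in sorted(bins):
        ids = sorted(bins[week])  # sort so input order does not affect the draw
        selected.extend(rng.sample(ids, min(per_week, len(ids))))
    return selected


records = [
    ("g1", date(2020, 3, 2)),
    ("g2", date(2020, 3, 3)),
    ("g3", date(2020, 3, 4)),
    ("g4", date(2020, 3, 10)),
    ("g5", date(2020, 4, 1)),
]
print(sample_by_week(records, per_week=2))
```

Capping each time bin keeps heavily sequenced weeks from dominating the subsample; the same binning could be applied to location in place of, or in addition to, the week key.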

F1000Research ◽ 10.12688/f1000research.24751.1 ◽ 2020 ◽ Vol 9 ◽ pp. 657 ◽ Cited By ~ 1

Author(s): Evan Bolyen ◽ Matthew R. Dillon ◽ Nicholas A. Bokulich ◽ Jason T. Ladner ◽ Brendan B. Larsen ◽ ...

Keyword(s): Software Package ◽ Tracking System ◽ Viral Diversity ◽ Data Provenance ◽ Time Geography ◽ Viral Genomes ◽ Genomic Epidemiology ◽ Rapid Accumulation ◽ Support Evaluation ◽ Random Sampling Approach

Genomic epidemiology and associated clinical outcomes of a SARS-CoV-2 outbreak in a general adult hospital in Quebec

10.1101/2021.05.29.21257760 ◽ 2021

Author(s): Bastien Pare ◽ Marieke Rozendaal ◽ Sacha Morin ◽ Raphael Poujol ◽ Fatima Mostefai ◽ ...

Keyword(s): Dimensionality Reduction ◽ Patient Outcomes ◽ Viral Diversity ◽ Whole Genome ◽ Nanopore Sequencing ◽ Viral Genomes ◽ Genomic Epidemiology ◽ Hospital Acquired ◽ Viral Lineage ◽ Adult Hospital

The first confirmed case of COVID-19 in Quebec, Canada, occurred at Verdun Hospital on February 25, 2020. A month later, a localized outbreak was observed at this hospital. We performed tiled amplicon whole genome nanopore sequencing on nasopharyngeal swabs from all SARS-CoV-2 positive samples from 31 March to 17 April 2020 in 2 local hospitals to assess the viral diversity of the outbreak. We report 264 viral genomes from 242 individuals (both staff and patients) with associated clinical features and outcomes, as well as longitudinal samples, technical replicates and the first publicly disseminated SARS-CoV-2 genomes in Quebec. Viral lineage assessment identified multiple subclades in both hospitals, with a predominant subclade in the Verdun outbreak, indicative of hospital-acquired transmission. Dimensionality reduction identified two subclades that evaded supervised lineage assignment methods, including Pangolin, and identified certain symptoms (headache, myalgia and sore throat) that are significantly associated with favorable patient outcomes. We also address certain limitations of standard SARS-CoV-2 bioinformatics procedures, notably when presented with multiple viral haplotypes.

Genomic Epidemiology of SARS-CoV-2 in Madrid, Spain, during the First Wave of the Pandemic: Fast Spread and Early Dominance by D614G Variants

Microorganisms ◽ 10.3390/microorganisms9020454 ◽ 2021 ◽ Vol 9 (2) ◽ pp. 454

Author(s): Esther Viedma ◽ Elias Dahdouh ◽ José González-Alba ◽ Sara González-Bodi ◽ Laura Martínez-García ◽ ...

Keyword(s): Air Transportation ◽ European Population ◽ Viral Population ◽ Rt Pcr ◽ Viral Genomes ◽ Genomic Epidemiology ◽ International Transport ◽ Polymerase Chain ◽ Phylodynamic Analysis

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was first detected in Madrid, Spain, on 25 February 2020. It increased in frequency very fast and by the end of May more than 70,000 cases had been confirmed by reverse transcription-polymerase chain reaction (RT-PCR). To study the lineages and the diversity of the viral population during this first epidemic wave in Madrid we sequenced 224 SARS-CoV-2 viral genomes collected from three hospitals from February to May 2020. All the known major lineages were found in this set of samples, though B.1 and B.1.5 were the most frequent ones, accounting for more than 60% of the sequences. In parallel with the B lineages and sublineages, the D614G mutation in the Spike protein sequence was detected soon after the detection of the first coronavirus disease 19 (COVID-19) case in Madrid and in two weeks became dominant, being found in 80% of the samples and remaining at this level during all the study periods. The lineage composition of the viral population found in Madrid was more similar to the European population than to the publicly available Spanish data, underlining the role of Madrid as a national and international transport hub. In agreement with this, phylodynamic analysis suggested multiple independent entries before the national lockdown and air transportation restrictions.

Genomic epidemiology reveals multiple introductions of SARS-CoV-2 followed by community and nosocomial spread, Germany, February to May 2020

Eurosurveillance ◽ 10.2807/1560-7917.es.2021.26.43.2002066 ◽ 2021 ◽ Vol 26 (43)

Author(s): Maximilian Muenchhoff ◽ Alexander Graf ◽ Stefan Krebs ◽ Caroline Quartucci ◽ Sandra Hasmann ◽ ...

Keyword(s): Healthcare Workers ◽ Sequence Data ◽ Phylogenetic Analyses ◽ Local Level ◽ University Hospital ◽ Metropolitan Region ◽ Viral Genomes ◽ Genomic Epidemiology ◽ Viral Spread ◽ Spatio Temporal

Background: In the SARS-CoV-2 pandemic, viral genomes are available at unprecedented speed, but spatio-temporal bias in genome sequence sampling precludes phylogeographical inference without additional contextual data.

Aim: We applied genomic epidemiology to trace SARS-CoV-2 spread on an international, national and local level, to illustrate how transmission chains can be resolved to the level of a single event and single person using integrated sequence data and spatio-temporal metadata.

Methods: We investigated 289 COVID-19 cases at a university hospital in Munich, Germany, between 29 February and 27 May 2020. Using the ARTIC protocol, we obtained near full-length viral genomes from 174 SARS-CoV-2-positive respiratory samples. Phylogenetic analyses using the Auspice software were employed in combination with anamnestic reporting of travel history, interpersonal interactions and perceived high-risk exposures among patients and healthcare workers to characterise cluster outbreaks and establish likely scenarios and timelines of transmission.

Results: We identified multiple independent introductions in the Munich Metropolitan Region during the first weeks of the first pandemic wave, mainly by travellers returning from popular skiing areas in the Alps. In these early weeks, the rate of presumable hospital-acquired infections among patients and in particular healthcare workers was high (9.6% and 54%, respectively) and we illustrated how transmission chains can be dissected at high resolution combining virus sequences and spatio-temporal networks of human interactions.

Conclusions: Early spread of SARS-CoV-2 in Europe was catalysed by superspreading events and regional hotspots during the winter holiday season. Genomic epidemiology can be employed to trace viral spread and inform effective containment strategies.

Raw Sewage Harbors Diverse Viral Populations

mBio ◽ 10.1128/mbio.00180-11 ◽ 2011 ◽ Vol 2 (5) ◽ Cited By ~ 171

Author(s): Paul G. Cantalupo ◽ Byron Calgua ◽ Guoyan Zhao ◽ Ayalkibet Hundesa ◽ Adam D. Wier ◽ ...

Keyword(s): Genetic Material ◽ Viral Diversity ◽ Emerging Pathogens ◽ Viral Genomes ◽ Untreated Wastewater ◽ Virus Diversity ◽ Double Stranded Dna ◽ Large Numbers ◽ Ideal System ◽ Species Specific

ABSTRACT: At this time, about 3,000 different viruses are recognized, but metagenomic studies suggest that these viruses are a small fraction of the viruses that exist in nature. We have explored viral diversity by deep sequencing nucleic acids obtained from virion populations enriched from raw sewage. We identified 234 known viruses, including 17 that infect humans. Plant, insect, and algal viruses as well as bacteriophages were also present. These viruses represented 26 taxonomic families and included viruses with single-stranded DNA (ssDNA), double-stranded DNA (dsDNA), positive-sense ssRNA [ssRNA(+)], and dsRNA genomes. Novel viruses that could be placed in specific taxa represented 51 different families, making untreated wastewater the most diverse viral metagenome (genetic material recovered directly from environmental samples) examined thus far. However, the vast majority of sequence reads bore little or no sequence relation to known viruses and thus could not be placed into specific taxa. These results show that the vast majority of the viruses on Earth have not yet been characterized. Untreated wastewater provides a rich matrix for identifying novel viruses and for studying virus diversity.

IMPORTANCE: At this time, virology is focused on the study of a relatively small number of viral species. Specific viruses are studied either because they are easily propagated in the laboratory or because they are associated with disease. The lack of knowledge of the size and characteristics of the viral universe and the diversity of viral genomes is a roadblock to understanding important issues, such as the origin of emerging pathogens and the extent of gene exchange among viruses. Untreated wastewater is an ideal system for assessing viral diversity because virion populations from large numbers of individuals are deposited and because raw sewage itself provides a rich environment for the growth of diverse host species and thus their viruses. These studies suggest that the viral universe is far more vast and diverse than previously suspected.

Novel viral genomes identified from six metagenomes reveal wide distribution of archaeal viruses and high viral diversity in terrestrial hot springs

Environmental Microbiology ◽ 10.1111/1462-2920.13079 ◽ 2015 ◽ Vol 18 (3) ◽ pp. 863-874 ◽ Cited By ~ 38

Author(s): Sóley Ruth Gudbergsdóttir ◽ Peter Menzel ◽ Anders Krogh ◽ Mark Young ◽ Xu Peng

Keyword(s): Hot Springs ◽ Wide Distribution ◽ Viral Diversity ◽ Viral Genomes ◽ Archaeal Viruses ◽ Terrestrial Hot Springs

377. SARS-CoV-2 Genomic Surveillance Reveals Little Spread Between a Large University Campus and the Surrounding Community

Open Forum Infectious Diseases ◽ 10.1093/ofid/ofab466.578 ◽ 2021 ◽ Vol 8 (Supplement_1) ◽ pp. S290-S290

Author(s): Andrew Valesano ◽ William Fitzsimmons ◽ Christopher Blair ◽ Robert Woods ◽ Julie Gilbert ◽ ...

Keyword(s): Phylogenetic Analysis ◽ Panel Member ◽ Illumina Miseq ◽ Clinical Microbiology Laboratory ◽ University Of Michigan ◽ Viral Genomes ◽ Genomic Epidemiology ◽ Ann Arbor ◽ Washtenaw County ◽ The University

Background: Understanding SARS-CoV-2 transmission dynamics is critical for controlling and preventing outbreaks. The genomic epidemiology of SARS-CoV-2 on college campuses has not been comprehensively studied, and the extent to which campus-associated outbreaks lead to transmission in nearby communities is unclear. We used high-density genomic surveillance to track SARS-CoV-2 transmission across the University of Michigan-Ann Arbor campus and Washtenaw County during the Fall 2020 semester.

Methods: We retrieved all available residual diagnostic specimens from the Michigan Medicine Clinical Microbiology Laboratory and University Health Service that were positive for SARS-CoV-2 from August 16th – November 25th, 2020 (n = 2245). We extracted viral RNA, amplified the SARS-CoV-2 genome by multiplex RT-PCR, and sequenced these amplicons on an Illumina MiSeq. We applied maximum likelihood phylogenetic analysis to whole genome sequences to define and characterize transmission lineages.

Results: We assembled complete viral genomes from 1659 individual infections, representing roughly 25% of confirmed cases in Washtenaw County across the fall semester. Of these cases, 468 were University of Michigan students. Phylogenetic analysis revealed 203 genetically distinct introductions of SARS-CoV-2 into the student population, most of which were singletons (n = 171) or small clusters of 2 – 8 students. We identified two large SARS-CoV-2 transmission lineages (115 and 73 students, respectively), including individuals from multiple on-campus residences. Viral descendants of these student outbreaks were rare, constituting less than 4% of cases in the community.

Conclusion: We identified many SARS-CoV-2 transmission introductions into the University of Michigan campus in Fall 2020. While there was widespread transmission among students, there is little evidence that these outbreaks significantly contributed to the rise in COVID-19 cases that Washtenaw County experienced in November 2020.

Disclosures: Adam Lauring, MD, PhD, Roche (Advisor or Review Panel member) Sanofi (Consultant)

Coast-to-coast spread of SARS-CoV-2 in the United States revealed by genomic epidemiology

10.1101/2020.03.25.20043828 ◽ 2020 ◽ Cited By ~ 12

Author(s): Joseph R. Fauver ◽ Mary E. Petrone ◽ Emma B. Hodcroft ◽ Kayoko Shioda ◽ Hanna Y. Ehrlich ◽ ...

Keyword(s): United States ◽ Pacific Northwest ◽ The United States ◽ International Travel ◽ Viral Genomes ◽ Genomic Epidemiology ◽ The Pacific ◽ Travel Restrictions ◽ Novel Coronavirus ◽ The U.S

Summary: Since its emergence and detection in Wuhan, China in late 2019, the novel coronavirus SARS-CoV-2 has spread to nearly every country around the world, resulting in hundreds of thousands of infections to date. The virus was first detected in the Pacific Northwest region of the United States in January, 2020, with subsequent COVID-19 outbreaks detected in all 50 states by early March. To uncover the sources of SARS-CoV-2 introductions and patterns of spread within the U.S., we sequenced nine viral genomes from early reported COVID-19 patients in Connecticut. Our phylogenetic analysis places the majority of these genomes with viruses sequenced from Washington state. By coupling our genomic data with domestic and international travel patterns, we show that early SARS-CoV-2 transmission in Connecticut was likely driven by domestic introductions. Moreover, the risk of domestic importation to Connecticut exceeded that of international importation by mid-March regardless of our estimated impacts of federal travel restrictions. This study provides evidence for widespread, sustained transmission of SARS-CoV-2 within the U.S. and highlights the critical need for local surveillance.

CoVizu: Rapid analysis and visualization of the global diversity of SARS-CoV-2 genomes

10.1101/2021.07.20.453079 ◽ 2021

Author(s): Roux-Cil Ferreira ◽ Emmanuel Wong ◽ Gopi Gugan ◽ Kaitlyn Wade ◽ Molly Liu ◽ ...

Keyword(s): Horizontal Line ◽ Consensus Tree ◽ New Approach ◽ Genomic Epidemiology ◽ Large Numbers ◽ Rapid Accumulation ◽ Global Spread ◽ Incomplete Coverage ◽ Collection Date ◽ Strict Molecular Clock

Phylogenetics has played a pivotal role in the genomic epidemiology of SARS-CoV-2, such as tracking the emergence and global spread of variants, and scientific communication. However, the rapid accumulation of genomic data from around the world - with over two million genomes currently available in the GISAID database - is testing the limits of standard phylogenetic methods. Here, we describe a new approach to rapidly analyze and visualize large numbers of SARS-CoV-2 genomes. Using Python, genomes are filtered for problematic sites, incomplete coverage, and excessive divergence from a strict molecular clock. All differences from the reference genome, including indels, are extracted using minimap2, and compactly stored as a set of features for each genome. For each Pango lineage (https://cov-lineages.org), we collapse genomes with identical features into 'variants', generate 100 bootstrap samples of the feature set union to generate weights, and compute the symmetric differences between the weighted feature sets for every pair of variants. The resulting distance matrices are used to generate neighbor-joining trees in RapidNJ and converted into a majority-rule consensus tree for the lineage. Branches with support values below 50% or mean lengths below 0.5 differences are collapsed, and tip labels on affected branches are mapped to internal nodes as directly-sampled ancestral variants. Currently, we process about 1.6 million genomes in approximately nine hours on 34 cores. The resulting trees are visualized using the JavaScript framework D3.js as 'beadplots', in which variants are represented by horizontal line segments, annotated with beads representing samples by collection date. Variants are linked by vertical edges to represent branches in the consensus tree. These visualizations are published at https://filogeneti.ca/CoVizu. All source code was released under an MIT license at https://github.com/PoonLab/covizu.
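Two of the steps in this abstract, collapsing genomes with identical features into variants and measuring the symmetric difference between variant feature sets, can be sketched in a few lines. This is a simplified illustration, not the CoVizu code: it omits the bootstrap weighting entirely (every feature counts as 1), and the function names and the `(type, position, alt)` feature tuples are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def collapse_variants(genome_features):
    """Group genome ids that share an identical feature set.

    `genome_features` maps a genome id to an iterable of hashable
    features; the (type, position, alt) tuples below are illustrative.
    """
    variants = defaultdict(list)
    for genome_id, features in genome_features.items():
        variants[frozenset(features)].append(genome_id)
    return dict(variants)


def symmetric_difference_distances(variants):
    """Unweighted pairwise distances: the number of features found in
    exactly one of the two variants."""
    keys = list(variants)
    distances = {}
    for i, j in combinations(range(len(keys)), 2):
        distances[(i, j)] = len(keys[i] ^ keys[j])
    return keys, distances


genomes = {
    "A": [("sub", 241, "T")],
    "B": [("sub", 241, "T")],
    "C": [("sub", 241, "T"), ("sub", 23403, "G")],
    "D": [],
}
variants = collapse_variants(genomes)
keys, distances = symmetric_difference_distances(variants)
print(len(variants), distances)
```

Genomes A and B collapse into a single variant, so pairwise distances are computed over three variants rather than four genomes; at the scale described in the abstract, this collapsing step is what shrinks each distance matrix.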

Genomic diversity of SARS-CoV-2 in Oxford during United Kingdom’s first national lockdown

Scientific Reports ◽ 10.1038/s41598-021-01022-x ◽ 2021 ◽ Vol 11 (1)

Author(s): Altar M. Munis ◽ Monique Andersson ◽ Alexander Mobbs ◽ Stephen C. Hyde ◽ Deborah R. Gill

Keyword(s): Large Scale ◽ Genomic Diversity ◽ Viral Diversity ◽ Host Immune System ◽ Infection Rates ◽ Short Term ◽ Viral Genomes ◽ The United Kingdom ◽ Local Spread ◽ Term Evolution

Abstract: Epidemiological efforts to model the spread of SARS-CoV-2, the virus that causes COVID-19, are crucial to understanding and containing current and future outbreaks and to inform public health responses. Mutations that occur in viral genomes can alter virulence during outbreaks by increasing infection rates and helping the virus evade the host immune system. To understand the changes in viral genomic diversity and molecular epidemiology in Oxford during the first wave of infections in the United Kingdom, we analyzed 563 clinical SARS-CoV-2 samples via whole-genome sequencing using Nanopore MinION sequencing. Large-scale surveillance efforts during viral epidemics are likely to be confounded by the number of independent introductions of the viral strains into a region. To avoid such issues and better understand the selection-based changes occurring in the SARS-CoV-2 genome, we utilized local isolates collected during the UK’s first national lockdown whereby personal interactions, international and national travel were considerably restricted and controlled. We were able to track the short-term evolution of the virus, detect the emergence of several mutations of concern or interest, and capture the viral diversity of the region. Overall, these results demonstrate genomic pathogen surveillance efforts have considerable utility in controlling the local spread of the virus.
