Rapid incidence estimation from SARS-CoV-2 genomes reveals decreased case detection in Europe during summer 2020

Abstract By May 2021, over 160 million SARS-CoV-2 diagnoses had been reported worldwide. Yet, the true number of infections is unknown and believed to exceed the reported numbers several-fold. National testing policies, in particular, can strongly affect the proportion of undetected cases. Here, we propose a novel method (GInPipe) that reconstructs SARS-CoV-2 incidence profiles within minutes, solely from publicly available, time-stamped viral genomes. We validated GInPipe against in silico generated outbreak data and elaborate phylodynamic analyses. We apply the method to reconstruct incidence histories from sequence data for Denmark, Scotland, Switzerland, and Victoria (Australia). GInPipe reconstructs the different pandemic waves robustly and with remarkable accuracy. We demonstrate how the method can be used to investigate the effects of changing testing policies on the probability of diagnosing and reporting infected individuals. Specifically, we find that under-reporting was highest in mid-2020 in parts of Europe, coinciding with changes towards more liberal testing policies at times of low testing capacity. Given the increased use of real-time sequencing, GInPipe is envisaged to complement established surveillance tools to monitor the SARS-CoV-2 pandemic. We anticipate that the method will be particularly useful in settings where diagnostic and reporting infrastructures are insufficient. In ‘post-pandemic’ times, when diagnostic efforts are decreased, GInPipe may facilitate the detection of hidden infection dynamics.

Rapid incidence estimation from SARS-CoV-2 genomes reveals decreased case detection in Europe during summer 2020

Nature Communications ◽

10.1038/s41467-021-26267-y ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Maureen Rebecca Smith ◽

Maria Trofimova ◽

Ariane Weber ◽

Yannick Duport ◽

Denise Kühnert ◽

...

Keyword(s):

Real Time ◽

Sequence Data ◽

Considerable Proportion ◽

Case Detection ◽

Viral Genomes ◽

Case Ascertainment ◽

Infection Dynamics ◽

Incidence Estimation ◽

Testing Policies

Abstract By October 2021, 230 million SARS-CoV-2 diagnoses had been reported. Yet, a considerable proportion of cases remains undetected. Here, we propose GInPipe, a method that rapidly reconstructs SARS-CoV-2 incidence profiles solely from publicly available, time-stamped viral genomes. We validate GInPipe against simulated outbreaks and elaborate phylodynamic analyses. Using available sequence data, we reconstruct incidence histories for Denmark, Scotland, Switzerland, and Victoria (Australia) and demonstrate how to use the method to investigate the effects of changing testing policies on case ascertainment. Specifically, we find that under-reporting was highest during summer 2020 in Europe, coinciding with more liberal testing policies at times of low testing capacity. Given the increased use of real-time sequencing, GInPipe is envisaged to complement established surveillance tools to monitor the SARS-CoV-2 pandemic. In post-pandemic times, when diagnostic efforts are decreasing, GInPipe may facilitate the detection of hidden infection dynamics.
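
The abstract links under-reporting to the gap between reported and genome-reconstructed incidence. One way to quantify that gap can be sketched as follows. The sketch assumes (our assumption, not the paper's definition) that both series share the same time bins and that the reconstruction is only known up to a constant scale factor. The function `relative_ascertainment` and the toy numbers are hypothetical.

```python
from typing import List


def relative_ascertainment(reported: List[float], reconstructed: List[float]) -> List[float]:
    """Share of reported cases in each time bin divided by that bin's share of
    reconstructed incidence.

    Both series are normalised to sum to 1, because this sketch assumes the
    genome-based reconstruction is only known up to a constant scale factor.
    Values below 1 mark bins in which reporting lagged behind the reconstructed
    infection dynamics; values above 1 mark relatively better case detection.
    """
    if len(reported) != len(reconstructed):
        raise ValueError("both series need the same number of time bins")
    total_reported = sum(reported)
    total_reconstructed = sum(reconstructed)
    if total_reported <= 0 or total_reconstructed <= 0:
        raise ValueError("both series need a positive total")
    ratios = []
    for rep, rec in zip(reported, reconstructed):
        share_reconstructed = rec / total_reconstructed
        if share_reconstructed == 0:
            ratios.append(float("nan"))  # undefined where no incidence was reconstructed
        else:
            ratios.append((rep / total_reported) / share_reconstructed)
    return ratios


# Toy numbers for illustration only, not data from the study.
weekly_reported = [100, 50, 20, 150]
weekly_reconstructed = [1.0, 1.0, 1.0, 1.0]
print(relative_ascertainment(weekly_reported, weekly_reconstructed))
# → [1.25, 0.625, 0.25, 1.875]
```

Normalising both series avoids having to know absolute infection numbers, at the cost of only detecting *relative* changes in case detection over the period analysed.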

Rapid incidence estimation from SARS-CoV-2 genomes reveals decreased case detection in Europe during summer 2020

10.1101/2021.05.14.21257234 ◽

2021 ◽

Author(s):

Maureen Rebecca Smith ◽

Maria Trofimova ◽

Ariane Weber ◽

Yannick Duport ◽

Denise Kühnert ◽

...

Keyword(s):

Sequence Data ◽

Large Data ◽

Relative Magnitude ◽

Large Data Sets ◽

Data Sets ◽

Sequencing Data ◽

A Genome ◽

Incidence Estimation ◽

Automated Pipeline ◽

Testing Policies

In May 2021, over 160 million SARS-CoV-2 infections had been reported worldwide. Yet, the true number of infections is unknown and believed to exceed the reported numbers several-fold, depending on national testing policies that can strongly affect the proportion of undetected cases. To overcome this testing bias and better assess SARS-CoV-2 transmission dynamics, we propose a genome-based computational pipeline, GInPipe, to reconstruct SARS-CoV-2 incidence dynamics through time. After validating GInPipe against in silico generated outbreak data, as well as more complex phylodynamic analyses, we use the pipeline to reconstruct incidence histories in Denmark, Scotland, Switzerland, and Victoria (Australia) solely from viral sequence data. The proposed method robustly reconstructs the different pandemic waves in the investigated countries and regions, does not require phylodynamic reconstruction, and can be directly applied to publicly deposited SARS-CoV-2 sequencing data sets. We observe differences in the relative magnitude of reconstructed versus reported incidences during times with sparse availability of diagnostic tests. Using the reconstructed incidence dynamics, we assess how testing policies may have affected the probability of diagnosing and reporting infected individuals. We find that under-reporting was highest in mid-2020 in all analysed countries, coinciding with liberal testing policies at times of low test capacity. Given the increased use of real-time sequencing, GInPipe is envisaged to complement established surveillance tools to monitor the SARS-CoV-2 pandemic and evaluate testing policies. The method executes within minutes on very large data sets and is freely available as a fully automated pipeline from https://github.com/KleistLab/GInPipe.
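
The abstract does not say how time-stamped genomes are turned into an incidence signal. As an illustration only, the sketch below bins sequences by sampling date and computes Watterson's θ per bin (number of segregating sites divided by the harmonic number a_n). This is a standard diversity estimator that we substitute here; it is not claimed to be GInPipe's estimator. The helper names and the 7-day bin width are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple


def harmonic(n: int) -> float:
    """Watterson's normalising constant a_n = sum_{i=1}^{n-1} 1/i."""
    return sum(1.0 / i for i in range(1, n))


def watterson_theta(seqs: List[str]) -> float:
    """Watterson's theta, S / a_n, for equal-length aligned sequences."""
    n = len(seqs)
    if n < 2:
        return 0.0
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    segregating = 0
    for pos in range(length):
        # A site segregates if more than one base is observed; 'N' and gaps are ignored.
        bases = {s[pos] for s in seqs} - {"N", "-"}
        if len(bases) > 1:
            segregating += 1
    return segregating / harmonic(n)


def theta_per_bin(samples: List[Tuple[date, str]], start: date, bin_days: int = 7) -> Dict[int, float]:
    """Group time-stamped sequences into fixed-width bins and estimate theta per bin."""
    bins: Dict[int, List[str]] = defaultdict(list)
    for sampled_on, seq in samples:
        bins[(sampled_on - start).days // bin_days].append(seq)
    return {b: watterson_theta(seqs) for b, seqs in sorted(bins.items())}


# Hypothetical toy alignment; real input would be a dated multiple sequence alignment.
samples = [
    (date(2020, 6, 1), "ACGT"),
    (date(2020, 6, 2), "ACGA"),
    (date(2020, 6, 3), "ACTT"),
    (date(2020, 6, 9), "ACGT"),
    (date(2020, 6, 10), "ACGT"),
]
print(theta_per_bin(samples, start=date(2020, 6, 1)))
```

For the toy data, the first week has three sequences with two segregating sites (θ = 2 / 1.5), and the second week has two identical sequences (θ = 0). Turning such a per-bin signal into incidence would need further modelling assumptions that are not sketched here.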

Whole genome characterization of strains belonging to the Ralstonia solanacearum species complex and in silico analysis of TaqMan assays for detection in this heterogenous species complex

European Journal of Plant Pathology ◽

10.1007/s10658-020-02190-8 ◽

2021 ◽

Author(s):

Viola Kurm ◽

Ilse Houwers ◽

Claudia E. Coipan ◽

Peter Bonants ◽

Cees Waalwijk ◽

...

Keyword(s):

Ralstonia Solanacearum ◽

In Silico ◽

Species Complex ◽

Sequence Data ◽

In Silico Analysis ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequences ◽

Pcr Assays

Abstract Identification and classification of members of the Ralstonia solanacearum species complex (RSSC) is challenging due to the heterogeneity of this complex. Whole genome sequence data of 225 strains were used to classify strains based on average nucleotide identity (ANI) and multilocus sequence analysis (MLSA). Based on the ANI score (>95%), 191 out of 192 (99.5%) RSSC strains could be grouped into the three species R. solanacearum, R. pseudosolanacearum, and R. syzygii, and into the four phylotypes within the RSSC (I, II, III, and IV). R. solanacearum phylotype II could be split into two groups (IIA and IIB), of which IIB clustered into three subgroups (IIBa, IIBb and IIBc). This division by ANI was in accordance with MLSA. The IIB subgroups found by ANI and MLSA also differed in the number of SNPs in the primer and probe sites of various assays. An in-silico analysis of eight TaqMan and 11 conventional PCR assays was performed using the whole genome sequences. Based on this analysis, several potential false positives or false negatives can be expected when these assays are used for their intended target organisms. Two TaqMan assays and two PCR assays targeting the 16S rDNA sequence should be able to detect all phylotypes of the RSSC. We conclude that the increasing availability of whole genome sequences is not only useful for the classification of strains, but also shows potential for the selection and evaluation of clade-specific nucleic acid-based amplification methods within the RSSC.
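
Grouping strains by an ANI cut-off can be sketched as single-linkage clustering over pairwise scores. The abstract gives only the >95% threshold, so several details here are assumptions: the linkage rule, the function name `group_by_ani`, and the toy scores. The pairwise ANI values are taken as precomputed.

```python
from typing import Dict, List, Tuple


def group_by_ani(
    strains: List[str],
    ani: Dict[Tuple[str, str], float],
    threshold: float = 95.0,
) -> List[List[str]]:
    """Single-linkage grouping: strains joined by any pair with ANI above threshold."""
    parent = {s: s for s in strains}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in ani.items():
        if score > threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups: Dict[str, List[str]] = {}
    for s in strains:
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values(), key=lambda g: g[0])


# Hypothetical strain names and ANI percentages, for illustration only.
strains = ["A", "B", "C", "D"]
ani = {("A", "B"): 97.2, ("B", "C"): 93.0, ("C", "D"): 96.5, ("A", "C"): 92.8}
print(group_by_ani(strains, ani))
# → [['A', 'B'], ['C', 'D']]
```

Under single linkage, one above-threshold pair is enough to merge two groups. Chains of borderline scores can therefore join distinct taxa. Requiring every pair in a group to pass the threshold would be a stricter, different design choice.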

Nextstrain: real-time tracking of pathogen evolution

10.1101/224048 ◽

2017 ◽

Cited By ~ 21

Author(s):

James Hadfield ◽

Colin Megill ◽

Sidney M. Bell ◽

John Huddleston ◽

Barney Potter ◽

...

Keyword(s):

Public Health ◽

Real Time ◽

Web Application ◽

Sequence Data ◽

Data Types ◽

Bioinformatics Pipeline ◽

Public Health Importance ◽

Viral Genomes ◽

Effective Public Health ◽

Interactive Visualisation

Abstract Summary: Understanding the spread and evolution of pathogens is important for effective public health measures and surveillance. Nextstrain consists of a database of viral genomes, a bioinformatics pipeline for phylodynamic analysis, and an interactive visualisation platform. Together these present a real-time view into the evolution and spread of a range of viral pathogens of high public health importance. The visualisation integrates sequence data with other data types such as geographic information, serology, or host species. Nextstrain compiles our current understanding into a single accessible location, publicly available for use by health professionals, epidemiologists, virologists and the public alike. Availability and implementation: All code (predominantly JavaScript and Python) is freely available from github.com/nextstrain and the web application is available at nextstrain.org.

Gattaca: Base pair resolution mutation tracking for somatic evolution studies using agent-based models

10.1101/2021.11.08.467784 ◽

2021 ◽

Author(s):

Ryan O Schenck ◽

Gabriel Brosula ◽

Jeffrey West ◽

Simon Leedham ◽

Darryl Shibata ◽

...

Keyword(s):

Base Pair ◽

In Silico ◽

Sequence Data ◽

Agent Based Modeling ◽

Sequence Coverage ◽

Agent Based ◽

Coverage Error ◽

Somatic Evolution ◽

User Friendly ◽

Mutation Spectra

Gattaca provides the first base-pair resolution artificial genomes for tracking somatic mutations within agent-based modeling. By incorporating human reference genomes, mutational context, and sequence coverage/error information, Gattaca can realistically generate sequence data for in-silico evolution studies that is comparable with data from human somatic evolution studies. This user-friendly method, incorporated into each in-silico cell, allows us to fully capture somatic mutation spectra and evolution.
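
The abstract does not describe Gattaca's internals. The sketch below only illustrates the general idea of base-pair resolution mutation tracking in an agent-based model. Each simulated cell carries an explicit list of (position, reference base, alternate base) mutations, and daughters inherit that list on division. Several simplifications are hypothetical: a Poisson mutation rate, uniform site choice (no mutational context, coverage, or sequencing error), and all names used.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple

# (0-based position, reference base, alternate base)
Mutation = Tuple[int, str, str]


@dataclass(frozen=True)
class Cell:
    mutations: Tuple[Mutation, ...] = ()


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's Poisson sampler; adequate for the small per-division rates used here."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def divide(cell: Cell, reference: str, rate: float, rng: random.Random) -> List[Cell]:
    """Each daughter inherits the parent's mutations and gains Poisson(rate) new ones."""
    daughters = []
    for _ in range(2):
        new: List[Mutation] = []
        for _ in range(poisson(rng, rate)):
            # Uniform site choice: no mutational context; recurrent hits are not collapsed.
            pos = rng.randrange(len(reference))
            ref = reference[pos]
            alt = rng.choice([b for b in "ACGT" if b != ref])
            new.append((pos, ref, alt))
        daughters.append(Cell(cell.mutations + tuple(new)))
    return daughters


def simulate(reference: str, generations: int, rate: float, seed: int = 0) -> List[Cell]:
    """Synchronous binary divisions starting from one unmutated cell."""
    rng = random.Random(seed)
    population = [Cell()]
    for _ in range(generations):
        population = [d for cell in population for d in divide(cell, reference, rate, rng)]
    return population


# Hypothetical 100 bp reference and mutation rate, for illustration only.
reference = "ACGT" * 25
cells = simulate(reference, generations=3, rate=1.0, seed=1)
print(len(cells), [len(c.mutations) for c in cells])
```

Because every mutation is stored with its genomic position, a cell's genome can be reconstructed at base-pair resolution by applying its mutation list to the reference. Sampling reads from these genomes is the step where coverage and error models would come in.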

Genomic epidemiology reveals multiple introductions of SARS-CoV-2 followed by community and nosocomial spread, Germany, February to May 2020

Eurosurveillance ◽

10.2807/1560-7917.es.2021.26.43.2002066 ◽

2021 ◽

Vol 26 (43) ◽

Author(s):

Maximilian Muenchhoff ◽

Alexander Graf ◽

Stefan Krebs ◽

Caroline Quartucci ◽

Sandra Hasmann ◽

...

Keyword(s):

Healthcare Workers ◽

Sequence Data ◽

Phylogenetic Analyses ◽

Local Level ◽

University Hospital ◽

Metropolitan Region ◽

Viral Genomes ◽

Genomic Epidemiology ◽

Viral Spread ◽

Spatio Temporal

Background In the SARS-CoV-2 pandemic, viral genomes are available at unprecedented speed, but spatio-temporal bias in genome sequence sampling precludes phylogeographical inference without additional contextual data. Aim We applied genomic epidemiology to trace SARS-CoV-2 spread on an international, national and local level, to illustrate how transmission chains can be resolved to the level of a single event and single person using integrated sequence data and spatio-temporal metadata. Methods We investigated 289 COVID-19 cases at a university hospital in Munich, Germany, between 29 February and 27 May 2020. Using the ARTIC protocol, we obtained near full-length viral genomes from 174 SARS-CoV-2-positive respiratory samples. Phylogenetic analyses using the Auspice software were employed in combination with anamnestic reporting of travel history, interpersonal interactions and perceived high-risk exposures among patients and healthcare workers to characterise cluster outbreaks and establish likely scenarios and timelines of transmission. Results We identified multiple independent introductions in the Munich Metropolitan Region during the first weeks of the first pandemic wave, mainly by travellers returning from popular skiing areas in the Alps. In these early weeks, the rate of presumable hospital-acquired infections among patients and in particular healthcare workers was high (9.6% and 54%, respectively) and we illustrated how transmission chains can be dissected at high resolution combining virus sequences and spatio-temporal networks of human interactions. Conclusions Early spread of SARS-CoV-2 in Europe was catalysed by superspreading events and regional hotspots during the winter holiday season. Genomic epidemiology can be employed to trace viral spread and inform effective containment strategies.

VGEA: an RNA viral assembly toolkit

PeerJ ◽

10.7717/peerj.12129 ◽

2021 ◽

Vol 9 ◽

pp. e12129

Author(s):

Paul E. Oluniyi ◽

Fehintola Ajogbasile ◽

Judith Oguzie ◽

Jessica Uwanibe ◽

Adeyemi Kayode ◽

...

Keyword(s):

De Novo ◽

Sequence Data ◽

Workflow Management ◽

Viral Population ◽

Lassa Virus ◽

Viral Genomes ◽

Bioinformatics Tools ◽

Reference Sequences ◽

Genome Assemblies

Next generation sequencing (NGS)-based studies have vastly increased our understanding of viral diversity. Viral sequence data obtained from NGS experiments are a rich source of information; these data can be used to study viral epidemiology, evolution, and transmission patterns, and can also inform drug and vaccine design. Viral genomes, however, represent a great challenge to bioinformatics because of their high mutation rate and their tendency to form quasispecies within the same infected host. This calls for advanced bioinformatics tools to assemble consensus genomes that are representative of the viral population circulating in individual patients. Many tools have been developed to preprocess sequencing reads, carry out de novo or reference-assisted assembly of viral genomes, and assess the quality of the genomes obtained. However, most of these tools exist as standalone workflows and usually require substantial computational resources. Here we present VGEA (Viral Genomes Easily Analyzed), a Snakemake workflow for analyzing RNA viral genomes. VGEA enables users to map sequencing reads to the human genome to remove human contaminants, split bam files into forward and reverse reads, carry out de novo assembly of forward and reverse reads to generate contigs, pre-process reads for quality and contamination, map reads to a reference tailored to the sample using corrected contigs supplemented by the user’s choice of reference sequences, and evaluate/compare genome assemblies. We designed the project with the aim of creating a flexible, easy-to-use, all-in-one pipeline from existing standalone bioinformatics tools for viral genome analysis that can be deployed on a personal computer. 
VGEA was built on the Snakemake workflow management system and uses existing tools for each step: fastp (Chen et al., 2018) for read trimming and read-level quality control, BWA (Li & Durbin, 2009) for mapping sequencing reads to the human reference genome, SAMtools (Li et al., 2009) for extracting unmapped reads and for splitting bam files into fastq files, IVA (Hunt et al., 2015) for de novo assembly to generate contigs, shiver (Wymant et al., 2018) to pre-process reads for quality and contamination and then map them to a reference tailored to the sample using corrected contigs supplemented with the user’s choice of existing reference sequences, SeqKit (Shen et al., 2016) for cleaning the shiver assembly before QUAST, QUAST (Gurevich et al., 2013) to assess the quality of genome assemblies, and MultiQC (Ewels et al., 2016) for aggregating the results from fastp, BWA and QUAST. Our pipeline was successfully tested and validated with SARS-CoV-2 (n = 20), HIV-1 (n = 20) and Lassa virus (n = 20) datasets, all of which have been made publicly available. VGEA is freely available on GitHub at https://github.com/pauloluniyi/VGEA under the GNU General Public License.
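
Snakemake derives a job graph from the inputs and outputs declared by each rule. The sketch below mirrors that idea with Python's standard `graphlib` (Python 3.9+). It wires the tools listed above into a dependency graph and prints one valid execution order. The step names and edges are our reading of the abstract, not VGEA's actual Snakefile.

```python
from graphlib import TopologicalSorter

# Each (hypothetical) step maps to the steps whose outputs it consumes.
steps = {
    "fastp_trim_qc": set(),
    "bwa_map_to_human": {"fastp_trim_qc"},
    "samtools_extract_unmapped": {"bwa_map_to_human"},  # also splits bam into fastq
    "iva_de_novo_assembly": {"samtools_extract_unmapped"},
    # shiver uses both the cleaned reads and the IVA contigs.
    "shiver_tailored_reference": {"iva_de_novo_assembly", "samtools_extract_unmapped"},
    "seqkit_clean_assembly": {"shiver_tailored_reference"},
    "quast_assembly_qc": {"seqkit_clean_assembly"},
    # MultiQC aggregates reports from fastp, BWA and QUAST.
    "multiqc_report": {"fastp_trim_qc", "bwa_map_to_human", "quast_assembly_qc"},
}

order = list(TopologicalSorter(steps).static_order())
for i, step in enumerate(order, start=1):
    print(i, step)
```

Expressing the workflow as a graph, rather than a fixed script, is what lets a workflow manager rerun only the steps whose inputs changed and run independent branches in parallel.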

Porcine Teschoviruses Comprise at Least Eleven Distinct Serotypes: Molecular and Evolutionary Aspects

Journal of Virology ◽

10.1128/jvi.75.4.1620-1631.2001 ◽

2001 ◽

Vol 75 (4) ◽

pp. 1620-1631 ◽

Cited By ~ 85

Author(s):

Roland Zell ◽

Malte Dauber ◽

Andi Krumbholz ◽

Andreas Henke ◽

Eckhard Birch-Hirschfeld ◽

...

Keyword(s):

Neutralizing Antibodies ◽

Sequence Data ◽

Nucleotide Sequencing ◽

Genome Region ◽

Viral Genomes ◽

Field Isolates ◽

Porcine Enterovirus ◽

Group I ◽

Close Relationship ◽

Serological Data

ABSTRACT Nucleotide sequencing and phylogenetic analysis of 10 recognized prototype strains of the porcine enterovirus (PEV) cytopathic effect (CPE) group I reveal a close relationship of the viral genomes to the previously sequenced strain F65, supporting the concept of a reclassification of this virus group into a new picornavirus genus. In addition, nucleotide sequences of the polyprotein-encoding genome region or the P1 region of 28 historic strains and recent field isolates were determined. The data suggest that several closely related but antigenically and molecularly distinct serotypes constitute one species within the proposed genus Teschovirus. Based on sequence and serological data, we propose a new serotype with strain Dresden as prototype. This hitherto unrecognized serotype is closely related to porcine teschovirus 1 (PTV-1, formerly PEV-1) but induces type-specific neutralizing antibodies. Sequencing of field isolates collected from animals presenting with neurological disorders shows that serotypes other than PTV-1 can also cause polioencephalomyelitis of swine.

In Silico Analysis of Evolution in Swine Flu Viral Genomes Through Re-assortment by Promulgation and Mutation

Biotechnology (Faisalabad) ◽

10.3923/biotech.2009.434.441 ◽

2009 ◽

Vol 8 (4) ◽

pp. 434-441 ◽

Cited By ~ 6

Author(s):

S. Sur ◽

G. Sen ◽

S. Thakur ◽

A.K. Bothra ◽

A. Sen

Keyword(s):

In Silico ◽

In Silico Analysis ◽

Swine Flu ◽

Viral Genomes ◽

Silico Analysis

The global and local distribution of RNA structure throughout the SARS-CoV-2 genome

Journal of Virology ◽

10.1128/jvi.02190-20 ◽

2020 ◽

Cited By ~ 2

Author(s):

Rafael de Cesaris Araujo Tavares ◽

Gandhar Mahadeshwar ◽

Han Wan ◽

Nicholas C. Huston ◽

Anna Marie Pyle

Keyword(s):

In Silico ◽

Rna Structure ◽

Drug Targets ◽

Rna Folding ◽

Rna Viruses ◽

Viral Agent ◽

Rna Structures ◽

Viral Rnas ◽

Viral Genomes ◽

Rna Genome

SARS-CoV-2 is the causative viral agent of COVID-19, the disease at the center of the current global pandemic. While knowledge of highly structured regions is integral for mechanistic insights into the viral infection cycle, very little is known about the location and folding stability of functional elements within the massive, ∼30 kb SARS-CoV-2 RNA genome. In this study, we analyze the folding stability of this RNA genome relative to the structural landscape of other well-known viral RNAs. We present an in-silico pipeline to predict regions of high base pair content across long genomes and to pinpoint hotspots of well-defined RNA structures, a method that allows direct comparisons of RNA structural complexity across the domains of the SARS-CoV-2 genome. We report that the propensity of the SARS-CoV-2 genome for stable RNA folding is exceptional among RNA viruses, surpassing even that of HCV, one of the most structured viral RNAs in nature. Furthermore, our analysis suggests varying levels of RNA structure across genomic functional regions, with accessory and structural ORFs containing the highest structural density in the viral genome. Finally, we examine how individual RNA structures formed by these ORFs are affected by differences between genomic and subgenomic contexts. Given the technical difficulty of experimentally separating sgRNA from gRNA in cellular mixtures, this is a unique advantage of our in-silico pipeline. The resulting findings provide a useful roadmap for planning focused empirical studies of SARS-CoV-2 RNA biology, and a preliminary guide for exploring potential SARS-CoV-2 RNA drug targets. Importance The RNA genome of SARS-CoV-2 is among the largest and most complex viral genomes, and yet its RNA structural features remain relatively unexplored. 
Since RNA elements guide function in most RNA viruses and represent potential drug targets, it is essential to chart the architectural features of SARS-CoV-2 and pinpoint regions that merit focused study. Here we show that the RNA folding stability of the SARS-CoV-2 genome is exceptional among viral genomes, and we develop a method to directly compare levels of predicted secondary structure across SARS-CoV-2 domains. Remarkably, we find that coding regions display the highest structural propensity in the genome, forming motifs that differ between the genomic and subgenomic contexts. Our approach provides an attractive strategy to rapidly screen for candidate structured regions based on base-pairing potential and provides a readily interpretable roadmap to guide functional studies of RNA viruses and other pharmacologically relevant RNA transcripts.
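
The abstract mentions an in-silico pipeline that predicts regions of high base-pair content across long genomes, but does not detail it. As a simplified stand-in, the sketch below scans a sequence in sliding windows. It scores each window by the Nussinov maximum number of nested base pairs, counting Watson-Crick plus G-U wobble pairs with a minimum hairpin loop of 3. This is a substitute technique, not the authors' method. Thermodynamic folding tools such as ViennaRNA's RNAfold model stability far more realistically, and the window and step sizes here are hypothetical.

```python
from typing import List, Tuple

# Watson-Crick and G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs, with hairpin loops of at least min_loop bases."""
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for seq[i..j] (inclusive); spans too short to pair stay 0.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # case 1: base i is unpaired
            for k in range(i + min_loop + 1, j + 1):  # case 2: i pairs with k
                if (seq[i], seq[k]) in PAIRS:
                    inside = dp[i + 1][k - 1]
                    outside = dp[k + 1][j] if k < j else 0
                    best = max(best, inside + 1 + outside)
            dp[i][j] = best
    return dp[0][n - 1]


def pair_density(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """Fraction of bases paired (2 * pairs / window) for each window start."""
    rna = seq.upper().replace("T", "U")
    return [
        (start, 2 * nussinov_max_pairs(rna[start:start + window]) / window)
        for start in range(0, len(rna) - window + 1, step)
    ]


print(nussinov_max_pairs("GGGAAAUCCC"))  # → 3
print(pair_density("GGGAAATCCCAAAAAAAAAA", window=10, step=10))  # → [(0, 0.6), (10, 0.0)]
```

The dynamic programme is cubic in window length, which is why a genome-scale scan would use modest windows. Comparing per-window densities is what allows regions or domains to be ranked against each other.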
