Reanalysis of deep-sequencing data from Austria points towards a small SARS-CoV-2 transmission bottleneck on the order of one to three virions

2021 ◽  
Author(s):  
Michael A. Martin ◽  
Katia Koelle

An early analysis of SARS-CoV-2 deep-sequencing data that combined epidemiological and genetic data to characterize the transmission dynamics of the virus in and beyond Austria concluded that the size of the virus’s transmission bottleneck was large – on the order of 1000 virions. We performed new computational analyses using these deep-sequenced samples from Austria. Our analyses included characterization of transmission bottleneck sizes across a range of variant calling thresholds and examination of patterns of shared low-frequency variants between transmission pairs in cases where de novo genetic variation was present in the recipient. From these and other analyses, we found that SARS-CoV-2 transmission bottlenecks are instead likely to be very tight, on the order of 1-3 virions. These findings have important consequences for understanding how SARS-CoV-2 evolves between hosts and the processes shaping genetic variation observed at the population level.
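A minimal sketch of one step described above, assuming hypothetical per-sample variant tables (a dictionary mapping (position, alternate base) to within-host allele frequency): counting low-frequency variants shared between a donor and a recipient at a given variant-calling threshold.

def shared_low_frequency_variants(donor_freqs, recipient_freqs,
                                  threshold=0.03, consensus=0.5):
    """Variants present in the donor above `threshold` but below consensus
    that are also detected above `threshold` in the recipient."""
    donor_minor = {v for v, f in donor_freqs.items() if threshold <= f < consensus}
    recipient_called = {v for v, f in recipient_freqs.items() if f >= threshold}
    return donor_minor & recipient_called

# Toy example; raising the threshold typically removes spurious "shared"
# variants that reflect sequencing noise rather than transmitted diversity.
donor = {(11083, "T"): 0.08, (21137, "G"): 0.02}
recipient = {(11083, "T"): 0.06}
print(shared_low_frequency_variants(donor, recipient, threshold=0.03))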

2018 ◽  
Author(s):  
Dimitrios Kleftogiannis ◽  
Marco Punta ◽  
Anuradha Jayaram ◽  
Shahneen Sandhu ◽  
Stephen Q. Wong ◽  
...  

Abstract
Background: Targeted deep sequencing is a highly effective technology to identify known and novel single nucleotide variants (SNVs) with many applications in translational medicine, disease monitoring and cancer profiling. However, identification of SNVs using deep sequencing data is a challenging computational problem as different sequencing artifacts limit the analytical sensitivity of SNV detection, especially at low variant allele frequencies (VAFs).
Methods: To address the problem of relatively high noise levels in amplicon-based deep sequencing data (e.g. with the Ion AmpliSeq technology) in the context of SNV calling, we have developed a new bioinformatics tool called AmpliSolve. AmpliSolve uses a set of normal samples to model position-specific, strand-specific and nucleotide-specific background artifacts (noise), and deploys a Poisson model-based statistical framework for SNV detection.
Results: Our tests on both synthetic and real data indicate that AmpliSolve achieves a good trade-off between precision and sensitivity, even at VAFs below 5% and as low as 1%. We further validate AmpliSolve by applying it to the detection of SNVs in 96 circulating tumor DNA samples at three clinically relevant genomic positions and compare the results to digital droplet PCR experiments.
Conclusions: AmpliSolve is a new tool for in-silico estimation of background noise and for detection of low frequency SNVs in targeted deep sequencing data. Although AmpliSolve has been specifically designed for and tested on amplicon-based libraries sequenced with the Ion Torrent platform it can, in principle, be applied to other sequencing platforms as well. AmpliSolve is freely available at https://github.com/dkleftogi/AmpliSolve.
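A minimal sketch of the general idea, not AmpliSolve's actual implementation: a background error rate for a given position, strand and nucleotide, estimated from normal samples, serves as the rate of a Poisson null model against which an observed alternate-allele count is tested. Function names and inputs here are illustrative assumptions.

from scipy.stats import poisson

def snv_pvalue(alt_reads, depth, background_error_rate):
    """P-value that `alt_reads` out of `depth` reads arise from background
    noise alone, under a Poisson model with mean depth * error rate."""
    expected_noise = depth * background_error_rate
    # Probability of observing at least this many alternate reads by chance.
    return poisson.sf(alt_reads - 1, expected_noise)

# Example: 25 alternate reads at 5,000x depth with a 0.1% background error rate.
print(snv_pvalue(25, 5000, 0.001))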


2015 ◽  
Author(s):  
Stefano Lonardi ◽  
Hamid Mirebrahim ◽  
Steve Wanamaker ◽  
Matthew Alpert ◽  
Gianfranco Ciardo ◽  
...  

Since the invention of DNA sequencing in the seventies, computational biologists have had to deal with the problem of de novo genome assembly with limited (or insufficient) depth of sequencing. In this work, we investigate for the first time the opposite problem, that is, the challenge of dealing with excessive depth of sequencing. Specifically, we explore the effect of ultra-deep sequencing data in two domains: (i) the problem of decoding reads to BAC clones (in the context of the combinatorial pooling design proposed by our group), and (ii) the problem of de novo assembly of BAC clones. Using real ultra-deep sequencing data, we show that when the depth of sequencing increases over a certain threshold, sequencing errors make these two problems harder and harder (instead of easier, as one would expect with error-free data), and as a consequence the quality of the solution degrades with more and more data. For the first problem, we propose an effective solution based on "divide and conquer": we "slice" a large dataset into smaller samples of optimal size, decode each slice independently, then merge the results. Experimental results on over 15,000 barley BACs and over 4,000 cowpea BACs demonstrate a significant improvement in the quality of the decoding and the final assembly. For the second problem, we show for the first time that modern de novo assemblers cannot take advantage of ultra-deep sequencing data.
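A minimal sketch of the "divide and conquer" idea, not the authors' code: the ultra-deep read set is partitioned into slices of roughly the chosen size, each slice is decoded independently by a hypothetical decode_slice function (returning a read-to-BAC assignment), and the per-slice results are merged.

import random

def slice_decode_merge(reads, slice_size, decode_slice):
    """Split `reads` into slices of about `slice_size`, decode each slice
    independently, and merge the per-slice read-to-BAC-clone assignments."""
    random.shuffle(reads)                      # slices should be random samples
    merged = {}
    for i in range(0, len(reads), slice_size):
        merged.update(decode_slice(reads[i:i + slice_size]))  # {read_id: bac_clone}
    return merged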


BMC Genomics ◽  
2015 ◽  
Vol 16 (1) ◽  
Author(s):  
Richard J Orton ◽  
Caroline F Wright ◽  
Marco J Morelli ◽  
David J King ◽  
David J Paton ◽  
...  

2017 ◽  
Author(s):  
Ashley Sobel Leonard ◽  
Daniel Weissman ◽  
Benjamin Greenbaum ◽  
Elodie Ghedin ◽  
Katia Koelle

Abstract
The bottleneck governing infectious disease transmission describes the size of the pathogen population transferred from a donor to a recipient host. Accurate quantification of the bottleneck size is of particular importance for rapidly evolving pathogens such as influenza virus, as narrow bottlenecks would limit the extent of transferred viral genetic diversity and, thus, have the potential to slow the rate of viral adaptation. Previous studies have estimated the transmission bottleneck size governing viral transmission through statistical analyses of variants identified in pathogen sequencing data. The methods used by these studies, however, did not account for variant calling thresholds and stochastic dynamics of the viral population within recipient hosts. Because these factors can skew bottleneck size estimates, we here introduce a new method for inferring transmission bottleneck sizes that explicitly takes these factors into account. We compare our method, based on beta-binomial sampling, with existing methods in the literature for their ability to recover the transmission bottleneck size of a simulated dataset. This comparison demonstrates that the beta-binomial sampling method is best able to accurately infer the simulated bottleneck size. We then apply our method to a recently published dataset of influenza A H1N1p and H3N2 infections, for which viral deep sequencing data from inferred donor-recipient transmission pairs are available. Our results indicate that transmission bottleneck sizes across transmission pairs are variable, yet that there is no significant difference in the overall bottleneck sizes inferred for H1N1p and H3N2. The mean bottleneck size for influenza virus in this study, considering all transmission pairs, was Nb = 196 (95% confidence interval 66-392) virions. While this estimate is consistent with previous bottleneck size estimates for this dataset, it is considerably higher than the bottleneck sizes estimated for influenza from other datasets.
Author Summary
The transmission bottleneck size describes the size of the pathogen population transferred from the donor to recipient host at the onset of infection and is a key factor in determining the rate at which a pathogen can adapt within a host population. Recent advances in sequencing technology have enabled the bottleneck size to be estimated from pathogen sequence data, though there is not yet a consensus on the statistical method to use. In this study, we introduce a new approach for inferring the transmission bottleneck size from sequencing data that accounts for the criteria used to identify sequence variants and stochasticity in pathogen replication dynamics. We show that the failure to account for these factors may lead to underestimation of the transmission bottleneck size. We apply this method to a previous dataset of human influenza A infections, showing that transmission is governed by a loose transmission bottleneck and that the bottleneck size is highly variable across transmission events. This work advances our understanding of the bottleneck size governing influenza infection and introduces a method for estimating the bottleneck size that can be applied to other rapidly evolving RNA viruses, such as norovirus and RSV.
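A minimal sketch of the beta-binomial likelihood idea for a single donor variant, not the authors' published code: the number of founding virions carrying the variant is binomial in the bottleneck size, the founder frequency is approximated by a beta distribution, and the recipient's read counts are then beta-binomially distributed. The grid search over bottleneck sizes and the exact handling of variant loss/fixation are simplifying assumptions.

import numpy as np
from scipy.stats import binom, betabinom

def variant_log_likelihood(p, x, n, Nb):
    """Log-likelihood of x variant reads out of n in the recipient, given
    donor frequency p and a transmission bottleneck of Nb virions."""
    like = 0.0
    for k in range(Nb + 1):              # k = founding virions carrying the variant
        w = binom.pmf(k, Nb, p)
        if k == 0:                       # variant lost at transmission
            like += w * (1.0 if x == 0 else 0.0)
        elif k == Nb:                    # variant fixed among the founders
            like += w * (1.0 if x == n else 0.0)
        else:                            # founder frequency ~ Beta(k, Nb - k)
            like += w * betabinom.pmf(x, n, k, Nb - k)
    return np.log(like + 1e-300)

def estimate_bottleneck(variants, max_Nb=200):
    """Grid search for the Nb maximizing the joint log-likelihood, where
    `variants` is a list of (donor_freq, recipient_alt_reads, recipient_depth)."""
    grid = list(range(1, max_Nb + 1))
    ll = [sum(variant_log_likelihood(p, x, n, Nb) for p, x, n in variants)
          for Nb in grid]
    return grid[int(np.argmax(ll))]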


2021 ◽  
Author(s):  
Daniele Ramazzotti ◽  
Davide Maspero ◽  
Fabrizio Angaroni ◽  
Marco Antoniotti ◽  
Rocco Piazza ◽  
...  

In defining effective strategies to counter the worldwide spread of SARS-CoV-2, maximum effort must be devoted to the early detection of dangerous variants. The analysis of deep sequencing data of viral samples, which are typically discarded after the creation of consensus sequences, offers effective support to this end. Indeed, only with deep sequencing data is it possible to identify intra-host low-frequency mutations, which are a direct footprint of the mutational processes that may eventually lead to the emergence of functionally advantageous variants. Accordingly, a timely and statistically robust identification of such mutations could inform public-health decision-making significantly earlier than standard analyses based on consensus sequences. To support our claim, we present the largest study to date of SARS-CoV-2 deep sequencing data, involving 220,788 high-quality samples collected over 20 months from 137 distinct studies. Importantly, we show that a substantial number of spike and nucleocapsid mutations of interest associated with the most widely circulating variants, including Beta, Delta and Omicron, might have been intercepted several months in advance, possibly leading to different public-health decisions. In addition, we show that a refined genomic surveillance system tracking both high- and low-frequency mutations might allow one to pinpoint potentially dangerous emerging mutation patterns, providing data-driven automated support to epidemiologists and virologists.
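A minimal sketch of the surveillance idea, not the authors' framework: for each mutation, track the fraction of deep-sequenced samples per month in which it appears only as an intra-host minor variant (here, illustratively, variant allele frequency between 0.05 and 0.5), so that mutations whose low-frequency prevalence rises before they ever reach consensus can be flagged. Thresholds and input format are assumptions.

from collections import defaultdict

def monthly_low_frequency_prevalence(samples, low=0.05, high=0.5):
    """`samples` is an iterable of (month, {mutation: vaf}) per sample; returns,
    for each mutation, the per-month fraction of samples carrying it only at
    low frequency."""
    totals = defaultdict(int)
    low_hits = defaultdict(lambda: defaultdict(int))
    for month, vafs in samples:
        totals[month] += 1
        for mutation, vaf in vafs.items():
            if low <= vaf < high:
                low_hits[mutation][month] += 1
    return {mut: {mo: n / totals[mo] for mo, n in per_month.items()}
            for mut, per_month in low_hits.items()}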


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Gundula Povysil ◽  
Monika Heinzl ◽  
Renato Salazar ◽  
Nicholas Stoler ◽  
Anton Nekrutenko ◽  
...  

Abstract
Duplex sequencing is currently the most reliable method to identify ultra-low-frequency DNA variants by grouping sequence reads derived from the same DNA molecule into families with information on the forward and reverse strands. However, only a small proportion of reads are assembled into duplex consensus sequences (DCS), and reads with potentially valuable information are discarded at different steps of the bioinformatics pipeline, especially reads without a family. We developed a bioinformatics toolset that analyses tag and family composition with the aim of understanding data loss and implementing modifications that maximize the data output for variant calling. Specifically, our tools show that tags contain polymerase chain reaction and sequencing errors that contribute to data loss and lower DCS yields. Our tools also identified chimeras, which likely reflect barcode collisions. Finally, we developed a tool that re-examines variant calls from raw reads and provides summary data that categorizes the confidence level of a variant call through a tier-based system. With this tool, we can include reads without a family and check the reliability of the call, which substantially increases the sequencing depth available for variant calling, a particularly important advantage for low-input samples or low-coverage regions.
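A minimal sketch of duplex read grouping, not the published toolset: reads are grouped into families by their tag pair, a single-strand consensus is called per family, and families with complementary tag order (coming from the opposite strand of the same original molecule) are paired into a duplex consensus in which only bases confirmed on both strands are kept. Tag parsing and the consensus rule are simplified assumptions.

from collections import Counter, defaultdict

def single_strand_consensus(seqs):
    """Per-position majority base across the same-length reads of one family."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def duplex_consensus(reads, min_family_size=3):
    """`reads` is an iterable of (tag_alpha, tag_beta, sequence) tuples."""
    families = defaultdict(list)
    for alpha, beta, seq in reads:
        families[(alpha, beta)].append(seq)
    sscs = {tag: single_strand_consensus(seqs)
            for tag, seqs in families.items() if len(seqs) >= min_family_size}
    duplexes = {}
    for (alpha, beta), cons in sscs.items():
        partner = sscs.get((beta, alpha))          # family from the opposite strand
        if partner is not None and (beta, alpha) not in duplexes:
            # Mask positions where the two single-strand consensuses disagree.
            duplexes[(alpha, beta)] = "".join(a if a == b else "N"
                                              for a, b in zip(cons, partner))
    return duplexes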


2012 ◽  
Vol 5 (1) ◽  
pp. 338
Author(s):  
Sharon Ben-Zvi ◽  
Adi Givati ◽  
Noam Shomron

2017 ◽  
Vol 26 ◽  
pp. 1-11 ◽  
Author(s):  
Molly M. Rathbun ◽  
Jennifer A. McElhoe ◽  
Walther Parson ◽  
Mitchell M. Holland
