High-throughput sequencing of SARS-CoV-2 in wastewater provides insights into circulating variants

Author(s):  
Rafaela S. Fontenele ◽  
Simona Kraberger ◽  
James Hadfield ◽  
Erin M. Driver ◽  
Devin Bowes ◽  
...  

Abstract. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged from a zoonotic spill-over event and has led to a global pandemic. The public health response has been predominantly informed by surveillance of symptomatic individuals and contact tracing, with quarantine and other preventive measures then applied to mitigate further spread. Non-traditional methods of surveillance such as genomic epidemiology and wastewater-based epidemiology (WBE) have also been leveraged during this pandemic. Genomic epidemiology uses high-throughput sequencing of SARS-CoV-2 genomes to inform local and international transmission events, as well as the diversity of circulating variants. WBE uses wastewater to analyse community spread, as it is known that SARS-CoV-2 is shed through bodily excretions. Since both symptomatic and asymptomatic individuals contribute to wastewater inputs, we hypothesized that the resultant pooled sample of population-wide excreta can provide a more comprehensive picture of SARS-CoV-2 genomic diversity circulating in a community than clinical testing and sequencing alone. In this study, we analysed 91 wastewater samples from 11 states in the USA, most of which represent Maricopa County, Arizona. With the objective of assessing the viral diversity at a population scale, we undertook a single-nucleotide variant (SNV) analysis on data from 52 samples with >90% SARS-CoV-2 genome coverage of sequence reads, and compared these SNVs with those detected in genomes sequenced from clinical patients. We identified 7973 SNVs, of which 5680 were “novel” SNVs that had not yet been identified in the global clinical-derived data as of 17th June 2020 (the day after our last wastewater sampling date). However, between 17th June 2020 and 20th November 2020, almost half of these novel SNVs were detected in clinical-derived data.
Using the combination of SNVs present in each sample, we identified the more probable lineages present in that sample and compared them to lineages observed in North America prior to our sampling dates. The wastewater-derived SARS-CoV-2 sequence data indicate that there were more lineages circulating across the sampled communities than represented in the clinical-derived data. Principal coordinate analyses identified patterns in population structure based on genetic variation within the sequenced samples, with clear trends associated with increased diversity, likely due to a higher number of infected individuals relative to the sampling dates. We demonstrate that genetic correlation analysis combined with SNV analysis using wastewater sampling can provide a comprehensive snapshot of the SARS-CoV-2 genetic population structure circulating within a community, which might not be observed if relying solely on clinical cases.
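The comparison of wastewater SNVs against clinical-derived SNVs described above can be pictured as a set difference. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the `SNV` key (position, reference base, alternate base), the function names, and the toy coordinates are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, NamedTuple, Set


class SNV(NamedTuple):
    """Hypothetical SNV key: genome position, reference base, alternate base."""
    pos: int
    ref: str
    alt: str


def novel_snvs(wastewater: Iterable[SNV], clinical: Iterable[SNV]) -> Set[SNV]:
    """SNVs seen in wastewater but absent from the clinical-derived set."""
    return set(wastewater) - set(clinical)


def fraction_later_seen(novel: Set[SNV], later_clinical: Iterable[SNV]) -> float:
    """Share of the novel SNVs that appear in a later clinical-derived set."""
    if not novel:
        return 0.0
    return len(novel & set(later_clinical)) / len(novel)


# Toy data only; these coordinates are invented and carry no biological meaning.
wastewater = [SNV(100, "A", "G"), SNV(200, "C", "T"), SNV(300, "G", "A")]
clinical_at_cutoff = [SNV(100, "A", "G")]
clinical_later = [SNV(100, "A", "G"), SNV(200, "C", "T")]

novel = novel_snvs(wastewater, clinical_at_cutoff)
later_share = fraction_later_seen(novel, clinical_later)
```

Keying on position plus both bases means two different substitutions at the same site count as distinct SNVs, which is one reasonable convention among several.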

2020 ◽  
Author(s):  
Brenda G. Díaz ◽  
Maria I. Zucchi ◽  
Alessandro Alves-Pereira ◽  
Caléo P. de Almeida ◽  
Aline C. L. Moraes ◽  
...  

Abstract. Acrocomia (Arecaceae) is a genus widely distributed in tropical and subtropical America that has been attracting economic interest because of the high oil-production potential of some of its species, in particular A. aculeata, which can supply oil with the same productive capacity as the oil palm even in areas with water deficit. Although eight species are recognized in the genus, the taxonomic classification based on morphology and geographic distribution is still controversial. Knowledge about the genetic diversity and population structure of the species is limited, which has hindered the understanding of the genetic relationships and the orientation of management, conservation, and genetic improvement activities for species of the genus. In the present study, we analyzed the genomic diversity and population structure of seven species of Acrocomia, including 117 samples of A. aculeata covering a wide geographical area of occurrence, using single nucleotide polymorphism (SNP) markers derived from genotyping by sequencing (GBS). The genetic structure of the Acrocomia species was partially congruent with the current taxonomic classification based on morphological characters, recovering the separation of the species A. aculeata, A. totai, A. crispa and A. intumescens as distinct taxonomic groups. However, the species A. media was attributed to the cluster of A. aculeata, while A. hassleri and A. glaucescens were grouped together with A. totai. The species that showed the highest and lowest genetic diversity were A. totai and A. media, respectively. When analyzed separately, the species A. aculeata showed a strong genetic structure, forming two genetic groups, the first represented mainly by genotypes from Brazil and the second by accessions from Central and North American countries. Greater genetic diversity was found in Brazil than in the other countries.
Our results on the genetic diversity of the genus are unprecedented and establish new insights into the genomic relationships between Acrocomia species. This is also the first study to provide a more global view of the genomic diversity of A. aculeata. We also highlight the applicability of genomic data as a reference for future studies on genetic diversity, taxonomy, evolution and phylogeny of the Acrocomia genus, as well as to support strategies for the conservation, exploration and breeding of Acrocomia species and in particular A. aculeata.


Forests ◽  
2019 ◽  
Vol 10 (8) ◽  
pp. 681 ◽  
Author(s):  
Huiquan Zheng ◽  
Dehuo Hu ◽  
Ruping Wei ◽  
Shu Yan ◽  
Runhui Wang

Knowledge of population diversity and structure is of fundamental importance for conifer breeding programs. In this study, we concentrated on the development and application of high-density single nucleotide polymorphism (SNP) markers through a high-throughput sequencing technique termed specific-locus amplified fragment sequencing (SLAF-seq) for the economically important conifer tree species Chinese fir (Cunninghamia lanceolata). Based on SLAF-seq, we successfully established a high-density SNP panel consisting of 108,753 genomic SNPs from Chinese fir. This SNP panel allowed us to gain insight into the genetic base of the Chinese fir advanced breeding population of 221 genotypes, covering its genetic variation, relationships, diversity, and population structure. Overall, the present population appears to have considerable genetic variability. Most (94.15%) of the variability was attributed to the genetic differentiation of genotypes, and only a very limited share (5.85%) of the variation occurred at the population (sub-origin set) level. Correspondingly, low FST values (0.0285–0.0990) were seen for the sub-origin sets. When viewing the genetic structure of the population regardless of its sub-origin sets, the present SNP data revealed a new population picture in which the advanced Chinese fir breeding population could be divided into four genetic sets, as evidenced by the phylogenetic tree and population structure analysis results, albeit with some differences in the membership of the corresponding sets (cluster vs. group). The data also suggested that all the genetic sets were admixed clades, revealing a complex relationship among the genotypes of this population. With a stepwise pruning procedure, we captured a core collection (core 0.650) harboring 143 genotypes that maintains all the alleles, diversity, and specific genetic structure of the whole population. This generalist core is valuable for the Chinese fir advanced breeding program and further genetic/genomic studies.


2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Yuka Torii ◽  
Kazuhiro Horiba ◽  
Satoshi Hayano ◽  
Taichi Kato ◽  
Takako Suzuki ◽  
...  

Abstract. Background: Kawasaki disease (KD) is an idiopathic systemic vasculitis that predominantly damages coronary arteries in children. Various pathogens have been investigated as triggers for KD, but no definitive causative pathogen has been determined. As KD is diagnosed by symptoms, several days are needed for diagnosis. Therefore, at the time of diagnosis of KD, the triggering pathogen may already be diminished. The aim of this study was to comprehensively explore pathogens in the sera at the acute stage of KD using high-throughput sequencing (HTS). Methods: Sera of 12 patients at an extremely early stage of KD and 12 controls were investigated. DNA and RNA sequences were read separately using HTS. Sequence data were imported into the home-brew metagenomic analysis pipeline, PATHDET, to identify the pathogen sequences. Results: No RNA virus reads were detected in any KD case except for those of equine infectious anemia virus, which is known as a contaminant of commercial reverse transcriptase. Concerning DNA viruses, human herpesvirus 6B (HHV-6B, two cases) and Anelloviridae (eight cases) were detected among KD cases as well as controls. Multiple bacterial reads were obtained from KD cases and controls. Bacteria of the genera Acinetobacter, Pseudomonas, Delftia, and Roseomonas and of the family Rhodocyclaceae appeared to be more common in KD sera than in the controls. Conclusion: No single pathogen was identified in serum samples of patients at the acute phase of KD. With multiple bacteria detected in the serum samples, it is difficult to exclude the possibility of contamination; however, it is possible that these bacteria might stimulate the immune system and induce KD.


2017 ◽  
Author(s):  
Gregory L. Owens ◽  
Marco Todesco ◽  
Emily B. M. Drummond ◽  
Sam Yeaman ◽  
Loren H. Rieseberg

Abstract. High-throughput sequencing using the Illumina HiSeq platform is a pervasive and critical molecular ecology resource, and has provided the data underlying many recent advances. A recent study has suggested that ‘index switching’, where reads are misattributed to the wrong sample, may be higher in new versions of the HiSeq platform. This has the potential to invalidate both published and in-progress work across the field. Here, we test for evidence of index switching in an exemplar whole genome shotgun dataset sequenced on both the Illumina HiSeq 2500, which should not have the problem, and the Illumina HiSeq X, which may. We leverage unbalanced heterozygotes, which may be produced by index switching, and ask whether the under-sequenced allele is more likely to be found in other samples in the same lane than expected based on the allele frequency. Although we validate the sensitivity of this method using simulations, we find that neither the HiSeq 2500 nor the HiSeq X shows evidence of index switching. This suggests that, thankfully, index switching may not be a ubiquitous problem in HiSeq X sequence data. Lastly, we provide scripts for applying our method so that index switching can be tested for in other datasets.
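The logic of comparing observed allele sharing within a lane against an allele-frequency expectation might be sketched as follows. This is an illustrative simplification and not the authors' published scripts: it assumes diploid samples, Hardy-Weinberg proportions, and independence between samples, and the function names and input layout are hypothetical.

```python
from typing import List, Tuple


def prob_allele_in_lane(allele_freq: float, n_other_samples: int) -> float:
    """Chance that at least one of the other diploid samples in the lane
    carries the allele, assuming independent draws at the given frequency."""
    return 1.0 - (1.0 - allele_freq) ** (2 * n_other_samples)


def observed_vs_expected(
    sites: List[Tuple[float, bool]], n_other_samples: int
) -> Tuple[int, float]:
    """Each site is (population allele frequency of the under-sequenced allele,
    whether that allele was found in another sample in the same lane).
    Returns the observed count of shared sites and the expected count."""
    observed = sum(1 for _, found in sites if found)
    expected = sum(prob_allele_in_lane(freq, n_other_samples) for freq, _ in sites)
    return observed, expected


# Toy input: two unbalanced heterozygous sites, one other sample in the lane.
obs, exp = observed_vs_expected([(0.5, True), (0.5, False)], n_other_samples=1)
```

An observed count well above the expected count would be the pattern consistent with index switching; judging whether an excess is significant would need a proper statistical test, which this sketch leaves out.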


2021 ◽  
Vol 17 (1) ◽  
pp. e1008678
Author(s):  
Carlos Valiente-Mullor ◽  
Beatriz Beamud ◽  
Iván Ansari ◽  
Carlos Francés-Cuesta ◽  
Neris García-González ◽  
...  

Mapping of high-throughput sequencing (HTS) reads to a single arbitrary reference genome is a frequently used approach in microbial genomics. However, the choice of a reference may represent a source of errors that may affect subsequent analyses such as the detection of single nucleotide polymorphisms (SNPs) and phylogenetic inference. In this work, we evaluated the effect of reference choice on short-read sequence data from five clinically and epidemiologically relevant bacteria (Klebsiella pneumoniae, Legionella pneumophila, Neisseria gonorrhoeae, Pseudomonas aeruginosa and Serratia marcescens). Publicly available whole-genome assemblies encompassing the genomic diversity of these species were selected as reference sequences, and read alignment statistics, SNP calling, recombination rates, dN/dS ratios, and phylogenetic trees were evaluated depending on the mapping reference. The choice of different reference genomes proved to have an impact on almost all the parameters considered in the five species. In addition, these biases had potential epidemiological implications such as including/excluding isolates of particular clades and the estimation of genetic distances. These findings suggest that the single reference approach might introduce systematic errors during mapping that affect subsequent analyses, particularly for data sets with isolates from genetically diverse backgrounds. In any case, exploring the effects of different references on the final conclusions is highly recommended.


2018 ◽  
Author(s):  
Johanna B. Holm ◽  
Michael S. Humphrys ◽  
Courtney K. Robinson ◽  
Matthew L. Settles ◽  
Sandra Ott ◽  
...  

Abstract. Amplification, sequencing and analysis of the 16S rRNA gene affords characterization of microbial community composition. As this tool has become more popular and amplicon-sequencing applications have grown in the total number of samples, growth in sample multiplexing is becoming necessary while maintaining high sequence quality and sequencing depth. Here, modifications to the Illumina HiSeq 2500 platform are described which produce greater multiplexing capabilities and 300 bp paired-end reads of higher quality than produced by the current Illumina MiSeq platform. To improve the feasibility and flexibility of this method, a 2-Step PCR amplification protocol is also described that allows for targeting of different amplicon regions, thus improving amplification success from low bacterial bioburden samples. Importance: Amplicon sequencing has become a popular and widespread tool for surveying microbial communities. Lower overall costs associated with high-throughput sequencing have made it a widely adopted approach, especially for projects which necessitate sample multiplexing to eliminate batch effects and reduce the time to acquire data. The method for amplicon sequencing on the Illumina HiSeq 2500 platform described here provides improved multiplexing capabilities while simultaneously producing higher-quality sequence data and a lower per-sample cost relative to the Illumina MiSeq platform, without sacrificing amplicon length. To make this method more flexible with respect to the amplicon regions targeted, as well as to improve amplification from low-biomass samples, we also present and validate a 2-Step PCR library preparation method.


2018 ◽  
Author(s):  
Devika Ganesamoorthy ◽  
Minh Duc Cao ◽  
Tania Duarte ◽  
Wenhan Chen ◽  
Lachlan Coin

Abstract. Background: Tandem repeats comprise a significant proportion of the human genome, including coding and regulatory regions. They are highly prone to repeat number variation and nucleotide mutation due to their repetitive and unstable nature, making them a major source of genomic variation between individuals. Despite recent advances in high-throughput sequencing, analysis of tandem repeats in the context of complex diseases is still hindered by technical limitations. Methods: We report a novel targeted sequencing approach, which allows simultaneous analysis of hundreds of repeats. We developed a Bayesian algorithm, GtTR, which combines information from a reference long-read dataset with a short-read counting approach to genotype tandem repeats at population scale. PCR sizing analysis was used for validation. Results: We used a PacBio long-read sequenced sample to generate a reference tandem repeat genotype dataset with on average 13% absolute deviation from PCR sizing results. Using this reference dataset, GtTR generated estimates of VNTR copy number with accuracy within 95% highest posterior density (HPD) intervals of 68% and 83% for capture sequence data and 200X WGS data respectively, improving to 87% and 94% with use of a PCR reference. We show that the genotype resolution increases as a function of depth, such that the median 95% HPD interval lies within 25%, 14%, 12% and 8% of its midpoint copy number value for 30X WGS, 200X WGS, 395X and 800X capture sequence data respectively. We validated nine targets by PCR sizing analysis, and genotype estimates from sequencing results correlated well with PCR results. Conclusions: The novel genotyping approach described here presents a new cost-effective method to explore a previously unrecognized class of repeat variation in GWAS studies of complex diseases at the population level. Further improvements in accuracy can be obtained by improving the accuracy of the reference dataset.
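For readers unfamiliar with HPD intervals, the sketch below shows one common way to estimate such an interval from posterior samples: take the narrowest window of sorted samples that contains the requested share of them. It is a generic illustration and not the GtTR implementation; the function names are hypothetical, and reading "within X% of its midpoint" as the half-width divided by the midpoint is an assumption.

```python
import math
from typing import Sequence, Tuple


def hpd_interval(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Narrowest interval containing at least `mass` of the samples
    (a sample-based estimate that suits unimodal posteriors)."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    k = max(1, math.ceil(mass * n))  # number of samples the interval must hold
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def relative_half_width(lo: float, hi: float) -> float:
    """Interval half-width as a fraction of the interval midpoint (assumed reading)."""
    midpoint = (lo + hi) / 2.0
    return (hi - lo) / 2.0 / midpoint


# Toy posterior samples of a repeat copy number; values are invented.
lo, hi = hpd_interval([10, 11, 12, 40], mass=0.75)
```

With these toy samples the narrowest window holding three of the four values is 10 to 12, so the outlier at 40 is excluded.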


2017 ◽  
Author(s):  
Darrell O. Ricke ◽  
Anna Shcherbina ◽  
Adam Michaleas ◽  
Philip Fremont-Smith

Abstract. High-throughput DNA sequencing technologies enable improved characterization of forensic DNA samples, allowing greater insights into DNA contributor(s). Current DNA forensics techniques rely upon allele sizing of short tandem repeats by capillary electrophoresis. High-throughput sequencing enables forensic sample characterizations for large numbers of single nucleotide polymorphism loci. The slowest computational component of the DNA forensics analysis pipeline is the characterization of raw sequence data. This paper optimizes the SNP calling module of the DNA analysis pipeline with runtime results that scale linearly with the number of HTS sequences (patent pending)[1]. GrigoraSNPs can analyze 100 million reads in less than 5 minutes using 3 threads on a 4.0 GHz Intel i7-6700K laptop CPU.


2020 ◽  
Author(s):  
Justin P. Shaffer ◽  
Clarisse Marotz ◽  
Pedro Belda-Ferre ◽  
Cameron Martino ◽  
Stephen Wandro ◽  
...  

Abstract. One goal among microbial ecology researchers is to capture the maximum amount of information from all organisms in a sample. The recent COVID-19 pandemic, caused by the RNA virus SARS-CoV-2, has highlighted a gap in traditional DNA-based protocols, including the high-throughput methods we previously established as field standards. To enable simultaneous SARS-CoV-2 and microbial community profiling, we compare the relative performance of two total nucleic acid extraction protocols and our previously benchmarked protocol. We included a diverse panel of environmental and host-associated sample types, including body sites commonly swabbed for COVID-19 testing. Here we present results comparing the cost, processing time, DNA and RNA yield, microbial community composition, limit of detection, and well-to-well contamination between these protocols. Accession numbers: Raw sequence data were deposited at the European Nucleotide Archive (accession#: ERP124610), and raw and processed data are available at Qiita (Study ID: 12201). All processing and analysis code is available on GitHub (github.com/justinshaffer/Extraction_test_MagMAX). Methods summary: To allow for downstream applications involving RNA-based organisms such as SARS-CoV-2, we compared the two extraction protocols designed to extract DNA and RNA against our previously established protocol for extracting only DNA for microbial community analyses. Across 10 diverse sample types, one of the two protocols was equivalent to or better than our established DNA-based protocol. Our conclusion is based on per-sample comparisons of DNA and RNA yield, the number of quality sequences generated, microbial community alpha- and beta-diversity and taxonomic composition, the limit of detection, and the extent of well-to-well contamination.

