Evaluation of NGS-based approaches for SARS-CoV-2 whole genome characterisation

2020 ◽  
Author(s):  
Caroline Charre ◽  
Christophe Ginevra ◽  
Marina Sabatier ◽  
Hadrien Regue ◽  
Grégory Destras ◽  
...  

Abstract Since the beginning of the COVID-19 outbreak, SARS-CoV-2 whole-genome sequencing (WGS) has been performed at an unprecedented rate worldwide with the use of very diverse Next Generation Sequencing (NGS) methods. Herein, we compare the performance of four NGS-based approaches for SARS-CoV-2 WGS. Twenty-four clinical respiratory samples with a wide range of Ct values (from 10.7 to 33.9) were sequenced with four methods. Three used Illumina sequencing: an in-house metagenomic NGS (mNGS) protocol and two newly commercialized kits including a hybridization capture method developed by Illumina (DNA Prep with Enrichment kit and Respiratory Virus Oligo Panel, RVOP) and an amplicon sequencing method developed by Paragon Genomics (CleanPlex SARS-CoV-2 kit). We also evaluated the widely used amplicon sequencing protocol developed by ARTIC Network and combined with Oxford Nanopore Technologies (ONT) sequencing. All four methods yielded near-complete genomes (>99%) for high viral load samples, with mNGS and RVOP producing the most complete genomes. For mid viral loads, 2/8 and 1/8 genomes were incomplete (<99%) with mNGS and both CleanPlex and RVOP, respectively. For low viral loads (Ct ≥25), amplicon-based enrichment methods were the most sensitive techniques, yielding complete genomes for 7/8 samples. All methods were highly concordant in terms of identity in complete consensus sequence. Just one mismatch in two samples was observed in CleanPlex vs the other methods, due to the dedicated bioinformatics pipeline setting a high threshold to call SNPs relative to the reference sequence. Importantly, all methods correctly identified a newly observed 34-nt deletion in ORF6 but required specific bioinformatic validation for RVOP. Finally, as a major warning for targeted techniques, a lack of coverage in any given region of the genome should alert to a potential rearrangement or a SNP in primer annealing or probe-hybridizing regions and would require regular updates of the technique according to SARS-CoV-2 evolution.

2020 ◽  
Vol 6 (2) ◽  
Author(s):  
Caroline Charre ◽  
Christophe Ginevra ◽  
Marina Sabatier ◽  
Hadrien Regue ◽  
Grégory Destras ◽  
...  

Abstract Since the beginning of the COVID-19 outbreak, SARS-CoV-2 whole-genome sequencing (WGS) has been performed at an unprecedented rate worldwide with the use of very diverse Next-Generation Sequencing (NGS) methods. Herein, we compare the performance of four NGS-based approaches for SARS-CoV-2 WGS. Twenty-four clinical respiratory samples with a wide range of Ct values (from 10.7 to 33.9) were sequenced with four methods. Three used Illumina sequencing: an in-house metagenomic NGS (mNGS) protocol and two newly commercialised kits including a hybridisation capture method developed by Illumina (DNA Prep with Enrichment kit and Respiratory Virus Oligo Panel, RVOP), and an amplicon sequencing method developed by Paragon Genomics (CleanPlex SARS-CoV-2 kit). We also evaluated the widely used amplicon sequencing protocol developed by ARTIC Network and combined with Oxford Nanopore Technologies (ONT) sequencing. All four methods yielded near-complete genomes (>99%) for high viral load samples (n = 8), with mNGS and RVOP producing the most complete genomes. For mid viral loads (Ct 20–25), amplicon-based enrichment methods led to genome coverage >99 per cent for all samples while 1/8 samples sequenced with RVOP and 2/8 samples sequenced with mNGS had a genome coverage below 99 per cent. For low viral loads (Ct ≥25), amplicon-based enrichment methods were the most sensitive techniques. All methods were highly concordant in terms of identity in complete consensus sequence. Just one mismatch in three samples was observed in CleanPlex vs the other methods, due to the dedicated bioinformatics pipeline setting a high threshold to call SNPs relative to the reference sequence. Importantly, all methods correctly identified a newly observed 34-nt deletion in ORF6 but required specific bioinformatic validation for RVOP. Finally, as a major warning for targeted techniques, a loss of coverage in any given region of the genome should alert to a potential rearrangement or a SNP in primer-annealing or probe-hybridising regions and would require further validation using unbiased metagenomic sequencing.
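The 99 per cent completeness cut-off used in the abstract above can be illustrated with a short sketch. It assumes a consensus sequence laid out in reference coordinates, with uncovered positions masked as `N`; the function names and the masking convention are hypothetical and are not taken from the study's pipelines.

```python
def genome_coverage(consensus: str, reference_length: int) -> float:
    """Percentage of reference positions carrying a called base (A, C, G or T).

    Assumes the consensus is in reference coordinates and that uncovered
    positions are masked with 'N' (an assumption, not the study's method).
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    called = sum(1 for base in consensus.upper() if base in "ACGT")
    return 100.0 * called / reference_length


def is_near_complete(consensus: str, reference_length: int,
                     threshold: float = 99.0) -> bool:
    """True when coverage exceeds the threshold (99% in the abstract)."""
    return genome_coverage(consensus, reference_length) > threshold


# Toy example: 995 called bases out of 1000 reference positions.
toy_consensus = "A" * 995 + "N" * 5
print(genome_coverage(toy_consensus, 1000))
print(is_near_complete(toy_consensus, 1000))
```

Counting only unambiguous bases is a deliberately strict choice; a pipeline that accepts IUPAC ambiguity codes as called positions would report slightly higher coverage.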


2019 ◽  
Author(s):  
Judit Szarvas ◽  
Johanne Ahrenfeldt ◽  
Jose Luis Bellod Cisneros ◽  
Martin Christen Frølund Thomsen ◽  
Frank M. Aarestrup ◽  
...  

Abstract Public health authorities whole-genome sequence thousands of pathogenic isolates each month for microbial diagnostics and surveillance of pathogenic bacteria. The computational methods have not kept up with the deluge of data and need for real-time results. We have therefore created a bioinformatics pipeline for rapid subtyping and continuous phylogenomic analysis of bacterial samples, suited for large-scale surveillance. To decrease the computational burden, a two-level clustering strategy is employed. The data is first divided into sets by matching each isolate to a closely related reference genome. The reads are then aligned to the reference to obtain a consensus sequence, and SNP-based genetic distance is calculated between the sequences in each set. Isolates are clustered together with a threshold of 10 SNPs. Finally, phylogenetic trees are inferred from the non-redundant sequences and the clustered isolates are placed on a clade with the cluster representative sequence. The method was benchmarked and found to be accurate in grouping outbreak strains together, while discriminating from non-outbreak strains. The pipeline was applied in Evergreen Online, which processes publicly available sequencing data from foodborne bacterial pathogens on a daily basis, updating the phylogenetic trees as needed. It has so far placed more than 100,000 isolates into phylogenies, and has been able to keep up with the daily release of data. The trees are continuously published on https://cge.cbs.dtu.dk/services/Evergreen
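The second clustering level described above (SNP distances within one reference set, grouped at a 10-SNP threshold) might look roughly like the following sketch. The greedy, representative-based assignment, the treatment of the threshold as inclusive, and all names are assumptions made for illustration; the abstract does not specify how the pipeline forms its clusters.

```python
def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two equal-length consensus sequences differ.

    Positions where either sequence lacks a called base (e.g. 'N') are
    skipped, which is an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("consensus sequences must share reference coordinates")
    return sum(
        1 for x, y in zip(seq_a, seq_b)
        if x != y and x in "ACGT" and y in "ACGT"
    )


def cluster_isolates(consensus_by_id: dict, threshold: int = 10) -> dict:
    """Greedily assign each isolate to the first representative within
    `threshold` SNPs; otherwise the isolate becomes a new representative.

    Returns a mapping: representative id -> list of member ids.
    """
    clusters = {}
    for isolate_id, seq in consensus_by_id.items():
        for rep_id in clusters:
            if snp_distance(seq, consensus_by_id[rep_id]) <= threshold:
                clusters[rep_id].append(isolate_id)
                break
        else:
            clusters[isolate_id] = [isolate_id]
    return clusters


# Toy data: iso2 differs from iso1 by 3 SNPs, iso3 by 20 SNPs.
isolates = {
    "iso1": "A" * 50,
    "iso2": "C" * 3 + "A" * 47,
    "iso3": "C" * 20 + "A" * 30,
}
print(cluster_isolates(isolates))
```

Only the representatives (here `iso1` and `iso3`) would then need to enter tree inference, which is the point of working with non-redundant sequences.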


Blood ◽  
2005 ◽  
Vol 106 (11) ◽  
pp. 605-605
Author(s):  
Marco A. Marra ◽  
Martin Krzywinski ◽  
Readman Chiu ◽  
Matthew Field ◽  
Inanc Birol ◽  
...  

Abstract With the aim of identifying and sequencing mutations in follicular lymphoma genomes, we have begun a project to generate at least 24 deeply redundant sequence-ready Bacterial Artificial Chromosome (BAC)-based whole genome maps, each from a different individual’s lymphoma. BAC-array CGH and Affymetrix whole-genome sampling assays (WGSA) will be used along with the mapping data to identify genomic amplifications and losses in the lymphomas. Results from the mapping and array studies will be used to prioritize BAC clones for sequence analysis. Because each map will span essentially the entire genome of the corresponding lymphoma, we anticipate that essentially all regions of each tumor genome will be represented in easily sequenced BAC clones. This approach facilitates targeted sequencing of genomic regions of interest, including those containing genes relevant to cancer or harboring amplifications or deletions. Our mapping strategy hinges on the successful creation of deeply redundant, high-quality BAC libraries from primary lymphomas and large-scale, high-throughput restriction enzyme fingerprinting of individual BACs with a version of the technology we used to map the human, mouse, rat and other genomes. The effort is large-scale, and will result in the generation of at least 2.5 million fingerprinted BAC clones over the next three years. Using the fingerprints, we will align the BACs to the reference human genome to assess genome coverage and to identify candidate genome rearrangements. In parallel, we will assemble the fingerprints into genome maps, looking for larger-scale genome variations between the lymphoma maps and the reference genome sequence. To test the feasibility of our approach, we obtained two restriction digest fingerprints from each of 140,000 individual BAC clones. BACs were sampled from a 7-fold redundant BAC library that had been created from genomic DNA purified from a primary follicular lymphoma sample.
The fingerprints are being assembled into a clone map with the intent of reconstructing the entire tumor genome. 90,377 fingerprinted clones with unambiguous single alignments to the reference sequence were automatically assembled into 15,538 contigs. Subsequent rounds of semi-automatic contig merging further reduced the number of contigs to 5,433. Only 1,241 clones remained unassembled. We anchored the tumor genome map to the reference human genome sequence by aligning the clone fingerprints to the restriction map computed from the reference sequence assembly. As a result of this, we identified a BAC that captured the canonical t(14;18) translocation characteristic of follicular lymphomas. We sequenced this BAC and confirmed that it contains the expected translocation. Almost 2.6 gigabases (~91%) of the reference genome are represented in the evolving map, with an additional 50,000 clone fingerprints awaiting incorporation into the map assembly. Among these are repeat-rich and other clones that may well harbor genome rearrangements. Additional prioritization of sequencing targets will be undertaken when map construction and analysis of genome copy number alterations are complete.


2019 ◽  
Vol 109 (3) ◽  
pp. 488-497 ◽  
Author(s):  
Sebastien Massart ◽  
Michela Chiumenti ◽  
Kris De Jonghe ◽  
Rachel Glover ◽  
Annelies Haegeman ◽  
...  

Recent developments in high-throughput sequencing (HTS), also called next-generation sequencing (NGS), technologies and bioinformatics have drastically changed research on viral pathogens and spurred growing interest in the field of virus diagnostics. However, the reliability of HTS-based virus detection protocols must be evaluated before adopting them for diagnostics. Many different bioinformatics algorithms aimed at detecting viruses in HTS data have been reported but little attention has been paid thus far to their sensitivity and reliability for diagnostic purposes. Therefore, we compared the ability of 21 plant virology laboratories, each employing a different bioinformatics pipeline, to detect 12 plant viruses through a double-blind large-scale performance test using 10 datasets of 21- to 24-nucleotide small RNA (sRNA) sequences from three different infected plants. The sensitivity of virus detection ranged between 35 and 100% among participants, with a marked negative effect when sequence depth decreased. The false-positive detection rate was very low and mainly related to the identification of host genome-integrated viral sequences or misinterpretation of the results. Reproducibility was high (91.6%). This work revealed the key influence of bioinformatics strategies for the sensitive detection of viruses in HTS sRNA datasets and, more specifically (i) the difficulty in detecting viral agents when they are novel or their sRNA abundance is low, (ii) the influence of key parameters at both assembly and annotation steps, (iii) the importance of completeness of reference sequence databases, and (iv) the significant level of scientific expertise needed when interpreting pipeline results. Overall, this work underlines key parameters and proposes recommendations for reliable sRNA-based detection of known and unknown viruses.


2018 ◽  
Author(s):  
Qian Liu ◽  
Daniela C. Georgieva ◽  
Dieter Egli ◽  
Kai Wang

Abstract Background Recent advances in single-molecule sequencing techniques, such as Nanopore sequencing, improved read length, increased sequencing throughput, and enabled direct detection of DNA modifications through the analysis of raw signals. These DNA modifications include naturally occurring modifications such as DNA methylations, as well as modifications that are introduced by DNA damage or through synthetic modifications to one of the four standard nucleotides. Methods To improve the performance of detecting DNA modifications, especially synthetically introduced modifications, we developed a novel computational tool called NanoMod. NanoMod takes raw signal data on a pair of DNA samples with and without modified bases, extracts signal intensities, performs base error correction based on a reference sequence, and then identifies bases with modifications by comparing the distribution of raw signals between two samples, while taking into account the effects of neighboring bases on modified bases (“neighborhood effects”). Results We evaluated NanoMod on simulation data sets, based on different types of modifications and different magnitudes of neighborhood effects, and found that NanoMod outperformed other methods in identifying known modified bases. Additionally, we demonstrated superior performance of NanoMod on an E. coli data set with 5mC (5-methylcytosine) modifications. Conclusions In summary, NanoMod is a flexible tool to detect DNA modifications with single-base resolution from raw signals in Nanopore sequencing, and will greatly facilitate large-scale functional genomics experiments in the future that use modified nucleotides.
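The idea of comparing raw-signal distributions at one reference position between a modified and an unmodified sample can be sketched with a two-sample Kolmogorov-Smirnov statistic. The choice of this statistic, the function name and the toy signal values are assumptions for illustration only; the abstract does not state which test NanoMod applies, and the sketch ignores neighborhood effects.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical cumulative distributions of the two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must contain at least one value")
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so checking
    # every observed value is sufficient.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical signal intensities at one position in each sample.
unmodified = [80.1, 81.4, 79.8, 80.6, 81.0]
modified = [86.2, 87.0, 85.5, 86.8, 87.3]
print(ks_statistic(unmodified, modified))
```

A statistic near 1 indicates almost non-overlapping signal distributions, while a value near 0 indicates that the two samples look alike at that position; turning it into a call would additionally need a significance threshold and multiple-testing control.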


2016 ◽  
Vol 94 (suppl_5) ◽  
pp. 146-146
Author(s):  
D. M. Bickhart ◽  
L. Xu ◽  
J. L. Hutchison ◽  
J. B. Cole ◽  
D. J. Null ◽  
...  

2021 ◽  
Vol 17 (1) ◽  
Author(s):  
Jacqueline King ◽  
Anne Pohlmann ◽  
Kamila Dziadek ◽  
Martin Beer ◽  
Kerstin Wernike

Abstract Background As a global ruminant pathogen, bovine viral diarrhea virus (BVDV) is responsible for the disease Bovine Viral Diarrhea with a variety of clinical presentations and severe economic losses worldwide. Classified within the Pestivirus genus, the species Pestivirus A and B (syn. BVDV-1, BVDV-2) are genetically differentiated into 21 BVDV-1 and four BVDV-2 subtypes. Commonly, the 5’ untranslated region and the Npro protein are utilized for subtyping. However, the genetic variability of BVDV leads to limitations in former studies analyzing genome fragments in comparison to a full-genome evaluation. Results To enable rapid and accessible whole-genome sequencing of both BVDV-1 and BVDV-2 strains, nanopore sequencing of twelve representative BVDV samples was performed on amplicons derived through a tiling PCR procedure. Covering a multitude of subtypes (1b, 1d, 1f, 2a, 2c), sample matrices (plasma, EDTA blood and ear notch), viral loads (Cq-values 19–32) and species (cattle and sheep), ten of the twelve samples produced whole genomes, with two low titre samples presenting 96 % genome coverage. Conclusions Further phylogenetic analysis of the novel sequences emphasizes the necessity of whole-genome sequencing to identify novel strains and supplement lacking sequence information in public repositories. The proposed amplicon-based sequencing protocol allows rapid, inexpensive and accessible obtainment of complete BVDV genomes.


Viruses ◽  
2021 ◽  
Vol 13 (7) ◽  
pp. 1379
Author(s):  
Sandra Barroso-Arévalo ◽  
Belén Rivera ◽  
Lucas Domínguez ◽  
José M. Sánchez-Vizcaíno

Natural SARS-CoV-2 infection in pets has been widely documented during the last year. Although the majority of reports suggested that dogs’ susceptibility to the infection is low, little is known about viral pathogenicity and transmissibility in the case of variants of concern, such as B.1.1.7 in this species. Here, as part of a large-scale study on SARS-CoV-2 prevalence in pets in Spain, we have detected the B.1.1.7 variant of concern (VOC) in a dog whose owners were infected with SARS-CoV-2. The animal did not present any symptoms, but viral loads were high in the nasal and rectal swabs. In addition, viral isolation was possible from both swabs, demonstrating that the dog was shedding infectious virus. Seroconversion occurred 23 days after the first sampling. This study documents the first detection of B.1.1.7 VOC in a dog in Spain and emphasizes the importance of performing active surveillance and genomic investigation on infected animals.

