Whole-genome sequencing and de novo assembly of a 2019 novel coronavirus (SARS-CoV-2) strain isolated in Vietnam

2020 ◽  
Vol 18 (2) ◽  
pp. 197-208
Author(s):  
Le Tung Lam ◽  
Nguyen Trung Hieu ◽  
Nguyen Hong Trang ◽  
Ho Thi Thuong ◽  
Tran Huyen Linh ◽  
...  

The COVID-19 pandemic, caused by the virus SARS-CoV-2, has devastated countries worldwide, infecting more than 4.5 million people and leading to more than 300,000 deaths as of May 16th, 2020. Whole-genome sequencing (WGS) is an effective tool to monitor emerging strains and provide information for intervention, thus helping to inform outbreak control decisions. Here, we report the first effort to sequence and de novo assemble the whole genome of SARS-CoV-2 using PacBio’s SMRT sequencing technology in Vietnam. We also present the annotation results and a brief analysis of the variants found in our SARS-CoV-2 strain, which was isolated from a Vietnamese patient. Sequencing and de novo assembly were completed in less than 30 hours, resulting in one contig with no gaps and a length of 29,766 bp. All variants detected relative to the NCBI reference were confirmed by Sanger sequencing. The results show the potential of long-read sequencing to provide high-quality WGS data to support public health responses and advance understanding of this and future pandemics.

2020 ◽  
Author(s):  
Le Tung Lam ◽  
Nguyen Trung Hieu ◽  
Nguyen Hong Trang ◽  
Ho Thi Thuong ◽  
Tran Huyen Linh ◽  
...  

The COVID-19 pandemic, caused by the zoonotic virus SARS-CoV-2, has devastated countries worldwide, infecting more than 4.5 million people and leading to more than 300,000 deaths. Whole-genome sequencing (WGS) is an effective tool to monitor emerging strains and provide information for intervention, thus helping to inform outbreak control decisions. Here, we report the first effort to sequence and de novo assemble the whole genome of SARS-CoV-2 using PacBio’s SMRT sequencing technology in Vietnam. We also present the annotation results and a brief analysis of the variants found in our SARS-CoV-2 strain, which was isolated from a Vietnamese patient. Sequencing and de novo assembly were completed in less than 30 hours, resulting in one contig with no gaps and a length of 29,766 bp. All variants detected relative to the NCBI reference were confirmed by Sanger sequencing. The results show the potential of long-read sequencing to provide high-quality WGS data to support public health responses and advance understanding of this and future pandemics.


2018 ◽  
Vol 64 (3) ◽  
pp. 191-197 ◽  
Author(s):  
Takeshi Mizuguchi ◽  
Tomoko Toyota ◽  
Hiroaki Adachi ◽  
Noriko Miyake ◽  
Naomichi Matsumoto ◽  
...  

2018 ◽  
Author(s):  
Alba Sanchis-Juan ◽  
Jonathan Stephens ◽  
Courtney E French ◽  
Nicholas Gleadall ◽  
Karyn Mégy ◽  
...  

Complex structural variants (cxSVs) are genomic rearrangements comprising multiple structural variants, typically involving three or more breakpoint junctions. They contribute to human genomic variation and can cause Mendelian disease; however, they are not typically considered during genetic testing. Here, we investigate the role of cxSVs in Mendelian disease using short-read whole-genome sequencing (WGS) data from 1,324 individuals with neurodevelopmental or retinal disorders from the NIHR BioResource project. We present four cases of individuals with a cxSV affecting Mendelian disease-associated genes. Three of the cxSVs are pathogenic: a de novo duplication-inversion-inversion-deletion affecting ARID1B in an individual with Coffin-Siris syndrome, a deletion-inversion-duplication affecting HNRNPU in an individual with intellectual disability and seizures, and a homozygous deletion-inversion-deletion affecting CEP78 in an individual with cone-rod dystrophy. Additionally, we identified a de novo duplication-inversion-duplication overlapping CDKL5 in an individual with neonatal hypoxic-ischaemic encephalopathy. Long-read sequencing technology used to resolve the breakpoints demonstrated the presence of both a disrupted and an intact copy of CDKL5 on the same allele; therefore, it was classified as a variant of uncertain significance. Analysis of the sequence flanking all breakpoint junctions in all the cxSVs revealed both microhomology and longer repetitive sequences, suggesting both replication-based and homology-based processes. Accurate resolution of cxSVs is essential for clinical interpretation, and here we demonstrate that long-read WGS is a powerful technology by which to achieve this. Our results show that cxSVs are an important, although rare, cause of Mendelian disease, and we therefore recommend their consideration during research and clinical investigations.


2018 ◽  
Author(s):  
Benjamin A. Neely ◽  
Debra L. Ellisor ◽  
W. Clay Davis

Background: The last decade has witnessed dramatic improvements in whole-genome sequencing capabilities coupled to drastically decreased costs, leading to an inundation of high-quality de novo genomes. For this reason, continued development of genome quality metrics is imperative. The current study utilized the recently updated Atlantic bottlenose dolphin (Tursiops truncatus) genome and annotation to evaluate a proteomics-based metric of genome accuracy. Results: Proteomic analysis of six tissues provided experimental confirmation of 10 402 proteins from 4 711 protein groups, almost 1/3 of the possible predicted proteins in the genome. There was an increased median molecular weight and number of identified peptides per protein using the current T. truncatus annotation versus the previous annotation. Identification of larger proteins with more identified peptides implied reduced database fragmentation and improved gene annotation accuracy. A metric is proposed, NP10, that attempts to capture this quality improvement. When using the new T. truncatus genome there was a 21 % improvement in NP10. This metric was further demonstrated by using a publicly available proteomic data set to compare human genome annotations from 2004, 2013 and 2016, which had a 33 % improvement in NP10. Conclusions: These results demonstrate that new whole-genome sequencing techniques can rapidly generate high-quality de novo genome assemblies and emphasize the speed of advancing bioanalytical measurements in a non-model organism. Moreover, proteomics may be a useful metrological tool to benchmark genome accuracy, though there is a need for reference proteomic datasets to facilitate this utility in new de novo and existing genomes.


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Brent S. Pedersen ◽  
Joe M. Brown ◽  
Harriet Dashnow ◽  
Amelia D. Wallace ◽  
Matt Velinder ◽  
...  

In studies of families with rare disease, it is common to screen for de novo mutations, as well as recessive or dominant variants that explain the phenotype. However, the filtering strategies and software used to prioritize high-confidence variants vary from study to study. In an effort to establish recommendations for rare disease research, we explore effective guidelines for variant (SNP and INDEL) filtering and report the expected number of candidates for de novo dominant, recessive, and autosomal dominant modes of inheritance. We derived these guidelines using two large family-based cohorts that underwent whole-genome sequencing, as well as two family cohorts with whole-exome sequencing. The filters are applied to common attributes, including genotype-quality, sequencing depth, allele balance, and population allele frequency. The resulting guidelines yield ~10 candidate SNP and INDEL variants per exome, and 18 per genome for recessive and de novo dominant modes of inheritance, with substantially more candidates for autosomal dominant inheritance. For family-based, whole-genome sequencing studies, this number includes an average of three de novo, ten compound heterozygous, one autosomal recessive, four X-linked variants, and roughly 100 candidate variants following autosomal dominant inheritance. The slivar software we developed to establish and rapidly apply these filters to VCF files is available at https://github.com/brentp/slivar under an MIT license, and includes documentation and recommendations for best practices for rare disease analysis.
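
As a rough illustration of the kind of filter the abstract describes, the sketch below checks a single trio site for a candidate de novo variant using genotype quality, depth, allele balance, and population allele frequency. It is not slivar's code or API: the `Genotype` class, the function names, and every threshold are hypothetical values chosen only for the example, not the cutoffs recommended in the paper.

```python
from dataclasses import dataclass


@dataclass
class Genotype:
    """Per-sample call at one site (hypothetical structure for illustration)."""
    alts: int    # number of alternate alleles: 0 hom-ref, 1 het, 2 hom-alt
    gq: int      # genotype quality
    dp: int      # sequencing depth
    ab: float    # allele balance: alt reads / total reads


def passes_quality(g: Genotype, min_gq: int = 20, min_dp: int = 10) -> bool:
    # Assumed illustrative cutoffs, not published recommendations.
    return g.gq >= min_gq and g.dp >= min_dp


def is_candidate_de_novo(kid: Genotype, mom: Genotype, dad: Genotype,
                         pop_af: float, max_af: float = 0.001) -> bool:
    """Return True if the site looks like a de novo heterozygous call in the child."""
    # Every member of the trio must be confidently genotyped.
    if not all(passes_quality(g) for g in (kid, mom, dad)):
        return False
    # A true de novo variant should be rare or absent in the population.
    if pop_af > max_af:
        return False
    # Child is heterozygous with a balanced alt fraction (assumed 0.2-0.8 window).
    kid_het = kid.alts == 1 and 0.2 <= kid.ab <= 0.8
    # Both parents are homozygous reference with essentially no alt reads.
    parents_ref = all(p.alts == 0 and p.ab <= 0.02 for p in (mom, dad))
    return kid_het and parents_ref


if __name__ == "__main__":
    kid = Genotype(alts=1, gq=40, dp=30, ab=0.5)
    mom = Genotype(alts=0, gq=40, dp=30, ab=0.0)
    dad = Genotype(alts=0, gq=40, dp=30, ab=0.0)
    print(is_candidate_de_novo(kid, mom, dad, pop_af=0.0))
```

In practice such a predicate would be applied to every record of a family VCF, and the surviving sites counted per family to compare against the candidate numbers reported above.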


PLoS ONE ◽  
2021 ◽  
Vol 16 (6) ◽  
pp. e0253440
Author(s):  
Samantha Gunasekera ◽  
Sam Abraham ◽  
Marc Stegger ◽  
Stanley Pang ◽  
Penghao Wang ◽  
...  

Whole-genome sequencing is essential to many facets of infectious disease research. However, technical limitations such as bias in coverage and tagmentation, and difficulties characterising genomic regions with extreme GC content have created significant obstacles in its use. Illumina has claimed that the recently released DNA Prep library preparation kit, formerly known as Nextera Flex, overcomes some of these limitations. This study aimed to assess bias in coverage, tagmentation, GC content, average fragment size distribution, and de novo assembly quality using both the Nextera XT and DNA Prep kits from Illumina. When performing whole-genome sequencing on Escherichia coli and where coverage bias is the main concern, the DNA Prep kit may provide higher quality results; though de novo assembly quality, tagmentation bias and GC content related bias are unlikely to improve. Based on these results, laboratories with existing workflows based on Nextera XT would see minor benefits in transitioning to the DNA Prep kit if they were primarily studying organisms with neutral GC content.

