Benchmarking different approaches for Norovirus genome assembly in metagenome samples

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Azahara Fuentes-Trillo ◽  
Carolina Monzó ◽  
Iris Manzano ◽  
Cristina Santiso-Bellón ◽  
Juliana da Silva Ribeiro de Andrade ◽  
...  

Abstract Background Genome assembly of viruses with high mutation rates, such as Norovirus and other RNA viruses, or from metagenome samples, poses a challenge for the scientific community due to the coexistence of several viral quasispecies and strains. Furthermore, there is no standard method for obtaining whole-genome sequences in unrelated patients. After polyA RNA isolation and sequencing in eight patients with acute gastroenteritis, we evaluated two de Bruijn graph assemblers (SPAdes and MEGAHIT), combined with four different and common pre-assembly strategies, and compared those yielding whole-genome Norovirus contigs. Results Reference-genome-guided strategies using both host and target virus genomes did not present any advantages over the assembly of non-filtered data in the case of SPAdes, and in the case of MEGAHIT, only host genome filtering presented improvements. MEGAHIT performed better than SPAdes in most samples, reaching complete genome sequences in most of them for all the strategies employed. Read binning with CD-HIT improved assembly when paired with different analysis strategies, most notably in the case of SPAdes. Conclusions Not all metagenome assemblies are equal, and the choice of workflow depends on the species studied and the steps preceding analysis. Different approaches may be needed even for samples treated equally, due to high intra-host variability. We tested and compared different workflows for the accurate assembly of Norovirus genomes and established their assembly capacities for this purpose.
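As a rough illustration of the data structure behind de Bruijn graph assemblers such as SPAdes and MEGAHIT, the sketch below builds a toy graph from short reads. It is not the algorithm either tool implements: the function name `de_bruijn_edges`, the example reads, and the choice of k = 3 are assumptions made purely for illustration, and real assemblers add error correction, multiple k values, and graph simplification.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Build a toy de Bruijn graph (hypothetical helper, for illustration only).

    Every k-mer in every read contributes one edge from its (k-1)-mer prefix
    to its (k-1)-mer suffix. Repeated k-mers produce repeated edges.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Two overlapping toy reads (invented data, not taken from the study).
reads = ["ACGTC", "CGTCA"]
graph = de_bruijn_edges(reads, k=3)
for node, successors in sorted(graph.items()):
    print(node, "->", successors)
```

Walking the edges from `AC` through `CG`, `GT`, and `TC` to `CA` spells out `ACGTCA`, the sequence that both reads overlap to form. Co-existing strains or quasispecies would appear in such a graph as branches, which is one reason the assembly task described above is difficult.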

Author(s):  
Yuanchao Liu ◽  
Longhua Huang ◽  
Huiping Hu ◽  
Manjun Cai ◽  
Xiaowei Liang ◽  
...  

Abstract Ganoderma leucocontextum, a newly discovered species of Ganodermataceae in China, has diverse pharmacological activities. G. leucocontextum is widely cultivated in southwest China, but systematic genetic study has been impeded by the lack of a reference genome. Herein, we present the first whole-genome assembly of G. leucocontextum, based on the Illumina and Nanopore platforms, from high-quality DNA extracted from a monokaryon strain (DH-8). The generated genome was 50.05 Mb in size with an N50 scaffold size of 3.06 Mb, 78,206 coding sequences and 13,390 putative genes. Genome completeness was assessed using the Benchmarking Universal Single-Copy Orthologs (BUSCO) tool, which identified 96.55% of the 280 Fungi BUSCO genes. Furthermore, differences in functional genes of secondary metabolites (terpenoids) were analyzed between G. leucocontextum and G. lucidum. G. leucocontextum has more genes related to terpenoid synthesis than G. lucidum, which may be one of the reasons why they exhibit different biological activities. This is the first genome assembly and annotation for G. leucocontextum, which would enrich the toolbox for biological and genetic studies in G. leucocontextum.
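The N50 scaffold size quoted above is a standard contiguity metric. A minimal sketch of how it is commonly computed follows. The function name `n50` and the example lengths are assumptions for illustration, not values or code from the study.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold (or contig) lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Invented scaffold lengths: total = 20, so half = 10.
# Sorted descending: 6, 5, 4, 3, 2. The cumulative sum reaches 11 at length 5.
print(n50([2, 3, 4, 5, 6]))
```

Because N50 depends only on the length distribution, it says nothing about correctness. That is why the abstract pairs it with a BUSCO completeness estimate.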


BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Gokhan Yavas ◽  
Huixiao Hong ◽  
Wenming Xiao

Abstract Background Accurate de novo genome assembly has become reality with the advancements in sequencing technology. With the ever-increasing number of de novo genome assembly tools, assessing the quality of assemblies has become of great importance in genome research. Although many quality metrics have been proposed and software tools for calculating those metrics have been developed, the existing tools do not produce a unified measure to reflect the overall quality of an assembly. Results To address this issue, we developed the de novo Assembly Quality Evaluation Tool (dnAQET) that generates a unified metric for benchmarking the quality assessment of assemblies. Our framework first calculates individual quality scores for the scaffolds/contigs of an assembly by aligning them to a reference genome. Next, it computes a quality score for the assembly using its overall reference genome coverage, the quality score distribution of its scaffolds and the redundancy identified in it. Using synthetic assemblies randomly generated from the latest human genome build, various builds of the reference genomes for five organisms and six de novo assemblies for sample NA24385, we tested dnAQET to assess its capability for benchmarking quality evaluation of genome assemblies. For synthetic data, our quality score increased with decreasing number of misassemblies and redundancy and increasing average contig length and coverage, as expected. For genome builds, dnAQET quality score calculated for a more recent reference genome was better than the score for an older version. To compare with some of the most frequently used measures, 13 other quality measures were calculated. The quality score from dnAQET was found to be better than all other measures in terms of consistency with the known quality of the reference genomes, indicating that dnAQET is reliable for benchmarking quality assessment of de novo genome assemblies. 
Conclusions The dnAQET is a scalable framework designed to evaluate a de novo genome assembly based on the aggregated quality of its scaffolds (or contigs). Our results demonstrated that the dnAQET quality score is reliable for benchmarking quality assessment of genome assemblies. The dnAQET can help researchers to identify the most suitable assembly tools and to select high-quality assemblies.


Author(s):  
Jingxuan Chen ◽  
David J. Garfinkel ◽  
Casey M. Bergman

Here, we report a long-read genome assembly for Saccharomyces uvarum strain CBS 7001 based on PacBio whole-genome shotgun sequence data. Our assembly provides an improved reference genome for an important yeast in the Saccharomyces sensu stricto clade.


2019 ◽  
Vol 5 (Supplement_1) ◽  
Author(s):  
Julia Hillung ◽  
María Alma Bracho ◽  
Javier Pons Tamarit ◽  
Fernando González-Candelas

Abstract Next-generation sequencing (NGS) is a technique that can capture the variability of viral populations in transmission studies. The conventional sample preparation for NGS, based on amplicons, is a potential source of errors derived from the variable affinity of specific primers for different viral variants and from irregular DNA polymerase efficiency. In this context, we propose a more reliable method for viral whole-genome sample preparation, starting from nucleic acids obtained and stored with conventional procedures. Our goal was to obtain complete hepatitis C virus (HCV) genome sequences to subsequently perform extensive phylogenetic analyses. Additionally, we aimed to test the effectiveness of nuclease treatment used to remove contaminating host DNA. Nucleic acids were obtained from almost cell-free blood plasma of HCV-infected patients. As a source for Illumina library preparation, double-stranded cDNA was generated using random primers. The HCV genome was not amplified before library preparation, avoiding possible biases derived from unequal copying. To remove possible host contaminants from the samples, a DNase treatment step was added. Libraries were paired-end sequenced on the Illumina platform using MiSeq reagent kit v3. After conservative filtering of contaminant human reads by alignment with the human reference genome using Burrows-Wheeler Aligner (BWA), the remaining reads were mapped to the HCV reference genome using BWA. Primary maximum likelihood phylogenetic analyses were performed using ClustalW and IQTREE to infer the phylogenetic relationships of the sequenced samples in the context of complete genome sequences of the same genotype. An NGS sample preparation method for HCV from blood plasma was established. Complete genome sequences of HCV could be obtained with variable coverage depending on the viral load of plasma samples. No significant reduction of host DNA proportion in DNase-treated samples in comparison to the controls was observed. The new sequences clustered within the Los Alamos National Laboratory database-deposited HCV subtype 4d samples. The method can be used to obtain full-length sequences of HCV from nucleic acid samples not previously planned for NGS. No improvement was observed when DNase pre-treatment of nucleic acids extracted from blood plasma was performed.
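The host-filtering step described above can be sketched as a pass over the alignment records produced by mapping reads to the human reference. The abstract does not state the exact filtering rule, so the rule below is an assumption: a read pair is kept only when both the read and its mate are flagged as unmapped in the SAM FLAG field (bits 0x4 and 0x8). The function name `non_host_read_names` and the example records are hypothetical.

```python
READ_UNMAPPED = 0x4  # SAM FLAG bit: this read did not align
MATE_UNMAPPED = 0x8  # SAM FLAG bit: the mate did not align


def non_host_read_names(sam_lines):
    """Return names of read pairs with no alignment to the host reference.

    Assumed rule (not taken from the abstract): keep a pair only if both
    mates are unmapped, so that any pair touching the host genome is dropped.
    """
    keep = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & READ_UNMAPPED and flag & MATE_UNMAPPED:
            keep.add(fields[0])
    return keep


# Invented SAM records: r1 is fully unmapped, r2 is fully mapped to the host,
# and r3 has one mapped and one unmapped mate.
example_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r1\t141\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",
    "r2\t99\tchr1\t10\t60\t4M\t=\t50\t44\tACGT\tIIII",
    "r2\t147\tchr1\t50\t60\t4M\t=\t10\t-44\tTTGA\tIIII",
    "r3\t73\tchr1\t20\t60\t4M\t=\t20\t0\tACGT\tIIII",
    "r3\t133\tchr1\t20\t0\t*\t=\t20\t0\tTTGA\tIIII",
]
print(sorted(non_host_read_names(example_sam)))
```

In practice the retained read names would then be used to extract the corresponding reads for mapping against the HCV reference. A less strict rule, such as keeping any read that is itself unmapped, would retain more reads at the cost of possible host carry-over.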


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Gehendra Bhattarai ◽  
Ainong Shi ◽  
Devi R. Kandel ◽  
Nora Solís-Gracia ◽  
Jorge Alberto da Silva ◽  
...  

Abstract The availability of well-assembled genome sequences and reduced sequencing costs have enabled the resequencing of many additional accessions in several crops, thus facilitating the rapid discovery and development of simple sequence repeat (SSR) markers. Although the genome sequence of inbred spinach line Sp75 is available, previous efforts have resulted in a limited number of useful SSR markers. Identification of additional polymorphic SSR markers will support genetics and breeding research in spinach. This study aimed to use the available genomic resources to mine and catalog a large number of polymorphic SSR markers. A search for SSR loci on six chromosome sequences of spinach line Sp75 using GMATA identified a total of 42,155 loci with repeat motifs of two to six nucleotides in the Sp75 reference genome. Whole-genome sequences (30x) of 21 additional accessions were aligned against the chromosome sequences of the reference genome and genotyped in silico using the HipSTR program by comparing and counting repeat number variation across the SSR loci among the accessions. The SSR genotype data generated by HipSTR were filtered to remove monomorphic loci and loci with high rates of missing data, and a final set of 5986 polymorphic SSR loci was identified. The polymorphic SSR loci were present at a density of 12.9 SSRs/Mb and were physically mapped. Out of 36 randomly selected SSR loci for validation, two failed to amplify, while the remaining were all polymorphic in a set of 48 spinach accessions from 34 countries. Genetic diversity analysis performed using the SSR allele score data on the 48 spinach accessions showed three main population groups. This strategy to mine and develop polymorphic SSR markers by a comparative analysis of the genome sequences of multiple accessions and computational genotyping of the candidate SSR loci eliminates the need for laborious experimental screening. Our approach increased the efficiency of discovering a large set of novel polymorphic SSR markers, as demonstrated in this report.
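To make the idea of an SSR locus with a repeat motif of two to six nucleotides concrete, the sketch below scans a sequence with a regular expression. It does not reproduce GMATA or HipSTR: the function name `find_ssrs`, the `min_repeats` threshold, and the example sequences are assumptions chosen for illustration.

```python
import re


def find_ssrs(sequence, min_repeats=4):
    """Find perfect tandem repeats with motifs of 2 to 6 nucleotides.

    Returns (start, motif, repeat_count) tuples. Simplifications in this
    sketch: only perfect repeats are found, and a homopolymer run such as
    'AAAAAAAA' is reported with the 2-nt motif 'AA'.
    """
    # Lazy {2,6}? makes the regex prefer the shortest motif that repeats.
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


# Invented sequence: 'AC' repeated four times, starting at position 2.
print(find_ssrs("TTACACACACGG"))
```

Comparing the repeat count at the same locus across accessions, as the abstract describes for HipSTR, is what distinguishes a polymorphic SSR from a monomorphic one.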


2020 ◽  
Author(s):  
Joseph H. Collins ◽  
Kevin W. Keating ◽  
Trent R. Jones ◽  
Shravani Balaji ◽  
Celeste B. Marsan ◽  
...  

Abstract Yeast genomes can be assembled from sequencing data, but genome integrations and episomal plasmids often fail to be resolved with accuracy, completeness, and contiguity. Resolution of these features is critical for many synthetic biology applications, including strain quality control and identifying engineering in unknown samples. Here, we report an integrated workflow, named Prymetime, that uses sequencing reads from inexpensive NGS platforms, assembly and error correction software, and a list of synthetic biology parts to achieve accurate whole genome sequences of yeasts with engineering annotated. To build the workflow, we first determined which sequencing methods and software packages returned an accurate, complete, and contiguous genome of an engineered S. cerevisiae strain with two similar plasmids and an integrated pathway. We then developed a sequence feature annotation step that labels synthetic biology parts from a standard list of yeast engineering sequences or from a custom sequence list. We validated the workflow by sequencing a collection of 15 engineered yeasts built from different parent S. cerevisiae and nonconventional yeast strains. We show that each integrated pathway and episomal plasmid can be correctly assembled and annotated, even in strains that have part repeats and multiple similar plasmids. Interestingly, Prymetime was able to identify deletions and unintended integrations that were subsequently confirmed by other methods. Furthermore, the whole genomes are accurate, complete, and contiguous. To illustrate this clearly, we used a publicly available S. cerevisiae CEN.PK113 reference genome and the accompanying reads to show that a Prymetime genome assembly is equivalent to the reference using several standard metrics. Finally, we used Prymetime to resequence the nonconventional yeasts Y. lipolytica Po1f and K. phaffii CBS 7435, producing an improved genome assembly for each strain. Thus, our workflow can achieve accurate, complete, and contiguous whole genome sequences of yeast strains before and after engineering. Therefore, Prymetime enables NGS-based strain quality control through assembly and identification of engineering features.


Viruses ◽  
2021 ◽  
Vol 13 (6) ◽  
pp. 1017
Author(s):  
Hirohisa Mekata ◽  
Tomohiro Okagawa ◽  
Satoru Konnai ◽  
Takayuki Miyazawa

Bovine foamy virus (BFV) is a member of the foamy virus family in cattle. Information on the epidemiology, transmission routes, and whole-genome sequences of BFV is still limited. To understand the characteristics of BFV, this study included a molecular survey in Japan and the determination of the whole-genome sequences of 30 BFV isolates. A total of 30 (3.4%, 30/884) cattle were infected with BFV according to PCR analysis. Cattle less than 48 months old were scarcely infected with this virus, and older animals had a significantly higher rate of infection. To reveal the possibility of vertical transmission, we additionally surveyed 77 pairs of dams and 3-month-old calves in a farm already confirmed to have BFV. We confirmed that one of the calves born from a dam with BFV was infected. Phylogenetic analyses revealed that a novel genotype was spread in Japan. In conclusion, the prevalence of BFV in Japan is relatively low and three genotypes, including a novel genotype, are spread in Japan.

