Highly Contiguous Genome Resource of Colletotrichum fructicola Generated Using Long-Read Sequencing

2020 ◽  
Vol 33 (6) ◽  
pp. 790-793 ◽  
Author(s):  
Xiaofei Liang ◽  
Mengyu Cao ◽  
Sen Li ◽  
Yuanyuan Kong ◽  
Jeffrey A. Rollins ◽  
...  

Colletotrichum fructicola is a plant-pathogenic fungus with a broad host range. It causes significant losses to important crops, including apple, pear, strawberry, and other Rosaceae and non-Rosaceae species. To date, two short read–based C. fructicola genomes are publicly available, but both are fragmented. In this study, we re-sequenced the genome of C. fructicola using nanopore long-read technology and refined the assembly with Hi-C map data. The resulting high-quality assembly is an important resource for further comparative and experimental studies with C. fructicola.

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Seth Commichaux ◽  
Kiran Javkar ◽  
Padmini Ramachandran ◽  
Niranjan Nagarajan ◽  
Denis Bertrand ◽  
...  

Abstract Background Whole genome sequencing of cultured pathogens is the state of the art public health response for the bioinformatic source tracking of illness outbreaks. Quasimetagenomics can substantially reduce the amount of culturing needed before a high quality genome can be recovered. Highly accurate short read data is analyzed for single nucleotide polymorphisms and multi-locus sequence types to differentiate strains but cannot span many genomic repeats, resulting in highly fragmented assemblies. Long reads can span repeats, resulting in much more contiguous assemblies, but have lower accuracy than short reads. Results We evaluated the accuracy of Listeria monocytogenes assemblies from enrichments (quasimetagenomes) of naturally-contaminated ice cream using long read (Oxford Nanopore) and short read (Illumina) sequencing data. Accuracy of ten assembly approaches, over a range of sequencing depths, was evaluated by comparing sequence similarity of genes in assemblies to a complete reference genome. Long read assemblies reconstructed a circularized genome as well as a 71 kbp plasmid after 24 h of enrichment; however, high error rates prevented high fidelity gene assembly, even at 150X depth of coverage. Short read assemblies accurately reconstructed the core genes after 28 h of enrichment but produced highly fragmented genomes. Hybrid approaches demonstrated promising results but had biases based upon the initial assembly strategy. Short read assemblies scaffolded with long reads accurately assembled the core genes after just 24 h of enrichment, but were highly fragmented. Long read assemblies polished with short reads reconstructed a circularized genome and plasmid and assembled all the genes after 24 h enrichment but with less fidelity for the core genes than the short read assemblies. 
Conclusion The integration of long and short read sequencing of quasimetagenomes expedited the reconstruction of a high quality pathogen genome compared to either platform alone. A new and more complete level of information about genome structure, gene order and mobile elements can be added to the public health response by incorporating long read analyses with the standard short read WGS outbreak response.


2014 ◽  
Vol 99 ◽  
pp. 27-34 ◽  
Author(s):  
Jiujun Cheng ◽  
Lee Pinnell ◽  
Katja Engel ◽  
Josh D. Neufeld ◽  
Trevor C. Charles

2019 ◽  
Author(s):  
Carolyn Graham-Taylor ◽  
Lars G Kamphuis ◽  
Mark Derbyshire

Abstract Background The broad host range pathogen Sclerotinia sclerotiorum infects over 400 plant species and causes substantial yield losses in crops worldwide. Secondary metabolites are known to play important roles in the virulence of plant pathogens, but little is known about the secondary metabolite repertoire of S. sclerotiorum. In this study, we predicted secondary metabolite biosynthetic gene clusters in the genome of S. sclerotiorum and analysed their expression during infection of Brassica napus using an existing transcriptome data set. We also investigated their sequence diversity among a panel of 25 previously published S. sclerotiorum isolate genomes. Results We identified 80 putative secondary metabolite clusters. Over half of the clusters contained at least three transcriptionally coregulated genes. Comparative genomics revealed clusters homologous to clusters in the closely related plant pathogen Botrytis cinerea for production of carotenoids, hydroxamate siderophores, DHN melanin and botcinic acid. We also identified putative phytotoxin clusters that can potentially produce the polyketide sclerin and an epipolythiodioxopiperazine. Secondary metabolite clusters were enriched in subtelomeric genomic regions, and those containing paralogues showed a particularly strong association with repeats. The positional bias we identified was borne out by intraspecific comparisons that revealed putative secondary metabolite genes suffered more presence/absence polymorphisms and exhibited a significantly higher sequence diversity than other genes. Conclusions These data suggest that S. sclerotiorum produces numerous secondary metabolites during plant infection and that their gene clusters undergo enhanced rates of mutation, duplication and recombination in subtelomeric regions. The microevolutionary regimes leading to S. sclerotiorum secondary metabolite diversity have yet to be elucidated. Several potential phytotoxins documented in this study provide the basis for future functional analyses.


mBio ◽  
2016 ◽  
Vol 7 (1) ◽  
Author(s):  
Yu-Chih Tsai ◽  
Sean Conlan ◽  
Clayton Deming ◽  
Julia A. Segre ◽  
Heidi H. Kong ◽  
...  

ABSTRACT Deep metagenomic shotgun sequencing has emerged as a powerful tool to interrogate composition and function of complex microbial communities. Computational approaches to assemble genome fragments have been demonstrated to be an effective tool for de novo reconstruction of genomes from these communities. However, the resultant “genomes” are typically fragmented and incomplete due to the limited ability of short-read sequence data to assemble complex or low-coverage regions. Here, we use single-molecule, real-time (SMRT) sequencing to reconstruct a high-quality, closed genome of a previously uncharacterized Corynebacterium simulans and its companion bacteriophage from a skin metagenomic sample. Considerable improvement in assembly quality occurs in hybrid approaches incorporating short-read data, with even relatively small amounts of long-read data being sufficient to improve metagenome reconstruction. Using short-read data to evaluate strain variation of this C. simulans in its skin community at single-nucleotide resolution, we observed a dominant C. simulans strain with moderate allelic heterozygosity throughout the population. We demonstrate the utility of SMRT sequencing and hybrid approaches in metagenome quantitation, reconstruction, and annotation. IMPORTANCE The species comprising a microbial community are often difficult to deconvolute due to technical limitations inherent to most short-read sequencing technologies. Here, we leverage new advances in sequencing technology, single-molecule sequencing, to significantly improve reconstruction of a complex human skin microbial community. With this long-read technology, we were able to reconstruct and annotate a closed, high-quality genome of a previously uncharacterized skin species. We demonstrate that hybrid approaches with short-read technology are sufficiently powerful to reconstruct even single-nucleotide polymorphism level variation of species in this community.


Author(s):  
Leho Tedersoo ◽  
Mads Albertsen ◽  
Sten Anslan ◽  
Benjamin Callahan

Short-read, high-throughput sequencing (HTS) methods have yielded numerous important insights into microbial ecology and function. Yet, in many instances short-read HTS techniques are suboptimal, for example by providing insufficient phylogenetic resolution or low integrity of assembled genomes. Single-molecule and synthetic long-read (SLR) HTS methods have successfully ameliorated these limitations. In addition, nanopore sequencing has generated a number of unique analysis opportunities such as rapid molecular diagnostics and direct RNA sequencing, and both PacBio and nanopore sequencing support detection of epigenetic modifications. Although initially suffering from relatively low sequence quality, recent advances have greatly improved the accuracy of long read sequencing technologies. In spite of great technological progress in recent years, the long-read HTS methods (PacBio and nanopore sequencing) are still relatively costly, require large amounts of high-quality starting material, and commonly need specific solutions in various analysis steps. Despite these challenges, long-read sequencing technologies offer high-quality, cutting-edge alternatives for testing hypotheses about microbiome structure and functioning as well as assembly of eukaryote genomes from complex environmental DNA samples.


2021 ◽  
Author(s):  
Anna Cusco ◽  
Daniel Perez ◽  
Joaquim Vines ◽  
Norma Fabregas ◽  
Olga Francino

Long-read metagenomics facilitates the assembly of high-quality metagenome-assembled genomes (HQ MAGs) out of complex microbiomes. It provides highly contiguous assemblies by spanning repetitive regions, complete ribosomal genes, and mobile genetic elements. Hi-C proximity ligation data bins the long contigs and their associated extra-chromosomal elements to their bacterial host. Here, we characterized a canine fecal sample combining a long-read metagenomics assembly with Hi-C data, and further correcting frameshift errors. We retrieved 27 HQ MAGs and seven medium-quality (MQ) MAGs considering MIMAG criteria. All the long-read canine MAGs improved previous short-read MAGs from public datasets regarding contiguity of the assembly, presence, and completeness of the ribosomal operons, and presence of canonical tRNAs. This trend was also observed when comparing to representative genomes from a pure culture (short-read assemblies). Moreover, Hi-C data linked six potential plasmids to their bacterial hosts. Finally, we identified 51 bacteriophages integrated into their bacterial host, providing novel host information for eight viral clusters that included Gut Phage Database viral genomes. Even though three viral clusters were species-specific, most of them presented a broader host range. In conclusion, long-read metagenomics retrieved long contigs harboring complete assembled ribosomal operons, prophages, and other mobile genetic elements. Hi-C binned together the long contigs into HQ and MQ MAGs, some of them representing closely related species. Long-read metagenomics and Hi-C proximity ligation are likely to become a comprehensive approach to HQ MAGs discovery and assignment of extra-chromosomal elements to their bacterial host.


2021 ◽  
Author(s):  
Yu-Hsiang Chen ◽  
Pei-Wen Chiang ◽  
Denis Yu Rogozin ◽  
Andrey Georgievich Degermendzhy ◽  
Hsiu-Hui Chiu ◽  
...  

Background: Most of Earth's bacteria have yet to be cultivated. The metabolic and functional potentials of these uncultivated microorganisms thus remain mysterious, and the metagenome-assembled genome (MAG) approach is the most robust method for uncovering these potentials. However, MAGs discovered by conventional metagenomic assembly and binning methods are usually highly fragmented genomes with heterogeneous sequence contamination, and this affects the accuracy and sensitivity of genomic analyses. Though the maturation of long-read sequencing technologies provides a good opportunity to fix the problem of highly fragmented MAGs, the error-prone nature of long reads causes severe problems for metagenomics based on long reads alone. Hence, methods are urgently needed to retrieve MAGs by a combination of both long- and short-read technologies to advance genome-centric metagenomics. Results: In this study, we combined Illumina and Nanopore data to develop a new workflow to reconstruct 233 MAGs (six novel bacterial orders, 20 families, 66 genera, and 154 species) from Lake Shunet, a secluded meromictic lake in Siberia. Those new MAGs were underrepresented or undetectable in other MAG studies using metagenomes from human or other common organisms or habitats. Using this newly developed workflow and strategy, the average N50 of reconstructed MAGs increased 10- to 40-fold compared to when the conventional Illumina assembly and binning method was used. More importantly, six complete MAGs were recovered from our datasets, five of which belong to novel species. We used these as examples to demonstrate many novel and intriguing genomic characteristics discovered in these newly complete genomes and proved the importance of high-quality complete MAGs in microbial genomics and metagenomics studies.
Conclusions: The results show that it is feasible to apply our workflow with a few additional long reads to recover numerous complete and high-quality MAGs from short-read metagenomes of high microbial diversity environment samples. The unique features we identified from five complete genomes highlight the robustness of this method in genome-centric metagenomic research. The recovery of 154 novel species MAGs from a rarely explored lake greatly expands the current bacterial genome encyclopedia and broadens our knowledge by adding new genomic characteristics of bacteria. It demonstrates a strong need to recover MAGs from diverse unexplored habitats in the search for microbial dark matter.


Author(s):  
Shilpa Garg

High-quality chromosome-scale haplotype sequences of diploid genomes, polyploid genomes and metagenomes provide important insights into genetic variation associated with disease and biodiversity. However, whole-genome short read sequencing does not yield haplotype information that spans whole chromosomes directly. Computational assembly of shorter haplotype fragments is required for haplotype reconstruction, which can be challenging owing to limited fragment lengths and high haplotype and repeat variability across genomes. Recent advancements in long-read and chromosome-scale sequencing technologies, alongside computational innovations, are improving the reconstruction of haplotypes at the level of whole chromosomes. Here, we review recent methodological progress in these areas and discuss perspectives that could enable routine high-quality haplotype reconstruction in clinical and evolutionary studies.


2021 ◽  
Vol 12 ◽  
Author(s):  
Min Tang ◽  
Suqun He ◽  
Xun Gong ◽  
Peng Lü ◽  
Rehab H. Taha ◽  
...  

The reference genomes of Bombyx mori (B. mori), provided by the Silkworm Knowledge-based database (SilkDB) and SilkBase, have served as the gold standard for nearly two decades. Their use has fundamentally shaped work on this model organism and accelerated relevant studies on Lepidoptera. However, the current reference genomes of B. mori do not accurately represent the full set of genes for any single strain. As new genome-wide sequencing technologies have emerged and the cost of high-throughput sequencing technology has fallen, it is now possible for standard laboratories to perform full-genome assembly for specific strains. Here we present a high-quality de novo chromosome-level genome assembly of a single B. mori with nuclear polyhedrosis virus (BmNPV) resistance through the integration of PacBio long-read sequencing, Illumina short-read sequencing, and Hi-C sequencing. In addition, regular bioinformatics analyses, such as gene family, phylogenetic, and divergence analyses, were performed. The sample was from our unique B. mori strain (NB), which has strong inborn resistance to BmNPV. Our genome assembly showed good collinearity with SilkDB and SilkBase and particular regions. To the best of our knowledge, this is the first genome assembly of a BmNPV-resistant B. mori, which should be a more accurate insect model for resistance studies.


Plant Disease ◽  
2021 ◽  
Author(s):  
Shiqin Zheng ◽  
Ruiqi Chen ◽  
Zhe Wang ◽  
Juan Liu ◽  
Yan Cai ◽  
...  

Tea grey blight is one of the most serious foliar diseases of the tea tree. It is caused by the plant-pathogenic fungus Pseudopestalotiopsis theae and can affect the production and quality of tea worldwide. We generated a highly contiguous, 50.41 Mbp genome assembly (N50 1.30 Mbp) of P. theae strain CYF27 by combining PacBio long-read and Illumina short-read sequencing technologies. We identified a total of 15,626 gene models, of which 1,038 genes encode putative secreted proteins. The high-quality genome assembly and annotation resource reported here will be useful for the study of fungal infection mechanisms and pathogen-host interaction.
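Several of the abstracts above report assembly contiguity as N50. For readers unfamiliar with the metric, the following is a minimal Python sketch of the standard definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name `n50` and the example contig lengths are illustrative assumptions and are not taken from any of the listed studies.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    Hypothetical helper for illustration; not from any listed study.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating length.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig where the cumulative length
        # reaches at least half of the total assembly length.
        if 2 * running >= total:
            return length


# Illustrative (made-up) contig lengths.
example = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example))
```

A higher N50 for the same total assembly size means the sequence is held in fewer, longer contigs, which is why long-read assemblies are typically described as more contiguous than short-read ones.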

