Whole genome data from Curtobacterium flaccumfaciens pv. flaccumfaciens strains associated with tan spot of mungbean and soybean reveal diverse plasmid profiles

Author(s):  
Niloofar Vaghefi ◽  
Dante Adorada ◽  
Lauren Huth ◽  
Lisa A Kelly ◽  
Barsha Poudel ◽  
...  

Despite the substantial economic impact of Curtobacterium flaccumfaciens pv. flaccumfaciens (Cff) on legume production worldwide, the genetic basis of its pathogenicity and potential host association is poorly understood. The production of high-quality reference genome assemblies of Cff strains associated with different hosts sheds light on the genetic basis of its pathogenic variability and host association. Moreover, the study of recent outbreaks of bacterial wilt and microevolution of the pathogen in Australia requires access to high-quality reference genomes that are sufficiently closely related to the population being studied within Australia. We provide the first genome assemblies of Cff strains associated with mungbean and soybean, which revealed high variability in their plasmid composition. The analysis of Cff genomes revealed an extensive suite of carbohydrate-active enzymes potentially associated with pathogenicity, including four carbohydrate esterases, 50 glycoside hydrolases, 23 glycosyl transferases, and a polysaccharide lyase. We also identified 11 serine peptidases, three of which were located within a linear plasmid, pCff119. These high-quality assemblies and annotations will provide a foundation for population genomics studies of Cff in Australia and for answering fundamental questions regarding pathogenicity factors and adaptation of Cff to various hosts worldwide, and, at a broader scale, contribute to unravelling genomic features of Gram-positive, xylem-inhabiting bacterial pathogens.

2021 ◽  
Vol 10 (31) ◽  
Author(s):  
Keeley O’Grady ◽  
Thomas V. Riley ◽  
Daniel R. Knight

Clostridioides difficile infection (CDI) is the leading cause of life-threatening health care-related gastrointestinal illness worldwide. Phylogenetically appropriate closed reference genomes are essential for studies of C. difficile transmission and evolution. Here, we provide high-quality complete hybrid genome assemblies for the three most prevalent C. difficile strains causing CDI in Australia.


Nature ◽  
2021 ◽  
Vol 592 (7856) ◽  
pp. 737-746 ◽  
Author(s):  
Arang Rhie ◽  
Shane A. McCarthy ◽  
Olivier Fedrigo ◽  
Joana Damas ◽  
Giulio Formenti ◽  
...  

Abstract High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species1–4. To address this issue, the international Genome 10K (G10K) consortium5,6 has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.


Author(s):  
Arang Rhie ◽  
Shane A. McCarthy ◽  
Olivier Fedrigo ◽  
Joana Damas ◽  
Giulio Formenti ◽  
...  

Abstract High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are only available for a few non-microbial species1–4. To address this issue, the international Genome 10K (G10K) consortium5,6 has worked over a five-year period to evaluate and develop cost-effective methods for assembling the most accurate and complete reference genomes to date. Here we summarize these developments, introduce a set of quality standards, and present lessons learned from sequencing and assembling 16 species representing major vertebrate lineages (mammals, birds, reptiles, amphibians, teleost fishes and cartilaginous fishes). We confirm that long-read sequencing technologies are essential for maximizing genome quality and that unresolved complex repeats and haplotype heterozygosity are major sources of error in assemblies. Our new assemblies identify and correct substantial errors in some of the best historical reference genomes. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an effort to generate high-quality, complete reference genomes for all ~70,000 extant vertebrate species and help enable a new era of discovery across the life sciences.


2018 ◽  
Author(s):  
Danny E. Miller ◽  
Cynthia Staber ◽  
Julia Zeitlinger ◽  
R. Scott Hawley

ABSTRACT The Drosophila genus is a unique group containing a wide range of species that occupy diverse ecosystems. In addition to the most widely studied species, Drosophila melanogaster, many other members in this genus also possess a well-developed set of genetic tools. Indeed, high-quality genomes exist for several species within the genus, facilitating studies of the function and evolution of cis-regulatory regions and proteins by allowing comparisons across at least 50 million years of evolution. Yet, the available genomes still fail to capture much of the substantial genetic diversity within the Drosophila genus. We have therefore tested protocols to rapidly and inexpensively sequence and assemble the genome from any Drosophila species using single-molecule sequencing technology from Oxford Nanopore. Here, we use this technology to present high-quality genome assemblies of 15 Drosophila species: 10 of the 12 originally sequenced Drosophila species (ananassae, erecta, mojavensis, persimilis, pseudoobscura, sechellia, simulans, virilis, willistoni, and yakuba), four additional species that had previously reported assemblies (biarmipes, bipectinata, eugracilis, and mauritiana), and one novel assembly (triauraria). Genomes were generated from an average of 29x depth-of-coverage data that after assembly resulted in an average contig N50 of 4.4 Mb. Subsequent alignment of contigs from the published reference genomes demonstrates that our assemblies could be used to close over 60% of the gaps present in the currently published reference genomes. Importantly, the materials and reagents cost for each genome was approximately $1,000 (USD). This study demonstrates the power and cost-effectiveness of long-read sequencing for genome assembly in Drosophila and provides a framework for the affordable sequencing and assembly of additional Drosophila genomes.
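The contig N50 quoted above is a standard contiguity statistic: the length of the contig at which the running total, taken over contigs sorted from longest to shortest, first reaches half of the total assembly length. A minimal sketch of that calculation follows; the function name `contig_n50` and the example lengths are illustrative assumptions and are not taken from the study.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (in bases).

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths, chosen only to illustrate the calculation.
example_lengths = [8_000_000, 4_400_000, 3_000_000, 1_500_000, 500_000]
example_n50 = contig_n50(example_lengths)
```

Because N50 depends only on the length distribution, it says nothing about base-level accuracy or misassemblies, so it is usually read alongside other quality measures.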


2021 ◽  
Author(s):  
Michael Alonge ◽  
Ludivine Lebeigle ◽  
Melanie Kirsche ◽  
Sergey Aganezov ◽  
Xingang Wang ◽  
...  

Advancing crop genomics requires efficient genetic systems enabled by high-quality personalized genome assemblies. Here, we introduce RagTag, a toolset for automating assembly scaffolding and patching, and we establish chromosome-scale reference genomes for the widely used tomato genotype M82 along with Sweet-100, a rapid-cycling genotype that we developed to accelerate functional genomics and genome editing. This work outlines strategies to rapidly expand genetic systems and genomic resources in other plant species.


2016 ◽  
Author(s):  
Gavin G. Rutledge ◽  
Ulrike Böehme ◽  
Mandy Sanders ◽  
Adam J. Reid ◽  
Oumou Maiga-Ascofare ◽  
...  

Summary Despite the huge international endeavor to understand the genomic basis of malaria biology, there remains a lack of information about two human-infective species: Plasmodium malariae and P. ovale. The former is prevalent across all malaria endemic regions and able to recrudesce decades after the initial infection. The latter is a dormant stage hypnozoite-forming species, similar to P. vivax. Here we present the newly assembled reference genomes of both species, thereby completing the set of all human-infective Plasmodium species. We show that the P. malariae genome is markedly different to other Plasmodium genomes and relate this to its unique biology. Using additional draft genome assemblies, we confirm that P. ovale consists of two cryptic species that may have diverged millions of years ago. These genome sequences provide a useful resource to study the genetic basis of human-infectivity in Plasmodium species.


Forests ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 222
Author(s):  
Bartosz Ulaszewski ◽  
Joanna Meger ◽  
Jaroslaw Burczyk

Next-generation sequencing of reduced representation genomic libraries (RRL) is capable of providing large numbers of genetic markers for population genetic studies at relatively low costs. However, one major concern with these types of markers is the precision of genotyping, which is related to the common problem of missing data and appears to be particularly important in association and genomic selection studies. We evaluated three RRL approaches (GBS, RADseq, ddRAD) and different SNP identification methods (de novo or based on a reference genome) to find the best solutions for future population genomics studies in two economically and ecologically important broadleaved tree species, namely F. sylvatica and Q. robur. We found that the use of the ddRAD method coupled with SNP calling based on reference genomes provided the largest numbers of markers (28 k and 36 k for beech and oak, respectively), given standard filtering criteria. Using technical replicates of samples, we demonstrated that more than 80% of SNP loci should be considered as reliable markers in GBS and ddRAD, but not in RADseq data. According to the reference genomes’ annotations, more than 30% of the identified ddRAD loci appeared to be related to genes. Our findings provide solid support for using ddRAD-based SNPs for future population genomics studies in beech and oak.
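One common way to assess marker reliability from technical replicates is to compare the genotype calls of the same sample across two library preparations. The sketch below computes a simple concordance rate under that assumption; the abstract does not state the authors' exact criterion, so the function `replicate_concordance`, the genotype encoding, and the example calls are all hypothetical.

```python
def replicate_concordance(rep_a, rep_b, missing=None):
    """Fraction of loci with identical genotype calls in two replicates.

    rep_a and rep_b are equal-length sequences of genotype calls for the
    same loci in the same order. Loci where either call equals `missing`
    are skipped rather than counted as mismatches.
    """
    if len(rep_a) != len(rep_b):
        raise ValueError("replicates must cover the same loci")
    compared = 0
    matches = 0
    for call_a, call_b in zip(rep_a, rep_b):
        if call_a == missing or call_b == missing:
            continue  # missing data is excluded from the comparison
        compared += 1
        if call_a == call_b:
            matches += 1
    if compared == 0:
        raise ValueError("no loci were called in both replicates")
    return matches / compared


# Hypothetical genotype calls for five loci in two technical replicates.
replicate_1 = ["0/0", "0/1", "1/1", None, "0/1"]
replicate_2 = ["0/0", "0/1", "0/1", "0/0", "0/1"]
example_concordance = replicate_concordance(replicate_1, replicate_2)
```

Skipping missing calls keeps the two problems named in the abstract apart: missing data reduces the number of comparable loci, while discordant calls reduce the concordance rate itself.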


2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Jean-Marc Aury ◽  
Benjamin Istace

Abstract Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture of the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; generally, they inspect the nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes and they typically switch from one haplotype to another. Herein we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Michael Abrouk ◽  
Naveenkumar Athiyannan ◽  
Thomas Müller ◽  
Yveline Pailles ◽  
Christoph Stritt ◽  
...  

Abstract The cloning of agriculturally important genes is often complicated by haplotype variation across crop cultivars. Access to pan-genome information greatly facilitates the assessment of structural variations and rapid candidate gene identification. Here, we identified the red glume 1 (Rg-B1) gene using association genetics and haplotype analyses in ten reference-grade wheat genomes. Glume color is an important trait to characterize wheat cultivars. Red glumes are frequent among Central European spelt, a dominant wheat subspecies in Europe before the 20th century. We used genotyping-by-sequencing to characterize a global diversity panel of 267 spelt accessions, which provided evidence for two independent introductions of spelt into Europe. A single region at the Rg-B1 locus on chromosome 1BS was associated with glume color in the diversity panel. Haplotype comparisons across ten high-quality wheat genomes revealed a MYB transcription factor as a candidate gene. We found extensive haplotype variation across the ten cultivars, with a particular group of MYB alleles that was conserved in red glume wheat cultivars. Genetic mapping and transient infiltration experiments allowed us to validate these particular MYB transcription factor variants. Our study demonstrates the value of multiple high-quality genomes to rapidly resolve copy number and haplotype variations in regions controlling agriculturally important traits.

