scholarly journals Highly contiguous assemblies of 101 drosophilid genomes

2020 ◽  
Author(s):  
Bernard Y Kim ◽  
Jeremy Wang ◽  
Danny E. Miller ◽  
Olga Barmina ◽  
Emily K. Delaney ◽  
...  

Over 100 years of studies in Drosophila melanogaster and related species in the genus Drosophila have facilitated key discoveries in genetics, genomics, and evolution. While high-quality genome assemblies exist for several species in this group, they only encompass a small fraction of the genus. Recent advances in long read sequencing allow high quality genome assemblies for tens or even hundreds of species to be generated. Here, we utilize Oxford Nanopore sequencing to build an open community resource of high-quality assemblies for 101 lines of 95 drosophilid species encompassing 14 species groups and 35 sub-groups with an average contig N50 of 10.5 Mb and greater than 97% BUSCO completeness in 97/101 assemblies. These assemblies, along with detailed wet lab protocol and assembly pipelines, are released as a public resource and will serve as a starting point for addressing broad questions of genetics, ecology, and evolution within this key group.

eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Bernard Y Kim ◽  
Jeremy Wang ◽  
Danny E Miller ◽  
Olga Barmina ◽  
Emily Kay Delaney ◽  
...  

Over 100 years of studies in Drosophila melanogaster and related species in the genus Drosophila have facilitated key discoveries in genetics, genomics, and evolution. While high-quality genome assemblies exist for several species in this group, they only encompass a small fraction of the genus. Recent advances in long-read sequencing allow high-quality genome assemblies for tens or even hundreds of species to be efficiently generated. Here, we utilize Oxford Nanopore sequencing to build an open community resource of genome assemblies for 101 lines of 93 drosophilid species encompassing 14 species groups and 35 sub-groups. The genomes are highly contiguous and complete, with an average contig N50 of 10.5 Mb and greater than 97% BUSCO completeness in 97/101 assemblies. We show that Nanopore-based assemblies are highly accurate in coding regions, particularly with respect to coding insertions and deletions. These assemblies, along with a detailed laboratory protocol and assembly pipelines, are released as a public resource and will serve as a starting point for addressing broad questions of genetics, ecology, and evolution at the scale of hundreds of species.


2018 ◽  
Author(s):  
Danny E. Miller ◽  
Cynthia Staber ◽  
Julia Zeitlinger ◽  
R. Scott Hawley

ABSTRACTThe Drosophila genus is a unique group containing a wide range of species that occupy diverse ecosystems. In addition to the most widely studied species, Drosophila melanogaster, many other members in this genus also possess a well-developed set of genetic tools. Indeed, high-quality genomes exist for several species within the genus, facilitating studies of the function and evolution of cis-regulatory regions and proteins by allowing comparisons across at least 50 million years of evolution. Yet, the available genomes still fail to capture much of the substantial genetic diversity within the Drosophila genus. We have therefore tested protocols to rapidly and inexpensively sequence and assemble the genome from any Drosophila species using single-molecule sequencing technology from Oxford Nanopore. Here, we use this technology to present high-quality genome assemblies of 15 Drosophila species: 10 of the 12 originally sequenced Drosophila species (ananassae, erecta, mojavensis, persimilis, pseudoobscura, sechellia, simulans, virilis, willistoni, and yakuba), four additional species that had previously reported assemblies (biarmipes, bipectinata, eugracilis, and mauritiana), and one novel assembly (triauraria). Genomes were generated from an average of 29x depth-of-coverage data that after assembly resulted in an average contig N50 of 4.4 Mb. Subsequent alignment of contigs from the published reference genomes demonstrates that our assemblies could be used to close over 60% of the gaps present in the currently published reference genomes. Importantly, the materials and reagents cost for each genome was approximately $1,000 (USD). This study demonstrates the power and cost-effectiveness of long-read sequencing for genome assembly in Drosophila and provides a framework for the affordable sequencing and assembly of additional Drosophila genomes.


2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Jean-Marc Aury ◽  
Benjamin Istace

Abstract Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (kilobases to megabases order) and then, using efficient algorithms, provide high quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture between the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads and generally, they inspect the nucleotide one by one, and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes and they typically switch from one haplotype to another. Herein we proposed Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long-reads) to polish genome assemblies and in particular assemblies of diploid and heterozygous genomes.


GigaScience ◽  
2019 ◽  
Vol 8 (10) ◽  
Author(s):  
Sarah B Kingan ◽  
Julie Urban ◽  
Christine C Lambert ◽  
Primo Baybayan ◽  
Anna K Childers ◽  
...  

ABSTRACT Background A high-quality reference genome is an essential tool for applied and basic research on arthropods. Long-read sequencing technologies may be used to generate more complete and contiguous genome assemblies than alternate technologies; however, long-read methods have historically had greater input DNA requirements and higher costs than next-generation sequencing, which are barriers to their use on many samples. Here, we present a 2.3 Gb de novo genome assembly of a field-collected adult female spotted lanternfly (Lycorma delicatula) using a single Pacific Biosciences SMRT Cell. The spotted lanternfly is an invasive species recently discovered in the northeastern United States that threatens to damage economically important crop plants in the region. Results The DNA from 1 individual was used to make 1 standard, size-selected library with an average DNA fragment size of ∼20 kb. The library was run on 1 Sequel II SMRT Cell 8M, generating a total of 132 Gb of long-read sequences, of which 82 Gb were from unique library molecules, representing ∼36× coverage of the genome. The assembly had high contiguity (contig N50 length = 1.5 Mb), completeness, and sequence level accuracy as estimated by conserved gene set analysis (96.8% of conserved genes both complete and without frame shift errors). Furthermore, it was possible to segregate more than half of the diploid genome into the 2 separate haplotypes. The assembly also recovered 2 microbial symbiont genomes known to be associated with L. delicatula, each microbial genome being assembled into a single contig. Conclusions We demonstrate that field-collected arthropods can be used for the rapid generation of high-quality genome assemblies, an attractive approach for projects on emerging invasive species, disease vectors, or conservation efforts of endangered species.


Biology ◽  
2021 ◽  
Vol 10 (8) ◽  
pp. 732
Author(s):  
Benjamin Istace ◽  
Caroline Belser ◽  
Cyril Falentin ◽  
Karine Labadie ◽  
Franz Boideau ◽  
...  

With the rise of long-read sequencers and long-range technologies, delivering high-quality plant genome assemblies is no longer reserved to large consortia. Not only sequencing techniques, but also computer algorithms have reached a point where the reconstruction of assemblies at the chromosome scale is now feasible at the laboratory scale. Current technologies, in particular long-range technologies, are numerous, and selecting the most promising one for the genome of interest is crucial to obtain optimal results. In this study, we resequenced the genome of the yellow sarson, Brassica rapa cv. Z1, using the Oxford Nanopore PromethION sequencer and assembled the sequenced data using current assemblers. To reconstruct complete chromosomes, we used and compared three long-range scaffolding techniques, optical mapping, Omni-C, and Pore-C sequencing libraries, commercialized by Bionano Genomics, Dovetail Genomics, and Oxford Nanopore Technologies, respectively, or a combination of the three, in order to evaluate the capability of each technology.


Author(s):  
Benjamin Istace ◽  
Caroline Belser ◽  
Cyril Falentin ◽  
Karine Labadie ◽  
Franz Boideau ◽  
...  

With the rise of long-read sequencers and long-range technologies, delivering high-quality plant genome assemblies is no longer reserved to large consortia. Indeed, sequencing techniques but also computer algorithms have reached a point where the reconstruction of assemblies at the chromosome-scale is now feasible at the laboratory scale. Current technologies, and especially long-range technologies, are numerous and selecting the most promising one for the genome of interest is crucial to obtain optimal results. In this study, we resequenced the genome of the yellow sarson, Brassica rapa cv. Z1, using the Oxford Nanopore PromethION sequencer and assembled the sequenced data using current assemblers. To reconstruct complete chromosomes, we used and compared three long-range techniques, optical mapping, Omni-C and Pore-C sequencing libraries commercialized by Bionano Genomics, Dovetail Genomics and Oxford Nanopore Technologies respectively, or a combination of the three, in order to evaluate the capability of each technology.


2019 ◽  
Author(s):  
Sarah B. Kingan ◽  
Julie Urban ◽  
Christine C. Lambert ◽  
Primo Baybayan ◽  
Anna K. Childers ◽  
...  

AbstractA high-quality reference genome is an essential tool for applied and basic research on arthropods. Long-read sequencing technologies may be used to generate more complete and contiguous genome assemblies than alternate technologies, however, long-read methods have historically had greater input DNA requirements and higher costs than next generation sequencing, which are barriers to their use on many samples. Here, we present a 2.3 Gb de novo genome assembly of a field-collected adult female Spotted Lanternfly (Lycorma delicatula) using a single PacBio SMRT Cell. The Spotted Lanternfly is an invasive species recently discovered in the northeastern United States, threatening to damage economically important crop plants in the region. The DNA from one individual was used to make one standard, size-selected library with an average DNA fragment size of ~20 kb. The library was run on one Sequel II SMRT Cell 8M, generating a total of 132 Gb of long-read sequences, of which 82 Gb were from unique library molecules, representing approximately 36-fold coverage of the genome. The assembly had high contiguity (contig N50 length = 1.5 Mb), completeness, and sequence level accuracy as estimated by conserved gene set analysis (96.8% of conserved genes both complete and without frame shift errors). Further, it was possible to segregate more than half of the diploid genome into the two separate haplotypes. The assembly also recovered two microbial symbiont genomes known to be associated with L. delicatula, each microbial genome being assembled into a single contig. We demonstrate that field-collected arthropods can be used for the rapid generation of high-quality genome assemblies, an attractive approach for projects on emerging invasive species, disease vectors, or conservation efforts of endangered species.


2020 ◽  
Vol 33 (5) ◽  
pp. 718-720
Author(s):  
Karthi Natesan ◽  
Ji Yeon Park ◽  
Cheol-Woo Kim ◽  
Dong Suk Park ◽  
Young-Seok Kwon ◽  
...  

Peronospora destructor is an obligate biotrophic oomycete that causes downy mildew on onion (Allium cepa). Onion is an important crop worldwide, but its production is affected by this pathogen. We sequenced the genome of P. destructor using the PacBio sequencing platform, and de novo assembly resulted in 74 contigs with a total contig size of 29.3 Mb and 48.48% GC content. Here, we report the first high-quality genome sequence of P. destructor and its comparison with the genome assemblies of other oomycetes. The genome is a very useful resource to serve as a reference for analysis of P. destructor isolates and for comparative genomic studies of the biotrophic oomycetes.


2020 ◽  
Vol 10 (7) ◽  
pp. 2179-2183 ◽  
Author(s):  
Stefan Prost ◽  
Malte Petersen ◽  
Martin Grethlein ◽  
Sarah Joy Hahn ◽  
Nina Kuschik-Maczollek ◽  
...  

Ever decreasing costs along with advances in sequencing and library preparation technologies enable even small research groups to generate chromosome-level assemblies today. Here we report the generation of an improved chromosome-level assembly for the Siamese fighting fish (Betta splendens) that was carried out during a practical university master’s course. The Siamese fighting fish is a popular aquarium fish and an emerging model species for research on aggressive behavior. We updated the current genome assembly by generating a new long-read nanopore-based assembly with subsequent scaffolding to chromosome-level using previously published Hi-C data. The use of ∼35x nanopore-based long-read data sequenced on a MinION platform (Oxford Nanopore Technologies) allowed us to generate a baseline assembly of only 1,276 contigs with a contig N50 of 2.1 Mbp, and a total length of 441 Mbp. Scaffolding using the Hi-C data resulted in 109 scaffolds with a scaffold N50 of 20.7 Mbp. More than 99% of the assembly is comprised in 21 scaffolds. The assembly showed the presence of 96.1% complete BUSCO genes from the Actinopterygii dataset indicating a high quality of the assembly. We present an improved full chromosome-level assembly of the Siamese fighting fish generated during a university master’s course. The use of ∼35× long-read nanopore data drastically improved the baseline assembly in terms of continuity. We show that relatively in-expensive high-throughput sequencing technologies such as the long-read MinION sequencing platform can be used in educational settings allowing the students to gain practical skills in modern genomics and generate high quality results that benefit downstream research projects.


GigaScience ◽  
2021 ◽  
Vol 10 (12) ◽  
Author(s):  
Ralf C Mueller ◽  
Patrik Ellström ◽  
Kerstin Howe ◽  
Marcela Uliano-Silva ◽  
Richard I Kuo ◽  
...  

Abstract Background The tufted duck is a non-model organism that experiences high mortality in highly pathogenic avian influenza outbreaks. It belongs to the same bird family (Anatidae) as the mallard, one of the best-studied natural hosts of low-pathogenic avian influenza viruses. Studies in non-model bird species are crucial to disentangle the role of the host response in avian influenza virus infection in the natural reservoir. Such endeavour requires a high-quality genome assembly and transcriptome. Findings This study presents the first high-quality, chromosome-level reference genome assembly of the tufted duck using the Vertebrate Genomes Project pipeline. We sequenced RNA (complementary DNA) from brain, ileum, lung, ovary, spleen, and testis using Illumina short-read and Pacific Biosciences long-read sequencing platforms, which were used for annotation. We found 34 autosomes plus Z and W sex chromosomes in the curated genome assembly, with 99.6% of the sequence assigned to chromosomes. Functional annotation revealed 14,099 protein-coding genes that generate 111,934 transcripts, which implies a mean of 7.9 isoforms per gene. We also identified 246 small RNA families. Conclusions This annotated genome contributes to continuing research into the host response in avian influenza virus infections in a natural reservoir. Our findings from a comparison between short-read and long-read reference transcriptomics contribute to a deeper understanding of these competing options. In this study, both technologies complemented each other. We expect this annotation to be a foundation for further comparative and evolutionary genomic studies, including many waterfowl relatives with differing susceptibilities to avian influenza viruses.


Sign in / Sign up

Export Citation Format

Share Document