Genomes of the willow-galling sawflies Euura lappo and Eupontania aestiva (Hymenoptera: Tenthredinidae): a resource for research on ecological speciation, adaptation, and gall induction

Abstract Hymenoptera are a hyperdiverse insect order represented by over 153,000 different species. As many hymenopteran species perform various crucial roles for our environment, such as pollination, herbivory, and parasitism, they are of high economic and ecological importance. There are 99 hymenopteran genomes in the NCBI database, yet only five are representative of the paraphyletic suborder Symphyta (sawflies, woodwasps, and horntails), while the rest represent the suborder Apocrita (bees, wasps, and ants). Here, using a combination of 10X Genomics linked-read sequencing, Oxford Nanopore long-read technology, and Illumina short-read data, we assembled the genomes of two willow-galling sawflies (Hymenoptera: Tenthredinidae: Nematinae: Euurina): the bud-galling species Euura lappo and the leaf-galling species Eupontania aestiva. The final assembly for E. lappo is 259.85 Mbp in size, with a contig N50 of 209.0 kbp and a BUSCO score of 93.5%. The E. aestiva genome is 222.23 Mbp in size, with a contig N50 of 49.7 kbp and a 90.2% complete BUSCO score. De novo annotation of repetitive elements showed that 27.45% of the genome was composed of repetitive elements in E. lappo and 16.89% in E. aestiva, which is a marked increase compared to previously published hymenopteran genomes. The genomes presented here provide a resource for inferring phylogenetic relationships among basal hymenopterans, comparative studies on host-related genomic adaptation in plant-feeding insects, and research on the mechanisms of plant manipulation by gall-inducing insects.
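
The abstracts on this page report contiguity as contig or scaffold N50. As a minimal sketch of how that statistic is usually computed (the function name `n50` and the toy lengths are illustrative assumptions, not values or code from any of the listed papers): sort the sequence lengths from longest to shortest and return the length at which the running total first reaches half of the assembly size.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy lengths (hypothetical, chosen only for illustration).
example_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_contigs))
```

With the toy input the total is 300 bases, and the two longest contigs (80 + 70) already cover half of it, so the function returns 70.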

Oxford Nanopore Sequencing, Hybrid Error Correction, and de novo Assembly of a Eukaryotic Genome

DOI: 10.1101/013490 | 2015 | Cited By ~ 23

Author(s): Sara Goodwin, James Gurtowski, Scott Ethe-Sayers, Panchajanya Deshpande, Michael Schatz, ...

Keyword(s): Error Correction, De Novo Assembly, De Novo, Correction Algorithm, Membrane Pore, Complete Representation, Oxford Nanopore, Long Read, Error Correction Algorithm, Sequencing Instrument

Monitoring the progress of DNA molecules through a membrane pore has been postulated as a method for sequencing DNA for several decades. Recently, a nanopore-based sequencing instrument, the Oxford Nanopore MinION, has become available, and we used it to sequence the S. cerevisiae genome. To make use of these data, we developed a novel open-source hybrid error correction algorithm, Nanocorr (https://github.com/jgurtowski/nanocorr), specifically for Oxford Nanopore reads, as existing packages were incapable of assembling the long read lengths (5-50 kbp) at such a high error rate (between ~5 and 40% error). With this new method we were able to perform a hybrid error correction of the nanopore reads using complementary MiSeq data and produce a de novo assembly that is highly contiguous and accurate: the contig N50 length is more than ten times greater than that of an Illumina-only assembly (678 kbp versus 59.9 kbp), and the assembly has greater than 99.88% consensus identity when compared to the reference. Furthermore, the assembly with the long nanopore reads presents a much more complete representation of the features of the genome and correctly assembles gene cassettes, rRNAs, transposable elements, and other genomic features that were almost entirely absent in the Illumina-only assembly.

A chromosome-scale assembly of the sorghum genome using nanopore sequencing and optical mapping

DOI: 10.1101/327817 | 2018 | Cited By ~ 2

Author(s): Stéphane Deschamps, Yun Zhang, Victor Llaca, Liang Ye, Gregory May, ...

Keyword(s): Sorghum Bicolor, De Novo, Hybrid Assembly, The Public, Sequencing Technologies, Oxford Nanopore, Long Read, Optical Maps, Eukaryotic Genomes, Direct Label

The advent of long-read sequencing technologies has greatly facilitated assemblies of large eukaryotic genomes. In this paper, Oxford Nanopore sequences generated on a MinION sequencer were combined with BioNano Genomics Direct Label and Stain (DLS) optical maps to generate a chromosome-scale de novo assembly of the repeat-rich Sorghum bicolor Tx430 genome. The final hybrid assembly consists of 29 scaffolds, encompassing in most cases entire chromosome arms. It has a scaffold N50 value of 33.28 Mbp and covers >90% of the expected Sorghum bicolor genome length. A sequence accuracy of 99.67% was obtained in unique regions after aligning contigs against Illumina Tx430 data. Alignments showed that 99.4% of the 34,211 public gene models are present in the assembly, including 94.2% mapping end-to-end. Comparisons of the DLS optical maps against the public Sorghum bicolor v3.0.1 BTx623 genome assembly suggest the presence of substantial genomic rearrangements whose origin remains to be determined.

De novo genome assembly of the olive fruit fly (Bactrocera oleae) developed through a combination of linked-reads and long-read technologies

DOI: 10.1101/505040 | 2018 | Cited By ~ 1

Author(s): Haig Djambazian, Anthony Bayega, Konstantina T. Tsoumani, Efthimia Sagri, Maria-Eleni Gregoriou, ...

Keyword(s): Y Chromosome, De Novo, Fruit Fly, Bactrocera Oleae, Olive Fruit Fly, Olive Fruit, Long Reads, Oxford Nanopore, Long Read, Oxford Nanopore Technologies

Abstract Long-read sequencing has greatly contributed to the generation of high-quality assemblies, albeit at a high cost. It is also not always clear how to combine sequencing platforms. We sequenced the genome of the olive fruit fly (Bactrocera oleae), the most important pest in the olive fruit agribusiness industry, using Illumina short reads, mate pairs, 10x Genomics linked reads, Pacific Biosciences (PacBio), and Oxford Nanopore Technologies (ONT). The 10x linked-read assembly gave the most contiguous assembly, with an N50 of 2.16 Mb. Scaffolding the linked-read assembly using long reads from ONT gave a more contiguous assembly, with a scaffold N50 of 4.59 Mb. We also present the most extensive transcriptome datasets of the olive fly, derived from different tissues and stages of development. Finally, we used the Chromosome Quotient method to identify Y-chromosome scaffolds and show that the long-read-based assembly generates a highly contiguous Y-chromosome assembly.

JR is a member of the MinION Access Program (MAP) and has received free-of-charge flow cells and sequencing kits from Oxford Nanopore Technologies for other projects. JR has had no other financial support from ONT. AB has received reimbursement for travel costs associated with attending Nanopore Community Meeting 2018, a meeting organized by Oxford Nanopore Technologies.

Targeted Long-Read RNA Sequencing Demonstrates Transcriptional Diversity Driven by Splice-Site Variation in MYBPC3

DOI: 10.1101/522698 | 2019

Author(s): Alexandra Dainis, Elizabeth Tseng, Tyson A. Clark, Ting Hon, Matthew Wheeler, ...

Keyword(s): RNA Sequencing, Splice Site, De Novo, Exon Skipping, Molecular Evidence, Additional Tool, Oxford Nanopore, Site Variation, Long Read, Alternatively Spliced

Abstract Background: Clinical sequencing has traditionally focused on genomic DNA through the use of targeted panels and exome sequencing, rather than investigating the potential transcriptomic consequences of disease-associated variants. RNA sequencing has recently been shown to be an effective additional tool for identifying disease-causing variants. Here we use targeted long-read genome and transcriptome sequencing to efficiently and economically identify molecular consequences of a rare, disease-associated variant in hypertrophic cardiomyopathy (HCM). Methods and Results: Our study, which employed both Pacific Biosciences SMRT sequencing and Oxford Nanopore Technologies MinION sequencing, as well as two RNA targeting strategies, identified alternatively spliced isoforms that resulted from a splice-site variant-containing allele in HCM. These included a predicted in-frame exon-skipping event, as well as an abundance of additional isoforms with unexpected intron-inclusion, exon-extension, and pseudo-exon events. The use of long-read RNA sequencing allowed us not only to investigate full-length alternatively spliced transcripts but also to phase them back to the variant-containing allele. Conclusions: We suggest that targeted, long-read RNA sequencing in conjunction with genome sequencing may provide additional molecular evidence of disease for rare or de novo variants in cardiovascular disease, as well as new information about the consequences of these variants for downstream RNA and protein expression.

Draft genome assemblies using sequencing reads from Oxford Nanopore Technology and Illumina platforms for four species of North American Fundulus killifish

GigaScience | DOI: 10.1093/gigascience/giaa067 | 2020 | Vol 9 (6) | Cited By ~ 3

Author(s): Lisa K Johnson, Ruta Sahasrabudhe, James Anthony Gill, Jennifer L Roach, Lutz Froenicke, ...

Keyword(s): North American, De Novo, Draft Genome, Whole Genome Sequencing Data, Sequencing Data, Sequence Coverage, Short Read, Oxford Nanopore, Long Read, Genome Assemblies

Abstract Background Whole-genome sequencing data from wild-caught individuals of closely related North American killifish species (Fundulus xenicus, Fundulus catenatus, Fundulus nottii, and Fundulus olivaceus) were obtained using long-read Oxford Nanopore Technology (ONT) PromethION and short-read Illumina platforms. Findings Draft de novo reference genome assemblies were generated using a combination of long and short sequencing reads. For each species, the PromethION platform was used to generate 30–45× sequence coverage, and the Illumina platform was used to generate 50–160× sequence coverage. Illumina-only assemblies were fragmented with high numbers of contigs, while ONT-only assemblies were error prone with low BUSCO scores. The highest N50 values, ranging from 0.4 to 2.7 Mb, were from assemblies generated using a combination of short- and long-read data. BUSCO scores were consistently >90% complete using the Eukaryota database. Conclusions High-quality genomes can be obtained from a combination of using short-read Illumina data to polish assemblies generated with long-read ONT data. Draft assemblies and raw sequencing data are available for public use. We encourage use and reuse of these data for assembly benchmarking and other analyses.
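
Sequence coverage figures such as those quoted above are conventionally the total number of sequenced bases divided by the genome size. The following is a minimal sketch of that arithmetic; the function name `mean_coverage` and the toy numbers are assumptions for illustration and are not taken from the killifish study.

```python
def mean_coverage(read_lengths, genome_size):
    """Return average sequencing depth: total read bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


# Toy example (hypothetical values): three reads over a 1,000 bp genome.
reads = [12_000, 8_000, 10_000]
print(mean_coverage(reads, 1_000))
```

Here 30,000 sequenced bases over a 1,000 bp genome give a mean coverage of 30×.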

De novo clustering of long-read transcriptome data using a greedy, quality-value based algorithm

DOI: 10.1101/463463 | 2018 | Cited By ~ 8

Author(s): Kristoffer Sahlin, Paul Medvedev

Keyword(s): Clustering Algorithm, De Novo, Substantial Improvement, Error Rates, Reconstruction Algorithms, Long Reads, Oxford Nanopore, Long Read, Transcript Reconstruction, Oxford Nanopore Technologies

Abstract Long-read sequencing of transcripts with PacBio Iso-Seq and Oxford Nanopore Technologies has proven to be central to the study of complex isoform landscapes in many organisms. However, current de novo transcript reconstruction algorithms from long-read data are limited, leaving the potential of these technologies unfulfilled. A common bottleneck is the dearth of scalable and accurate algorithms for clustering long reads according to their gene family of origin. To address this challenge, we develop isONclust, a clustering algorithm that is greedy (in order to scale) and makes use of quality values (in order to handle variable error rates). We test isONclust on three simulated and five biological datasets, across a breadth of organisms, technologies, and read depths. Our results demonstrate that isONclust is a substantial improvement over previous approaches in terms of overall accuracy and/or scalability to large datasets. Our tool is available at https://github.com/ksahlin/isONclust.
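
The general idea of a greedy, quality-aware clustering pass can be illustrated with a toy sketch. This is not the isONclust implementation: the k-mer containment score, the `min_shared` threshold, the use of a single mean quality per read, and all names below are assumptions made only for illustration. Reads are visited from highest to lowest mean quality, and each read either joins the first existing cluster whose representative shares enough k-mers with it or founds a new cluster.

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_cluster(reads, k=4, min_shared=0.5):
    """Greedily cluster reads given as (name, sequence, mean_quality) tuples.

    Hypothetical scheme: the first read placed in a cluster acts as its
    representative; later reads join if at least `min_shared` of their
    k-mers occur in that representative.
    """
    clusters = []  # each entry: {"rep_kmers": set, "members": [names]}
    # Higher-quality reads go first so they become representatives.
    for name, seq, _quality in sorted(reads, key=lambda r: r[2], reverse=True):
        read_kmers = kmers(seq, k)
        for cluster in clusters:
            shared = len(read_kmers & cluster["rep_kmers"])
            if read_kmers and shared / len(read_kmers) >= min_shared:
                cluster["members"].append(name)
                break
        else:
            # No existing cluster was similar enough: start a new one.
            clusters.append({"rep_kmers": read_kmers, "members": [name]})
    return [c["members"] for c in clusters]


# Toy reads (made up for illustration).
toy_reads = [
    ("r1", "ACGTACGTTGCA", 30),
    ("r2", "ACGTACGTTGCT", 20),
    ("r3", "GGGGCCCCAAAA", 25),
    ("r4", "GGGGCCCCAAAT", 10),
]
print(greedy_cluster(toy_reads))
```

Each read is compared only against cluster representatives, not against every other read, which is what keeps a greedy pass of this kind cheap on large datasets.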

Clustering de Novo by Gene of Long Reads from Transcriptomics Data

DOI: 10.1101/170035 | 2017 | Cited By ~ 3

Author(s): Camille Marchet, Lolita Lecompte, Corinne Da Silva, Corinne Cruaud, Jean-Marc Aury, ...

Keyword(s): De Novo, Free Access, Sequencing Data, Base Pairs, Long Reads, Oxford Nanopore, Processing Step, Whole Transcriptome Sequencing, Long Read, Transcriptomics Data

Abstract Long-read sequencing currently provides sequences of several thousand base pairs. This makes it possible to obtain complete transcripts, which offers an unprecedented view of the cellular transcriptome. However, the literature is lacking tools to cluster such data de novo, in particular for Oxford Nanopore Technologies reads, because of their inherently high error rate compared to short reads. Our goal is to process reads from whole transcriptome sequencing data accurately and without a reference genome in order to reliably group reads coming from the same gene. This de novo approach is therefore particularly suitable for non-model species, but can also serve as a useful pre-processing step to improve read mapping. Our contribution is both a new algorithm adapted to clustering reads by gene and a practical, freely accessible tool that scales to the complete processing of eukaryotic transcriptomes. We sequenced a mouse RNA sample using the MinION device, and this dataset is used to compare our solution to other algorithms used in the context of biological clustering. We demonstrate that it is better suited to long transcriptomic reads. When a reference is available and mapping is therefore possible, we show that it stands as an alternative method that predicts complementary clusters.

Genomic Surveillance for Antimicrobial Resistance in Mannheimia haemolytica Using Nanopore Single Molecule Sequencing Technology

DOI: 10.1101/395087 | 2018

Author(s): Alexander Lim, Bryan Naidenov, Haley Bates, Karyn Willyerd, Timothy Snider, ...

Keyword(s): Antibiotic Resistance, Antimicrobial Resistance, Single Molecule, Resistant Strain, De Novo, Gene Annotation, Cost Effective, Multidrug Resistant, Oxford Nanopore, Long Read

Abstract Disruptive innovations in long-range, cost-effective direct template nucleic acid sequencing are transforming clinical and diagnostic medicine. A multidrug-resistant strain and a pan-susceptible strain of Mannheimia haemolytica, isolated from pneumonic bovine lung samples, were sequenced at 146x and 111x coverage, respectively, with Oxford Nanopore Technologies MinION. De novo assembly produced a complete genome for the non-resistant strain and a nearly complete assembly for the drug-resistant strain. Functional annotation using the RAST (Rapid Annotations using Subsystems Technology), CARD (Comprehensive Antibiotic Resistance Database) and ResFinder databases identified genes conferring resistance to different classes of antibiotics, including beta-lactams, tetracyclines, lincosamides, phenicols, aminoglycosides, sulfonamides and macrolides. Antibiotic resistance phenotypes of the M. haemolytica strains were confirmed with minimum inhibitory concentration (MIC) assays. The sequencing capacity of highly portable MinION devices was verified by sub-sampling sequencing reads; the potential for antimicrobial resistance, determined by identification of resistance genes in the draft assemblies with as few as 5,437 MinION reads, corresponded to all classes of MIC assays. The resulting quality assemblies and AMR gene annotation highlight the efficiency of ultra-long-read, whole-genome sequencing (WGS) as a valuable tool in diagnostic veterinary medicine.

De-novo Assembly of Limnospira fusiformis Using Ultra-Long Reads

Frontiers in Microbiology | DOI: 10.3389/fmicb.2021.657995 | 2021 | Vol 12

Author(s): McKenna Hicks, Thuy-Khanh Tran-Dao, Logan Mulroney, David L. Bernick

Keyword(s): Phylogenetic Analysis, Type Strain, Reference Genome, De Novo, Illumina MiSeq, Long Reads, Oxford Nanopore, Long Read, Oxford Nanopore Technologies, rDNA Analysis

The Limnospira genus is a recently established clade that is economically important due to its worldwide use in biotechnology and agriculture. This genus includes organisms that were reclassified from Arthrospira, which are commercially marketed as “Spirulina.” Limnospira are photoautotrophic organisms that are widely used for research in nutrition, medicine, bioremediation, and biomanufacturing. Despite its widespread use, there is no closed genome for the Limnospira genus, and no reference genome for the type strain, Limnospira fusiformis. In this work, the L. fusiformis genome was sequenced using Oxford Nanopore Technologies MinION and assembled using only ultra-long reads (>35 kb). This assembly was polished with Illumina MiSeq reads sourced from an axenic L. fusiformis culture; axenicity was verified via microscopy and rDNA analysis. Ultra-long read sequencing resulted in a 6.42 Mb closed genome assembled as a single contig with no plasmid. Phylogenetic analysis placed L. fusiformis in the Limnospira clade; some Arthrospira were also placed in this clade, suggesting a misclassification of these strains. This work provides a fully closed and accurate reference genome for the economically important type strain, L. fusiformis. We also present a rapid axenicity method to isolate L. fusiformis. These contributions enable future biotechnological development of L. fusiformis by way of genetic engineering.

42 An Improved, High-quality Ovine Reference Genome Assembly

Journal of Animal Science | DOI: 10.1093/jas/skab235.039 | 2021 | Vol 99 (Supplement_3) | pp. 23-24

Author(s): Kimberly M Davenport, Derek M Bickhart, Kim Worley, Shwetha C Murali, Noelle Cockett, ...

Keyword(s): Genome Assembly, Functional Annotation, Reference Genome, De Novo, The United States, Read Length, Chromosome 11, High Quality, Oxford Nanopore, Long Read

Abstract Sheep are an important agricultural species used for both food and fiber in the United States and globally. A high-quality reference genome enhances the ability to discover genetic and biological mechanisms influencing important traits, such as meat and wool quality. The rapid advances in genome assembly algorithms and the emergence of increasingly long sequence read lengths provide the opportunity for an improved de novo assembly of the sheep reference genome. Tissue was collected postmortem from an adult Rambouillet ewe selected by USDA-ARS for the Ovine Functional Annotation of Animal Genomes project. Short-read (55x coverage), long-read PacBio (75x coverage), and Hi-C data from this ewe were retrieved from public databases. We generated an additional 50x coverage of Oxford Nanopore data and assembled the combined long-read data with canu v1.9. The assembled contigs were polished with Nanopolish v0.12.5 and scaffolded using Hi-C data with Salsa v2.2. Gaps were filled with PBsuite v15.8.24 and polished with Nanopolish v0.12.5, followed by removal of duplicate contigs with PurgeDups v1.0.1. Chromosomes were oriented by identifying centromeres and telomeres with RepeatMasker v4.1.1, indicating a need to reverse the orientation of chromosome 11 relative to Oar_rambouillet_v1.0. Final polishing was performed with two rounds of a pipeline which consisted of freebayes v1.3.1 to call variants, Merfin to validate them, and BCFtools to generate the consensus fasta. The ARS-UI_Ramb_v2.0 assembly has improved continuity (contig N50 of 43.19 Mb) with a 19-fold and 38-fold decrease in the number of scaffolds compared with Oar_rambouillet_v1.0 and Oar_v4.0. ARS-UI_Ramb_v2.0 has greater per-base accuracy and fewer insertions and deletions identified from mapped RNA sequence than previous assemblies. This significantly improved reference assembly, public at NCBI GenBank under accession number GCA_016772045, will optimize the functional annotation of the sheep genome and facilitate improved mapping accuracy of genetic variant and expression data for traits relevant to the sheep industry.
