Genome of the parasitoid wasp Dinocampus coccinellae reveals extensive duplications, accelerated evolution, and independent origins of thelytokous parthenogeny and solitary behavior

Abstract: Dinocampus coccinellae (Hymenoptera: Braconidae) is a generalist parasitoid wasp that parasitizes >50 species of predatory lady beetles (Coleoptera: Coccinellidae), with thelytokous parthenogeny as its primary mode of reproduction. Here we present the first high-quality genome of D. coccinellae, generated using a combination of short- and long-read sequencing technologies, followed by assembly and scaffolding of chromosomal segments using Chicago+HiC technologies. We also present a first-pass ab initio and reference-based genome annotation, and resolve the timing of divergence and evolution of (1) solitary behavior vs eusociality, (2) arrhenotokous vs thelytokous parthenogenesis, and (3) rates of gene loss and gain among Hymenopteran lineages. Our study finds (1) at least two independent origins of eusociality and solitary behavior among Hymenoptera, (2) two independent origins of thelytokous parthenogenesis from ancestral arrhenotoky, and (3) accelerated rates of gene duplication, loss, and gain along the lineages leading to D. coccinellae. Our work affirms both the ancient divergence of Braconid wasps from ancestral Hymenopterans and accelerated rates of evolution in response to adaptations to novel hosts, including polyDNA viral co-evolution.

10.1101/2021.06.30.450623 ◽ 2021

Author(s): Arun Sethuraman ◽ Alicia Tovar ◽ Christy Grenier ◽ Walker Welch ◽ Camila Arce ◽ ...

Keyword(s): Parasitoid Wasp ◽ Gene Duplications ◽ Thelytokous Parthenogenesis ◽ Dinocampus Coccinellae ◽ Sequencing Technologies ◽ Rates Of Evolution ◽ Accelerated Evolution ◽ Long Read ◽ Solitary Behavior ◽ High Quality Genome

ntJoin: Fast and lightweight assembly-guided scaffolding using minimizer graphs

10.1101/2020.01.13.905240 ◽ 2020

Author(s): Lauren Coombe ◽ Vladimir Nikolić ◽ Justin Chu ◽ Inanc Birol ◽ René L. Warren

Keyword(s): Reference Sequence ◽ Biological Research ◽ Closely Related Species ◽ Draft Assembly ◽ Short Read ◽ Sequencing Technologies ◽ Long Read ◽ Genome Assemblies ◽ High Quality Genome ◽ Reference Human Genome

Abstract. Summary: The ability to generate high-quality genome sequences is a cornerstone of modern biological research. Even with recent advancements in sequencing technologies, many genome assemblies still do not reach reference grade. Here, we introduce ntJoin, a tool that leverages structural synteny between a draft assembly and reference sequence(s) to contiguate and correct the former with respect to the latter. Instead of alignments, ntJoin uses a lightweight mapping approach based on a graph data structure generated from ordered minimizer sketches. The tool can be used in a variety of applications, including improving a draft assembly with a reference-grade genome, a short-read assembly with a draft long-read assembly, and a draft assembly with an assembly from a closely related species. When scaffolding a human short-read assembly using the reference human genome or a long-read assembly, ntJoin improves the NGA50 length 23- and 13-fold, respectively, in under 13 min, using less than 11 GB of RAM. Compared to existing reference-guided assemblers, ntJoin generates highly contiguous assemblies faster and using less memory. Availability and implementation: ntJoin is written in C++ and Python, and is freely available at https://github.com/bcgsc/ntjoin. Contact: [email protected]
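
The "ordered minimizer sketch" mentioned in the abstract can be illustrated with a small Python sketch. This is not ntJoin's implementation: the function name `minimizer_sketch`, the default values of `k` and `w`, and the use of plain lexicographic order to rank k-mers (real tools typically rank by a hash value) are all assumptions made for illustration.

```python
def minimizer_sketch(seq: str, k: int = 5, w: int = 4) -> list[tuple[int, str]]:
    """Return the ordered list of (position, k-mer) minimizers of seq.

    For every window of w consecutive k-mers, the smallest k-mer is kept
    (lexicographic order here, as an illustrative assumption). Ties go to
    the leftmost k-mer, and a minimizer shared by adjacent windows is
    reported only once.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    sketch: list[tuple[int, str]] = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first minimal index, i.e. the leftmost smallest k-mer
        offset = min(range(w), key=lambda j: window[j])
        pick = (start + offset, window[offset])
        if not sketch or sketch[-1] != pick:
            sketch.append(pick)
    return sketch


def shared_minimizers(a: str, b: str, k: int = 5, w: int = 4) -> set[str]:
    """K-mers that appear in the sketches of both sequences."""
    in_a = {kmer for _, kmer in minimizer_sketch(a, k, w)}
    in_b = {kmer for _, kmer in minimizer_sketch(b, k, w)}
    return in_a & in_b
```

Because each sketch keeps only a subset of k-mers in sequence order, two assemblies can be compared through the minimizers they share without computing base-level alignments, which is the general idea behind a lightweight mapping approach of this kind.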

Highly-accurate long-read sequencing improves variant detection and assembly of a human genome

10.1101/519025 ◽ 2019 ◽ Cited By ~ 27

Author(s): Aaron M. Wenger ◽ Paul Peluso ◽ William J. Rowell ◽ Pi-Chuan Chang ◽ Richard J. Hall ◽ ...

Keyword(s): Single Molecule ◽ De Novo ◽ Structural Variants ◽ Short Reads ◽ Sequencing Technologies ◽ Long Reads ◽ Long Read ◽ Variant Detection ◽ High Quality Genome ◽ Circular Consensus Sequencing

Abstract: The major DNA sequencing technologies in use today produce either highly accurate short reads or noisy long reads. We developed a protocol based on single-molecule, circular consensus sequencing (CCS) to generate highly accurate (99.8%) long reads averaging 13.5 kb and applied it to sequence the well-characterized human HG002/NA24385. We optimized existing tools to comprehensively detect variants, achieving precision and recall above 99.91% for SNVs, 95.98% for indels, and 95.99% for structural variants. We estimate that 2,434 discordances are correctable mistakes in the high-quality Genome in a Bottle benchmark. Nearly all (99.64%) variants are phased into haplotypes, which further improves variant detection. De novo assembly produces a highly contiguous and accurate genome with contig N50 above 15 Mb and concordance of 99.998%. CCS reads match short reads for small variant detection, while enabling structural variant detection and de novo assembly at similar contiguity and markedly higher concordance than noisy long reads.
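
The precision and recall figures quoted in the abstract follow the standard definitions used when a call set is compared with a benchmark. The Python sketch below only restates those definitions; the function name and the counts in the usage note are hypothetical and are not taken from the study.

```python
def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> tuple[float, float]:
    """Precision and recall of a variant call set against a benchmark.

    true_pos:  called variants that are also in the benchmark
    false_pos: called variants that are absent from the benchmark
    false_neg: benchmark variants that were not called
    """
    if true_pos + false_pos == 0 or true_pos + false_neg == 0:
        raise ValueError("precision or recall is undefined for these counts")
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall
```

With hypothetical counts of 90 true positives, 10 false positives and 30 false negatives, `precision_recall(90, 10, 30)` returns `(0.9, 0.75)`.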

High-quality genome assembly of Pseudopestalotiopsis theae, the pathogenic fungus of tea grey blight

Plant Disease ◽ 10.1094/pdis-02-21-0318-a ◽ 2021

Author(s): Shiqin Zheng ◽ Ruiqi Chen ◽ Zhe Wang ◽ Juan Liu ◽ Yan Cai ◽ ...

Keyword(s): Genome Assembly ◽ Pathogenic Fungus ◽ High Quality ◽ Tea Tree ◽ Sequencing Technologies ◽ Host Interaction ◽ Long Read ◽ Infection Mechanisms ◽ Grey Blight ◽ High Quality Genome

Tea grey blight, caused by the plant-pathogenic fungus Pseudopestalotiopsis theae, is one of the most serious foliar diseases of the tea tree and can affect the production and quality of tea worldwide. We generated a highly contiguous 50.41 Mbp genome assembly (N50 1.30 Mbp) of P. theae strain CYF27 by combining PacBio long-read and Illumina short-read sequencing technologies. We identified a total of 15,626 gene models, of which 1,038 encode putative secreted proteins. The high-quality genome assembly and annotation resource reported here will be useful for the study of fungal infection mechanisms and pathogen-host interaction.
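
The N50 statistic quoted for this assembly is a contiguity measure: the length of the contig at which the running total, taken from the longest contig downwards, first reaches half of the assembly size. The following Python sketch shows that definition; the function name and the contig lengths in the usage note are made up for illustration and are not data from the study.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")
```

For the toy lengths `[2, 3, 4, 5, 6]` the total is 20, and the two longest contigs (6 + 5 = 11) already cover half of it, so `n50([2, 3, 4, 5, 6])` returns `5`.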

ntJoin: Fast and lightweight assembly-guided scaffolding using minimizer graphs

Bioinformatics ◽ 10.1093/bioinformatics/btaa253 ◽ 2020 ◽ Vol 36 (12) ◽ pp. 3885-3887 ◽ Cited By ~ 1

Author(s): Lauren Coombe ◽ Vladimir Nikolić ◽ Justin Chu ◽ Inanc Birol ◽ René L Warren

Keyword(s): Reference Sequence ◽ Supplementary Information ◽ Biological Research ◽ Closely Related Species ◽ Draft Assembly ◽ Short Read ◽ Sequencing Technologies ◽ Long Read ◽ Genome Assemblies ◽ High Quality Genome

Abstract. Summary: The ability to generate high-quality genome sequences is a cornerstone of modern biological research. Even with recent advancements in sequencing technologies, many genome assemblies still do not reach reference grade. Here, we introduce ntJoin, a tool that leverages structural synteny between a draft assembly and reference sequence(s) to contiguate and correct the former with respect to the latter. Instead of alignments, ntJoin uses a lightweight mapping approach based on a graph data structure generated from ordered minimizer sketches. The tool can be used in a variety of applications, including improving a draft assembly with a reference-grade genome, a short-read assembly with a draft long-read assembly, and a draft assembly with an assembly from a closely related species. When scaffolding a human short-read assembly using the reference human genome or a long-read assembly, ntJoin improves the NGA50 length 23- and 13-fold, respectively, in under 13 min, using <11 GB of RAM. Compared to existing reference-guided scaffolders, ntJoin generates highly contiguous assemblies faster and using less memory. Availability and implementation: ntJoin is written in C++ and Python and is freely available at https://github.com/bcgsc/ntjoin. Supplementary information: Supplementary data are available at Bioinformatics online.

Long-read genome sequence and assembly of Leptopilina boulardi: a specialist Drosophila parasitoid

10.1101/284679 ◽ 2018

Author(s): Shagufta Khan ◽ Divya Tej Sowpati ◽ Rakesh K Mishra

Keyword(s): Draft Genome ◽ Parasitoid Wasp ◽ Rna Seq ◽ Protein Coding ◽ Repeat Elements ◽ Larval Stages ◽ Long Read ◽ Average Gene ◽ Total Coverage ◽ High Quality Genome

Abstract. Background: Leptopilina boulardi is a specialist parasitoid belonging to the order Hymenoptera, which attacks the larval stages of Drosophila. The Leptopilina genus has enormous value in the biological control of pests as well as in understanding several aspects of host-parasitoid biology. However, no member of the family Figitidae has had its genome sequenced. To improve the understanding of parasitoid wasps by generating genomic resources, we sequenced the whole genome of L. boulardi. Findings: Here, we report a high-quality genome of L. boulardi, assembled from 70 Gb of Illumina reads and 10.5 Gb of PacBio reads, forming a total coverage of 230X. The 375 Mb draft genome has an N50 of 275 Kb with 6315 scaffolds >500 bp, and encompasses >95% complete BUSCOs. The GC content of the genome is 28.26%, and RepeatMasker identified 868105 repeat elements covering 43.9% of the assembly. A total of 25259 protein-coding genes were predicted using a combination of ab initio and RNA-Seq based methods, with an average gene size of 3.9 Kb. Of the predicted genes, 78.11% could be annotated with at least one function. Conclusion: Our study provides a highly reliable assembly of this parasitoid wasp, which will be a valuable resource to researchers studying parasitoids. In particular, it can help delineate the host-parasitoid mechanisms that are part of the Drosophila–Leptopilina model system.

Comprehensive identification of transposable element insertions using multiple sequencing technologies

Nature Communications ◽ 10.1038/s41467-021-24041-8 ◽ 2021 ◽ Vol 12 (1)

Author(s): Chong Chu ◽ Rebeca Borges-Monroy ◽ Vinayak V. Viswanadham ◽ Soohyun Lee ◽ Heng Li ◽ ...

Keyword(s): Transposable Element ◽ Structure And Function ◽ Endogenous Retroviruses ◽ Whole Genome Sequencing Data ◽ Whole Genome ◽ Sequencing Data ◽ Short Read ◽ Sequencing Technologies ◽ Long Read ◽ And Function

Abstract: Transposable elements (TEs) help shape the structure and function of the human genome. When inserted into some locations, TEs may disrupt gene regulation and cause diseases. Here, we present xTea (x-Transposable element analyzer), a tool for identifying TE insertions in whole-genome sequencing data. Whereas existing methods are mostly designed for short-read data, xTea can be applied to both short-read and long-read data. Our analysis shows that xTea outperforms other short read-based methods for both germline and somatic TE insertion discovery. With long-read data, we created a catalogue of polymorphic insertions with full assembly and annotation of insertional sequences for various types of retroelements, including pseudogenes and endogenous retroviruses. Notably, we find that individual genomes have an average of nine groups of full-length L1s in centromeres, suggesting that centromeres and other highly repetitive regions such as telomeres are a significant yet unexplored source of active L1s. xTea is available at https://github.com/parklab/xTea.

Survey of the Bradysia odoriphaga Transcriptome Using PacBio Single-Molecule Long-Read Sequencing

Genes ◽ 10.3390/genes10060481 ◽ 2019 ◽ Vol 10 (6) ◽ pp. 481 ◽ Cited By ~ 1

Author(s): Chen ◽ Lin ◽ Xie ◽ Zhong ◽ Zhang ◽ ...

Keyword(s): Insecticide Resistance ◽ Single Molecule ◽ Functional Categories ◽ Genetic Studies ◽ Sequencing Technologies ◽ Clusters Of Orthologous Groups ◽ Long Read ◽ Main Gene ◽ First Time ◽ Main Factor

The damage caused by Bradysia odoriphaga is the main factor threatening the production of vegetables in the Liliaceae family. However, few genetic studies of B. odoriphaga have been conducted because of a lack of genomic resources. Many long-read sequencing technologies have been developed in the last decade; in this study, the transcriptome covering all developmental stages of B. odoriphaga was sequenced for the first time using PacBio single-molecule long-read sequencing. In total, 39,129 isoforms were generated, of which 35,645 were annotated when checked against sequences available in different databases. Overall, 18,473 isoforms were distributed across 25 Clusters of Orthologous Groups categories, and 11,880 isoforms were categorized into 60 functional groups belonging to the three main Gene Ontology classifications. Moreover, 30,610 isoforms were assigned to 44 functional categories belonging to six main Kyoto Encyclopedia of Genes and Genomes functional categories. Coding DNA sequence (CDS) prediction showed that 36,419 of the 39,129 isoforms were predicted to contain a CDS, and 4319 simple sequence repeats were detected in total. Finally, 266 insecticide resistance and metabolism-related isoforms were identified as candidate genes for further investigation of insecticide resistance and metabolism in B. odoriphaga.
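
The counts reported above are easier to compare as shares of the 39,129 isoforms. The short Python sketch below does that arithmetic; the counts are copied from the abstract, while the names `TOTAL_ISOFORMS`, `REPORTED_COUNTS` and `percent_of_total` are illustrative choices.

```python
TOTAL_ISOFORMS = 39_129

# Counts as reported in the abstract.
REPORTED_COUNTS = {
    "annotated in at least one database": 35_645,
    "assigned to COG categories": 18_473,
    "assigned to GO groups": 11_880,
    "assigned to KEGG categories": 30_610,
    "predicted to contain a CDS": 36_419,
}


def percent_of_total(count: int, total: int = TOTAL_ISOFORMS) -> float:
    """Share of all isoforms represented by count, in percent."""
    return 100 * count / total


if __name__ == "__main__":
    for label, count in REPORTED_COUNTS.items():
        print(f"{label}: {percent_of_total(count):.1f}%")
```

On these numbers, roughly 91% of the isoforms have an annotation and roughly 93% are predicted to contain a CDS.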

Hapo-G, haplotype-aware polishing of genome assemblies with accurate reads

NAR Genomics and Bioinformatics ◽ 10.1093/nargab/lqab034 ◽ 2021 ◽ Vol 3 (2)

Author(s): Jean-Marc Aury ◽ Benjamin Istace

Keyword(s): Single Molecule ◽ Direct Consequence ◽ High Quality ◽ Sequencing Errors ◽ Coding Regions ◽ Sequencing Technologies ◽ Long Reads ◽ Oxford Nanopore ◽ Long Read ◽ Genome Assemblies

Abstract: Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions, where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture of the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; generally, they inspect nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes, and they typically switch from one haplotype to another. Herein we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.

Evidence of two deeply divergent co-existing mitochondrial genomes in the Tuatara reveals an extremely complex genomic organization

Communications Biology ◽ 10.1038/s42003-020-01639-0 ◽ 2021 ◽ Vol 4 (1)

Author(s): J. Robert Macey ◽ Stephan Pabinger ◽ Charles G. Barbieri ◽ Ella S. Buring ◽ Vanessa L. Gonzalez ◽ ...

Keyword(s): Genomic Organization ◽ Phylogenetic Analyses ◽ Genomic Polymorphism ◽ Sequencing Technologies ◽ Cold Tolerant ◽ Long Read ◽ Sphenodon Punctatus ◽ Nucleotide Divergence ◽ Mitochondrial Heteroplasmy ◽ Transmembrane Regions

Abstract: Animal mitochondrial genomic polymorphism occurs as low-level mitochondrial heteroplasmy and deeply divergent co-existing molecules. The latter is rare, known only in bivalvian mollusks. Here we show two deeply divergent co-existing mt-genomes in a vertebrate through genomic sequencing of the Tuatara (Sphenodon punctatus), the sole representative of an ancient reptilian Order. The two molecules, revealed using a combination of short-read and long-read sequencing technologies, differ by 10.4% nucleotide divergence. A single long read covers an entire mt-molecule for both strands. Phylogenetic analyses suggest a 7–8 million-year divergence between genomes. Contrary to earlier reports, all 37 genes typical of animal mitochondria, with drastic gene rearrangements, are confirmed for both mt-genomes. Also unique to vertebrates, concerted evolution drives three near-identical putative Control Region non-coding blocks. Evidence of positive selection at sites linked to metabolically important transmembrane regions of encoded proteins suggests these two mt-genomes may confer an adaptive advantage for an unusually cold-tolerant reptile.
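
A figure such as "10.4% nucleotide divergence" is commonly obtained as an uncorrected pairwise distance (p-distance): the fraction of aligned sites at which two sequences differ. The Python sketch below shows that calculation as one plausible reading; the abstract does not say how the value was computed, so the method, the function name and the handling of gaps and `N` are assumptions.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected pairwise distance between two aligned sequences.

    Columns containing a gap ('-') or an ambiguous base ('N') in either
    sequence are skipped (an illustrative convention, not the study's).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared
```

For two toy aligned sequences that differ at one of ten sites, such as `"ACGTACGTAC"` and `"ACGTACGTAA"`, the function returns `0.1`, i.e. 10% divergence.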