A highly contiguous nuclear genome assembly of the mandarinfish Synchiropus splendidus (Syngnathiformes: Callionymidae)

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab306 ◽

2021 ◽

Author(s):

Martin Stervander ◽

William A Cresko

Keyword(s):

Genome Assembly ◽

Nuclear Genome ◽

Phylogenetic Position ◽

Blue Color ◽

Long Reads ◽

The Family ◽

Commercially Important ◽

Genomic Resource ◽

Genome Assemblies ◽

Important Fish

Abstract The fish order Syngnathiformes has been referred to as a collection of misfit fishes, comprising commercially important fish such as red mullets as well as the highly diverse seahorses, pipefishes, and seadragons—the well-known family Syngnathidae, with their unique adaptations including male pregnancy. Another ornate member of this order is the species mandarinfish. No less than two types of chromatophores have been discovered in the spectacularly colored mandarinfish: the cyanophore (producing blue color) and the dichromatic cyano-erythrophore (producing blue and red). The phylogenetic position of mandarinfish in Syngnathiformes, and their promise of additional genetic discoveries beyond the chromatophores, made mandarinfish an appealing target for whole genome sequencing. We used linked sequences to create synthetic long reads, producing a highly contiguous genome assembly for the mandarinfish. The genome assembly comprises 483 Mbp (longest scaffold 29 Mbp), has an N50 of 12 Mbp, and an L50 of 14 scaffolds. The assembly completeness is also high, with 92.6% complete, 4.4% fragmented, and 2.9% missing out of 4,584 BUSCO genes found in ray-finned fishes. Outside the family Syngnathidae, the mandarinfish represents one of the most contiguous syngnathiform genome assemblies to date. The mandarinfish genomic resource will likely serve as a high-quality outgroup to syngnathid fish, and furthermore for research on the genomic underpinnings of the evolution of novel pigmentation.
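The contiguity figures quoted in this abstract (N50 and L50) follow the usual definition: sort scaffolds from longest to shortest and walk down the list until the running total reaches half of the assembly size. The sketch below illustrates that definition only; the function name `n50_l50` and the toy lengths are hypothetical and are not taken from the mandarinfish assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold lengths.

    N50 is the length of the scaffold at which the cumulative sum of
    length-sorted scaffolds first reaches half of the total assembly size;
    L50 is the number of scaffolds needed to get there.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy scaffold lengths in Mbp (hypothetical values for illustration only).
toy_lengths = [10, 8, 6, 4, 2]
n50, l50 = n50_l50(toy_lengths)
print(f"N50 = {n50} Mbp, L50 = {l50} scaffolds")
```

Read this way, the reported N50 of 12 Mbp with an L50 of 14 means that the 14 longest scaffolds, each at least 12 Mbp long, together cover at least half of the 483 Mbp assembly.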


Chromosome-Level Genome Assembly and Annotation of a Sciaenid Fish, Argyrosomus japonicus

Genome Biology and Evolution ◽

10.1093/gbe/evaa246 ◽

2021 ◽

Vol 13 (2) ◽

Author(s):

Linlin Zhao ◽

Shengyong Xu ◽

Zhiqiang Han ◽

Qi Liu ◽

Wensi Ke ◽

...

Keyword(s):

Genome Assembly ◽

Wide Distribution ◽

High Quality ◽

Protein Coding ◽

Repeat Elements ◽

Long Reads ◽

The Family ◽

Solid Foundation ◽

Genomic Resource ◽

Chromosome Level

Abstract Argyrosomus japonicus is an economically and ecologically important fish species in the family Sciaenidae with a wide distribution in the world’s oceans. Here, we report a high-quality, chromosome-level genome assembly of A. japonicus based on PacBio and Hi-C sequencing technology. A 673.7-Mb genome containing 282 contigs with an N50 length of 18.4 Mb was obtained based on PacBio long reads. These contigs were further ordered and clustered into 24 chromosome groups based on Hi-C data. In addition, a total of 217.2 Mb (32.24% of the assembled genome) of sequences were identified as repeat elements, and 23,730 protein-coding genes were predicted based on multiple approaches. More than 97% of BUSCO genes were identified in the A. japonicus genome. The high-quality genome assembled in this work not only provides a valuable genomic resource for future population genetics, conservation biology and selective breeding studies of A. japonicus but also lays a solid foundation for the study of Sciaenidae evolution.


LINKS: Scaffolding genome assemblies with kilobase-long nanopore reads

10.1101/016519 ◽

2015 ◽

Cited By ~ 4

Author(s):

Rene L Warren ◽

Benjamin P Vandervalk ◽

Steven JM Jones ◽

Inanc Birol

Keyword(s):

Genome Assembly ◽

Error Rates ◽

Great Promise ◽

Genomic Information ◽

E Coli ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read ◽

K 12 ◽

Genome Assemblies

Owing to the complexity of the assembly problem, we do not yet have complete genome sequences. The difficulty in assembling reads into finished genomes is exacerbated by sequence repeats and the inability of short reads to capture sufficient genomic information to resolve those problematic regions. Established and emerging long read technologies show great promise in this regard, but their current associated higher error rates typically require computational base correction and/or additional bioinformatics pre-processing before they could be of value. We present LINKS, the Long Interval Nucleotide K-mer Scaffolder algorithm, a solution that makes use of the information in error-rich long reads, without the need for read alignment or base correction. We show how the contiguity of an ABySS E. coli K-12 genome assembly could be increased over five-fold by the use of beta-released Oxford Nanopore Ltd. (ONT) long reads and how LINKS leverages long-range information in S. cerevisiae W303 ONT reads to yield an assembly with less than half the errors of competing applications. Re-scaffolding the colossal white spruce assembly draft (PG29, 20 Gbp) and how LINKS scales to larger genomes is also presented. We expect LINKS to have broad utility in harnessing the potential of long reads in connecting high-quality sequences of small and large genome assembly drafts. Availability: http://www.bcgsc.ca/bioinfo/software/links


Long-reads are revolutionizing 20 years of insect genome sequencing

10.1101/2021.02.14.431146 ◽

2021 ◽

Author(s):

Scott Hotaling ◽

John S. Sproul ◽

Jacqueline Heckenhauer ◽

Ashlyn Powell ◽

Amanda M. Larracuente ◽

...

Keyword(s):

Drosophila Melanogaster ◽

Nuclear Genome ◽

Sequencing Technology ◽

Assembly Quality ◽

Insect Genome ◽

Long Reads ◽

Long Read ◽

Field Perspective ◽

Genome Assemblies ◽

The Impact

The first insect genome (Drosophila melanogaster) was published two decades ago. Today, nuclear genome assemblies are available for a staggering 601 different insects representing 20 orders. Here, we analyzed the best assembly for each insect and provide a “state of the field” perspective, emphasizing taxonomic representation, assembly quality, gene completeness, and sequencing technology. We show that while genomic efforts have been biased towards specific groups (e.g., Diptera), assemblies are generally contiguous with gene regions intact. Most notable, however, has been the impact of long-read sequencing; assemblies that incorporate long-reads are ∼48x more contiguous than those that do not.


Assessment on the Occurrence of Anisakid and other Endoparasitic Nematodes Infecting Commercially-Important Fishes at Tayabas Bay

The Philippine Journal of Fisheries ◽

10.31398/tpjf/27.2.2020c0008 ◽

2020 ◽

pp. 216-230

Author(s):

Maribeth H. Ramos ◽

Trazarah Hanoof E. Argarin ◽

Beatriz A. Olaivar

Keyword(s):

San Francisco ◽

Fish Species ◽

Stock Assessment ◽

Clinical Manifestations ◽

Mean Intensity ◽

The Family ◽

Commercially Important ◽

Rastrelliger Kanagurta ◽

Anisakid Nematodes ◽

Important Fish

Anisakid nematodes are parasites commonly present in the marine environment. Parasites belonging to the family Anisakidae or the genus Anisakis can cause two different clinical manifestations: gastrointestinal disorders and allergic reactions known as anisakiasis. In this study, we examined 7,126 marine fishes belonging to four different commercially-important fish species; Rastrelliger kanagurta, Sardinella lemuru, Atule mate, and Selar crumenophthalmus for the presence of anisakid and other endoparasitic nematode infection. The fishes caught from Tayabas Bay were bought from three different landing sites from March 2017 to February 2018. The gonads, liver, and stomach of each fish species were incubated for 12-18 hours for rapid isolation and endoparasite evaluation. After the isolation of parasites, anisakid nematodes were fixed in vials with 70% ethanol for morphological analysis under the microscope. Six anisakid groups of genera, including Hysterothylacium, Terranova, Anisakis, Contracaecum, Raphidascaris, and Camallanus, and a non-anisakid group Echinorhynchus were identified. The results showed that the prevalence of anisakid infection in all species was 24.18 %, with a mean intensity of infection of 1.91. Rastrelliger kanagurta (Dalahican), Atule mate, and Selar crumenophthalmus were the most infected with 50.90%, 38.98%, and 30.52% prevalence rate, respectively, followed by Rastrelliger kanagurta (San Francisco) (24.18%) and Sardinella lemuru (7.46%). The collected data suggest that commercially-important fish caught in the Tayabas Bay waters are susceptible to parasitization by larvae of the genus Camallanus followed by Hysterothylacium and Terranova in their visceral organs. The prevalence of anisakid infection was almost similar between female (45.3 %) and male (47.21 %) fishes with a mean intensity of 1.95 & 1.96, respectively. Also, larger fishes were heavily infected with anisakid larvae than small fishes. 
Thus, the intensity and prevalence of the fish parasite can be used as a biological tag for benchmarking and stock assessment purposes.
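The abstract reports prevalence and mean intensity without giving formulas. A minimal sketch, assuming the conventional parasitological definitions (prevalence = infected hosts / examined hosts; mean intensity = parasites found / infected hosts), is shown below. The function name and the toy counts are hypothetical and are not data from the study.

```python
def prevalence_and_mean_intensity(parasite_counts):
    """Compute prevalence (%) and mean intensity from per-fish parasite counts.

    parasite_counts holds one non-negative integer per examined fish.
    Assumed definitions (not stated in the abstract):
      prevalence     = 100 * infected fish / examined fish
      mean intensity = total parasites / infected fish
    """
    examined = len(parasite_counts)
    if examined == 0:
        raise ValueError("no fish examined")
    infected = [count for count in parasite_counts if count > 0]
    prevalence = 100 * len(infected) / examined
    mean_intensity = sum(infected) / len(infected) if infected else 0.0
    return prevalence, mean_intensity


# Toy sample of eight fish (hypothetical counts for illustration only).
toy_counts = [0, 0, 2, 1, 0, 3, 0, 0]
prevalence, intensity = prevalence_and_mean_intensity(toy_counts)
print(f"prevalence = {prevalence:.2f}%, mean intensity = {intensity:.2f}")
```

Note that mean intensity averages over infected fish only, so it can stay near 2 even when prevalence differs widely between species or landing sites.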


LongStitch: High-quality genome assembly correction and scaffolding using long reads

10.1101/2021.06.17.448848 ◽

2021 ◽

Author(s):

Lauren Coombe ◽

Janet X Li ◽

Theodora Lo ◽

Johnathan Wong ◽

Vladimir Nikolic ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Draft Genome ◽

Model Organisms ◽

High Quality ◽

De Novo Genome Assembly ◽

Long Reads ◽

Long Read ◽

Genomic Regions ◽

Genome Assemblies

Background Generating high-quality de novo genome assemblies is foundational to the genomics study of model and non-model organisms. In recent years, long-read sequencing has greatly benefited genome assembly and scaffolding, a process by which assembled sequences are ordered and oriented through the use of long-range information. Long reads are better able to span repetitive genomic regions compared to short reads, and thus have tremendous utility for resolving problematic regions and helping generate more complete draft assemblies. Here, we present LongStitch, a scalable pipeline that corrects and scaffolds draft genome assemblies exclusively using long reads. Results LongStitch incorporates multiple tools developed by our group and runs in up to three stages, which includes initial assembly correction (Tigmint-long), followed by two incremental scaffolding stages (ntLink and ARKS-long). Tigmint-long and ARKS-long are misassembly correction and scaffolding utilities, respectively, previously developed for linked reads, that we adapted for long reads. Here, we describe the LongStitch pipeline and introduce our new long-read scaffolder, ntLink, which utilizes lightweight minimizer mappings to join contigs. LongStitch was tested on short and long-read assemblies of three different human individuals using corresponding nanopore long-read data, and improves the contiguity of each assembly from 2.0-fold up to 304.6-fold (as measured by NGA50 length). Furthermore, LongStitch generates more contiguous and correct assemblies compared to state-of-the-art long-read scaffolder LRScaf in most tests, and consistently runs in under five hours using less than 23GB of RAM. Conclusions Due to its effectiveness and efficiency in improving draft assemblies using long reads, we expect LongStitch to benefit a wide variety of de novo genome assembly projects. The LongStitch pipeline is freely available at https://github.com/bcgsc/longstitch.
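The abstract says ntLink joins contigs using lightweight minimizer mappings. The sketch below illustrates the general idea of window minimizers and of finding minimizers shared between a read and a contig. It is not ntLink's implementation: it orders k-mers lexicographically for simplicity, whereas production tools typically order them by a hash value, and the function names, `k`, and `w` are all hypothetical choices made for this example.

```python
def minimizers(seq, k, w):
    """Return [(position, kmer)] for the window minimizers of seq.

    Each window spans w consecutive k-mers; its minimizer is the
    lexicographically smallest k-mer (leftmost on ties). Consecutive
    windows that select the same position are reported once.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        offset = min(range(w), key=lambda j: window[j])
        pos = start + offset
        if not picked or picked[-1][0] != pos:
            picked.append((pos, kmers[pos]))
    return picked


def shared_minimizers(read, contig, k, w):
    """Return the contig minimizers whose k-mer also occurs as a read minimizer."""
    read_kmers = {kmer for _, kmer in minimizers(read, k, w)}
    return [(pos, kmer) for pos, kmer in minimizers(contig, k, w)
            if kmer in read_kmers]


# Toy sequence (hypothetical) to show which k-mers get selected.
print(minimizers("ACGTACGT", k=3, w=2))
```

Because only a subset of k-mers is kept per sequence, a long read that shares minimizers with two different contigs suggests an order and orientation for them without a full base-level alignment, which is the kind of saving that "lightweight" refers to.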


Genome of a Giant (Trevally): Caranx ignobilis

10.1101/2021.09.11.459923 ◽

2021 ◽

Author(s):

Brandon D. Pickett ◽

Jessica R. Glass ◽

Perry G. Ridge ◽

John S. K. Kauwe

Keyword(s):

Comparative Genomics ◽

Genome Assembly ◽

Nuclear Genome ◽

Apex Predator ◽

Rna Seq ◽

Distinct Population ◽

High Quality ◽

Pacific Biosciences ◽

Long Reads ◽

Significant Interest

Abstract Caranx ignobilis, commonly known as the kingfish or giant trevally, is a large, reef-associated apex predator. It is a prized sportfish, targeted heavily throughout its tropical and subtropical range in the Indian and Pacific Oceans, and it has drawn significant interest in aquaculture due to an unusual tolerance for freshwater. In this study, we present a high-quality nuclear genome assembly of a C. ignobilis individual from Hawaiian waters, which have recently been shown to host a genetically distinct population. The assembly has a contig NG50 of 7.3 Mbp and scaffold NG50 of 46.3 Mbp. Twenty-five of the 203 scaffolds contain 90% of the genome. We also present the raw Pacific Biosciences continuous long-reads from which the assembly was created. A Hi-C dataset (Dovetail Genomics Omni-C) and Illumina-based RNA-seq from eight tissues are also presented; the latter of which can be particularly useful for annotation and studies of freshwater tolerance. Overall, this genome assembly and supporting data is a valuable tool for ecological and comparative genomics studies of kingfish and other carangoid fishes.


Pushing the limits of de novo genome assembly for complex prokaryotic genomes harboring very long, near identical repeats

10.1101/300186 ◽

2018 ◽

Cited By ~ 3

Author(s):

Michael Schmid ◽

Daniel Frei ◽

Andrea Patrignani ◽

Ralph Schlapbach ◽

Jürg E. Frey ◽

...

Keyword(s):

Dark Matter ◽

Genome Assembly ◽

De Novo ◽

Bacterial Genomes ◽

De Novo Genome Assembly ◽

Assembly Algorithm ◽

Long Reads ◽

Oxford Nanopore ◽

Prokaryotic Genomes ◽

Genome Assemblies

Abstract Generating a complete, de novo genome assembly for prokaryotes is often considered a solved problem. However, we here show that Pseudomonas koreensis P19E3 harbors multiple, near identical repeat pairs up to 70 kilobase pairs in length. Beyond long repeats, the P19E3 assembly was further complicated by a shufflon region. Its complex genome could not be de novo assembled with long reads produced by Pacific Biosciences’ technology, but required very long reads from the Oxford Nanopore Technology. Another important factor for a full genomic resolution was the choice of assembly algorithm. Importantly, a repeat analysis indicated that very complex bacterial genomes represent a general phenomenon beyond Pseudomonas. Roughly 10% of 9331 complete bacterial and a handful of 293 complete archaeal genomes represented this dark matter for de novo genome assembly of prokaryotes. Several of these dark matter genome assemblies contained repeats far beyond the resolution of the sequencing technology employed and likely contain errors, other genomes were closed employing labor-intense steps like cosmid libraries, primer walking or optical mapping. Using very long sequencing reads in combination with assemblers capable of resolving long, near identical repeats will bring most prokaryotic genomes within reach of fast and complete de novo genome assembly.


Whole-Genome Sequencing and Genome-Wide Studies of Spiny Head Croaker (Collichthys lucidus) Reveals Potential Insights for Well-Developed Otoliths in the Family Sciaenidae

Frontiers in Genetics ◽

10.3389/fgene.2021.730255 ◽

2021 ◽

Vol 12 ◽

Author(s):

Wu Gan ◽

Chenxi Zhao ◽

Xinran Liu ◽

Chao Bian ◽

Qiong Shi ◽

...

Keyword(s):

Genome Assembly ◽

Molecular Mechanisms ◽

Gene Families ◽

Northwestern Pacific ◽

Draft Assembly ◽

Genome Wide ◽

Long Reads ◽

The Family ◽

Collichthys Lucidus ◽

Chromosome Level

Spiny head croaker (Collichthys lucidus), belonging to the family Sciaenidae, is a small economic fish with a main distribution in the coastal waters of Northwestern Pacific. Here, we constructed a nonredundant chromosome-level genome assembly of spiny head croaker and also made genome-wide investigations on genome evolution and gene families related to otolith development. A primary genome assembly of 811.23 Mb, with a contig N50 of 74.92 kb, was generated by a combination of 49.12-Gb Illumina clean reads and 35.24 Gb of PacBio long reads. Contigs of this draft assembly were further anchored into chromosomes by integration with additional 185.33-Gb Hi-C data, resulting in a high-quality chromosome-level genome assembly of 817.24 Mb, with an improved scaffold N50 of 26.58 Mb. Based on our phylogenetic analysis, we observed that C. lucidus is much closer to Larimichthys crocea than Miichthys miiuy. We also predicted that many gene families were significantly expanded (p-value <0.05) in spiny head croaker; among them, some are associated with “calcium signaling pathway” and potential “inner ear functions.” In addition, we identified some otolith-related genes (such as otol1a that encodes Otolin-1a) with critical deletions or mutations, suggesting possible molecular mechanisms for well-developed otoliths in the family Sciaenidae.


A New Dinoflagellate Genome Illuminates a Conserved Gene Cluster Involved in Sunscreen Biosynthesis

Genome Biology and Evolution ◽

10.1093/gbe/evaa235 ◽

2020 ◽

Author(s):

Eiichi Shoguchi ◽

Girish Beedessee ◽

Kanako Hisata ◽

Ipputa Tada ◽

Haruhi Narisoko ◽

...

Keyword(s):

Thermal Tolerance ◽

Genomic Structure ◽

Gc Content ◽

Genomic Diversity ◽

Phylogenetic Position ◽

Neighboring Gene ◽

Protein Coding ◽

The Family ◽

Genomic Resource ◽

Conserved Gene

Abstract Photosynthetic dinoflagellates of the Family Symbiodiniaceae live symbiotically with many organisms that inhabit coral reefs and are currently classified into fifteen groups, including seven genera. Draft genomes from four genera, Symbiodinium, Breviolum, Fugacium, and Cladocopium, which have been isolated from corals, have been reported. However, no genome is available from the genus Durusdinium, which occupies an intermediate phylogenetic position in the Family Symbiodiniaceae and is well known for thermal tolerance (resistance to bleaching). We sequenced, assembled, and annotated the genome of Durusdinium trenchii, isolated from the coral, Favia speciosa, in Okinawa, Japan. Assembled short reads amounted to 670 Mbp with ∼47% GC content. This GC content was intermediate among taxa belonging to the Symbiodiniaceae. Approximately 30,000 protein-coding genes were predicted in the D. trenchii genome, fewer than in other genomes from the Symbiodiniaceae. However, annotations revealed that the D. trenchii genome encodes a cluster of genes for synthesis of mycosporine-like amino acids (MAAs), which absorb UV radiation. Interestingly, a neighboring gene in the cluster encodes a GMC (glucose-methanol-choline) oxidoreductase with an FAD (flavin adenine dinucleotide) domain that is also found in Symbiodinium tridacnidorum. This conservation seems to partially clarify an ancestral genomic structure in the Symbiodiniaceae and its loss in late-branching lineages, including Breviolum and Cladocopium, after splitting from the Durusdinium lineage. Our analysis suggests that approximately half of the taxa in the Symbiodiniaceae may maintain the ability to synthesize MAAs. Thus, this work provides a significant genomic resource for understanding the genomic diversity of Symbiodiniaceae in corals.


A high-quality de novo genome assembly based on nanopore sequencing of a wild-caught coconut rhinoceros beetle (Oryctes rhinoceros)

10.1101/2021.09.12.459717 ◽

2021 ◽

Author(s):

Igor Filipović ◽

Gordana Rašić ◽

James Hereward ◽

Maria Gharuka ◽

Gregor J Devine ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Nuclear Genome ◽

Assembly Process ◽

Structural Annotation ◽

High Quality ◽

Oryctes Rhinoceros ◽

Rhinoceros Beetle ◽

Long Read ◽

Genome Assemblies

Background: An optimal starting point for relating genome function to organismal biology is a high-quality nuclear genome assembly, and long-read sequencing is revolutionizing the production of this genomic resource in insects. Despite this, nuclear genome assemblies have been under-represented for agricultural insect pests, particularly from the order Coleoptera. Here we present a de novo genome assembly and structural annotation for the coconut rhinoceros beetle, Oryctes rhinoceros (Coleoptera: Scarabaeidae), based on Oxford Nanopore Technologies (ONT) long-read data generated from a wild-caught female, as well as the assembly process that also led to the recovery of the complete circular genome assemblies of the beetle's mitochondrial genome and that of the biocontrol agent, Oryctes rhinoceros nudivirus (OrNV). As an invasive pest of palm trees, O. rhinoceros is undergoing an expansion in its range across the Pacific Islands, requiring new approaches to management that may include strategies facilitated by genome assembly and annotation. Results: High-quality DNA isolated from an adult female was used to create four ONT libraries that were sequenced using four MinION flow cells, producing a total of 27.2 Gb of high-quality long-read sequences. We employed an iterative assembly process and polishing with one lane of high-accuracy Illumina reads, obtaining a final size of the assembly of 377.36 Mb that had high contiguity (fragment N50 length = 12 Mb) and accuracy, as evidenced by the exceptionally high completeness of the benchmarked set of conserved single-copy orthologous genes (BUSCO completeness = 99.11%). These quality metrics place our assembly as the most complete of the published Coleopteran genomes. 
The structural annotation of the nuclear genome assembly contained a highly-accurate set of 16,371 protein-coding genes showing BUSCO completeness of 92.09%, as well as the expected number of non-coding RNAs and the number and structure of paralogous genes in a gene family like Sigma GST. Conclusions: The genomic resources produced in this study form a foundation for further functional genetic research and management programs that may inform the control and surveillance of O. rhinoceros populations, and we demonstrate the efficacy of de novo genome assembly using long-read ONT data from a single field-caught insect.
