A highly contiguous nuclear genome assembly of the mandarinfish Synchiropus splendidus (Syngnathiformes: Callionymidae)

Author(s):  
Martin Stervander ◽  
William A Cresko

Abstract The fish order Syngnathiformes has been referred to as a collection of misfit fishes, comprising commercially important fish such as red mullets as well as the highly diverse seahorses, pipefishes, and seadragons—the well-known family Syngnathidae, with their unique adaptations including male pregnancy. Another ornate member of this order is the species mandarinfish. No less than two types of chromatophores have been discovered in the spectacularly colored mandarinfish: the cyanophore (producing blue color) and the dichromatic cyano-erythrophore (producing blue and red). The phylogenetic position of mandarinfish in Syngnathiformes, and their promise of additional genetic discoveries beyond the chromatophores, made mandarinfish an appealing target for whole genome sequencing. We used linked sequences to create synthetic long reads, producing a highly contiguous genome assembly for the mandarinfish. The genome assembly comprises 483 Mbp (longest scaffold 29 Mbp), has an N50 of 12 Mbp, and an L50 of 14 scaffolds. The assembly completeness is also high, with 92.6% complete, 4.4% fragmented, and 2.9% missing out of 4,584 BUSCO genes found in ray-finned fishes. Outside the family Syngnathidae, the mandarinfish represents one of the most contiguous syngnathiform genome assemblies to date. The mandarinfish genomic resource will likely serve as a high-quality outgroup to syngnathid fish, and furthermore for research on the genomic underpinnings of the evolution of novel pigmentation.
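The contiguity figures quoted above (an N50 of 12 Mbp and an L50 of 14 scaffolds) follow the usual definitions: N50 is the length of the scaffold at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size, and L50 is the number of scaffolds needed to get there. A minimal sketch of that calculation is given below. The function name `n50_l50` and the toy lengths are illustrative assumptions, not taken from the paper or from any particular assembly tool.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold lengths.

    Hypothetical helper for illustration only.
    """
    ordered = sorted(lengths, reverse=True)   # longest scaffold first
    half_total = sum(ordered) / 2             # half of the assembly size
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # `length` is the N50; `count` scaffolds were needed, so it is the L50
            return length, count
    raise ValueError("no scaffold lengths given")


# Toy assembly (arbitrary units): total 30, so half is 15.
# Running totals are 10, then 18, which first reaches 15 at the second scaffold.
toy_lengths = [10, 8, 6, 4, 2]
print(n50_l50(toy_lengths))
```

Read this way, the reported values mean that the 14 longest scaffolds, each at least 12 Mbp long, together cover at least half of the 483 Mbp assembly.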

2021 ◽  
Vol 13 (2) ◽  
Author(s):  
Linlin Zhao ◽  
Shengyong Xu ◽  
Zhiqiang Han ◽  
Qi Liu ◽  
Wensi Ke ◽  
...  

Abstract Argyrosomus japonicus is an economically and ecologically important fish species in the family Sciaenidae with a wide distribution in the world’s oceans. Here, we report a high-quality, chromosome-level genome assembly of A. japonicus based on PacBio and Hi-C sequencing technology. A 673.7-Mb genome containing 282 contigs with an N50 length of 18.4 Mb was obtained based on PacBio long reads. These contigs were further ordered and clustered into 24 chromosome groups based on Hi-C data. In addition, a total of 217.2 Mb (32.24% of the assembled genome) of sequences were identified as repeat elements, and 23,730 protein-coding genes were predicted based on multiple approaches. More than 97% of BUSCO genes were identified in the A. japonicus genome. The high-quality genome assembled in this work not only provides a valuable genomic resource for future population genetics, conservation biology and selective breeding studies of A. japonicus but also lays a solid foundation for the study of Sciaenidae evolution.


2015 ◽  
Author(s):  
Rene L Warren ◽  
Benjamin P Vandervalk ◽  
Steven JM Jones ◽  
Inanc Birol

Owing to the complexity of the assembly problem, we do not yet have complete genome sequences. The difficulty in assembling reads into finished genomes is exacerbated by sequence repeats and the inability of short reads to capture sufficient genomic information to resolve those problematic regions. Established and emerging long-read technologies show great promise in this regard, but their currently higher error rates typically require computational base correction and/or additional bioinformatics pre-processing before they can be of value. We present LINKS, the Long Interval Nucleotide K-mer Scaffolder algorithm, a solution that makes use of the information in error-rich long reads, without the need for read alignment or base correction. We show how the contiguity of an ABySS E. coli K-12 genome assembly could be increased over five-fold by the use of beta-released Oxford Nanopore Ltd. (ONT) long reads and how LINKS leverages long-range information in S. cerevisiae W303 ONT reads to yield an assembly with less than half the errors of competing applications. Re-scaffolding of the colossal white spruce assembly draft (PG29, 20 Gbp) and the scaling of LINKS to larger genomes are also presented. We expect LINKS to have broad utility in harnessing the potential of long reads in connecting high-quality sequences of small and large genome assembly drafts. Availability: http://www.bcgsc.ca/bioinfo/software/links


2021 ◽  
Author(s):  
Scott Hotaling ◽  
John S. Sproul ◽  
Jacqueline Heckenhauer ◽  
Ashlyn Powell ◽  
Amanda M. Larracuente ◽  
...  

The first insect genome (Drosophila melanogaster) was published two decades ago. Today, nuclear genome assemblies are available for a staggering 601 different insects representing 20 orders. Here, we analyzed the best assembly for each insect and provide a “state of the field” perspective, emphasizing taxonomic representation, assembly quality, gene completeness, and sequencing technology. We show that while genomic efforts have been biased towards specific groups (e.g., Diptera), assemblies are generally contiguous with gene regions intact. Most notable, however, has been the impact of long-read sequencing; assemblies that incorporate long reads are ∼48x more contiguous than those that do not.


Author(s):  
Maribeth H. Ramos ◽  
Trazarah Hanoof E. Argarin ◽  
Beatriz A. Olaivar

Anisakid nematodes are parasites commonly present in the marine environment. Parasites belonging to the family Anisakidae or the genus Anisakis can cause two different clinical manifestations: gastrointestinal disorders and allergic reactions known as anisakiasis. In this study, we examined 7,126 marine fishes belonging to four commercially important fish species (Rastrelliger kanagurta, Sardinella lemuru, Atule mate, and Selar crumenophthalmus) for the presence of anisakid and other endoparasitic nematode infections. The fishes, caught in Tayabas Bay, were bought from three different landing sites from March 2017 to February 2018. The gonads, liver, and stomach of each fish were incubated for 12-18 hours for rapid isolation and endoparasite evaluation. After isolation, anisakid nematodes were fixed in vials with 70% ethanol for morphological analysis under the microscope. Six anisakid groups of genera, including Hysterothylacium, Terranova, Anisakis, Contracaecum, Raphidascaris, and Camallanus, and a non-anisakid group, Echinorhynchus, were identified. The results showed that the prevalence of anisakid infection across all species was 24.18%, with a mean intensity of infection of 1.91. Rastrelliger kanagurta (Dalahican), Atule mate, and Selar crumenophthalmus were the most infected, with prevalence rates of 50.90%, 38.98%, and 30.52%, respectively, followed by Rastrelliger kanagurta (San Francisco) (24.18%) and Sardinella lemuru (7.46%). The collected data suggest that commercially important fish caught in Tayabas Bay waters are susceptible to parasitization of their visceral organs by larvae of the genus Camallanus, followed by Hysterothylacium and Terranova. The prevalence of anisakid infection was similar between female (45.3%) and male (47.21%) fishes, with mean intensities of 1.95 and 1.96, respectively. Also, larger fishes were more heavily infected with anisakid larvae than smaller fishes. Thus, the intensity and prevalence of the fish parasite can be used as a biological tag for benchmarking and stock assessment purposes.
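The two summary statistics reported above are conventionally defined as follows: prevalence is the percentage of examined hosts that carry at least one parasite, and mean intensity is the average number of parasites per infected host (uninfected hosts are excluded from the denominator). The sketch below illustrates those definitions on made-up counts. The function name and the sample data are assumptions for illustration, and the abstract does not state how the authors computed their figures.

```python
def prevalence_and_mean_intensity(parasite_counts):
    """Return (prevalence in %, mean intensity) from per-host parasite counts.

    Hypothetical helper; `parasite_counts` holds one integer per examined fish.
    """
    examined = len(parasite_counts)
    if examined == 0:
        raise ValueError("no hosts examined")
    infected = [c for c in parasite_counts if c > 0]   # hosts with >= 1 parasite
    prevalence = 100 * len(infected) / examined
    # Mean intensity averages over infected hosts only
    mean_intensity = sum(infected) / len(infected) if infected else 0.0
    return prevalence, mean_intensity


# Made-up sample: 8 fish examined, 3 infected, carrying 6 larvae in total
sample = [0, 0, 3, 1, 0, 2, 0, 0]
print(prevalence_and_mean_intensity(sample))
```

Because mean intensity ignores uninfected hosts, a species can combine low prevalence with high mean intensity, which is why the two values are reported side by side.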


2021 ◽  
Author(s):  
Lauren Coombe ◽  
Janet X Li ◽  
Theodora Lo ◽  
Johnathan Wong ◽  
Vladimir Nikolic ◽  
...  

Background Generating high-quality de novo genome assemblies is foundational to the genomic study of model and non-model organisms. In recent years, long-read sequencing has greatly benefited genome assembly and scaffolding, a process by which assembled sequences are ordered and oriented through the use of long-range information. Long reads are better able to span repetitive genomic regions compared to short reads, and thus have tremendous utility for resolving problematic regions and helping generate more complete draft assemblies. Here, we present LongStitch, a scalable pipeline that corrects and scaffolds draft genome assemblies exclusively using long reads. Results LongStitch incorporates multiple tools developed by our group and runs in up to three stages, which include initial assembly correction (Tigmint-long), followed by two incremental scaffolding stages (ntLink and ARKS-long). Tigmint-long and ARKS-long are misassembly correction and scaffolding utilities, respectively, previously developed for linked reads, that we adapted for long reads. Here, we describe the LongStitch pipeline and introduce our new long-read scaffolder, ntLink, which utilizes lightweight minimizer mappings to join contigs. LongStitch was tested on short and long-read assemblies of three different human individuals using corresponding nanopore long-read data, and improves the contiguity of each assembly from 2.0-fold up to 304.6-fold (as measured by NGA50 length). Furthermore, LongStitch generates more contiguous and correct assemblies compared to state-of-the-art long-read scaffolder LRScaf in most tests, and consistently runs in under five hours using less than 23 GB of RAM. Conclusions Due to its effectiveness and efficiency in improving draft assemblies using long reads, we expect LongStitch to benefit a wide variety of de novo genome assembly projects. The LongStitch pipeline is freely available at https://github.com/bcgsc/longstitch.
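The "minimizer mappings" mentioned above rest on a general idea: instead of indexing every k-mer of a sequence, keep only the smallest k-mer in each window of w consecutive k-mers, so that two sequences sharing a long enough stretch will also tend to share minimizers. The sketch below shows that general idea only. It is not ntLink's implementation: the function name, the parameters `k` and `w`, and the use of plain lexicographic order (real tools typically order k-mers by a hash and treat both strands) are simplifying assumptions.

```python
def minimizers(seq, k, w):
    """Return a list of (position, k-mer) minimizers of `seq`.

    Hypothetical illustration: the minimizer of each window of `w`
    consecutive k-mers is the lexicographically smallest one
    (leftmost on ties). Consecutive duplicates are reported once.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # Index of the smallest k-mer within this window
        offset = min(range(w), key=lambda j: window[j])
        item = (start + offset, window[offset])
        if not picked or picked[-1] != item:
            picked.append(item)
    return picked


# Toy sequence with k=3 and w=2
print(minimizers("ACGTACGT", k=3, w=2))
```

A scaffolder built on this idea would look for minimizers shared between a long read and two different contigs, which is the kind of evidence that can suggest the contigs are adjacent.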


2021 ◽  
Author(s):  
Brandon D. Pickett ◽  
Jessica R. Glass ◽  
Perry G. Ridge ◽  
John S. K. Kauwe

Abstract Caranx ignobilis, commonly known as the kingfish or giant trevally, is a large, reef-associated apex predator. It is a prized sportfish, targeted heavily throughout its tropical and subtropical range in the Indian and Pacific Oceans, and it has drawn significant interest in aquaculture due to an unusual tolerance for freshwater. In this study, we present a high-quality nuclear genome assembly of a C. ignobilis individual from Hawaiian waters, which have recently been shown to host a genetically distinct population. The assembly has a contig NG50 of 7.3 Mbp and scaffold NG50 of 46.3 Mbp. Twenty-five of the 203 scaffolds contain 90% of the genome. We also present the raw Pacific Biosciences continuous long reads from which the assembly was created. A Hi-C dataset (Dovetail Genomics Omni-C) and Illumina-based RNA-seq from eight tissues are also presented; the latter can be particularly useful for annotation and studies of freshwater tolerance. Overall, this genome assembly and supporting data are a valuable tool for ecological and comparative genomics studies of kingfish and other carangoid fishes.
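NG50, the metric reported above, differs from N50 in its denominator: the running total of sequence lengths is compared with half of an estimated genome size rather than half of the assembly size, which makes values comparable between assemblies of different total length. A minimal sketch of that definition follows. The function name `ng50`, the toy lengths, and the genome sizes are illustrative assumptions, and the abstract does not give the genome size estimate the authors used.

```python
def ng50(lengths, genome_size):
    """Return the NG50 of `lengths` for an estimated `genome_size`.

    Hypothetical helper. Returns None when the assembly is too small
    to cover half of the estimated genome, in which case NG50 is undefined.
    """
    half_genome = genome_size / 2
    running = 0
    for length in sorted(lengths, reverse=True):   # longest sequence first
        running += length
        if running >= half_genome:
            return length
    return None


# Toy assembly (arbitrary units) totalling 30
toy_lengths = [10, 8, 6, 4, 2]
print(ng50(toy_lengths, genome_size=40))    # half of 40 is 20, reached at the third sequence
print(ng50(toy_lengths, genome_size=100))   # half of 100 is 50, never reached
```

When the assembly size equals the genome size estimate, NG50 and N50 coincide.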


2018 ◽  
Author(s):  
Michael Schmid ◽  
Daniel Frei ◽  
Andrea Patrignani ◽  
Ralph Schlapbach ◽  
Jürg E. Frey ◽  
...  

Abstract Generating a complete, de novo genome assembly for prokaryotes is often considered a solved problem. However, we here show that Pseudomonas koreensis P19E3 harbors multiple, near identical repeat pairs up to 70 kilobase pairs in length. Beyond long repeats, the P19E3 assembly was further complicated by a shufflon region. Its complex genome could not be de novo assembled with long reads produced by Pacific Biosciences’ technology, but required very long reads from the Oxford Nanopore Technology. Another important factor for a full genomic resolution was the choice of assembly algorithm. Importantly, a repeat analysis indicated that very complex bacterial genomes represent a general phenomenon beyond Pseudomonas. Roughly 10% of 9331 complete bacterial and a handful of 293 complete archaeal genomes represented this dark matter for de novo genome assembly of prokaryotes. Several of these dark matter genome assemblies contained repeats far beyond the resolution of the sequencing technology employed and likely contain errors; other genomes were closed employing labor-intensive steps like cosmid libraries, primer walking or optical mapping. Using very long sequencing reads in combination with assemblers capable of resolving long, near identical repeats will bring most prokaryotic genomes within reach of fast and complete de novo genome assembly.


2021 ◽  
Vol 12 ◽  
Author(s):  
Wu Gan ◽  
Chenxi Zhao ◽  
Xinran Liu ◽  
Chao Bian ◽  
Qiong Shi ◽  
...  

Spiny head croaker (Collichthys lucidus), belonging to the family Sciaenidae, is a small, economically important fish distributed mainly in the coastal waters of the Northwestern Pacific. Here, we constructed a nonredundant chromosome-level genome assembly of spiny head croaker and also made genome-wide investigations on genome evolution and gene families related to otolith development. A primary genome assembly of 811.23 Mb, with a contig N50 of 74.92 kb, was generated by a combination of 49.12 Gb of Illumina clean reads and 35.24 Gb of PacBio long reads. Contigs of this draft assembly were further anchored into chromosomes by integration with an additional 185.33 Gb of Hi-C data, resulting in a high-quality chromosome-level genome assembly of 817.24 Mb, with an improved scaffold N50 of 26.58 Mb. Based on our phylogenetic analysis, we observed that C. lucidus is much closer to Larimichthys crocea than to Miichthys miiuy. We also predicted that many gene families were significantly expanded (p-value <0.05) in spiny head croaker; among them, some are associated with “calcium signaling pathway” and potential “inner ear functions.” In addition, we identified some otolith-related genes (such as otol1a that encodes Otolin-1a) with critical deletions or mutations, suggesting possible molecular mechanisms for well-developed otoliths in the family Sciaenidae.


Author(s):  
Eiichi Shoguchi ◽  
Girish Beedessee ◽  
Kanako Hisata ◽  
Ipputa Tada ◽  
Haruhi Narisoko ◽  
...  

Abstract Photosynthetic dinoflagellates of the Family Symbiodiniaceae live symbiotically with many organisms that inhabit coral reefs and are currently classified into fifteen groups, including seven genera. Draft genomes from four genera, Symbiodinium, Breviolum, Fugacium, and Cladocopium, which have been isolated from corals, have been reported. However, no genome is available from the genus Durusdinium, which occupies an intermediate phylogenetic position in the Family Symbiodiniaceae and is well known for thermal tolerance (resistance to bleaching). We sequenced, assembled, and annotated the genome of Durusdinium trenchii, isolated from the coral, Favia speciosa, in Okinawa, Japan. Assembled short reads amounted to 670 Mbp with ∼47% GC content. This GC content was intermediate among taxa belonging to the Symbiodiniaceae. Approximately 30,000 protein-coding genes were predicted in the D. trenchii genome, fewer than in other genomes from the Symbiodiniaceae. However, annotations revealed that the D. trenchii genome encodes a cluster of genes for synthesis of mycosporine-like amino acids (MAAs), which absorb UV radiation. Interestingly, a neighboring gene in the cluster encodes a GMC (glucose-methanol-choline) oxidoreductase with an FAD (flavin adenine dinucleotide) domain that is also found in Symbiodinium tridacnidorum. This conservation seems to partially clarify an ancestral genomic structure in the Symbiodiniaceae and its loss in late-branching lineages, including Breviolum and Cladocopium, after splitting from the Durusdinium lineage. Our analysis suggests that approximately half of the taxa in the Symbiodiniaceae may maintain the ability to synthesize MAAs. Thus, this work provides a significant genomic resource for understanding the genomic diversity of Symbiodiniaceae in corals.


2021 ◽  
Author(s):  
Igor Filipović ◽  
Gordana Rašić ◽  
James Hereward ◽  
Maria Gharuka ◽  
Gregor J Devine ◽  
...  

Background: An optimal starting point for relating genome function to organismal biology is a high-quality nuclear genome assembly, and long-read sequencing is revolutionizing the production of this genomic resource in insects. Despite this, nuclear genome assemblies have been under-represented for agricultural insect pests, particularly from the order Coleoptera. Here we present a de novo genome assembly and structural annotation for the coconut rhinoceros beetle, Oryctes rhinoceros (Coleoptera: Scarabaeidae), based on Oxford Nanopore Technologies (ONT) long-read data generated from a wild-caught female, as well as the assembly process that also led to the recovery of the complete circular genome assemblies of the beetle's mitochondrial genome and that of the biocontrol agent, Oryctes rhinoceros nudivirus (OrNV). As an invasive pest of palm trees, O. rhinoceros is undergoing an expansion in its range across the Pacific Islands, requiring new approaches to management that may include strategies facilitated by genome assembly and annotation. Results: High-quality DNA isolated from an adult female was used to create four ONT libraries that were sequenced using four MinION flow cells, producing a total of 27.2 Gb of high-quality long-read sequences. We employed an iterative assembly process and polishing with one lane of high-accuracy Illumina reads, obtaining a final size of the assembly of 377.36 Mb that had high contiguity (fragment N50 length = 12 Mb) and accuracy, as evidenced by the exceptionally high completeness of the benchmarked set of conserved single-copy orthologous genes (BUSCO completeness = 99.11%). These quality metrics place our assembly as the most complete of the published Coleopteran genomes. 
The structural annotation of the nuclear genome assembly contained a highly-accurate set of 16,371 protein-coding genes showing BUSCO completeness of 92.09%, as well as the expected number of non-coding RNAs and the number and structure of paralogous genes in a gene family like Sigma GST. Conclusions: The genomic resources produced in this study form a foundation for further functional genetic research and management programs that may inform the control and surveillance of O. rhinoceros populations, and we demonstrate the efficacy of de novo genome assembly using long-read ONT data from a single field-caught insect.

