A Draft Genome Assembly of “Cas” (Psidium friedrichsthalianum (O. Berg) Nied): An Indigenous Crop of Costa Rica Untapped.

Author(s):  
Mónica Rojas-Gómez
Jose Pablo Jiménez-Madrigal
Maripaz Montero-Vargas
Randall Loaiza-Montoya
Max Chavarría
...  

Abstract Psidium friedrichsthalianum (O. Berg) Nied is a tropical tree species in the Myrtaceae family. It is natively distributed from southern Mexico to eastern Venezuela and Ecuador, and is commonly known as “Cas”, “Costa Rican guava” or “sour guava”. The Cas fruit has a distinctive acidic flavor and contains bioactive compounds with biological potential equal to or greater than that of common guava. The species is considered an untapped indigenous crop of Costa Rica with the characteristics of a functional food. It has not been fully domesticated and can be found in home gardens, paddocks, and small groups and, more recently, in small and medium-sized plantations. Moreover, Cas plantations receive no technical or scientific support and no agronomic promotion from industry, and no genetic resources or germplasm are readily available to farmers. This limits its commercial development and the implementation of selection or genetic improvement programs. In this study, we present the first draft assembly of the Cas genome, generated from PacBio long reads with the Canu assembly pipeline. The draft assembly has a total length of 417.64 Mb, with 24 440 contigs and a contig N50 of 21.3 kb. Structural annotation yielded 59 036 gene models. Functional annotation was conducted against the non-redundant set of genes in the KEGG database: of the 52 422 complete gene models, 15.55% (8 153) showed homology with KEGG orthologs. The genes in our Cas draft assembly were compared with those of Eucalyptus grandis (rose gum) [erg] in the KEGG repository. According to the KEGG pathway assignments, 33 isoforms were annotated as part of the flavonoid biosynthetic pathway and 19 isoforms as part of the phenylpropanoid biosynthetic pathway. This study thus provides the first draft of the Cas genome assembled from PacBio long reads.
This new genomic resource provides a basis for exploring the genetic potential of this crop as a functional food.
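The contig N50 quoted above is the length L such that contigs of length L or longer together contain at least half of the assembled bases. The minimal sketch below implements that definition. The function name `n50` and the toy contig lengths are hypothetical and are not taken from the Cas assembly.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    covered = 0
    # Walk from the longest contig down, accumulating assembled bases.
    for length in sorted(lengths, reverse=True):
        covered += length
        # Stop at the first contig that brings coverage to >= 50% of the total.
        if 2 * covered >= total:
            return length
    raise AssertionError("unreachable: the loop always reaches half the total")


# Toy example: 280 bp in total; 100 + 80 = 180 bp already covers half.
print(n50([100, 80, 50, 30, 20]))  # 80
```

The same function applies unchanged to scaffold lengths. The related NG50 uses an estimated genome size in place of the assembly total.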

2021
Vol 12
Author(s):
Wu Gan
Chenxi Zhao
Xinran Liu
Chao Bian
Qiong Shi
...  

Spiny head croaker (Collichthys lucidus), a member of the family Sciaenidae, is a small commercially important fish distributed mainly in the coastal waters of the northwestern Pacific. Here, we constructed a nonredundant chromosome-level genome assembly of spiny head croaker and carried out genome-wide investigations of genome evolution and of gene families related to otolith development. A primary genome assembly of 811.23 Mb, with a contig N50 of 74.92 kb, was generated from a combination of 49.12 Gb of Illumina clean reads and 35.24 Gb of PacBio long reads. Contigs of this draft assembly were further anchored into chromosomes by integrating an additional 185.33 Gb of Hi-C data. This resulted in a high-quality chromosome-level genome assembly of 817.24 Mb with an improved scaffold N50 of 26.58 Mb. Our phylogenetic analysis showed that C. lucidus is much closer to Larimichthys crocea than to Miichthys miiuy. We also predicted that many gene families were significantly expanded (p < 0.05) in spiny head croaker; some of them are associated with the “calcium signaling pathway” and potential “inner ear functions.” In addition, we identified several otolith-related genes (such as otol1a, which encodes Otolin-1a) with critical deletions or mutations, suggesting possible molecular mechanisms for the well-developed otoliths of the family Sciaenidae.
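The contig N50 (74.92 kb) and the scaffold N50 (26.58 Mb) differ because scaffolding orders and orients contigs and joins them with runs of unknown bases (N). Contig-level statistics are therefore computed after splitting scaffolds back at those gaps. The minimal sketch below does that split. The function name and the `min_gap` threshold are illustrative assumptions, since pipelines differ in how long an N run must be to count as a gap.

```python
import re


def contigs_from_scaffold(scaffold: str, min_gap: int = 1) -> list[str]:
    """Split a scaffold sequence into contigs at runs of >= min_gap 'N'/'n' bases.

    min_gap is an assumption: pipelines differ in how long an N run must be
    before it counts as a scaffolding gap rather than an ambiguous base.
    """
    if min_gap < 1:
        raise ValueError("min_gap must be at least 1")
    gap = re.compile(f"[Nn]{{{min_gap},}}")
    # re.split leaves empty strings when a scaffold starts or ends with a gap.
    return [piece for piece in gap.split(scaffold) if piece]


scaffold = "ACGTACGT" + "N" * 100 + "GGCC" + "N" * 100 + "TTA"
contigs = contigs_from_scaffold(scaffold)
print(len(scaffold), [len(c) for c in contigs])  # 215 [8, 4, 3]
```

Feeding the resulting contig lengths, rather than the scaffold lengths, into an N50 calculation yields the contig N50.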


Author(s):  
Guangtu Gao
Susana Magadan
Geoffrey C Waldbieser
Ramey C Youngblood
Paul A Wheeler
...  

Abstract Currently, there is still a need to improve the contiguity of the rainbow trout reference genome and to use multiple genetic backgrounds that represent the genetic diversity of this species. The Arlee doubled haploid line originated from a domesticated hatchery strain that was originally collected from the northern California coast. The Canu pipeline was used to generate a de novo assembly of the Arlee line genome from high-coverage PacBio long-read sequence data. The assembly was further improved with Bionano optical maps and Hi-C proximity ligation sequence data to generate 32 major scaffolds corresponding to the karyotype of the Arlee line (2n = 64). The final assembly comprises 938 scaffolds with an N50 of 39.16 Mb and a total length of 2.33 Gb. Of this, ∼95% is in 32 chromosome sequences with only 438 gaps between contigs and scaffolds. In rainbow trout, the haploid chromosome number can vary from 29 to 32. In the Arlee karyotype, the haploid chromosome number is 32 because chromosomes Omy04, 14, and 25 are divided into six acrocentric chromosomes. Additional structural variations identified in the Arlee genome included major inversions on chromosomes Omy05 and Omy20 and 15 smaller inversions that will require further validation. This is also the first rainbow trout genome assembly that includes a scaffold with the sex-determination gene (sdY) in the chromosome Y sequence. The utility of this genome assembly is demonstrated through the improved annotation of the duplicated genome loci that harbor the IGH genes on chromosomes Omy12 and Omy13.
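The haploid count of 32 can be reconciled with the 29 to 32 range by simple bookkeeping. This assumes (our illustration, not a statement from the study) that the 29-chromosome end of the range corresponds to a karyotype in which Omy04, Omy14, and Omy25 are each a single fused chromosome:

```latex
n_{\mathrm{Arlee}} = 29 - \underbrace{3}_{\text{fused Omy04, 14, 25}} + \underbrace{6}_{\text{acrocentric halves}} = 32,
\qquad 2n = 2 \times 32 = 64
```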


2021
Vol 22 (1)
Author(s):
Lidong Guo
Mengyang Xu
Wenchao Wang
Shengqiang Gu
Xia Zhao
...  

Abstract Background Synthetic long reads (SLR) with long-range co-barcoding information are now widely applied in genomics research. Several tools have been developed for each specific SLR technique, but a robust and efficient standalone scaffolder is still needed for hybrid genome assembly. Results In this work, we developed a standalone scaffolding tool, SLR-superscaffolder, which links contigs in draft assemblies using co-barcoding and paired-end read information. Our top-to-bottom scheme first builds a global scaffold graph based on Jaccard similarity to determine the order and orientation of contigs. It then locally improves the scaffolds with the aid of paired-end information. We also applied a screening algorithm to reduce the negative effect of misassembled contigs in the input assembly. We applied SLR-superscaffolder to a human single-tube long fragment read sequencing dataset and increased the scaffold NG50 of its corresponding draft assembly 1349-fold. Moreover, benchmarking on different input contigs showed that this approach overall outperformed existing SLR scaffolders, providing longer contiguity and fewer misassemblies, especially for short contigs assembled from next-generation sequencing data. The open-source code of SLR-superscaffolder is available at https://github.com/BGI-Qingdao/SLR-superscaffolder. Conclusions SLR-superscaffolder can dramatically improve the contiguity of a draft assembly by integrating a hybrid assembly strategy.
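The Jaccard similarity behind the global scaffold graph compares the sets of barcodes observed on two contigs: |A ∩ B| / |A ∪ B|. Contigs that lie close together in the genome tend to share barcodes, because reads with the same barcode come from the same long DNA fragment (or a small set of fragments). The sketch below only weights candidate graph edges by this score. The threshold, function names, and toy barcode sets are hypothetical, and it does not reproduce SLR-superscaffolder's ordering, orientation, or screening steps.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def barcode_graph_edges(
    barcodes: dict[str, set[str]], threshold: float
) -> list[tuple[str, str, float]]:
    """Return (contig1, contig2, similarity) for pairs at or above threshold.

    Hypothetical helper: a real scaffolder would also filter repetitive
    contigs and resolve order/orientation, which is omitted here.
    """
    edges = []
    for u, v in combinations(sorted(barcodes), 2):
        score = jaccard(barcodes[u], barcodes[v])
        if score >= threshold:
            edges.append((u, v, score))
    # Strongest links first.
    return sorted(edges, key=lambda e: e[2], reverse=True)


contig_barcodes = {
    "ctg1": {"bc1", "bc2", "bc3", "bc4"},
    "ctg2": {"bc3", "bc4", "bc5"},
    "ctg3": {"bc9"},
}
print(barcode_graph_edges(contig_barcodes, threshold=0.2))
# [('ctg1', 'ctg2', 0.4)]
```

Normalizing by the union, rather than counting shared barcodes alone, keeps contigs with very many barcodes (for example, repeats) from linking to everything.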


2021
Vol 8 (1)
Author(s):
Hong-Lei Li
Lin Wu
Zhaoming Dong
Yusong Jiang
Sanjie Jiang
...  

Abstract Ginger (Zingiber officinale), the type species of Zingiberaceae, is one of the most widespread medicinal plants and spices. Here, we report a high-quality, chromosome-scale reference genome of ginger ‘Zhugen’, a traditionally cultivated ginger in Southwest China used as a fresh vegetable. The genome was assembled from PacBio long reads, Illumina short reads, and high-throughput chromosome conformation capture (Hi-C) reads. The ginger genome was phased into two haplotypes: haplotype 1 (1.53 Gb, contig N50 of 4.68 Mb) and haplotype 0 (1.51 Gb, contig N50 of 5.28 Mb). Homologous ginger chromosomes maintained excellent gene pair collinearity. Of 17,226 allelic gene pairs, 11.9% exhibited differential expression between alleles. Based on genome sequencing, transcriptome analysis, and metabolomic analysis, we propose a backbone biosynthetic pathway of gingerol analogs consisting of 12 enzymatic gene families: PAL, C4H, 4CL, CST, C3’H, C3OMT, CCOMT, CSE, PKS, AOR, DHN, and DHT. These analyses also identified the likely transcription factor networks that regulate the synthesis of gingerol analogs. Overall, this study serves as an excellent resource for further research on ginger biology and breeding, lays a foundation for a better understanding of ginger evolution, and presents an intact biosynthetic pathway for species-specific gingerol biosynthesis.
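One common way to flag differential expression between alleles is an exact two-sided binomial test of the read counts supporting each allele against an expected 50:50 ratio. We assume it here for illustration only; it is not necessarily the method used in the study. The standard-library sketch below uses hypothetical function names, a hypothetical 0.05 cut-off, and made-up counts. A real analysis would also use biological replicates and correct for multiple testing across thousands of gene pairs.

```python
from math import comb


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: total probability of outcomes no more likely than k."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so ties are not lost to floating-point error.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)


def allele_imbalanced(hap1_reads: int, hap0_reads: int, alpha: float = 0.05) -> bool:
    """Flag an allelic gene pair whose read counts deviate from 1:1 (hypothetical helper)."""
    n = hap1_reads + hap0_reads
    return n > 0 and binom_two_sided_p(hap1_reads, n) < alpha


print(allele_imbalanced(52, 48))  # False: close to 1:1
print(allele_imbalanced(80, 20))  # True: strongly skewed toward haplotype 1
```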


Author(s):  
Martin Stervander
William A Cresko

Abstract The fish order Syngnathiformes has been referred to as a collection of misfit fishes. It comprises commercially important fish such as red mullets as well as the highly diverse seahorses, pipefishes, and seadragons of the well-known family Syngnathidae, with their unique adaptations including male pregnancy. Another ornate member of this order is the mandarinfish. No fewer than two types of chromatophores have been discovered in the spectacularly colored mandarinfish: the cyanophore (producing blue color) and the dichromatic cyano-erythrophore (producing blue and red). The phylogenetic position of mandarinfish in Syngnathiformes, and their promise of additional genetic discoveries beyond the chromatophores, made mandarinfish an appealing target for whole genome sequencing. We used linked-read sequencing to create synthetic long reads, producing a highly contiguous genome assembly for the mandarinfish. The genome assembly comprises 483 Mbp (longest scaffold 29 Mbp), with an N50 of 12 Mbp and an L50 of 14 scaffolds. Assembly completeness is also high: of 4,584 BUSCO genes for ray-finned fishes, 92.6% were found complete, 4.4% fragmented, and 2.9% missing. Outside the family Syngnathidae, the mandarinfish represents one of the most contiguous syngnathiform genome assemblies to date. The mandarinfish genomic resource will likely serve as a high-quality outgroup to syngnathid fish and as a resource for research on the genomic underpinnings of the evolution of novel pigmentation.
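L50 is the count that accompanies N50: the smallest number of scaffolds, taken longest first, whose combined length reaches at least half of the assembly. An L50 of 14 therefore means that the 14 longest scaffolds hold at least half of the 483 Mbp. The minimal sketch below uses a hypothetical function name and toy lengths.

```python
def l50(lengths: list[int]) -> int:
    """Return the L50: how many of the longest sequences cover >= 50% of the total."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    covered = 0
    # Count sequences from the longest down until half the total is covered.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        covered += length
        if 2 * covered >= total:
            return count
    raise AssertionError("unreachable: the loop always reaches half the total")


print(l50([100, 80, 50, 30, 20]))  # 2
```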


PeerJ
2016
Vol 4
pp. e2071
Author(s):
María de Lourdes Adriano-Anaya
Edilma Pérez-Castillo
Miguel Salvador-Figueroa
Sonia Ruiz-González
Alfredo Vázquez-Ovando
...  

Sex expression and floral morphology studies are central to understanding breeding behavior and to defining the productive potential of plant genotypes. In particular, the new bioenergy crop Jatropha curcas L. has been classified as a monoecious species. Nonetheless, there is no information about its reproductive diversity in the Mesoamerican region, which is considered its center of origin and diversification. Thus, we determined sex expression and floral morphology in J. curcas populations from southern Mexico and Guatemala. Our results showed that most J. curcas specimens had typical inflorescences with separate sexes (monoecious), whereas the rest were atypical (gynoecious, androecious, andromonoecious, androgynomonoecious). According to a discriminant analysis, the most important variables for grouping these populations were male flower diameter, female petal length, and male nectary length. “Guerrero” was the most diverse population from southern Mexico, and “Centro” had the highest variability among the populations from Chiapas. A cluster analysis showed that the accessions from southern Mexico grouped without any correlation with geographical origin, whereas accessions with atypical sexuality grouped together. To assess how informative floral morphological traits are compared with molecular markers, we performed a Mantel correlation test. It compared the distance matrix generated in this study with the genetic distance matrix (AFLP) previously reported for the same accessions. We found a significant correlation between the two data sets at the level of accessions. Our results can support the design of genetic improvement programs that use sexually and morphologically contrasting plants from the center of origin.


2020
Author(s):
Yichun Xie
Yiyi Zhong
Jinhui Chang
Hoi Shan Kwan

Abstract The homokaryotic Coprinopsis cinerea strain A43mut B43mut pab1-1 #326 is a widely used experimental model for developmental studies in mushroom-forming fungi. It can grow on defined artificial media and complete its whole life cycle within two weeks. Mutations in the mating-type factors A and B give it the special feature of clamp formation and fruiting without mating. This allows investigations and manipulations in a homokaryotic genetic background. The existing genome assembly of strain #326 was based on short-read sequencing data and was highly fragmented, leading to bias in gene annotation and downstream analyses. Here, we report a chromosome-level genome assembly of strain #326. Oxford Nanopore Technologies (ONT) MinION sequencing was used to obtain long reads, and Illumina short reads were used to polish the sequences. The combined assembly yielded 13 chromosomes and a mitochondrial genome as individual scaffolds. The assembly has 15,250 annotated genes and shows high synteny with C. cinerea strain Okayama-7 #130. It greatly improves contiguity and annotation. It is a suitable reference for further genomic studies, especially genetic, genomic, and transcriptomic analyses based on ONT long reads. Single nucleotide variants and structural variants in six mutagenized and cisplatin-screened mutants were identified and validated. A 66-bp deletion in the gene encoding Ras GTPase-activating protein (RasGAP) was found in all mutants. To make better use of the ONT sequencing platform, we modified a magnetic-bead-based high-molecular-weight genomic DNA isolation protocol for filamentous fungi. This study demonstrates the use of MinION to construct a fungal reference genome and to perform downstream studies in an individual laboratory. We propose an experimental workflow from DNA isolation and whole genome sequencing to genome assembly and variant calling. Our results provide solutions and parameters for fungal genomic analysis on the MinION sequencing platform.
Highlights
- A chromosome-level genome assembly of C. cinerea #326
- A fast and efficient high-molecular-weight fungal genomic DNA isolation protocol
- Structural variant and single nucleotide variant calling using Nanopore reads
- A series of solutions and reference parameters for fungal genomic analysis on MinION
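In long-read alignments, a deletion such as the 66 bp RasGAP deletion appears as a `D` operation in the read's CIGAR string. The sketch below scans a CIGAR string for deletions of at least `min_len` bases and reports their reference coordinates. It is a hypothetical, simplified illustration: the function name and the 50 bp default are assumptions, and it is not the variant-calling pipeline used in the study. Dedicated structural-variant callers also combine evidence across many reads.

```python
import re

# One CIGAR operation: a length followed by an operator (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operators that advance the position on the reference sequence.
CONSUMES_REF = set("MDN=X")


def large_deletions(ref_start: int, cigar: str, min_len: int = 50) -> list[tuple[int, int]]:
    """Return (reference position, length) for each deletion of >= min_len bases.

    ref_start is the reference coordinate of the first aligned base; returned
    positions use the same coordinate convention.
    """
    hits = []
    pos = ref_start
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "D" and length >= min_len:
            hits.append((pos, length))
        if op in CONSUMES_REF:
            pos += length
    return hits


# A read spanning a 66 bp deletion; the short 2 bp deletion is ignored.
print(large_deletions(1000, "5S500M66D300M2D10M"))  # [(1500, 66)]
```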


GigaScience
2020
Vol 9 (8)
Author(s):
Eugenie C Yen
Shane A McCarthy
Juan A Galarza
Tomas N Generalovic
Sarah Pelan
...  

ABSTRACT Background Diploid genome assembly is typically impeded by heterozygosity because it introduces errors when haplotypes are collapsed into a consensus sequence. Trio binning offers an innovative solution that exploits heterozygosity for assembly. Short parental reads are used to assign parental origin to long reads from their F1 offspring before assembly, enabling complete haplotype resolution. Trio binning could therefore provide an effective strategy for assembling highly heterozygous genomes that are traditionally problematic, such as insect genomes. One such genome is that of the wood tiger moth (Arctia plantaginis), an evolutionary study system for warning colour polymorphism. Findings We produced a high-quality, haplotype-resolved assembly for Arctia plantaginis through trio binning. We sequenced a same-species family (F1 heterozygosity ∼1.9%) and used parental Illumina reads to bin 99.98% of offspring Pacific Biosciences reads by parental origin. We then assembled each haplotype separately and scaffolded it with 10X linked reads. Both assemblies are contiguous (mean scaffold N50: 8.2 Mb) and complete (mean BUSCO completeness: 97.3%), with annotations and 31 chromosomes identified through karyotyping. We used the assembly to analyse genome-wide population structure and relationships among 40 wild resequenced individuals from 5 populations across Europe. This revealed the Georgian population as the most genetically differentiated, with the lowest genetic diversity. Conclusions We present the first invertebrate genome to be assembled via trio binning. This assembly is one of the highest-quality genomes available for Lepidoptera, supporting trio binning as a potent strategy for assembling heterozygous genomes. Using our assembly, we provide genomic insights into the geographic population structure of A. plantaginis.
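The read-binning step of trio binning can be sketched with k-mers. K-mers present in one parent's reads but absent from the other's act as haplotype-specific markers. Each offspring long read goes to the parent whose markers it contains at the higher (size-normalized) rate. This is a simplified illustration: the function names, k = 4, and the toy reads are hypothetical. Production trio-binning tools use much larger k and also discard low-count k-mers that likely stem from sequencing errors.

```python
# Complement table for building reverse complements of DNA strings.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> set[str]:
    """Strand-independent k-mers: the smaller of each k-mer and its reverse complement."""
    out = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i : i + k]
        rc = kmer.translate(COMPLEMENT)[::-1]
        out.add(min(kmer, rc))
    return out


def parent_specific_kmers(maternal: list[str], paternal: list[str], k: int):
    """Return (maternal-only, paternal-only) k-mer sets from the parents' short reads."""
    mat = set().union(*(canonical_kmers(r, k) for r in maternal))
    pat = set().union(*(canonical_kmers(r, k) for r in paternal))
    return mat - pat, pat - mat


def bin_read(read: str, mat_only: set[str], pat_only: set[str], k: int) -> str:
    """Assign an offspring long read to the parent whose specific k-mers it hits more."""
    kmers = canonical_kmers(read, k)
    # Normalize by marker-set size so a parent with more markers is not favored.
    mat_score = len(kmers & mat_only) / max(len(mat_only), 1)
    pat_score = len(kmers & pat_only) / max(len(pat_only), 1)
    if mat_score > pat_score:
        return "maternal"
    if pat_score > mat_score:
        return "paternal"
    return "unassigned"


k = 4
mat_only, pat_only = parent_specific_kmers(["ACGTACGTAA"], ["ACGTACGTCC"], k)
for read in ["TACGTAA", "TTACGTA", "CGTCC", "ACGTAC"]:
    print(read, bin_read(read, mat_only, pat_only, k))
```

Canonical k-mers make the assignment independent of the strand a read was sequenced from. In the toy run, "TTACGTA" (the reverse complement of "TACGTAA") is binned to the same parent.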


2020
Vol 21 (1)
Author(s):
Mikko Rautiainen
Tobias Marschall

Abstract Genome graphs can represent genetic variation and sequence uncertainty. Aligning sequences to genome graphs is key to many applications, including error correction, genome assembly, and genotyping of variants in a pangenome graph. Yet, so far, this step is often prohibitively slow. We present GraphAligner, a tool for aligning long reads to genome graphs. Compared with state-of-the-art tools, GraphAligner is 13x faster and uses 3x less memory. When employing GraphAligner for error correction, we find it to be more than twice as accurate and over 12x faster than extant tools.
Availability: package manager: https://anaconda.org/bioconda/graphaligner; source code: https://github.com/maickrau/GraphAligner
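To make sequence-to-graph alignment concrete, here is a toy dynamic program for the edit distance between a read and the best-matching path in a small acyclic genome graph with single-base nodes. It is a hypothetical teaching sketch. It assumes nodes are numbered in topological order, with every edge going from a lower to a higher index. It is not GraphAligner's algorithm, which handles long reads on large graphs, including cyclic ones, with far more efficient techniques.

```python
def graph_edit_distance(labels: str, edges: list[tuple[int, int]], read: str) -> int:
    """Minimum edit distance between `read` and the label of any path in a DAG.

    Node i carries the single base labels[i]; edges (u, v) must satisfy u < v,
    so node indices are already a topological order. The path may start and
    end at any node, while the whole read must be aligned.
    """
    m = len(read)
    preds: list[list[int]] = [[] for _ in labels]
    for u, v in edges:
        if not 0 <= u < v < len(labels):
            raise ValueError(f"edge {(u, v)} is not in topological order")
        preds[v].append(u)
    # Virtual start row: any read prefix can be inserted before the path begins.
    start = list(range(m + 1))
    rows: list[list[int]] = []
    for v, base in enumerate(labels):
        incoming = [start] + [rows[u] for u in preds[v]]
        best = [min(r[j] for r in incoming) for j in range(m + 1)]
        row = [best[0] + 1]  # this node's base left unaligned (deletion)
        for j in range(1, m + 1):
            row.append(min(
                best[j - 1] + (read[j - 1] != base),  # match (0) or mismatch (1)
                best[j] + 1,                           # skip this node's base
                row[j - 1] + 1,                        # extra base in the read
            ))
        rows.append(row)
    # The path may end at any node; an empty path costs m insertions.
    return min([m] + [row[m] for row in rows])


# A SNP bubble: A -> (C | G) -> T
labels, edges = "ACGT", [(0, 1), (0, 2), (1, 3), (2, 3)]
print([graph_edit_distance(labels, edges, r) for r in ["ACT", "AGT", "AAT"]])  # [0, 0, 1]
```

Both alleles of the bubble align with zero edits, which is the point of aligning to a graph rather than to a single linear reference.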


2020
Vol 10 (5)
pp. 1495-1501
Author(s):
Tsuyoshi Tanaka
Ryo Nishijima
Shota Teramoto
Yuka Kitomi
Takeshi Hayashi
...  

IR64 is a high-yielding rice variety that has been widely cultivated around the world, although it has since been replaced by modern varieties in most growing areas. Given that modern varieties are mostly progenies or relatives of IR64, genetic analysis of IR64 is valuable for rice functional genomics. However, chromosome-level genome sequences of IR64 have not previously been available. Here, we sequenced the IR64 genome using synthetic long reads obtained by linked-read sequencing and ultra-long reads obtained by nanopore sequencing. We integrated these data and generated a 367-Mb de novo assembly of the IR64 genome, equivalent to 99% of the estimated genome size. The contiguity of the IR64 genome assembly was improved compared with that of a publicly available IR64 genome assembly generated from short reads only. We annotated 41,458 protein-coding genes, including 657 IR64-specific genes. These genes are missing from the other high-quality rice genome assemblies IRGSP-1.0 of the japonica cultivar Nipponbare and R498 of the indica cultivar Shuhui498. The IR64 genome assembly will serve as a genome resource for rice functional genomics as well as genomics-driven and/or molecular breeding.
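The abstract does not say how the genome size was estimated. One widely used approach, assumed here purely for illustration, divides the number of k-mer occurrences in the reads by the depth at the main peak of the k-mer frequency histogram. Low-depth k-mers, which mostly come from sequencing errors, are ignored first. The minimal sketch below uses a made-up histogram, and both the `min_depth` cut-off and the numbers are hypothetical.

```python
def estimate_genome_size(histogram: dict[int, int], min_depth: int = 2) -> float:
    """Estimate genome size from a k-mer histogram {depth: number of distinct k-mers}.

    k-mers seen fewer than min_depth times are treated as sequencing errors.
    Genome size ~ (total k-mer occurrences) / (depth of the main peak).
    """
    kept = {d: n for d, n in histogram.items() if d >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(kept, key=kept.get)
    total_occurrences = sum(d * n for d, n in kept.items())
    return total_occurrences / peak_depth


hist = {1: 50, 9: 20, 10: 100, 11: 30}
size = estimate_genome_size(hist)
print(size)  # 151.0
print(f"assembly covers {145 / size:.0%} of the estimate")  # assembly covers 96% of the estimate
```

For heterozygous genomes the histogram has separate heterozygous and homozygous peaks, so dedicated profiling tools fit a model instead of taking the single highest bar.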

