Chromosome-scale assembly of the Sparassis latifolia genome obtained using long-read and Hi-C sequencing

2021 ◽  
Author(s):  
Chi Yang ◽  
Lu Ma ◽  
Donglai Xiao ◽  
Xiaoyu Liu ◽  
Xiaoling Jiang ◽  
...  

Sparassis latifolia is a valuable edible mushroom cultivated in China. In 2018, our research group reported an incomplete and low-quality genome of S. latifolia obtained by Illumina HiSeq 2500 sequencing. These limitations in the available genome have constrained genetic and genomic studies of this mushroom resource. Herein, an updated draft genome sequence of S. latifolia was generated by Oxford Nanopore sequencing and the Hi-C technique. A total of 8.24 Gb of Oxford Nanopore long reads, representing ~198.08X coverage of the S. latifolia genome, were generated. Subsequently, a high-quality genome of 41.41 Mb, with scaffold and contig N50 sizes of 3.31 Mb and 1.51 Mb, respectively, was assembled. Hi-C scaffolding of the genome resulted in 12 pseudochromosomes containing 93.56% of the bases in the assembled genome. Genome annotation further revealed that 17.47% of the genome was composed of repetitive sequences. In addition, 13,103 protein-coding genes were predicted, among which 98.72% were functionally annotated. BUSCO assessment further showed that 92.07% of BUSCOs were complete. The improved chromosome-scale assembly and genome features described here will aid further molecular elucidation of various traits, breeding of S. latifolia, and evolutionary studies with related taxa.
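The scaffold and contig N50 sizes quoted above follow the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal Python sketch of that definition is given here; the function name `n50` and the toy lengths are illustrative assumptions, not data or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy contig lengths (arbitrary units), invented purely for illustration.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

With real data, the list would hold the contig or scaffold lengths read from the assembly FASTA file.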



2021 ◽  
Author(s):  
Teng Li ◽  
David Kainer ◽  
William J Foley ◽  
Allen Rodrigo ◽  
Carsten Kuelheim

Eucalyptus polybractea is a small, multi-stemmed tree that is widely cultivated in Australia for the production of Eucalyptus oil. We report the hybrid assembly of the E. polybractea genome utilizing both short- and long-read technology. We generated 44 Gb of Illumina HiSeq short reads and 8 Gb of Nanopore long reads, representing approximately 83 and 15 times genome coverage, respectively. The hybrid-assembled genome, after polishing, contained 24,864 scaffolds with an accumulated length of 523 Mb (N50 = 40.3 kb; BUSCO-calculated genome completeness of 94.3%). The genome contained 35,385 predicted protein-coding genes detected by combining homology-based and de novo approaches. We have provided the first assembled genome based on hybrid sequences from the highly diverse Eucalyptus subgenus Symphyomyrtus, and revealed the value of including long reads from Nanopore technology for enhancing the contiguity of the assembled genome, as well as for improving its completeness. We anticipate that the E. polybractea genome will be an invaluable resource supporting a range of studies in genetics, population genomics and evolution of related species in Eucalyptus.
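Coverage figures such as those above are conventionally computed as total sequenced bases divided by genome size. The abstract does not state which genome-size estimate was used, so the sketch below only back-calculates the size implied by each reported pair of values; the function names are illustrative assumptions.

```python
GB = 1e9  # bases per gigabase
MB = 1e6  # bases per megabase


def depth_of_coverage(total_bases, genome_size):
    """Average depth: sequenced bases divided by genome size (both in bases)."""
    return total_bases / genome_size


def implied_genome_size(total_bases, depth):
    """Genome size (in bases) implied by a read yield and a reported depth."""
    return total_bases / depth


# Values taken from the abstract: 44 Gb at ~83x and 8 Gb at ~15x.
short_read_size = implied_genome_size(44 * GB, 83)
long_read_size = implied_genome_size(8 * GB, 15)

print(f"implied size from short reads: {short_read_size / MB:.0f} Mb")
print(f"implied size from long reads:  {long_read_size / MB:.0f} Mb")
```

Both implied sizes come out slightly above the 523 Mb assembly length, which is what one would expect if coverage was computed against an estimated genome size rather than the assembled length, though the abstract does not say so.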


2021 ◽  
Vol 10 (22) ◽  
Author(s):  
Chanakya Pachi Pulusu ◽  
Balaram Khamari ◽  
Manmath Lama ◽  
Arun Sai Kumar Peketi ◽  
Prakash Kumar ◽  
...  

The draft genome of pandrug-resistant Pseudomonas aeruginosa strain SPA03, which belongs to global high-risk sequence type 357 (ST357) and was isolated from a patient with benign prostatic hyperplasia, is presented in this report. The genome assembly was generated by combining short-read Illumina HiSeq-X Ten and long-read Oxford Nanopore Technologies MinION sequence data using the Unicycler assembler.


2019 ◽  
Author(s):  
Mengyang Xu ◽  
Xiaoshan Su ◽  
Mengqi Zhang ◽  
Ming Li ◽  
Xiaoyun Huang ◽  
...  

Abstract The long-spine porcupinefish, Diodon holocanthus (Diodontidae, Tetraodontiformes, Actinopterygii), also known as the freckled porcupinefish, is of considerable ecological and economic interest. However, its distinctive characteristics, including its inflation response, spiny skin, and tetrodotoxin, have not been fully studied for lack of a complete genome assembly. In this study, the whole genome of a single individual was sequenced using single tube-Long Fragment Read co-barcode reads, generating 154.3 Gb of paired-end data (219.8× depth). Gaps were then filled using a small Oxford Nanopore MinION long-read dataset (11.4 Gb, 15.9× depth). Making full use of long-, medium-, and short-range assembly information, the final assembly had a total length of 650.02 Mb, with contig and scaffold N50 sizes of 2.15 Mb and 8.13 Mb, respectively, despite the high repeat content. To assess completeness, Benchmarking Universal Single-Copy Orthologs analysis was used, which captured 95.7% (2,474) of core genes. In addition, 206.5 Mb (32.10%) of repetitive sequences were identified, and 20,840 protein-coding genes were annotated, of which 18,281 (87.72%) were assigned possible functions. This is the first de novo genome assembly of the porcupinefish; it will benefit downstream analyses of ontogeny, phylogeny, and evolution, and aid exploration of its unique defensive mechanism.


GigaScience ◽  
2019 ◽  
Vol 8 (7) ◽  
Author(s):  
Chang-Ming Bai ◽  
Lu-Sheng Xin ◽  
Umberto Rosani ◽  
Biao Wu ◽  
Qing-Chen Wang ◽  
...  

Abstract Background: The blood clam, Scapharca (Anadara) broughtonii, is an economically and ecologically important marine bivalve of the family Arcidae. Efforts to study its population genetics, breeding, cultivation, and stock enrichment have been somewhat hindered by the lack of a reference genome. Herein, we report the complete genome sequence of S. broughtonii, the first reference genome for the family Arcidae. Findings: A total of 75.79 Gb of clean data were generated with the Pacific Biosciences and Oxford Nanopore platforms, representing approximately 86× coverage of the S. broughtonii genome. De novo assembly of these long reads resulted in an 884.5-Mb genome, with a contig N50 of 1.80 Mb and a scaffold N50 of 45.00 Mb. Hi-C scaffolding of the genome resulted in 19 chromosomes containing 99.35% of the bases in the assembled genome. Genome annotation revealed that nearly half of the genome (46.1%) is composed of repeated sequences, while 24,045 protein-coding genes were predicted, of which 84.7% were annotated. Conclusions: We report here a chromosome-level assembly of the S. broughtonii genome based on long-read sequencing and Hi-C scaffolding. The genomic data can serve as a reference for the family Arcidae and will provide a valuable resource for the scientific community and the aquaculture sector.


2020 ◽  
Author(s):  
Shangang Jia ◽  
Guoliang Wang ◽  
Guiming Liu ◽  
Jiangyong Qu ◽  
Beilun Zhao ◽  
...  

ABSTRACT The red alga Kappaphycus alvarezii is the most important aquaculture species in the genus Kappaphycus; it is widely distributed in tropical waters and is currently the main crop for carrageenan production. Its mechanisms of adaptation to high-temperature, high-salinity environments and its carbohydrate metabolism may offer useful insights for the study of marine algae. Background scientific knowledge, such as genomic data, will also be essential for improving the disease resistance and production traits of K. alvarezii. A total of 43.28 Gb of short paired-end reads and 18.52 Gb of single-molecule long reads of K. alvarezii were generated on the Illumina HiSeq and PacBio RSII platforms, respectively. De novo genome assembly was performed using Falcon_unzip and Canu, and the assembly was then improved with Pilon. The final assembled genome (336 Mb) consists of 888 scaffolds with a contig N50 of 849 kb. Further annotation analyses predicted 21,422 protein-coding genes, of which 61.28% were functionally annotated. Here we report the draft genome and annotations of K. alvarezii, which are valuable resources for future genomic and genetic studies in Kappaphycus and other algae.


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Chunqing Ou ◽  
Fei Wang ◽  
Jiahong Wang ◽  
Song Li ◽  
Yanjie Zhang ◽  
...  

Abstract ‘Zhongai 1’ [(Pyrus ussuriensis × communis) × spp.] is an excellent pear dwarfing rootstock that is common in China. It is itself dwarf and has a high dwarfing efficiency on most of the main cultivated Pyrus species when used as an interstock. Here we describe the draft genome sequence of ‘Zhongai 1’, which was assembled using PacBio long reads, Illumina short reads and Hi-C technology. We estimated the genome size to be approximately 511.33 Mb by K-mer analysis and obtained a final genome of 510.59 Mb with a contig N50 size of 1.28 Mb. Next, 506.31 Mb (99.16%) of contigs were clustered into 17 chromosomes with a scaffold N50 size of 23.45 Mb. We further predicted 309.86 Mb (60.68%) of repetitive sequences and 43,120 protein-coding genes. The assembled genome will be a valuable resource and reference for future pear breeding, genetic improvement, and comparative genomics among related species. Moreover, it will help identify genes involved in dwarfism, early flowering, stress tolerance, and commercially desirable fruit characteristics.
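The percentages in this abstract can be reproduced from the reported lengths, assuming each is expressed relative to the 510.59 Mb final assembly (the abstract does not name the denominator explicitly). A short Python check under that assumption, with an illustrative helper name:

```python
def percent_of(part_mb, whole_mb):
    """Express part_mb as a percentage of whole_mb (both in Mb)."""
    return 100.0 * part_mb / whole_mb


ASSEMBLY_MB = 510.59  # final assembly size reported in the abstract

# Contigs clustered into the 17 chromosomes, and predicted repeats.
anchored_pct = percent_of(506.31, ASSEMBLY_MB)
repeat_pct = percent_of(309.86, ASSEMBLY_MB)

print(f"anchored to chromosomes: {anchored_pct:.2f}%")
print(f"repetitive sequences:    {repeat_pct:.2f}%")
```

Both values agree with the reported 99.16% and 60.68% to within about 0.01 percentage points, the small remainder being attributable to rounding.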


Author(s):  
Jeffrey M Skerker ◽  
Kaila M Pianalto ◽  
Stephen J Mondo ◽  
Kunlong Yang ◽  
Adam P Arkin ◽  
...  

Abstract Aspergillus flavus is an opportunistic pathogen of crops, including peanuts and maize, and is the second leading cause of aspergillosis in immunocompromised patients. A. flavus is also a major producer of the mycotoxin aflatoxin, a potent carcinogen, which results in significant crop losses annually. The A. flavus isolate NRRL 3357 was originally isolated from peanut and has been used as a model organism for understanding the regulation and production of secondary metabolites, such as aflatoxin. A draft genome of NRRL 3357 was previously constructed, enabling the development of molecular tools and the study of the population biology of this species. Here, we describe an updated, near-complete, telomere-to-telomere assembly and re-annotation of the eight chromosomes of the A. flavus NRRL 3357 genome, accomplished via long-read PacBio and Oxford Nanopore technologies combined with Illumina short-read sequencing. A total of 13,715 protein-coding genes were predicted. Using RNA-seq data, a significant improvement was achieved in predicted 5’ and 3’ untranslated regions, which were incorporated into the new gene models.


2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Jean-Marc Aury ◽  
Benjamin Istace

Abstract Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions, where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture of the two haplotypes, which can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; generally, they inspect the nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes, and they typically switch from one haplotype to another. Herein we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.


2021 ◽  
Vol 11 (2) ◽  
Author(s):  
Suzanne V Saenko ◽  
Dick S J Groenenberg ◽  
Angus Davison ◽  
Menno Schilthuizen

Abstract Studies on the shell color and banding polymorphism of the grove snail Cepaea nemoralis and the sister taxon Cepaea hortensis have provided compelling evidence for the fundamental role of natural selection in promoting and maintaining intraspecific variation. More recently, Cepaea has been the focus of citizen science projects on shell color evolution in relation to climate change and urbanization. C. nemoralis is particularly useful for studies on the genetics of shell polymorphism and the evolution of “supergenes,” as well as evo-devo studies of shell biomineralization, because it is relatively easily maintained in captivity. However, an absence of genomic resources for C. nemoralis has generally hindered detailed genetic and molecular investigations. We therefore generated ∼23× coverage long-read data for the ∼3.5 Gb genome, and produced a draft assembly composed of 28,537 contigs with the N50 length of 333 kb. Genome completeness, estimated by BUSCO using the metazoa dataset, was 91%. Repetitive regions cover over 77% of the genome. A total of 43,519 protein-coding genes were predicted in the assembled genome, and 97.3% of these were functionally annotated from either sequence homology or protein signature searches. This first assembled and annotated genome sequence for a helicoid snail, a large group that includes edible species, agricultural pests, and parasite hosts, will be a core resource for identifying the loci that determine the shell polymorphism, as well as in a wide range of analyses in evolutionary and developmental biology, and snail biology in general.

