A Chromosome-Level Genome Assembly of Dendrobium huoshanense Using Long Reads and Hi-C Data

Bangxing Han; Yi Jing; Jun Dai; Tao Zheng; Fangli Gu; Qun Zhao; Fucheng Zhu; Xiangwen Song; Hui Deng; Peipei Wei; Cheng Song; Dong Liu; Xueping Jiang; Fang Wang; Yanjun Chen; Chuanbo Sun; Houjun Yao; Li Zhang; Naidong Chen; Shaotong Chen; Xiaoli Li; Yuan Wei; Zhen Ouyang; Hui Yan; Jiangjie Lu; Huizhong Wang; Lanping Guo; Lingdong Kong; Jing Zhao; Shaoping Li; Lifen Luo; Karsten Kristiansen; Zhan Feng; Silong Sun; Cunwu Chen; Zhen Yue; Naifu Chen

doi:10.1093/gbe/evaa215


Genome Biology and Evolution ◽


2020 ◽

Vol 12 (12) ◽

pp. 2486-2490


Keyword(s):

Genome Assembly ◽

Sequencing Data ◽

Active Components ◽

Protein Coding ◽

Polysaccharide Biosynthesis ◽

Functional Studies ◽

Long Reads ◽

Dendrobium Huoshanense ◽

Paired End Sequencing ◽

Chromosome Level

Abstract Dendrobium huoshanense is used to treat various diseases in traditional Chinese medicine. Recent studies have identified its active components. However, the lack of genomic data limits research on the biosynthesis and application of these therapeutic ingredients. To address this issue, we generated the first chromosome-level genome assembly and annotation of D. huoshanense. We integrated PacBio sequencing data, Illumina paired-end sequencing data, and Hi-C sequencing data to assemble a 1.285 Gb genome, with contig and scaffold N50 lengths of 598 kb and 71.79 Mb, respectively. We annotated 21,070 protein-coding genes and 0.96 Gb of transposable elements, the latter constituting 74.92% of the whole assembly. In addition, we identified 252 genes responsible for polysaccharide biosynthesis by Kyoto Encyclopedia of Genes and Genomes (KEGG) functional annotation. Our data provide a basis for further functional studies, particularly those focused on genes related to glycan biosynthesis and metabolism, and have implications for both conservation and medicine.
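
Contig and scaffold N50 values such as those quoted above are length-weighted medians: the N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The Python sketch below illustrates that definition on toy lengths only; it is not the authors' pipeline, and the helper name `n50` is hypothetical.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths (bp).

    Hypothetical helper for illustration: sort lengths from longest to
    shortest and report the length at which the running total first
    reaches half of the assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one sequence length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for non-negative lengths


# Toy lengths (bp), not data from any assembly listed here.
print(n50([100, 200, 300, 400]))  # 300
```

The same function gives a scaffold N50 when fed scaffold lengths instead of contig lengths.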


LRSDAY: Long-read Sequencing Data Analysis for Yeasts

10.1101/184572 ◽

2017 ◽

Author(s):

Jia-Xing Yue ◽

Gianni Liti

Keyword(s):

Genome Assembly ◽

Model Organism ◽

Sequencing Data ◽

Protein Coding ◽

Sequencing Technologies ◽

Long Reads ◽

Long Read ◽

Downstream Analysis ◽

Eukaryotic Organisms ◽

Genomic Regions

Abstract Long-read sequencing technologies have become increasingly popular in genome projects due to their strength in resolving complex genomic regions. As a leading model organism with a small genome and great biotechnological importance, the budding yeast Saccharomyces cerevisiae has many isolates currently being sequenced with long reads. However, analyzing long-read sequencing data to produce high-quality genome assemblies and annotations remains challenging. Here we present LRSDAY, the first one-stop solution to streamline this process. LRSDAY can produce chromosome-level, end-to-end genome assemblies and comprehensive annotations for various genomic features (including centromeres, protein-coding genes, tRNAs, transposable elements and telomere-associated elements) that are ready for downstream analysis. Although LRSDAY is tailored for S. cerevisiae, we designed it to be highly modular and customizable, making it adaptable to virtually any eukaryotic organism. Applying LRSDAY to an S. cerevisiae strain takes ∼43 hours to generate a complete and well-annotated genome from ∼100X Pacific Biosciences (PacBio) reads using four threads.
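
Depth figures such as "∼100X" are usually computed as the total number of sequenced bases divided by the (estimated) genome size. A minimal Python sketch, assuming a haploid S. cerevisiae genome of roughly 12 Mb and a hypothetical set of uniform 10 kb reads; neither value is taken from the LRSDAY paper, and the function names are illustrative.

```python
def mean_depth(read_lengths: list[int], genome_size_bp: int) -> float:
    """Average sequencing depth (X): total read bases / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return sum(read_lengths) / genome_size_bp


def bases_needed(target_depth: float, genome_size_bp: int) -> float:
    """Total bases to sequence to reach a target average depth."""
    return target_depth * genome_size_bp


# Hypothetical example: 120,000 reads of 10 kb over an assumed ~12 Mb genome.
reads = [10_000] * 120_000
print(mean_depth(reads, 12_000_000))  # 100.0
print(bases_needed(100, 12_000_000))  # 1200000000
```

Real read depth is uneven along the genome, so this average is only a summary figure.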


Chromosome-level genome assembly reveals the unique genome evolution of the swimming crab (Portunus trituberculatus)

GigaScience ◽

10.1093/gigascience/giz161 ◽

2020 ◽

Vol 9 (1) ◽

Cited By ~ 3

Author(s):

Boping Tang ◽

Daizhen Zhang ◽

Haorong Li ◽

Senhao Jiang ◽

Huabin Zhang ◽

...

Keyword(s):

Genome Assembly ◽

Repetitive Sequences ◽

Asia Pacific ◽

Portunus Trituberculatus ◽

Swimming Crab ◽

Protein Coding ◽

Eukaryotic Genes ◽

Long Reads ◽

Commercial Species ◽

Chromosome Level

Abstract Background: The swimming crab, Portunus trituberculatus, is an important commercial species in China and is widely distributed in the coastal waters of Asia-Pacific countries. Despite increasing interest in swimming crab research, a high-quality chromosome-level genome is still lacking. Findings: Here, we assembled the first chromosome-level reference genome of P. trituberculatus by combining short reads, Nanopore long reads, and Hi-C data. The genome assembly size was 1.00 Gb with a contig N50 length of 4.12 Mb. In addition, BUSCO assessment indicated that 94.7% of core eukaryotic genes were present in the genome assembly. Approximately 54.52% of the genome was identified as repetitive sequences, and a total of 16,796 protein-coding genes were annotated. Using Hi-C data, we further anchored contigs into 50 chromosomes with an N50 length of 21.80 Mb. Conclusions: We anticipate that this chromosome-level assembly of the P. trituberculatus genome will not only promote the study of basic development and evolution but also provide important resources for swimming crab reproduction.
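
BUSCO scores such as the one quoted above are percentages of a fixed set of n benchmark genes, each classified as complete (single-copy or duplicated), fragmented, or missing. The Python sketch below shows how a BUSCO-style one-line summary is derived from those counts. The counts are hypothetical, not the crab genome's breakdown (the abstract reports only a headline figure), and the `BuscoCounts` class is an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class BuscoCounts:
    """Counts per BUSCO category (hypothetical numbers in the example below)."""
    single: int       # complete, single-copy
    duplicated: int   # complete, duplicated
    fragmented: int
    missing: int

    @property
    def total(self) -> int:
        return self.single + self.duplicated + self.fragmented + self.missing

    def summary(self) -> str:
        """Format a BUSCO-style summary: each category as a percentage of n."""
        n = self.total
        if n == 0:
            raise ValueError("no BUSCO genes counted")

        def pct(x: int) -> float:
            return 100 * x / n

        complete = self.single + self.duplicated
        return (
            f"C:{pct(complete):.1f}%[S:{pct(self.single):.1f}%,"
            f"D:{pct(self.duplicated):.1f}%],F:{pct(self.fragmented):.1f}%,"
            f"M:{pct(self.missing):.1f}%,n:{n}"
        )


counts = BuscoCounts(single=80, duplicated=5, fragmented=10, missing=5)
print(counts.summary())  # C:85.0%[S:80.0%,D:5.0%],F:10.0%,M:5.0%,n:100
```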


Chromosome-Level Genome Assembly and Annotation of a Sciaenid Fish, Argyrosomus japonicus

Genome Biology and Evolution ◽

10.1093/gbe/evaa246 ◽

2021 ◽

Vol 13 (2) ◽

Author(s):

Linlin Zhao ◽

Shengyong Xu ◽

Zhiqiang Han ◽

Qi Liu ◽

Wensi Ke ◽

...

Keyword(s):

Genome Assembly ◽

Wide Distribution ◽

High Quality ◽

Protein Coding ◽

Repeat Elements ◽

Long Reads ◽

The Family ◽

Solid Foundation ◽

Genomic Resource ◽

Chromosome Level

Abstract Argyrosomus japonicus is an economically and ecologically important fish species in the family Sciaenidae, widely distributed in the world’s oceans. Here, we report a high-quality, chromosome-level genome assembly of A. japonicus based on PacBio and Hi-C sequencing technologies. A 673.7 Mb genome containing 282 contigs with an N50 length of 18.4 Mb was obtained from PacBio long reads. These contigs were further ordered and clustered into 24 chromosome groups based on Hi-C data. In addition, a total of 217.2 Mb of sequence (32.24% of the assembled genome) was identified as repeat elements, and 23,730 protein-coding genes were predicted using multiple approaches. More than 97% of BUSCO genes were identified in the A. japonicus genome. The high-quality genome assembled in this work not only provides a valuable genomic resource for future population genetics, conservation biology, and selective breeding studies of A. japonicus but also lays a solid foundation for the study of Sciaenidae evolution.


A chromosome-level genome assembly and annotation of the humpback grouper Cromileptes altivelas

10.1101/2020.06.22.164277 ◽

2020 ◽

Author(s):

Yun Sun ◽

Dongdong Zhang ◽

Jianzhi Shi ◽

Guisen Chen ◽

Ying Wu ◽

...

Keyword(s):

Genome Assembly ◽

Chromosome Conformation ◽

Protein Coding ◽

Market Values ◽

Final Assembly ◽

Long Reads ◽

High Gene ◽

Food Fish ◽

High Quality Genome ◽

Chromosome Level

Abstract Cromileptes altivelas, which belongs to the family Serranidae in the order Perciformes, is widely distributed throughout the tropical waters of the Indo-West Pacific region. Owing to its excellent food quality and abundant nutrients, it has become a popular marine food fish with high market value. Here, we report a chromosome-level genome assembly and annotation of the humpback grouper using more than 103X PacBio long reads and high-throughput chromosome conformation capture (Hi-C) technology. The contig N50 of the assembly is 4.14 Mb; the final assembly is 1.07 Gb with a scaffold N50 of 44.78 Mb, and 99.24% of the scaffold sequences were anchored into 24 chromosomes. The assembly also showed high gene completeness, with 27,067 protein-coding genes and 3,710 ncRNAs annotated. This highly accurate genome assembly and annotation will not only provide an essential genome resource for C. altivelas breeding and restocking but will also serve as a key resource for studying fish genomics and genetics.


A chromosome-level genome assembly of the Chinese tupelo Nyssa sinensis

Scientific Data ◽

10.1038/s41597-019-0296-y ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 1

Author(s):

Xuchen Yang ◽

Minghui Kang ◽

Yanting Yang ◽

Haifeng Xiong ◽

Mingcheng Wang ◽

...

Keyword(s):

Single Molecule ◽

Genome Assembly ◽

De Novo ◽

Chromosome Conformation ◽

Protein Coding ◽

Single Molecule Sequencing ◽

Data Matching ◽

Long Reads ◽

Autumn Leaf ◽

Chromosome Level

Abstract The deciduous Chinese tupelo (Nyssa sinensis Oliv.) is a popular ornamental tree valued for its spectacular autumn leaf color. Here, using single-molecule sequencing and chromosome conformation capture data, we report a high-quality, chromosome-level genome assembly of N. sinensis. PacBio long reads were de novo assembled into 647 polished contigs with a total length of 1,001.42 megabases (Mb) and an N50 size of 3.62 Mb; this total length is in line with genome sizes estimated by flow cytometry and k-mer analysis. These contigs were further clustered and ordered into 22 pseudo-chromosomes based on Hi-C data, matching the chromosome count of Nyssa obtained from previous cytological studies. In addition, a total of 664.91 Mb of repetitive elements were identified and 37,884 protein-coding genes were predicted in the genome of N. sinensis. All data were deposited in publicly available repositories and should be a valuable resource for genomics, evolution, and conservation biology.
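
A k-mer analysis of the kind mentioned above typically estimates genome size as the total number of k-mers divided by the depth at the main peak of the k-mer frequency histogram. The Python sketch below illustrates that idea on a simulated toy genome. It is a simplification, assuming an error-free, haploid, non-repetitive sequence and ignoring reverse-complement (canonical) k-mers, which dedicated k-mer tools handle; the function names, the evenly tiled reads, and the `min_depth` cut-off are all hypothetical.

```python
import random
from collections import Counter


def kmer_histogram(reads: list[str], k: int) -> Counter:
    """Return multiplicity -> number of distinct k-mers observed that many times."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


def estimate_genome_size(hist: Counter, min_depth: int = 2) -> float:
    """Genome size ~ total k-mers / depth of the main histogram peak.

    K-mers seen fewer than `min_depth` times (hypothetical cut-off) are
    treated as sequencing errors and dropped.
    """
    usable = {depth: n for depth, n in hist.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth


def tiled_reads(genome: str, read_len: int, step: int) -> list[str]:
    """Idealized error-free reads starting every `step` bases."""
    return [genome[s:s + read_len] for s in range(0, len(genome) - read_len + 1, step)]


rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(5_000))  # toy 5 kb genome
hist = kmer_histogram(tiled_reads(genome, read_len=100, step=8), k=21)
print(round(estimate_genome_size(hist)))  # slightly below 5000: genome ends are under-covered
```

With real reads, sequencing errors, heterozygosity, and repeats add extra peaks to the histogram, which is why flow cytometry is often used as an independent check, as in the abstract above.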


Chromosome-level reference genome of the European wasp spider Argiope bruennichi: a resource for studies on range expansion and evolutionary adaptation

GigaScience ◽

10.1093/gigascience/giaa148 ◽

2021 ◽

Vol 10 (1) ◽

Author(s):

Monica M Sheffer ◽

Anica Hoppe ◽

Henrik Krehenwinkel ◽

Gabriele Uhl ◽

Andreas W Kuss ◽

...

Keyword(s):

Genome Assembly ◽

Range Expansion ◽

Reference Genome ◽

De Novo ◽

Sequencing Data ◽

High Quality ◽

Proximity Ligation ◽

Genomic Resource ◽

Paired End Sequencing ◽

Chromosome Level

Abstract Background: Argiope bruennichi, the European wasp spider, has been investigated intensively as a focal species for studies on sexual selection, chemical communication, and the dynamics of rapid range expansion at a behavioral and genetic level. However, the lack of a reference genome has limited insights into the genetic basis for these phenomena. Therefore, we assembled a high-quality chromosome-level reference genome of the European wasp spider as a tool for more in-depth future studies. Findings: We generated, de novo, a 1.67 Gb genome assembly of A. bruennichi using 21.8× Pacific Biosciences sequencing, polished with 19.8× Illumina paired-end sequencing data, and proximity ligation (Hi-C)-based scaffolding. This resulted in an N50 scaffold size of 124 Mb and an N50 contig size of 288 kb. We found 98.4% of the genome to be contained in 13 scaffolds, fitting the expected number of chromosomes (n = 13). Analyses showed the presence of 91.1% of complete arthropod BUSCOs, indicating a high-quality assembly. Conclusions: We present the first chromosome-level genome assembly in the order Araneae. With this genomic resource, we open the door for more precise and informative studies on evolution and adaptation not only in A. bruennichi but also in arachnids overall, shedding light on questions such as the genomic architecture of traits, whole-genome duplication, and the genomic mechanisms behind silk and venom evolution.
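
The statement that 98.4% of the assembly lies in 13 scaffolds, matching the haploid chromosome number, is a simple cumulative-length check. A minimal Python sketch of that check, using hypothetical scaffold lengths rather than the A. bruennichi data; `fraction_in_largest` is an illustrative name, not a function from any published tool.

```python
def fraction_in_largest(lengths: list[float], n: int) -> float:
    """Fraction of total assembly length contained in the n longest sequences."""
    if n < 0:
        raise ValueError("n must be non-negative")
    total = sum(lengths)
    if total == 0:
        raise ValueError("assembly has zero total length")
    return sum(sorted(lengths, reverse=True)[:n]) / total


# Hypothetical scaffold lengths in Mb: three large scaffolds plus small unplaced ones.
scaffolds = [120, 95, 80, 2, 1.5, 1.5]
print(f"{fraction_in_largest(scaffolds, 3):.1%}")  # 98.3%
```

Comparing this fraction for n equal to the expected haploid chromosome count is a quick sanity check on Hi-C scaffolding.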


Chromosome-level reference genome of the European wasp spider Argiope bruennichi: a resource for studies on range expansion and evolutionary adaptation

10.1101/2020.05.21.103564 ◽

2020 ◽

Cited By ~ 2

Author(s):

Monica M. Sheffer ◽

Anica Hoppe ◽

Henrik Krehenwinkel ◽

Gabriele Uhl ◽

Andreas W. Kuss ◽

...

Keyword(s):

Genome Assembly ◽

Range Expansion ◽

Reference Genome ◽

De Novo ◽

Sequencing Data ◽

High Quality ◽

Proximity Ligation ◽

Genomic Resource ◽

Paired End Sequencing ◽

Chromosome Level

Abstract Background: Argiope bruennichi, the European wasp spider, has been studied intensively with respect to sexual selection, chemical communication, and the dynamics of rapid range expansion at a behavioral and genetic level. However, the lack of a reference genome has limited insights into the genetic basis for these phenomena. Therefore, we assembled a high-quality chromosome-level reference genome of the European wasp spider as a tool for more in-depth future studies. Findings: We generated, de novo, a 1.67 Gb genome assembly of A. bruennichi using 21.5X PacBio sequencing, polished with 30X Illumina paired-end sequencing data, and proximity ligation (Hi-C)-based scaffolding. This resulted in an N50 scaffold size of 124 Mb and an N50 contig size of 288 kb. We found 98.4% of the genome to be contained in 13 scaffolds, fitting the expected number of chromosomes (n = 13). Analyses showed the presence of 91.1% of complete arthropod BUSCOs, indicating a high-quality assembly. Conclusions: We present the first chromosome-level genome assembly in the class Arachnida. With this genomic resource, we open the door for more precise and informative studies on evolution and adaptation in A. bruennichi, as well as on several interesting topics in arachnids, such as the genomic architecture of traits, whole-genome duplication, and the genomic mechanisms behind silk and venom evolution.


Chromosome-level assembly of Drosophila bifasciata reveals important karyotypic transition of the X chromosome

10.1101/847558 ◽

2019 ◽

Author(s):

Ryan Bracewell ◽

Anita Tran ◽

Kamalakar Chatla ◽

Doris Bachtrog

Keyword(s):

X Chromosome ◽

Genome Assembly ◽

De Novo ◽

Pericentromeric Region ◽

Species Group ◽

Chromosome 15 ◽

Protein Coding ◽

Protein Coding Genes ◽

Long Read ◽

Chromosome Level

Abstract The Drosophila obscura species group is one of the most studied clades of Drosophila and harbors multiple distinct karyotypes. Here we present a de novo genome assembly and annotation of D. bifasciata, a species which represents an important subgroup for which no high-quality chromosome-level genome assembly currently exists. We combined long-read sequencing (Nanopore) and Hi-C scaffolding to achieve a highly contiguous genome assembly approximately 193 Mb in size, with repetitive elements constituting 30.1% of the total length. Drosophila bifasciata harbors four large metacentric chromosomes and the small dot, and our assembly contains each chromosome in a single scaffold, including the highly repetitive pericentromeres, which were largely composed of Jockey and Gypsy transposable elements. We annotated a total of 12,821 protein-coding genes, and comparisons of synteny with D. athabasca orthologs show that the large metacentric pericentromeric regions of multiple chromosomes are conserved between these species. Importantly, Muller A (X chromosome) was found to be metacentric in D. bifasciata, and its pericentromeric region appears homologous to the pericentromeric region of the fused Muller A-AD (XL and XR) of pseudoobscura/affinis subgroup species. Our finding suggests that a metacentric ancestral X fused to a telocentric Muller D, creating the large neo-X (Muller A-AD) chromosome ∼15 MYA. We also confirm the fusion of Muller C and D in D. bifasciata and show that it likely involved a centromere-centromere fusion.


Chromosome-level de novo assembly of Coprinopsis cinerea A43mut B43mut pab1-1 #326 and genetic variant identification of mutants using Nanopore MinION sequencing

10.1101/2020.11.09.367581 ◽

2020 ◽

Author(s):

Yichun Xie ◽

Yiyi Zhong ◽

Jinhui Chang ◽

Hoi Shan Kwan

Keyword(s):

Genome Assembly ◽

Dna Isolation ◽

Variant Calling ◽

Genomic Analysis ◽

Single Nucleotide ◽

Sequencing Platform ◽

Genomic Dna Isolation ◽

Long Reads ◽

Dna Isolation Protocol ◽

Chromosome Level

Abstract The homokaryotic Coprinopsis cinerea strain A43mut B43mut pab1-1 #326 is a widely used experimental model for developmental studies in mushroom-forming fungi. It can grow on defined artificial media and complete its whole life cycle within two weeks. Mutations in the mating-type factors A and B result in the special feature of clamp formation and fruiting without mating, which allows investigations and manipulations in a homokaryotic genetic background. The current genome assembly of strain #326 was based on short-read sequencing data and is highly fragmented, leading to bias in gene annotation and downstream analyses. Here, we report a chromosome-level genome assembly of strain #326. Oxford Nanopore Technologies (ONT) MinION sequencing was used to obtain long reads, and Illumina short reads were used to polish the sequences. The combined assembly yielded 13 chromosomes and a mitochondrial genome as individual scaffolds. The assembly contains 15,250 annotated genes and shows high synteny with C. cinerea strain Okayama-7 #130. It greatly improves contiguity and annotation and is a suitable reference for further genomic studies, especially genetic, genomic, and transcriptomic analyses using ONT long reads. Single-nucleotide variants and structural variants in six mutagenized and cisplatin-screened mutants were identified and validated. A 66 bp deletion in the Ras GTPase-activating protein (RasGAP) gene was found in all mutants. To make better use of the ONT sequencing platform, we modified a magnetic-bead-based high-molecular-weight genomic DNA isolation protocol for filamentous fungi. This study demonstrates the use of MinION to construct a fungal reference genome and to perform downstream studies in an individual laboratory. We propose an experimental workflow from DNA isolation and whole-genome sequencing to genome assembly and variant calling. Our results provide solutions and parameters for fungal genomic analysis on the MinION sequencing platform.
Highlights:
A chromosome-level genome assembly of C. cinerea #326
A fast and efficient high-molecular-weight fungal genomic DNA isolation protocol
Structural variant and single-nucleotide variant calling using Nanopore reads
A series of solutions and reference parameters for fungal genomic analysis on MinION


Chromosome-Level Assembly of Drosophila bifasciata Reveals Important Karyotypic Transition of the X Chromosome

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400922 ◽

2020 ◽

Vol 10 (3) ◽

pp. 891-897 ◽

Cited By ~ 3

Author(s):

Ryan Bracewell ◽

Anita Tran ◽

Kamalakar Chatla ◽

Doris Bachtrog

Keyword(s):

X Chromosome ◽

Genome Assembly ◽

De Novo ◽

Pericentromeric Region ◽

Species Group ◽

Chromosome 15 ◽

Protein Coding ◽

Protein Coding Genes ◽

Long Read ◽

Chromosome Level

The Drosophila obscura species group is one of the most studied clades of Drosophila and harbors multiple distinct karyotypes. Here we present a de novo genome assembly and annotation of D. bifasciata, a species which represents an important subgroup for which no high-quality chromosome-level genome assembly currently exists. We combined long-read sequencing (Nanopore) and Hi-C scaffolding to achieve a highly contiguous genome assembly approximately 193 Mb in size, with repetitive elements constituting 30.1% of the total length. Drosophila bifasciata harbors four large metacentric chromosomes and the small dot, and our assembly contains each chromosome in a single scaffold, including the highly repetitive pericentromeres, which were largely composed of Jockey and Gypsy transposable elements. We annotated a total of 12,821 protein-coding genes, and comparisons of synteny with D. athabasca orthologs show that the large metacentric pericentromeric regions of multiple chromosomes are conserved between these species. Importantly, Muller A (X chromosome) was found to be metacentric in D. bifasciata, and its pericentromeric region appears homologous to the pericentromeric region of the fused Muller A-AD (XL and XR) of pseudoobscura/affinis subgroup species. Our finding suggests that a metacentric ancestral X fused to a telocentric Muller D, creating the large neo-X (Muller A-AD) chromosome ∼15 MYA. We also confirm the fusion of Muller C and D in D. bifasciata and show that it likely involved a centromere-centromere fusion.
