Chromosome-level assembly of Drosophila bifasciata reveals important karyotypic transition of the X chromosome

Abstract: The Drosophila obscura species group is one of the most studied clades of Drosophila and harbors multiple distinct karyotypes. Here we present a de novo genome assembly and annotation of D. bifasciata, a species representing an important subgroup for which no high-quality chromosome-level genome assembly currently exists. We combined long-read sequencing (Nanopore) and Hi-C scaffolding to achieve a highly contiguous genome assembly approximately 193 Mb in size, with repetitive elements constituting 30.1% of the total length. Drosophila bifasciata harbors four large metacentric chromosomes and the small dot, and our assembly contains each chromosome in a single scaffold, including the highly repetitive pericentromeres, which are largely composed of Jockey and Gypsy transposable elements. We annotated a total of 12,821 protein-coding genes, and synteny comparisons with D. athabasca orthologs show that the large metacentric pericentromeric regions of multiple chromosomes are conserved between these species. Importantly, Muller A (the X chromosome) is metacentric in D. bifasciata, and its pericentromeric region appears homologous to the pericentromeric region of the fused Muller A-AD (XL and XR) of pseudoobscura/affinis subgroup species. Our findings suggest that a metacentric ancestral X fused to a telocentric Muller D, creating the large neo-X (Muller A-AD) chromosome ∼15 MYA. We also confirm the fusion of Muller C and D in D. bifasciata and show that it likely involved a centromere-centromere fusion.

G3 Genes|Genome|Genetics, 2020, Vol 10 (3), pp. 891-897. doi:10.1534/g3.119.400922. Cited by ~3

Author(s): Ryan Bracewell, Anita Tran, Kamalakar Chatla, Doris Bachtrog

Keyword(s): X Chromosome, Genome Assembly, De Novo, Pericentromeric Region, Species Group, Chromosome 15, Protein Coding, Protein Coding Genes, Long Read, Chromosome Level

Improved chromosome-level genome assembly and annotation of the seagrass, Zostera marina (eelgrass)

F1000Research, 2021, Vol 10, pp. 289. doi:10.12688/f1000research.38156.1

Author(s): Xiao Ma, Jeanine L. Olsen, Thorsten B.H. Reusch, Gabriele Procaccini, Dave Kudrna, ...

Keyword(s): Genome Assembly, Zostera Marina, Draft Genome, High Molecular Weight DNA, Protein Coding, New Findings, Long Read, Sanger Sequence, Assembly Pipeline, Chromosome Level

Background: Seagrasses (Alismatales) are the only fully marine angiosperms. Zostera marina (eelgrass) plays a crucial role in the functioning of coastal marine ecosystems and in global carbon sequestration. It is the most widely studied seagrass and has become a marine model system for exploring adaptation under rapid climate change. The original draft genome (v.1.0) of the seagrass Z. marina (L.) was based on a combination of Illumina mate-pair libraries and fosmid ends. A total of 25.55 Gb of Illumina and 0.14 Gb of Sanger sequence was obtained, representing 47.7× genomic coverage. The assembly resulted in ~2000 unordered scaffolds (L50 of 486 kb), a final genome assembly size of 203 Mb, 20,450 protein-coding genes and 63% TE content. Here, we present an upgraded chromosome-scale genome assembly and compare v.1.0 with the new v.3.1, reconfirming previous results from Olsen et al. (2016) and pointing out new findings. Methods: We used the same high molecular weight DNA from the Finnish clone that was used in the original sequencing. A high-quality reference genome was assembled with the MECAT assembly pipeline, combining PacBio long-read sequencing and Hi-C scaffolding. Results: In total, 75.97 Gb of PacBio data were produced. The final assembly comprises six pseudo-chromosomes and 304 unanchored scaffolds, with a total length of 260.5 Mb and an N50 of 34.6 Mb, showing high contiguity and few gaps (~0.5%). In total, 21,483 protein-coding genes are annotated in this assembly, of which 20,665 (96.2%) obtained at least one functional assignment based on similarity to known proteins. Conclusions: Because Z. marina is an important marine angiosperm, the improved genome assembly will further assist evolutionary, ecological, and comparative genomics at the chromosome level. The new genome assembly will further our understanding of the structural and physiological adaptations from land to marine life.
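N50, quoted above for the v.3.1 assembly (34.6 Mb), is the standard contiguity summary in these abstracts. A minimal Python sketch of the usual definition follows, assuming the input is simply a list of contig or scaffold lengths; the function name `n50` and the toy lengths are hypothetical and not taken from any of these assemblies.

```python
from __future__ import annotations


def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the largest length L such that sequences of length >= L
    together contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    cumulative = 0
    # Walk from the longest sequence down, accumulating bases until
    # at least half of the total assembly length is covered.
    for length in sorted(lengths, reverse=True):
        cumulative += length
        if 2 * cumulative >= total:
            return length
    raise AssertionError("unreachable: the loop always reaches half the total")


if __name__ == "__main__":
    # Hypothetical toy assembly (lengths in kb), for illustration only.
    toy_lengths = [100, 80, 50, 30, 20, 10, 10]
    print(f"N50 = {n50(toy_lengths)} kb")
```

For the toy lengths the total is 300 kb, and the two longest sequences (100 + 80 = 180 kb) are the first to pass half of it, so the N50 is 80 kb. In the convention used by assembly-evaluation tools such as QUAST, L50 is the number of sequences needed to reach that half (2 in this toy example).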

The draft genome sequence of Eucalyptus polybractea based on hybrid assembly with short and long reads

2021. doi:10.1101/2021.05.18.444652

Author(s): Teng Li, David Kainer, William J Foley, Allen Rodrigo, Carsten Kuelheim

Keyword(s): Population Genomics, De Novo, Draft Genome, Hybrid Assembly, Illumina Hiseq, Protein Coding, Genome Coverage, Protein Coding Genes, Long Reads, Long Read

Eucalyptus polybractea is a small, multi-stemmed tree that is widely cultivated in Australia for the production of Eucalyptus oil. We report the hybrid assembly of the E. polybractea genome using both short- and long-read technology. We generated 44 Gb of Illumina HiSeq short reads and 8 Gb of Nanopore long reads, representing approximately 83× and 15× genome coverage, respectively. The hybrid-assembled genome, after polishing, contained 24,864 scaffolds with a total length of 523 Mb (N50 = 40.3 kb; BUSCO-calculated genome completeness of 94.3%). The genome contained 35,385 predicted protein-coding genes identified by combining homology-based and de novo approaches. We provide the first assembled genome based on hybrid sequencing from the highly diverse Eucalyptus subgenus Symphyomyrtus, and demonstrate the value of including Nanopore long reads for enhancing both the contiguity and the completeness of the assembled genome. We anticipate that the E. polybractea genome will be an invaluable resource supporting a range of studies in genetics, population genomics and evolution of related species in Eucalyptus.

Construction of a new chromosome-scale, long-read reference genome assembly of the Syrian hamster, Mesocricetus auratus

2021. doi:10.1101/2021.07.05.451071

Author(s): R. Alan Harris, Muthuswamy Raveendran, Dustin T Lyfoung, Fritz J Sedlazeck, Medhat Mahmoud, ...

Keyword(s): Genome Assembly, Syrian Hamster, Reference Genome, Sequence Data, Mesocricetus Auratus, Protein Coding, Protein Coding Genes, Sequencing Technologies, Long Read, Short Read Sequence

Background: The Syrian hamster (Mesocricetus auratus) has been suggested as a useful mammalian model for a variety of diseases and infections, including infection with respiratory viruses such as SARS-CoV-2. The MesAur1.0 genome assembly was published in 2013, based on whole-genome shotgun sequencing with short-read sequence data. More advanced sequencing technologies and assembly methods now permit the generation of near-complete genome assemblies of higher quality and contiguity. Findings: Here, we report an improved assembly of the M. auratus genome (BCM_Maur_2.0) that uses Oxford Nanopore Technologies long-read sequencing to produce a chromosome-scale assembly. The total length of the new assembly is 2.46 Gbp, similar to the 2.50 Gbp length of the previous assembly of this genome, MesAur1.0. BCM_Maur_2.0 exhibits significantly improved contiguity, with a scaffold N50 6.7 times greater than that of MesAur1.0. Furthermore, 21,616 protein-coding genes and 10,459 noncoding genes were annotated in BCM_Maur_2.0, compared with 20,495 protein-coding genes and 4,168 noncoding genes in MesAur1.0. The new assembly also reduces unresolved regions, as measured by nucleotide ambiguities: approximately 17.11% of bases in MesAur1.0 were unresolved, compared with 3.00% in BCM_Maur_2.0. Conclusions: Access to a more complete reference genome with improved accuracy and contiguity will facilitate more detailed, comprehensive, and meaningful research in a wide variety of future studies that use Syrian hamsters as models.

The draft chromosome-level genome assembly of tetraploid ground cherry (Prunus fruticosa Pall.) from long reads

2021. doi:10.1101/2021.06.01.446499

Author(s): Thomas W Woehner, Ofere Francis Emeriewen, Alexander Wittenberg, Harrie Schneiders, Ilse Vrijenhoek, ...

Keyword(s): Genome Sequence, Genome Assembly, Draft Genome, Sour Cherry, Evolutionary Analysis, Protein Coding, Draft Genome Assembly, Repeat Content, Long Read, Chromosome Level

Background: Cherries are stone fruits belonging to the economically important plant family Rosaceae, with different species cultivated worldwide. The ground cherry, Prunus fruticosa Pall., is one ancestor of cultivated sour cherry, an important tetraploid cherry species. Here, we present a long-read chromosome-level draft genome assembly and the associated organelle sequences, generated with the Oxford Nanopore Technologies PromethION platform and the R10.3 pore type. Finding: Final assemblies of the expected 1.2 Gb tetraploid (2n = 4x = 32) and 0.3 Gb haploid (1n = 8) genome sequences of P. fruticosa were calculated from 117.3 Gb of cleaned reads, representing 97× coverage. Contig N50 ranged between 0.3 and 0.5 Mb, with the longest contig being ~6 Mb. BUSCO completeness was estimated at 98.7% for the 4n dataset and 96.1% for the 1n dataset. Using a homology- and reference-based scaffolding method, we generated a final consensus genome sequence of 366 Mb comprising eight chromosomes. The scaffold N50 was ~44 Mb, with the longest chromosome being 66.5 Mb. The repeat content was estimated at ~190 Mb (52%), and 58,880 protein-coding genes were annotated. The chloroplast and mitochondrial genomes were 158,217 bp and 383,281 bp long, respectively, consistent with previously published organelle sequences. Conclusion: This is the first report of a ground cherry (P. fruticosa) genome sequenced using long-read technology only. The datasets obtained in this study provide a foundation for future breeding, molecular and evolutionary analyses in Prunus.
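The coverage and repeat figures in the P. fruticosa abstract follow from simple ratios: average sequencing depth is total sequenced bases divided by genome size, and repeat content is repeat length divided by assembly length. A minimal sketch, assuming both quantities in each ratio are in the same units and taking the expected 1.2 Gb tetraploid genome as the coverage denominator (our reading of the abstract's wording); the helper names are hypothetical.

```python
def mean_coverage(sequenced_bases: float, genome_size: float) -> float:
    """Average depth (x): total sequenced bases / genome size, in the same units."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return sequenced_bases / genome_size


def percent_of(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


if __name__ == "__main__":
    # Values quoted in the P. fruticosa abstract (Gb for reads/genome, Mb for repeats/assembly).
    cleaned_reads_gb, tetraploid_genome_gb = 117.3, 1.2
    repeats_mb, consensus_mb = 190.0, 366.0
    print(f"coverage ~ {mean_coverage(cleaned_reads_gb, tetraploid_genome_gb):.1f}x")
    print(f"repeat content ~ {percent_of(repeats_mb, consensus_mb):.1f}%")
```

The ratios come out at roughly 98× and 52%, in line with the ~97× and 52% reported; using the 0.3 Gb haploid size as the denominator would instead give about four times the depth.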

A chromosome-level genome assembly of the Chinese tupelo Nyssa sinensis

Scientific Data, 2019, Vol 6 (1). doi:10.1038/s41597-019-0296-y. Cited by ~1

Author(s): Xuchen Yang, Minghui Kang, Yanting Yang, Haifeng Xiong, Mingcheng Wang, ...

Keyword(s): Single Molecule, Genome Assembly, De Novo, Chromosome Conformation, Protein Coding, Single Molecule Sequencing, Data Matching, Long Reads, Autumn Leaf, Chromosome Level

Abstract: The deciduous Chinese tupelo (Nyssa sinensis Oliv.) is a popular ornamental tree valued for its spectacular autumn leaf color. Here, using single-molecule sequencing and chromosome conformation capture data, we report a high-quality, chromosome-level genome assembly of N. sinensis. PacBio long reads were de novo assembled into 647 polished contigs with a total length of 1,001.42 megabases (Mb) and an N50 of 3.62 Mb, in line with the genome sizes estimated by flow cytometry and k-mer analysis. These contigs were further clustered and ordered into 22 pseudo-chromosomes based on Hi-C data, matching the chromosome count in Nyssa obtained from previous cytological studies. In addition, a total of 664.91 Mb of repetitive elements were identified and 37,884 protein-coding genes were predicted in the genome of N. sinensis. All data were deposited in publicly available repositories and should be a valuable resource for genomics, evolution, and conservation biology.

Chromosome-level genome assembly of golden pompano (Trachinotus ovatus) in the family Carangidae

Scientific Data ◽

10.1038/s41597-019-0238-8 ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 7

Author(s):

Dian-Chang Zhang ◽

Liang Guo ◽

Hua-Yang Guo ◽

Ke-Cheng Zhu ◽

Shang-Qi Li ◽

...

Keyword(s):

Comparative Analysis ◽

Genome Assembly ◽

Protein Coding ◽

Trachinotus Ovatus ◽

Golden Pompano ◽

Protein Coding Genes ◽

Wide Geographical Distribution ◽

The Family ◽

Conserved Genes ◽

Chromosome Level

Abstract: Golden pompano (Trachinotus ovatus), a marine fish in the family Carangidae, has a wide geographical distribution and adapts to harsh environmental conditions. It is also an economically valuable aquaculture fish. To understand the genetic mechanisms of adaptation to harsh environments and to improve aquaculture production, we assembled its genome. By combining Illumina and PacBio reads, we obtained a 647.5 Mb genome sequence with a contig N50 of 1.80 Mb and a scaffold N50 of 5.05 Mb. The assembly covers 98.9% of the estimated genome size (655 Mb). Based on Hi-C data, 99.4% of the assembled bases are anchored into 24 pseudo-chromosomes. The annotation includes 21,915 protein-coding genes, and 95.7% of 2,586 conserved vertebrate BUSCO genes are complete. This genome is expected to contribute to comparative analyses of the family Carangidae.

Chromosome-level genome assembly of the African pike, Hepsetus odoe

2020. doi:10.1101/2020.05.13.094987

Author(s): Xiao Du, Xiaoning Hong, Guangyi Fan, Xiaoyun Huang, Shuai Sun, ...

Keyword(s): Genome Assembly, Freshwater Teleost, Protein Coding, Protein Coding Genes, Sequencing Technologies, Oxford Nanopore, Evolutionary Studies, Long Fragment Read, Representative Member, Chromosome Level

Abstract: The order Characiformes, found exclusively in South America and Africa, is one of the largest components of the freshwater teleost fauna and is of great ecological and economic significance. Yet, quite limited genomic resources are available to study this group and its transatlantic vicariance. In this study we present a chromosome-level genome assembly of the African pike (Hepsetus odoe), a representative member of the African Characiformes. To this end, we generated 119, 11, and 67 Gb of reads using single tube long fragment read (stLFR), Oxford Nanopore, and Hi-C sequencing technologies, respectively. We obtained an 862.1 Mb genome assembly with contig and scaffold N50s of 347.4 kb and 25.8 Mb, respectively. Hi-C scaffolding anchored 742.5 Mb, representing 86.1% of the genome, onto 29 chromosomes. In total, 24,314 protein-coding genes were predicted, of which 23,999 (98.7%) were functionally annotated. The chromosome-scale genome assembly will be useful for functional and evolutionary studies of the African pike and will promote the study of Characiformes speciation and evolution.
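The percentages in the H. odoe abstract can be re-derived from the raw figures it reports. A small sketch that recomputes them and compares each against the stated value, rounding to one decimal place as the abstract does; the helper names `percent` and `check_percentages` are ours, not from the paper.

```python
from __future__ import annotations

import math


def percent(part: float, whole: float, digits: int = 1) -> float:
    """Return part/whole as a percentage rounded to `digits` decimal places."""
    if whole == 0:
        raise ValueError("whole must be non-zero")
    return round(100.0 * part / whole, digits)


def check_percentages(checks: dict[str, tuple[float, float, float]]) -> dict[str, bool]:
    """Map each label to whether the recomputed percentage matches the stated one."""
    return {
        label: math.isclose(percent(part, whole), stated)
        for label, (part, whole, stated) in checks.items()
    }


# (part, whole, stated %) taken from the H. odoe abstract above.
HEPSETUS_CHECKS = {
    "assembly anchored to 29 chromosomes (Mb)": (742.5, 862.1, 86.1),
    "predicted genes functionally annotated": (23_999, 24_314, 98.7),
}

if __name__ == "__main__":
    for label, ok in check_percentages(HEPSETUS_CHECKS).items():
        print(f"{label}: {'consistent' if ok else 'inconsistent'}")
```

Both reported percentages agree with the underlying counts. The same check applies directly to other abstracts in this listing, for example the 98.9% of the estimated 655 Mb genome covered by the 647.5 Mb golden pompano assembly.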

High-Quality de novo Chromosome-Level Genome Assembly of a Single Bombyx mori With BmNPV Resistance by a Combination of PacBio Long-Read Sequencing, Illumina Short-Read Sequencing, and Hi-C Sequencing

Frontiers in Genetics, 2021, Vol 12. doi:10.3389/fgene.2021.718266

Author(s): Min Tang, Suqun He, Xun Gong, Peng Lü, Rehab H. Taha, ...

Keyword(s): Bombyx Mori, Genome Assembly, De Novo, High Quality, Single Strain, Short Read, Short Read Sequencing, Long Read, Reference Genomes, Chromosome Level

The reference genomes of Bombyx mori (B. mori), the Silkworm Knowledge-based database (SilkDB) and SilkBase, have served as the gold standard for nearly two decades. They have fundamentally shaped research on this model organism and accelerated related studies of Lepidoptera. However, the current B. mori reference genomes do not accurately represent the full set of genes of any single strain. As new genome-wide sequencing technologies have emerged and the cost of high-throughput sequencing has fallen, it is now possible for standard laboratories to perform full-genome assembly of specific strains. Here we present a high-quality de novo chromosome-level genome assembly of a single B. mori individual with resistance to nuclear polyhedrosis virus (BmNPV), obtained by integrating PacBio long-read sequencing, Illumina short-read sequencing, and Hi-C sequencing. In addition, standard bioinformatics analyses, such as gene family, phylogenetic, and divergence analyses, were performed. The sample came from our unique B. mori strain (NB), which has strong inborn resistance to BmNPV. Our genome assembly showed good collinearity with SilkDB and SilkBase and particular regions. To the best of our knowledge, this is the first genome assembly of a BmNPV-resistant silkworm, and it should provide a more accurate insect model for resistance studies.

Draft genome of a porcupinefish, Diodon holocanthus

2019. doi:10.1101/775387

Author(s): Mengyang Xu, Xiaoshan Su, Mengqi Zhang, Ming Li, Xiaoyun Huang, ...

Keyword(s): Genome Assembly, De Novo, Repetitive Sequences, Draft Genome, Single Copy, Single Individual, Protein Coding, Long Read, Phylogeny And Evolution, Downstream Analysis

Abstract: The long-spine porcupinefish, Diodon holocanthus (Diodontidae, Tetraodontiformes, Actinopterygii), also known as the freckled porcupinefish, is of great ecological and economic interest. However, its distinctive characteristics, including the inflation reaction, spiny skin and tetrodotoxin, have not been fully studied in the absence of a complete genome assembly. In this study, the whole genome of a single individual was sequenced using single tube Long Fragment Read co-barcode reads, generating 154.3 Gb of paired-end data (219.8× depth). Gaps were further filled using a small Oxford Nanopore MinION long-read dataset (11.4 Gb, 15.9× depth). Making full use of long-, medium- and short-range genome assembly information, the final assembly, with a total length of 650.02 Mb, achieved contig and scaffold N50 sizes of 2.15 Mb and 8.13 Mb, respectively, despite its high repeat content. Benchmarking Universal Single-Copy Orthologs (BUSCO) captured 95.7% (2,474) of core genes, indicating high completeness. In addition, 206.5 Mb (32.10%) of repetitive sequences were identified, and 20,840 protein-coding genes were annotated, of which 18,281 (87.72%) were assigned putative functions. This is the first de novo genome assembly of the porcupinefish, and it will benefit downstream analyses of ontogeny, phylogeny and evolution and improve the exploration of its unique defensive mechanism.
