A chromosome-level reference genome of Ensete glaucum gives insight into diversity, chromosomal and repetitive sequence evolution in the Musaceae

Background: Ensete glaucum (2n = 2x = 18) is a giant herbaceous monocotyledonous plant in the small Musaceae family along with banana (Musa). A high-quality reference genome sequence of E. glaucum offers a vital genomic resource for functional and evolutionary studies of Ensete, the Musaceae, and more widely in the Zingiberales. Findings: Using a combination of Illumina and Oxford Nanopore Technologies (ONT) sequencing, genome-wide chromosome conformation capture (Hi-C), and RNA survey sequence, we report a high-quality assembly of the 481.5 Mb genome with 9 pseudochromosomes and 36,836 genes (BUSCO 94.7%). A total of 55% of the genome is composed of repetitive sequences, with LTR-retroelements (37%) and DNA transposons (7%) predominant. The 5S and 45S rDNA were each present at one locus, and the 5S rDNA had an exceptionally long monomer length of c. 1,056 bp, contrasting with the c. 450 bp monomer at multiple loci in Musa. A tandemly repeated c. 134 bp satellite, 1.1% of the genome (with no similar sequence in Musa), was present around all nine centromeres, with a LINE retroelement also found at Musa centromeres. The assembly, including centromeric positions, enabled us to characterize in detail the chromosomal rearrangements occurring between the x = 9 species and x = 11 species of Musa. Only one chromosome has the same gene content as M. acuminata (ma). Three ma chromosomes represent part of only one E. glaucum (eg) chromosome, while the remaining seven ma chromosomes are fusions of parts of two, three, or four eg chromosomes, demonstrating complex and multiple evolutionary rearrangements in the change between x = 9 and x = 11. Conclusions: The advance towards a Musaceae pangenome including E. glaucum, tolerant of extreme environments, makes a complete set of gene alleles available for crop breeding and understanding environmental responses. The chromosome-scale genome assembly shows how chromosome number evolves and reveals features of the rapid evolution of repetitive sequences.

A high-quality, chromosome-level genome assembly of the Black Soldier Fly (Hermetia illucens L.)

G3 Genes|Genome|Genetics

10.1093/g3journal/jkab085

2021

Author(s): Tomas N Generalovic, Shane A McCarthy, Ian A Warren, Jonathan M D Wood, James Torrance, ...

Keyword(s): Genome Assembly, Animal Feed, Repetitive Sequences, Genomic Variation, Runs Of Homozygosity, High Quality, Black Soldier Fly, Hermetia Illucens, Chromosome Conformation, Important Species

Abstract Hermetia illucens L. (Diptera: Stratiomyidae), the Black Soldier Fly (BSF), is an increasingly important species for bioconversion of organic material into animal feed. We generated a high-quality chromosome-scale genome assembly of the BSF using Pacific Biosciences, 10X Genomics linked-read and high-throughput chromosome conformation capture sequencing technology. Scaffolding the final assembly with Hi-C data produced a highly contiguous 1.01 Gb genome with 99.75% of scaffolds assembled into pseudochromosomes representing seven chromosomes, with 16.01 Mb contig and 180.46 Mb scaffold N50 values. The highly complete genome obtained a BUSCO completeness of 98.6%. We masked 67.32% of the genome as repetitive sequences and annotated a total of 16,478 protein-coding genes using the BRAKER2 pipeline. We analysed an established lab population to investigate the genomic variation and architecture of the BSF, revealing six autosomes and an X chromosome. Additionally, we estimated the inbreeding coefficient (1.9%) of a lab population by assessing runs of homozygosity. This provided evidence for inbreeding events, including long runs of homozygosity on chromosome five. Release of this novel chromosome-scale BSF genome assembly will provide an improved resource for further genomic studies, functional characterisation of genes of interest and genetic modification of this economically important species.
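The contig and scaffold N50 values quoted in these abstracts follow the usual definition: the length L such that sequences of length at least L together cover at least half of the total assembly. The following is a minimal Python sketch of that calculation. The helper name `n50` and the toy lengths are illustrative assumptions and are not taken from any of the listed papers or their pipelines.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Hypothetical helper for illustration: sort lengths from longest to
    shortest and return the length at which the running total first
    reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (arbitrary units), not real assembly data.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

Because the statistic depends only on the length distribution, the same function applies to contigs and to scaffolds; scaffolding with Hi-C data typically raises the scaffold N50 without changing the contig N50.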

Draft genome of the Reindeer (Rangifer tarandus)

10.1101/154724

2017

Author(s): Zhipeng Li, Zeshan Lin, Lei Chen, Hengxing Ba, Yongzhi Yang, ...

Keyword(s): Rangifer Tarandus, Reference Genome, De Novo, Bos Taurus, Repetitive Sequences, Draft Genome, Divergence Time, Capra Hircus, High Quality, Illumina Hiseq

Abstract Background: Reindeer (Rangifer tarandus) is the only fully domesticated species in the Cervidae family, and is the only cervid with a circumpolar distribution. Unlike in all other cervids, female reindeer regularly grow cranial appendages (antlers, the defining characteristic of cervids), as do males. Moreover, reindeer milk contains more protein and less lactose than bovids’ milk. A high-quality reference genome of this species will assist efforts to elucidate these and other important features in the reindeer. Findings: We obtained 723.2 Gb (gigabases) of raw reads from an Illumina Hiseq 4000 platform, and a 2.64 Gb final assembly, representing 95.7% of the estimated genome (2.76 Gb according to k-mer analysis) and including 92.6% of expected genes according to BUSCO analysis. The contig N50 and scaffold N50 sizes were 89.7 kilobases (kb) and 0.94 megabases (Mb), respectively. We annotated 21,555 protein-coding genes and 1.07 Gb of repetitive sequences by de novo and homology-based prediction. Homology-based searches detected 159 rRNA, 547 miRNA, 1,339 snRNA and 863 tRNA sequences in the genome of R. tarandus. The divergence time between R. tarandus and the ancestors of Bos taurus and Capra hircus is estimated to be 29.55 million years ago (Mya). Conclusions: Our results provide the first high-quality reference genome for the reindeer, and a valuable resource for studying evolution, domestication and other unusual characteristics of the reindeer.
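The percentage in the reindeer abstract can be checked directly from the two sizes it reports. The short Python sketch below is only an arithmetic illustration; `percent_of` is a hypothetical helper. The repeat share printed at the end is derived here from the reported 1.07 Gb of repeats and the 2.64 Gb assembly, and is not a figure stated in the abstract.

```python
def percent_of(part, whole):
    """Return `part` as a percentage of `whole` (hypothetical helper)."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Sizes in Gb, as reported in the reindeer abstract.
assembly_gb = 2.64
estimated_genome_gb = 2.76
repeats_gb = 1.07

# Share of the k-mer estimated genome captured by the assembly.
print(round(percent_of(assembly_gb, estimated_genome_gb), 1))

# Derived value (an assumption of this sketch, not stated in the abstract):
# share of the assembly annotated as repetitive sequence.
print(round(percent_of(repeats_gb, assembly_gb), 1))
```

The first value reproduces the 95.7% quoted in the abstract.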

Assembly of chromosome-scale contigs by efficiently resolving repetitive sequences with long reads

10.1101/345983

2018

Cited By ~ 2

Author(s): Huilong Du, Chengzhi Liang

Keyword(s): Single Molecule, High Efficiency, Reference Genome, Repetitive Sequences, Sequencing Data, High Quality, Single Molecule Sequencing, Genome Maps, Long Reads, Novel Method

Abstract Due to the large number of repetitive sequences in complex eukaryotic genomes, fragmented and incompletely assembled genomes lose value as reference sequences, often due to short contigs that cannot be anchored or are mispositioned onto chromosomes. Here we report a novel method, Highly Efficient Repeat Assembly (HERA), which includes a new concept called a connection graph as well as algorithms for constructing the graph. HERA resolves repeats at high efficiency with single-molecule sequencing data, and enables the assembly of chromosome-scale contigs by further integrating genome maps and Hi-C data. We tested HERA with the genomes of rice R498, maize B73, human HX1 and Tartary buckwheat Pinku1. HERA can correctly assemble most of the tandemly repetitive sequences in rice using single-molecule sequencing data only. Using the same maize and human sequencing data published by Jiao et al. (2017) and Shi et al. (2016), respectively, we dramatically improved on the sequence contiguity compared with the published assemblies, increasing the contig N50 from 1.3 Mb to 61.2 Mb in the maize B73 assembly and from 8.3 Mb to 54.4 Mb in the human HX1 assembly with HERA. We provided a high-quality maize reference genome with 96.9% of the gaps filled (only 76 gaps left) and several incorrectly positioned sequences fixed compared with the B73 RefGen_v4 assembly. Comparisons between the HERA assembly of HX1 and the human GRCh38 reference genome showed that many gaps in GRCh38 could be filled, and that GRCh38 contained some potential errors that could be fixed. We assembled the Pinku1 genome into 12 scaffolds with a contig N50 size of 27.85 Mb. HERA serves as a new genome assembly/phasing method to generate high-quality sequences for complex genomes and as a curation tool to improve the contiguity and completeness of existing reference genomes, including the correction of assembly errors in repetitive regions.

Chromosomal-level reference genome of Chinese peacock butterfly (Papilio bianor) based on third-generation DNA sequencing and Hi-C analysis

GigaScience

10.1093/gigascience/giz128

2019

Vol 8 (11)

Cited By ~ 4

Author(s): Sihan Lu, Jie Yang, Xuelei Dai, Feiang Xie, Jinwu He, ...

Keyword(s): Reference Genome, De Novo, Demographic History, Repetitive Sequences, Population Expansion, Chromatin Interaction, Interglacial Period, Last Interglacial, High Quality, Chromosome Level

Abstract Background: Papilio bianor Cramer, 1777 (commonly known as the Chinese peacock butterfly) (Insecta, Lepidoptera, Papilionidae) is a widely distributed swallowtail butterfly with a large number of geographic populations ranging from the southeast of Russia to China, Japan, India, Vietnam, Myanmar, and Thailand. Its wing color consists of both pigmentary colored scales (black, reddish) and structural colored scales (iridescent blue or green dust). A high-quality reference genome of P. bianor is an important foundation for investigating iridescent color evolution, phylogeography, and the evolution of swallowtail butterflies. Findings: We obtained a chromosome-level de novo genome assembly of the highly heterozygous P. bianor using long Pacific Biosciences sequencing reads and high-throughput chromosome conformation capture technology. The final assembly is 421.52 Mb on 30 chromosomes (29 autosomes and 1 Z sex chromosome) with 13.12 Mb scaffold N50. In total, 15,375 protein-coding genes and 233.09 Mb of repetitive sequences were identified. Phylogenetic analyses indicated that P. bianor separated from a common ancestor of swallowtails ∼23.69–36.04 million years ago. Demographic history suggested that the population expansion of this species from the last interglacial period to the last glacial maximum possibly resulted from a decrease in its natural enemies and its adaptation to climate change during the glacial period. Conclusions: We present a high-quality chromosome-level reference genome of P. bianor using long-read single-molecule sequencing and Hi-C–based chromatin interaction maps. Our results lay the foundation for exploring the genetic basis of special biological features of P. bianor and also provide a useful data source for comparative genomics and phylogenomics among butterflies and moths.

A high-quality chromosome-level genome assembly reveals genetics for important traits in eggplant

Horticulture Research

10.1038/s41438-020-00391-0

2020

Vol 7 (1)

Author(s): Qingzhen Wei, Jinglei Wang, Wuhong Wang, Tianhua Hu, Haijiao Hu, ...

Keyword(s): Genome Assembly, Reference Genome, Repetitive Sequences, Gene Families, Specific Gene, High Quality, Total Size, Protein Coding, Fruit Length, Protein Coding Genes

Abstract Eggplant (Solanum melongena L.) is an economically important vegetable crop in the Solanaceae family, with extensive diversity among landraces and close relatives. Here, we report a high-quality reference genome for the eggplant inbred line HQ-1315 (S. melongena-HQ) using a combination of Illumina, Nanopore and 10X Genomics sequencing technologies and Hi-C technology for genome assembly. The assembled genome has a total size of ~1.17 Gb and 12 chromosomes, with a contig N50 of 5.26 Mb, and contains 36,582 protein-coding genes. Repetitive sequences comprise 70.09% (811.14 Mb) of the eggplant genome, most of which are long terminal repeat (LTR) retrotransposons (65.80%), followed by long interspersed nuclear elements (LINEs, 1.54%) and DNA transposons (0.85%). The S. melongena-HQ eggplant genome carries a total of 563 accession-specific gene families containing 1009 genes. In total, 73 expanded gene families (892 genes) and 34 contracted gene families (114 genes) were functionally annotated. Comparative analysis of different eggplant genomes identified three types of variations, including single-nucleotide polymorphisms (SNPs), insertions/deletions (indels) and structural variants (SVs). Asymmetric SV accumulation was found in potential regulatory regions of protein-coding genes among the different eggplant genomes. Furthermore, we performed QTL-seq for eggplant fruit length using the S. melongena-HQ reference genome and detected a QTL interval of 71.29–78.26 Mb on chromosome E03. The gene Smechr0301963, which belongs to the SUN gene family, is predicted to be a key candidate gene for eggplant fruit length regulation. Moreover, we anchored a total of 210 linkage markers associated with 71 traits to the eggplant chromosomes and finally obtained 26 QTL hotspots. The eggplant HQ-1315 genome assembly can be accessed at http://eggplant-hq.cn.
In conclusion, the eggplant genome presented herein provides a global view of genomic divergence at the whole-genome level and powerful tools for the identification of candidate genes for important traits in eggplant.

A high-quality Actinidia chinensis (kiwifruit) genome

Horticulture Research

10.1038/s41438-019-0202-y

2019

Vol 6 (1)

Cited By ~ 11

Author(s): Haolin Wu, Tao Ma, Minghui Kang, Fandi Ai, Junlin Zhang, ...

Keyword(s): Reference Genome, Gene Annotation, Repetitive Sequences, Economic Value, Actinidia Chinensis, Long Terminal Repeats, High Quality, Horticultural Crop, Crop Species, Protein Coding

Abstract Actinidia chinensis (kiwifruit) is a perennial horticultural crop species of the Actinidiaceae family with high nutritional and economic value. Two versions of the A. chinensis genome have been previously assembled, based mainly on relatively short reads. Here, we report an improved chromosome-level reference genome of A. chinensis (v3.0), based mainly on PacBio long reads and Hi-C data. The high-quality assembled genome is 653 Mb long, with 0.76% heterozygosity. At least 43% of the genome consists of repetitive sequences, and the most abundant class, long terminal repeats, was further identified and accounts for 23.38% of our novel genome. It has clear improvements in contiguity, accuracy, and gene annotation over the two previous versions and contains 40,464 annotated protein-coding genes, of which 94.41% are functionally annotated. Moreover, further analyses of genetic collinearity revealed that the kiwifruit genome has undergone two whole-genome duplications: one affecting all Ericales families near the K-T extinction event and a recent genus-specific duplication. The reference genome presented here will be highly useful for further molecular elucidation of diverse traits and for the breeding of this horticultural crop, as well as evolutionary studies with related taxa.

Genomic Analysis of Mycobacterium tuberculosis Isolates and Construction of a Beijing Lineage Reference Genome

Genome Biology and Evolution

10.1093/gbe/evaa009

2020

Vol 12 (2)

pp. 3890-3905

Cited By ~ 1

Author(s): Woei-Fuh Wang, Mei-Yeh Jade Lu, Ting-Jen Rachel Cheng, Yi-Ching Tang, Yu-Chuan Teng, ...

Keyword(s): Mycobacterium Tuberculosis, Reference Genome, Rapid Evolution, Genomic Analysis, Sequencing Data, High Quality, Whole Genome Analysis, Genetic Changes, Ngs Data, Gains And Losses

Abstract Tuberculosis (TB), an infectious disease caused by Mycobacterium tuberculosis, kills over 1 million people worldwide annually. Development of drug resistance (DR) in the pathogen is a major challenge for TB control. We conducted whole-genome analysis of seven Taiwan M. tuberculosis isolates: one drug-susceptible (DS) and five DR Beijing lineage isolates and one DR Euro-American lineage isolate. Developing a new method for DR mutation identification and applying it to the next-generation sequencing (NGS) data from the six Beijing lineage isolates, we identified 13 known and 6 candidate DR mutations and provided experimental support for 4 of them. We assembled the genomes of one DS and two DR Beijing lineage isolates and the Euro-American lineage isolate using NGS data. Moreover, using both PacBio and NGS sequencing data, we obtained a high-quality assembly of an extensively DR Beijing lineage isolate. Comparative analysis of these five newly assembled genomes and two published complete genomes revealed a large number of genetic changes, including gene gains and losses, indels and translocations, suggesting rapid evolution of M. tuberculosis. We found the MazEF toxin–antitoxin system in all seven isolates studied and several interesting mutations in MazEF proteins. Finally, we used the four assembled Beijing lineage genomes to construct a high-quality Beijing lineage reference genome that is DS and contains all the genes in the four genomes. It contains 212 genes not found in the standard reference H37Rv, which is Euro-American. It is therefore a better reference than H37Rv for the Beijing lineage, the predominant lineage in Asia.

Chromosome-level assembly of the mustache toad genome using third-generation DNA sequencing and Hi-C analysis

GigaScience

10.1093/gigascience/giz114

2019

Vol 8 (9)

Cited By ~ 7

Author(s): Yongxin Li, Yandong Ren, Dongru Zhang, Hui Jiang, Zhongkai Wang, ...

Keyword(s): Breeding Season, Reference Genome, Gene Families, Sequencing Data, High Quality, Chromosome Conformation, Functional Studies, Sequencing Technologies, A Genome, Chromosome Level

Abstract Background: The mustache toad, Vibrissaphora ailaonica, is endemic to China and belongs to the Megophryidae family. Like other mustache toad species, V. ailaonica males temporarily develop keratinized nuptial spines on their upper jaw during each breeding season, which fall off at the end of the breeding season. This feature is likely the result of the reversal of sexual dimorphism in body size, with males being larger than females. A high-quality reference genome for the mustache toad would be invaluable to investigate the genetic mechanism underlying these repeatedly developing keratinized spines. Findings: To construct the mustache toad genome, we generated 225 Gb of short reads and 277 Gb of long reads using Illumina and Pacific Biosciences (PacBio) sequencing technologies, respectively. Sequencing data were assembled into a 3.53-Gb genome assembly, with a contig N50 length of 821 kb. We also used high-throughput chromosome conformation capture (Hi-C) technology to identify contacts between contigs, then assembled contigs into scaffolds and assembled a genome with 13 chromosomes and a scaffold N50 length of 412.42 Mb. Based on the 26,227 protein-coding genes annotated in the genome, we analyzed phylogenetic relationships between the mustache toad and other chordate species. The mustache toad has a relatively higher evolutionary rate and separated from a common ancestor of the marine toad, bullfrog, and Tibetan frog 206.1 million years ago. Furthermore, we identified 201 expanded gene families in the mustache toad, which were mainly enriched in immune pathway, keratin filament, and metabolic processes. Conclusions: Using Illumina, PacBio, and Hi-C technologies, we constructed the first high-quality chromosome-level mustache toad genome. This work not only offers a valuable reference genome for functional studies of mustache toad traits but also provides important chromosomal information for wider genome comparisons.

Chromosome-level reference genome of the European wasp spider Argiope bruennichi: a resource for studies on range expansion and evolutionary adaptation

GigaScience

10.1093/gigascience/giaa148

2021

Vol 10 (1)

Author(s): Monica M Sheffer, Anica Hoppe, Henrik Krehenwinkel, Gabriele Uhl, Andreas W Kuss, ...

Keyword(s): Genome Assembly, Range Expansion, Reference Genome, De Novo, Sequencing Data, High Quality, Proximity Ligation, Genomic Resource, Paired End Sequencing, Chromosome Level

Abstract Background: Argiope bruennichi, the European wasp spider, has been investigated intensively as a focal species for studies on sexual selection, chemical communication, and the dynamics of rapid range expansion at a behavioral and genetic level. However, the lack of a reference genome has limited insights into the genetic basis for these phenomena. Therefore, we assembled a high-quality chromosome-level reference genome of the European wasp spider as a tool for more in-depth future studies. Findings: We generated, de novo, a 1.67 Gb genome assembly of A. bruennichi using 21.8× Pacific Biosciences sequencing, polished with 19.8× Illumina paired-end sequencing data, and proximity ligation (Hi-C)-based scaffolding. This resulted in an N50 scaffold size of 124 Mb and an N50 contig size of 288 kb. We found 98.4% of the genome to be contained in 13 scaffolds, fitting the expected number of chromosomes (n = 13). Analyses showed the presence of 91.1% of complete arthropod BUSCOs, indicating a high-quality assembly. Conclusions: We present the first chromosome-level genome assembly in the order Araneae. With this genomic resource, we open the door for more precise and informative studies on evolution and adaptation not only in A. bruennichi but also in arachnids overall, shedding light on questions such as the genomic architecture of traits, whole-genome duplication, and the genomic mechanisms behind silk and venom evolution.

Genome Assembly of Salicaceae Populus deltoides (Eastern Cottonwood) I-69 Based on Nanopore Sequencing and Hi-C Technologies

Journal of Heredity

10.1093/jhered/esab010

2021

Author(s): Shengjun Bai, Hainan Wu, Jinpeng Zhang, Zhiliang Pan, Wei Zhao, ...

Keyword(s): Genome Assembly, Populus Deltoides, Wood Quality, Repetitive Sequences, Nanopore Sequencing, Chromosome Conformation, Protein Coding, Eastern Cottonwood, Genomic Resource, Chromosome Level

Abstract Populus deltoides has important ecological and economic value and is widely used in poplar breeding programs due to its superior characteristics, such as rapid growth and resistance to disease. Although the genome sequence of P. deltoides WV94 is available, the assembly is fragmented. Here, we report an improved chromosome-level assembly of the P. deltoides cultivar I-69 obtained by combining Nanopore sequencing and chromosome conformation capture (Hi-C) technologies. The assembly was 429.3 Mb in size and contained 657 contigs with a contig N50 length of 2.62 Mb. Hi-C scaffolding of the contigs generated 19 chromosome-level sequences, which covered 97.4% (418 Mb) of the total assembly size. Moreover, repetitive sequence annotation showed that 39.28% of the P. deltoides genome was composed of interspersed elements, including retroelements (23.66%), DNA transposons (6.83%), and unclassified elements (8.79%). We also identified a total of 44 362 protein-coding genes in the current P. deltoides assembly. Compared with the previous genome assembly of P. deltoides WV94, the current assembly was significantly improved in quality: the contig N50 increased 3.5-fold and the proportion of gaps decreased from 3.2% to 0.08%. This high-quality, well-annotated genome assembly provides a reliable genomic resource for identifying genome variants among individuals, mining candidate genes that control growth and wood quality traits, and facilitating further application of genomics-assisted breeding in populations related to P. deltoides.
