First annotated genome of a mandibulate moth, Neomicropteryx cornuta, generated using PacBio HiFi sequencing

Abstract We provide a new, annotated genome assembly of Neomicropteryx cornuta, a species of the so-called “mandibulate archaic moths” (Lepidoptera: Micropterigidae). These moths belong to a lineage that is thought to have split from all other Lepidoptera more than 300 million years ago and are consequently vital to understanding the early evolution of the superorder Amphiesmenoptera, which contains the order Lepidoptera (butterflies and moths) and its sister order Trichoptera (caddisflies). Using PacBio HiFi sequencing reads, we assembled a highly contiguous genome with a contig N50 of nearly 17 Mbp. The assembled genome length of 541,115,538 bp is about half the length of the largest published Amphiesmenoptera genome (Limnephilus lunatus, Trichoptera) and double the length of the smallest (Papilio polytes, Lepidoptera). We find high recovery of universal single-copy orthologs, with 98.1% of BUSCO genes present, and provide a genome annotation of 15,643 genes aided by resolved isoforms from PacBio IsoSeq data. This high-quality genome assembly provides an important resource for studying ecological and evolutionary transitions in the early evolution of Amphiesmenoptera.
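The contig N50 quoted above is a length-weighted median: the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the assembly. The following is a minimal sketch of that calculation, not the authors' pipeline. The function name `n50` and the toy contig lengths are illustrative assumptions and do not come from the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy contig lengths (arbitrary units), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70: 80 + 70 = 150, half of the total 300
```

Because N50 depends only on the sorted lengths, the same function applies to contigs or scaffolds; the two values differ only in which set of sequences is passed in.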

Genome Assembly of the Dogface Butterfly Zerene cesonia

Genome Biology and Evolution ◽

10.1093/gbe/evz254 ◽

2019 ◽

Vol 12 (1) ◽

pp. 3580-3585 ◽

Cited By ~ 2

Author(s):

Luis Rodriguez-Caro ◽

Jennifer Fenner ◽

Caleb Benson ◽

Steven M Van Belleghem ◽

Brian A Counterman

Keyword(s):

Genome Assembly ◽

Developmental Plasticity ◽

Hybrid Approach ◽

Single Copy ◽

Z Chromosome ◽

High Quality ◽

Protein Coding ◽

Genomic Change ◽

A Genome ◽

Genomic Studies

Abstract Comparisons of high-quality reference butterfly and moth genomes have been instrumental to advancing our understanding of how hybridization and natural selection drive genomic change during the origin of new species and novel traits. Here, we present a genome assembly of the Southern Dogface butterfly, Zerene cesonia (Pieridae), whose brilliant wing colorations have been implicated in developmental plasticity, hybridization, sexual selection, and speciation. We assembled 266,407,278 bp of the Z. cesonia genome, which accounts for 98.3% of the estimated 271 Mb genome size. The final haploid genome was assembled using a hybrid approach involving Chicago libraries with Hi-Rise assembly and a diploid Meraculous assembly. In the final assembly, nearly all autosomes and the Z chromosome were assembled into single scaffolds. The largest 29 scaffolds accounted for 91.4% of the genome assembly, with the remaining ∼8% distributed among another 247 scaffolds, and the overall N50 was 9.2 Mb. Annotations informed by tissue-specific RNA-seq identified 16,442 protein-coding genes, which included 93.2% of the arthropod Benchmarking Universal Single-Copy Orthologs (BUSCO). About 9% of the Z. cesonia genome assembly was identified as repetitive elements, with a transposable element landscape rich in helitrons. Similar to other Lepidoptera genomes, Z. cesonia showed high conservation of chromosomal synteny. The Z. cesonia assembly provides a high-quality reference for studies of chromosomal arrangements in the Pierid family, as well as for population, phylo, and functional genomic studies of adaptation and speciation.
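The assembled fraction and scaffold count in this abstract can be checked with simple arithmetic. The sketch below assumes that "271 Mb" means 271,000,000 bp (1 Mb = 10^6 bp). The variable names are illustrative and not taken from the paper.

```python
# Figures quoted in the abstract.
assembled_bp = 266_407_278
estimated_bp = 271_000_000  # assumption: "271 Mb" read as 271 x 10^6 bp

# Share of the estimated genome size captured by the assembly.
fraction = assembled_bp / estimated_bp
print(f"{fraction:.1%}")  # prints 98.3%

# Total scaffolds: the 29 largest plus the 247 remaining ones.
largest_scaffolds = 29
remaining_scaffolds = 247
total_scaffolds = largest_scaffolds + remaining_scaffolds
print(total_scaffolds)  # prints 276
```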

A high-quality genome assembly for the endangered golden snub-nosed monkey (Rhinopithecus roxellana)

GigaScience ◽

10.1093/gigascience/giz098 ◽

2019 ◽

Vol 8 (8) ◽

Cited By ~ 5

Author(s):

Lu Wang ◽

Jinwei Wu ◽

Xiaomei Liu ◽

Dandan Di ◽

Yuhong Liang ◽

...

Keyword(s):

Single Molecule ◽

Genome Assembly ◽

Gene Families ◽

Rhinopithecus Roxellana ◽

High Quality ◽

Chromosome Conformation ◽

Protein Coding ◽

A Genome ◽

Close Relationship ◽

High Quality Genome

Abstract Background: The golden snub-nosed monkey (Rhinopithecus roxellana) is an endangered colobine species endemic to China, which has several distinct traits including a unique social structure. Although a genome assembly for R. roxellana is available, it is incomplete and fragmented because it was constructed using short-read sequencing technology. Thus, important information such as genome structural variation and repeat sequences may be absent. Findings: To obtain a high-quality chromosomal assembly for R. roxellana qinlingensis, we used 5 methods: Pacific Biosciences single-molecule real-time sequencing, Illumina paired-end sequencing, BioNano optical maps, 10X Genomics linked-reads, and high-throughput chromosome conformation capture. The assembled genome was ∼3.04 Gb, with a contig N50 of 5.72 Mb and a scaffold N50 of 144.56 Mb. This represented a 100-fold improvement over the previously published genome. In the new genome, 22,497 protein-coding genes were predicted, of which 22,053 were functionally annotated. Gene family analysis showed that 993 and 2,745 gene families were expanded and contracted, respectively. The reconstructed phylogeny recovered a close relationship between R. roxellana and Macaca mulatta, and these 2 species diverged ∼13.4 million years ago. Conclusion: We constructed a high-quality genome assembly of the Qinling golden snub-nosed monkey; it had superior continuity and accuracy, which might be useful for future genetic studies in this species and as a new standard reference genome for colobine primates. In addition, the updated genome assembly might improve our understanding of this species and could assist conservation efforts.

High-Quality Genome Sequence Resource of a Rice False Smut Fungus Ustilaginoidea virens Isolate, UV-FJ-1

Phytopathology ◽

10.1094/phyto-01-21-0007-a ◽

2021 ◽

Author(s):

Jiandong Bao ◽

Rong Wang ◽

Shilei Gao ◽

Zhe Wang ◽

Yu Fang ◽

...

Keyword(s):

Genome Assembly ◽

Gene Annotation ◽

Gene Clusters ◽

Single Copy ◽

Effector Proteins ◽

High Quality ◽

Ustilaginoidea Virens ◽

Rice False Smut ◽

False Smut ◽

High Quality Genome

Ustilaginoidea virens is the fungal pathogen that causes rice false smut, resulting in not only yield loss but also grain contamination with mycotoxins. Here we deployed PacBio Sequel II HiFi-read sequencing technology to generate a near-complete genome assembly for the U. virens isolate UV-FJ-1 (38.48 Mb), which was isolated from Fujian province, China. The genome assembly contains 116 contigs with an N50 of 0.65 Mb and a maximum length of 2.10 Mb, and the genome completeness is ≥98% as assessed by benchmarking universal single-copy orthologs (BUSCOs) and the mapping rate of Illumina short reads. Excluding 35.78% repeat sequences, we identified a total of 7,164 protein-coding genes, of which 5,818 were functionally annotated and 223 encode putative effector proteins. Moreover, 21 secondary metabolite biosynthesis gene clusters were found in the UV-FJ-1 genome. Taken together, this high-quality genome assembly and gene annotation resource will provide better insight for characterizing the biological and pathogenic mechanisms of U. virens.

Twelve quick steps for genome assembly and annotation in the classroom

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008325 ◽

2020 ◽

Vol 16 (11) ◽

pp. e1008325

Author(s):

Hyungtaek Jung ◽

Tomer Ventura ◽

J. Sook Chung ◽

Woo-Jin Kim ◽

Bo-Hye Nam ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Repetitive Sequences ◽

Genome Project ◽

Model Organisms ◽

High Quality ◽

Sequencing Technologies ◽

A Genome ◽

Sequencing Platforms ◽

High Quality Genome

Eukaryotic genome sequencing and de novo assembly, once the exclusive domain of well-funded international consortia, have become increasingly affordable, thus fitting the budgets of individual research groups. Third-generation long-read DNA sequencing technologies are increasingly used, providing extensive genomic toolkits that were once reserved for a few select model organisms. Generating high-quality genome assemblies and annotations for many aquatic species still presents significant challenges due to their large genome sizes, complexity, and high chromosome numbers. Indeed, selecting the most appropriate sequencing and software platforms and annotation pipelines for a new genome project can be daunting because tools often only work in limited contexts. In genomics, generating a high-quality genome assembly/annotation has become an indispensable tool for better understanding the biology of any species. Herein, we state 12 steps to help researchers get started in genome projects by presenting guidelines that are broadly applicable (to any species), sustainable over time, and cover all aspects of genome assembly and annotation projects from start to finish. We review some commonly used approaches, including practical methods to extract high-quality DNA and choices for the best sequencing platforms and library preparations. In addition, we discuss the range of potential bioinformatics pipelines, including structural and functional annotations (e.g., transposable elements and repetitive sequences). This paper also includes information on how to build a wide community for a genome project, the importance of data management, and how to make the data and results Findable, Accessible, Interoperable, and Reusable (FAIR) by submitting them to a public repository and sharing them with the research community.

A genome assembly of the Atlantic chub mackerel (Scomber colias): a valuable teleost fishing resource

10.1101/2021.11.19.468211 ◽

2021 ◽

Author(s):

Andre Machado ◽

Andre Gomes-dos-Santos ◽

Miguel Fonseca ◽

Rute da Fonseca ◽

Ana Verissimo ◽

...

Keyword(s):

Genome Assembly ◽

Draft Genome ◽

Single Copy ◽

Medium Size ◽

Chub Mackerel ◽

High Quality ◽

Protein Coding ◽

Draft Genome Assembly ◽

Scomber Colias ◽

A Genome

The Atlantic chub mackerel, Scomber colias Gmelin, 1789, is a medium-size pelagic fish with substantial importance in the fisheries of the Atlantic Ocean and the Mediterranean Sea. Over the past decade, this species has gained special relevance as one of the main targets of pelagic fisheries in the NE Atlantic. Here, we sequenced and annotated the first high-quality draft genome assembly of S. colias, produced with PacBio HiFi long reads and Illumina paired-end short reads. The estimated genome size is 814 Mb, distributed into 2,028 scaffolds and 2,093 contigs with N50 lengths of 4.19 and 3.34 Mb, respectively. We annotated 27,675 protein-coding genes, and the BUSCO analyses indicated high completeness, with 97.3% of the single-copy orthologs in the Actinopterygii library profile. The present genome assembly represents a valuable resource to address the biology and management of this relevant fishery. Finally, this is the fourth high-quality genome assembly within the Order Scombriformes and the first in the genus Scomber.

Faculty Opinions recommendation of How can a high-quality genome assembly help plant breeders?

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.735958664.793561468 ◽

2019 ◽

Author(s):

Dirk Hincha

Keyword(s):

Genome Assembly ◽

High Quality ◽

High Quality Genome

High-quality genome assembly of Huazhan and Tianfeng, the parents of an elite rice hybrid Tian-you-hua-zhan

Science China Life Sciences ◽

10.1007/s11427-020-1940-9 ◽

2021 ◽

Author(s):

Hui Zhang ◽

Yuexing Wang ◽

Ce Deng ◽

Sheng Zhao ◽

Peng Zhang ◽

...

Keyword(s):

Genome Assembly ◽

High Quality ◽

Rice Hybrid ◽

High Quality Genome

Whole-Genome Sequencing of Chinese Yellow Catfish Provides a Valuable Genetic Resource for High-Throughput Identification of Toxin Genes

Toxins ◽

10.3390/toxins10120488 ◽

2018 ◽

Vol 10 (12) ◽

pp. 488 ◽

Cited By ~ 5

Author(s):

Shiyong Zhang ◽

Jia Li ◽

Qin Qin ◽

Wei Liu ◽

Chao Bian ◽

...

Keyword(s):

High Throughput ◽

Genome Assembly ◽

Raw Materials ◽

Pelteobagrus Fulvidraco ◽

Yellow Catfish ◽

High Quality ◽

Protein Coding ◽

Toxin Genes ◽

Sequencing Platforms ◽

High Quality Genome

Naturally derived toxins from animals are good raw materials for drug development. As a representative venomous teleost, Chinese yellow catfish (Pelteobagrus fulvidraco) can provide valuable resources for studies on toxin genes. Its venom glands are located in the pectoral and dorsal fins. Despite these interesting biological traits and its great economic value, Chinese yellow catfish still lacks a sequenced genome. Here, we report a high-quality genome assembly of Chinese yellow catfish using a combination of next-generation Illumina and third-generation PacBio sequencing platforms. The final assembly reached 714 Mb, with a contig N50 of 970 kb and a scaffold N50 of 3.65 Mb. We also annotated 21,562 protein-coding genes, of which 97.59% were assigned at least one functional annotation. Based on the genome sequence, we analyzed toxin genes in Chinese yellow catfish. We identified 207 toxin genes and classified them into three major groups. Interestingly, we also expanded a previously reported sex-related region (to ≈6 Mb) in the resulting genome assembly, and localized two important toxin genes within this region. In summary, we assembled a high-quality genome of Chinese yellow catfish and performed high-throughput identification of toxin genes from a genomic view. Therefore, the limited number of toxin sequences in public databases will be remarkably improved once we integrate multi-omics data from more and more sequenced species.

Whole-genome assembly of Ganoderma leucocontextum (Ganodermataceae, Fungi) discovered from the Tibetan Plateau of China

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab337 ◽

2021 ◽

Author(s):

Yuanchao Liu ◽

Longhua Huang ◽

Huiping Hu ◽

Manjun Cai ◽

Xiaowei Liang ◽

...

Keyword(s):

Genome Assembly ◽

Southwest China ◽

Reference Genome ◽

Biological Activities ◽

Single Copy ◽

The Tibetan Plateau ◽

Whole Genome ◽

High Quality ◽

Pharmacological Activities ◽

Genetic Studies

Abstract Ganoderma leucocontextum, a newly discovered species of Ganodermataceae in China, has diverse pharmacological activities. G. leucocontextum is widely cultivated in southwest China, but systematic genetic study has been impeded by the lack of a reference genome. Herein, we present the first whole-genome assembly of G. leucocontextum, based on the Illumina and Nanopore platforms, from high-quality DNA extracted from a monokaryon strain (DH-8). The generated genome was 50.05 Mb in size with an N50 scaffold size of 3.06 Mb, 78,206 coding sequences and 13,390 putative genes. Genome completeness was assessed using the Benchmarking Universal Single-Copy Orthologs (BUSCO) tool, which identified 96.55% of the 280 Fungi BUSCO genes. Furthermore, differences in functional genes of secondary metabolites (terpenoids) were analyzed between G. leucocontextum and G. lucidum. G. leucocontextum has more genes related to terpenoid synthesis than G. lucidum, which may be one of the reasons why they exhibit different biological activities. This is the first genome assembly and annotation for G. leucocontextum, which will enrich the toolbox for biological and genetic studies of this species.

Genome Assembly and Population Resequencing Reveal the Geographical Divergence of 'Shanmei' (Rubus corchorifolius)

10.1101/2021.11.22.469527 ◽

2021 ◽

Author(s):

Yinqing Yang ◽

Kang Zhang ◽

Ya Xiao ◽

Lingkui Zhang ◽

Yile Huang ◽

...

Keyword(s):

Genome Assembly ◽

Ancestral Population ◽

Effective Population ◽

High Quality ◽

Local Environments ◽

Rubus Species ◽

Rubus Chingii ◽

Rubus Chingii Hu ◽

Genomic Regions ◽

High Quality Genome

Rubus corchorifolius (Shanmei or mountain berry, 2n = 14) is widely distributed in China, and its fruit has high nutritional and medicinal value. Here, we report a high-quality chromosome-scale genome assembly of Shanmei, with a size of 215.69 Mb and encompassing 26,696 genes. Genome comparisons among Rosaceae species show that Shanmei and Fupenzi (Rubus chingii Hu) are most closely related, followed by black raspberry (Rubus occidentalis). Further resequencing of 101 samples of Shanmei collected from four regions in the provinces of Yunnan, Hunan, Jiangxi and Sichuan in South China reveals that the Hunan population of Shanmei possesses the highest diversity and may represent the more ancestral population. Moreover, analyses of nucleotide diversity, linkage disequilibrium and historical effective population size indicate that the Yunnan population has undergone strong selection. Furthermore, genes from candidate genomic regions that show strong divergence are significantly enriched in flavonoid biosynthesis and plant hormone signal transduction, indicating the genetic basis of adaptation of Shanmei to local environments. The high-quality genome sequences and the variome dataset of Shanmei provide valuable resources for breeding applications and for elucidating the genome evolution and ecological adaptation of Rubus species.
