The sequencing and de novo assembly of the Larimichthys crocea genome using PacBio and Hi-C technologies

Baohua Chen; Zhixiong Zhou; Qiaozhen Ke; Yidi Wu; Huaqiang Bai; Fei Pu; Peng Xu

doi:10.1038/s41597-019-0194-3

The sequencing and de novo assembly of the Larimichthys crocea genome using PacBio and Hi-C technologies

Scientific Data ◽

10.1038/s41597-019-0194-3 ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 8

Author(s):

Baohua Chen ◽

Zhixiong Zhou ◽

Qiaozhen Ke ◽

Yidi Wu ◽

Huaqiang Bai ◽

...

Keyword(s):

Marine Fish ◽

Single Molecule ◽

Large Scale ◽

Reference Genome ◽

De Novo ◽

Larimichthys Crocea ◽

Chromosome Conformation ◽

Protein Coding ◽

Total Length ◽

Chromosome Level

Abstract Larimichthys crocea is an endemic marine fish in East Asia that belongs to Sciaenidae in Perciformes. L. crocea has now been recognized as an “iconic” marine fish species in China because not only is it a popular food fish in China, it is a representative victim of overfishing and still provides high value fish products supported by the modern large-scale mariculture industry. Here, we report a chromosome-level reference genome of L. crocea generated by employing the PacBio single molecule sequencing technique (SMRT) and high-throughput chromosome conformation capture (Hi-C) technologies. The genome sequences were assembled into 1,591 contigs with a total length of 723.86 Mb and a contig N50 length of 2.83 Mb. After chromosome-level scaffolding, 24 scaffolds were constructed with a total length of 668.67 Mb (92.48% of the total length). Genome annotation identified 23,657 protein-coding genes and 7,262 ncRNAs. This highly accurate, chromosome-level reference genome of L. crocea provides an essential genome resource to support the development of genome-scale selective breeding and restocking strategies of L. crocea.


The sequence and de novo assembly of Takifugu bimaculatus genome using PacBio and Hi-C technologies

Scientific Data ◽

10.1038/s41597-019-0195-2 ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 7

Author(s):

Zhixiong Zhou ◽

Bo Liu ◽

Baohua Chen ◽

Yue Shi ◽

Fei Pu ◽

...

Keyword(s):

Single Molecule ◽

Reference Genome ◽

De Novo ◽

Breeding Programs ◽

Chromosome Conformation ◽

Protein Coding ◽

Single Molecule Sequencing ◽

Total Length ◽

Genome Scale ◽

Chromosome Level

Abstract Takifugu bimaculatus is a native teleost species of the southeast coast of China where it has been cultivated as an important edible fish in the last decade. Genetic breeding programs, which have been recently initiated for improving the aquaculture performance of T. bimaculatus, urgently require a high-quality reference genome to facilitate genome selection and related genetic studies. To address this need, we produced a chromosome-level reference genome of T. bimaculatus using the PacBio single molecule sequencing technique (SMRT) and high-throughput chromosome conformation capture (Hi-C) technologies. The genome was assembled into 2,193 contigs with a total length of 404.21 Mb and a contig N50 length of 1.31 Mb. After chromosome-level scaffolding, 22 chromosomes with a total length of 371.68 Mb were constructed. Moreover, a total of 21,117 protein-coding genes and 3,471 ncRNAs were annotated in the reference genome. The highly accurate, chromosome-level reference genome of T. bimaculatus provides an essential genome resource for not only the genome-scale selective breeding of T. bimaculatus but also the exploration of the evolutionary basis of the speciation and local adaptation of the Takifugu genus.


A chromosome-level genome assembly of the Chinese tupelo Nyssa sinensis

Scientific Data ◽

10.1038/s41597-019-0296-y ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 1

Author(s):

Xuchen Yang ◽

Minghui Kang ◽

Yanting Yang ◽

Haifeng Xiong ◽

Mingcheng Wang ◽

...

Keyword(s):

Single Molecule ◽

Genome Assembly ◽

De Novo ◽

Chromosome Conformation ◽

Protein Coding ◽

Single Molecule Sequencing ◽

Data Matching ◽

Long Reads ◽

Autumn Leaf ◽

Chromosome Level

Abstract The deciduous Chinese tupelo (Nyssa sinensis Oliv.) is a popular ornamental tree for the spectacular autumn leaf color. Here, using single-molecule sequencing and chromosome conformation capture data, we report a high-quality, chromosome-level genome assembly of N. sinensis. PacBio long reads were de novo assembled into 647 polished contigs with a total length of 1,001.42 megabases (Mb) and an N50 size of 3.62 Mb, which is in line with genome sizes estimated using flow cytometry and the k-mer analysis. These contigs were further clustered and ordered into 22 pseudo-chromosomes based on Hi-C data, matching the chromosome counts in Nyssa obtained from previous cytological studies. In addition, a total of 664.91 Mb of repetitive elements were identified and a total of 37,884 protein-coding genes were predicted in the genome of N. sinensis. All data were deposited in publicly available repositories, and should be a valuable resource for genomics, evolution, and conservation biology.


De novo assembly of the cattle reference genome with single-molecule sequencing

GigaScience ◽

10.1093/gigascience/giaa021 ◽

2020 ◽

Vol 9 (3) ◽

Cited By ~ 35

Author(s):

Benjamin D Rosen ◽

Derek M Bickhart ◽

Robert D Schnabel ◽

Sergey Koren ◽

Christine G Elsik ◽

...

Keyword(s):

Single Molecule ◽

De Novo Assembly ◽

Reference Genome ◽

De Novo ◽

Bos Taurus ◽

Future Research ◽

Protein Coding ◽

Single Molecule Sequencing ◽

Assembly Accuracy ◽

Genomic Tools

Abstract Background Major advances in selection progress for cattle have been made following the introduction of genomic tools over the past 10–12 years. These tools depend upon the Bos taurus reference genome (UMD3.1.1), which was created using now-outdated technologies and is hindered by a variety of deficiencies and inaccuracies. Results We present the new reference genome for cattle, ARS-UCD1.2, based on the same animal as the original to facilitate transfer and interpretation of results obtained from the earlier version, but applying a combination of modern technologies in a de novo assembly to increase continuity, accuracy, and completeness. The assembly includes 2.7 Gb and is >250× more continuous than the original assembly, with contig N50 >25 Mb and L50 of 32. We also greatly expanded supporting RNA-based data for annotation that identifies 30,396 total genes (21,039 protein coding). The new reference assembly is accessible in annotated form for public use. Conclusions We demonstrate that improved continuity of assembled sequence warrants the adoption of ARS-UCD1.2 as the new cattle reference genome and that increased assembly accuracy will benefit future research on this species.
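The abstracts in this listing report contiguity as N50, and the cattle abstract also gives L50. As a reminder of what those statistics mean, here is a minimal sketch under the usual definitions: N50 is the length of the contig at which the running total of lengths, taken longest first, reaches half of the assembly size, and L50 is the number of contigs needed to get there. The function name `n50_l50` and the toy lengths are illustrative assumptions, not taken from any of the listed papers.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths (hypothetical helper)."""
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)   # longest contigs first
    half = sum(ordered) / 2                   # half of the total assembly length
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # 'length' is the N50; 'count' contigs were needed, so it is the L50
            return length, count


# Toy contig lengths in arbitrary units (made up for illustration).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_contigs)
print(f"N50 = {n50}, L50 = {l50}")
```

A larger N50 together with a smaller L50 indicates that fewer, longer contigs cover half of the assembly, which is the sense in which these abstracts describe an assembly as more continuous.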


The chromosome-level draft genome of Dalbergia odorifera

GigaScience ◽

10.1093/gigascience/giaa084 ◽

2020 ◽

Vol 9 (8) ◽

Cited By ~ 1

Author(s):

Zhou Hong ◽

Jiang Li ◽

Xiaojin Liu ◽

Jinmin Lian ◽

Ningnan Zhang ◽

...

Keyword(s):

Single Molecule ◽

De Novo ◽

Populus Trichocarpa ◽

Draft Genome ◽

Heartwood Formation ◽

Genetic Studies ◽

Protein Coding ◽

Dalbergia Odorifera ◽

Paired End Sequencing ◽

Chromosome Level

Abstract Background Dalbergia odorifera T. Chen (Fabaceae) is an International Union for Conservation of Nature red-listed tree. This tree is of high medicinal and commercial value owing to its officinal, insect-proof, durable heartwood. However, there is a lack of genome reference, which has hindered development of studies on the heartwood formation. Findings We presented the first chromosome-scale genome assembly of D. odorifera obtained on the basis of Illumina paired-end sequencing, Pacific Biosciences single-molecule real-time sequencing, 10x Genomics linked reads, and Hi-C technology. We assembled 97.68% of the 653.45 Mb D. odorifera genome with scaffold N50 and contig sizes of 56.16 and 5.92 Mb, respectively. Ten super-scaffolds corresponding to the 10 chromosomes were assembled, with the longest scaffold reaching 79.61 Mb. Repetitive elements account for 54.17% of the genome, and 30,310 protein-coding genes were predicted from the genome, of which ∼92.6% were functionally annotated. The phylogenetic tree showed that D. odorifera diverged from the ancestor of Arabidopsis thaliana and Populus trichocarpa and then separated from Glycine max and Cajanus cajan. Conclusions We sequence and reveal the first chromosome-level de novo genome of D. odorifera. These studies provide valuable genomic resources for the research of heartwood formation in D. odorifera and other timber trees. The high-quality assembled genome can also be used as reference for comparative genomics analysis and future population genetic studies of D. odorifera.


High-quality chromosome-scale assembly of the walnut (Juglans regia L.) reference genome

GigaScience ◽

10.1093/gigascience/giaa050 ◽

2020 ◽

Vol 9 (5) ◽

Cited By ~ 5

Author(s):

Annarita Marrano ◽

Monica Britton ◽

Paulo A Zaini ◽

Aleksey V Zimin ◽

Rachael E Workman ◽

...

Keyword(s):

Single Molecule ◽

Reference Genome ◽

Juglans Regia ◽

Male Flower ◽

Fold Increase ◽

Functional Variation ◽

Chromosome Conformation ◽

Protein Coding ◽

Assembly Features ◽

Proteome Changes

Abstract Background The release of the first reference genome of walnut (Juglans regia L.) enabled many achievements in the characterization of walnut genetic and functional variation. However, it is highly fragmented, preventing the integration of genetic, transcriptomic, and proteomic information to fully elucidate walnut biological processes. Findings Here, we report the new chromosome-scale assembly of the walnut reference genome (Chandler v2.0) obtained by combining Oxford Nanopore long-read sequencing with chromosome conformation capture (Hi-C) technology. Relative to the previous reference genome, the new assembly features an 84.4-fold increase in N50 size, with the 16 chromosomal pseudomolecules assembled and representing 95% of its total length. Using full-length transcripts from single-molecule real-time sequencing, we predicted 37,554 gene models, with a mean gene length higher than the previous gene annotations. Most of the new protein-coding genes (90%) present both start and stop codons, which represents a significant improvement compared with Chandler v1.0 (only 48%). We then tested the potential impact of the new chromosome-level genome on different areas of walnut research. By studying the proteome changes occurring during male flower development, we observed that the virtual proteome obtained from Chandler v2.0 presents fewer artifacts than the previous reference genome, enabling the identification of a new potential pollen allergen in walnut. Also, the new chromosome-scale genome facilitates in-depth studies of intraspecies genetic diversity by revealing previously undetected autozygous regions in Chandler, likely resulting from inbreeding, and 195 genomic regions highly differentiated between Western and Eastern walnut cultivars. Conclusion Overall, Chandler v2.0 will serve as a valuable resource to better understand and explore walnut biology.


Chromosome-Level Assembly of the Southern Rock Bream (Oplegnathus fasciatus) Genome Using PacBio and Hi-C Technologies

Frontiers in Genetics ◽

10.3389/fgene.2021.811798 ◽

2021 ◽

Vol 12 ◽

Author(s):

Yulin Bai ◽

Jie Gong ◽

Zhixiong Zhou ◽

Bijun Li ◽

Ji Zhao ◽

...

Keyword(s):

Single Molecule ◽

Reference Genome ◽

Rocky Reef ◽

Northwest Pacific ◽

Northwest Pacific Ocean ◽

Oplegnathus Fasciatus ◽

Marine Aquaculture ◽

Total Length ◽

Rock Bream ◽

Chromosome Level

The Rock Bream (Oplegnathus fasciatus) is an economically important rocky reef fish of the Northwest Pacific Ocean. In recent years, it has been cultivated as an important edible fish in coastal areas of China. Despite its economic importance, genome-wide adaptations of domesticated O. fasciatus are largely unknown. Here we report a chromosome-level reference genome of female O. fasciatus (from the southern population in the subtropical region) using the PacBio single molecule sequencing technique (SMRT) and high-throughput chromosome conformation capture (Hi-C) technologies. The genome was assembled into 120 contigs with a total length of 732.95 Mb and a contig N50 length of 27.33 Mb. After chromosome-level scaffolding, 24 chromosomes with a total length of 723.22 Mb were constructed. Moreover, a total of 27,015 protein-coding genes and 5,880 ncRNAs were annotated in the reference genome. This reference genome of O. fasciatus will provide an important resource not only for basic ecological and population genetic studies but also for dissecting artificial selection mechanisms in marine aquaculture.


Chromosome-level genome assembly of Aldrichina grahami, a forensically important blowfly

GigaScience ◽

10.1093/gigascience/giaa020 ◽

2020 ◽

Vol 9 (3) ◽

Author(s):

Fanming Meng ◽

Zhuoying Liu ◽

Han Han ◽

Dmitrijs Finkelbergs ◽

Yangshuai Jiang ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Phylogenetic Reconstruction ◽

Gene Families ◽

Development Rate ◽

Future Research ◽

Chromosome Conformation ◽

Protein Coding ◽

Sequencing Platform ◽

Chromosome Level

Abstract Background Blowflies (Diptera: Calliphoridae) are the most commonly found entomological evidence in forensic investigations. Distinguished from other blowflies, Aldrichina grahami has some unique biological characteristics and is a species of forensic importance. Its development rate, pattern, and life cycle can provide valuable information for the estimation of the minimum postmortem interval. Findings Herein we provide a chromosome-level genome assembly of A. grahami that was generated using the Pacific Biosciences sequencing platform and chromosome conformation capture (Hi-C) technology. A total of 50.15 Gb clean reads of the A. grahami genome were generated. FALCON and Wtdbg were used to construct the genome of A. grahami, resulting in an assembly of 600 Mb and 1,604 contigs with an N50 size of 1.93 Mb. We predicted 12,823 protein-coding genes, 99.8% of which were functionally annotated on the basis of the de novo genome (SRA: PRJNA513084) and transcriptome (SRA: SRX5207346) of A. grahami. According to the co-analysis with 11 other insect species, clustering and phylogenetic reconstruction of gene families were performed. Using Hi-C sequencing, a chromosome-level assembly of 6 chromosomes was generated with scaffold N50 of 104.7 Mb. Of these scaffolds, 96.4% were anchored to the total A. grahami genome contig bases. Conclusions The present study provides a robust genome reference for A. grahami that supplements vital genetic information for nonhuman forensic genomics and facilitates the future research of A. grahami and other necrophagous blowfly species used in forensic medicine.


Chromosome-level assembly of the common vetch reference genome (Vicia sativa)

10.1101/2021.10.11.464017 ◽

2021 ◽

Author(s):

Hangwei Xi ◽

Vy Nguyen ◽

Christopher M Ward ◽

Iain R Searle

Keyword(s):

Reference Genome ◽

Cropping Systems ◽

Single Copy ◽

Vicia Sativa ◽

Arid Environments ◽

Common Vetch ◽

Chromosome Conformation ◽

Protein Coding ◽

The USA ◽

Chromosome Level

Background: Vicia sativa L. (Common Vetch, n = 6) is an annual, herbaceous, climbing legume that is distributed in tropical, sub-tropical and temperate climates. Originating in the Fertile Crescent of the Middle East, V. sativa is now widespread and grows in the Mediterranean basin, West, Central and Eastern Asia, North and South America. V. sativa is of economic importance as a forage legume in countries such as Australia, China, and the USA and contributes valuable nitrogen to agricultural rotation cropping systems. To accelerate precision genome breeding and genomics-based selection of this legume, we here present a chromosome-level reference genome sequence for V. sativa. Results: We applied a combination of long-read Oxford Nanopore sequencing, short-read Illumina sequencing, and high-throughput chromosome conformation data (CHiCAGO and Hi-C) analysis to construct a chromosome-level genome of V. sativa. The chromosome-level assembly of six pseudo-chromosomes has a total genome length of 1.9 gigabases (Gb) with a median contig length of 684 kb. Benchmarking Universal Single-Copy Orthologs (BUSCO) of the assembly demonstrated a very high completeness of 98 % of the dicotyledonous orthologs. RNA-seq analysis and gene modelling enabled the annotation of 58,415 protein-coding genes. Conclusions: The high-quality chromosome-level genome assembly of V. sativa will provide novel insights into vetch genome evolution and be a valuable resource for genomic breeding, genetic diversity and for understanding adaption to diverse arid environments.


High-quality chromosome-scale assembly of the walnut (Juglans regia L) reference genome

10.1101/809798 ◽

2019 ◽

Cited By ~ 1

Author(s):

Annarita Marrano ◽

Monica Britton ◽

Paulo A. Zaini ◽

Aleksey V. Zimin ◽

Rachael E. Workman ◽

...

Keyword(s):

Single Molecule ◽

Reference Genome ◽

Juglans Regia ◽

Fold Increase ◽

Full Length ◽

Functional Variation ◽

Chromosome Conformation ◽

Protein Coding ◽

Assembly Features ◽

Proteome Changes

ABSTRACT The release of the first reference genome of walnut (Juglans regia L.) enabled many achievements in the characterization of walnut genetic and functional variation. However, it is highly fragmented, preventing the integration of genetic, transcriptomic, and proteomic information to fully elucidate walnut biological processes. Here we report the new chromosome-scale assembly of the walnut reference genome (Chandler v2.0) obtained by combining Oxford Nanopore long-read sequencing with chromosome conformation capture (Hi-C) technology. Relative to the previous reference genome, the new assembly features an 84.4-fold increase in N50 size, and the full sequence of all 16 chromosomal pseudomolecules, nine of which present telomere sequences at both ends. Using full-length transcripts from single-molecule real-time sequencing, we predicted 40,491 gene models, with a mean gene length higher than the previous gene annotations. Most of the new protein-coding genes (90%) are full-length, which represents a significant improvement compared to Chandler v1.0 (only 48%). We then tested the potential impact of the new chromosome-level genome on different areas of walnut research. By studying the proteome changes occurring during catkin development, we observed that the virtual proteome obtained from Chandler v2.0 presents fewer artifacts than the previous reference genome, enabling the identification of a new potential pollen allergen in walnut. Also, the new chromosome-scale genome facilitates in-depth studies of intraspecies genetic diversity by revealing previously undetected autozygous regions in Chandler, likely resulting from inbreeding, and 195 genomic regions highly differentiated between Western and Eastern walnut cultivars. Overall, Chandler v2.0 is a valuable resource to understand and explore walnut biology better.


AStrap: identification of alternative splicing from transcript sequences without a reference genome

Bioinformatics ◽

10.1093/bioinformatics/bty1008 ◽

2018 ◽

Vol 35 (15) ◽

pp. 2654-2656 ◽

Cited By ~ 5

Author(s):

Guoli Ji ◽

Wenbin Ye ◽

Yaru Su ◽

Moliang Chen ◽

Guangzao Huang ◽

...

Keyword(s):

Machine Learning ◽

Alternative Splicing ◽

Single Molecule ◽

Reference Genome ◽

De Novo ◽

Supplementary Information ◽

Model Organisms ◽

Sequencing Data ◽

Extensive Evaluation ◽

Reference Genomes

Abstract Summary Alternative splicing (AS) is a well-established mechanism for increasing transcriptome and proteome diversity; however, detecting AS events and distinguishing among AS types in organisms without available reference genomes remains challenging. We developed a de novo approach called AStrap for AS analysis without using a reference genome. AStrap identifies AS events by extensive pair-wise alignments of transcript sequences and predicts AS types by a machine-learning model integrating more than 500 assembled features. We evaluated AStrap using collected AS events from reference genomes of rice and human as well as single-molecule real-time sequencing data from Amborella trichopoda. Results show that AStrap can identify many more AS events with comparable or higher accuracy than the competing method. AStrap also possesses a unique feature of predicting AS types, which achieves an overall accuracy of ∼0.87 for different species. Extensive evaluation of AStrap using different parameters, sample sizes and machine-learning models on different species also demonstrates the robustness and flexibility of AStrap. AStrap could be a valuable addition to the community for the study of AS in non-model organisms with limited genetic resources. Availability and implementation AStrap is available for download at https://github.com/BMILAB/AStrap. Supplementary information Supplementary data are available at Bioinformatics online.
