Pacific Biosciences assembly with Hi-C mapping generates an improved, chromosome-level goose genome

ABSTRACT Background: The domestic goose is an economically important and scientifically valuable waterfowl; however, a lack of high-quality genomic data has hindered research concerning its genome, genetics, and breeding. As domestic goose breeds derive from both the swan goose (Anser cygnoides) and the graylag goose (Anser anser), we selected a female Tianfu goose for genome sequencing. We generated a chromosome-level goose genome assembly by adopting a hybrid de novo assembly approach that combined Pacific Biosciences single-molecule real-time sequencing, high-throughput chromatin conformation capture mapping, and Illumina short-read sequencing. Findings: We generated a 1.11-Gb goose genome with contig and scaffold N50 values of 1.85 and 33.12 Mb, respectively. The assembly contains 39 pseudo-chromosomes (2n = 78) accounting for ∼88.36% of the goose genome. Compared with previous goose assemblies, our assembly has greater continuity, completeness, and accuracy; the annotation of core eukaryotic genes and universal single-copy orthologs has also been improved. We have identified 17,568 protein-coding genes and a repeat content of 8.67% (96.57 Mb) in this genome assembly. We also explored the spatial organization of chromatin and gene expression in goose liver tissues, in terms of inter-pseudo-chromosomal interaction patterns, compartments, topologically associating domains, and promoter-enhancer interactions. Conclusions: We present the first chromosome-level assembly of the goose genome. This will be a valuable resource for future genetic and genomic studies on geese.


A chromosome-level genome assembly for the beet armyworm (Spodoptera exigua) using PacBio and Hi-C sequencing

10.1101/2019.12.26.889121 ◽

2019 ◽

Cited By ~ 1

Author(s):

Feng Zhang ◽

Jianpeng Zhang ◽

Yihua Yang ◽

Yidong Wu

Keyword(s):

Single Molecule ◽

Genome Assembly ◽

Spodoptera Exigua ◽

Management Strategies ◽

Single Copy ◽

Gene Family Evolution ◽

Beet Armyworm ◽

Host Plant Specialization ◽

Niche Adaptation ◽

Chromosome Level

Abstract Background: The beet armyworm, Spodoptera exigua (Hübner), is a worldwide, polyphagous agricultural pest feeding on vegetable, field, and flower crops. However, the lack of genome information on this insect severely limits our understanding of its rapid adaptation and hampers the development of efficient pest management strategies. Findings: We report a chromosome-level genome assembly using single-molecule real-time PacBio sequencing and Hi-C data. The final genome assembly was 446.80 Mb with a scaffold N50 of 14.36 Mb, and captured 97.9% complete arthropod Benchmarking Universal Single-Copy Orthologs (BUSCO, n=1,658). A total of 367 contigs were anchored to 32 pseudo-chromosomes, covering 96.18% (429.74 Mb) of the total genome length. We predicted 17,727 protein-coding genes, of which 81.60% were supported by transcriptome evidence and 96.47% matched UniProt protein records. We also identified 867,102 (147.97 Mb/33.12%) repetitive elements and 1,609 noncoding RNAs. Synteny inference indicated conserved collinearity among three lepidopteran species. Gene family evolution and function enrichment analyses showed significant expansions in families related to development, diet, detoxification, and the chemosensory system, indicating these families may play a role in host plant specialization and niche adaptation. Conclusions: We have generated a high-quality chromosome-level genome that could provide a valuable resource for a better understanding and management of the beet armyworm.
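The percentages quoted in this abstract can be cross-checked against the reported lengths. The following is a minimal sketch of that arithmetic; the helper name `percent_of` is hypothetical, and the inputs are only the figures stated in the abstract above.

```python
def percent_of(part_mb: float, total_mb: float) -> float:
    """Return part/total as a percentage, rounded to two decimals."""
    if total_mb <= 0:
        raise ValueError("total length must be positive")
    return round(100 * part_mb / total_mb, 2)


# Figures taken from the abstract (all in Mb).
assembly_mb = 446.80   # final assembly length
anchored_mb = 429.74   # length anchored to pseudo-chromosomes
repeats_mb = 147.97    # total repetitive element length

print(percent_of(anchored_mb, assembly_mb))  # anchored fraction
print(percent_of(repeats_mb, assembly_mb))   # repeat content
```

Under these inputs the two printed values should agree with the 96.18% and 33.12% reported in the abstract.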


A chromosome-level genome assembly of the Chinese tupelo Nyssa sinensis

Scientific Data ◽

10.1038/s41597-019-0296-y ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 1

Author(s):

Xuchen Yang ◽

Minghui Kang ◽

Yanting Yang ◽

Haifeng Xiong ◽

Mingcheng Wang ◽

...

Keyword(s):

Single Molecule ◽

Genome Assembly ◽

De Novo ◽

Chromosome Conformation ◽

Protein Coding ◽

Single Molecule Sequencing ◽

Data Matching ◽

Long Reads ◽

Autumn Leaf ◽

Chromosome Level

Abstract The deciduous Chinese tupelo (Nyssa sinensis Oliv.) is a popular ornamental tree for its spectacular autumn leaf color. Here, using single-molecule sequencing and chromosome conformation capture data, we report a high-quality, chromosome-level genome assembly of N. sinensis. PacBio long reads were de novo assembled into 647 polished contigs with a total length of 1,001.42 megabases (Mb) and an N50 size of 3.62 Mb, which is in line with genome sizes estimated using flow cytometry and k-mer analysis. These contigs were further clustered and ordered into 22 pseudo-chromosomes based on Hi-C data, matching the chromosome counts in Nyssa obtained from previous cytological studies. In addition, a total of 664.91 Mb of repetitive elements were identified and a total of 37,884 protein-coding genes were predicted in the genome of N. sinensis. All data were deposited in publicly available repositories, and should be a valuable resource for genomics, evolution, and conservation biology.


Live-cell imaging reveals the spatiotemporal organization of endogenous RNA polymerase II phosphorylation at a single gene

Nature Communications ◽

10.1038/s41467-021-23417-0 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Linda S. Forero-Quintero ◽

William Raymond ◽

Tetsuya Handa ◽

Matthew N. Saxton ◽

Tatsuya Morisaki ◽

...

Keyword(s):

Rna Polymerase ◽

Rna Polymerase Ii ◽

Single Molecule ◽

Computational Models ◽

Spatial Organization ◽

Fluorescent Antibody ◽

Single Gene ◽

Single Copy ◽

Living Cells ◽

Live Cell

Abstract The carboxyl-terminal domain of RNA polymerase II (RNAP2) is phosphorylated during transcription in eukaryotic cells. While residue-specific phosphorylation has been mapped with exquisite spatial resolution along the 1D genome in a population of fixed cells using immunoprecipitation-based assays, the timing, kinetics, and spatial organization of phosphorylation along a single-copy gene have not yet been measured in living cells. Here, we achieve this by combining multi-color, single-molecule microscopy with fluorescent antibody-based probes that specifically bind to different phosphorylated forms of endogenous RNAP2 in living cells. Applying this methodology to a single-copy HIV-1 reporter gene provides live-cell evidence for heterogeneity in the distribution of RNAP2 along the length of the gene as well as Serine 5 phosphorylated RNAP2 clusters that remain separated in both space and time from nascent mRNA synthesis. Computational models determine that 5 to 40 RNAP2 cluster around the promoter during a typical transcriptional burst, with most phosphorylated at Serine 5 within 6 seconds of arrival and roughly half escaping the promoter in ~1.5 minutes. Taken together, our data provide live-cell support for the notion of efficient transcription clusters that transiently form around promoters and contain high concentrations of RNAP2 phosphorylated at Serine 5.


Chromosome-level assembly of Drosophila bifasciata reveals important karyotypic transition of the X chromosome

10.1101/847558 ◽

2019 ◽

Author(s):

Ryan Bracewell ◽

Anita Tran ◽

Kamalakar Chatla ◽

Doris Bachtrog

Keyword(s):

X Chromosome ◽

Genome Assembly ◽

De Novo ◽

Pericentromeric Region ◽

Species Group ◽

Chromosome 15 ◽

Protein Coding ◽

Protein Coding Genes ◽

Long Read ◽

Chromosome Level

ABSTRACT The Drosophila obscura species group is one of the most studied clades of Drosophila and harbors multiple distinct karyotypes. Here we present a de novo genome assembly and annotation of D. bifasciata, a species which represents an important subgroup for which no high-quality chromosome-level genome assembly currently exists. We combined long-read sequencing (Nanopore) and Hi-C scaffolding to achieve a highly contiguous genome assembly approximately 193 Mb in size, with repetitive elements constituting 30.1% of the total length. Drosophila bifasciata harbors four large metacentric chromosomes and the small dot, and our assembly contains each chromosome in a single scaffold, including the highly repetitive pericentromeres, which were largely composed of Jockey and Gypsy transposable elements. We annotated a total of 12,821 protein-coding genes, and comparisons of synteny with D. athabasca orthologs show that the large metacentric pericentromeric regions of multiple chromosomes are conserved between these species. Importantly, Muller A (X chromosome) was found to be metacentric in D. bifasciata and the pericentromeric region appears homologous to the pericentromeric region of the fused Muller A-AD (XL and XR) of pseudoobscura/affinis subgroup species. Our finding suggests a metacentric ancestral X fused to a telocentric Muller D and created the large neo-X (Muller A-AD) chromosome ∼15 MYA. We also confirm the fusion of Muller C and D in D. bifasciata and show that it likely involved a centromere-centromere fusion.


Genomic Analysis of Sarcomyxa edulis Reveals the Basis of Its Medicinal Properties and Evolutionary Relationships

Frontiers in Microbiology ◽

10.3389/fmicb.2021.652324 ◽

2021 ◽

Vol 12 ◽

Author(s):

Fenghua Tian ◽

Changtian Li ◽

Yu Li

Keyword(s):

Single Molecule ◽

De Novo ◽

Genomic Analysis ◽

Single Copy ◽

Whole Genome Sequence ◽

Type I ◽

Whole Genome ◽

Uridine Diphosphate ◽

Protein Coding ◽

Medicinal Value

Yuanmo [Sarcomyxa edulis (Y.C. Dai, Niemelä & G.F. Qin) T. Saito, Tonouchi & T. Harada] is an important edible and medicinal mushroom endemic to Northeastern China. Here we report the de novo sequencing and assembly of the S. edulis genome using single-molecule real-time sequencing technology. The whole genome was approximately 35.65 Mb, with a G + C content of 48.31%. Genome assembly generated 41 contigs with an N50 length of 1,772,559 bp. The genome comprised 9,364 annotated protein-coding genes, many of which encoded enzymes involved in the modification, biosynthesis, and degradation of glycoconjugates and carbohydrates or enzymes predicted to be involved in the biosynthesis of secondary metabolites such as terpene, type I polyketide, siderophore, and fatty acids, which are responsible for the pharmacodynamic activities of S. edulis. We also identified genes encoding 1,3-β-glucan synthase and endo-1,3(4)-β-glucanase, which are involved in polysaccharide and uridine diphosphate glucose biosynthesis. Phylogenetic and comparative analyses of Basidiomycota fungi based on a single-copy orthologous protein indicated that the Sarcomyxa genus is an independent group that evolved from the Pleurotaceae family. The annotated whole-genome sequence of S. edulis can serve as a reference for investigations of bioactive compounds with medicinal value and the development and commercial production of superior S. edulis varieties.


De novo whole-genome assembly in interspecific hybrid table grape, ‘Shine Muscat’

10.1101/730762 ◽

2019 ◽

Cited By ~ 2

Author(s):

Kenta Shirasawa ◽

Akifumi Azuma ◽

Fumiya Taniguchi ◽

Toshiya Yamamoto ◽

Akihiko Sato ◽

...

Keyword(s):

Genome Sequence ◽

Genome Assembly ◽

De Novo ◽

Sequence Data ◽

Genome Structure ◽

Table Grape ◽

Sequencing Analysis ◽

Entire Genome ◽

Table Grapes ◽

Eukaryotic Genes

Abstract This study presents the first genome sequence of an interspecific grape hybrid, ‘Shine Muscat’ (Vitis labruscana × V. vinifera), an elite table grape cultivar bred in Japan. The complexity of the genome structure, arising from the interspecific hybridization, necessitated the use of a sophisticated genome assembly pipeline with short-read genome sequence data. The resultant genome assemblies consisted of two types of sequences: a haplotype-phased sequence of the highly heterozygous genomes and an unphased sequence representing a “haploid” genome. The unphased sequences spanned 490.1 Mb in length, 99.4% of the estimated genome size, with 8,696 scaffold sequences with an N50 length of 13.2 Mb. The phased sequences had 15,650 scaffolds spanning 1.0 Gb with N50 of 4.2 Mb. The two sequences comprised 94.7% and 96.3% of the core eukaryotic genes, indicating that the entire genome of ‘Shine Muscat’ was represented. Examination of genome structures revealed possible genome rearrangements between the genomes of ‘Shine Muscat’ and a V. vinifera line. Furthermore, full-length transcriptome sequencing analysis revealed 13,947 gene loci on the ‘Shine Muscat’ genome, from which 26,199 transcript isoforms were transcribed. These genome resources provide new insights that could help cultivation and breeding strategies produce more high-quality table grapes such as ‘Shine Muscat’.


De novo chromosome-level genome assembly of Chinese walnut (Juglans cathayensis Dode)

10.22541/au.159007642.20504888 ◽

2020 ◽

Author(s):

Feng Yan ◽

Rui Min Xi ◽

Rui Xue She ◽

Yu Jie Yan ◽

Peng Peng Chen ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Chinese Walnut ◽

Chromosome Level


Chromosome-level de novo genome assembly of Sarcophaga peregrina provides insights into the evolutionary adaptation of flesh flies

Authorea ◽

10.22541/au.158446791.17846463 ◽

2020 ◽

Author(s):

Lipin Ren ◽

Yanjie Shang ◽

Li Yang ◽

Shiwen Wang ◽

Xiang Wang ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Evolutionary Adaptation ◽

De Novo Genome Assembly ◽

Flesh Flies ◽

Chromosome Level


De Novo Assembly of a High-Quality Reference Genome for the Horned Lark (Eremophila alpestris)

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400846 ◽

2019 ◽

Vol 10 (2) ◽

pp. 475-478 ◽

Cited By ~ 3

Author(s):

Nicholas A. Mason ◽

Paulo Pulgarin ◽

Carlos Daniel Cadena ◽

Irby J. Lovette

Keyword(s):

Genome Assembly ◽

De Novo ◽

Single Copy ◽

Single Copy Gene ◽

High Quality ◽

Data Set ◽

Copy Gene ◽

Assembly Pipeline ◽

Horned Lark ◽

Gene Orthologs

The Horned Lark (Eremophila alpestris) is a small songbird that exhibits remarkable geographic variation in appearance and habitat across an expansive distribution. While E. alpestris has been the focus of many ecological and evolutionary studies, we still lack a highly contiguous genome assembly for the Horned Lark and related taxa (Alaudidae). Here, we present CLO_EAlp_1.0, a highly contiguous assembly for E. alpestris generated from a blood sample of a wild, male bird captured in the Altiplano Cundiboyacense of Colombia. By combining short-insert and mate-pair libraries with the ALLPATHS-LG genome assembly pipeline, we generated a 1.04 Gb assembly comprised of 2713 scaffolds, with a largest scaffold size of 31.81 Mb, a scaffold N50 of 9.42 Mb, and a scaffold L50 of 30. These scaffolds were assembled from 23685 contigs, with a largest contig size of 1.69 Mb, a contig N50 of 193.81 kb, and a contig L50 of 1429. Our assembly pipeline also produced a single mitochondrial DNA contig of 14.00 kb. After polishing the genome, we identified 94.5% of single-copy gene orthologs from an Aves data set and 97.7% of single-copy gene orthologs from a vertebrata data set, which further demonstrates the high quality of our assembly. We anticipate that this genomic resource will be useful to the broader ornithological community and those interested in studying the evolutionary history and ecological interactions of larks, which comprise a widespread, yet understudied lineage of songbirds.
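The N50 and L50 statistics reported in this and the other abstracts follow a simple definition: sort the sequence lengths from longest to shortest, accumulate them until at least half of the total assembly length is covered, then report the length of the last sequence added (N50) and the number of sequences used (L50). The sketch below illustrates that standard definition only; the function name `n50_l50` and the toy lengths are hypothetical and are not taken from any of the assemblies listed here.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)   # longest first
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # `length` is the N50; `count` sequences were needed (L50).
            return length, count


# Hypothetical toy assembly (lengths in Mb), total = 25.
toy_lengths = [8, 5, 4, 3, 2, 2, 1]
print(n50_l50(toy_lengths))  # (5, 2): the two longest sequences cover 13 of 25
```

A higher N50 together with a lower L50 indicates a more contiguous assembly, which is why the abstracts report both contig-level and scaffold-level values.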


The sequencing and de novo assembly of the Larimichthys crocea genome using PacBio and Hi-C technologies

Scientific Data ◽

10.1038/s41597-019-0194-3 ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 8

Author(s):

Baohua Chen ◽

Zhixiong Zhou ◽

Qiaozhen Ke ◽

Yidi Wu ◽

Huaqiang Bai ◽

...

Keyword(s):

Marine Fish ◽

Single Molecule ◽

Large Scale ◽

Reference Genome ◽

De Novo ◽

Larimichthys Crocea ◽

Chromosome Conformation ◽

Protein Coding ◽

Total Length ◽

Chromosome Level

Abstract Larimichthys crocea is an endemic marine fish in East Asia that belongs to Sciaenidae in Perciformes. L. crocea has now been recognized as an “iconic” marine fish species in China because it is not only a popular food fish in China but also a representative victim of overfishing that still provides high-value fish products supported by the modern large-scale mariculture industry. Here, we report a chromosome-level reference genome of L. crocea generated by employing the PacBio single molecule sequencing technique (SMRT) and high-throughput chromosome conformation capture (Hi-C) technologies. The genome sequences were assembled into 1,591 contigs with a total length of 723.86 Mb and a contig N50 length of 2.83 Mb. After chromosome-level scaffolding, 24 scaffolds were constructed with a total length of 668.67 Mb (92.48% of the total length). Genome annotation identified 23,657 protein-coding genes and 7,262 ncRNAs. This highly accurate, chromosome-level reference genome of L. crocea provides an essential genome resource to support the development of genome-scale selective breeding and restocking strategies of L. crocea.
