High quality, phased genomes of Phytophthora ramorum clonal lineages NA1 and EU1

Mapping Intimacies ◽

10.1101/2021.06.23.449625 ◽

2021 ◽

Author(s):

Nicholas C Carleson ◽

Caroline M Press ◽

Niklaus J Grunwald

Keyword(s):

Reference Genome ◽

Phytophthora Ramorum ◽

Sudden Oak Death ◽

Valuable Resource ◽

High Quality ◽

Total Size ◽

Protein Coding ◽

Protein Coding Genes ◽

Live Oak ◽

Long Read

Phytophthora ramorum is the causal agent of sudden oak death in West Coast forests, and two clonal lineages, NA1 and EU1, currently cause epidemics in Oregon forests. Here, we report two high-quality genomes, one from each of the NA1 and EU1 clonal lineages, generated using PacBio long-read sequencing. The NA1 strain Pr102, originally isolated from coast live oak in California, is the current reference genome and was previously sequenced independently using either Sanger (P. ramorum v1) or PacBio (P. ramorum v2) technology. The EU1 strain PR-15-019 was obtained from tanoak in Oregon. These new genomes have a total size of 57.5 Mb, with a contig N50 length of ~3.5-3.6 Mb, and encode ~15,300 predicted protein-coding genes. The genomes were assembled into 27 and 28 scaffolds with 95% BUSCO scores, a considerable improvement over the current JGI reference genome (2,575 scaffolds) and the PacBio genomes (1,512 scaffolds). These high-quality genomes provide a valuable resource for studying the genetics, evolution, and adaptation of these two clonal lineages.

A high-quality chromosome-level genome assembly reveals genetics for important traits in eggplant

Horticulture Research ◽

10.1038/s41438-020-00391-0 ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Qingzhen Wei ◽

Jinglei Wang ◽

Wuhong Wang ◽

Tianhua Hu ◽

Haijiao Hu ◽

...

Keyword(s):

Genome Assembly ◽

Reference Genome ◽

Repetitive Sequences ◽

Gene Families ◽

Specific Gene ◽

High Quality ◽

Total Size ◽

Protein Coding ◽

Fruit Length ◽

Protein Coding Genes

Eggplant (Solanum melongena L.) is an economically important vegetable crop in the Solanaceae family, with extensive diversity among landraces and close relatives. Here, we report a high-quality reference genome for the eggplant inbred line HQ-1315 (S. melongena-HQ) using a combination of Illumina, Nanopore and 10X genomics sequencing technologies and Hi-C technology for genome assembly. The assembled genome has a total size of ~1.17 Gb and 12 chromosomes, with a contig N50 of 5.26 Mb, and contains 36,582 protein-coding genes. Repetitive sequences comprise 70.09% (811.14 Mb) of the eggplant genome, most of which are long terminal repeat (LTR) retrotransposons (65.80%), followed by long interspersed nuclear elements (LINEs, 1.54%) and DNA transposons (0.85%). The S. melongena-HQ eggplant genome carries a total of 563 accession-specific gene families containing 1009 genes. In total, 73 expanded gene families (892 genes) and 34 contracted gene families (114 genes) were functionally annotated. Comparative analysis of different eggplant genomes identified three types of variations, including single-nucleotide polymorphisms (SNPs), insertions/deletions (indels) and structural variants (SVs). Asymmetric SV accumulation was found in potential regulatory regions of protein-coding genes among the different eggplant genomes. Furthermore, we performed QTL-seq for eggplant fruit length using the S. melongena-HQ reference genome and detected a QTL interval of 71.29–78.26 Mb on chromosome E03. The gene Smechr0301963, which belongs to the SUN gene family, is predicted to be a key candidate gene for eggplant fruit length regulation. Moreover, we anchored a total of 210 linkage markers associated with 71 traits to the eggplant chromosomes and finally obtained 26 QTL hotspots. The eggplant HQ-1315 genome assembly can be accessed at http://eggplant-hq.cn. In conclusion, the eggplant genome presented herein provides a global view of genomic divergence at the whole-genome level and powerful tools for the identification of candidate genes for important traits in eggplant.
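
As a quick consistency check on the figures quoted in this abstract, the following is a minimal sketch (not the authors' analysis) that derives the genome size implied by the repeat content and the width of the fruit-length QTL interval. The variable names are illustrative assumptions, and the inputs are the rounded values from the abstract, so the results are approximate.

```python
# Rounded values copied from the abstract; all names are illustrative assumptions.
repeat_mb = 811.14                        # repetitive sequence, in Mb
repeat_fraction = 0.7009                  # 70.09% of the genome
qtl_start_mb, qtl_end_mb = 71.29, 78.26   # QTL interval on chromosome E03

# Genome size implied by "repeats make up 70.09% of the genome".
implied_genome_mb = repeat_mb / repeat_fraction

# Width of the fruit-length QTL interval.
qtl_width_mb = qtl_end_mb - qtl_start_mb

print(f"Implied genome size: {implied_genome_mb:.1f} Mb")
print(f"QTL interval width: {qtl_width_mb:.2f} Mb")
```

The implied size (about 1,157 Mb) is within roughly 1% of the reported total of ~1.17 Gb, and the QTL interval spans roughly 7 Mb.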

A new long-read dog assembly uncovers thousands of exons and functional elements missing in the previous reference

10.1101/2020.07.02.185108 ◽

2020 ◽

Cited By ~ 2

Author(s):

Chao Wang ◽

Ola Wallerman ◽

Maja-Louise Arendt ◽

Elisabeth Sundström ◽

Åsa Karlsson ◽

...

Keyword(s):

Reference Genome ◽

Cancer Genes ◽

Rna Seq ◽

Structural Variants ◽

Functional Elements ◽

High Quality ◽

Protein Coding ◽

Protein Coding Genes ◽

Long Read ◽

Genomic Regions

Here we present a new high-quality canine reference genome with the number of gaps reduced 41-fold, from 23,836 to 585. Analysis of existing and novel data (RNA-seq, miRNA-seq and ATAC-seq) revealed that a large proportion of the filled gaps harboured previously hidden elements, including genes, promoters and miRNAs. Short-read dark regions were detected, and genomic regions were completed, including the DLA, TCR and 366 cancer genes. 10x sequencing of 27 dogs uncovered a total of 22.1 million SNPs, indels and larger structural variants (SVs). Of these, 1.4% overlap with protein-coding genes and could provide a source of normal or aberrant phenotypic modifications.
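
The headline numbers in this abstract can be reproduced with simple arithmetic. The sketch below is illustrative only: the variable names are assumptions, and the variant count is the rounded 22.1 million from the abstract, so the derived count is approximate.

```python
# Values copied from the abstract; names are illustrative assumptions.
gaps_before = 23_836
gaps_after = 585

# Fold reduction in the number of assembly gaps.
fold_reduction = gaps_before / gaps_after

# Approximate number of variants overlapping protein-coding genes (1.4% of 22.1 million).
total_variants = 22_100_000
overlapping_variants = total_variants * 14 // 1000  # integer arithmetic for 1.4%

print(f"Gap reduction: {fold_reduction:.1f}-fold")
print(f"Variants overlapping protein-coding genes: ~{overlapping_variants:,}")
```

The ratio is about 40.7, which rounds to the 41-fold reduction quoted above.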

A Chromosome-Scale Genome Assembly Resource for Myriosclerotinia sulcatula Infecting Sedge Grass (Carex sp.)

Molecular Plant-Microbe Interactions ◽

10.1094/mpmi-03-20-0060-a ◽

2020 ◽

Vol 33 (7) ◽

pp. 880-883

Author(s):

Stefan Kusch ◽

Heba M. M. Ibrahim ◽

Catherine Zanchetta ◽

Celine Lopez-Roques ◽

Cecile Donnadieu ◽

...

Keyword(s):

Host Range ◽

Sclerotinia Sclerotiorum ◽

Genome Assembly ◽

Plant Pathogens ◽

Reference Genome ◽

Close Relative ◽

High Quality ◽

Protein Coding ◽

Protein Coding Genes ◽

Reference Genome Assembly

The fungus Myriosclerotinia sulcatula is a close relative of the notorious polyphagous plant pathogens Botrytis cinerea and Sclerotinia sclerotiorum but exhibits a host range restricted to plants from the Carex genus (Cyperaceae family). To date, there are no genomic resources available for fungi in the Myriosclerotinia genus. Here, we present a chromosome-scale reference genome assembly for M. sulcatula. The assembly contains 24 contigs with a total length of 43.53 Mbp, with scaffold N50 of 2,649.7 kbp and N90 of 1,133.1 kbp. BRAKER-predicted gene models were manually curated using WebApollo, resulting in 11,275 protein-coding genes that we functionally annotated. We provide a high-quality reference genome assembly and annotation for M. sulcatula as a resource for studying evolution and pathogenicity in fungi from the Sclerotiniaceae family.
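
For readers unfamiliar with the N50 and N90 statistics quoted above, the following is a minimal sketch of how they are commonly computed from a list of sequence lengths. It assumes the usual definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least x% of the assembly). The function name `nx` and the toy lengths are illustrative assumptions, not data from the M. sulcatula assembly.

```python
def nx(lengths, percent):
    """Return the Nx statistic for a list of sequence lengths.

    Sequences are taken from longest to shortest until their summed length
    reaches at least `percent` percent of the total; the length of the last
    sequence added is the Nx value.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length


# Toy example with a total length of 300 (arbitrary units).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50 = nx(toy_lengths, 50)
n90 = nx(toy_lengths, 90)
print(n50, n90)
```

In the toy example the two longest sequences already cover half of the total, so N50 is 70, while five sequences are needed to reach 90%, giving an N90 of 30.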

Genome Sequence of Fusarium oxysporum f. sp. conglutinans, the Etiological Agent of Cabbage Fusarium Wilt

Molecular Plant-Microbe Interactions ◽

10.1094/mpmi-08-20-0245-a ◽

2020 ◽

pp. MPMI-08-20-0245

Author(s):

Fangwei Yu ◽

Wei Zhang ◽

Shenyun Wang ◽

Hong Wang ◽

Li Yu ◽

...

Keyword(s):

Fusarium Oxysporum ◽

Fusarium Wilt ◽

Genome Sequence ◽

High Quality ◽

Total Size ◽

Protein Coding ◽

Long Read ◽

Race 1 ◽

High Quality Genome ◽

Open Access Article

Fusarium oxysporum f. sp. conglutinans is the causal agent of Fusarium wilt of cabbage (Brassica oleracea var. capitata L.), which results in severe yield loss. Here, we report a high-quality genome sequence of a race 1 strain (IVC-1) of F. oxysporum f. sp. conglutinans, which was assembled using a combination of PacBio long-read and Illumina short-read sequences. The assembled IVC-1 genome has a total size of 71.18 Mb, with a contig N50 length of 4.59 Mb, and encodes 23,374 predicted protein-coding genes. The high-quality genome of IVC-1 provides a valuable resource for facilitating our understanding of F. oxysporum f. sp. conglutinans–cabbage interaction. Copyright © 2020 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.

Construction of a new chromosome-scale, long-read reference genome assembly of the Syrian hamster, Mesocricetus auratus

10.1101/2021.07.05.451071 ◽

2021 ◽

Author(s):

R. Alan Harris ◽

Muthuswamy Raveendran ◽

Dustin T Lyfoung ◽

Fritz J Sedlazeck ◽

Medhat Mahmoud ◽

...

Keyword(s):

Genome Assembly ◽

Syrian Hamster ◽

Reference Genome ◽

Sequence Data ◽

Mesocricetus Auratus ◽

Protein Coding ◽

Protein Coding Genes ◽

Sequencing Technologies ◽

Long Read ◽

Short Read Sequence

Background: The Syrian hamster (Mesocricetus auratus) has been suggested as a useful mammalian model for a variety of diseases and infections, including infection with respiratory viruses such as SARS-CoV-2. The MesAur1.0 genome assembly was published in 2013 using whole-genome shotgun sequencing with short-read sequence data. More advanced sequencing technologies and assembly methods now permit the generation of near-complete genome assemblies with higher quality and higher continuity. Findings: Here, we report an improved assembly of the M. auratus genome (BCM_Maur_2.0) using Oxford Nanopore Technologies long-read sequencing to produce a chromosome-scale assembly. The total length of the new assembly is 2.46 Gbp, similar to the 2.50 Gbp length of a previous assembly of this genome, MesAur1.0. BCM_Maur_2.0 exhibits significantly improved continuity, with a scaffold N50 that is 6.7 times greater than that of MesAur1.0. Furthermore, 21,616 protein-coding genes and 10,459 noncoding genes were annotated in BCM_Maur_2.0, compared to 20,495 protein-coding genes and 4,168 noncoding genes in MesAur1.0. The new assembly also reduces the unresolved regions, as measured by nucleotide ambiguities: approximately 17.11% of bases in MesAur1.0 were unresolved, compared to 3.00% in BCM_Maur_2.0. Conclusions: Access to a more complete reference genome with improved accuracy and continuity will facilitate more detailed, comprehensive, and meaningful research results for a wide variety of future studies using Syrian hamsters as models.
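
To put the percentages in this abstract into absolute terms, here is a minimal sketch (not the authors' analysis) that converts the unresolved-base fractions into base pairs and tallies the annotation gains. The dictionary layout and function name are illustrative assumptions, and the inputs are the rounded figures from the abstract, so the outputs are approximate.

```python
# Rounded figures copied from the abstract; the structure and names are assumptions.
mesaur1 = {"length_bp": 2_500_000_000, "unresolved_pct": 17.11,
           "coding": 20_495, "noncoding": 4_168}
bcm_maur2 = {"length_bp": 2_460_000_000, "unresolved_pct": 3.00,
             "coding": 21_616, "noncoding": 10_459}


def unresolved_bp(assembly):
    """Approximate number of unresolved (ambiguous) bases in an assembly."""
    return assembly["length_bp"] * assembly["unresolved_pct"] / 100


# Differences in annotated gene counts between the two assemblies.
coding_gain = bcm_maur2["coding"] - mesaur1["coding"]
noncoding_gain = bcm_maur2["noncoding"] - mesaur1["noncoding"]

print(f"MesAur1.0 unresolved: ~{unresolved_bp(mesaur1) / 1e6:.1f} Mbp")
print(f"BCM_Maur_2.0 unresolved: ~{unresolved_bp(bcm_maur2) / 1e6:.1f} Mbp")
print(f"Additional protein-coding genes: {coding_gain}")
print(f"Additional noncoding genes: {noncoding_gain}")
```

Under these rounded inputs, the unresolved sequence drops from roughly 428 Mbp to roughly 74 Mbp, while the annotation gains 1,121 protein-coding and 6,291 noncoding genes.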

Chromosome-level assembly of Drosophila bifasciata reveals important karyotypic transition of the X chromosome

10.1101/847558 ◽

2019 ◽

Author(s):

Ryan Bracewell ◽

Anita Tran ◽

Kamalakar Chatla ◽

Doris Bachtrog

Keyword(s):

X Chromosome ◽

Genome Assembly ◽

De Novo ◽

Pericentromeric Region ◽

Species Group ◽

Chromosome 15 ◽

Protein Coding ◽

Protein Coding Genes ◽

Long Read ◽

Chromosome Level

The Drosophila obscura species group is one of the most studied clades of Drosophila and harbors multiple distinct karyotypes. Here we present a de novo genome assembly and annotation of D. bifasciata, a species which represents an important subgroup for which no high-quality chromosome-level genome assembly currently exists. We combined long-read sequencing (Nanopore) and Hi-C scaffolding to achieve a highly contiguous genome assembly approximately 193 Mb in size, with repetitive elements constituting 30.1% of the total length. Drosophila bifasciata harbors four large metacentric chromosomes and the small dot, and our assembly contains each chromosome in a single scaffold, including the highly repetitive pericentromeres, which were largely composed of Jockey and Gypsy transposable elements. We annotated a total of 12,821 protein-coding genes and comparisons of synteny with D. athabasca orthologs show that the large metacentric pericentromeric regions of multiple chromosomes are conserved between these species. Importantly, Muller A (X chromosome) was found to be metacentric in D. bifasciata and the pericentromeric region appears homologous to the pericentromeric region of the fused Muller A-AD (XL and XR) of pseudoobscura/affinis subgroup species. Our finding suggests a metacentric ancestral X fused to a telocentric Muller D and created the large neo-X (Muller A-AD) chromosome ∼15 MYA. We also confirm the fusion of Muller C and D in D. bifasciata and show that it likely involved a centromere-centromere fusion.

Chromosome-Level Assembly of Drosophila bifasciata Reveals Important Karyotypic Transition of the X Chromosome

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400922 ◽

2020 ◽

Vol 10 (3) ◽

pp. 891-897 ◽

Cited By ~ 3

Author(s):

Ryan Bracewell ◽

Anita Tran ◽

Kamalakar Chatla ◽

Doris Bachtrog

Keyword(s):

X Chromosome ◽

Genome Assembly ◽

De Novo ◽

Pericentromeric Region ◽

Species Group ◽

Chromosome 15 ◽

Protein Coding ◽

Protein Coding Genes ◽

Long Read ◽

Chromosome Level

The Drosophila obscura species group is one of the most studied clades of Drosophila and harbors multiple distinct karyotypes. Here we present a de novo genome assembly and annotation of D. bifasciata, a species which represents an important subgroup for which no high-quality chromosome-level genome assembly currently exists. We combined long-read sequencing (Nanopore) and Hi-C scaffolding to achieve a highly contiguous genome assembly approximately 193 Mb in size, with repetitive elements constituting 30.1% of the total length. Drosophila bifasciata harbors four large metacentric chromosomes and the small dot, and our assembly contains each chromosome in a single scaffold, including the highly repetitive pericentromeres, which were largely composed of Jockey and Gypsy transposable elements. We annotated a total of 12,821 protein-coding genes and comparisons of synteny with D. athabasca orthologs show that the large metacentric pericentromeric regions of multiple chromosomes are conserved between these species. Importantly, Muller A (X chromosome) was found to be metacentric in D. bifasciata and the pericentromeric region appears homologous to the pericentromeric region of the fused Muller A-AD (XL and XR) of pseudoobscura/affinis subgroup species. Our finding suggests a metacentric ancestral X fused to a telocentric Muller D and created the large neo-X (Muller A-AD) chromosome ∼15 MYA. We also confirm the fusion of Muller C and D in D. bifasciata and show that it likely involved a centromere-centromere fusion.

Genome sequence of the model rice variety KitaakeX

10.1101/653089 ◽

2019 ◽

Author(s):

Rashmi Jain ◽

Jerry Jenkins ◽

Shengqiang Shu ◽

Mawsheng Chern ◽

Joel A. Martin ◽

...

Keyword(s):

Oryza Sativa ◽

Rice Plant ◽

De Novo ◽

Rice Variety ◽

High Quality ◽

Protein Coding ◽

Genomic Variations ◽

Protein Coding Genes ◽

Gene Annotations ◽

High Quality Genome

Here, we report the de novo genome sequencing and analysis of Oryza sativa ssp. japonica variety KitaakeX, a Kitaake plant carrying the rice XA21 immune receptor. Our KitaakeX sequence assembly contains 377.6 Mb, consisting of 33 scaffolds (476 contigs) with a contig N50 of 1.4 Mb. Complementing the assembly are detailed gene annotations of 35,594 protein-coding genes. We identified 331,335 genomic variations between KitaakeX and Nipponbare (ssp. japonica), and 2,785,991 variations between KitaakeX and Zhenshan97 (ssp. indica). We also compared Kitaake resequencing reads to the KitaakeX assembly and identified 219 small variations. The high-quality genome of the model rice plant KitaakeX will accelerate rice functional genomics.

Near Chromosome-Level Genome Assembly and Annotation of Rhodotorula babjevae Strains Reveals High Intraspecific Divergence

10.20944/preprints202111.0517.v1 ◽

2021 ◽

Author(s):

Giselle C. Martin-Hernandez ◽

Bettina Müller ◽

Christian Brandt ◽

Martin Hölzer ◽

Adrian Viehweger ◽

...

Keyword(s):

Gc Content ◽

Pairwise Identity ◽

Protein Coding ◽

Intraspecific Divergence ◽

Protein Coding Genes ◽

Fungal Evolution ◽

Lignocellulose Hydrolysate ◽

Long Read ◽

Genome Assemblies ◽

High Quality Genome

The genus Rhodotorula includes basidiomycetous oleaginous yeast species. R. babjevae can produce compounds of biotechnological interest such as lipids, carotenoids and biosurfactants from low value substrates such as lignocellulose hydrolysate. High-quality genome assemblies are needed to develop genetic tools and to understand fungal evolution and genetics. Here, we combined short- and long-read sequencing to resolve the genomes of two R. babjevae strains, CBS 7808 (type strain) and DBVPG 8058 at chromosomal level. Both genomes have a size of 21 Mbp and a GC content of 68.2%. Allele frequency analysis indicated tetraploidy in both strains. They harbor 21 putative chromosomes with sizes ranging from 0.4 to 2.4 Mb. In both assemblies, the mitochondrial genome was recovered in a single contig, which shared 97% pairwise identity. The pairwise identity between the majority of chromosomes ranges from 82% to 87%. We found indications for strain-specific extrachromosomal endogenous DNA. 7,591 protein-coding genes and 7,607 associated transcripts were annotated in CBS 7808 and 7,481 protein-coding genes and 7,516 associated transcripts in DBVPG 8058. CBS 7808 has accumulated a higher number of tandem duplications than DBVPG 8058. We identified large translocation events between putative chromosomes and a high genetic divergence between the two strains.

High-quality genome assembly and high-density genetic map of asparagus bean

10.1101/521179 ◽

2019 ◽

Author(s):

Qiuju Xia ◽

Ru Zhang ◽

Xuemei Ni ◽

Lei Pan ◽

Yangzi Wang ◽

...

Keyword(s):

Genome Assembly ◽

Agronomic Traits ◽

High Density ◽

High Quality ◽

Protein Coding ◽

Protein Coding Genes ◽

Economically Valuable Traits ◽

Sequencing Strategy ◽

Asparagus Bean ◽

High Quality Genome

Asparagus bean (Vigna unguiculata ssp. sesquipedalis), known for its very long and tender green pods, is an important vegetable crop widely grown in developing countries. Despite its agricultural and economic value, asparagus bean lacks a high-quality genome assembly to support breeding for novel agronomic traits. In this study, we report a high-quality 632.8 Mb assembly of asparagus bean based on a whole-genome shotgun sequencing strategy. We also generated a high-density linkage map for asparagus bean, which helped anchor 94.42% of the scaffolds into 11 pseudo-chromosomes. A total of 42,609 protein-coding genes and 3,579 non-protein-coding genes were predicted from the assembly. Taken together, these genomic resources for asparagus bean will facilitate the investigation of economically valuable traits in a variety of legume species, so that the cultivation of these plants can help combat protein and energy malnutrition in the developing world.
