Genome Size Estimation and Full-length Transcriptome of Sphingonotus tsinlingensis: Genetic Background for the Drought-Adapted Grasshopper

Abstract: Sphingonotus Fieber, 1852 (Orthoptera: Acrididae) is a species-rich grasshopper genus with ~146 species. All species of this genus prefer dry environments such as desert, steppe, sand, and stony benchland. This genomic study aimed to understand the evolution and ecology of these grasshopper species. Here, the genome size of Sphingonotus tsinlingensis was estimated using flow cytometry and the first high-quality full-length transcriptome of this species is presented, which may serve as a reference genetic resource for the drought-adapted grasshopper species of Sphingonotus Fieber. The genome size of Sphingonotus tsinlingensis was ~12.8 Gb. Based on the 146.98 Gb of PacBio isoform sequencing data, 221.47 Mb of full-length transcripts were assembled. Among these transcripts, 88,693 non-redundant isoforms were identified with an average length of 2,497 bp and an N50 value of 2,726 bp, which was much longer than in previous grasshopper transcriptome assemblies. A total of 48,502 protein coding sequences were determined, and 37,569 were annotated in public gene function databases. A total of 36,488 simple tandem repeats, 12,765 long non-coding RNAs, and 414 transcription factors were also identified. According to gene functions, 70 heat shock proteins and 61 P450 genes that may correspond to drought adaptation of S. tsinlingensis were identified. The genome of Sphingonotus tsinlingensis is an ultra-large and complex genome. Full-length transcriptome sequencing is an ideal strategy for genomic research. This is the first full-length transcriptome of the genus Sphingonotus. The assembly parameters were better than those of all known grasshopper transcriptomes. This full-length transcriptome may be used to understand its genetic background and the evolution and ecology of grasshoppers.

Genome Size Estimation and Full-Length Transcriptome of Sphingonotus tsinlingensis: Genetic Background of a Drought-Adapted Grasshopper

Frontiers in Genetics ◽

10.3389/fgene.2021.678625 ◽

2021 ◽

Vol 12 ◽

Author(s):

Lu Zhao ◽

Hang Wang ◽

Ping Li ◽

Kuo Sun ◽

De-Long Guan ◽

...

Keyword(s):

Genome Size ◽

Genetic Background ◽

Tandem Repeats ◽

Full Length ◽

Arid Environments ◽

Size Estimation ◽

Sequencing Data ◽

Protein Coding ◽

Grasshopper Species ◽

Hsp Genes

Sphingonotus Fieber, 1852 (Orthoptera: Acrididae), is a grasshopper genus comprising approximately 170 species, all of which prefer dry environments such as deserts, steppes, and stony benchlands. In this study, we aimed to examine the adaptation of grasshopper species to arid environments. The genome size of Sphingonotus tsinlingensis was estimated using flow cytometry, and the first high-quality full-length transcriptome of this species was produced. The genome size of S. tsinlingensis is approximately 12.8 Gb. Based on 146.98 Gb of PacBio sequencing data, 221.47 Mb of full-length transcripts were assembled. Among these, 88,693 non-redundant isoforms were identified with an N50 value of 2,726 bp, which was markedly longer than in previous grasshopper transcriptome assemblies. In total, 48,502 protein-coding sequences were identified, and 37,569 were annotated using public gene function databases. Moreover, 36,488 simple tandem repeats, 12,765 long non-coding RNAs, and 414 transcription factors were identified. According to gene functions, 61 cytochrome P450 (CYP450) and 66 heat shock protein (HSP) genes, which may be associated with drought adaptation of S. tsinlingensis, were identified. We compared the transcriptome of S. tsinlingensis with those of two other grasshopper species that are less tolerant to drought, namely Mongolotettix japonicus and Gomphocerus licenti. We observed that the expression of CYP450 and HSP genes was higher in S. tsinlingensis. We produced the first full-length transcriptome of a Sphingonotus species that has an ultra-large genome. The assembly characteristics were better than those of all known grasshopper transcriptomes. This full-length transcriptome may thus be used to understand the genetic background and evolution of grasshoppers.
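
The N50 value quoted in this abstract is the length L such that transcripts of length L or longer together hold at least half of all assembled bases. As a minimal sketch of that definition, the function below computes N50 from a list of sequence lengths. The function name and the toy lengths are illustrative assumptions and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the sequence at which the running total,
    accumulated from longest to shortest, first reaches half of the
    summed length of all sequences.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy data only (hypothetical lengths in bp, not values from the paper).
toy_lengths = [5000, 4000, 3000, 2000, 1000]
print(n50(toy_lengths))
```

Because N50 is weighted by bases rather than by sequence count, a few long isoforms can raise it well above the average length.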

A high-quality assembly of the nine-spined stickleback (Pungitius pungitius) genome

Genome Biology and Evolution ◽

10.1093/gbe/evz240 ◽

2019 ◽

Cited By ~ 3

Author(s):

Srinidhi Varadharajan ◽

Pasi Rastas ◽

Ari Löytynoja ◽

Michael Matschiner ◽

Federico C F Calboli ◽

...

Keyword(s):

Gasterosteus Aculeatus ◽

Tandem Repeats ◽

Copy Number Variations ◽

Repetitive Elements ◽

Genomic Research ◽

Sequencing Data ◽

Pungitius Pungitius ◽

High Quality ◽

Total Size ◽

Fish Family

Abstract: The Gasterosteidae fish family hosts several species that are important models for eco-evolutionary, genetic and genomic research. In particular, a wealth of genetic and genomic data has been generated for the three-spined stickleback (Gasterosteus aculeatus), the ‘ecology’s supermodel’, while the genomic resources for the nine-spined stickleback (Pungitius pungitius) have remained relatively scarce. Here, we report a high-quality chromosome-level genome assembly of P. pungitius consisting of 5,303 contigs (N50 = 1.2 Mbp) with a total size of 521 Mbp. These contigs were mapped to 21 linkage groups using a high-density linkage map, yielding a final assembly with 98.5% BUSCO completeness. A total of 25,062 protein-coding genes were annotated, and ca. 23% of the assembly was found to consist of repetitive elements. A comprehensive analysis of repetitive elements uncovered centromeric-specific tandem repeats and provided insights into the evolution of retrotransposons. A multigene phylogenetic analysis inferred a divergence time of about 26 million years (MYA) between nine- and three-spined sticklebacks, which is far older than the commonly assumed estimate of 13 MYA. Compared to the three-spined stickleback, we identified an additional duplication of several genes in the hemoglobin cluster. Sequencing data from populations adapted to different environments indicated potential copy number variations in hemoglobin genes. Furthermore, genome-wide synteny comparisons between three- and nine-spined sticklebacks identified chromosomal rearrangements underlying the karyotypic differences between the two species. The high-quality chromosome-scale assembly of the nine-spined stickleback genome obtained with long-read sequencing technology provides a crucial resource for comparative and population genomic investigations of stickleback fishes and teleosts.

Full-length SMRT transcriptome sequencing and microsatellite characterization in Paulownia catalpifolia

Scientific Reports ◽

10.1038/s41598-021-87538-8 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Yanzhi Feng ◽

Yang Zhao ◽

Jiajia Zhang ◽

Baoping Wang ◽

Chaowei Yang ◽

...

Keyword(s):

Single Molecule ◽

Average Length ◽

Full Length ◽

Timber Species ◽

Sequencing Data ◽

Genetic Studies ◽

Ssr Loci ◽

Average Distribution ◽

Nucleotide Repeats ◽

Distribution Distance

Abstract: Paulownia catalpifolia is an important, fast-growing timber species known for its high density, color and texture. However, few transcriptomic and genetic studies have been conducted in P. catalpifolia. In this study, single-molecule real-time sequencing technology was applied to obtain the full-length transcriptome of P. catalpifolia leaves treated with varying degrees of drought stress. The sequencing data were then used to search for microsatellites, or simple sequence repeats (SSRs). A total of 28.83 Gb of data were generated, 25,969 high-quality (HQ) transcripts with an average length of 1624 bp were acquired after removing the redundant reads, and 25,602 HQ transcripts (98.59%) were annotated using public databases. Among the HQ transcripts, 16,722 intact coding sequences, 149 long non-coding RNAs and 179 alternative splicing events were predicted. A total of 7367 SSR loci were distributed throughout 6293 HQ transcripts, of which 763 were complex SSRs and 6604 were complete SSRs. The SSR appearance frequency was 28.37%, and the average distribution distance was 5.59 kb. Among the 6604 complete SSR loci, 1–3 nucleotide repeats were dominant, occupying 97.85% of the total SSR loci, of which mono-, di- and tri-nucleotide repeats were 44.68%, 33.86% and 19.31%, respectively. We detected 112 repeat motifs, of which A/T (42.64%), AG/CT (12.22%), GA/TC (9.63%), GAA/TTC (1.57%) and CCA/TGG (1.54%) were most common in mono-, di- and tri-nucleotide repeats, respectively. The length of the repeat SSR motifs was 10–88 bp, and 4997 (75.67%) were ≤ 20 bp. This study provides a novel full-length transcriptome reference for P. catalpifolia and will facilitate the identification of germplasm resources and breeding of new drought-resistant P. catalpifolia varieties.
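
To illustrate what a search for mono-, di- and tri-nucleotide SSRs involves, the following is a minimal sketch that scans a sequence for perfect tandem repeats with a regular expression. The minimum repeat counts (10, 6 and 5) and the function name `find_ssrs` are assumptions made for this example. The abstract does not state the tool or thresholds the authors used.

```python
import re

# Minimum number of repeat units per motif length.
# These thresholds are illustrative assumptions, not the paper's settings.
MIN_REPEATS = {1: 10, 2: 6, 3: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for motif_len, min_count in sorted(min_repeats.items()):
        # A motif of fixed length followed by at least (min_count - 1) copies.
        pattern = re.compile(
            r"([ACGT]{%d})\1{%d,}" % (motif_len, min_count - 1)
        )
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip di-/tri-nucleotide motifs that are really homopolymers.
            if motif_len > 1 and len(set(motif)) == 1:
                continue
            count = len(match.group(0)) // motif_len
            hits.append((match.start(), motif, count))
    return sorted(hits)


# Toy sequence built for the example (not real transcript data).
toy = "GG" + "A" * 12 + "CC" + "AG" * 7 + "TT" + "GAA" * 5 + "C"
for start, motif, count in find_ssrs(toy):
    print(start, motif, count)
```

This sketch reports only perfect repeats of one to three nucleotides. Compound or interrupted SSRs, such as the complex loci counted in the abstract, would need additional logic.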

Global Survey of the Full-Length Cabbage Transcriptome (Brassica oleracea Var. capitata L.) Reveals Key Alternative Splicing Events Involved in Growth and Disease Response

International Journal of Molecular Sciences ◽

10.3390/ijms221910443 ◽

2021 ◽

Vol 22 (19) ◽

pp. 10443

Author(s):

Yong Wang ◽

Jialei Ji ◽

Long Tong ◽

Zhiyuan Fang ◽

Limei Yang ◽

...

Keyword(s):

Alternative Splicing ◽

Brassica Oleracea ◽

Single Molecule ◽

Alternative Polyadenylation ◽

Full Length ◽

Genomic Research ◽

Accurate Information ◽

Sequencing Data ◽

Specific Expression ◽

Global Survey

Cabbage (Brassica oleracea L. var. capitata L.) is an important vegetable crop cultivated around the world. Previous studies of cabbage gene transcripts were primarily based on next-generation sequencing (NGS) technology which cannot provide accurate information concerning transcript assembly and structure analysis. To overcome these issues and analyze the whole cabbage transcriptome at the isoform level, PacBio RS II Single-Molecule Real-Time (SMRT) sequencing technology was used for a global survey of the full-length transcriptomes of five cabbage tissue types (root, stem, leaf, flower, and silique). A total of 77,048 isoforms, capturing 18,183 annotated genes, were discovered from the sequencing data generated through SMRT. The patterns of both alternative splicing (AS) and alternative polyadenylation (APA) were comprehensively analyzed. In total, we detected 13,468 genes which had isoforms containing APA sites and 8978 genes which underwent AS events. Moreover, 5272 long non-coding RNAs (lncRNAs) were discovered, and most exhibited tissue-specific expression. In total, 3147 transcription factors (TFs) were detected and 10 significant gene co-expression network modules were identified. In addition, we found that Fusarium wilt, black rot and clubroot infection significantly influenced AS in resistant cabbage. In summary, this study provides abundant cabbage isoform transcriptome data, which promotes reannotation of the cabbage genome, deepens our understanding of their post-transcriptional regulation mechanisms, and can be used for future functional genomic research.

Genome sequencing of the nine-spined stickleback (Pungitius pungitius) provides insights into chromosome evolution

10.1101/741751 ◽

2019 ◽

Cited By ~ 1

Author(s):

Srinidhi Varadharajan ◽

Pasi Rastas ◽

Ari Löytynoja ◽

Michael Matschiner ◽

Federico C. F. Calboli ◽

...

Keyword(s):

Gasterosteus Aculeatus ◽

Tandem Repeats ◽

Copy Number Variations ◽

Repetitive Elements ◽

Genomic Research ◽

Sequencing Data ◽

Pungitius Pungitius ◽

High Quality ◽

Total Size ◽

Fish Family

Abstract: The Gasterosteidae fish family hosts several species that are important models for eco-evolutionary, genetic and genomic research. In particular, a wealth of genetic and genomic data have been generated for the three-spined stickleback (Gasterosteus aculeatus), the ‘ecology’s supermodel’, while the genomic resources for the nine-spined stickleback (Pungitius pungitius) have remained relatively scarce. Here, we report a high-quality chromosome-level genome assembly of P. pungitius consisting of 5,303 contigs (N50 = 1.2 Mbp) with a total size of 521 Mbp. These contigs were mapped to 21 linkage groups using a high-density linkage map, yielding a final assembly with 98.5% BUSCO completeness. A total of 25,062 protein-coding genes were annotated, and ca. 23% of the assembly was found to consist of repetitive elements. A comprehensive analysis of repetitive elements uncovered centromeric-specific tandem repeats and provided insights into the evolution of retrotransposons. A multigene phylogenetic analysis inferred a divergence time of about 26 million years (MYA) between nine- and three-spined sticklebacks, which is far older than the commonly assumed estimate of 13 MYA. Compared to the three-spined stickleback, we identified an additional duplication of several genes in the hemoglobin cluster. Sequencing data from populations adapted to different environments indicated potential copy number variations in hemoglobin genes. Furthermore, genome-wide synteny comparisons between three- and nine-spined sticklebacks identified chromosomal rearrangements underlying the karyotypic differences between the two species. The high-quality chromosome-scale assembly of the nine-spined stickleback genome obtained with long-read sequencing technology provides a crucial resource for comparative and population genomic investigations of stickleback fishes and teleosts.

Rooibos (Aspalathus linearis) Genome Size Estimation Using Flow Cytometry and K-Mer Analyses

Plants ◽

10.3390/plants9020270 ◽

2020 ◽

Vol 9 (2) ◽

pp. 270 ◽

Cited By ~ 1

Author(s):

Yamkela Mgwatyu ◽

Allison Anne Stander ◽

Stephan Ferreira ◽

Wesley Williams ◽

Uljana Hesse

Keyword(s):

Flow Cytometry ◽

Genome Size ◽

Internal Standard ◽

Size Estimation ◽

Sequencing Data ◽

Aspalathus Linearis ◽

Paired End Sequencing ◽

Size Estimates ◽

Genome Projects ◽

Isolation Buffer

Plant genomes provide information on biosynthetic pathways involved in the production of industrially relevant compounds. Genome size estimates are essential for the initiation of genome projects. The genome size of rooibos (Aspalathus linearis species complex) was estimated using DAPI flow cytometry and k-mer analyses. For flow cytometry, a suitable nuclei isolation buffer, plant tissue and a transport medium for rooibos ecotype samples collected from distant locations were identified. When using radicles from commercial rooibos seedlings, Woody Plant Buffer and Vicia faba as an internal standard, the flow cytometry-estimated genome size of rooibos was 1.24 ± 0.01 Gbp. The estimates for eight wild rooibos growth types did not deviate significantly from this value. K-mer analysis was performed using Illumina paired-end sequencing data from one commercial rooibos genotype. For biocomputational estimation of the genome size, four k-mer analysis methods were investigated: a standard formula and three popular programs (BBNorm, GenomeScope, and FindGSE). GenomeScope estimates were strongly affected by parameter settings, specifically CovMax. When using the complete k-mer frequency histogram (up to 9 × 10^5), the programs did not deviate significantly, estimating an average rooibos genome size of 1.03 ± 0.04 Gbp. Differences between the flow cytometry and biocomputational estimates are discussed.
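
One widely used form of the k-mer approach divides the total number of counted k-mers by the depth at the main peak of the k-mer frequency histogram. The sketch below applies that idea to a toy histogram. The abstract does not spell out its "standard formula", so this form, the `error_cutoff` parameter and the function name are assumptions for illustration only.

```python
def estimate_genome_size(histogram, error_cutoff=5):
    """Estimate genome size from a k-mer frequency histogram.

    histogram maps k-mer depth (how often a k-mer occurs) to the number
    of distinct k-mers seen at that depth. Depths below error_cutoff are
    treated as sequencing errors and ignored (an assumed heuristic).
    """
    usable = {d: c for d, c in histogram.items() if d >= error_cutoff}
    if not usable:
        raise ValueError("no histogram bins at or above error_cutoff")
    # Depth with the most distinct k-mers: the main (homozygous) peak.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences contributed by the retained bins.
    total_kmers = sum(depth * count for depth, count in usable.items())
    return total_kmers / peak_depth


# Toy histogram (hypothetical counts, not data from the study).
toy_histogram = {1: 1000, 2: 200, 18: 50, 19: 80, 20: 100, 21: 80, 22: 50, 40: 10}
print(estimate_genome_size(toy_histogram))
```

Keeping the high-depth bins matters because repeated sequence sits in the tail of the histogram. This is consistent with the abstract's remark that estimates depended on how much of the histogram was included.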

Genome Size Diversity in Rare, Endangered, and Protected Orchids in Poland

Genes ◽

10.3390/genes12040563 ◽

2021 ◽

Vol 12 (4) ◽

pp. 563

Author(s):

Monika Rewers ◽

Iwona Jedrzejczyk ◽

Agnieszka Rewicz ◽

Anna Jakubska-Busse

Keyword(s):

Genome Size ◽

Species Identification ◽

Dna Content ◽

Size Estimation ◽

Orchid Species ◽

Infraspecific Taxon ◽

2C Dna Content ◽

Plant Families ◽

Liparis Loeselii ◽

Identification And Characterization

Orchidaceae is one of the largest and most widespread plant families, with many species threatened with extinction. However, genome sizes are known for only about 1.5% of orchids so far. The aim of this study was to estimate the genome size of 15 species and one infraspecific taxon of endangered and protected orchids growing wild in Poland to assess their variability and to develop an additional criterion useful in orchid species identification and characterization. Flow cytometric genome size estimation revealed that the investigated orchid species possessed intermediate, large, and very large genomes. Liparis loeselii possessed the smallest 2C DNA content (14.15 pg), and Cypripedium calceolus the largest (82.10 pg). It was confirmed that genome size is characteristic of the subfamily. Additionally, for four species (Epipactis albensis, Ophrys insectifera, Orchis mascula, and Orchis militaris) and one infraspecific taxon (Epipactis purpurata f. chlorophylla), the 2C DNA content was estimated for the first time. Genome size estimation by flow cytometry proved to be a useful auxiliary method for quick orchid species identification and characterization.
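
The 2C values in this abstract are given in picograms, whereas other entries in this list report genome size in base pairs. A minimal sketch of the usual conversion follows, assuming the commonly used factor of 1 pg ≈ 978 Mbp and halving the 2C value to obtain 1C. The factor and the function name are assumptions of this example and are not stated in the abstract.

```python
# Commonly used conversion factor: 1 pg of DNA is about 0.978 Gbp.
GBP_PER_PG = 0.978


def two_c_pg_to_one_c_gbp(two_c_pg):
    """Convert a 2C DNA content in picograms to a 1C genome size in Gbp."""
    if two_c_pg <= 0:
        raise ValueError("DNA content must be positive")
    one_c_pg = two_c_pg / 2  # 2C is the diploid nuclear content
    return one_c_pg * GBP_PER_PG


# 2C values quoted in the abstract above.
for species, two_c in [("Liparis loeselii", 14.15), ("Cypripedium calceolus", 82.10)]:
    print(f"{species}: {two_c_pg_to_one_c_gbp(two_c):.2f} Gbp (1C)")
```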

Common Treatment, Common Variant: Evolutionary Prediction of Functional Pharmacogenomic Variants

Journal of Personalized Medicine ◽

10.3390/jpm11020131 ◽

2021 ◽

Vol 11 (2) ◽

pp. 131

Author(s):

Laura B. Scheinfeldt ◽

Andrew Brangan ◽

Dara M. Kusic ◽

Sudhir Kumar ◽

Neda Gharani

Keyword(s):

In Silico ◽

Drug Efficacy ◽

Common Variant ◽

Genomic Research ◽

Whole Genome Sequencing Data ◽

Mendelian Disease ◽

Allele Frequency Distribution ◽

Sequencing Data ◽

Patient Race ◽

The Impact

Pharmacogenomics holds the promise of personalized drug efficacy optimization and drug toxicity minimization. Much of the research conducted to date, however, suffers from an ascertainment bias towards European participants. Here, we leverage publicly available, whole genome sequencing data collected from global populations, evolutionary characteristics, and annotated protein features to construct a new in silico machine learning pharmacogenetic identification method called XGB-PGX. When applied to pharmacogenetic data, XGB-PGX outperformed all existing prediction methods and identified over 2000 new pharmacogenetic variants. While there are modest pharmacogenetic allele frequency distribution differences across global population samples, the most striking distinction is between the relatively rare putatively neutral pharmacogene variants and the relatively common established and newly predicted functional pharmacogenetic variants. Our findings therefore support a focus on individual patient pharmacogenetic testing rather than on clinical presumptions about patient race, ethnicity, or ancestral geographic residence. We further encourage that more attention be given to the impact of common variation on drug response and propose a new ‘common treatment, common variant’ perspective for pharmacogenetic prediction that is distinct from the types of variation that underlie complex and Mendelian disease. XGB-PGX has identified many new pharmacovariants that are present across all global communities; however, communities that have been underrepresented in genomic research are likely to benefit the most from XGB-PGX’s in silico predictions.

First estimates of genome size in ribbon worms (phylum Nemertea) using flow cytometry and Feulgen image analysis densitometry

Canadian Journal of Zoology ◽

10.1139/cjz-2014-0068 ◽

2014 ◽

Vol 92 (10) ◽

pp. 847-851 ◽

Cited By ~ 3

Author(s):

Kelly L. Mulligan ◽

Terra C. Hiebert ◽

Nicholas W. Jeffery ◽

T. Ryan Gregory

Keyword(s):

Flow Cytometry ◽

Image Analysis ◽

Body Size ◽

Genome Size ◽

Positive Relationship ◽

Nuclear Dna ◽

Nuclear Dna Content ◽

Size Estimation ◽

Size Diversity ◽

Size Estimates

Ribbon worms (phylum Nemertea) are among several animal groups that have been overlooked in past studies of genome-size diversity. Here, we report genome-size estimates for eight species of nemerteans, including representatives of the major lineages in the phylum. Genome sizes in these species ranged more than fivefold, and there was some indication of a positive relationship with body size. Somatic endopolyploidy also appears to be common in these animals. Importantly, this study demonstrates that both of the most common methods of genome-size estimation (flow cytometry and Feulgen image analysis densitometry) can be used to assess genome size in ribbon worms, thereby facilitating additional efforts to investigate patterns of variability in nuclear DNA content in this phylum.

GENOME SIZE ESTIMATION IN TWO POPULATIONS OF THE NORTHERN CHILEAN SCALLOP, ARGOPECTEN PURPURATUS, USING FLUORESCENCE IMAGE ANALYSIS

Journal of Shellfish Research ◽

10.2983/0730-8000(2005)24[55:gseitp]2.0.co;2 ◽

2005 ◽

Vol 24 (1) ◽

pp. 55-60 ◽

Cited By ~ 6

Keyword(s):

Image Analysis ◽

Genome Size ◽

Size Estimation ◽

Fluorescence Image ◽

Argopecten Purpuratus ◽

Two Populations
