Genome Size Estimation and Full-Length Transcriptome of Sphingonotus tsinlingensis: Genetic Background of a Drought-Adapted Grasshopper

2021 ◽  
Vol 12 ◽  
Author(s):  
Lu Zhao ◽  
Hang Wang ◽  
Ping Li ◽  
Kuo Sun ◽  
De-Long Guan ◽  
...  

Sphingonotus Fieber, 1852 (Orthoptera: Acrididae), is a grasshopper genus comprising approximately 170 species, all of which prefer dry environments such as deserts, steppes, and stony benchlands. In this study, we aimed to examine the adaptation of grasshopper species to arid environments. The genome size of Sphingonotus tsinlingensis was estimated using flow cytometry, and the first high-quality full-length transcriptome of this species was produced. The genome size of S. tsinlingensis is approximately 12.8 Gb. Based on 146.98 Gb of PacBio sequencing data, 221.47 Mb of full-length transcripts were assembled. Among these, 88,693 non-redundant isoforms were identified with an N50 value of 2,726 bp, which is markedly longer than in previous grasshopper transcriptome assemblies. In total, 48,502 protein-coding sequences were identified, and 37,569 were annotated using public gene function databases. Moreover, 36,488 simple tandem repeats, 12,765 long non-coding RNAs, and 414 transcription factors were identified. According to gene functions, 61 cytochrome P450 (CYP450) and 66 heat shock protein (HSP) genes, which may be associated with drought adaptation of S. tsinlingensis, were identified. We compared the transcriptome of S. tsinlingensis with those of two other grasshopper species that are less tolerant of drought, namely Mongolotettix japonicus and Gomphocerus licenti. We observed that the expression of CYP450 and HSP genes was higher in S. tsinlingensis. We produced the first full-length transcriptome of a Sphingonotus species, which has an ultra-large genome. The assembly characteristics were better than those of all known grasshopper transcriptomes. This full-length transcriptome may thus be used to understand the genetic background and evolution of grasshoppers.
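The N50 statistic quoted above can be illustrated with a minimal Python sketch. It assumes the usual definition (the length L such that sequences of length L or longer hold at least half of the total bases); the function name and the toy lengths are hypothetical and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumed definition: walk the lengths from longest to shortest and
    return the length at which the running sum first reaches half of
    the total number of bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy isoform lengths in bp (illustrative only, not data from the paper).
toy_lengths = [500, 1200, 2000, 2700, 3100, 4000]
print(n50(toy_lengths))
```

Because N50 is weighted by bases rather than by sequence count, a few long isoforms raise it more than many short ones, which is why it is usually reported next to the average length.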

2021 ◽  
Author(s):  
Lu Zhao ◽  
Hang Wang ◽  
Le Wu ◽  
Kuo Sun ◽  
De-Long Guan ◽  
...  

Sphingonotus Fieber, 1852 (Orthoptera: Acrididae) is a species-rich grasshopper genus with ~146 species. All species of this genus prefer dry environments, such as deserts, steppes, sand, and stony benchlands. This genomic study aimed to understand the evolution and ecology of these grasshopper species. Here, the genome size of Sphingonotus tsinlingensis was estimated using flow cytometry, and the first high-quality full-length transcriptome of this species is presented, which may serve as a reference genetic resource for the drought-adapted grasshopper species of Sphingonotus Fieber. The genome size of S. tsinlingensis was ~12.8 Gb. Based on 146.98 Gb of PacBio isoform sequencing data, 221.47 Mb of full-length transcripts were assembled. Among these transcripts, 88,693 non-redundant isoforms were identified with an average length of 2,497 bp and an N50 value of 2,726 bp, which is much longer than in previous grasshopper transcriptome assemblies. A total of 48,502 protein-coding sequences were determined, and 37,569 were annotated in public gene function databases. A total of 36,488 simple tandem repeats, 12,765 long non-coding RNAs, and 414 transcription factors were also identified. According to gene functions, 70 heat shock protein genes and 61 P450 genes that may be related to drought adaptation of S. tsinlingensis were identified. The genome of S. tsinlingensis is ultra-large and complex. Full-length transcriptome sequencing is an ideal strategy for genomic research on such species. This is the first full-length transcriptome of the genus Sphingonotus. The assembly parameters were better than those of all known grasshopper transcriptomes. This full-length transcriptome may be used to understand the genetic background of the species and the evolution and ecology of grasshoppers.


2017 ◽  
Author(s):  
Anuj Kumar ◽  
Aditi Chauhan ◽  
Mansi Sharma ◽  
Sai Kumar Kompelli ◽  
Vijay Gahlaut ◽  
...  

Simple sequence repeats (SSRs), also known as microsatellites, are short tandem repeats of DNA sequences that are 1-6 bp long. In plants, SSRs are an important class of molecular markers because of their hypervariable and co-dominant nature, which makes them useful both for genetic studies and for marker-assisted breeding. SSRs are widespread throughout the genome of an organism, so a large number of SSR datasets are available, most of them from either protein-coding regions or untranslated regions. Only recently has their occurrence within microRNA (miRNA) genes received attention. miRNAs are a class of non-coding RNAs (ncRNAs), 19-22 nucleotides (nt) in length, that play an important role in regulating gene expression in plants under different biotic and abiotic stresses. In this communication, we describe the results of a study in which miRNA-SSRs in full-length pre-miRNA sequences of Arabidopsis thaliana were mined. The sequences were retrieved using annotations available at EnsemblPlants, and miRNA-SSR flanking primers designed with the BatchPrimer3 server were found to be well distributed. Our analysis shows that miRNA-SSRs are relatively rare in protein-coding regions but abundant in non-coding regions. All of the 147 observed di-, tri-, tetra-, penta-, and hexanucleotide SSRs were located in non-coding regions across all five chromosomes of A. thaliana. While we confirm that miRNA-SSRs are commonly spread across the full-length pre-miRNAs, we envisage that such studies will allow us to identify new markers for breeding studies.
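As a rough illustration of what mining di- to hexanucleotide SSRs involves, the following Python sketch scans a sequence with regular expressions. It is not the pipeline used in the study: the minimum repeat counts in `MIN_REPEATS`, the function names, and the toy sequence are all assumptions made for this example.

```python
import re

# Hypothetical minimum copy numbers per motif length (di- to hexanucleotide).
# Real SSR-mining tools use their own, configurable thresholds.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) for each perfect SSR, ordered by start."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_count in sorted(min_repeats.items()):
        # One motif of motif_len bases followed by at least min_count - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_count - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # e.g. "ATAT" is reported as the dinucleotide "AT"
            copies = len(match.group(0)) // motif_len
            hits.append((match.start(), motif, copies))
    return sorted(hits)


# Toy sequence (illustrative only): an (AT)7 and a (GAA)5 repeat.
toy = "GGC" + "AT" * 7 + "CCG" + "GAA" * 5 + "TT"
print(find_ssrs(toy))
```

The primitive-motif check keeps a dinucleotide run from being counted again as a tetra- or hexanucleotide repeat, which would otherwise inflate the SSR totals.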


2019 ◽  
Vol 2019 ◽  
pp. 1-14 ◽  
Author(s):  
Yingnan Chen ◽  
Nan Hu ◽  
Huaitong Wu

Salix wilsonii is an important ornamental willow tree widely distributed in China. In this study, an integrated circular chloroplast genome was reconstructed for S. wilsonii based on the chloroplast reads screened from the whole-genome sequencing data generated with the PacBio RSII platform. The obtained pseudomolecule was 155,750 bp long and had a typical quadripartite structure, comprising a large single copy region (LSC, 84,638 bp) and a small single copy region (SSC, 16,282 bp) separated by two inverted repeat regions (IR, 27,415 bp). The S. wilsonii chloroplast genome encoded 115 unique genes, including four rRNA genes, 30 tRNA genes, 78 protein-coding genes, and three pseudogenes. Repetitive sequence analysis identified 32 tandem repeats, 22 forward repeats, two reverse repeats, and five palindromic repeats. Additionally, a total of 118 perfect microsatellites were detected, with mononucleotide repeats being the most common (89.83%). By comparing the S. wilsonii chloroplast genome with those of other rosid plant species, significant contractions or expansions were identified at the IR-LSC/SSC borders. Phylogenetic analysis of 17 willow species confirmed that S. wilsonii was most closely related to S. chaenomeloides and revealed the monophyly of the genus Salix. The complete S. wilsonii chloroplast genome provides an additional sequence-based resource for studying the evolution of organelle genomes in woody plants.
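The figures in this abstract can be cross-checked with simple arithmetic. The short Python sketch below assumes the standard quadripartite layout (one LSC, one SSC, and two IR copies); the function and variable names are hypothetical.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total plastome length for one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


# Region lengths in bp, as reported in the abstract.
total_bp = quadripartite_length(lsc=84_638, ssc=16_282, ir=27_415)
print(total_bp)  # compare with the reported 155,750 bp

# Gene categories as reported: rRNA, tRNA, protein-coding, pseudogenes.
unique_genes = 4 + 30 + 78 + 3
print(unique_genes)  # compare with the reported 115 unique genes
```

Both sums agree with the totals stated in the abstract.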


Plants ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 270 ◽  
Author(s):  
Yamkela Mgwatyu ◽  
Allison Anne Stander ◽  
Stephan Ferreira ◽  
Wesley Williams ◽  
Uljana Hesse

Plant genomes provide information on biosynthetic pathways involved in the production of industrially relevant compounds. Genome size estimates are essential for the initiation of genome projects. The genome size of rooibos (Aspalathus linearis species complex) was estimated using DAPI flow cytometry and k-mer analyses. For flow cytometry, a suitable nuclei isolation buffer, plant tissue, and a transport medium for rooibos ecotype samples collected from distant locations were identified. When using radicles from commercial rooibos seedlings, Woody Plant Buffer, and Vicia faba as an internal standard, the flow cytometry-estimated genome size of rooibos was 1.24 ± 0.01 Gbp. The estimates for eight wild rooibos growth types did not deviate significantly from this value. K-mer analysis was performed using Illumina paired-end sequencing data from one commercial rooibos genotype. For biocomputational estimation of the genome size, four k-mer analysis methods were investigated: a standard formula and three popular programs (BBNorm, GenomeScope, and FindGSE). GenomeScope estimates were strongly affected by parameter settings, specifically CovMax. When using the complete k-mer frequency histogram (up to 9 × 10^5), the programs did not deviate significantly, estimating an average rooibos genome size of 1.03 ± 0.04 Gbp. Differences between the flow cytometry and biocomputational estimates are discussed.
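The abstract does not spell out the standard formula. A commonly used form divides the total number of k-mers by the depth of the main peak of the k-mer frequency histogram, and the Python sketch below assumes that form. The error cutoff `min_depth`, the function name, and the toy histogram are hypothetical and are not taken from the rooibos data.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer frequency histogram.

    histogram maps depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing errors (an assumed
    cutoff) and ignored. Returns (peak_depth, estimated_size).
    """
    kept = {d: n for d, n in histogram.items() if d >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    # Main peak: the depth at which the most distinct k-mers occur.
    peak_depth = max(kept, key=kept.get)
    # Total k-mer occurrences divided by the depth of the main peak.
    total_kmers = sum(d * n for d, n in kept.items())
    return peak_depth, total_kmers / peak_depth


# Toy histogram (illustrative only): an error tail, a peak at 20x,
# and a few high-depth repeat k-mers.
toy_hist = {1: 1000, 2: 200, 3: 50,
            18: 100, 19: 300, 20: 500, 21: 300, 22: 100,
            40: 20}
print(estimate_genome_size(toy_hist))
```

Truncating the histogram at a maximum depth drops the high-depth repeat k-mers from the numerator, which is consistent with the abstract's remark that estimates changed with the CovMax setting and stabilized when the complete histogram was used.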


2021 ◽  
Author(s):  
Ronen E Mukamel ◽  
Robert E Handsaker ◽  
Maxwell A Sherman ◽  
Alison R Barton ◽  
Yiming Zheng ◽  
...  

Hundreds of the proteins encoded in human genomes contain domains that vary in size or copy number due to variable numbers of tandem repeats (VNTRs) in protein-coding exons. VNTRs have eluded analysis by the molecular methods (SNP arrays and high-throughput sequencing) used in large-scale human genetic studies to date; thus, the relationships of VNTRs to most human phenotypes are unknown. We developed ways to estimate VNTR lengths from whole-exome sequencing data, identify the SNP haplotypes on which VNTR alleles reside, and use imputation to project these haplotypes into abundant SNP data. We analyzed 118 protein-altering VNTRs in 415,280 UK Biobank participants for association with 791 phenotypes. Analysis revealed some of the strongest associations of common variants with human phenotypes including height, hair morphology, and biomarkers of human health; for example, a VNTR encoding 13-44 copies of a 19-amino-acid repeat in the chondroitin sulfate domain of aggrecan (ACAN) associated with height variation of 3.4 centimeters (s.e. 0.3 cm). Incorporating large-effect VNTRs into analysis also made it possible to map many additional effects at the same loci: for the blood biomarker lipoprotein(a), for example, analysis of the kringle IV-2 VNTR within the LPA gene revealed that 18 coding SNPs and the VNTR in LPA explained 90% of lipoprotein(a) heritability in Europeans, enabling insights about population differences and epidemiological significance of this clinical biomarker. These results point to strong, cryptic effects of highly polymorphic common structural variants that have largely eluded molecular analyses to date.


Oncogene ◽  
2021 ◽  
Author(s):  
Yiyun Chen ◽  
Wing Yin Cheng ◽  
Hongyu Shi ◽  
Shengshuo Huang ◽  
Huarong Chen ◽  
...  

Molecular-based classifications of gastric cancer (GC) were recently proposed, but few of them robustly predict clinical outcomes. While mutation and expression signatures of protein-coding genes were used in previous molecular subtyping methods, the noncoding genome in GC remains largely unexplored. Here, we developed the fast long-noncoding RNA analysis (FLORA) method to study RNA sequencing data of GC cases, and prioritized tumor-specific long-noncoding RNAs (lncRNAs) by integrating clinical and multi-omic data. We uncovered 1235 tumor-specific lncRNAs, based on which three subtypes were identified. The lncRNA-based subtype 3 (L3) represented a subgroup of intestinal GC with worse survival, characterized by prevalent TP53 mutations, chromatin instability, hypomethylation, and over-expression of oncogenic lncRNAs. In contrast, the lncRNA-based subtype 1 (L1) had the best survival outcome, while LINC01614 expression further segregated a subgroup of L1 cases with worse survival and increased chance of developing distal metastasis. We demonstrated that LINC01614 over-expression is an independent prognostic factor in L1, and network-based functional prediction implicated its relevance to cell migration. Over-expression and CRISPR-Cas9-guided knockout experiments further validated the functions of LINC01614 in promoting GC cell growth and migration. Altogether, we proposed a lncRNA-based molecular subtype of GC that robustly predicts patient survival and validated LINC01614 as an oncogenic lncRNA that promotes GC proliferation and migration.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Zhongbo Chen ◽  
David Zhang ◽  
Regina H. Reynolds ◽  
Emil K. Gustavsson ◽  
...  

Knowledge of genomic features specific to the human lineage may provide insights into brain-related diseases. We leverage high-depth whole genome sequencing data to generate a combined annotation identifying regions simultaneously depleted for genetic variation (constrained regions) and poorly conserved across primates. We propose that these constrained, non-conserved regions (CNCRs) have been subject to human-specific purifying selection and are enriched for brain-specific elements. We find that CNCRs are depleted from protein-coding genes but enriched within lncRNAs. We demonstrate that per-SNP heritability of a range of brain-relevant phenotypes is enriched within CNCRs. We find that genes implicated in neurological diseases have high CNCR density, including APOE, highlighting an unannotated intron-3 retention event. Using human brain RNA-sequencing data, we show the intron-3-retaining transcript to be more abundant in Alzheimer’s disease with more severe tau and amyloid pathological burden. Thus, we demonstrate potential association of human-lineage-specific sequences in brain development and neurological disease.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ammar Zaghlool ◽  
Adnan Niazi ◽  
Åsa K. Björklund ◽  
Jakub Orzechowski Westholm ◽  
Adam Ameur ◽  
...  

Transcriptome analysis has mainly relied on analyzing RNA sequencing data from whole cells, overlooking the impact of subcellular RNA localization and its influence on our understanding of gene function and on the interpretation of gene expression signatures in cells. Here, we separated cytosolic and nuclear RNA from human fetal and adult brain samples and performed a comprehensive analysis of cytosolic and nuclear transcriptomes. There are significant differences in RNA expression for protein-coding and lncRNA genes between cytosol and nucleus. We show that transcripts encoding the nuclear-encoded mitochondrial proteins are significantly enriched in the cytosol compared to the rest of the protein-coding genes. Differential expression analysis between fetal and adult frontal cortex shows that results obtained from cytosolic RNA differ from results obtained from nuclear RNA, both at the level of transcript types and in the number of differentially expressed genes. Our data provide a resource for the subcellular localization of thousands of RNA transcripts in the human brain and highlight differences in using the cytosolic or the nuclear transcriptomes for expression analysis.


Genes ◽  
2021 ◽  
Vol 12 (4) ◽  
pp. 563 ◽  
Author(s):  
Monika Rewers ◽  
Iwona Jedrzejczyk ◽  
Agnieszka Rewicz ◽  
Anna Jakubska-Busse

Orchidaceae is one of the largest and most widespread plant families, with many species threatened with extinction. However, the genome sizes of only about 1.5% of orchids are known so far. The aim of this study was to estimate the genome size of 15 species and one infraspecific taxon of endangered and protected orchids growing wild in Poland, to assess their variability, and to develop an additional criterion useful in orchid species identification and characterization. Flow cytometric genome size estimation revealed that the investigated orchid species possessed intermediate, large, and very large genomes. Liparis loeselii had the smallest 2C DNA content (14.15 pg) and Cypripedium calceolus the largest (82.10 pg). It was confirmed that genome size is characteristic of the subfamily. Additionally, for four species (Epipactis albensis, Ophrys insectifera, Orchis mascula, and Orchis militaris) and one infraspecific taxon (Epipactis purpurata f. chlorophylla), the 2C DNA content was estimated for the first time. Genome size estimation by flow cytometry proved to be a useful auxiliary method for quick orchid species identification and characterization.
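To relate the 2C values in picograms to base pairs, the Python sketch below applies the widely used conversion 1 pg ≈ 0.978 × 10^9 bp and halves the 2C value to obtain the 1C (holoploid) size. The abstract itself reports only picograms, so the converted figures are derived here for illustration and the function name is hypothetical.

```python
# Widely used conversion factor: 1 pg of DNA is about 0.978 Gbp.
GBP_PER_PG = 0.978


def one_c_size_gbp(two_c_pg):
    """Convert a 2C DNA content in pg to an approximate 1C size in Gbp."""
    return two_c_pg / 2 * GBP_PER_PG


# 2C values in pg, as reported in the abstract.
for species, two_c in [("Liparis loeselii", 14.15),
                       ("Cypripedium calceolus", 82.10)]:
    print(f"{species}: 1C is about {one_c_size_gbp(two_c):.2f} Gbp")
```

On this conversion, the smallest and largest genomes in the study correspond to roughly 6.9 and 40.1 Gbp per haploid genome.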


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Tsung-Yu Lu ◽  
Katherine M. Munson ◽  
Alexandra P. Lewis ◽  
Qihui Zhu ◽  
Luke J. Tallon ◽  
...  

Variable number tandem repeats (VNTRs) are composed of consecutive repetitive DNA with hypervariable repeat count and composition. They include protein-coding sequences, and some are associated with clinical disorders. It has been difficult to incorporate VNTR analysis in disease studies that use short-read sequencing because the traditional approach of mapping to the human reference is less effective for repetitive and divergent sequences. In this work, we solve VNTR mapping for short reads with a repeat-pangenome graph (RPGG), a data structure that encodes both the population diversity and repeat structure of VNTR loci from multiple haplotype-resolved assemblies. We develop software to build an RPGG and use the RPGG to estimate VNTR composition with short reads. We use this to discover VNTRs with length stratified by continental population, as well as expression quantitative trait loci, indicating that RPGG analysis of VNTRs will be critical for future studies of diversity and disease.

