Full-length transcriptome assembly of Andrias davidianus (Amphibia: Caudata) skin via hybrid sequencing

2021 ◽  
Author(s):  
Yu Bai ◽  
Yonglu Meng ◽  
Jianlin Luo ◽  
Hui Wang ◽  
Guoyong Li ◽  
...  

The Chinese giant salamander, Andrias davidianus, is the largest amphibian species in the world and an economically and ecologically important species. The skin of A. davidianus exhibits complex structural and functional adaptations that facilitate survival in both aquatic and terrestrial ecosystems. Here, we report the first full-length amphibian transcriptome, generated from the dorsal skin of A. davidianus by hybrid sequencing on the PacBio and Illumina platforms. A total of 153,038 transcripts were hybrid assembled (mean length 2,039 bp; N50 2,172 bp), of which 133,794 were annotated in at least one database (nr, Swiss-Prot, KEGG, KOG, GO, and nt). In total, 58,732, 68,742, and 115,876 transcripts were classified into 24 KOG categories, 1,903 GO terms, and 46 KEGG pathways (level 2), respectively. We also identified 207,627 protein-coding regions, 785 transcription factors, 27,237 potential long non-coding RNAs, and 8,299 simple sequence repeats. Compared with assemblies from previous studies based on next-generation sequencing, the hybrid-assembled transcriptome recovered more full-length transcripts and had a higher contig N50 and a higher annotation rate of unique genes. This high-quality full-length reference gene set will help elucidate the genetic characteristics of A. davidianus skin and aid the identification of functional skin proteins.
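
N50 is length-weighted: it is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembled bases, so it can differ from the mean length reported alongside it. A minimal Python sketch of that conventional definition follows; the lengths are hypothetical toy values, not data from this study.

```python
from statistics import mean


def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Lengths are sorted from longest to shortest and accumulated; the N50 is
    the length at which the running total first reaches half of all bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths supplied")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Hypothetical transcript lengths in bp (illustrative only).
lengths = [5000, 3000, 2500, 2000, 1500, 1000, 500]
print(f"mean = {mean(lengths):.1f} bp, N50 = {n50(lengths)} bp")
# prints: mean = 2214.3 bp, N50 = 3000 bp
```

Here the mean and the N50 diverge because a few long sequences carry most of the bases. Simple ratios from the abstract's own counts work the same way: the annotation rate is 133,794 / 153,038 ≈ 87.4%.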

2019 ◽  
Vol 11 (8) ◽  
pp. 2232-2243 ◽  
Author(s):  
Tsai-Ming Lu ◽  
Miyuki Kanda ◽  
Hidetaka Furuya ◽  
Noriyuki Satoh

Abstract Dicyemids, previously called “mesozoans” (intermediates between unicellular protozoans and multicellular metazoans), are an enigmatic animal group. They have a highly simplified adult body comprising only ∼30 cells and a unique parasitic lifestyle. Recently, dicyemids were shown to be spiralians with affinities to the Platyhelminthes. To understand the molecular mechanisms involved in the evolution of this unusual animal, we sequenced the genome of Dicyema japonicum and generated a reference transcriptome assembly from mixed-stage samples. Repetitive sequences account for a high proportion (49%) of the D. japonicum genome. The dicyemid genome is reduced to ∼67.5 Mb, with 5,012 protein-coding genes. Only four Hox genes exist in the genome, and they are not clustered. The distribution of genes across KEGG pathways shows that D. japonicum has fewer genes in most pathways. Rather than eliminating entire critical metabolic pathways, parasitic lineages likely simplify pathways by eliminating pathway-specific genes, while genes with fundamental functions may be retained across multiple pathways. In principle, parasites can afford to lose unnecessary genes to conserve energy. However, whether the genes retained in incomplete pathways serve intermediate functions, and how parasites meet the physiological needs formerly served by lost genes, remain to be investigated in future studies.
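
The hypothesis above, that parasites shed pathway-specific genes while keeping genes shared across pathways, can be made concrete with a small set calculation. The sketch below is a toy illustration with hypothetical pathway and gene names; it is not the KEGG-based analysis the authors performed.

```python
from collections import Counter

# Hypothetical pathways and gene identifiers, for illustration only.
reference = {
    "pathway_1": {"a", "b", "c", "x"},
    "pathway_2": {"d", "e", "x", "y"},
    "pathway_3": {"f", "x", "y"},
}


def classify_genes(pathways):
    """Split genes into pathway-specific (one pathway) and shared (2+ pathways)."""
    membership = Counter(gene for genes in pathways.values() for gene in genes)
    specific = {g for g, n in membership.items() if n == 1}
    shared = {g for g, n in membership.items() if n > 1}
    return specific, shared


def retained_fraction(pathways, retained):
    """Fraction of each pathway's genes still present in a reduced gene set."""
    return {name: len(genes & retained) / len(genes) for name, genes in pathways.items()}


specific, shared = classify_genes(reference)
# Simulated reduction: keep every shared gene plus one pathway-specific gene.
reduced = shared | {"a"}
for name, frac in retained_fraction(reference, reduced).items():
    print(f"{name}: {frac:.2f} of genes retained")
```

Under this toy reduction every pathway keeps some of its members (0.50, 0.50, and 0.67), so no pathway disappears outright even though each becomes incomplete, which is the pattern the abstract describes.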


2018 ◽  
Author(s):  
Federico Vita ◽  
Amedeo Alpi ◽  
Edoardo Bertolini

Abstract The Italian white truffle (Tuber magnatum Pico) is a gastronomic delicacy that dominates the worldwide truffle market. Despite its importance, the genomic resources currently available for this species are still limited. Here we present the first de novo transcriptome assembly of T. magnatum. Illumina RNA-seq data were assembled with a single-k-mer approach into 22,932 transcripts with an N50 of 1,524 bp. This approach allowed us to predict and annotate 12,367 putative protein-coding sequences, grouped into 6,723 loci. In addition, we identified 2,581 gene-based SSR markers. This work provides the first publicly available reference transcriptome for genomic and genetic studies of this important species and offers insight into the molecular mechanisms underlying its biology.
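
Gene-based SSR markers are typically found by scanning sequences for perfect tandem repeats of 1 to 6 bp motifs. The Python sketch below illustrates the idea; the minimum copy numbers are MISA-style thresholds assumed for illustration, not the parameters used in this study, and compound or imperfect repeats are not handled.

```python
# Minimum tandem copies per motif length (MISA-style thresholds; assumed here).
MIN_COPIES = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq, min_copies=MIN_COPIES):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k, needed in sorted(min_copies.items()):
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            copies = 1
            # Count how many times the motif repeats back to back from position i.
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= needed and is_primitive(motif) and set(motif) <= set("ACGT"):
                hits.append((i, motif, copies))
                i += copies * k  # skip past the whole repeat tract
            else:
                i += 1
    return sorted(hits)


# Hypothetical test sequence: an (A)12 run and a (CAG)5 run.
seq = "GC" + "A" * 12 + "GC" + "CAG" * 5 + "T" + "AT" * 4
print(find_ssrs(seq))
# prints: [(2, 'A', 12), (16, 'CAG', 5)]
```

The primitivity check stops an (A)12 run from also being reported as (AA)6 or (AAA)4, and the short (AT)4 tail falls below the assumed six-copy threshold for dinucleotides.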


2019 ◽  
Author(s):  
Xiujuan Zhang ◽  
Jiabin Zhou ◽  
Linmiao Li ◽  
Wenzhong Huang ◽  
Hafiz Ishfaq Ahmad ◽  
...  

Abstract Background: Sturgeons (Acipenseriformes) are polyploid chondrostean fish and important models for studying development and evolution in vertebrates. To better understand the mechanisms regulating reproduction in sturgeon, this study combined PacBio isoform sequencing (Iso-Seq) with Illumina short-read RNA-seq to discover full-length genes involved in early gametogenesis of the Amur sturgeon, Acipenser schrenckii. Results: A total of 50.04 Gb of subread bases were generated from two SMRT cells, yielding 164,618 nonredundant full-length transcripts (unigenes) with an average length of 2,782 bp from gonad tissues (three testes and four ovaries) of seven 3-year-old A. schrenckii individuals. More unigenes were expressed specifically in the ovary than in the testis (19,716 vs. 3,028), and entirely different KEGG pathways were significantly enriched among the ovary-biased and testis-biased differentially expressed unigenes (DEUs). Importantly, 60 early gametogenesis-related genes (involving 755 unigenes) were identified, exactly half of which (30/60) were significantly differentially expressed between testes and ovaries. Among these, the testis-biased expression of Amh and Gsdf and the ovary-biased expression of Foxl2 and Cyp19a strongly suggest important regulatory roles in spermatogenesis and oogenesis of A. schrenckii, respectively. We also found four novel Sox9 transcript variants, which expand the repertoire of regulatory genes and imply functional complexity in early gametogenesis. Finally, a total of 236,672 alternative splicing (AS) events (involving 36,522 unigenes) were detected, and 10,556 putative long noncoding RNAs (lncRNAs) and 4,339 predicted transcription factors (TFs) were identified, all of which were significantly associated with the early gametogenesis of A. schrenckii.
Conclusions: Overall, our results provide new full-length transcriptome resources that can serve as a genomic-level reference for sturgeon. Crucially, we explored the genetic characteristics that differ between the testes and ovaries of A. schrenckii at the early gametogenesis stage, which could provide candidate genes and a theoretical basis for further study of the mechanisms regulating reproduction in sturgeon.
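
The tissue-specific counts and sex-biased expression described above rest on simple per-gene decisions. A minimal Python sketch of that logic follows; the expression cut-off, pseudocount, and fold-change threshold are assumptions chosen for illustration, and a real analysis would test differential expression with a replicate-aware statistical model rather than fixed cut-offs.

```python
import math
from statistics import mean


def classify_unigene(testis, ovary, min_expr=1.0, min_log2fc=1.0, pseudocount=1.0):
    """Label one unigene from replicate expression values (e.g. TPM).

    Assumed rules: a tissue expresses the unigene if its mean is >= min_expr;
    a bias call requires |log2(ovary / testis)| >= min_log2fc.
    """
    t, o = mean(testis), mean(ovary)
    if t < min_expr and o < min_expr:
        return "not expressed"
    if o >= min_expr and t < min_expr:
        return "ovary-specific"
    if t >= min_expr and o < min_expr:
        return "testis-specific"
    log2fc = math.log2((o + pseudocount) / (t + pseudocount))  # ovary vs. testis
    if log2fc >= min_log2fc:
        return "ovary-biased"
    if log2fc <= -min_log2fc:
        return "testis-biased"
    return "unbiased"


# Hypothetical values for three testes and four ovaries (not study data).
samples = {
    "u1": ([0.0, 0.2, 0.1], [12, 9, 15, 10]),
    "u2": ([40, 35, 45], [5, 4, 6, 5]),
    "u3": ([10, 12, 11], [11, 10, 12, 11]),
    "u4": ([8, 9, 7], [0, 0, 0.5, 0]),
}
for name, (testis, ovary) in samples.items():
    print(name, classify_unigene(testis, ovary))
```

Separating "specific" (expressed in one tissue only) from "biased" (expressed in both, but unequally) mirrors the distinction the abstract draws between ovary-specific unigenes and ovary- or testis-biased DEUs.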


2019 ◽  
Author(s):  
Xiujuan Zhang ◽  
Jiabin Zhou ◽  
Linmiao Li ◽  
Wenzhong Huang ◽  
Hafiz Ishfaq Ahmad ◽  
...  

Abstract Background: Sturgeons (Acipenseriformes) are polyploid chondrostean fish and important models for studying development and evolution in vertebrates. To better understand the mechanisms regulating reproduction in sturgeon, this study combined PacBio isoform sequencing (Iso-Seq) with Illumina short-read RNA-seq to discover full-length genes involved in early gametogenesis of the Amur sturgeon, Acipenser schrenckii. Results: A total of 50.04 Gb of subread bases were generated from two SMRT cells, yielding 164,618 nonredundant full-length transcripts (unigenes) with an average length of 2,782 bp from gonad tissues (three testes and four ovaries) of seven 3-year-old A. schrenckii individuals. More unigenes were expressed specifically in the ovary than in the testis (19,716 vs. 3,028), and functional assignment indicated that 6 of 14 annotated KEGG pathways were directly ovary-related and contained abundant transcripts and differentially expressed genes. Importantly, 60 early gametogenesis-related genes (involving 755 unigenes) were identified, exactly half of which (30/60) showed differential expression between testes and ovaries. The testis-biased expression of Amh and Gsdf and the ovary-biased expression of Foxl2 and Cyp19a strongly suggest important regulatory roles in spermatogenesis and oogenesis of A. schrenckii, respectively. We also found four novel Sox9 transcript variants, which expand the repertoire of regulatory genes and imply functional complexity of early gametogenesis. Finally, a total of 236,672 alternative splicing (AS) events (involving 36,522 unigenes) were detected, and 10,556 putative long noncoding RNAs (lncRNAs) and 4,339 predicted transcription factors (TFs) were identified, all of which were significantly associated with the early gametogenesis of A. schrenckii.
Conclusions: Overall, our results provide new full-length transcriptome resources that can serve as a genomic-level reference for sturgeon. Crucially, we explored the genetic characteristics that differ between the testes and ovaries of A. schrenckii at the early gametogenesis stage. These results provide candidate genes and a theoretical basis for further study of the mechanisms regulating reproduction in sturgeon.


2020 ◽  
Author(s):  
Xiujuan Zhang ◽  
Jiabin Zhou ◽  
Linmiao Li ◽  
Wenzhong Huang ◽  
Hafiz Ishfaq Ahmad ◽  
...  

Abstract Background: Sturgeons (Acipenseriformes) are polyploid chondrostean fish and important models for studying development and evolution in vertebrates. To better understand the mechanisms regulating reproduction in sturgeon, this study combined PacBio isoform sequencing (Iso-Seq) with Illumina short-read RNA-seq to discover full-length genes involved in early gametogenesis of the Amur sturgeon, Acipenser schrenckii. Results: A total of 50.04 Gb of subread bases were generated from two SMRT cells, yielding 164,618 nonredundant full-length transcripts (unigenes) with an average length of 2,782 bp from gonad tissues (three testes and four ovaries) of seven 3-year-old A. schrenckii individuals. More unigenes were expressed specifically in the ovary than in the testis (19,716 vs. 3,028), and entirely different KEGG pathways were significantly enriched among the ovary-biased and testis-biased differentially expressed unigenes (DEUs). Importantly, 60 early gametogenesis-related genes (involving 755 unigenes) were identified, exactly half of which (30/60) were significantly differentially expressed between testes and ovaries. Among these, the testis-biased expression of Amh and Gsdf and the ovary-biased expression of Foxl2 and Cyp19a strongly suggest important regulatory roles in spermatogenesis and oogenesis of A. schrenckii, respectively. We also found four novel Sox9 transcript variants, which expand the repertoire of regulatory genes and imply functional complexity in early gametogenesis. Finally, a total of 236,672 alternative splicing (AS) events (involving 36,522 unigenes) were detected, and 10,556 putative long noncoding RNAs (lncRNAs) and 4,339 predicted transcription factors (TFs) were identified, all of which were significantly associated with the early gametogenesis of A. schrenckii.
Conclusions: Overall, our results provide new full-length transcriptome resources that can serve as a genomic-level reference for sturgeon. Crucially, we explored the genetic characteristics that differ between the testes and ovaries of A. schrenckii at the early gametogenesis stage, which could provide candidate genes and a theoretical basis for further study of the mechanisms regulating reproduction in sturgeon.


2021 ◽  
Vol 12 ◽  
Author(s):  
Lu Zhao ◽  
Hang Wang ◽  
Ping Li ◽  
Kuo Sun ◽  
De-Long Guan ◽  
...  

Sphingonotus Fieber, 1852 (Orthoptera: Acrididae), is a grasshopper genus comprising approximately 170 species, all of which prefer dry environments such as deserts, steppes, and stony benchlands. In this study, we aimed to examine the adaptation of grasshopper species to arid environments. The genome size of Sphingonotus tsinlingensis was estimated using flow cytometry, and the first high-quality full-length transcriptome of this species was produced. The genome size of S. tsinlingensis is approximately 12.8 Gb. From 146.98 Gb of PacBio sequencing data, 221.47 Mb of full-length transcripts were assembled. Among these, 88,693 non-redundant isoforms were identified, with an N50 of 2,726 bp, markedly longer than in previous grasshopper transcriptome assemblies. In total, 48,502 protein-coding sequences were identified, of which 37,569 were annotated using public gene function databases. Moreover, 36,488 simple tandem repeats, 12,765 long non-coding RNAs, and 414 transcription factors were identified. Based on gene function, we identified 61 cytochrome P450 (CYP450) and 66 heat shock protein (HSP) genes that may be associated with drought adaptation in S. tsinlingensis. We compared the transcriptomes of S. tsinlingensis and two other grasshopper species that are less tolerant of drought, Mongolotettix japonicus and Gomphocerus licenti, and found that CYP450 and HSP genes were more highly expressed in S. tsinlingensis. We produced the first full-length transcriptome of a Sphingonotus species, one that has an ultra-large genome. Its assembly metrics were better than those of all known grasshopper transcriptomes. This full-length transcriptome may thus be used to understand the genetic background and evolution of grasshoppers.
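
Flow-cytometric genome size estimates are conventionally derived by comparing the fluorescence peak of the sample's nuclei with that of a reference standard of known DNA content, then converting picograms to base pairs with the widely used factor of about 978 Mb per pg. A sketch of that arithmetic follows; the peak positions and standard DNA content are hypothetical values, not those used for S. tsinlingensis, and both peaks are assumed to come from nuclei at the same cell-cycle stage (for example G0/G1).

```python
BP_PER_PG = 0.978e9  # widely used conversion: 1 pg of DNA ~ 978 Mb


def genome_size_pg(sample_peak, standard_peak, standard_pg):
    """DNA content of the sample in pg, scaled from a reference standard.

    standard_pg must refer to the same C-value (e.g. 1C) that is reported
    for the sample.
    """
    if standard_peak <= 0:
        raise ValueError("standard peak position must be positive")
    return sample_peak / standard_peak * standard_pg


def pg_to_gb(pg):
    """Convert a DNA mass in pg to genome size in gigabases."""
    return pg * BP_PER_PG / 1e9


# Hypothetical G0/G1 peak channels and standard 1C content (illustrative only).
size_pg = genome_size_pg(sample_peak=180.0, standard_peak=60.0, standard_pg=3.0)
print(f"{size_pg:.2f} pg = {pg_to_gb(size_pg):.2f} Gb")
# prints: 9.00 pg = 8.80 Gb
```

Because the estimate is a ratio, any error in the standard's assumed DNA content scales the result proportionally, which is why the choice of reference standard matters as much as the peak measurement.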


2019 ◽  
Author(s):  
Thomas F. Martinez ◽  
Qian Chu ◽  
Cynthia Donaldson ◽  
Dan Tan ◽  
Maxim N. Shokhirev ◽  
...  

Protein-coding small open reading frames (smORFs) are emerging as an important class of genes; however, the coding capacity of smORFs in the human genome is unclear. By integrating de novo transcriptome assembly and Ribo-Seq, we confidently annotate thousands of novel translated smORFs in three human cell lines. We find that smORF translation prediction is noisier than that for annotated coding sequences, underscoring the importance of analyzing multiple experiments and footprinting conditions. These smORFs are located within non-coding and antisense transcripts, the UTRs of mRNAs, and unannotated transcripts. Analysis of RNA levels and translation efficiency during cellular stress identifies regulated smORFs, providing an approach to prioritize smORFs for further investigation. Sequence conservation and signatures of positive selection indicate that the encoded microproteins are likely functional. Additionally, proteomics data from enriched human leukocyte antigen complexes validate the translation of hundreds of smORFs and position them as a source of novel antigens. Thus, smORFs represent a significant number of important yet unexplored human genes.
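
Candidate smORFs are usually enumerated from assembled transcripts before Ribo-Seq evidence is used to decide which are actually translated. The Python sketch below covers only that first enumeration step on the forward strand; the ATG-only start rule and the 10 to 100 codon window are assumptions for illustration (smORFs are often defined as about 100 codons or fewer), and it is not the authors' pipeline.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_smorfs(seq, min_codons=10, max_codons=100):
    """Return sorted (start, end, n_codons) for ATG-initiated forward-strand ORFs.

    end is exclusive and includes the stop codon; n_codons counts the start
    codon but not the stop. Each ORF runs from the first in-frame ATG to the
    next in-frame stop.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if min_codons <= n_codons <= max_codons:
                    hits.append((start, i + 3, n_codons))
                start = None
    return sorted(hits)


# Hypothetical transcript: ATG + 11 GCT codons + TAA, in reading frame 2.
transcript = "CC" + "ATG" + "GCT" * 11 + "TAA" + "GG"
print(find_smorfs(transcript))
# prints: [(2, 41, 12)]
```

Nested downstream ATGs are not reported separately; scanning the reverse complement and allowing near-cognate start codons would be natural extensions, and only translation evidence such as Ribo-Seq can separate real smORFs from chance ORFs.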


Plants ◽  
2020 ◽  
Vol 9 (10) ◽  
pp. 1354
Author(s):  
Slimane Khayi ◽  
Fatima Gaboun ◽  
Stacy Pirro ◽  
Tatiana Tatusova ◽  
Abdelhamid El Mousadik ◽  
...  

Argania spinosa (Sapotaceae), an important endemic Moroccan oil tree, is a primary source of argan oil, which has numerous dietary and medicinal properties. The species occupies the mid-western part of Morocco and provides great environmental and socioeconomic benefits. The complete chloroplast (cp) genome of A. spinosa was sequenced, assembled, and analyzed in comparison with those of two other Sapotaceae members. The A. spinosa cp genome is 158,848 bp long, with an average GC content of 36.8%. It exhibits a typical circular, quadripartite structure in which a pair of inverted repeat (IR) regions, each 25,945 bp long, separates the small single-copy (SSC) and large single-copy (LSC) regions of 18,591 and 88,367 bp, respectively. Annotation of the A. spinosa cp genome predicted 130 genes, including 85 protein-coding genes (CDS), 8 ribosomal RNA (rRNA) genes, and 37 transfer RNA (tRNA) genes. A total of 44 long repeats and 88 simple sequence repeats (SSRs), comprising mononucleotide (76), dinucleotide (7), trinucleotide (3), tetranucleotide (1), and hexanucleotide (1) repeats, were identified in the A. spinosa cp genome. Phylogenetic analyses using the maximum likelihood (ML) method were performed on 69 protein-coding genes from 11 Ericales species. The results confirmed the close relationship of A. spinosa to the genus Sideroxylon, supporting a reassessment of its taxonomic status. The complete chloroplast genome sequence will be valuable for further studies on the conservation and breeding of this medicinally and culinarily important species and will also help clarify its phylogenetic position within Sapotaceae.
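
The region lengths reported above should satisfy the quadripartite layout, in which the circular genome is LSC + IRa + SSC + IRb with two identical IR copies. The short Python check below recomputes the totals from the abstract's own figures (88,367 + 18,591 + 2 × 25,945 = 158,848 bp; 85 + 8 + 37 = 130 genes; 76 + 7 + 3 + 1 + 1 = 88 SSRs); the GC-content helper is shown on a hypothetical toy sequence, since the genome sequence itself is not reproduced here.

```python
# Figures as reported in the abstract (bp and counts).
LSC, SSC, IR, TOTAL = 88_367, 18_591, 25_945, 158_848
GENES = {"protein-coding": 85, "rRNA": 8, "tRNA": 37}
SSRS = {"mono": 76, "di": 7, "tri": 3, "tetra": 1, "hexa": 1}


def quadripartite_length(lsc, ssc, ir):
    """Length of a circular cp genome: LSC + IRa + SSC + IRb, with IRa == IRb."""
    return lsc + ssc + 2 * ir


def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Consistency checks on the reported numbers.
assert quadripartite_length(LSC, SSC, IR) == TOTAL
assert sum(GENES.values()) == 130
assert sum(SSRS.values()) == 88
print(f"IR share of genome: {2 * IR / TOTAL:.1%}")
print(f"GC of toy sequence: {gc_content('ATGCGCAT'):.2f}")
```

All three totals agree with the abstract, and the two IR copies together make up roughly a third of the genome.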


2020 ◽  
Author(s):  
Shinichi Namba ◽  
Toshihide Ueno ◽  
Shinya Kojima ◽  
Yosuke Tanaka ◽  
Satoshi Inoue ◽  
...  

Abstract Although transcriptome alteration is considered one of the essential drivers of carcinogenesis, conventional short-read RNA-seq technology has prevented researchers from directly exploring full-length transcripts, limiting analyses to individual splice sites. We developed a pipeline for Multi-Sample long-read Transcriptome Assembly, MuSTA, and showed through simulations that it enables construction of a transcriptome from the transcripts expressed in target samples and more accurate evaluation of transcript usage. We applied it to 22 breast cancer clinical specimens and successfully obtained a cohort-wide full-length transcriptome from long-read RNA-seq data. By comparing isoform existence and expression between estrogen receptor-positive and triple-negative subtypes, we obtained a comprehensive set of subtype-specific and differentially used isoforms, comprising both known and unannotated isoforms. We also found that the exon-intron structure of fusion transcripts tends to depend on their genomic regions, and we identified three-piece fusion transcripts transcribed from complex structural rearrangements. For example, a three-piece fusion transcript resulted in aberrant expression of an endogenous retroviral gene, ERVFRD-1, which is normally expressed exclusively in the placenta and is thought to protect the fetus from maternal rejection; its expression was increased in several TCGA samples with ERVFRD-1 fusions. Our analyses of real clinical specimens and simulated data provide direct evidence that full-length transcript sequencing of multiple samples can add to our understanding of cancer biology and genomics in general.
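
Transcript usage is relative: an isoform's expression divided by the total expression of its gene, so a gene's overall level can stay flat while its isoform mix shifts between subtypes. The Python sketch below shows that bookkeeping and a simple difference in mean usage between two groups; the isoform names, group labels, and values are hypothetical, and this is not MuSTA's own implementation or statistical test.

```python
from collections import defaultdict


def isoform_usage(expression, gene_of):
    """Map each isoform to its share of its gene's total expression in one sample."""
    totals = defaultdict(float)
    for isoform, value in expression.items():
        totals[gene_of[isoform]] += value
    return {
        isoform: value / totals[gene_of[isoform]] if totals[gene_of[isoform]] else 0.0
        for isoform, value in expression.items()
    }


def mean_usage(samples, gene_of):
    """Average isoform usage across samples that share the same isoform set."""
    per_sample = [isoform_usage(s, gene_of) for s in samples]
    return {iso: sum(u[iso] for u in per_sample) / len(per_sample) for iso in per_sample[0]}


def usage_shift(group_a, group_b, gene_of):
    """Mean usage in group_b minus mean usage in group_a, per isoform."""
    a, b = mean_usage(group_a, gene_of), mean_usage(group_b, gene_of)
    return {iso: b[iso] - a[iso] for iso in a}


# Hypothetical isoforms and expression values (illustrative only).
gene_of = {"G1.a": "G1", "G1.b": "G1", "G2.a": "G2"}
group_er_pos = [{"G1.a": 90, "G1.b": 10, "G2.a": 50}, {"G1.a": 80, "G1.b": 20, "G2.a": 40}]
group_tn = [{"G1.a": 30, "G1.b": 70, "G2.a": 45}, {"G1.a": 20, "G1.b": 80, "G2.a": 60}]

for isoform, delta in usage_shift(group_er_pos, group_tn, gene_of).items():
    print(f"{isoform}: {delta:+.2f}")
```

In this toy example gene G1 totals 100 in every sample, so a gene-level comparison sees no change, yet its usage shifts from mostly G1.a to mostly G1.b; a single-isoform gene such as G2 always has usage 1.0.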

