Single molecule, full-length transcript sequencing provides insight into the TPS gene family in Paeonia ostii

PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e11808
Author(s):  
Jing Sun ◽  
Tian Chen ◽  
Jun Tao

Background The tree peony (Paeonia section Moutan DC), a traditional flower prized for both its ornamental and medicinal value, has been widely used in China. Surprisingly little is known about the full-length transcriptome of tree peony, which limits research on its gene function and molecular mechanisms. Trehalose-6-phosphate synthase (TPS) family genes have been found to affect plant growth and development, but the function of TPS genes in Paeonia ostii is unknown. Methods We performed single molecule, full-length transcript sequencing in P. ostii. Ten TPS family members were identified from the PacBio sequencing data for bioinformatics analysis and transcriptional expression analysis. Results A total of 230,736 reads of insert (ROI) sequences and 114,215 full-length non-chimeric (FLNC) reads were obtained for ORF and transcription factor prediction, SSR analysis and lncRNA identification. The NR, Swissprot, GO, COG, KOG, Pfam and KEGG databases were used to annotate the transcripts. The 10 TPS family members identified had molecular weights between 48.0 and 108.5 kDa and isoelectric points between 5.61 and 6.37. Furthermore, we found that TPS family members contain a conserved TPP or TPS domain. Based on phylogenetic tree analysis, the PoTPS1 protein was highly similar to the AtTPS1 protein in Arabidopsis. Finally, we analyzed the expression levels of all TPS genes in P. ostii and found that PoTPS5 was expressed at the highest level. In conclusion, this study combined the results of the transcriptome to systematically analyze the 10 TPS family members and sets a framework for further research on this important gene family in the development of tree peony.

GigaScience ◽  
2018 ◽  
Vol 7 (3) ◽  
Author(s):  
Rachael E Workman ◽  
Alexander M Myrka ◽  
G William Wong ◽  
Elizabeth Tseng ◽  
Kenneth C Welch ◽  
...  

Forests ◽  
2020 ◽  
Vol 11 (8) ◽  
pp. 866
Author(s):  
Lei Kan ◽  
Qicong Liao ◽  
Zhiyao Su ◽  
Yushan Tan ◽  
Shuyu Wang ◽  
...  

Madhuca pasquieri (Dubard) Lam. is a tree on the International Union for Conservation of Nature Red List and a national key protected wild plant (II) of China, known for its seed oil and timber. However, the lack of genomic and transcriptome data for this species hampers study of its reproduction, utilization, and conservation. Here, single-molecule long-read sequencing (PacBio) and next-generation sequencing (Illumina) were combined to obtain the transcriptome from five developmental stages of M. pasquieri. Overall, 25,339 transcript isoforms were detected by PacBio, including 24,492 coding sequences (CDSs), 9440 simple sequence repeats (SSRs), 149 long non-coding RNAs (lncRNAs), and 182 alternative splicing (AS) events, the majority of which were retained introns (RI). A further 1058 transcripts were identified as transcription factors (TFs) from 51 TF families. PacBio recovered more full-length transcript isoforms, with greater length and higher expression levels, whereas a larger number of transcripts (124,405) was captured by de novo assembly of the Illumina data. Using the Nr, Swissprot, KOG, and KEGG databases, 24,405 of the PacBio transcripts (96.31%) were annotated. Functional annotation revealed a role for the auxin, abscisic acid, gibberellin, and cytokinin metabolic pathways in seed germination and post-germination. These findings support further studies on the seed germination mechanism and genome of M. pasquieri, and better protection of this endangered species.
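The annotation rate quoted in this abstract can be checked with a few lines of Python. This is only an illustrative sketch: the function name `annotation_rate` is hypothetical, and the two counts (24,405 annotated transcripts out of 25,339 PacBio isoforms) are taken directly from the abstract above.

```python
def annotation_rate(annotated: int, total: int) -> float:
    """Return the percentage of transcripts that received an annotation."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * annotated / total


# Counts as reported in the abstract (annotated transcripts / PacBio isoforms).
rate = annotation_rate(24_405, 25_339)
print(f"{rate:.2f}%")  # prints 96.31%
```

The result agrees with the 96.31% reported, which suggests the percentage is expressed relative to the 25,339 isoforms detected by PacBio.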


2018 ◽  
Vol 2018 ◽  
pp. 1-6 ◽  
Author(s):  
Shang-Qian Xie ◽  
Yue Han ◽  
Xiao-Zhou Chen ◽  
Tai-Yu Cao ◽  
Kai-Kai Ji ◽  
...  

An accurate landscape of transcript isoforms plays an important role in the understanding of gene function and gene regulation. However, building complete transcripts is very challenging for short reads generated using next-generation sequencing. Fortunately, isoform sequencing (Iso-Seq) using single-molecule sequencing technologies, such as PacBio SMRT, provides long reads spanning entire transcript isoforms which do not require assembly. Therefore, we have developed ISOdb, a comprehensive resource database for hosting and carrying out an in-depth analysis of Iso-Seq datasets and visualising the full-length transcript isoforms. The current version of ISOdb has collected 93 publicly available Iso-Seq samples from eight species and presents the samples at two levels: (1) sample level, including metainformation, long read distribution, isoform numbers, and alternative splicing (AS) events of each sample; (2) gene level, including the total isoforms, novel isoform number, novel AS number, and isoform visualisation of each gene. In addition, ISOdb provides a user interface on the website for uploading sample information to facilitate the collection and analysis of researchers’ datasets. Currently, ISOdb is the first repository that offers comprehensive resources and convenient public access for hosting, analysing, and visualising Iso-Seq data, and it is freely available.


2019 ◽  
Vol 19 (1) ◽  
Author(s):  
Kai Su ◽  
Yinshan Guo ◽  
Yuhui Zhao ◽  
Hongyan Gao ◽  
Zhendong Liu ◽  
...  

Abstract Background White rot is one of the most dangerous fungal diseases and can considerably affect grape berry production and quality. However, few studies have focused on this disease, and thus, finding candidate white rot resistance genes is of great importance for breeding resistant grapevine cultivars. Based on field observations and indoor experiments, the cultivars “Victoria” and “Zhuosexiang” showed significant differences in white rot resistance. To understand the molecular mechanisms behind this difference, grapevine leaves with different phenotypes were used for RNA sequencing via Illumina and single-molecule real-time (SMRT) sequencing technology. Results A transcript library containing 53,906 reads, including known and novel transcripts, was constructed following the full-length transcriptome sequencing of the two grapevine cultivars. Genes involved in salicylic acid (SA) and jasmonic acid (JA) synthesis pathways showed different expression levels. Furthermore, four key transcription factors (TFs), NPR1, TGA4, Pti6, and MYC2, all involved in the SA and JA signaling pathways, were identified, and the expression profile revealed the different regulation of the pathogenesis-related protein 1 (PR1) resistance gene, as mediated by the four TFs. Conclusions Full-length transcript sequencing can substantially improve the accuracy and integrity of gene prediction and gene function research in grapevine. Our results contribute to the identification of candidate resistance genes and improve our understanding of the genes and regulatory mechanisms involved in grapevine resistance to white rot.


2017 ◽  
Author(s):  
Julien Lagarde ◽  
Barbara Uszczynska-Ratajczak ◽  
Silvia Carbonell ◽  
Sílvia Pérez-Lluch ◽  
Amaya Abad ◽  
...  

Abstract Accurate annotation of genes and their transcripts is a foundation of genomics, but no annotation technique presently combines throughput and accuracy. As a result, reference gene collections remain incomplete: many gene models are fragmentary, while thousands more remain uncatalogued, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), combining targeted RNA capture with third-generation long-read sequencing. We present an experimental re-annotation of the GENCODE intergenic lncRNA population in matched human and mouse tissues, resulting in novel transcript models for 3574 / 561 gene loci, respectively. CLS approximately doubles the annotated complexity of targeted loci, outperforming existing short-read techniques. Full-length transcript models produced by CLS enable us to definitively characterize the genomic features of lncRNAs, including promoter and gene structure, and protein-coding potential. Thus CLS removes a longstanding bottleneck of transcriptome annotation, generating manual-quality full-length transcript models at high-throughput scales. Abbreviations: bp, base pair; FL, full length; nt, nucleotide; ROI, read of insert, i.e. PacBio read; SJ, splice junction; SMRT, single-molecule real-time; TM, transcript model.


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e9320
Author(s):  
Jing Chen ◽  
Yaya Yu ◽  
Kui Kang ◽  
Daowei Zhang

The white-backed planthopper Sogatella furcifera is an economically important rice pest distributed throughout Asia. It damages rice crops by sucking phloem sap, resulting in stunted growth and plant virus transmission. We aimed to obtain the full-length transcriptome data of S. furcifera using PacBio single-molecule real-time (SMRT) sequencing. Total RNA extracted from S. furcifera at various developmental stages (egg, larval, and adult stages) was mixed and used to generate a full-length transcriptome for SMRT sequencing. Long non-coding RNA (lncRNA) identification, full-length coding sequence prediction, full-length non-chimeric (FLNC) read detection, simple sequence repeat (SSR) analysis, transcription factor detection, and transcript functional annotation were performed. A total of 12,514,449 subreads (15.64 Gbp, clean reads) were generated, including 630,447 circular consensus sequences and 388,348 FLNC reads. Transcript cluster analysis of the FLNC reads revealed 251,109 consensus reads including 29,700 high-quality reads. Additionally, 100,360 SSRs and 121,395 coding sequences were identified using SSR analysis and ANGEL software, respectively. Furthermore, 44,324 lncRNAs were annotated using four tools and 1,288 transcription factors were identified. In total, 95,495 transcripts were functionally annotated based on searches of seven different databases. To the best of our knowledge, this is the first study of the full-length transcriptome of the white-backed planthopper obtained using SMRT sequencing. The acquired transcriptome data can facilitate further studies on the ecological and viral-host interactions of this agricultural pest.


2021 ◽  
Vol 12 ◽  
Author(s):  
Aiping Deng ◽  
Jinpeng Li ◽  
Zebin Yao ◽  
Gyamfua Afriyie ◽  
Ziyang Chen ◽  
...  

Coelomactra antiquata is an economically important aquatic shellfish with high medicinal value. However, because C. antiquata has no reference genome, much molecular biology research cannot be carried out, so the analysis of its transcripts is an important step towards studying the genes that regulate various substances in C. antiquata. In the present study, we conducted the first full-length transcriptome analysis of C. antiquata using PacBio single-molecule real-time (SMRT) sequencing technology. The results identified a total of 39,209 unigenes with an average length of 2,732 bp, 23,338 CDSs, 251 AS events, 9,881 lncRNAs, 20,106 SSRs, and 2,316 TFs. Subsequently, 59.22% (23,220) of the unigenes were successfully annotated, of which 23,164, 18,711, 15,840, 13,534, and 13,474 unigenes could be annotated using the NR, Swiss-prot, KOG, GO, and KEGG databases, respectively. This study lays the foundation for follow-up molecular biology research and provides a reference for further study of the medicinal value of C. antiquata.


Gene Families ◽  
2001 ◽  
pp. 5-19
Author(s):  
JIAN GU ◽  
XIN-YAN WU ◽  
MIN YE ◽  
QING-HUA ZHANG ◽  
ZE-GUANG HAN ◽  
...  

2020 ◽  
Author(s):  
Yao Cheng ◽  
Hanbing Liu ◽  
Xuejiao Tong ◽  
Zaimin Liu ◽  
Xin Zhang ◽  
...  

Abstract Background: Members of the cytochrome P450 (CYP450) gene superfamily have been shown to play essential roles in regulating the biosynthesis of secondary metabolites. However, the systematic identification and bioinformatics analysis of CYP450s have not been reported in Aralia elata (Miq.) Seem, a highly valued medicinal plant. Results: In the present study we conducted RNA-sequencing (RNA-seq) analysis of the leaves, stems, and roots of A. elata, yielding 66,713 total unigenes. Following the annotation and classification of these unigenes, we were able to identify two pathways and 19 putative genes associated with the synthesis of triterpenoid saponins in these plants, with qRT-PCR subsequently being used to validate these gene expression patterns. Scanning with the CYP450 model from Pfam resulted in the identification of 111 full-length and 143 partial-length CYP450s, with the full-length CYP450s being further clustered into 7 clans and 36 families. Through phylogenetic and conserved motif analyses, we were further able to group these CYP450 proteins into two primary branches: A-type (53%) and non-A type (47%). We further conducted representative protein sequence alignment for these CYP450 family members, with secondary elements being assigned in light of the recently published Arabidopsis CYP90B1 structure. Using the available sequence information, we further identified predicted substrate recognition sites (SRSs) and substrate binding sites within these putative proteins. We further assessed the expression patterns of these 111 CYP450 genes across A. elata tissues, with 12 members of this gene family being selected at random for qRT-PCR validation. From these data, we identified CYP716A295 and CYP716A296 as the candidate genes most likely to be associated with oleanolic acid synthesis, while CYP72A763 was identified as being the most likely to play a role in hederagenin biosynthesis. Finally, we assessed the subcellular localization of these CYP450 proteins within Arabidopsis protoplasts, highlighting the fact that they localize to the endoplasmic reticulum. Conclusions: This study presents a systematic analysis of the CYP450 gene family in A. elata and provides a foundation for further functional characterization of CYP450 genes.


Insects ◽  
2021 ◽  
Vol 12 (10) ◽  
pp. 938
Author(s):  
Huili Ouyang ◽  
Xiaoyun Wang ◽  
Xia-Lin Zheng ◽  
Wen Lu ◽  
Fengping Qin ◽  
...  

Bactrocera dorsalis (Hendel), one of the most notorious and destructive invasive agricultural pests in the world, causes damage to over 250 different types of fruits and vegetables throughout tropical and subtropical areas. PacBio single-molecule real-time (SMRT) sequencing was used to generate the full-length transcriptome data of B. dorsalis. A total of 40,319,890 subreads (76.6 Gb, clean reads) were generated, including 535,241 circular consensus sequences (CCSs) and 386,916 full-length non-concatemer reads (FLNCs). Transcript cluster analysis of the FLNC reads revealed 22,780 high-quality reads (HQs). In total, 12,274 transcripts were functionally annotated based on four different databases. A total of 1978 SSR loci were distributed throughout 1714 HQ transcripts, of which 1926 were complete SSRs and 52 were complex SSRs. Among the total SSR loci, 2–3 nucleotide repeats were dominant, accounting for 83.62%, of which di- and tri-nucleotide repeats made up 39.38% and 44.24%, respectively. We detected 105 repeat motifs, of which AT/AT (50.19%), AC/GT (39.15%), CAA/TTG (32.46%), and ACA/TGT (10.86%) were the most common in di- and tri-nucleotide repeats. The repeat SSR motifs were 12–190 bp in length, and 1638 (88.02%) were shorter than 20 bp. Based on randomly selected microsatellite sequences, 80 pairs of primers were designed, and PCR amplification with these primers was carried out on 174 individuals chosen at random. A total of 41 primer pairs gave amplification products with clear bands and good polymorphism, indicating that this is a feasible way to develop SSR markers from the transcriptomic data of B. dorsalis. These results lay a foundation for developing highly polymorphic microsatellites for researching the functional genomics, population genetic structure, and genetic diversity of B. dorsalis.
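The di- and tri-nucleotide SSR search described in this abstract can be illustrated with a short Python sketch. It is not the authors' pipeline: the abstract does not name the SSR tool or its thresholds, so the minimum repeat counts in `MIN_REPEATS` and the helper names `find_ssrs` and `motif_label` are assumptions made for illustration. The sketch also shows why motifs are reported as pairs such as AC/GT or CAA/TTG: each motif is listed together with its reverse complement.

```python
import re

# Assumed thresholds (not stated in the abstract): at least 6 repeats for
# di-nucleotide motifs and at least 5 repeats for tri-nucleotide motifs.
MIN_REPEATS = {2: 6, 3: 5}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_label(motif: str) -> str:
    """Label a motif with its reverse complement, e.g. 'AC' -> 'AC/GT'."""
    return f"{motif}/{reverse_complement(motif)}"


def find_ssrs(seq: str):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A unit of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence: seven AC repeats and five CAA repeats.
example = "GG" + "AC" * 7 + "TTT" + "CAA" * 5 + "G"
for start, motif, count in find_ssrs(example):
    print(start, motif_label(motif), count)
```

On the toy sequence the sketch reports one AC/GT locus with 7 repeats and one CAA/TTG locus with 5 repeats. Dedicated SSR tools additionally handle compound and imperfect repeats, which this sketch ignores.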

