Single-Molecule Real-Time Transcript Sequencing of Turnips Unveiling the Complexity of the Turnip Transcriptome

2020, Vol 10 (10), pp. 3505-3514
Author(s): Hongmei Zhuang, Qiang Wang, Hongwei Han, Huifang Liu, Hao Wang

This study aimed to generate the full-length transcriptome of Xinjiang green and purple turnips, Brassica rapa var. rapa, using single-molecule real-time (SMRT) sequencing. Samples of two varieties of Brassica rapa var. rapa at five developmental stages were collected and combined for SMRT sequencing. Meanwhile, next-generation sequencing was performed to correct the SMRT sequencing data. A series of analyses were performed to investigate the transcript structure. Finally, the obtained transcripts were mapped to the genome of Brassica rapa ssp. pekinensis Chiifu to identify potential novel transcripts. For green turnip (F01), a total of 19.54 Gb of clean data were obtained from 8 cells. The numbers of reads of insert (ROI) and full-length non-chimeric (FLNC) reads were 510,137 and 267,666, respectively. In addition, 82,640 consensus isoforms were obtained from isoform sequence clustering, of which 69,480 were high-quality, and the 13,160 low-quality sequences were corrected using Illumina RNA-seq data. For purple turnip (F02), there were 20.41 Gb of clean data, 552,829 ROIs, and 274,915 FLNC sequences. A total of 93,775 consensus isoforms were obtained, of which 78,798 were high-quality, and the 14,977 low-quality sequences were corrected. Following the removal of redundant sequences, there were 46,516 and 49,429 non-redundant transcripts for F01 and F02, respectively; 7,774 and 9,385 alternative splicing events were predicted for F01 and F02, respectively; and 63,890 simple sequence repeats, 59,460 complete coding sequences, and 535 long non-coding RNAs were predicted. Moreover, 5,194 and 5,369 novel transcripts were identified by mapping to Brassica rapa ssp. pekinensis Chiifu. The obtained transcriptome data may improve turnip genome annotation and facilitate further study of the Brassica rapa var. rapa genome and transcriptome.

2021, Vol 14 (1)
Author(s): Yang Gao, Zeyang Suding, Lele Wang, Dandan Liu, Shijie Su, ...

Abstract Background: Eimeria necatrix is one of the most pathogenic parasites, causing high mortality in chickens. Although its genome sequence has been published, the sequences and complete structures of its mRNA transcripts remain unclear, limiting exploration of novel biomarkers, drug targets and genetic functions in E. necatrix. Methods: Second-generation merozoites (MZ-2) of E. necatrix were collected using Percoll density gradients, and high-quality RNA was extracted from them. Single-molecule real-time (SMRT) sequencing and Illumina sequencing were combined to generate the transcripts of MZ-2. Combined with the SMRT sequencing data of sporozoites (SZ) collected in our previous study, the transcriptome and transcript structures of E. necatrix were studied. Results: SMRT sequencing yielded 21,923 consensus isoforms in MZ-2. A total of 17,151 novel isoforms of known genes and 3918 isoforms of novel genes were successfully identified. We also identified 2752 (SZ) and 3255 (MZ-2) alternative splicing (AS) events, 1705 (SZ) and 1874 (MZ-2) genes with alternative polyadenylation (APA) sites, 4019 (SZ) and 2588 (MZ-2) fusion transcripts, 159 (SZ) and 84 (MZ-2) putative transcription factors (TFs) and 3581 (SZ) and 2039 (MZ-2) long non-coding RNAs (lncRNAs). To validate fusion transcripts, reverse transcription-PCR was performed on 16 candidates, with an accuracy reaching up to 87.5%. Sanger sequencing of the PCR products further confirmed the authenticity of chimeric transcripts. Comparative analysis of transcript structures revealed a total of 3710 consensus isoforms, 815 AS events, 1139 genes with APA sites, 20 putative TFs and 352 lncRNAs in both SZ and MZ-2. Conclusions: We obtained many long-read isoforms in E. necatrix SZ and MZ-2, from which a series of lncRNAs, AS events, APA events and fusion transcripts were identified. Information on TFs will improve understanding of transcriptional regulation, and fusion event data will greatly improve draft versions of gene models in E. necatrix. This information offers insights into the mechanisms governing the development of E. necatrix and will aid in the development of novel strategies for coccidiosis control.


PeerJ, 2020, Vol 8, pp. e9320
Author(s): Jing Chen, Yaya Yu, Kui Kang, Daowei Zhang

The white-backed planthopper Sogatella furcifera is an economically important rice pest distributed throughout Asia. It damages rice crops by sucking phloem sap, resulting in stunted growth and plant virus transmission. We aimed to obtain the full-length transcriptome data of S. furcifera using PacBio single-molecule real-time (SMRT) sequencing. Total RNA extracted from S. furcifera at various developmental stages (egg, larval, and adult stages) was mixed and used to generate a full-length transcriptome for SMRT sequencing. Long non-coding RNA (lncRNA) identification, full-length coding sequence prediction, full-length non-chimeric (FLNC) read detection, simple sequence repeat (SSR) analysis, transcription factor detection, and transcript functional annotation were performed. A total of 12,514,449 subreads (15.64 Gbp, clean reads) were generated, including 630,447 circular consensus sequences and 388,348 FLNC reads. Transcript cluster analysis of the FLNC reads revealed 251,109 consensus reads including 29,700 high-quality reads. Additionally, 100,360 SSRs and 121,395 coding sequences were identified using SSR analysis and ANGEL software, respectively. Furthermore, 44,324 lncRNAs were annotated using four tools and 1,288 transcription factors were identified. In total, 95,495 transcripts were functionally annotated based on searches of seven different databases. To the best of our knowledge, this is the first study of the full-length transcriptome of the white-backed planthopper obtained using SMRT sequencing. The acquired transcriptome data can facilitate further studies on the ecological and viral-host interactions of this agricultural pest.


2020
Author(s): Tao Wang, Feng Yang, Qiaosheng Guo, Qingjun Zou, Wenyan Zhang, ...

Abstract Background: The inflorescence of Chrysanthemum morifolium cv. ‘Hangju’ has been widely used in China due to its antioxidant and anti-inflammatory properties. The biosynthesis and regulation of flavonoids, a group of bioactive components, in C. morifolium are poorly understood. Transcriptome sequencing is an effective method for obtaining transcript information. Therefore, single-molecule real-time (SMRT) sequencing was performed to obtain the full-length genes involved in flavonoid biosynthesis and regulation in C. morifolium. Results: High-quality RNA was extracted from the inflorescence of C. morifolium at different developmental stages and used to construct two libraries (0-5 kb and 4.5-10 kb) for sequencing. Finally, 125,532 non-redundant isoforms with a mean length of 2,009 bp were obtained. Of these, 2,083 transcripts were annotated to pathways related to flavonoid biosynthesis, and 56 isoforms were annotated as CHS, CHI, F3H, F3’H, FNS II, FLS, DFR and ANS genes. Based on gene expression levels at different stages, we predicted the major genes involved in flavonoid biosynthesis. By phylogenetic analysis, we found two candidate MYB transcription factors (CmMYBF1 and CmMYBF2) activating flavonol biosynthesis. Conclusions: Based on the full-length transcriptomic data and further quantitative analysis, the major genes involved in flavonoid biosynthesis and regulation in C. morifolium were predicted in our study. The results provide a valuable theoretical basis for the introduction and cultivation of C. morifolium cv. ‘Hangju’.


2020, Vol 21 (9), pp. 3288
Author(s): Yawei Wu, Juan Xu, Xiumei Han, Guang Qiao, Kun Yang, ...

To gain more valuable genomic information about betalain biosynthesis, the full-length transcriptome of pitaya pulp from ‘Zihonglong’ (red pulp) and ‘Jinghonglong’ (white pulp) at four fruit developmental stages was analyzed using Single-Molecule Real-Time (SMRT) sequencing corrected by Illumina RNA sequencing (Illumina RNA-Seq). A total of 65,317 and 91,638 genes were identified in ‘Zihonglong’ and ‘Jinghonglong’, respectively. A total of 11,377 and 15,551 genes with more than two isoforms were found in ‘Zihonglong’ and ‘Jinghonglong’, respectively. In total, 156,955 genes were acquired after elimination of redundancy, of which 120,604 genes (79.63%) were annotated, and 30,875 sequences (20.37%) without hits to the reference databases were probably novel genes in pitaya. A total of 31,169 and 53,024 simple sequence repeats (SSRs) were uncovered from the genes of ‘Zihonglong’ and ‘Jinghonglong’, respectively, and 11,650 long non-coding RNAs (lncRNAs) in ‘Zihonglong’ and 11,113 lncRNAs in ‘Jinghonglong’ were obtained. qRT-PCR was conducted on ten candidate genes, and the expression levels of six novel genes were consistent with the Fragments Per Kilobase of transcript per Million mapped reads (FPKM) values. In conclusion, this is the first SMRT sequencing of the full-length transcriptome of pitaya, and the resource acquired through this sequencing facilitated the identification of additional betalain-related genes. Notably, a list of novel putative genes related to the synthesis of betalain in pitaya fruits was assembled. This may provide new insights into betalain synthesis in pitaya.


2019
Author(s): Yawei Wu, Juan Xu, Xiumei Han, Guang Qiao, Kun Yang, ...

Abstract Background: To gain more valuable genomic information about betalain biosynthesis, the full-length transcriptome of pitaya was analyzed using Single-Molecule Real-Time (SMRT) sequencing corrected by RNA-seq in the present study. Two pitaya cultivars, ‘Zihonglong’ (red pulp) and ‘Jinghonglong’ (white pulp), were selected to analyze the betalain transcriptome at four fruit developmental stages. Results: A total of 65,317 and 91,638 protein-coding genes were identified in ‘Zihonglong’ and ‘Jinghonglong’, respectively. A total of 11,377 and 15,551 genes with more than two isoforms were found in ‘Zihonglong’ and ‘Jinghonglong’, respectively. Also, 156,955 genes were acquired after elimination of redundancy, of which 120,604 genes (79.63%) were annotated, and 30,875 sequences (20.37%) without hits to the reference databases were probably novel genes in pitaya. In total, 31,169 and 53,024 SSRs were uncovered from the genes of ‘Zihonglong’ and ‘Jinghonglong’, respectively, and 11,650 lncRNAs in ‘Zihonglong’ and 11,113 lncRNAs in ‘Jinghonglong’ were obtained. Further, 104 genes involved in betalain metabolism were identified, and HpCYP76AD4 and HpDODA were likely associated with betalain biosynthesis. Conclusions: This is the first study to perform SMRT sequencing of the full-length transcriptome of pitaya, which provides a useful genomic resource for exploring the structure and function of genes in pitaya, particularly for betalain biosynthesis.


Genes, 2020, Vol 11 (11), pp. 1333
Author(s): Mariana R. Botton, Yao Yang, Erick R. Scott, Robert J. Desnick, Stuart A. Scott

The SLC6A4 gene has been implicated in psychiatric disorder susceptibility and antidepressant response variability. The SLC6A4 promoter is defined by a variable number of homologous 20–24 bp repeats (5-HTTLPR), and long (L) and short (S) alleles are associated with higher and lower expression, respectively. However, this insertion/deletion variant is most informative when considered as a haplotype with the rs25531 and rs25532 variants. Therefore, we developed a long-read single molecule real-time (SMRT) sequencing method to interrogate the SLC6A4 promoter region. A total of 120 samples were subjected to SLC6A4 long-read SMRT sequencing, primarily selected based on available short-read sequencing data. Short-read genome sequencing from the 1000 Genomes (1KG) Project (~5X) and the Genetic Testing Reference Material Coordination Program (~45X), as well as high-depth short-read capture-based sequencing (~330X), could not identify the 5-HTTLPR short (S) allele, nor could short-read sequencing phase any identified variants. In contrast, long-read SMRT sequencing unambiguously identified the 5-HTTLPR short (S) allele (frequency of 0.467) and phased SLC6A4 promoter haplotypes. Additionally, discordant rs25531 genotypes were reviewed and determined to be short-read errors. Taken together, long-read SMRT sequencing is an innovative and robust method for phased resolution of the SLC6A4 promoter, which could enable more accurate pharmacogenetic testing for both research and clinical applications.


2021, Vol 22 (2), pp. 787
Author(s): Ziqing He, Yingjuan Su, Ting Wang

Cephalotaxus oliveri is a Tertiary relict conifer endemic to China, where it is listed as a national second-level protected plant. This species has experienced severe changes in temperature and precipitation over the past millions of years and has adapted well to harsh environments. In view of global climate change and the species' endangered status, it is crucial for its conservation to study how it responds to changes in temperature and precipitation. In this study, single-molecule real-time (SMRT) sequencing and Illumina RNA sequencing were combined to generate the complete transcriptome of C. oliveri. After the SMRT sequencing data were corrected with the RNA-seq data, 63,831 (root), 58,108 (stem), 33,013 (leaf) and 62,436 (male cone) full-length unigenes were obtained from the four tissues, with N50 lengths of 2523, 3480, 3181, and 3267 bp, respectively. Additionally, 35,887, 11,306, 36,422, and 25,439 SSRs were detected for the male cone, leaf, root, and stem, respectively. The number of long non-coding RNAs predicted from the root was the largest (11,113), followed by 3408 (stem), 3193 (leaf), and 3107 (male cone). Functional annotation and enrichment analysis of tissue-specific expressed genes revealed their special roles in response to environmental stress and adaptability in the four different tissues. We also characterized the gene families and pathways related to abiotic factors. This work provides a comprehensive transcriptome resource for C. oliveri, and this resource will facilitate further studies on the functional genomics and adaptive evolution of C. oliveri.


2021
Author(s): Yaoxian Lv, Lei Cai, Jingyang Gao

Abstract Background: Single-molecule real-time (SMRT) sequencing data are characterized by long reads and high read depth. Compared with next-generation sequencing (NGS), SMRT sequencing data can reveal more structural variations (SVs) and have greater advantages in variant calling. However, SMRT sequencing data contain high levels of sequencing error and noise, which leads to inaccurate SV calling. Most existing tools are unable to overcome the sequencing errors and detect genomic deletions. Methods and results: In this investigation, we propose a new method for calling deletions from SMRT sequencing data, called MaxDEL. MaxDEL can effectively overcome the noise of SMRT sequencing data and integrates new machine learning and deep learning technologies. Firstly, it uses a machine learning method to calibrate the deletion regions from a variant call format (VCF) file. Secondly, MaxDEL develops a novel feature visualization method to convert the variant features to images and uses these images to accurately call the deletions based on a convolutional neural network (CNN). The results show that MaxDEL performs better in terms of accuracy and recall for calling variants when compared with existing methods on both real data and simulated data. Conclusions: We propose a method (MaxDEL) for calling deletion variations, which effectively utilizes both machine learning and deep learning methods. We tested it with different SMRT data and evaluated its effectiveness. The results show that the use of machine learning and deep learning methods has great potential in calling deletion variations.


2021, Vol 12
Author(s): Xingxing Yuan, Qiong Wang, Bin Yan, Jiong Zhang, Chenchen Xue, ...

Faba bean (Vicia faba L.) is one of the most widely grown cool season legume crops in the world. Winter faba bean normally has a vernalization requirement, and vernalized plants flower and set pods earlier than unvernalized plants. However, the molecular mechanisms of vernalization in faba bean are largely unknown. Discovering vernalization-related candidate genes is of great importance for faba bean breeding. In this study, the whole transcriptome of faba bean buds was profiled by using next-generation sequencing (NGS) and single-molecule, real-time (SMRT) full-length transcriptome sequencing technology. A total of 29,203 high-quality non-redundant transcripts, 21,098 complete coding sequences (CDS), 1,045 long non-coding RNAs (lncRNAs), and 12,939 simple sequence repeats (SSRs) were identified. Furthermore, 4,044 differentially expressed genes (DEGs) were identified through pairwise comparisons. By Gene Ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis, these differentially expressed transcripts were found to be enriched in binding and transcription factor activity, electron carrier activity, rhythmic process, and receptor activity. Finally, 50 putative vernalization-related genes that played important roles in the vernalization of faba bean were identified; we also found that vernalization-responsive transcripts showed significantly higher expression levels in cold-treated buds. The expression of VfSOC1, one of the candidate genes, was sensitive to vernalization. Ectopic expression of VfSOC1 in Arabidopsis resulted in earlier flowering. In conclusion, the abundant vernalization-related transcripts identified in this study will provide a basis for future research on vernalization and faba bean breeding and establish a reference full-length transcriptome for future studies on faba bean.

