Transcriptome Profiling Provides Insight into the Genes in Carotenoid Biosynthesis during the Mesocarp and Seed Developmental Stages of Avocado (Persea americana)

Avocado (Persea americana Mill.) is an economically important crop because of its high nutritional value. However, the absence of a sequenced avocado reference genome has hindered investigations of its secondary metabolism. Using next-generation high-throughput transcriptome sequencing, we obtained 365,615,152 and 348,623,402 clean reads, as well as 109.13 and 104.10 Gb of sequencing data, for avocado mesocarp and seed, respectively, across five developmental stages. High-quality reads were assembled into 100,837 unigenes with an average length of 847.40 bp (N50 = 1725 bp). Additionally, 16,903 differentially expressed genes (DEGs) were detected, 17 of which were related to carotenoid biosynthesis. The expression levels of most of these 17 DEGs were higher in the mesocarp than in the seed across the five developmental stages. The avocado mesocarp and seed transcriptomes were also sequenced using single-molecule long-read sequencing, yielding 25.79 and 17.67 Gb of clean data, respectively. We identified 233,014 and 238,219 consensus isoforms in avocado mesocarp and seed, respectively. Furthermore, 104 isoforms in the mesocarp and 59 in the seed corresponded to 11 putative carotenoid biosynthesis-related genes. For 10 of these 11 genes, the number of isoforms was higher in the mesocarp than in the seed. In addition, alpha- and beta-carotene contents in the avocado mesocarp and seed were measured during the five developmental stages; both were higher in the mesocarp than in the seed, which validated the results of transcriptome profiling. Gene expression changes and the associated variations in gene dosage could influence carotenoid biosynthesis. These results will help to further elucidate carotenoid biosynthesis in avocado.
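
The assembly statistics above (average unigene length 847.40 bp, N50 = 1725 bp) are easier to interpret with a short calculation. The Python sketch below assumes the usual definition of N50: the length L such that sequences of length ≥ L together hold at least half of all assembled bases. The lengths are hypothetical toy values, not the avocado data; they show how a few long unigenes plus many short ones push the N50 well above the mean.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def mean_length(lengths):
    """Arithmetic mean sequence length (in bp)."""
    return sum(lengths) / len(lengths)


# Hypothetical toy assembly: a few long unigenes and several short ones.
toy = [2000, 1500, 1000, 500, 300, 200]
print("N50:", n50(toy))                      # N50: 1500
print("mean:", round(mean_length(toy), 1))   # mean: 916.7
```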

PacBio single-molecule long-read sequencing shed new light on the complexity of the Carex breviculmis transcriptome

BMC Genomics ◽

10.1186/s12864-019-6163-6 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 2

Author(s):

Ke Teng ◽

Wenjun Teng ◽

Haifeng Wen ◽

Yuesen Yue ◽

Weier Guo ◽

...

Keyword(s):

Single Molecule ◽

Average Length ◽

Transcriptome Profiling ◽

Full Length ◽

Regulation Mechanism ◽

Forage Production ◽

SMRT Sequencing ◽

Reference Transcriptome ◽

Long Read ◽

Ecological Conservation

Abstract Background: Carex L., a genus of grass-like plants commonly known as sedges, is distributed worldwide and contributes to turf management, forage production, and ecological conservation. The development of next-generation sequencing (NGS) technologies has considerably improved our understanding of the transcriptome complexity of Carex L. and provided a valuable genetic reference. However, the current transcriptome remains unsatisfactory, mainly because of the difficulty of obtaining full-length transcripts. Results: In this study, we employed PacBio single-molecule real-time (SMRT) long-read sequencing technology for whole-transcriptome profiling in Carex breviculmis. We generated 60,353 high-confidence non-redundant transcripts with an average length of 2302 bp. A total of 3588 alternative splicing events and 1273 long non-coding RNAs were identified. Furthermore, 40,347 complete coding sequences were predicted, providing an informative reference transcriptome. In addition, the transcriptional regulatory mechanism of C. breviculmis in response to shade stress was explored by mapping the NGS data to the reference transcriptome constructed by SMRT sequencing. Conclusions: This study provides the first full-length reference transcriptome of C. breviculmis obtained by SMRT sequencing. The resulting transcriptome atlas will not only facilitate future functional genomics studies but also pave the way for selective breeding and genetic engineering projects in C. breviculmis.
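
The 40,347 "complete coding sequences" above are CDSs with both a start codon and an in-frame stop codon. The abstract does not name the prediction tool, so the Python below is only a simplified illustration of the idea: it scans the forward strand in all three frames for the longest ATG-to-stop open reading frame. Real CDS predictors also search the reverse strand and use coding-potential or homology evidence; the function name and toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_complete_orf(seq):
    """Return (start, end) of the longest ATG...stop ORF on the forward strand.

    Coordinates are 0-based and half-open; 'end' includes the stop codon.
    Returns None if no complete ORF exists.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # the ORF closes at the first in-frame stop
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None
    return best


transcript = "CCATGAAATTTTAGCC"  # hypothetical toy transcript
span = longest_complete_orf(transcript)
print(span, transcript[span[0]:span[1]])  # (2, 14) ATGAAATTTTAG
```

A transcript with a start codon but no in-frame stop (a 3′-truncated CDS) returns None here, which is why full-length reads yield more complete CDSs than fragmented short-read assemblies.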

Single-Molecule Long-Read Sequencing Reveals the Diversity of Full-Length Transcripts in Leaves of Gnetum (Gnetales)

International Journal of Molecular Sciences ◽

10.3390/ijms20246350 ◽

2019 ◽

Vol 20 (24) ◽

pp. 6350 ◽

Cited By ~ 2

Author(s):

Nan Deng ◽

Chen Hou ◽

Fengfeng Ma ◽

Caixia Liu ◽

Yuxin Tian

Keyword(s):

Single Molecule ◽

Developmental Stages ◽

Alternative Polyadenylation ◽

Full Length ◽

Stomatal Development ◽

RNA-Seq ◽

Leaf Transcriptome ◽

Long Read ◽

Non-coding RNAs ◽

Poly(A) Site

The limitations of RNA sequencing make it difficult to accurately predict alternative splicing (AS) and alternative polyadenylation (APA) events and long non-coding RNAs (lncRNAs), all of which reveal transcriptomic diversity and the complexity of gene regulation. Gnetum, a genus with ambiguous phylogenetic placement in seed plants, has a distinct stomatal structure and distinctive photosynthetic characteristics. In this study, a full-length transcriptome of Gnetum luofuense leaves at different developmental stages was sequenced with the PacBio Sequel platform. After correction with short reads generated by Illumina RNA-Seq, 80,496 full-length transcripts were obtained, of which 5269 were identified as isoforms of novel genes. Additionally, 1660 lncRNAs and 12,998 AS events were detected. In total, 5647 genes in G. luofuense leaves showed APA, with at least one poly(A) site identified. Moreover, 67 and 30 genes from the bHLH gene family, which plays an important role in stomatal development and photosynthesis, were identified from the G. luofuense genome and leaf transcripts, respectively. This leaf transcriptome supplements the reference genome of G. luofuense, and the detected AS events and lncRNAs provide valuable resources for future studies investigating the low photosynthetic capacity of Gnetum.

Single-Molecule Real-Time Sequencing of the Madhuca pasquieri (Dubard) Lam. Transcriptome Reveals the Diversity of Full-Length Transcripts

Forests ◽

10.3390/f11080866 ◽

2020 ◽

Vol 11 (8) ◽

pp. 866

Author(s):

Lei Kan ◽

Qicong Liao ◽

Zhiyao Su ◽

Yushan Tan ◽

Shuyu Wang ◽

...

Keyword(s):

Seed Germination ◽

Single Molecule ◽

Developmental Stages ◽

De Novo ◽

Full Length ◽

Wild Plant ◽

Transcript Isoforms ◽

Long Read ◽

Full Length Transcript ◽

Generation Sequencing

Madhuca pasquieri (Dubard) Lam. is a tree on the International Union for Conservation of Nature (IUCN) Red List and a Class II national key protected wild plant in China, known for its seed oil and timber. However, the lack of genomic and transcriptomic data for this species hampers the study of its reproduction, utilization, and conservation. Here, single-molecule long-read sequencing (PacBio) and next-generation sequencing (Illumina) were combined to obtain the transcriptome from five developmental stages of M. pasquieri. Overall, PacBio sequencing detected 25,339 transcript isoforms, from which 24,492 coding sequences (CDSs), 9440 simple sequence repeats (SSRs), 149 long non-coding RNAs (lncRNAs), and 182 alternative splicing (AS) events were identified; the majority of the AS events were retained introns (RI). A further 1058 transcripts were identified as transcription factors (TFs) from 51 TF families. PacBio recovered more full-length transcript isoforms, with longer lengths and higher expression levels, whereas a larger number of transcripts (124,405) was captured by de novo assembly of the Illumina reads. Using the Nr, Swissprot, KOG, and KEGG databases, 24,405 PacBio transcripts (96.31%) were annotated. Functional annotation revealed a role for the auxin, abscisic acid, gibberellin, and cytokinin metabolic pathways in seed germination and post-germination. These findings support further studies on the seed germination mechanism and genome of M. pasquieri, and better protection of this endangered species.
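
The 96.31% annotation rate above corresponds to 24,405 of the 25,339 PacBio isoforms. A minimal Python sketch of how such a figure is typically tallied, assuming (this is common practice but not stated in the abstract) that a transcript counts as annotated when it has a hit in at least one of the Nr, Swissprot, KOG or KEGG databases; the hit sets and the function name are hypothetical.

```python
def annotation_rate(total_transcripts, hits_by_database):
    """Count transcripts with a hit in at least one database, and the percentage.

    hits_by_database maps a database name to the set of transcript IDs
    that received a hit in that database.
    """
    annotated = set().union(*hits_by_database.values())
    return len(annotated), 100 * len(annotated) / total_transcripts


# Hypothetical toy hits; a transcript annotated in several databases counts once.
hits = {
    "Nr": {"t1", "t2", "t3"},
    "Swissprot": {"t2", "t3"},
    "KOG": {"t3"},
    "KEGG": {"t4"},
}
print(annotation_rate(5, hits))  # (4, 80.0)

# Reported figure: 24,405 of 25,339 PacBio isoforms.
print(round(100 * 24405 / 25339, 2))  # 96.31
```

Taking the union rather than summing per-database counts avoids double-counting transcripts that hit several databases.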

Cas9-Assisted Targeting of CHromosome segments (CATCH) for targeted nanopore sequencing and optical genome mapping

10.1101/110163 ◽

2017 ◽

Cited By ~ 5

Author(s):

Tslil Gabrieli ◽

Hila Sharim ◽

Yael Michaeli ◽

Yuval Ebenstein

Keyword(s):

Single Molecule ◽

Genome Mapping ◽

Single Point ◽

Read Length ◽

Whole Genome ◽

Sequencing Analysis ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Whole Genome Analysis ◽

Long Read

ABSTRACT Variations in the genetic code, from single point mutations to large structural or copy number alterations, influence susceptibility, onset, and progression of genetic diseases and tumor transformation. Next-generation sequencing analysis is unable to reliably capture aberrations larger than the typical sequencing read length of several hundred bases. Long-read, single-molecule sequencing methods such as SMRT and nanopore sequencing can address larger variations, but require costly whole-genome analysis. Here we describe a method for isolation and enrichment of a large genomic region of interest for targeted analysis based on Cas9 excision of two sites flanking the target region and isolation of the excised DNA segment by pulsed field gel electrophoresis. The isolated target remains intact and is ideally suited for optical genome mapping and long-read sequencing at high coverage. In addition, analysis is performed directly on native genomic DNA that retains genetic and epigenetic composition without amplification bias. This method enables detection of mutations and structural variants as well as detailed analysis by generation of hybrid scaffolds composed of optical maps and sequencing data at a fraction of the cost of whole-genome sequencing.
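
CATCH relies on Cas9 cutting at two sites that flank the target region. As a rough sketch, assuming Streptococcus pyogenes Cas9 (SpCas9) with its NGG PAM and blunt cut 3 bp upstream of the PAM, the Python below lists forward-strand candidate sites and picks the closest flanking pair around a region of interest. It ignores reverse-strand sites, off-target scoring and guide efficiency; the function names and toy locus are hypothetical, not part of the published method.

```python
import re

# Assumption: SpCas9, 20-nt protospacer followed by an NGG PAM,
# blunt cut 3 bp upstream of the PAM. Forward strand only.
# The lookahead lets finditer report overlapping candidate sites.
SITE = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")


def cas9_cut_positions(seq):
    """Return (protospacer_start, cut_position) for forward-strand candidate sites."""
    return [(m.start(), m.start() + 17) for m in SITE.finditer(seq.upper())]


def flanking_cuts(seq, region_start, region_end):
    """Pick the closest cut at or before region_start and at or after region_end.

    Returns (left_cut, right_cut), so the excised fragment is
    seq[left_cut:right_cut], or None if either flank lacks a site.
    """
    cuts = [cut for _, cut in cas9_cut_positions(seq)]
    left = [c for c in cuts if c <= region_start]
    right = [c for c in cuts if c >= region_end]
    if not left or not right:
        return None
    return max(left), min(right)


# Hypothetical toy locus: two guide sites (TGG and AGG PAMs) around a C-rich target.
locus = ("AAAAA" + "ACTACTACTACTACTACTAC" + "TGG" + "C" * 30
         + "TCATCATCATCATCATCATC" + "AGG" + "AAAAA")
print(cas9_cut_positions(locus))     # [(5, 22), (58, 75)]
print(flanking_cuts(locus, 30, 60))  # (22, 75)
```

Choosing the closest flanking cuts keeps the excised segment as short as possible while still spanning the target, which matters when the fragment must then be resolved by pulsed field gel electrophoresis.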

Single-Molecule Real-Time Transcript Sequencing of Turnips Unveiling the Complexity of the Turnip Transcriptome

G3 Genes|Genome|Genetics ◽

10.1534/g3.120.401434 ◽

2020 ◽

Vol 10 (10) ◽

pp. 3505-3514

Author(s):

Hongmei Zhuang ◽

Qiang Wang ◽

Hongwei Han ◽

Huifang Liu ◽

Hao Wang

Keyword(s):

Real Time ◽

Brassica Rapa ◽

Single Molecule ◽

Developmental Stages ◽

Full Length ◽

Sequencing Data ◽

SMRT Sequencing ◽

High Quality ◽

Transcript Structure ◽

Novel Transcripts

This study aimed to generate the full-length transcriptome of Xinjiang green and purple turnips (Brassica rapa var. rapa) using single-molecule real-time (SMRT) sequencing. Samples of the two varieties at five developmental stages were collected and pooled for SMRT sequencing. In parallel, next-generation sequencing was performed to correct the SMRT sequencing data. A series of analyses was performed to investigate transcript structure. Finally, the obtained transcripts were mapped to the genome of Brassica rapa ssp. pekinensis Chiifu to identify potential novel transcripts. For green turnip (F01), a total of 19.54 Gb of clean data was obtained from 8 SMRT cells. The numbers of reads of insert (ROI) and full-length non-chimeric (FLNC) reads were 510,137 and 267,666, respectively. In addition, isoform sequence clustering yielded 82,640 consensus isoforms, of which 69,480 were high-quality; the remaining 13,160 low-quality sequences were corrected using Illumina RNA-seq data. For purple turnip (F02), there were 20.41 Gb of clean data, 552,829 ROIs, and 274,915 FLNC reads. A total of 93,775 consensus isoforms were obtained, of which 78,798 were high-quality; the remaining 14,977 low-quality sequences were corrected. After removal of redundant sequences, there were 46,516 and 49,429 non-redundant transcripts for F01 and F02, respectively; 7,774 and 9,385 alternative splicing events were predicted for F01 and F02, respectively; and 63,890 simple sequence repeats, 59,460 complete coding sequences, and 535 long non-coding RNAs were predicted. Moreover, 5,194 and 5,369 novel transcripts were identified by mapping to Brassica rapa ssp. pekinensis Chiifu. The obtained transcriptome data may improve turnip genome annotation and facilitate further study of the Brassica rapa var. rapa genome and transcriptome.
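
The last step above flags mapped transcripts as potentially novel relative to the Chiifu reference. The abstract does not state the exact criterion; the Python sketch below assumes a simple one: a transcript is novel if it overlaps no annotated gene on the same chromosome and strand. Coordinates are hypothetical toy values, and the linear scan is for clarity only (an interval tree or sorted sweep would be needed at genome scale).

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return a_start < b_end and b_start < a_end


def novel_transcripts(transcripts, genes):
    """Return IDs of mapped transcripts that overlap no annotated gene.

    transcripts: iterable of (transcript_id, chrom, strand, start, end)
    genes: iterable of (chrom, strand, start, end)
    """
    genes_by_locus = defaultdict(list)
    for chrom, strand, start, end in genes:
        genes_by_locus[(chrom, strand)].append((start, end))
    novel = []
    for tid, chrom, strand, start, end in transcripts:
        known = genes_by_locus.get((chrom, strand), [])
        if not any(overlaps(start, end, g_start, g_end) for g_start, g_end in known):
            novel.append(tid)
    return novel


# Hypothetical toy coordinates, not Chiifu annotations.
genes = [("A01", "+", 100, 500)]
mapped = [
    ("t1", "A01", "+", 450, 900),  # overlaps the annotated gene
    ("t2", "A01", "+", 600, 900),  # intergenic on the same strand
    ("t3", "A01", "-", 100, 500),  # antisense to the annotated gene
]
print(novel_transcripts(mapped, genes))  # ['t2', 't3']
```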

Comparative transcriptomic analysis reveals a series of single nucleotide polymorphism between red- and white-fleshed loquats (Eriobotrya japonica)

Czech Journal of Genetics and Plant Breeding ◽

10.17221/43/2016-cjgpb ◽

2017 ◽

Vol 53 (No. 3) ◽

pp. 97-106

Author(s):

S. Sun ◽

J. Li ◽

D. Chen ◽

H. Xie ◽

M. Tu ◽

...

Keyword(s):

Developmental Stages ◽

De Novo ◽

Average Length ◽

Carotenoid Biosynthesis ◽

Expression Level ◽

Eriobotrya Japonica ◽

Carotenoid Content ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Fruit Samples

Loquat (Eriobotrya japonica) is an economically important crop, and red-fleshed cultivars have a much higher carotenoid content than white-fleshed cultivars. We used Illumina RNA-seq technology to gain a global overview of the loquat transcriptome from a mixture of fruit samples at different developmental stages for both red-fleshed and white-fleshed loquat. A total of 94.98 million paired-end short reads were obtained, and de novo assembly generated 61,586 unigenes with an average length of 817 bp. Of these unigenes, 44,710 were annotated by BLAST searches against the Nr, Swissprot, GO, COG and KEGG databases. By mapping these annotated unigenes to reference canonical pathways, 123 biosynthesis pathways were predicted, and 41 unigenes were predicted to be involved in carotenoid biosynthesis. RT-qPCR analysis showed that the LCYB gene was more highly expressed in red-fleshed loquat, whereas the CRTRB gene was more highly expressed in white-fleshed loquat. Comparative analysis of the two transcriptomes revealed 2396 single nucleotide polymorphisms (SNPs) between red- and white-fleshed loquats. The majority of SNPs identified between the two loquat cultivars were nonsense mutations, and one of the eleven SNPs in candidate genes involved in carotenoid biosynthesis was a sense mutation. These results suggest that transcriptome analysis can reveal key genes related to carotenoid biosynthesis, and that the higher carotene content of red-fleshed loquat cultivars may result both from greater carotene production, owing to higher LCYB expression, and from less carotene conversion, owing to lower CRTRB expression. Together, these transcriptome results will be useful for elucidating the genetic differences between red- and white-fleshed loquat fruits and for further functional analysis of genes responsible for carotenoid accumulation.

Full-length SMRT transcriptome sequencing and microsatellite characterization in Paulownia catalpifolia

Scientific Reports ◽

10.1038/s41598-021-87538-8 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Yanzhi Feng ◽

Yang Zhao ◽

Jiajia Zhang ◽

Baoping Wang ◽

Chaowei Yang ◽

...

Keyword(s):

Single Molecule ◽

Average Length ◽

Full Length ◽

Timber Species ◽

Sequencing Data ◽

Genetic Studies ◽

SSR Loci ◽

Average Distribution ◽

Nucleotide Repeats ◽

Distribution Distance

Abstract Paulownia catalpifolia is an important, fast-growing timber species known for its high density, color and texture. However, few transcriptomic and genetic studies have been conducted on P. catalpifolia. In this study, single-molecule real-time sequencing technology was applied to obtain the full-length transcriptome of P. catalpifolia leaves treated with varying degrees of drought stress. The sequencing data were then searched for microsatellites, or simple sequence repeats (SSRs). A total of 28.83 Gb of data was generated; after redundant reads were removed, 25,969 high-quality (HQ) transcripts with an average length of 1624 bp were acquired, and 25,602 HQ transcripts (98.59%) were annotated using public databases. Among the HQ transcripts, 16,722 intact coding sequences, 149 long non-coding RNAs and 179 alternative splicing events were predicted. A total of 7367 SSR loci were distributed across 6293 HQ transcripts, comprising 763 complex SSRs and 6604 complete SSRs. The SSR appearance frequency was 28.37%, and the average distribution distance was 5.59 kb. Among the 6604 complete SSR loci, mono- to tri-nucleotide repeats were dominant, accounting for 97.85% of the total, with mono-, di- and tri-nucleotide repeats making up 44.68%, 33.86% and 19.31%, respectively. We detected 112 repeat motifs, of which A/T (42.64%) was the most common mononucleotide motif, AG/CT (12.22%) and GA/TC (9.63%) the most common dinucleotide motifs, and GAA/TTC (1.57%) and CCA/TGG (1.54%) the most common trinucleotide motifs. The SSR lengths ranged from 10 to 88 bp, and 4997 (75.67%) were ≤ 20 bp. This study provides a novel full-length transcriptome reference for P. catalpifolia and will facilitate the identification of germplasm resources and the breeding of new drought-resistant P. catalpifolia varieties.
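
The SSR search above can be sketched with regular-expression backreferences. The Python below is an illustration, not the authors' pipeline: the minimum repeat counts (10 for mononucleotide, 6 for dinucleotide, 5 for tri- to hexanucleotide motifs) are assumed MISA-style thresholds, since the abstract does not state the settings, and compound SSRs are reported as separate hits rather than merged. It also reproduces the reported appearance frequency as SSR loci per transcript (7367 / 25,969).

```python
import re

# Assumed MISA-style minimum repeat counts per motif length (1-6 nt);
# the abstract does not state the thresholds the authors used.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if motif is not itself a repeat of a shorter unit (e.g. 'AT', not 'ATAT')."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_perfect_ssrs(seq):
    """Return (start, end, motif, copies) for perfect SSRs in seq, sorted by start.

    Overlapping or adjacent SSRs of different motifs (compound SSRs)
    are reported separately and not merged.
    """
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-nt unit followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. 'AA' runs already counted as 'A'
                hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


def ssr_frequency(n_ssrs, n_transcripts):
    """SSR appearance frequency as a percentage of transcripts."""
    return 100 * n_ssrs / n_transcripts


toy = "GG" + "AT" * 7 + "GG" + "A" * 12 + "C"  # hypothetical sequence
print(find_perfect_ssrs(toy))                 # [(2, 16, 'AT', 7), (18, 30, 'A', 12)]
print(round(ssr_frequency(7367, 25969), 2))   # 28.37
```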

Full-Length Transcriptome Sequencing and EST-SSR marker development of tiger lily (Lilium lancifolium Thunb.)

10.21203/rs.3.rs-415397/v1 ◽

2021 ◽

Author(s):

Mingwei Sun ◽

Yilian Zhao ◽

Xiaobin Shao ◽

Jintao Ge ◽

Xueyan Tang ◽

...

Keyword(s):

Single Molecule ◽

Consensus Sequence ◽

Transcriptome Profiling ◽

Full Length ◽

Biological Regulation ◽

Regulation Network ◽

Long Read ◽

Lilium Lancifolium ◽

Comprehensive Framework ◽

Whole Transcriptome

Abstract It is well known that transcriptional diversity plays important roles in plant biological regulation. However, because full-length transcripts are difficult to obtain, characterization of the tiger lily (Lilium lancifolium Thunb.) transcriptome remains incomplete. To improve the integrity of tiger lily transcriptome information, PacBio single-molecule real-time (SMRT) long-read sequencing technology was employed for whole-transcriptome profiling. A total of 815,624 circular consensus sequence (CCS) reads with a mean length of 1,295 bp were obtained. Of these, 61,744 were full-length reads containing the 5′ primer, the 3′ primer and the poly(A) tail, and 3,319 EST-derived SSRs were developed from 2,968 unigenes. Using this informative reference transcriptome, 768 transcription factors and 6,852 long non-coding RNAs were identified, providing a comprehensive framework for the transcriptional regulation network. Of all the annotated transcripts, 15,608 were assigned to 25 euKaryotic Orthologous Groups (KOG) categories, and 10,706 unigenes were classified into 52 functional groups within three main categories. These results provide a comprehensive set of reference transcripts and further improve our understanding of the tiger lily transcriptome.
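
The full-length criterion above (5′ primer, 3′ primer and poly(A) tail all present) can be illustrated with a deliberately simplified Python check. It assumes exact primer matches at the read ends and a minimum poly(A) tract of 8 bases immediately before the 3′ primer, and it tries both read orientations. The primer sequences are hypothetical placeholders, not real Iso-Seq adapters; production tools use alignment-based primer detection that tolerates sequencing errors.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_full_length(read, five_prime, three_prime, min_polya=8):
    """Classify a CCS read as full-length under a toy rule.

    In one of the two orientations, the read must start with the 5' primer,
    end with the 3' primer, and the insert between them must end in a
    poly(A) tract of at least min_polya bases. Exact matching only.
    """
    for s in (read.upper(), revcomp(read)):
        if s.startswith(five_prime) and s.endswith(three_prime):
            insert = s[len(five_prime):len(s) - len(three_prime)]
            if insert.endswith("A" * min_polya):
                return True
    return False


# Hypothetical toy primers and reads, not real Iso-Seq adapter sequences.
FIVE, THREE = "GGTT", "CCAA"
fl_read = FIVE + "ACGTGC" + "A" * 10 + THREE
print(is_full_length(fl_read, FIVE, THREE))                  # True
print(is_full_length(revcomp(fl_read), FIVE, THREE))         # True
print(is_full_length(FIVE + "ACGTGC" + THREE, FIVE, THREE))  # False
```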

DNA methylation-calling tools for Oxford Nanopore sequencing: a survey and human epigenome-wide evaluation

Genome Biology ◽

10.1186/s13059-021-02510-z ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Yang Liu ◽

Wojciech Rosikiewicz ◽

Ziwei Pan ◽

Nathaniel Jillette ◽

Ping Wang ◽

...

Keyword(s):

DNA Methylation ◽

Single Molecule ◽

Evaluation Criteria ◽

Systematic Evaluation ◽

Whole Genome ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Long Read ◽

Genome Scale ◽

Analytical Tools

Abstract Background: Nanopore long-read sequencing technology greatly expands the capacity of long-range, single-molecule DNA-modification detection. A growing number of analytical tools have been developed to detect DNA methylation from nanopore sequencing reads. Here, we assess the performance of different methylation-calling tools to provide a systematic evaluation to guide researchers performing human epigenome-wide studies. Results: We compare seven analytic tools for detecting DNA methylation from nanopore long-read sequencing data generated from human natural DNA at a whole-genome scale. We evaluate the per-read and per-site performance of CpG methylation prediction across different genomic contexts, CpG site coverage, and the computational resources consumed by each tool. The seven tools exhibit different performances across the evaluation criteria. We show that methylation prediction in regions with discordant DNA methylation patterns, intergenic regions, low-CG-density regions, and repetitive regions leaves room for improvement across all tools. Furthermore, we demonstrate that 5hmC levels at least partly contribute to the discrepancy between bisulfite and nanopore sequencing. Lastly, we provide an online DNA methylation database (https://nanome.jax.org) to display the DNA methylation levels detected by nanopore sequencing and bisulfite sequencing data across different genomic contexts. Conclusions: Our study is the first systematic benchmark of computational methods for detection of mammalian whole-genome DNA modifications in nanopore sequencing. We provide a broad foundation for cross-platform standardization and an evaluation of analytical tools designed for genome-scale modified base detection using nanopore sequencing.
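
The per-site evaluation above compares methylation frequencies derived from nanopore reads with bisulfite sequencing. One common way to do this, sketched here in Python as an assumption rather than the benchmark's exact protocol, is to aggregate per-read CpG calls into per-site frequencies (dropping low-coverage sites) and compute the Pearson correlation over sites shared with the bisulfite data. The minimum coverage of 3 reads, the site IDs and the toy values are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def per_site_frequency(read_calls, min_coverage=3):
    """Aggregate per-read CpG calls into per-site methylation frequencies.

    read_calls: iterable of (site_id, is_methylated) pairs, one per read per CpG.
    Sites covered by fewer than min_coverage reads are dropped.
    """
    counts = defaultdict(lambda: [0, 0])  # site -> [methylated reads, total reads]
    for site, methylated in read_calls:
        counts[site][0] += int(methylated)
        counts[site][1] += 1
    return {site: m / n for site, (m, n) in counts.items() if n >= min_coverage}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def correlate_with_bisulfite(nanopore_freq, bisulfite_freq):
    """Correlate per-site frequencies over CpG sites present in both datasets."""
    shared = sorted(set(nanopore_freq) & set(bisulfite_freq))
    return pearson([nanopore_freq[s] for s in shared],
                   [bisulfite_freq[s] for s in shared])


# Hypothetical toy calls; chr1:400 has only one read and is filtered out.
calls = ([("chr1:100", True)] * 3 + [("chr1:200", False)] * 3
         + [("chr1:300", True), ("chr1:300", False)] * 2 + [("chr1:400", True)])
nano = per_site_frequency(calls)
print(nano)  # {'chr1:100': 1.0, 'chr1:200': 0.0, 'chr1:300': 0.5}
bs = {"chr1:100": 0.9, "chr1:200": 0.1, "chr1:300": 0.5}
print(round(correlate_with_bisulfite(nano, bs), 3))  # 1.0
```

The coverage filter matters: per-site frequencies from one or two reads can only take a few coarse values, which would distort the correlation in low-coverage regions.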

Long-read sequencing of Chrysanthemum morifolium cv. ‘Hangju’ transcriptome reveals flavonoid biosynthesis and regulation

10.21203/rs.2.19942/v1 ◽

2020 ◽

Author(s):

Tao Wang ◽

Feng Yang ◽

Qiaosheng Guo ◽

Qingjun Zou ◽

Wenyan Zhang ◽

...

Keyword(s):

Single Molecule ◽

Developmental Stages ◽

Chrysanthemum Morifolium ◽

Flavonoid Biosynthesis ◽

Full Length ◽

Bioactive Components ◽

SMRT Sequencing ◽

Major Genes ◽

Long Read ◽

Gene Expression Levels

Abstract Background: The inflorescence of Chrysanthemum morifolium cv. ‘Hangju’ has been widely used in China due to its antioxidant and anti-inflammatory properties. The biosynthesis and regulation of flavonoids, a group of bioactive components, in C. morifolium are poorly understood. Transcriptome sequencing is an effective method for obtaining transcript information. Therefore, single-molecule real-time (SMRT) sequencing was performed to obtain the full-length genes involved in flavonoid biosynthesis and regulation in C. morifolium. Results: High-quality RNA was extracted from the inflorescence of C. morifolium at different developmental stages and used to construct two libraries (0-5 kb and 4.5-10 kb) for sequencing. In total, 125,532 non-redundant isoforms with a mean length of 2,009 bp were obtained. Of these, 2,083 transcripts were annotated to pathways related to flavonoid biosynthesis, and 56 isoforms were annotated as CHS, CHI, F3H, F3′H, FNS II, FLS, DFR and ANS genes. Based on gene expression levels at different stages, we predicted the major genes involved in flavonoid biosynthesis. Phylogenetic analysis identified two candidate MYB transcription factors (CmMYBF1 and CmMYBF2) predicted to activate flavonol biosynthesis. Conclusions: Based on the full-length transcriptomic data and further quantitative analysis, we predicted the major genes involved in flavonoid biosynthesis and regulation in C. morifolium. The results provide a valuable theoretical basis for the introduction and cultivation of C. morifolium cv. ‘Hangju’.
