Combined genomic, transcriptomic, and metabolomic analyses provide insights into chayote (Sechium edule) evolution and fruit development

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Anzhen Fu ◽  
Qing Wang ◽  
Jianlou Mu ◽  
Lili Ma ◽  
Changlong Wen ◽  
...  

Abstract Chayote (Sechium edule) is an agricultural crop in the Cucurbitaceae family that is rich in bioactive components. To enhance genetic research on chayote, we used Nanopore third-generation sequencing combined with Hi–C data to assemble a draft chayote genome. A chromosome-level assembly anchored on 14 chromosomes (N50 contig and scaffold sizes of 8.40 and 46.56 Mb, respectively) estimated the genome size as 606.42 Mb, which is large for the Cucurbitaceae, with 65.94% (401.08 Mb) of the genome comprising repetitive sequences; 28,237 protein-coding genes were predicted. Comparative genome analysis indicated that chayote and snake gourd diverged from sponge gourd and that a whole-genome duplication (WGD) event occurred in chayote at 25 ± 4 Mya. Transcriptional and metabolic analysis revealed genes involved in fruit texture, pigment, flavor, flavonoids, antioxidants, and plant hormones during chayote fruit development. The analysis of the genome, transcriptome, and metabolome provides insights into chayote evolution and lays the groundwork for future research on fruit and tuber development and genetic improvements in chayote.
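The contig and scaffold N50 values quoted in this abstract follow the standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch of that calculation is given here for orientation; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)   # longest sequences first
    half_total = sum(ordered) / 2             # half of the assembly size
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # crossed the 50% mark
            return length


# Hypothetical toy assembly (arbitrary units), not data from the paper.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

Because the statistic depends only on the length distribution, a high N50 indicates contiguity but says nothing by itself about correctness of the assembly.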

2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Lili Ma ◽  
Qing Wang ◽  
Jianlou Mu ◽  
Anzhen Fu ◽  
Changlong Wen ◽  
...  

Abstract Snake gourd (Trichosanthes anguina L.), which belongs to the Cucurbitaceae family, is a popular ornamental and food crop species with medicinal value and is grown in many parts of the world. Although progress has been made in its genetic improvement, the organization, composition, and evolution of the snake gourd genome remain largely unknown. Here, we report a high-quality genome assembly for snake gourd, comprising 202 contigs, with a total size of 919.8 Mb and an N50 size of 20.1 Mb. These findings indicate that snake gourd has one of the largest genomes of Cucurbitaceae species sequenced to date. The snake gourd genome assembly harbors 22,874 protein-coding genes and 80.0% of the genome consists of repetitive sequences. Phylogenetic analysis reveals that snake gourd is closely related to sponge gourd but diverged from their common ancestor ~33–47 million years ago. The genome sequence reported here serves as a valuable resource for snake gourd genetic research and comparative genomic studies in Cucurbitaceae and other plant species. In addition, fruit transcriptome analysis reveals the candidate genes related to quality traits during snake gourd fruit development and provides a basis for future research on snake gourd fruit development and ripening at the transcript level.


2021 ◽  
Vol 12 ◽  
Author(s):  
Zongrui Dai ◽  
Jianyu Ren ◽  
Xiaoling Tong ◽  
Hai Hu ◽  
Kunpeng Lu ◽  
...  

The domesticated silkworm, Bombyx mori, is an important model system for the order Lepidoptera. A chromosome-level genome of Bombyx mori based on third-generation sequencing has been released. However, its transcripts were mainly assembled from short reads of second-generation sequencing and expressed sequence tags, which cannot describe the transcript profile accurately. Here, we used PacBio Iso-Seq technology to investigate the transcripts from 45 developmental stages of Bombyx mori. We obtained 25,970 non-redundant high-quality consensus isoforms capturing ∼60% of previously reported RNAs, 15,431 (∼47%) novel transcripts, and identified 7,253 long non-coding RNAs (lncRNAs), a large proportion of which (∼56%) were novel. In addition, we found that exonization of transposable elements (TEs) accounts for 11,671 (∼45%) transcripts, including 5,980 protein-coding transcripts (∼32%) and 5,691 lncRNAs (∼79%). Overall, our results expand the set of known silkworm transcripts and have general implications for understanding the interaction between TEs and their host genes. This transcript resource will promote functional studies of genes and lncRNAs as well as TEs in the silkworm.


DNA Research ◽  
2021 ◽  
Vol 28 (5) ◽  
Author(s):  
Ding Huang ◽  
Ruhong Ming ◽  
Shiqiang Xu ◽  
Jihua Wang ◽  
Shaochang Yao ◽  
...  

Abstract Gynostemma pentaphyllum (Thunb.) Makino is an economically valuable medicinal plant belonging to the Cucurbitaceae family that produces the bioactive compound gypenoside. Despite several transcriptomes having been generated for G. pentaphyllum, a reference genome is still unavailable, which has limited the understanding of the gypenoside biosynthesis and regulatory mechanism. Here, we report a high-quality G. pentaphyllum genome with a total length of 582 Mb comprising 1,232 contigs and a scaffold N50 of 50.78 Mb. The G. pentaphyllum genome comprised 59.14% repetitive sequences and 25,285 protein-coding genes. Comparative genome analysis revealed that G. pentaphyllum was related to Siraitia grosvenorii, with an estimated divergence time dating to the Paleogene (∼48 million years ago). By combining transcriptome data from seven tissues, we reconstructed the gypenoside biosynthetic pathway and potential regulatory network using tissue-specific gene co-expression network analysis. Four UDP-glucuronosyltransferases (UGTs), belonging to the UGT85 subfamily and forming a gene cluster, were involved in catalyzing glycosylation in leaf-specific gypenoside biosynthesis. Furthermore, candidate biosynthetic genes and transcription factors involved in the gypenoside regulatory network were identified. The genetic information obtained in this study provides insights into gypenoside biosynthesis and lays the foundation for further exploration of the gypenoside regulatory mechanism.


Plants ◽  
2021 ◽  
Vol 10 (2) ◽  
pp. 283
Author(s):  
Fei Dong ◽  
Zhicong Lin ◽  
Jing Lin ◽  
Ray Ming ◽  
Wenping Zhang

Rambutan (Nephelium lappaceum L.) is an important fruit tree that belongs to the family Sapindaceae and is widely cultivated in Southeast Asia. We sequenced its chloroplast genome for the first time and assembled 161,321 bp circular DNA. It is characterized by a typical quadripartite structure composed of a large (86,068 bp) and small (18,153 bp) single-copy region interspersed by two identical inverted repeats (IRs) (28,550 bp). We identified 132 genes including 78 protein-coding genes, 29 tRNA and 4 rRNA genes, with 21 genes duplicated in the IRs. Sixty-three simple sequence repeats (SSRs) and 98 repetitive sequences were detected. Twenty-nine codons showed biased usage and 49 potential RNA editing sites were predicted across 18 protein-coding genes in the rambutan chloroplast genome. In addition, coding gene sequence divergence analysis suggested that ccsA, clpP, rpoA, rps12, psbJ and rps19 were under positive selection, which might reflect specific adaptations of N. lappaceum to its particular living environment. Comparative chloroplast genome analyses from nine species in Sapindaceae revealed that a higher similarity was conserved in the IR regions than in the large single-copy (LSC) and small single-copy (SSC) regions. The phylogenetic analysis showed that N. lappaceum chloroplast genome has the closest relationship with that of Pometia tomentosa. The understanding of the chloroplast genomics of rambutan and comparative analysis of Sapindaceae species would provide insight into future research on the breeding of rambutan and Sapindaceae evolutionary studies.
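The region sizes reported in this abstract can be checked against the total length, since a quadripartite chloroplast genome consists of one large single-copy region, one small single-copy region, and two copies of the inverted repeat. The short sketch below performs that arithmetic with the figures from the abstract; the function and variable names are illustrative assumptions.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total length of a circular genome made of LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


# Region sizes in bp, as reported in the abstract.
LSC = 86_068
SSC = 18_153
IR = 28_550

total = quadripartite_length(LSC, SSC, IR)
print(total)  # 161321, matching the reported 161,321 bp assembly

# Share of the genome taken up by each region type.
for name, size in (("LSC", LSC), ("SSC", SSC), ("IRs (both copies)", 2 * IR)):
    print(f"{name}: {size / total:.1%}")
```

The reported sizes are internally consistent: 86,068 + 18,153 + 2 × 28,550 = 161,321 bp.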


Forests ◽  
2019 ◽  
Vol 10 (10) ◽  
pp. 826 ◽  
Author(s):  
Sui Wang ◽  
Su Chen ◽  
Caixia Liu ◽  
Yi Liu ◽  
Xiyang Zhao ◽  
...  

Research Highlights: A rigorous genome survey helped us to estimate the genomic characteristics, remove the DNA contamination, and determine the sequencing scheme of Betula platyphylla. Background and Objectives: B. platyphylla is a common tree species in northern China that has high economic and medicinal value. However, there is a lack of complete genomic information for this species, which severely constrains the progress of relevant research. The objective of this study was to survey the genome of B. platyphylla and determine the large-scale sequencing scheme of this species. Materials and Methods: Next-generation sequencing was used to survey the genome. The genome size, heterozygosity rate, and repetitive sequences were estimated by k-mer analysis. After preliminary genome assembly, sequence contamination was identified and filtered by sequence alignment. Finally, we obtained sterilized plantlets of B. platyphylla by plant tissue culture, which can be used for third-generation sequencing. Results: We estimated the genome size to be 432.9 Mb and the heterozygosity rate to be 1.22%, with repetitive sequences accounting for 62.2%. Bacterial contamination was observed in the leaves taken from the field, and most of the contaminants may be from the genus Mycobacterium. A total of 249,784 simple sequence repeat (SSR) loci were also identified in the B. platyphylla genome. Among the SSRs, only 11,326 can be used as candidates to distinguish the three Betula species. Conclusions: The B. platyphylla genome is complex and highly heterozygous and repetitive. Higher-depth third-generation sequencing may yield better assembly results. Sterilized plantlets can be used for sequencing to avoid contamination.
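The abstract does not say which tool or formula was used for the k-mer analysis. A common approach, sketched here under that caveat, divides the total number of k-mer occurrences by the depth at the main peak of the k-mer depth histogram, after discarding low-depth k-mers that mostly come from sequencing errors. All function names, the `min_depth` cutoff, and the toy histogram are assumptions for illustration only.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a list of read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def depth_histogram(kmer_counts):
    """Map each depth to the number of distinct k-mers seen at that depth."""
    return Counter(kmer_counts.values())


def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size as total k-mer occurrences / peak depth.

    Depths below min_depth are ignored as likely sequencing errors
    (the cutoff is an assumed, data-dependent choice).
    """
    kept = {d: n for d, n in histogram.items() if d >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(kept, key=kept.get)                # main coverage peak
    total_kmers = sum(d * n for d, n in kept.items())   # k-mer occurrences
    return total_kmers / peak_depth


# Hypothetical histogram: an error peak at depth 1-2, a main peak near 20.
toy_histogram = {1: 1000, 2: 50, 19: 300, 20: 500, 21: 300}
print(estimate_genome_size(toy_histogram))  # 1100.0
```

In a highly heterozygous genome such as the one described, the histogram typically shows a second peak at about half the main depth, so dedicated tools that model heterozygosity and repeats explicitly are usually preferred over this simple ratio.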


Plants ◽  
2020 ◽  
Vol 9 (4) ◽  
pp. 469
Author(s):  
Denis O. Omelchenko ◽  
Maxim S. Makarenko ◽  
Artem S. Kasianov ◽  
Mikhail I. Schelkunov ◽  
Maria D. Logacheva ◽  
...  

Shepherd’s purse (Capsella bursa-pastoris) is a cosmopolitan annual weed and a promising model plant for studying allopolyploidization in the evolution of angiosperms. Though plant mitochondrial genomes are a valuable source of genetic information, they are hard to assemble. At present, only the complete mitogenome of C. rubella is available out of all species of the genus Capsella. In this work, we have assembled the complete mitogenome of C. bursa-pastoris using high-precision PacBio SMRT third-generation sequencing technology. It is 287,799 bp long and contains 32 protein-coding genes, 3 rRNAs, 25 tRNAs corresponding to 15 amino acids, and 8 open reading frames (ORFs) supported by RNAseq data. Though many repeat regions have been found, none of them is longer than 1 kbp, and the most frequent structural variant originating from these repeats is present in only 4% of the mitogenome copies. The mitochondrial DNA sequence of C. bursa-pastoris differs from that of C. rubella, but not from that of C. orientalis, by two long inversions, suggesting that C. orientalis could be its maternal progenitor species. In total, 377 C to U RNA editing sites have been detected. All genes except cox1 and atp8 contain RNA editing sites, and most of them lead to non-synonymous changes of amino acids. Most of the identified RNA editing sites are identical to corresponding RNA editing sites in A. thaliana.


2020 ◽  
Author(s):  
Fei Dong ◽  
Zhicong Lin ◽  
Jing Lin ◽  
Ray Ming ◽  
Wenping Zhang

Abstract Background: Rambutan (Nephelium lappaceum L.) is an important fruit tree that belongs to the family Sapindaceae and is widely cultivated in Southeast Asia. The chloroplast of plants, as a photosynthetic organelle, plays an important role in photosynthesis and secondary metabolic activities. Chloroplast genome sequencing has become an integral part of understanding the genomic machinery and the phylogenetic histories of rambutan organelles. Results: We sequenced its chloroplast genome and assembled 161,321 bp circular DNA. It is characterized by a typical quadripartite structure composed of a large (86,068 bp) and small (18,153 bp) single-copy region interspersed by two identical inverted repeats (IRs) (28,550 bp). We identified 132 genes including 78 protein-coding, 29 tRNA and 4 rRNA genes, with 21 genes duplicated in the IRs. Sixty-three simple sequence repeats (SSRs) and 98 repetitive sequences were detected. Twenty-nine codons showed biased usage and 49 potential RNA editing sites were predicted across 18 protein-coding genes in the rambutan chloroplast genome. In addition, coding gene sequence divergence analysis of N. lappaceum suggested that ccsA, clpP, rpoA, rps12, psbJ and rps19 were under positive selection, which might reflect specific adaptations of N. lappaceum to its particular living environment. Comparative chloroplast genome analyses from five species in Sapindaceae revealed that a higher similarity was conserved in the IR regions than in the LSC and SSC regions. The phylogenetic analysis showed that N. lappaceum chloroplast genome has the closest relationship with that of Pometia tomentosa. Conclusions: The understanding of the chloroplast genomics of rambutan and comparative analysis of Sapindaceae species would provide insight into future research on the breeding of rambutan and Sapindaceae evolutionary studies.


2020 ◽  
Author(s):  
Yangmei Qin ◽  
Zhe Lin ◽  
Dan Shi ◽  
Mindong Zhong ◽  
Te An ◽  
...  

Abstract It is a long-term challenge to undertake reliable transcriptomic research under different circumstances of genome availability. Here, we developed a genome-free computational method to aid accurate transcriptome assembly, using the amphioxus as an example. By integrating ten next-generation sequencing (NGS) transcriptome datasets and one third-generation sequencing (TGS) dataset, we built a sequence library of non-redundant expressed transcripts for the amphioxus. The library consisted of 91,915 distinct transcripts overall, 51,549 protein-coding transcripts, and 16,923 novel extragenic transcripts. This substantially improved the current amphioxus genome annotation by expanding the distinct gene number from 21,954 to 38,777. We confirmed that the library significantly outperformed the genome, as well as the de novo method, in transcriptome assembly in multiple respects. For convenience, we curated the Integrative Transcript Library database of the amphioxus (http://www.bio-add.org/InTrans/). In summary, this work provides a practical solution for most organisms to alleviate the heavy dependence on a good-quality genome in transcriptome research. It also ensures that amphioxus transcriptome research is grounded on reliable data.


Author(s):  
P.D.N. HEBERT ◽  
T.W.A. BRAUKMANN ◽  
S.W.J. PROSSER ◽  
S. RATNASINGHAM ◽  
...  

2020 ◽  
Vol 15 ◽  
Author(s):  
Hongdong Li ◽  
Wenjing Zhang ◽  
Yuwen Luo ◽  
Jianxin Wang

Aims: To accurately detect isoforms from third-generation sequencing data. Background: Transcriptome annotation is the basis for the analysis of gene expression and regulation. The transcriptome annotation of many organisms, including humans, is far from complete, due partly to the challenge of identifying isoforms that are produced from the same gene through alternative splicing. Third-generation sequencing (TGS) reads provide an unprecedented opportunity for detecting isoforms because their length exceeds that of most isoforms. One limitation of current TGS read-based isoform detection methods is that they are exclusively based on sequence reads, without incorporating the sequence information of known isoforms. Objective: To develop an efficient method for isoform detection. Method: Based on annotated isoforms, we propose a splice isoform detection method called IsoDetect. First, the sequence at each exon-exon junction is extracted from annotated isoforms as the “short feature sequence”, which is used to distinguish different splice isoforms. Second, we align these feature sequences to long reads and divide the long reads into groups that contain the same set of feature sequences, thereby avoiding pair-wise comparison among the large number of long reads. Third, clustering and consensus generation are carried out based on sequence similarity. For the long reads that do not contain any short feature sequence, clustering analysis based on sequence similarity is performed to identify isoforms. Result: Tested on two datasets from Calypte anna and zebra finch, IsoDetect showed higher speed and compelling accuracy compared with four existing methods. Conclusion: IsoDetect is a promising method for isoform detection. Other: This paper was accepted by the CBC2019 conference.
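The first two steps of the method described above (extracting junction feature sequences, then grouping long reads by the set of features they contain) can be illustrated with a toy sketch. This is not the IsoDetect implementation: the abstract says feature sequences are aligned to reads, whereas the sketch substitutes exact substring matching to stay short, and the function names, the `flank` parameter, and all sequences are hypothetical.

```python
from collections import defaultdict


def junction_features(exons, flank=4):
    """Return one short sequence spanning each exon-exon junction.

    Each feature joins the last `flank` bases of an exon with the first
    `flank` bases of the next exon (flank size is an assumed parameter).
    """
    return [left[-flank:] + right[:flank] for left, right in zip(exons, exons[1:])]


def group_reads(reads, features):
    """Group read names by the set of feature sequences each read contains.

    Exact substring matching stands in for the alignment step.
    """
    groups = defaultdict(list)
    for name, seq in reads.items():
        key = frozenset(f for f in features if f in seq)
        groups[key].append(name)
    return dict(groups)


# Hypothetical exons and two annotated isoforms (one skips exon B).
A, B, C = "AAAACCCC", "GGGGTTTT", "ACGTACGA"
features = sorted(set(junction_features([A, B, C]) + junction_features([A, C])))

# Hypothetical long reads.
reads = {
    "r1": A + B + C,             # full-length read of isoform A-B-C
    "r2": "CCCCGGGGTTTTACGT",    # partial read covering both A-B-C junctions
    "r3": A + C,                 # exon-skipping isoform A-C
    "r4": "GATTACAGATTACA",      # no known junction
}

for key, names in group_reads(reads, features).items():
    print(sorted(key), names)
```

Reads that share a key are candidates for the same isoform and would then go through clustering and consensus generation, while reads with an empty key (here `r4`) correspond to the case the abstract handles by similarity-based clustering alone.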

