The Tung Tree (Vernicia fordii) Genome Provides a Resource for Understanding Genome Evolution and Oil Improvement

2019 ◽  
Author(s):  
Lin Zhang ◽  
Meilan Liu ◽  
Hongxu Long ◽  
Wei Dong ◽  
Asher Pasha ◽  
...  

Abstract Tung tree (Vernicia fordii) is an economically important woody oil plant that produces tung oil containing a high proportion of eleostearic acid (∼80%). Here we report a high-quality, chromosome-scale tung tree genome sequence of 1.12 Gb with 28,422 predicted genes and over 73% repeat sequences. The tung tree genome was assembled by combining Illumina short reads, PacBio single-molecule real-time long reads and Hi-C sequencing data. Insertion time analysis revealed that the repeat-driven tung tree genome expansion might be due to long-standing long terminal repeat (LTR) retrotransposon bursts and a lack of efficient DNA deletion mechanisms. An electronic fluorescent pictographic (eFP) browser was generated based on genomic and RNA-seq data from 17 tissues and developmental stages. We identified 88 nucleotide-binding site (NBS)-encoding resistance genes, of which 17 may help the tung tree resist Fusarium wilt shortly after infection. A total of 651 oil-related genes were identified, and 88 of them were predicted to be directly involved in tung oil biosynthesis. Fewer phosphoenolpyruvate carboxykinase (PEPC) genes and synergistic effects between transcription factors and oil biosynthesis-related genes may contribute to the high oil content in tung seeds. The tung tree genome should provide valuable resources for molecular breeding and genetic improvement.

2019 ◽  
Vol 20 (17) ◽  
pp. 4117 ◽  
Author(s):  
Yu Ge ◽  
Zhihao Cheng ◽  
Xiongyuan Si ◽  
Weihong Ma ◽  
Lin Tan ◽  
...  

Avocado (Persea americana Mill.) is an economically important crop because of its high nutritional value. However, the absence of a sequenced avocado reference genome has hindered investigations of secondary metabolism. Using next-generation high-throughput transcriptome sequencing, we obtained 365,615,152 and 348,623,402 clean reads, corresponding to 109.13 and 104.10 Gb of sequencing data, for avocado mesocarp and seed, respectively, across five developmental stages. High-quality reads were assembled into 100,837 unigenes with an average length of 847.40 bp (N50 = 1725 bp). Additionally, 16,903 differentially expressed genes (DEGs) were detected, 17 of which were related to carotenoid biosynthesis. The expression levels of most of these 17 DEGs were higher in the mesocarp than in the seed across the five developmental stages. In this study, the avocado mesocarp and seed transcriptomes were also sequenced using single-molecule long-read sequencing to acquire 25.79 and 17.67 Gb of clean data, respectively. We identified 233,014 and 238,219 consensus isoforms in avocado mesocarp and seed, respectively. Furthermore, 104 and 59 isoforms were found to correspond to the 11 putative carotenoid biosynthesis-related genes in the avocado mesocarp and seed, respectively. The isoform numbers of 10 of the 11 putative genes involved in the carotenoid biosynthetic pathway were higher in the mesocarp than in the seed. In addition, alpha- and beta-carotene contents in the avocado mesocarp and seed were measured across the five developmental stages; they were higher in the mesocarp than in the seed, which validated the results of transcriptome profiling. Gene expression changes and the associated variations in gene dosage could influence carotenoid biosynthesis. These results will help to further elucidate carotenoid biosynthesis in avocado.


Molecules ◽  
2016 ◽  
Vol 21 (11) ◽  
pp. 1486 ◽  
Author(s):  
Zhiyong Zhan ◽  
Yicun Chen ◽  
Jay Shockey ◽  
Xiaojiao Han ◽  
Yangdong Wang

2018 ◽  
Author(s):  
Huilong Du ◽  
Chengzhi Liang

Abstract Due to the large number of repetitive sequences in complex eukaryotic genomes, fragmented and incompletely assembled genomes lose value as reference sequences, often because of short contigs that cannot be anchored or are mispositioned on chromosomes. Here we report a novel method, Highly Efficient Repeat Assembly (HERA), which includes a new concept called a connection graph as well as algorithms for constructing the graph. HERA resolves repeats at high efficiency with single-molecule sequencing data, and enables the assembly of chromosome-scale contigs by further integrating genome maps and Hi-C data. We tested HERA with the genomes of rice R498, maize B73, human HX1 and Tartary buckwheat Pinku1. HERA can correctly assemble most of the tandemly repetitive sequences in rice using single-molecule sequencing data only. Using the same maize and human sequencing data published by Jiao et al. (2017) and Shi et al. (2016), respectively, we dramatically improved the sequence contiguity compared with the published assemblies, increasing the contig N50 from 1.3 Mb to 61.2 Mb in the maize B73 assembly and from 8.3 Mb to 54.4 Mb in the human HX1 assembly. We provide a high-quality maize reference genome with 96.9% of the gaps filled (only 76 gaps left) and several incorrectly positioned sequences fixed compared with the B73 RefGen_v4 assembly. Comparisons between the HERA assembly of HX1 and the human GRCh38 reference genome showed that many gaps in GRCh38 could be filled, and that GRCh38 contained some potential errors that could be fixed. We assembled the Pinku1 genome into 12 scaffolds with a contig N50 size of 27.85 Mb. HERA serves as a new genome assembly/phasing method to generate high-quality sequences for complex genomes and as a curation tool to improve the contiguity and completeness of existing reference genomes, including the correction of assembly errors in repetitive regions.
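The contig N50 values quoted above are a standard contiguity statistic: the length of the contig at which the cumulative length, counted from the longest contig down, first reaches half of the total assembly length. The following is a minimal sketch of that calculation, not code from HERA; the function name `n50` and the example contig lengths are hypothetical and chosen only for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches at least
    half of the summed length of all contigs.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), not data from any study above.
contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(contigs))  # 70: 80 + 70 = 150 reaches half of the 300 total
```

Because the statistic depends only on the sorted lengths, merging many short contigs into a few long ones raises the N50 sharply, which is why it is commonly used to compare assemblies built from the same data.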


2020 ◽  
Vol 10 (10) ◽  
pp. 3505-3514
Author(s):  
Hongmei Zhuang ◽  
Qiang Wang ◽  
Hongwei Han ◽  
Huifang Liu ◽  
Hao Wang

This study aimed to generate the full-length transcriptome of Xinjiang green and purple turnips, Brassica rapa var. rapa, using single-molecule real-time (SMRT) sequencing. Samples of two varieties of Brassica rapa var. rapa at five developmental stages were collected and combined for SMRT sequencing. Next-generation sequencing was also performed to correct the SMRT sequencing data. A series of analyses were performed to investigate transcript structure. Finally, the obtained transcripts were mapped to the genome of Brassica rapa ssp. pekinensis Chiifu to identify potential novel transcripts. For green turnip (F01), a total of 19.54 Gb of clean data were obtained from 8 cells. The numbers of reads of insert (ROIs) and full-length non-chimeric (FLNC) reads were 510,137 and 267,666, respectively. In addition, 82,640 consensus isoforms were obtained by isoform sequence clustering, of which 69,480 were high-quality, and the 13,160 low-quality sequences were corrected using Illumina RNA-seq data. For purple turnip (F02), there were 20.41 Gb of clean data, 552,829 ROIs, and 274,915 FLNC sequences. A total of 93,775 consensus isoforms were obtained, of which 78,798 were high-quality, and the 14,977 low-quality sequences were corrected. Following the removal of redundant sequences, there were 46,516 and 49,429 non-redundant transcripts for F01 and F02, respectively; 7,774 and 9,385 alternative splicing events were predicted for F01 and F02; and 63,890 simple sequence repeats, 59,460 complete coding sequences, and 535 long non-coding RNAs were predicted. Moreover, 5,194 and 5,369 novel transcripts were identified by mapping to Brassica rapa ssp. pekinensis Chiifu. The obtained transcriptome data may improve turnip genome annotation and facilitate further study of the Brassica rapa var. rapa genome and transcriptome.


2019 ◽  
Vol 137 ◽  
pp. 74-80 ◽  
Author(s):  
Jun Chen ◽  
Wenjuan Liu ◽  
Yanru Fan ◽  
Xu Zhou ◽  
Xinwei Tang ◽  
...  

2020 ◽  
Author(s):  
Tiange Lang

Abstract Background. Gel-forming mucin domains of mucin genes show great complexity because of tandem repeats (TRs), which makes their sequences difficult to study. Methods. With the advent of single-molecule real-time (SMRT) sequencing technologies, we present the sequence structure of mucin domains for MUC2, MUC5AC, MUC5B and MUC6 using SMRT long reads. Results. Our study shows that, between individuals, single nucleotide polymorphisms (SNPs) can be found in the mucin domains of MUC2, MUC5AC, MUC5B and MUC6, while different numbers of tandem repeats can be found in the mucin domains of MUC2 and MUC6. Conclusions. This information will provide new insights into obtaining the sequences of tandem repeat regions located in coding regions.


2020 ◽  
Author(s):  
Ivan de la Rubia ◽  
Joel A. Indi ◽  
Silvia Carbonell-Sala ◽  
Julien Lagarde ◽  
M Mar Albà ◽  
...  

Abstract Single-molecule long-read sequencing with Nanopore provides an unprecedented opportunity to measure transcriptomes from any sample [1–3]. However, current analysis methods rely on the comparison with a reference genome or transcriptome [2,4,5], or the use of multiple sequencing technologies [6,7], thereby precluding cost-effective studies in species with no genome assembly available, in individuals underrepresented in the existing reference, and for the discovery of disease-specific transcripts not directly identifiable from a reference genome. Methods for DNA assembly [8–10] cannot be directly transferred to transcriptomes since their consensus sequences lack the required interpretability for genes with multiple transcript isoforms. To address these challenges, we have developed RATTLE, the first tool to perform reference-free reconstruction and quantification of transcripts from Nanopore long reads. Using simulated data, isoform spike-ins, and sequencing data from tissues and cell lines, we demonstrate that RATTLE accurately determines transcript sequence and abundance, is comparable to reference-based methods, and shows saturation in the number of predicted transcripts with increasing number of input reads.


2021 ◽  
Author(s):  
Tiange Lang

Abstract Mucins are large glycoproteins that cover and protect the epithelial surfaces of the body. Gel-forming mucin domains of mucin genes are rich in proline, threonine, and serine residues that are heavily glycosylated. These domains show great complexity because of tandem repeats (TRs), which makes their sequences difficult to study. With the advent of single-molecule real-time (SMRT) sequencing technologies, we present the sequence structure of mucin domains for the gel-forming mucins MUC2, MUC5AC, MUC5B and MUC6 using SMRT long reads. Our study shows that, between individuals, single nucleotide polymorphisms (SNPs) can be found in the mucin domains of MUC2, MUC5AC, MUC5B and MUC6, while different numbers of tandem repeats can be found in the mucin domains of MUC2 and MUC6. Furthermore, we obtained the sequences of the MUC2, MUC5AC, and MUC5B mucin domains in a Chinese individual with accuracies possibly as high as 99.98%, 99.93%, and 99.76%, respectively. We report a new method to obtain the DNA sequences of gel-forming mucin domains. This method will provide new insights into obtaining the sequences of tandem repeat regions located in coding regions. The sequences obtained with this method offer further information for studying gel-forming mucin domains.


HortScience ◽  
2015 ◽  
Vol 50 (12) ◽  
pp. 1830-1832 ◽  
Author(s):  
Timothy Rinehart ◽  
Jay Shockey ◽  
Ned Edwards ◽  
James M. Spiers ◽  
Thomas Klasson

2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Yang Gao ◽  
Zeyang Suding ◽  
Lele Wang ◽  
Dandan Liu ◽  
Shijie Su ◽  
...  

Abstract Background Eimeria necatrix is one of the most pathogenic parasites, causing high mortality in chickens. Although its genome sequence has been published, the sequences and complete structures of its mRNA transcripts remain unclear, limiting exploration of novel biomarkers, drug targets and genetic functions in E. necatrix. Methods Second-generation merozoites (MZ-2) of E. necatrix were collected using Percoll density gradients, and high-quality RNA was extracted from them. Single-molecule real-time (SMRT) sequencing and Illumina sequencing were combined to generate the transcripts of MZ-2. Combined with the SMRT sequencing data of sporozoites (SZ) collected in our previous study, the transcriptome and transcript structures of E. necatrix were studied. Results SMRT sequencing yielded 21,923 consensus isoforms in MZ-2. A total of 17,151 novel isoforms of known genes and 3918 isoforms of novel genes were successfully identified. We also identified 2752 (SZ) and 3255 (MZ-2) alternative splicing (AS) events, 1705 (SZ) and 1874 (MZ-2) genes with alternative polyadenylation (APA) sites, 4019 (SZ) and 2588 (MZ-2) fusion transcripts, 159 (SZ) and 84 (MZ-2) putative transcription factors (TFs) and 3581 (SZ) and 2039 (MZ-2) long non-coding RNAs (lncRNAs). To validate fusion transcripts, reverse transcription-PCR was performed on 16 candidates, with an accuracy reaching up to 87.5%. Sanger sequencing of the PCR products further confirmed the authenticity of chimeric transcripts. Comparative analysis of transcript structures revealed a total of 3710 consensus isoforms, 815 AS events, 1139 genes with APA sites, 20 putative TFs and 352 lncRNAs in both SZ and MZ-2. Conclusions We obtained many long-read isoforms in E. necatrix SZ and MZ-2, from which a series of lncRNAs, AS events, APA events and fusion transcripts were identified. 
Information on TFs will improve understanding of transcriptional regulation, and fusion event data will greatly improve draft versions of gene models in E. necatrix. This information offers insights into the mechanisms governing the development of E. necatrix and will aid in the development of novel strategies for coccidiosis control.

