Transcriptome analysis of colored calla lily (Zantedeschia rehmannii Engl.) by Illumina sequencing: de novo assembly, annotation and EST-SSR marker development

PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e2378 ◽  
Author(s):  
Zunzheng Wei ◽  
Zhenzhen Sun ◽  
Binbin Cui ◽  
Qixiang Zhang ◽  
Min Xiong ◽  
...  

Colored calla lily is the short name for the species or hybrids in section Aestivae of genus Zantedeschia. It is currently one of the most popular flower plants in the world due to its beautiful flower spathe and long postharvest life. However, little genomic information and few molecular markers are available for its genetic improvement. Here, de novo transcriptome sequencing was performed to produce large transcript sequences for Z. rehmannii cv. ‘Rehmannii’ using an Illumina HiSeq 2000 instrument. More than 59.9 million cDNA sequence reads were obtained and assembled into 39,298 unigenes with an average length of 1,038 bp. Among these, 21,077 unigenes showed significant similarity to protein sequences in the non-redundant protein database (Nr) and in the Swiss-Prot, Gene Ontology (GO), Cluster of Orthologous Group (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Moreover, a total of 117 unique transcripts were then defined that might regulate the flower spathe development of colored calla lily. Additionally, 9,933 simple sequence repeats (SSRs) and 7,162 single nucleotide polymorphisms (SNPs) were identified as putative molecular markers. High-quality primers for 200 SSR loci were designed and selected, of which 58 amplified reproducible amplicons were polymorphic among 21 accessions of colored calla lily. The sequence information and molecular markers in the present study will provide valuable resources for genetic diversity analysis, germplasm characterization and marker-assisted selection in the genus Zantedeschia.
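The SSR identification step mentioned above can be illustrated with a minimal sketch. The abstract does not say which tool or thresholds were used, so the function name `find_ssrs` and the minimum repeat counts below are hypothetical choices for illustration only; the sketch simply scans a sequence for perfect tandem repeats of 2 to 6 bp motifs.

```python
import re

# Minimum number of tandem copies required per motif length.
# These thresholds are assumptions for illustration, not values from the paper.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AAAA", "ATAT").
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    toy_unigene = "GGC" + "AG" * 7 + "TTC" + "CTT" * 5 + "A"
    for start, motif, copies in find_ssrs(toy_unigene):
        print(start, motif, copies)
```

Primer design and polymorphism screening, which the study performed on 200 loci, are wet-lab and tool-specific steps that this sketch does not attempt to cover.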

2016 ◽  
Author(s):  
Ying Wang ◽  
Kun Liu ◽  
De Bi ◽  
Biao Shou Zhou ◽  
Wen Jian Shao

Background. Resurrection plants constitute a unique cadre within angiosperms. Boea clarkeana Hemsl. (Boea, Gesneriaceae) is a desiccation-tolerant dicotyledonous herb that is endemic to China. Although research on desiccation-tolerant (DT) angiosperms could be instructive for crops, genomic resources for B. clarkeana remain scarce. In addition, transcriptome sequencing could be an effective way to study desiccation-tolerant plants. Methods. In the present study, we used the Illumina HiSeq™ 2000 platform and de novo assembly technology to obtain leaf transcriptomes of B. clarkeana and conducted a BLASTX alignment of the sequencing data against protein databases for sequence classification and annotation. Then, based on the sequence information obtained, we developed EST-SSR markers by means of EST-SSR mining, primer design and polymorphism identification. Results. A total of 91,449 unigenes were generated from the leaf cDNA library of B. clarkeana in this study. Based on a sequence similarity search with a known protein database, 72,087 unigenes were annotated. Among the annotated unigenes, a total of 71,170 unigenes showed significant similarity to known proteins of 463 popular model species in the Nr database, and 59,962 unigenes and 32,336 unigenes were assigned to GO classifications and COG, respectively. In addition, 44,924 unigenes were mapped in 128 KEGG pathways. Furthermore, a total of 7,610 unigenes with 8,563 microsatellites were found. Seventy-four primer pairs were selected from 436 primer pairs designed for polymorphism validation. SSRs with higher polymorphism rates were concentrated on dinucleotides, pentanucleotides and hexanucleotides. Finally, 17 pairs with highly polymorphic and stable loci were selected for polymorphism screening. There were a total of 65 alleles, with 2–6 alleles at each locus. Mainly due to the unique biological characteristics of plants, the expected heterozygosity (HE), observed heterozygosity (HO) and polymorphism information content (PIC) per locus were very low, ranging from 0 to 0.196, 0.082 to 0.14 and 0 to 0.155, respectively. Discussion. A substantial fraction of the transcriptome sequences of B. clarkeana was generated in this study, which is the first molecular-level analysis of this plant. These sequences are valuable resources for gene annotation and discovery and molecular marker development. These sequences could also provide a valuable basis for the future molecular study of B. clarkeana.


PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3422 ◽  
Author(s):  
Ying Wang ◽  
Kun Liu ◽  
De Bi ◽  
Shoubiao Zhou ◽  
Jianwen Shao

Background. Desiccation-tolerant (DT) plants can recover full metabolic competence upon rehydration after losing most of their cellular water (>95%) for extended periods of time. Functional genomic approaches such as transcriptome sequencing can help us understand how DT plants survive and respond to dehydration, which has great significance for plant biology and for improving the drought tolerance of crops. Boea clarkeana Hemsl. (Gesneriaceae) is a DT dicotyledonous herb. Its genomic sequence characteristics remain unknown. Based on transcriptomic analyses, polymorphic EST-SSR (simple sequence repeats in expressed sequence tags) molecular primers can be designed, which will greatly facilitate further investigations of the population genetics and demographic histories of DT plants. Methods. In the present study, we used the Illumina HiSeq™ 2000 platform and de novo assembly technology to obtain leaf transcriptomes of B. clarkeana and conducted a BLASTX alignment of the sequencing data against protein databases for sequence classification and annotation. Then, based on the sequence information, the EST-SSR markers were developed, and the functional annotation of ESTs containing polymorphic SSRs was obtained through BLASTX. Results. A total of 91,449 unigenes were generated from the leaf cDNA library of B. clarkeana. Based on a sequence similarity search with a known protein database, 72,087 unigenes were annotated. Among the annotated unigenes, a total of 71,170 unigenes showed significant similarity to the known proteins of 463 popular model species in the Nr database, and 59,962 unigenes and 32,336 unigenes were assigned to Gene Ontology (GO) classifications and Cluster of Orthologous Groups (COG), respectively. In addition, 44,924 unigenes were mapped in 128 KEGG pathways. Furthermore, a total of 7,610 unigenes with 8,563 microsatellites were found. Seventy-four primer pairs were selected from 436 primer pairs designed for polymorphism validation. SSRs with higher polymorphism rates were concentrated on dinucleotides, pentanucleotides and hexanucleotides. Finally, 17 pairs with stable, highly polymorphic loci were selected for polymorphism screening. There was a total of 65 alleles, with 2–6 alleles at each locus. Primarily due to the unique biological characteristics of plants, the expected heterozygosity (HE; 0–0.196), observed heterozygosity (HO; 0.082–0.14) and polymorphism information content (PIC; 0–0.155) per locus were very low. The functional annotation distribution centered on ESTs containing di- and tri-nucleotide SSRs, and the ESTs containing primers BC2, BC4 and BC12 were annotated to vegetative dehydration/desiccation pathways. Discussion. This work is the first genetic study of B. clarkeana as a new plant resource of DT genes. A substantial number of transcriptome sequences were generated in this study. These sequences are valuable resources for gene annotation and discovery as well as molecular marker development. These sequences could also provide a valuable basis for future molecular studies of B. clarkeana.
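The per-locus HE and PIC statistics reported above are conventionally derived from allele frequencies. The following is a minimal sketch using the standard textbook definitions (HE as one minus the sum of squared allele frequencies, and PIC as defined by Botstein et al.); the abstract does not state which software or estimator the authors used, and the function names are illustrative.

```python
def expected_heterozygosity(freqs):
    """HE = 1 - sum(p_i^2) for allele frequencies p_i at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphism_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygous = sum(p * p for p in freqs)
    cross = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            cross += 2.0 * freqs[i] ** 2 * freqs[j] ** 2
    return 1.0 - homozygous - cross


if __name__ == "__main__":
    # Toy locus with one dominant allele: both statistics come out low.
    toy_freqs = [0.95, 0.05]
    print(expected_heterozygosity(toy_freqs))
    print(polymorphism_information_content(toy_freqs))
```

A locus dominated by a single allele gives values close to zero under both formulas, which is the pattern the abstract describes.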


Horticulturae ◽  
2021 ◽  
Vol 7 (11) ◽  
pp. 431
Author(s):  
Juan Pacheco ◽  
Santiago Vilanova ◽  
Rubén Grillo-Risco ◽  
Francisco Garcia-Garcia ◽  
Jaime Prohens ◽  
...  

The tree tomato (Solanum betaceum Cav.) is an underutilized fruit crop native to the Andean region and phylogenetically related to the tomato and potato. Tree tomato fruits have a high amount of nutrients and bioactive compounds. However, so far there are no studies at the genome or transcriptome level for this species. We performed a de novo assembly and transcriptome annotation for a purple-fruited (A21) and an orange-fruited (A23) accession. A total of 174,252 (A21) and 194,417 (A23) transcripts were assembled with an average length of 851 and 849 bp, respectively. A total of 34,636 (A21) and 36,224 (A23) transcripts showed a significant similarity to known proteins. Among the annotated unigenes, 22,096 (A21) and 23,095 (A23) were assigned Gene Ontology (GO) terms and 14,035 (A21) and 14,540 (A23) were found to have Clusters of Orthologous Group (COG) term classifications. Furthermore, 22,096 (A21) and 23,095 (A23) transcripts were assigned to 155 (A21) and 161 (A23) KEGG pathways. The carotenoid biosynthetic process GO terms were significantly enriched in the purple-fruited accession A21. Finally, 68,647 intraspecific single-nucleotide variations (SNVs) and almost 2 million interspecific SNVs were identified. The results of this study provide a wealth of genomic data for the genetic improvement of the tree tomato.


2014 ◽  
Vol 12 (S1) ◽  
pp. S83-S86 ◽  
Author(s):  
Yul-Kyun Ahn ◽  
Swati Tripathi ◽  
Young-Il Cho ◽  
Jeong-Ho Kim ◽  
Hye-Eun Lee ◽  
...  

Next-generation sequencing is known to be a useful tool for de novo transcriptome assembly, functional annotation of genes and identification of molecular markers. This study was carried out to mine molecular markers from de novo assembled transcriptomes of four chilli pepper varieties: the highly pungent ‘Saengryeg 211’, the non-pungent ‘Saengryeg 213’, and the variably pigmented ‘Mandarin’ and ‘Blackcluster’. Pyrosequencing of the complementary DNA library resulted in 361,671, 274,269, 279,221 and 316,357 raw reads, which were assembled into 23,607, 19,894, 18,340 and 20,357 contigs for the four varieties, respectively. Detailed sequence variant analysis identified numerous potential single-nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs) for all the varieties, for which primers were designed. The transcriptome information and SNP/SSR markers generated in this study provide valuable resources for high-density molecular genetic mapping in chilli pepper and for quantitative trait loci analysis related to fruit qualities. These markers for pepper will be highly valuable for marker-assisted breeding and other genetic studies.


Author(s):  
Boyun Yang ◽  
Huolin Luo ◽  
Yuan Tao ◽  
Wenjing Yu ◽  
Liping Luo

Cymbidium kanran is an important commercially grown member of the Chinese orchid family. However, little information regarding the molecular biology of this species is available. In this study, the C. kanran root, shoot, stem, leaf, and flower transcriptomes were sequenced with the Illumina HiSeq 4000 system, which resulted in 8.9 Gb of clean reads that were assembled into 74,620 unigenes, with an average length and N50 of 983 bp and 1,640 bp, respectively. The screening of seven databases (NR, NT, GO, KOG, KEGG, Swiss-Prot, and InterPro) for similar sequences resulted in the functional annotation of 49,813 unigenes. Additionally, 173 MADS-box genes, which help to control major aspects of plant development, were identified and their codon usage bias was analyzed. Only 26 genes had a low ENC (less than or equal to 35), suggesting the codon usage bias was weak. Base mutations were the major determinants of codon usage, although natural selection pressure also influenced codon usage bias. Moreover, 22 optimal codons were identified based on ΔRSCU, and 20 codons ended with A/U. The results of this study provide the foundation for the molecular breeding of new varieties
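The abstract above reports both an average unigene length and an N50. As a minimal sketch of how N50 is usually defined (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases), the following may help; the function name and toy lengths are illustrative and not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half the total is the N50.
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one sequence length")


if __name__ == "__main__":
    toy_lengths = [2, 3, 4, 5, 6]
    print(n50(toy_lengths))
```

Because N50 weights long sequences more heavily than a simple mean does, it is normally larger than the average length, as in the figures reported above.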


Insects ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 101
Author(s):  
Miao Wang ◽  
Hanyu Li ◽  
Huoqing Zheng ◽  
Liuwei Zhao ◽  
Xiaofeng Xue ◽  
...  

The invasion of Vespa velutina presents a great threat to the agricultural economy, the ecological environment, and human health. An effective strategy for controlling this hornet is urgently required, but the limited genome information of Vespa velutina restricts the application of molecular-genomic tools for targeted hornet management. Therefore, we conducted large-scale transcriptome profiling of the hornet brain to obtain functional target genes and molecular markers. Using an Illumina HiSeq platform, more than 41 million clean reads were obtained and de novo assembled into 182,087 meaningful unigenes. A total of 56,400 unigenes were annotated against publicly available protein sequence databases, and a set of reliable simple sequence repeat (SSR) and single nucleotide polymorphism (SNP) markers was developed. The homologous genes encoding crucial behavior regulation factors, odorant binding proteins (OBPs), and vitellogenin were also identified from highly expressed transcripts. This study provides abundant molecular targets and markers for invasive hornet control and further promotes the genetic and molecular study of Vespa velutina.


2018 ◽  
Vol 54 (No. 1) ◽  
pp. 17-25 ◽  
Author(s):  
D.-D. Vu ◽  
T.T.-X. Bui ◽  
T.H.-N. Nguyen ◽  
S.N.M. Shah ◽  
N.-H. Vu ◽  
...  

A total of 20 074 230 sequencing reads were generated by Illumina HiSeq™ 2500 from three different Toxicodendron vernicifluum tissue samples. In total, 48 693 unigenes with an average length of 703.34 bp were obtained by de novo assembly. A total of 3392 EST-SSRs (expressed sequence tag-simple sequence repeats) were identified as potential molecular markers from unigenes with lengths exceeding 1 kb. A total of 80 pairs of PCR primers were randomly selected to validate the assembly quality and develop EST-SSR markers from genomic DNA. Of these, 14 primer pairs successfully amplified DNA fragments and detected significant amounts of polymorphism within the lacquer tree population in Langao, Shaanxi province, China. Genetic diversity was high in the lacquer tree natural population (number of alleles per locus (A) = 2.93, polymorphic information content (PIC) = 0.53, observed heterozygosity (Ho) = 0.62 and expected heterozygosity (He) = 0.85). Four loci deviated significantly from Hardy-Weinberg equilibrium. These results suggest high homozygosity and a heterozygote deficiency in the population (inbreeding coefficient (Fis) = 0.27). These polymorphic EST-SSR markers will provide the base for further studies of genetic structure and breeding in T. vernicifluum.
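The reported inbreeding coefficient is consistent with the standard relation Fis = 1 - Ho/He applied to the heterozygosities given above. The abstract does not say how Fis was actually computed, so the following is only a sketch of that textbook relation; the function name is illustrative.

```python
def inbreeding_coefficient(h_obs, h_exp):
    """Fis = 1 - Ho / He; positive values indicate a heterozygote deficit."""
    if h_exp <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - h_obs / h_exp


if __name__ == "__main__":
    # Ho and He as reported in the abstract.
    print(round(inbreeding_coefficient(0.62, 0.85), 2))
```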


Genome ◽  
2014 ◽  
Vol 57 (9) ◽  
pp. 499-506 ◽  
Author(s):  
Periyasamy Vijayakumar ◽  
Ashwin Ashok Raut ◽  
Pushpendra Kumar ◽  
Deepak Sharma ◽  
Anamika Mishra

The jungle crow (Corvus macrorhynchos) belongs to the avian order Passeriformes and is important for avian ecological and evolutionary genetics studies. However, there is limited information on the transcriptome data of this species. In the present study, we report the characterization of the lung transcriptome of the jungle crow using GS FLX Titanium XLR70. Altogether, 1 510 303 high-quality sequence reads with 581 198 230 bases were assembled de novo into 22 169 isotigs (an isotig represents an individual transcript) and 784 009 singletons. Using these isotigs and 581 681 length-filtered (greater than 300 bp) singletons, 20 010 unique protein-coding genes were identified by BLASTx comparison against a nonredundant (nr) protein sequence database. Comparative analysis revealed that 46 604 (70.29%) and 51 642 (72.48%) of the assembled transcripts have significant similarity to zebra finch and chicken RefSeq proteins, respectively. As determined by GO annotation and KEGG pathway mapping, functional annotation of the unigenes recovered diverse biological functions and processes. Transcripts putatively involved in the immune response were identified. Furthermore, 20 599 single nucleotide polymorphisms (SNPs) and 7525 simple sequence repeats (SSRs) were retrieved from the assembled transcript database. This resource should lay an important base for future ecological, evolutionary, and conservation genetic studies on this species and in other related species.


2020 ◽  
Author(s):  
Duy Dinh Vu ◽  
Syed Noor Muhammad Shah ◽  
Mai Phuong Pham ◽  
Van Thang Bui ◽  
Minh Tam Nguyen ◽  
...  

Abstract Background: Understanding the genetic diversity of endangered species that occur in forest remnants is necessary to establish efficient strategies for species conservation, restoration and management. Panax vietnamensis Ha et Grushv. is a medicinally important, endemic and endangered species of Vietnam. However, its genetic diversity and population structure are unknown due to a lack of efficient molecular markers. Results: In this study, we employed Illumina HiSeq™ 4000 sequencing to analyze the transcriptomes of P. vietnamensis (roots, leaves and stems). A total of 23,741,783 raw reads were obtained and assembled into 89,271 unigenes (average length = 598.3191 nt). Of these, 31,686 unigenes were functionally annotated in different databases, i.e. Gene Ontology, Kyoto Encyclopedia of Genes and Genomes, Nucleotide Collection (NR/NT) and Swiss-Prot. Further, 11,343 EST-SSRs were detected. From 7,774 primer pairs, 101 were selected for polymorphism validation, of which 20 primer pairs successfully amplified DNA fragments and revealed significant amounts of polymorphism within populations. Nine polymorphic microsatellite loci were used for population structure and diversity analyses. The results revealed high levels of genetic diversity in populations; the average observed and expected heterozygosity were HO = 0.422 and HE = 0.479, respectively. Bottleneck analysis using the TPM and SMM models (p < 0.01) showed that the targeted population is significantly heterozygote deficient, which suggests a sign of a bottleneck in all populations. Genetic differentiation between populations was moderate (FST = 0.133), indicating a relatively high level of gene flow (Nm = 1.63). Analysis of molecular variance (AMOVA) showed 63.17% of variation within individuals and 12.45% among populations. Our results show two genetic clusters related to geographical distance. Conclusion: Our study will assist conservationists in future conservation management, breeding, production and habitat restoration of the species.
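The gene flow estimate above matches what Wright's island-model approximation, Nm ≈ (1 - FST) / (4 FST), gives for the reported FST. The abstract does not state the formula or software used, so this is only a sketch of that common approximation; the function name is illustrative.

```python
def gene_flow_from_fst(fst):
    """Estimate Nm (migrants per generation) as (1 - FST) / (4 * FST)."""
    if not 0 < fst <= 1:
        raise ValueError("FST must be in the interval (0, 1]")
    return (1.0 - fst) / (4.0 * fst)


if __name__ == "__main__":
    # FST as reported in the abstract.
    print(round(gene_flow_from_fst(0.133), 2))
```

Values of Nm above 1 are commonly read as enough migration to counteract drift between populations, which fits the abstract's description of moderate differentiation.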

