Comprehensive Stress-Based De Novo Transcriptome Assembly and Annotation of Guar (Cyamopsis tetragonoloba (L.) Taub.): An Important Industrial and Forage Crop

Guar (Cyamopsis tetragonoloba (L.) Taub.) is a forage crop that tolerates heat, drought, and mild salinity. A complete picture of its genic architecture will improve our understanding of gene expression networks and tolerance mechanisms at the molecular level. Therefore, whole-mRNA sequencing of the Guar plant was conducted to provide a snapshot of the mRNA content of the cell under salinity, heat, and drought stresses, to be integrated with previous transcriptomic studies. RNA-Seq was used to perform 2×100 paired-end sequencing on an Illumina HiSeq 2500 platform for the leaf transcriptome of C. tetragonoloba under normal, heat, drought, and salinity conditions. Trinity was used for de novo assembly, followed by gene annotation, functional classification, metabolic pathway analysis, and identification of SSR markers. A total of 218.2 million paired-end raw reads (~44 Gbp) were generated. Of those, 193.5 million high-quality paired-end reads were used to reconstruct 161,058 transcripts (~266 Mbp) with an N50 of 2552 bp and 61,508 putative genes. There were 6463 proteins with >90% full-length coverage against the Swiss-Prot database and 94% complete orthologs against Embryophyta. Approximately 62.87% of transcripts had BLAST hits, 50.46% were mapped, and 43.50% were annotated. InterProScan detected 4715 families, 3441 domains, 74 repeats, and 490 sites. Biological processes, molecular functions, and cellular components comprised 64.12%, 25.42%, and 10.4%, respectively. The transcriptome was associated with 985 enzymes and 156 KEGG pathways. A total of 27,066 SSRs were identified, with an average frequency of one SSR per 9.825 kb of assembled transcripts. These data will be helpful for further analysis of multi-stress tolerance in Guar.
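
As a rough illustration of two summary statistics quoted in this abstract, the Python sketch below computes an N50 from a list of transcript lengths and re-derives the SSR density from the rounded totals. The function name `n50` and the toy lengths are hypothetical and not taken from the study; only the 266 Mbp and 27,066 figures come from the abstract.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy lengths (made up for illustration, not data from the study).
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)  # 500 + 400 = 900 >= 750, so N50 is 400

# SSR density from the abstract's rounded totals.
assembled_bp = 266_000_000  # "~266 Mbp" (rounded in the abstract)
ssr_count = 27_066
kb_per_ssr = assembled_bp / ssr_count / 1000

print(toy_n50)               # 400
print(round(kb_per_ssr, 2))  # 9.83
```

The recomputed density of about 9.83 kb per SSR agrees with the reported 9.825 kb to within the rounding of the 266 Mbp total.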

De novo transcriptome assembly and analysis of the codon usage bias of the MADS-box gene family in Cymbidium kanran

Indian Journal of Genetics and Plant Breeding (The) ◽

10.31742/ijgpb.79.2.13 ◽

2019 ◽

Vol 79 (02) ◽

Author(s):

Boyun Yang ◽

Huolin Luo ◽

Yuan Tao ◽

Wenjing Yu ◽

Liping Luo

Keyword(s):

Codon Usage ◽

Codon Usage Bias ◽

De Novo ◽

Average Length ◽

Transcriptome Assembly ◽

Mads Box ◽

Illumina Hiseq ◽

Optimal Codons ◽

Mads Box Gene ◽

New Varieties

Cymbidium kanran is an important commercially grown member of the Chinese orchid family. However, little information regarding the molecular biology of this species is available. In this study, the C. kanran root, shoot, stem, leaf, and flower transcriptomes were sequenced with the Illumina HiSeq 4000 system, which resulted in 8.9 Gb of clean reads that were assembled into 74,620 unigenes, with an average length and N50 of 983 bp and 1,640 bp, respectively. Screening seven databases (NR, NT, GO, KOG, KEGG, Swiss-Prot, and InterPro) for similar sequences resulted in the functional annotation of 49,813 unigenes. Additionally, 173 MADS-box genes, which help to control major aspects of plant development, were identified and their codon usage bias was analyzed. Only 26 genes had a low effective number of codons (ENC ≤ 35), suggesting that codon usage bias was weak overall. Base mutation was the major determinant of codon usage, although natural selection pressure also influenced codon usage bias. Moreover, 22 optimal codons were identified based on ΔRSCU, and 20 of these codons ended with A/U. The results of this study provide the foundation for the molecular breeding of new varieties.
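
To make the ΔRSCU step concrete, the Python sketch below computes relative synonymous codon usage (RSCU) for the synonymous codons of a single amino acid and the difference between two gene sets. The function names, the "high" and "low" gene sets, and all counts are hypothetical; the abstract does not state the cutoff the authors used, so the sketch reports the raw differences only.

```python
def rscu(codon_counts):
    """RSCU for the synonymous codons of ONE amino acid.

    codon_counts maps codon -> observed count. RSCU is the observed count
    divided by the count expected if all synonymous codons were used equally.
    """
    total = sum(codon_counts.values())
    if total == 0:
        raise ValueError("no codons observed for this amino acid")
    expected = total / len(codon_counts)
    return {codon: count / expected for codon, count in codon_counts.items()}


def delta_rscu(high_counts, low_counts):
    """Per-codon RSCU difference between two gene sets (high minus low)."""
    high = rscu(high_counts)
    low = rscu(low_counts)
    return {codon: high[codon] - low[codon] for codon in high}


# Made-up alanine codon counts for two hypothetical gene sets.
high_set = {"GCU": 40, "GCC": 10, "GCA": 40, "GCG": 10}
low_set = {"GCU": 25, "GCC": 25, "GCA": 25, "GCG": 25}

for codon, diff in delta_rscu(high_set, low_set).items():
    print(codon, round(diff, 2))
```

In this toy case the A/U-ending codons GCU and GCA have positive ΔRSCU, which is the pattern the abstract describes for most of the optimal codons.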

Gill Transcriptome Sequencing and De Novo Annotation of Acanthogobius ommaturus in Response to Salinity Stress

Genes ◽

10.3390/genes11060631 ◽

2020 ◽

Vol 11 (6) ◽

pp. 631

Author(s):

Zhicheng Sun ◽

Fangrui Lou ◽

Yuan Zhang ◽

Na Song

Keyword(s):

Signal Transduction ◽

Salinity Stress ◽

De Novo ◽

Enrichment Analysis ◽

Gill Tissue ◽

Control Group ◽

Rna Seq ◽

Illumina Hiseq ◽

Kegg Pathways ◽

Pathways Analysis

Acanthogobius ommaturus is a euryhaline fish widely distributed in coastal, bay and estuarine areas, showing a strong tolerance to salinity. To understand the mechanism of adaptation to salinity stress, RNA-seq was used to compare the transcriptome responses of A. ommaturus to changes in salinity. The experiment used four salinity levels: 0 psu, 15 psu (control), 30 psu and 45 psu. In total, 131,225 unigenes were obtained from the gill tissue of A. ommaturus using the Illumina HiSeq 2000 platform (San Diego, USA). Compared with the gene expression profile of the control group, 572 differentially expressed genes (DEGs) were identified, with 150 at 0 psu, 170 at 30 psu, and 252 at 45 psu. Among these DEGs, Gene Ontology (GO) analysis indicated that binding, metabolic processes and cellular processes were significantly enriched. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis detected 3, 5 and 8 pathways related to signal transduction, metabolism, and the digestive and endocrine systems at 0 psu, 30 psu and 45 psu, respectively. Based on GO enrichment analysis and manual literature searches, the results indicate that A. ommaturus resists the damage caused by salinity stress mainly through changes in energy metabolism, ion transport and signal transduction. Eight DEGs were randomly selected for further validation by quantitative real-time PCR (qRT-PCR), and the results were consistent with the RNA-seq data.

Identification of cordycepin biosynthesis-related genes through de novo transcriptome assembly and analysis in Cordyceps cicadae

Royal Society Open Science ◽

10.1098/rsos.181247 ◽

2018 ◽

Vol 5 (12) ◽

pp. 181247 ◽

Cited By ~ 3

Author(s):

Tengfei Liu ◽

Ziyao Liu ◽

Xueyan Yao ◽

Ying Huang ◽

Qingsong Qu ◽

...

Keyword(s):

Fruiting Body ◽

De Novo ◽

Transcriptome Assembly ◽

Quantitative Polymerase Chain Reaction ◽

Nucleotide Metabolism ◽

Differentially Expressed ◽

Parasitic Fungus ◽

Active Constituent ◽

Illumina Hiseq ◽

Cordyceps Cicadae

Cordyceps cicadae (Chanhua) is a parasitic fungus that grows on Cicada flammata larvae and is used to relieve exhaustion and treat numerous diseases, in part through its active constituent, cordycepin. We used de novo Illumina HiSeq 4000 sequencing to obtain transcriptomes of C. cicadae mycelium, fruiting body, and sclerotium, and to identify differentially expressed genes. In the mycelium versus sclerotium libraries, 1576 upregulated and 2300 downregulated genes were identified. In the mycelium versus fruiting body and fruiting body versus sclerotium libraries, 1604 and 1474 upregulated and 1365 and 1320 downregulated genes, respectively, were identified. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes analyses identified 19 genes differentially expressed in mycelium versus fruiting body as related to the purine pathway, along with 28 and 16 genes differentially expressed in the mycelium versus sclerotium and fruiting body versus sclerotium groups, respectively. Gene expression of six key enzymes was validated by quantitative polymerase chain reaction. Specifically, 5′-nucleotidase (c62060g1) and adenosine deaminase (c35629g1) in purine nucleotide metabolism, which are involved in cordycepin biosynthesis, were significantly upregulated in the sclerotium group. These findings improve our understanding of genes involved in the biosynthesis of cordycepin and other characteristic secondary metabolites in C. cicadae.

Characterization of the transcriptome and EST-SSR development in Boea clarkeana, a desiccation-tolerant plant endemic to China

10.7287/peerj.preprints.2603v1 ◽

2016 ◽

Author(s):

Ying Wang ◽

Kun Liu ◽

De Bi ◽

Biao Shou Zhou ◽

Wen Jian Shao

Keyword(s):

De Novo ◽

Gene Annotation ◽

Sequence Similarity ◽

Molecular Study ◽

Sequence Information ◽

Sequencing Data ◽

Protein Database ◽

Illumina Hiseq ◽

Significant Similarity ◽

Assembly Technology

Background. Resurrection plants constitute a unique cadre within angiosperms. Boea clarkeana Hemsl. (Boea, Gesneriaceae) is a desiccation-tolerant dicotyledonous herb that is endemic to China. Although research on desiccation-tolerant (DT) angiosperms could be instructive for crops, genomic resources for B. clarkeana remain scarce. In addition, transcriptome sequencing could be an effective way to study desiccation-tolerant plants. Methods. In the present study, we used the Illumina HiSeq™ 2000 platform and de novo assembly to obtain leaf transcriptomes of B. clarkeana and aligned the sequencing data against protein databases with BLASTX for sequence classification and annotation. Then, based on the sequence information obtained, we developed EST-SSR markers through EST-SSR mining, primer design and polymorphism identification. Results. A total of 91,449 unigenes were generated from the leaf cDNA library of B. clarkeana. Based on a sequence similarity search against known protein databases, 72,087 unigenes were annotated. Among the annotated unigenes, 71,170 showed significant similarity to known proteins of 463 popular model species in the Nr database, and 59,962 and 32,336 unigenes were assigned to GO and COG classifications, respectively. In addition, 44,924 unigenes were mapped to 128 KEGG pathways. Furthermore, 7,610 unigenes containing 8,563 microsatellites were found. Seventy-four primer pairs were selected from 436 designed primer pairs for polymorphism validation. SSRs with higher polymorphism rates were concentrated in dinucleotides, pentanucleotides and hexanucleotides. Finally, 17 pairs with highly polymorphic and stable loci were selected for polymorphism screening. There were a total of 65 alleles, with 2–6 alleles at each locus. Mainly due to the unique biological characteristics of plants, the expected heterozygosity (HE), observed heterozygosity (HO) and polymorphism information content (PIC) per locus were very low, ranging from 0 to 0.196, 0.082 to 0.14 and 0 to 0.155, respectively. Discussion. A substantial fraction of the transcriptome sequences of B. clarkeana was generated in this study, which is the first molecular-level analysis of this plant. These sequences are valuable resources for gene annotation and discovery and for molecular marker development. They could also provide a valuable basis for the future molecular study of B. clarkeana.
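
The per-locus diversity statistics mentioned above (HE and PIC) can be sketched in Python from allele frequencies using the standard textbook formulas. The function names and the example frequencies are hypothetical and do not reproduce the study's data; the abstract does not say which software the authors used.

```python
from itertools import combinations


def _check(freqs):
    # Allele frequencies at one locus should sum to 1.
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")


def expected_heterozygosity(freqs):
    """HE = 1 - sum(p_i^2) for the allele frequencies at one locus."""
    _check(freqs)
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    _check(freqs)
    homozygosity = sum(p * p for p in freqs)
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross


# Made-up two-allele locus dominated by one allele.
toy_freqs = [0.95, 0.05]
print(round(expected_heterozygosity(toy_freqs), 4))  # 0.095
print(round(pic(toy_freqs), 4))                      # 0.0905
```

A locus dominated by a single allele, as in this toy case, yields low HE and PIC values of the same order as the ranges reported in the abstract.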

Proteotranscriptomics assisted gene annotation and spatial proteomics of Bombyx mori BmN4 cell line

10.21203/rs.3.rs-23159/v2 ◽

2020 ◽

Author(s):

Michal Levin ◽

Marion Scheibe ◽

Falk Butter

Keyword(s):

Mass Spectrometry ◽

Bombyx Mori ◽

Cell Line ◽

De Novo ◽

High Resolution Mass Spectrometry ◽

Gene Annotation ◽

Transcriptome Assembly ◽

Model Organisms ◽

Sequence Information ◽

A Genome

Background: The process of identifying all coding regions in a genome is crucial for any study at the level of molecular biology, ranging from single-gene cloning to genome-wide measurements using RNA-Seq or mass spectrometry. While satisfactory annotation has been made feasible for well-studied model organisms through great efforts of big consortia, for most systems this kind of data is either absent or not adequately precise. Results: Combining in-depth transcriptome sequencing and high resolution mass spectrometry, we here use proteotranscriptomics to improve gene annotation of protein-coding genes in the Bombyx mori cell line BmN4, which is an increasingly used tool for the analysis of piRNA biogenesis and function. Using this approach we provide the exact coding sequence and evidence for more than 6,200 genes on the protein level. Furthermore, using spatial proteomics, we establish the subcellular localization of thousands of these proteins. We show that our approach outperforms current Bombyx mori annotation attempts in terms of accuracy and coverage. Conclusions: We show that proteotranscriptomics is an efficient, cost-effective and accurate approach to improve previous annotations or generate new gene models. As this technique is based on de novo transcriptome assembly, it provides the possibility to study any species also in the absence of genome sequence information for which proteogenomics would be impossible.

Raw transcriptomics data to gene specific SSRs: a validated free bioinformatics workflow for biologists

Scientific Reports ◽

10.1038/s41598-020-75270-8 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

D. N. U. Naranpanawa ◽

C. H. W. M. R. B. Chandrasekara ◽

P. C. G. Bandaranayake ◽

A. U. Bandaranayake

Keyword(s):

De Novo ◽

Sequence Data ◽

Transcriptome Assembly ◽

Low Cost ◽

Santalum Album ◽

Sequencing Data ◽

Illumina Hiseq ◽

Tissue Samples ◽

Downstream Analysis ◽

Bioinformatics Workflow

Recent advances in next-generation sequencing technologies have made it possible to generate large amounts of sequencing data at relatively low cost. This has revolutionized genomics and transcriptomics studies. However, handling such data with the available bioinformatics platforms now poses challenges, both in assembly and in the downstream analysis needed to infer correct biological meaning. Although a handful of commercial software tools exist for some of the procedures, their cost is prohibitive for most research laboratories. While individual open-source or free software tools are available for most bioinformatics applications, these components usually operate standalone and are not combined into a user-friendly workflow. Therefore, beginners in bioinformatics might find analysis procedures that start from raw sequence data too complicated and time-consuming, given the associated learning curve. Here, we outline a procedure for de novo transcriptome assembly and Simple Sequence Repeat (SSR) primer design based solely on tools that are available online for free use. To validate the workflow, we used Illumina HiSeq reads from different tissue samples of Santalum album (sandalwood), generated in a previous transcriptomics project. A portion of the designed primers was tested in the lab with relevant samples, and all of them successfully amplified the targeted regions. The presented bioinformatics workflow can accurately assemble quality transcriptomes and develop gene-specific SSRs. Beginner biologists and researchers in bioinformatics can easily use this workflow for research purposes.
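
The SSR mining step that such a workflow performs can be sketched in plain Python as a scan for perfect tandem repeats. This is not the authors' pipeline: the function names are hypothetical, and the minimum repeat counts per motif length are illustrative assumptions rather than values taken from the paper.

```python
# Minimum number of tandem copies required per motif length (ASSUMED values).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in min_repeats.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            # Skip ambiguous bases and motifs that reduce to a shorter unit.
            if not set(motif) <= set("ACGT") or not is_primitive(motif):
                i += 1
                continue
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= min_rep:
                hits.append((i, motif, copies))
                i += copies * k  # jump past the repeat just recorded
            else:
                i += 1
    return sorted(hits)


# Made-up sequence with one dinucleotide and one trinucleotide repeat.
toy_seq = "GGC" + "AT" * 7 + "CCT" + "AAG" * 5 + "TT"
print(find_ssrs(toy_seq))  # [(3, 'AT', 7), (20, 'AAG', 5)]
```

Start positions are 0-based. Only perfect repeats are reported; compound or interrupted SSRs, which dedicated tools can also detect, are outside the scope of this sketch.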

De novo transcriptome assembly of Premnotrypes vorax (Coleoptera: Curculionidae)

10.21203/rs.2.21229/v1 ◽

2020 ◽

Author(s):

Luisa-Fernanda Velásquez C. ◽

Pablo Emiliano Canton ◽

Alejandro Sánchez-Flores ◽

Alejandra Bravo ◽

Jairo Cerón

Keyword(s):

Biological Control ◽

Control Strategy ◽

De Novo ◽

Transcriptome Assembly ◽

Insect Pest ◽

Illumina Hiseq ◽

Tissue Samples ◽

Cry Toxin ◽

Illumina Hiseq Platform ◽

Potato Crops

Objective: Premnotrypes vorax (P. vorax) is an insect pest that causes significant losses to potato crops in Colombia. Currently, the insect is controlled mainly with highly toxic chemical insecticides, and there are no reports of any commercial biological control strategy against this pest. Hence, the objective of this study was to characterize gene expression in the insect in order to search for genes that could code for Bacillus thuringiensis Cry toxin receptors. Using an RNA-seq approach, we sequenced the mRNA from the insect tissue, performed a de novo assembly and analyzed the reconstructed transcriptome of P. vorax. To our knowledge, this is the first genetic report on this endemic insect, and it will set the basis for a possible biological control strategy. Results: The transcriptome data were obtained from dissected midgut tissue samples of P. vorax larvae. The RNA was isolated and sequenced using the Illumina HiSeq platform with a configuration of 2×150 bp reads. A total of 383,552,246 reads were obtained, and quality control and cleaning were subsequently performed with FastQC and Trimmomatic, respectively. A de novo assembly was done using the Trinity software, yielding a transcriptome assembly with 25,631 genes that showed at least one annotation record, resulting in 74,984 transcript isoforms.

De novo Transcriptome Assembly of Myllocerinus aurolineatus Voss in Tea Plants

Frontiers in Sustainable Food Systems ◽

10.3389/fsufs.2021.631990 ◽

2021 ◽

Vol 5 ◽

Author(s):

Xin Xie ◽

Junmei Jiang ◽

Meiqing Chen ◽

Maoxi Huang ◽

Linhong Jin ◽

...

Keyword(s):

De Novo ◽

Transcriptome Assembly ◽

Nucleotide Polymorphisms ◽

Illumina Hiseq ◽

Single Nucleotide ◽

Functional Studies ◽

Myllocerinus Aurolineatus ◽

Tea Plants ◽

The Trinity ◽

Simple Sequence

Myllocerinus aurolineatus Voss is an arthropod species of the class Insecta. In this study, we first observed and identified M. aurolineatus Voss in tea plants in Guizhou, China, where it caused severe losses in tea yield and quality. Knowledge of the M. aurolineatus Voss genome is inadequate, especially for biological or functional research. We performed the first transcriptome sequencing of M. aurolineatus Voss using the Illumina HiSeq™ technique. Over 55.9 million high-quality paired-end reads were generated and assembled into 69,439 unigenes using the Trinity short-read software, with an N50 length of 1,207 bp. A total of 69,439 genes were predicted by BLAST against known proteins in the NCBI database and were distributed into Gene Ontology (20,190), eukaryotic complete genomes (12,488), and the Kyoto Encyclopedia of Genes and Genomes (3,170). We also identified 96,790 single-nucleotide polymorphisms and 13,121 simple sequence repeats in these unigenes. Our transcriptome data provide a useful resource for future functional studies of M. aurolineatus Voss aimed at controlling its dispersal in tea plants.

A dual transcript-discovery approach to improve the delimitation of gene features from RNA-seq data in the chicken model

10.1101/156406 ◽

2017 ◽

Cited By ~ 2

Author(s):

Mickael Orgeur ◽

Marvin Martens ◽

Stefan T. Börno ◽

Bernd Timmermann ◽

Delphine Duprez ◽

...

Keyword(s):

Genome Sequence ◽

De Novo ◽

Gene Annotation ◽

Transcriptome Assembly ◽

Draft Genome ◽

Transcript Abundance ◽

Accurate Estimation ◽

Rna Seq ◽

A Genome ◽

Transcript Discovery

The sequence of the chicken genome, like several other draft genome sequences, is presently not fully covered. Gaps, contigs assigned with low confidence and uncharacterized chromosomes result in gene fragmentation and imprecise gene annotation. Transcript abundance estimation from RNA sequencing (RNA-seq) data relies on read quality, library complexity and expression normalization. In addition, the quality of the genome sequence used to map sequencing reads and the gene annotation that defines gene features must also be taken into account. Partially covered genome sequence causes the loss of sequencing reads from the mapping step, while an inaccurate definition of gene features induces imprecise read counts from the assignment step. Both steps can significantly bias interpretation of RNA-seq data. Here, we describe a dual transcript-discovery approach combining a genome-guided gene prediction and a de novo transcriptome assembly. This dual approach enabled us to increase the assignment rate of RNA-seq data by nearly 20% as compared to when using only the chicken reference annotation, contributing therefore to a more accurate estimation of transcript abundance. More generally, this strategy could be applied to any organism with partial genome sequence and/or lacking a manually-curated reference annotation in order to improve the accuracy of gene expression studies.
