De novo assembly and annotation of the eastern fence lizard (Sceloporus undulatus) transcriptome

Mapping Intimacies ◽

10.1101/136069 ◽

2017 ◽

Author(s):

Mariana B. Grizante ◽

Marc Tollis ◽

Juan J. Rodriguez ◽

Ofir Levy ◽

Michael J. Angilletta ◽

...

Keyword(s):

Complex Traits ◽

De Novo ◽

Transcriptome Assembly ◽

Single Copy ◽

Genomic Research ◽

Sceloporus Undulatus ◽

Protein Coding ◽

Average Contig Length ◽

Green Anole Lizard ◽

Eastern Fence Lizard

Background: The eastern fence lizard (Sceloporus undulatus) has been a model species for ecological and evolutionary research. Genomic and transcriptomic resources for this species would promote investigation of genetic mechanisms that underpin plastic responses to environmental stress, such as climate warming. Moreover, such resources would aid comparative studies of complex traits at the molecular level, such as the transition from oviparous to viviparous reproduction, which happened at least four times within Sceloporus. Findings: A de novo transcriptome assembly for Sceloporus undulatus, Sund_v1.0, was generated using over 179 million Illumina reads obtained from three tissues (whole brain, skeletal muscle, and embryo) as well as previously reported liver sequences. The Sund_v1.0 assembly had an average contig length of 782 nucleotides and an E90N50 statistic of 2,550 nucleotides. Comparing S. undulatus transcripts with the benchmarking universal single-copy orthologs (BUSCO) for tetrapod species yielded 97.2% gene representation. A total of 13,422 protein-coding orthologs were identified in comparison to the genome of the green anole lizard, Anolis carolinensis, which is the most closely related species with genomic data available. Conclusions: The multi-tissue transcriptome of S. undulatus is the first for a member of the family Phrynosomatidae, offering an important resource to advance studies of adaptation in this species and genomic research in reptiles.

Genomic Analysis of Sarcomyxa edulis Reveals the Basis of Its Medicinal Properties and Evolutionary Relationships

Frontiers in Microbiology ◽

10.3389/fmicb.2021.652324 ◽

2021 ◽

Vol 12 ◽

Author(s):

Fenghua Tian ◽

Changtian Li ◽

Yu Li

Keyword(s):

Single Molecule ◽

De Novo ◽

Genomic Analysis ◽

Single Copy ◽

Whole Genome Sequence ◽

Type I ◽

Whole Genome ◽

Uridine Diphosphate ◽

Protein Coding ◽

Medicinal Value

Yuanmo [Sarcomyxa edulis (Y.C. Dai, Niemelä & G.F. Qin) T. Saito, Tonouchi & T. Harada] is an important edible and medicinal mushroom endemic to Northeastern China. Here we report the de novo sequencing and assembly of the S. edulis genome using single-molecule real-time sequencing technology. The whole genome was approximately 35.65 Mb, with a G + C content of 48.31%. Genome assembly generated 41 contigs with an N50 length of 1,772,559 bp. The genome comprised 9,364 annotated protein-coding genes, many of which encoded enzymes involved in the modification, biosynthesis, and degradation of glycoconjugates and carbohydrates or enzymes predicted to be involved in the biosynthesis of secondary metabolites such as terpene, type I polyketide, siderophore, and fatty acids, which are responsible for the pharmacodynamic activities of S. edulis. We also identified genes encoding 1,3-β-glucan synthase and endo-1,3(4)-β-glucanase, which are involved in polysaccharide and uridine diphosphate glucose biosynthesis. Phylogenetic and comparative analyses of Basidiomycota fungi based on a single-copy orthologous protein indicated that the Sarcomyxa genus is an independent group that evolved from the Pleurotaceae family. The annotated whole-genome sequence of S. edulis can serve as a reference for investigations of bioactive compounds with medicinal value and the development and commercial production of superior S. edulis varieties.
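The N50 statistic quoted above can be illustrated with a short sketch. It uses the common definition (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly). The contig lengths are a hypothetical toy example, not data from the S. edulis assembly, and the function name `n50` is chosen here for illustration only.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches at least half
    of the summed assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical toy assembly (total length 32): the two 8-unit contigs
# together reach half of the total, so the N50 is 8.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))
```

Because N50 depends only on the length distribution, it says nothing about correctness of the assembled sequence; it is usually reported alongside contig count and total assembly size, as in the abstract above.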

An Improved Human smORF Annotation Workflow Combining De Novo Transcriptome Assembly and Ribo-Seq

10.1101/523860 ◽

2019 ◽

Author(s):

Thomas F. Martinez ◽

Qian Chu ◽

Cynthia Donaldson ◽

Dan Tan ◽

Maxim N. Shokhirev ◽

...

Keyword(s):

De Novo ◽

Transcriptome Assembly ◽

Human Leukocyte ◽

Open Reading Frames ◽

Translation Efficiency ◽

Proteomics Data ◽

Protein Coding ◽

Leukocyte Antigen ◽

Human Genes ◽

Small Open Reading Frames

Protein-coding small open reading frames (smORFs) are emerging as an important class of genes; however, the coding capacity of smORFs in the human genome is unclear. By integrating de novo transcriptome assembly and Ribo-Seq, we confidently annotate thousands of novel translated smORFs in three human cell lines. We find that smORF translation prediction is noisier than for annotated coding sequences, underscoring the importance of analyzing multiple experiments and footprinting conditions. These smORFs are located within non-coding and antisense transcripts, the UTRs of mRNAs, and unannotated transcripts. Analysis of RNA levels and translation efficiency during cellular stress identifies regulated smORFs, providing an approach to select smORFs for further investigation. Sequence conservation and signatures of positive selection indicate that encoded microproteins are likely functional. Additionally, proteomics data from enriched human leukocyte antigen complexes validate the translation of hundreds of smORFs and position them as a source of novel antigens. Thus, smORFs represent a significant number of important, yet unexplored human genes.

The complete chloroplast genome of Saxifraga sinomontana (Saxifragaceae) and comparative analysis with other Saxifragaceae species

Revista Brasileira de Botânica ◽

10.1007/s40415-019-00561-y ◽

2019 ◽

Vol 42 (4) ◽

pp. 601-611 ◽

Cited By ~ 1

Author(s):

Yan Li ◽

Liukun Jia ◽

Zhihua Wang ◽

Rui Xing ◽

Xiaofeng Chi ◽

...

Keyword(s):

Comparative Analysis ◽

Chloroplast Genome ◽

Phylogenetic Relationships ◽

De Novo ◽

Single Copy ◽

Bootstrap Support ◽

Protein Coding ◽

Complete Chloroplast Genome ◽

Protein Coding Genes ◽

Chloroplast Genomes

Saxifraga sinomontana J.-T. Pan & Gornall belongs to Saxifraga sect. Ciliatae subsect. Hirculoideae, a lineage containing ca. 110 species whose phylogenetic relationships are largely unresolved due to recent rapid radiations. Analyses of complete chloroplast genomes have the potential to significantly improve the resolution of phylogenetic relationships in this young plant lineage. The complete chloroplast genome of S. sinomontana was de novo sequenced, assembled and then compared with those of six other Saxifragaceae species. The S. sinomontana chloroplast genome is 147,240 bp in length with a typical quadripartite structure, including a large single-copy region of 79,310 bp and a small single-copy region of 16,874 bp separated by a pair of inverted repeats (IRs) of 25,528 bp each. The chloroplast genome contains 113 unique genes, including 79 protein-coding genes, four rRNAs and 30 tRNAs, with 18 duplicates in the IRs. The gene content and organization are similar to other Saxifragaceae chloroplast genomes. Sixty-one simple sequence repeats were identified in the S. sinomontana chloroplast genome, mostly represented by mononucleotide repeats of polyadenine or polythymine. Comparative analysis revealed 12 highly divergent regions in the intergenic spacers, as well as in the coding genes matK, ndhK, accD, cemA, rpoA, rps19, ndhF, ccsA, ndhD and ycf1. Phylogenetic reconstruction of seven Saxifragaceae species based on 66 protein-coding genes received high bootstrap support values for nearly all identified nodes, suggesting a promising opportunity to resolve infrasectional relationships of the most species-rich section Ciliatae of Saxifraga.
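The region lengths reported above can be checked against the stated total: in a quadripartite plastome the full length is the large single-copy region plus the small single-copy region plus two copies of the inverted repeat. The sketch below uses only the figures from this abstract; the function and variable names are illustrative.

```python
# Region lengths (bp) as reported in the S. sinomontana abstract.
LSC_BP = 79_310  # large single-copy region
SSC_BP = 16_874  # small single-copy region
IR_BP = 25_528   # one inverted repeat; the genome carries two copies


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total plastome length: LSC + SSC + two inverted-repeat copies."""
    return lsc + ssc + 2 * ir


total_bp = quadripartite_length(LSC_BP, SSC_BP, IR_BP)
print(total_bp)  # 147240, matching the reported 147,240 bp
```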

Construction of a reference transcriptome for the analysis of male sterility in sugi (Cryptomeria japonica D. Don) focusing on MALE STERILITY 1 (MS1)

PLoS ONE ◽

10.1371/journal.pone.0247180 ◽

2021 ◽

Vol 16 (2) ◽

pp. e0247180

Author(s):

Fu-Jin Wei ◽

Saneyoshi Ueno ◽

Tokuko Ujino-Ihara ◽

Maki Saito ◽

Yoshihiko Tsumura ◽

...

Keyword(s):

Male Sterility ◽

Cryptomeria Japonica ◽

Evolutionary Biology ◽

De Novo ◽

Transcriptome Assembly ◽

Single Copy ◽

Reference Transcriptome ◽

Three Stages ◽

Significant Expression ◽

Short Timeframe

Sugi (Cryptomeria japonica D. Don) is an important conifer used for afforestation in Japan. As the genome of this species is 11 Gbp, it is too large to assemble within a short timeframe. Transcriptomics is one approach that can address this deficiency. Here we designed a workflow consisting of three stages to de novo assemble the transcriptome using Oases and Trinity. The three stages were independent assembly, automatic and semi-manual integration, and refinement by filtering out potential contamination. We identified a set of 49,795 cDNAs and an equal number of translated proteins. According to the benchmark set by BUSCO, 87.01% of the cDNAs identified were complete genes, and 78.47% were complete and single-copy genes. Compared to other full-length cDNA resources collected by Sanger and PacBio sequencers, the extent of the coverage in our dataset was the highest, indicating that these data can be safely used for further studies. When two tissue-specific libraries were compared, there were significant expression differences between male strobili and leaf and bark sets. Moreover, subtle expression differences between male-fertile and sterile libraries were detected. Orthologous genes from other model plants and conifer species were identified. We demonstrated that our transcriptome assembly output (CJ3006NRE) can serve as a reference transcriptome for future functional genomics and evolutionary biology studies.

Transcriptome assembly and annotation of johnsongrass (Sorghum halepense) rhizomes identifies candidate rhizome-specific genes

10.1101/243956 ◽

2018 ◽

Author(s):

Nathan Ryder ◽

Kevin M. Dorn ◽

Mark Huitsing ◽

Micah Adams ◽

Jeff Ploegstra ◽

...

Keyword(s):

De Novo ◽

Tissue Sample ◽

Transcriptome Assembly ◽

Perennial Grass ◽

Perennial Grasses ◽

Sorghum Halepense ◽

Protein Coding ◽

Grain Crop ◽

Perennial Grain ◽

Rhizome Development

Rhizomes facilitate the wintering and vegetative propagation of many perennial grasses. Sorghum halepense (johnsongrass) is an aggressive perennial grass that relies on a robust rhizome system to persist through winters and reproduce asexually from its rootstock nodes. This study aimed to sequence and assemble expressed transcripts within the johnsongrass rhizome. A de novo transcriptome assembly was generated from a single johnsongrass rhizome meristem tissue sample. A total of 141,176 probable protein-coding sequences from the assembly were identified and assigned gene ontology terms using Blast2GO. The johnsongrass assembly was compared to Sorghum bicolor, a related non-rhizomatous species, along with an assembly of similar rhizome tissue from the perennial grain crop Thinopyrum intermedium. The presence/absence analysis yielded a set of 259 johnsongrass contigs that are likely associated with rhizome development.

De novo assembly and annotation of Asiatic lion (Panthera leo persica) genome

10.1101/549790 ◽

2019 ◽

Cited By ~ 2

Author(s):

Siuli Mitra ◽

Ara Sreenivas ◽

Divya Tej Sowpati ◽

Amitha Sampat Kumar ◽

Gowri Awasthi ◽

...

Keyword(s):

De Novo ◽

Diversity Index ◽

Conservation Status ◽

Single Copy ◽

Genomic Diversity ◽

Segmental Duplications ◽

Sequence Coverage ◽

Protein Coding ◽

Asiatic Lion ◽

Specific Expansion

We report the first draft of the whole genome assembly of a male Asiatic lion, Atul, and whole transcriptomes of five Asiatic lion individuals. Evaluation of genetic diversity placed the Asiatic lion in the lowest bracket of the genomic diversity index, highlighting the gravity of its conservation status. Comparative analysis with other felids and mammalian genomes unraveled the evolutionary history of the Asiatic lion and its position among other felids. The genome is estimated to be 2.3 Gb (Gigabase) long with 62X sequence coverage and is found to have 20,543 protein-coding genes. About 2.66% of the genome is covered by simple sequence repeats (SSRs) and 0.4% is estimated to have segmental duplications. Comparison with seven well-annotated genomes indicates the presence of 6,295 single-copy orthologs, 4 co-orthologs, 21 paralogs uniquely present in the Asiatic lion and 8,024 other orthologs. Assessment of male and female transcriptomes gave a list of genes specifically expressed in the male. Our genomic analyses provide candidates for phenotypes characteristic of felids and lions, inviting further confirmation of their contribution through population genetic studies. An Asiatic lion-specific expansion is detected in the Cysteine Dioxygenase-I (CDO-I) family that is responsible for taurine biosynthesis in cats. The Wilms' tumor-associated protein (WT1) family, a non-Y chromosome genetic factor underlying male-sex determination and differentiation, is found to have undergone expansion, interestingly like that of the human genome. Another protein family, translation machinery-associated protein 7 (TMA7), which has undergone expansion in humans, has also expanded in the Asiatic lion and can be further investigated as a candidate responsible for the mane in lions because of its role in hair follicle morphogenesis.

Transcriptome Dynamics of Human Neuronal Differentiation From iPSC

Frontiers in Cell and Developmental Biology ◽

10.3389/fcell.2021.727747 ◽

2021 ◽

Vol 9 ◽

Author(s):

Meltem Kuruş ◽

Soheil Akbari ◽

Doğa Eskier ◽

Ahmet Bursalı ◽

Kemal Ergin ◽

...

Keyword(s):

Stem Cells ◽

Neuronal Differentiation ◽

De Novo ◽

Neurological Diseases ◽

Transcriptome Assembly ◽

Embryonic Stem ◽

Protein Coding ◽

Regulatory Factors ◽

Adult Cell ◽

Induced Pluripotent

The generation and use of induced pluripotent stem cells (iPSCs) in order to obtain all differentiated adult cell morphologies without requiring embryonic stem cells is one of the most important discoveries in molecular biology. Among the uses of iPSCs is the generation of neuron cells and organoids to study the biological cues underlying neuronal and brain development, in addition to neurological diseases. These iPSC-derived neuronal differentiation models allow us to examine the gene regulatory factors involved in such processes. Among these regulatory factors are long non-coding RNAs (lncRNAs), genes that are transcribed from the genome and have key biological functions in establishing phenotypes, but are frequently not included in studies focusing on protein coding genes. Here, we provide a comprehensive analysis and overview of the coding and non-coding transcriptome during multiple stages of the iPSC-derived neuronal differentiation process using RNA-seq. We identify previously unannotated lncRNAs via genome-guided de novo transcriptome assembly, and the distinct characteristics of the transcriptome during each stage, including differentially expressed and stage specific genes. We further identify key genes of the human neuronal differentiation network, representing novel candidates likely to have critical roles in neurogenesis using coexpression network analysis. Our findings provide a valuable resource for future studies on neuronal differentiation.

De novo assembly and characterization of the first draft genome of quince (Cydonia oblonga Mill.)

Scientific Reports ◽

10.1038/s41598-021-83113-3 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Aysenur Soyturk ◽

Fatima Sen ◽

Ali Tevfik Uncu ◽

Ibrahim Celik ◽

Ayse Ozgur Uncu

Keyword(s):

Ab Initio ◽

De Novo ◽

Draft Genome ◽

Mirna Precursor ◽

Machine Learning Algorithms ◽

Genomic Research ◽

Support Vector ◽

Cydonia Oblonga ◽

Protein Coding ◽

A Genome

Quince (Cydonia oblonga Mill.) is the sole member of the genus Cydonia in the Rosaceae family and is closely related to the major pome fruits, apple (Malus domestica Borkh.) and pear (Pyrus communis L.). In the present work, whole genome shotgun paired-end sequencing was employed in order to assemble the first draft genome of quince. A genome assembly that spans 488.4 Mb of sequence corresponding to 71.2% of the estimated genome size (686 Mb) was produced in the study. Gene predictions via ab initio and homology-based sequence annotation strategies resulted in the identification of 25,428 and 30,684 unique putative protein coding genes, respectively. 97.4 and 95.6% of putative homologs of Arabidopsis and rice transcription factors, respectively, were identified in the ab initio predicted genic sequences. Different machine learning algorithms were tested for classifying pre-miRNA (precursor microRNA) coding sequences, identifying Support Vector Machine (SVM) as the best performing classifier. SVM classification predicted 600 putative pre-miRNA coding loci. Repetitive DNA content of the assembly was also characterized. The first draft assembly of the quince genome produced in this work would constitute a foundation for functional genomic research in quince toward dissecting the genetic basis of important traits and performing genomics-assisted breeding.
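The 71.2% figure above follows directly from the two sizes given in the abstract, the assembled span divided by the estimated genome size. A minimal sketch of that arithmetic, with an illustrative function name:

```python
def assembled_percent(assembled_mb: float, estimated_mb: float) -> float:
    """Percentage of the estimated genome size covered by the assembly."""
    if estimated_mb <= 0:
        raise ValueError("estimated genome size must be positive")
    return 100.0 * assembled_mb / estimated_mb


# Figures from the quince abstract: 488.4 Mb assembled, 686 Mb estimated.
percent = assembled_percent(488.4, 686.0)
print(round(percent, 1))  # 71.2
```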

Plastid genomics of Nicotiana (Solanaceae): insights into molecular evolution, positive selection and the origin of the maternal genome of Aztec tobacco (Nicotiana rustica)

PeerJ ◽

10.7717/peerj.9552 ◽

2020 ◽

Vol 8 ◽

pp. e9552

Author(s):

Furrukh Mehmood ◽

Abdullah ◽

Zartasha Ubaid ◽

Iram Shahzadi ◽

Ibrar Ahmed ◽

...

Keyword(s):

Selective Pressure ◽

De Novo ◽

Intergenic Spacer ◽

Evolutionary Model ◽

Cost Effective ◽

Single Copy ◽

Model Systems ◽

Protein Coding ◽

Plastid Genomes ◽

Worldwide Production

Species of the genus Nicotiana (Solanaceae), commonly referred to as tobacco plants, are often cultivated as non-food crops and garden ornamentals. In addition to the worldwide production of tobacco leaves, they are also used as evolutionary model systems due to their complex development history tangled by polyploidy and hybridization. Here, we assembled the plastid genomes of five tobacco species: N. knightiana, N. rustica, N. paniculata, N. obtusifolia and N. glauca. De novo assembled tobacco plastid genomes had the typical quadripartite structure, consisting of a pair of inverted repeat (IR) regions (25,323–25,369 bp each) separated by a large single-copy (LSC) region (86,510–86,716 bp) and a small single-copy (SSC) region (18,441–18,555 bp). Comparative analyses of Nicotiana plastid genomes with currently available Solanaceae genome sequences showed similar GC and gene content, codon usage, simple sequence and oligonucleotide repeats, RNA editing sites, and substitutions. We identified 20 highly polymorphic regions, mostly belonging to intergenic spacer regions (IGS), which could be suitable for the development of robust and cost-effective markers for inferring the phylogeny of the genus Nicotiana and family Solanaceae. Our comparative plastid genome analysis revealed that the maternal parent of the tetraploid N. rustica was the common ancestor of N. paniculata and N. knightiana, and the latter species is more closely related to N. rustica. Relaxed molecular clock analyses estimated that the speciation event between N. rustica and N. knightiana occurred 0.56 Ma (HPD 0.65–0.46). Biogeographical analysis supported a south-to-north range expansion and diversification for N. rustica and related species, where N. undulata and N. paniculata evolved in North/Central Peru, while N. rustica developed in Southern Peru and separated from N. knightiana, which adapted to the Southern coastal climatic regimes.
We further inspected selective pressure on protein-coding genes among tobacco species to determine if this adaptation process affected the evolution of plastid genes. These analyses indicate that four genes involved in different plastid functions, including DNA replication (rpoA) and photosynthesis (atpB, ndhD and ndhF), came under positive selective pressure as a result of specific environmental conditions. Genetic mutations in these genes might have contributed to better survival and superior adaptations during the evolutionary history of tobacco species.

De novo transcriptome assembly of the Italian white truffle (Tuber magnatum Pico)

10.1101/461483 ◽

2018 ◽

Cited By ~ 1

Author(s):

Federico Vita ◽

Amedeo Alpi ◽

Edoardo Bertolini

Keyword(s):

Molecular Mechanisms ◽

De Novo ◽

Transcriptome Assembly ◽

Rna Seq ◽

Genetic Studies ◽

Protein Coding ◽

Important Species ◽

White Truffle ◽

Tuber Magnatum ◽

Insight Into

The Italian white truffle (Tuber magnatum Pico) is a gastronomic delicacy that dominates the worldwide truffle market. Despite its importance, the genomic resources currently available for this species are still limited. Here we present the first de novo transcriptome assembly of T. magnatum. Illumina RNA-seq data were assembled using a single-k-mer approach into 22,932 transcripts with an N50 of 1,524 bp. Our approach allowed us to predict and annotate 12,367 putative protein-coding sequences, grouped into 6,723 loci. In addition, we identified 2,581 gene-based SSR markers. This work provides the first publicly available reference transcriptome for genomics and genetic studies, providing insight into the molecular mechanisms underlying the biology of this important species.