Enhancement of de novo sequencing, assembly and annotation of the Mongolian gerbil genome with transcriptome sequencing and assembly from several different tissues

2019 ◽  
Author(s):  
Shifeng Cheng ◽  
Yuan Fu ◽  
Yaolei Zhang ◽  
Wenfei Xian ◽  
Hongli Wang ◽  
...  

Abstract BACKGROUND: The Mongolian gerbil (Meriones unguiculatus) has been used as a model organism for research on the auditory and visual systems, stroke/ischemia, epilepsy and aging since 1935, when laboratory gerbils were separated from their wild counterparts. In this study we report genome sequencing, assembly, and annotation, further supported by transcriptome sequencing and assembly from 27 different tissue samples. RESULTS: The genome was assembled from Illumina HiSeq 2000 data, giving a final genome size of 2.54 Gbp with contig and scaffold N50 values of 31.4 Kbp and 500.0 Kbp, respectively. Based on the k-mer estimated genome size of 2.48 Gbp, the assembly appears to be complete. The genome annotation was supported by transcriptome data that identified 31 769 (>2000 bp) predicted protein-coding genes across 27 tissue samples. A BUSCO search of 3023 mammalian groups found 86% of curated single-copy orthologs among the predicted genes, indicating a high level of completeness of the genome. CONCLUSIONS: We report a de novo assembly of the Mongolian gerbil genome that was further enhanced by assembly of transcriptome data from several tissues. Sequencing of this genome increases the utility of the gerbil as a model organism, opening it up to now widely used genetic tools.
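The contig and scaffold N50 values quoted in the abstract above summarize assembly contiguity. As a minimal sketch of the usual definition (the abstract does not say which tool computed the statistic, and the function name `n50` is hypothetical), N50 is the length of the sequence at which the running total of lengths, sorted longest first, reaches half of the assembly size:

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Illustrative only: this follows the common textbook definition and is
    not taken from the gerbil assembly pipeline.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


if __name__ == "__main__":
    # Toy lengths, not real data from the assembly.
    print(n50([2, 3, 4, 5, 6]))
```

With the toy lengths above the total is 20, and the two longest sequences (6 and 5) already cover more than half of it, so the N50 is 5.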

2019 ◽  
Author(s):  
Shifeng Cheng ◽  
Yuan Fu ◽  
Yaolei Zhang ◽  
Wenfei Xian ◽  
Hongli Wang ◽  
...  

Abstract BACKGROUND: The Mongolian gerbil (Meriones unguiculatus) has been used as a model organism for research on the auditory and visual systems, stroke/ischemia, epilepsy and aging since 1935, when laboratory gerbils were separated from their wild counterparts. In this study we report genome sequencing, assembly, and annotation, further supported by transcriptome sequencing and assembly from 27 different tissue samples. RESULTS: The genome was sequenced using Illumina HiSeq 2000 and, after assembly, had a final size of 2.54 Gbp with contig and scaffold N50 values of 31.4 Kbp and 500.0 Kbp, respectively. Based on the k-mer estimated genome size of 2.48 Gbp, the assembly appears to be complete. The genome annotation was supported by transcriptome data that identified 31 769 (>2000 bp) predicted protein-coding genes across 27 tissue samples. A BUSCO search of 3023 mammalian groups found 86% of curated single-copy orthologs among the predicted genes, indicating a high level of completeness of the genome. CONCLUSIONS: We report the first de novo assembly of the Mongolian gerbil genome, enhanced by assembly of transcriptome data from several tissues. Sequencing of this genome and transcriptome increases the utility of the gerbil as a model organism, opening it up to now widely used genetic tools.


2019 ◽  
Author(s):  
Shifeng Cheng ◽  
Yuan Fu ◽  
Yaolei Zhang ◽  
Wenfei Xian ◽  
Hongli Wang ◽  
...  

Abstract BACKGROUND: The Mongolian gerbil (Meriones unguiculatus) has been used as a model organism for research on the auditory and visual systems, stroke/ischemia, epilepsy and aging since 1935, when laboratory gerbils were separated from their wild counterparts. In this study we report genome sequencing, assembly, and annotation, further supported by transcriptome data from 27 different tissue samples. FINDINGS: The genome was assembled from Illumina HiSeq 2000 data, giving a final genome size of 2.54 Gbp with contig and scaffold N50 values of 31.4 Kbp and 500.0 Kbp, respectively. Based on the k-mer estimated genome size of 2.48 Gbp, the assembly appears to be complete. The genome annotation was supported by transcriptome data that identified 36 019 predicted protein-coding genes across 27 tissue samples. A BUSCO search of 3023 mammalian groups found 86% of curated single-copy orthologs among the predicted genes, indicating a high level of completeness of the genome. CONCLUSIONS: We report a de novo assembly of the Mongolian gerbil genome that was further enhanced by annotation of transcriptome data from several tissues. Sequencing of this genome increases the utility of the gerbil as a model organism, opening it up to now widely used genetic tools.


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Mikhail Rayko ◽  
Aleksey Komissarov ◽  
Jason C. Kwan ◽  
Grace Lim-Fong ◽  
Adelaide C. Rhodes ◽  
...  

Abstract Many animal phyla have no representatives within the catalog of whole metazoan genome sequences. This dataset fills in one gap in the genome knowledge of animal phyla with a draft genome of Bugula neritina (phylum Bryozoa). Interest in this species spans ecology and biomedical sciences because B. neritina is the natural source of bioactive compounds called bryostatins. Here we present a draft assembly of the B. neritina genome obtained from PacBio and Illumina HiSeq data, as well as genes and proteins predicted de novo and verified using transcriptome data, along with the functional annotation. These sequences will permit a better understanding of host-symbiont interactions at the genomic level, and also contribute additional phylogenomic markers to evaluate Lophophorate or Lophotrochozoa phylogenetic relationships. The effort also fits well with plans to ultimately sequence all orders of the Metazoa.


2018 ◽  
Vol 54 (No. 1) ◽  
pp. 17-25 ◽  
Author(s):  
D.-D. Vu ◽  
T.T.-X. Bui ◽  
T.H.-N. Nguyen ◽  
S.N.M. Shah ◽  
N.-H. Vu ◽  
...  

A total of 20 074 230 sequencing reads were generated by Illumina HiSeq™ 2500 from three different Toxicodendron vernicifluum tissue samples. In total, 48 693 unigenes with an average length of 703.34 bp were obtained by de novo assembly, and 3392 potential EST-SSRs (expressed sequence tag-simple sequence repeats) were identified as potential molecular markers from unigenes with lengths exceeding 1 kb. A total of 80 pairs of PCR primers were randomly selected to validate the assembly quality and develop EST-SSR markers from genomic DNA. Of these, 14 primer pairs successfully amplified DNA fragments and detected significant amounts of polymorphism within the lacquer tree population in Langao, Shaanxi province, China. Genetic diversity in this natural population was high (number of alleles per locus (A) = 2.93, polymorphic information content (PIC) = 0.53, observed heterozygosity (Ho) = 0.62 and expected heterozygosity (He) = 0.85). Four loci deviated significantly from Hardy-Weinberg equilibrium. These results suggest high homozygosity and a deficiency of heterozygosity in the population (inbreeding coefficient (Fis) = 0.27). These polymorphic EST-SSR markers will provide a basis for further studies of genetic structure and breeding in T. vernicifluum.
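The reported inbreeding coefficient can be related to the two heterozygosity values with the standard relation Fis = 1 - Ho/He. The sketch below is illustrative arithmetic only: the abstract does not state how Fis was computed (it may, for example, be averaged per locus), and the function name `inbreeding_coefficient` is hypothetical.

```python
def inbreeding_coefficient(h_obs, h_exp):
    """Wright's Fis from observed and expected heterozygosity: 1 - Ho/He.

    Assumption: population-level Ho and He are plugged in directly; the
    original study may have used a per-locus estimator instead.
    """
    if h_exp <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - h_obs / h_exp


if __name__ == "__main__":
    # Values quoted in the abstract: Ho = 0.62, He = 0.85.
    print(round(inbreeding_coefficient(0.62, 0.85), 2))
```

A positive value of this kind indicates fewer heterozygotes than expected under Hardy-Weinberg equilibrium, which matches the heterozygote deficiency described in the abstract.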


2020 ◽  
Author(s):  
Guolin Zhou ◽  
Ping Zhu

Abstract Background: Rhododendron molle (Ericaceae) is a traditional Chinese medicinal plant whose flower and root have been widely used to treat rheumatism and relieve pain for thousands of years in China. Chemical studies have revealed that R. molle contains abundant secondary metabolites such as terpenoids, flavonoids and lignans, some of which have exhibited various bioactivities including antioxidant, hypotensive and analgesic activity. Despite its immense pharmaceutical importance, the mechanism underlying the biosynthesis of these secondary metabolites remains unknown and genomic information is unavailable. Results: To gain molecular insight into this plant, especially into pharmaceutically important secondary metabolites including grayanane diterpenoids, we conducted deep transcriptome sequencing of R. molle flower and root using the Illumina HiSeq platform. In total, 100,603 unigenes with a mean length of 778 bp were generated through de novo assembly; 57.1% of these unigenes were annotated in public databases and 17,906 of them showed a significant match in the KEGG database. Unigenes involved in the biosynthesis of secondary metabolites were annotated, including the TPSs and CYPs potentially responsible for the biosynthesis of grayanoids. Moreover, 3,376 transcription factors and 10,828 simple sequence repeats (SSRs) were identified. We also performed differential gene expression (DEG) analysis of the flower and root transcriptome libraries and identified numerous genes that were specifically expressed or up-regulated in flower. Conclusions: To the best of our knowledge, this is the first study to generate and thoroughly analyze transcriptome data for both R. molle flower and root. This study provides an important genetic resource that will help elucidate various secondary metabolite biosynthetic pathways in R. molle, especially those with medicinal value, and support drug development from this plant.


2020 ◽  
Author(s):  
Guolin Zhou ◽  
Ping Zhu

Abstract Background: Rhododendron molle (Ericaceae) is a traditional Chinese medicinal plant whose flower and root have been widely used to treat rheumatism and relieve pain for thousands of years in China. Chemical studies have revealed that R. molle contains abundant secondary metabolites such as terpenoids, flavonoids and lignans, some of which have exhibited various bioactivities including antioxidant, hypotensive and analgesic activity. Despite its immense pharmaceutical importance, the mechanism underlying the biosynthesis of these secondary metabolites remains unknown and genomic information is unavailable. Results: To gain molecular insight into this plant, especially into its pharmaceutically important secondary metabolism, we conducted deep transcriptome sequencing of R. molle flower and root using the Illumina HiSeq platform. In total, 100,603 unigenes with a mean length of 778 bp were generated through de novo assembly; 57.1% of these unigenes were annotated in public databases and 17,906 of them showed a significant match in the KEGG database. Unigenes involved in the biosynthesis of secondary metabolites were annotated, including the TPSs and CYPs potentially responsible for the biosynthesis of grayanoids. Moreover, 3,376 transcription factors and 10,828 simple sequence repeats (SSRs) were identified. We also performed differential gene expression (DEG) analysis of the flower and root transcriptome libraries and identified numerous genes that were specifically expressed or up-regulated in flower. Conclusions: To the best of our knowledge, this is the first study to generate and thoroughly analyze transcriptome data for both R. molle flower and root. This study provides an important genetic resource that will help elucidate various secondary metabolite biosynthetic pathways in R. molle, especially those with medicinal value, and support drug development from this plant.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
D. N. U. Naranpanawa ◽  
C. H. W. M. R. B. Chandrasekara ◽  
P. C. G. Bandaranayake ◽  
A. U. Bandaranayake

Abstract Recent advances in next-generation sequencing technologies have made it possible to generate a considerable amount of sequencing data at a relatively low cost. This has revolutionized genomics and transcriptomics studies. However, handling such data with the available bioinformatics platforms creates new challenges, both in assembly and in the downstream analysis needed to infer correct biological meaning. Though there are a handful of commercial software packages and tools for some of the procedures, their cost is prohibitive for most research laboratories. While individual open-source or free software tools are available for most bioinformatics applications, those components usually operate standalone and are not combined into a user-friendly workflow. Therefore, beginners in bioinformatics might find analysis procedures starting from raw sequence data too complicated and time-consuming, given the associated learning curve. Here, we outline a procedure for de novo transcriptome assembly and Simple Sequence Repeats (SSR) primer design based solely on tools that are available online for free use. To validate the developed workflow, we used Illumina HiSeq reads of different tissue samples of Santalum album (sandalwood), generated in a previous transcriptomics project. A portion of the designed primers were tested in the lab with relevant samples and all of them successfully amplified the targeted regions. The presented bioinformatics workflow can accurately assemble quality transcriptomes and develop gene-specific SSRs. Beginner biologists and researchers in bioinformatics can easily utilize this workflow for research purposes.
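The abstract above does not name the tools used for SSR discovery, so the following is only a rough sketch of what detecting perfect SSRs in an assembled transcript involves. The function name `find_ssrs` and the minimum repeat counts are assumptions chosen for illustration, not parameters from the described workflow.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem repeats.
DEFAULT_MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect SSRs in a DNA string.

    Illustrative only; real SSR finders also handle compound and
    imperfect repeats, which this sketch ignores.
    """
    thresholds = min_repeats if min_repeats is not None else DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for unit_len, reps in sorted(thresholds.items()):
        # A motif of unit_len bases followed by at least reps - 1 copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, reps - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are just a shorter unit repeated (e.g. ATAT, AAA).
            if any(
                motif == motif[:d] * (unit_len // d)
                for d in range(1, unit_len)
                if unit_len % d == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence, not sandalwood data.
    toy = "GGC" + "AT" * 7 + "CCG" + "AGC" * 5 + "T"
    for start, motif, count in find_ssrs(toy):
        print(start, motif, count)
```

In a full workflow, the flanking sequence around each reported position would then be passed to a primer design tool; that step is outside the scope of this sketch.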

