Transcriptome analyses provide insights into the difference of alkaloids biosynthesis in the Chinese goldthread (Coptis chinensis Franch.) from different biotopes

PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3303 ◽  
Author(s):  
Hanting Chen ◽  
Cao Deng ◽  
Hu Nie ◽  
Gang Fan ◽  
Yang He

Coptis chinensis Franch., the Chinese goldthread (‘Weilian’ in Chinese), is one of the most important medicinal plants in the family Ranunculaceae, and its rhizome has been widely used in Traditional Chinese Medicine for centuries. Here, we analyzed the chemical components and the transcriptome of the Chinese goldthread from three biotopes: Zhenping, Zunyi, and Shizhu. We built comprehensive, high-quality de novo transcriptome assemblies of the Chinese goldthread from short-read RNA-Sequencing data, obtaining 155,710 transcripts and 56,071 unigenes. More than 98.39% and 95.97% of core eukaryotic genes were found in the transcripts and unigenes, respectively, indicating that this unigene set captures the majority of the coding genes. A total of 520,462, 493,718, and 507,247 heterozygous SNPs were identified in the three accessions from Zhenping, Zunyi, and Shizhu, respectively, indicating high polymorphism in coding regions of the Chinese goldthread (∼1%). Chemical analyses of the rhizome identified six major components: berberine, palmatine, coptisine, epiberberine, columbamine, and jatrorrhizine. Berberine had the highest concentration, followed in order by coptisine, palmatine, and epiberberine, in all three accessions. The drug quality of the accession from Shizhu may be the highest among these accessions. Differential analyses of the transcriptome identified four pivotal candidate enzymes (aspartate aminotransferase protein, polyphenol oxidase, primary-amine oxidase, and tyrosine decarboxylase) that were significantly differentially expressed and may be responsible for the differences in alkaloid content among the accessions from different biotopes.

2014 ◽  
Vol 2014 ◽  
pp. 1-8
Author(s):  
Momchilo Vuyisich ◽  
Ayesha Arefin ◽  
Karen Davenport ◽  
Shihai Feng ◽  
Cheryl Gleasner ◽  
...  

Sequencing bacterial genomes has traditionally required large amounts of genomic DNA (~1 μg). There have been few studies to determine the effects of the input DNA amount or library preparation method on the quality of sequencing data. Several new commercially available library preparation methods enable shotgun sequencing from as little as 1 ng of input DNA. In this study, we evaluated the NEBNext Ultra library preparation reagents for sequencing bacterial genomes. We have evaluated the utility of NEBNext Ultra for resequencing and de novo assembly of four bacterial genomes and compared its performance with the TruSeq library preparation kit. The NEBNext Ultra reagents enable high quality resequencing and de novo assembly of a variety of bacterial genomes when using 100 ng of input genomic DNA. For the two most challenging genomes (Burkholderia spp.), which have the highest GC content and are the longest, we also show that the quality of both resequencing and de novo assembly is not decreased when only 10 ng of input genomic DNA is used.


2020 ◽  
Author(s):  
Maxim Ivanov ◽  
Albin Sandelin ◽  
Sebastian Marquardt

Background: The quality of gene annotation determines the interpretation of results obtained in transcriptomic studies. The growing amount of genome sequence information calls for experimental and computational pipelines for de novo transcriptome annotation. Ideally, gene and transcript models should be called from a limited set of key experimental data. Results: We developed TranscriptomeReconstructoR, an R package which implements a pipeline for automated transcriptome annotation. It relies on integrating features from independent and complementary datasets: i) full-length RNA-seq for detection of splicing patterns and ii) high-throughput 5' and 3' tag sequencing data for accurate definition of gene borders. The pipeline can also take a nascent RNA-seq dataset to supplement the called gene model with transient transcripts. We reconstructed de novo the transcriptional landscape of wild type Arabidopsis thaliana seedlings as a proof-of-principle. A comparison to the existing transcriptome annotations revealed that our gene model is more accurate and comprehensive than the two most commonly used community gene models, TAIR10 and Araport11. In particular, we identify thousands of transient transcripts missing from the existing annotations. Our new annotation promises to improve the quality of A. thaliana genome research. Conclusions: Our proof-of-concept data suggest a cost-efficient strategy for rapid and accurate annotation of complex eukaryotic transcriptomes. We combine the choice of library preparation methods and sequencing platforms with the dedicated computational pipeline implemented in the TranscriptomeReconstructoR package. The pipeline only requires prior knowledge of the reference genomic DNA sequence, but not the transcriptome. The package seamlessly integrates with Bioconductor packages for downstream analysis.


2017 ◽  
Author(s):  
Adriana Munoz ◽  
Boris Yamrom ◽  
Yoon-ha Lee ◽  
Peter Andrews ◽  
Steven Marks ◽  
...  

Copy number profiling and whole-exome sequencing have allowed us to make remarkable progress in our understanding of the genetics of autism over the past ten years, but major aspects of the genetics remain unresolved. Through whole-genome sequencing, additional types of genetic variants can be observed. These variants are abundant, and determining which are functional is challenging. We have analyzed whole-genome sequencing data from 510 of the Simons Simplex Collection quad families and focused our attention on intronic variants. Within the introns of 546 high-quality autism target genes, we identified 63 de novo indels in the affected and only 37 in the unaffected siblings. The difference of 26 events is significantly larger than expected (p-value = 0.01), and reasonable extrapolation shows that de novo intronic indels can contribute to at least 10% of simplex autism. The significance increases if we restrict to the half of the autism targets that are intolerant to damaging variants in the normal human population, a half that we expect to be even more enriched for autism genes. For these 273 targets we observe 43 and 20 events in affected and unaffected siblings, respectively (p-value = 0.005). There was no significant signal in the number of de novo intronic indels in any of the control sets of genes analyzed. We see no signal from de novo substitutions in the introns of target genes.
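The affected versus unaffected comparison above can be illustrated with a simple exact binomial test. This is a minimal sketch under the assumption that, with no effect, each de novo indel is equally likely to fall in the affected or the unaffected sibling. The abstract does not say which test the authors used, so the function names and the equal-split null here are illustrative only.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def two_sided_equal_split(affected: int, unaffected: int) -> float:
    """Two-sided exact p-value under the assumed 50/50 null.

    Because the null distribution is symmetric at p = 0.5, doubling the
    upper tail of the larger count gives the two-sided p-value.
    """
    n = affected + unaffected
    k = max(affected, unaffected)
    return min(1.0, 2 * binomial_tail(k, n))


if __name__ == "__main__":
    # Counts quoted in the abstract: all 546 targets, then the 273 intolerant targets.
    print(two_sided_equal_split(63, 37))
    print(two_sided_equal_split(43, 20))
```

The sketch ignores any correction the authors may have applied (for example, for differences in callable sequence between siblings), so its output need not match the reported p-values exactly.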


2015 ◽  
Author(s):  
Stefano Lonardi ◽  
Hamid Mirebrahim ◽  
Steve Wanamaker ◽  
Matthew Alpert ◽  
Gianfranco Ciardo ◽  
...  

Since the invention of DNA sequencing in the seventies, computational biologists have had to deal with the problem of de novo genome assembly with limited (or insufficient) depth of sequencing. In this work, for the first time we investigate the opposite problem, that is, the challenge of dealing with excessive depth of sequencing. Specifically, we explore the effect of ultra-deep sequencing data in two domains: (i) the problem of decoding reads to BAC clones (in the context of the combinatorial pooling design proposed by our group), and (ii) the problem of de novo assembly of BAC clones. Using real ultra-deep sequencing data, we show that when the depth of sequencing increases over a certain threshold, sequencing errors make these two problems harder and harder (instead of easier, as one would expect with error-free data), and as a consequence the quality of the solution degrades with more and more data. For the first problem, we propose an effective solution based on "divide and conquer": we "slice" a large dataset into smaller samples of optimal size, decode each slice independently, then merge the results. Experimental results on over 15,000 barley BACs and over 4,000 cowpea BACs demonstrate a significant improvement in the quality of the decoding and the final assembly. For the second problem, we show for the first time that modern de novo assemblers cannot take advantage of ultra-deep sequencing data.
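The slice, decode, and merge idea described above can be sketched as follows. This is only an illustration of the control flow: the `decode` callable, the representation of reads as strings, and the merge by concatenating per-BAC read lists are all assumptions, since the abstract does not specify the decoder or the merge rule.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterator, List, Sequence

Read = str
# Hypothetical decoder interface: maps a batch of reads to {bac_id: [reads]}.
Decoder = Callable[[Sequence[Read]], Dict[str, List[Read]]]


def slices(reads: Sequence[Read], slice_size: int) -> Iterator[Sequence[Read]]:
    """Yield consecutive chunks of at most slice_size reads."""
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    for start in range(0, len(reads), slice_size):
        yield reads[start:start + slice_size]


def decode_in_slices(
    reads: Sequence[Read], slice_size: int, decode: Decoder
) -> Dict[str, List[Read]]:
    """Decode each slice independently, then merge the per-BAC assignments."""
    merged: Dict[str, List[Read]] = defaultdict(list)
    for chunk in slices(reads, slice_size):
        for bac_id, assigned in decode(chunk).items():
            merged[bac_id].extend(assigned)
    return dict(merged)


def toy_decode(chunk: Sequence[Read]) -> Dict[str, List[Read]]:
    """Stand-in decoder for the example: the BAC id is the text before ':'."""
    out: Dict[str, List[Read]] = defaultdict(list)
    for read in chunk:
        out[read.split(":", 1)[0]].append(read)
    return dict(out)


if __name__ == "__main__":
    reads = ["A:1", "B:2", "A:3", "B:4", "A:5"]
    print(decode_in_slices(reads, slice_size=2, decode=toy_decode))
```

In practice the slice size would be chosen empirically; the abstract refers to an "optimal size" without stating how it is determined.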


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e9114 ◽  
Author(s):  
Jiawei Wang ◽  
Weizhen Liu ◽  
Dongzi Zhu ◽  
Xiang Zhou ◽  
Po Hong ◽  
...  

The sweet cherry (Prunus avium) is one of the most economically important fruit species in the world. However, there is a limited amount of genetic information available for this species, which hinders breeding efforts at a molecular level. We were able to describe a high-quality reference genome assembly and annotation of the diploid sweet cherry (2n = 2x = 16) cv. Tieton using linked-read sequencing technology. We generated over 750 million clean reads, representing 112.63 GB of raw sequencing data. The Supernova assembler produced a more highly-ordered and continuous genome sequence than the current P. avium draft genome, with a contig N50 of 63.65 KB and a scaffold N50 of 2.48 MB. The final scaffold assembly was 280.33 MB in length, representing 82.12% of the estimated Tieton genome. Eight chromosome-scale pseudomolecules were constructed, completing a 214 MB sequence of the final scaffold assembly. De novo, homology-based, and RNA-seq methods were used together to predict 30,975 protein-coding loci. 98.39% of core eukaryotic genes and 97.43% of single copy orthologues were identified in the embryo plant, indicating the completeness of the assembly. Linked-read sequencing technology was effective in constructing a high-quality reference genome of the sweet cherry, which will benefit the molecular breeding and cultivar identification in this species.
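For readers unfamiliar with the contig and scaffold N50 values quoted above, the following minimal sketch computes N50 using its common definition: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The function name and toy input are illustrative and are not taken from the paper.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length that pushes the running sum to half the total is N50.
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Total is 20, half is 10; 6 + 5 = 11 reaches it, so N50 is 5.
    print(n50([2, 3, 4, 5, 6]))
```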


2015 ◽  
Author(s):  
Ivan Sovic ◽  
Kresimir Krizanovic ◽  
Karolj Skala ◽  
Mile Sikic

The recent emergence of nanopore sequencing technology has posed a challenge for established assembly methods, which are not optimized for the combination of read lengths and high error rates of nanopore reads. In this work we assessed how existing de novo assembly methods perform on these reads. We benchmarked three non-hybrid (in terms of both error correction and scaffolding) assembly pipelines as well as two hybrid assemblers which use third generation sequencing data to scaffold Illumina assemblies. Tests were performed on several publicly available MinION and Illumina datasets of E. coli K-12, using several sequencing coverages of nanopore data (20x, 30x, 40x and 50x). We attempted to assess the quality of assembly at each of these coverages, to estimate the requirements for closed bacterial genome assembly. Results show that hybrid methods are highly dependent on the quality of NGS data, but much less on the quality and coverage of nanopore data, and perform relatively well on lower nanopore coverages. Furthermore, when coverage is above 40x, all non-hybrid methods correctly assemble the E. coli genome, even a non-hybrid method tailored for Pacific Biosciences reads. While it requires higher coverage compared to a method designed particularly for nanopore reads, its running time is significantly lower.


2021 ◽  
Author(s):  
Noah Dukler ◽  
Mehreen R Mughal ◽  
Ritika Ramani ◽  
Yi-Fei Huang ◽  
Adam Siepel

Genome sequencing of tens of thousands of human individuals has recently enabled the measurement of large selective effects for mutations to protein-coding genes. Here we describe a new method, called ExtRaINSIGHT, for measuring similar selective effects at individual sites in noncoding as well as in coding regions of the human genome. ExtRaINSIGHT estimates the prevalence of strong purifying selection, or "ultraselection" (λs), as the fractional depletion of rare single-nucleotide variants (minor allele frequency <0.1%) in a target set of genomic sites relative to matched sites that are putatively neutrally evolving, in a manner that controls for local variation and neighbor-dependence in mutation rate. We show using simulations that, above an appropriate threshold, λs is closely related to the average site-specific selection coefficient against heterozygous point mutations, as predicted at mutation-selection balance. Applying ExtRaINSIGHT to 71,702 whole genome sequences from gnomAD v3, we find particularly strong evidence of ultraselection in evolutionarily ancient miRNAs and neuronal protein-coding genes, as well as at splice sites. Moreover, our estimated selection coefficient against heterozygous amino-acid replacements across the genome (at 1.4%) is substantially larger than previous estimates based on smaller sample sizes. By contrast, we find weak evidence of ultraselection in other noncoding RNAs and transcription factor binding sites, and only modest evidence in ultraconserved elements and human accelerated regions. We estimate that ~0.3-0.5% of the human genome is ultraselected, with one third to one half of ultraselected sites falling in coding regions. These estimates suggest ~0.3-0.4 lethal or nearly lethal de novo mutations per potential human zygote, together with ~2 de novo mutations that are more weakly deleterious. Overall, our study sheds new light on the genome-wide distribution of fitness effects for new point mutations by combining deep new sequencing data sets and classical theory from population genetics.
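The notion of "fractional depletion" used above can be illustrated with a deliberately naive calculation: one minus the ratio of the rare-variant rate at target sites to the rate at matched neutral sites. This sketch is not ExtRaINSIGHT. It omits the controls for local mutation-rate variation and neighbor-dependence that the abstract describes, and the function name, arguments, and counts are hypothetical.

```python
def fractional_depletion(
    rare_target: int,
    sites_target: int,
    rare_neutral: int,
    sites_neutral: int,
) -> float:
    """Naive depletion estimate: 1 - (target rare-variant rate / neutral rate).

    A value near 0 means target sites carry rare variants about as often as
    neutral sites; a value near 1 means rare variants are almost absent.
    """
    if sites_target <= 0 or sites_neutral <= 0:
        raise ValueError("site counts must be positive")
    if rare_neutral <= 0:
        raise ValueError("neutral sites must contain at least one rare variant")
    rate_target = rare_target / sites_target
    rate_neutral = rare_neutral / sites_neutral
    return 1.0 - rate_target / rate_neutral


if __name__ == "__main__":
    # Made-up counts: 8% of target sites vs 10% of neutral sites carry a rare variant.
    print(fractional_depletion(80, 1000, 100, 1000))
```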


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5443 ◽  
Author(s):  
Dujun Wang ◽  
Li Zhao ◽  
Dan Wang ◽  
Jia Liu ◽  
Xiaofeng Yu ◽  
...  

Mulberry (Morus alba L.) is one of the plants most commonly used worldwide in traditional medicine and as a nutritional plant. The polyhydroxylated alkaloid 1-deoxynojirimycin (DNJ) is the major bioactive compound of mulberry in treating diabetes. However, the DNJ content in mulberry is very low. Therefore, identification of key genes involved in DNJ alkaloid biosynthesis will provide a basis for the further analysis of its biosynthetic pathway and ultimately for the realization of synthetic biological production. Here, two cDNA libraries of mulberry leaf samples with different DNJ contents were constructed. Approximately 16 Gb of raw RNA-Seq data was generated and de novo assembled into 112,481 transcripts, with an average length of 766 bp and an N50 value of 1,392. Subsequently, all unigenes were annotated based on nine public databases; 11,318 transcripts were found to be significantly differentially regulated. A total of 38 unique candidate genes were identified as being involved in DNJ alkaloid biosynthesis in mulberry, and nine unique genes had significantly different expression. Three key transcripts of DNJ biosynthesis were identified and further characterized using RT-PCR; they were assigned to lysine decarboxylase and primary-amine oxidase genes. Five CYP450 transcripts and two methyltransferase transcripts were significantly associated with DNJ content. Overall, a biosynthetic pathway for the DNJ alkaloid was preliminarily proposed.


Genes ◽  
2021 ◽  
Vol 12 (5) ◽  
pp. 644
Author(s):  
Carlus Deneke ◽  
Holger Brendebach ◽  
Laura Uelze ◽  
Maria Borowiak ◽  
Burkhard Malorny ◽  
...  

Sequencing of whole microbial genomes has become a standard procedure for cluster detection, source tracking, outbreak investigation and surveillance of many microorganisms. An increasing number of laboratories are currently in a transition phase from classical methods towards next generation sequencing, generating unprecedented amounts of data. Since the precision of downstream analyses depends significantly on the quality of raw data generated on the sequencing instrument, a comprehensive, meaningful primary quality control is indispensable. Here, we present AQUAMIS, a Snakemake workflow for extensive quality control and assembly of raw Illumina sequencing data, allowing laboratories to automate the initial analysis of their microbial whole-genome sequencing data. AQUAMIS performs all steps of primary sequence analysis, consisting of read trimming, read quality control (QC), taxonomic classification, de novo assembly, reference identification, assembly QC and contamination detection, both on the read and assembly level. The results are visualized in an interactive HTML report including species-specific QC thresholds, allowing non-bioinformaticians to assess the quality of sequencing experiments at a glance. All results are also available as a standard-compliant JSON file, facilitating easy downstream analyses and data exchange. We have applied AQUAMIS to analyze ~13,000 microbial isolates as well as ~1000 in-silico contaminated datasets, proving the workflow’s ability to perform in high-throughput routine sequencing environments and reliably predict contaminations. We found that intergenus and intragenus contaminations can be detected most accurately using a combination of different QC metrics available within AQUAMIS.

