Gene Space and Transcriptome Assemblies of Leafy Spurge (Euphorbia esula) Identify Promoter Sequences, Repetitive Elements, High-Quality Markers, and a Full-Length Chloroplast Genome

Weed Science ◽  
2018 ◽  
Vol 66 (3) ◽  
pp. 355-367 ◽  
Author(s):  
David P. Horvath ◽  
Sagar Patel ◽  
Münevver Doğramaci ◽  
Wun S. Chao ◽  
James V. Anderson ◽  
...  

Abstract: Leafy spurge (Euphorbia esula L.) is an invasive perennial weed infesting range and recreational lands of North America. Previous research and omics projects with E. esula have helped develop it as a model for studying many aspects of perennial plant development and response to abiotic stress. However, the lack of an assembled genome for E. esula has limited the power of previous transcriptomics studies to identify functional promoter elements and transcription factor binding sites. An assembled genome for E. esula would enhance our understanding of signaling processes controlling plant development and responses to environmental stress and provide a better understanding of genetic factors impacting weediness traits, evolution, and herbicide resistance. A comprehensive transcriptome database would also assist in analyzing future RNA-seq studies and is needed to annotate and assess genomic sequence assemblies. Here, we assembled and annotated 56,234 unigenes from an assembly of 589,235 RNA-seq-derived contigs and a previously published Sanger-sequenced expressed sequence tag collection. The resulting data indicate that we now have sequence for >90% of the expressed E. esula protein-coding genes. We also assembled the gene space of E. esula by using a limited coverage (18X) genomic sequence database. In this study, the programs Velvet and Trinity produced the best gene-space assemblies based on representation of expressed and conserved eukaryotic genes. The results indicate that E. esula contains as much as 23% repetitive sequences, of which 11% are unique. Our sequence data were also sufficient for assembling a full chloroplast and partial mitochondrial genome. Further, marker analysis identified more than 150,000 high-quality variants in our E. esula L-RNA–scaffolded, whole-genome, Trinity-assembled genome. Based on these results, E. esula appears to have limited heterozygosity. This study provides a blueprint for low-cost genomic assemblies in weed species and new resources for identifying conserved and novel promoter regions among coordinately expressed genes of E. esula.

2015 ◽  
Vol 9S4 ◽  
pp. BBI.S29333 ◽  
Author(s):  
Stefan E. Seemann ◽  
Christian Anthon ◽  
Oana Palasca ◽  
Jan Gorodkin

The era of high-throughput sequencing has made it relatively simple to sequence genomes and transcriptomes of individuals from many species. In order to analyze the resulting sequencing data, high-quality reference genome assemblies are required. However, this is still a major challenge, and many domesticated animal genomes still need to be sequenced deeper in order to produce high-quality assemblies. In the meantime, ironically, the amount of RNA-seq and other next-generation data produced frequently far exceeds that of the genomic sequence. Furthermore, basic comparative analysis is often hampered by the lack of genomic sequence. Herein, we quantify the quality of the genome assemblies of 20 domesticated animals and related species by assessing a range of measurable parameters, and we show that there is a positive correlation between the fraction of mappable reads from RNA-seq data and genome assembly quality. We rank the genomes by their assembly quality and discuss the implications for genotype analyses.


F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 2689 ◽  
Author(s):  
Sandeep Chakraborty ◽  
Pedro J. Martínez-García ◽  
Abhaya M. Dandekar

Background: The transcriptome, a treasure trove of gene space information, remains severely under-used by current genome annotation methods. Methods: Here, we present an annotation method in the YeATS suite (YeATSAM), based on information encoded by the transcriptome, that demonstrates artifacts of the assembler, which must be addressed to achieve proper annotation. Results and Discussion: YeATSAM was applied to the transcriptome obtained from twenty walnut tissues and compared to MAKER-P annotation of the recently published walnut genome sequence (WGS). MAKER-P and YeATSAM both failed to annotate several hundred proteins found by the other. Although many of these unannotated proteins have repetitive sequences (possibly transposable elements), other crucial proteins were excluded by each method. An egg cell-secreted protein and a homer protein were undetected by YeATSAM, although these did not produce any transcripts. Importantly, MAKER-P failed to classify key photosynthesis-related proteins, which we show emanated from Trinity assembly artifacts potentially not handled by MAKER-P. Also, no proteins from the large berberine bridge enzyme (BBE) family were annotated by MAKER-P. BBE is implicated in the biosynthesis of several alkaloid metabolites, such as the antimicrobial berberine. As further validation, YeATSAM identified ~1000 genes that are not annotated in the NCBI database by Gnomon. YeATSAM used an RNA-seq-derived chickpea (Cicer arietinum L.) transcriptome assembled using Newbler v2.3. Conclusions: Since the current version of YeATSAM does not have an ab initio module, we suggest a combined annotation scheme using both MAKER-P and YeATSAM to comprehensively and accurately annotate the WGS.


Weed Science ◽  
1988 ◽  
Vol 36 (6) ◽  
pp. 784-786 ◽  
Author(s):  
Stephen J. Harvey ◽  
Robert M. Nowierski

The growth and development of leafy spurge (Euphorbia esula L. #3 EPHES) collected during postsenescent dormancy and grown in the greenhouse was increasingly stimulated by chilling treatments longer than 14 days' duration at 0 to 6 C. Production of stems with flower buds, primary flowers, and secondary flowers was greater in plants chilled for 42 days or more. The effects of chilling on total number of stems, number of strictly vegetative stems, or number of stems with vegetative branching were not significant. The height of the tallest stem per pot was influenced by chilling longer than 42 days. Growth rate also increased as a function of chilling duration. Based on our findings, we believe that there is little possibility that any significant growth can occur in the postsenescent period because of the prevailing climatic conditions found in areas of leafy spurge distribution in North America.


Author(s):  
Tomas N Generalovic ◽  
Shane A McCarthy ◽  
Ian A Warren ◽  
Jonathan M D Wood ◽  
James Torrance ◽  
...  

Abstract: Hermetia illucens L. (Diptera: Stratiomyidae), the Black Soldier Fly (BSF), is an increasingly important species for bioconversion of organic material into animal feed. We generated a high-quality chromosome-scale genome assembly of the BSF using Pacific Biosciences, 10X Genomics linked-read, and high-throughput chromosome conformation capture (Hi-C) sequencing technology. Scaffolding the final assembly with Hi-C data produced a highly contiguous 1.01 Gb genome with 99.75% of scaffolds assembled into pseudochromosomes representing seven chromosomes with 16.01 Mb contig and 180.46 Mb scaffold N50 values. The highly complete genome obtained a BUSCO completeness of 98.6%. We masked 67.32% of the genome as repetitive sequences and annotated a total of 16,478 protein-coding genes using the BRAKER2 pipeline. We analysed an established lab population to investigate the genomic variation and architecture of the BSF, revealing six autosomes and an X chromosome. Additionally, we estimated the inbreeding coefficient (1.9%) of a lab population by assessing runs of homozygosity. This provided evidence for inbreeding events, including long runs of homozygosity on chromosome five. Release of this novel chromosome-scale BSF genome assembly will provide an improved resource for further genomic studies, functional characterisation of genes of interest and genetic modification of this economically important species.


Weeds ◽  
1956 ◽  
Vol 4 (3) ◽  
pp. 275 ◽  
Author(s):  
Duane Le Tourneau

1987 ◽  
Vol 1 (4) ◽  
pp. 314-318 ◽  
Author(s):  
Rodney G. Lym ◽  
Donald R. Kirby

Leafy spurge causes economic loss by reducing both herbage production and use. Herbage use by grazing cattle in various densities of leafy spurge (Euphorbia esula L. #3 EPHES) was evaluated over a 3-yr period in North Dakota. Forage production and disappearance were estimated in four density classes of leafy spurge. Use of cool- and warm-season graminoids, forbs, and leafy spurge was estimated during the middle and the end of each grazing season. Cattle used 20 and 2% of the herbage in the zero and low density infestations, respectively, by mid-season. Moderate and high density infestations were avoided until the milky latex in leafy spurge disappeared in early fall, and herbage availability in zero and low density infestations declined. Herbage use in moderate and high density infestations increased to an average of 46% by the end of the grazing season compared to 61% in zero and low density infestations. An annual herbage loss of at least 35% occurred in pasture infested with 50% density or more of leafy spurge.


2021 ◽  
Vol 17 (11) ◽  
pp. e1009631
Author(s):  
Raquel Linheiro ◽  
John Archer

With the exponential growth of sequence information stored over the last decade, including that of de novo assembled contigs from RNA-Seq experiments, quantification of chimeric sequences has become essential when assembling read data. In transcriptomics, de novo assembled chimeras can closely resemble underlying transcripts, but patterns such as those seen between co-evolving sites, or mapped read counts, become obscured. We have created a de Bruijn-based de novo assembler for RNA-Seq data that utilizes a classification system to describe the complexity of underlying graphs from which contigs are created. Each contig is labelled with one of three levels, indicating whether or not ambiguous paths exist. A by-product of this is information on the range of complexity of the underlying gene families present. As a demonstration of CStone's ability to assemble high-quality contigs, and to label them in this manner, both simulated and real data were used. For simulated data, ten million read pairs were generated from cDNA libraries representing four species, Drosophila melanogaster, Panthera pardus, Rattus norvegicus and Serinus canaria. These were assembled using CStone, Trinity and rnaSPAdes; the latter two being high-quality, well-established de novo assemblers. For real data, two RNA-Seq datasets, each consisting of ≈30 million read pairs, representing two adult D. melanogaster whole-body samples were used. The contigs that CStone produced were comparable in quality to those of Trinity and rnaSPAdes in terms of length, sequence identity of aligned regions and the range of cDNA transcripts represented, whilst providing additional information on chimerism. Here we describe the details of CStone's assembly and classification process, and propose that similar classification systems can be incorporated into other de novo assembly tools. Within a related side study, we explore the effects that chimeras within reference sets have on the identification of differentially expressed genes. CStone is available at: https://sourceforge.net/projects/cstone/.


2020 ◽  
Author(s):  
Jing Li ◽  
Meiqi Lv ◽  
Lei Du ◽  
A Yunga ◽  
Shijie Hao ◽  
...  

Abstract: The monocot family Melanthiaceae, whose genome sizes vary over a 230-fold range, is an ideal model for studying genome size fluctuation in plants. Within the family, the genus Paris shows an evolutionary trend toward huge genomes, characterized by an average c-value of 49.22 pg. Here, we report a 70.18 Gb genome assembly out of the 82.55 Gb genome of Paris polyphylla var. yunnanensis (PPY), which represents the biggest sequenced genome to date. We annotate 69.53% repetitive sequences in this genome, 62.50% of which are long-terminal repeat (LTR) transposable elements. Further evolutionary analysis indicates that the giant genome likely results from the joint effect of common and species-specific expansion of different LTR superfamilies, which might contribute to environmental adaptation after speciation. Moreover, we identify the candidate pathway genes for the biogenesis of polyphyllins, the PPY-specific medicinal saponins, by complementary approaches including genome mining, comprehensive analysis of 31 next-generation RNA-seq data sets and 55.23 Gb of single-molecule circular consensus sequencing (CCS) RNA-seq reads, and correlation of the transcriptome and phytochemical data of five different tissues at four growth stages. This study not only provides significant insights into plant genome size evolution, but also paves the way for subsequent polyphyllin synthetic biology.

