Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline

2019
Vol 20 (1)
Author(s):
Shujun Ou
Weija Su
Yi Liao
Kapeel Chougule
Jireh R. A. Agda
...  

Abstract
Background: Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods exist for annotating each class of TEs, but their relative performance has not been systematically compared. Moreover, a comprehensive pipeline is needed to produce a non-redundant TE library, and from it whole-genome TE annotations, for species that lack this resource.
Results: We benchmark existing programs against a carefully curated library of rice TEs. We evaluate the performance of methods for annotating long terminal repeat (LTR) retrotransposons, terminal inverted repeat (TIR) transposons, short TIR transposons known as miniature inverted-repeat transposable elements (MITEs), and Helitrons. Performance metrics include sensitivity, specificity, accuracy, precision, false discovery rate (FDR), and F1 score. Using the most robust programs, we create a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a filtered, non-redundant TE library for annotation of structurally intact and fragmented elements. EDTA also deconvolutes nested TE insertions, which are frequent in highly repetitive genomic regions. Tests on other model species with curated TE libraries (maize and Drosophila) show that EDTA is robust across both plant and animal species.
Conclusions: The benchmarking results and pipeline developed here will greatly facilitate TE annotation in eukaryotic genomes. These annotations will promote a much more in-depth understanding of the diversity and evolution of TEs at both intra- and inter-species levels. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.
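
The six performance metrics listed above are standard confusion-matrix quantities. A minimal sketch, assuming each program's output has already been compared with the curated-library annotation and reduced to counts of true/false positives and negatives in a single unit (for example, genomic base pairs). The function name is illustrative and is not part of EDTA:

```python
def benchmark_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Confusion-matrix metrics for comparing a TE annotation with a curated reference.

    tp, fp, tn and fn are assumed to be counts of the same unit (e.g. base pairs).
    """
    return {
        # fraction of reference TE sequence that the program recovered
        "sensitivity": tp / (tp + fn),
        # fraction of non-TE sequence correctly left unannotated
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        # fraction of the program's TE calls that are correct
        "precision": tp / (tp + fp),
        # false discovery rate, equal to 1 - precision
        "fdr": fp / (fp + tp),
        # harmonic mean of precision and sensitivity
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical counts, for illustration only
m = benchmark_metrics(tp=80, fp=20, tn=880, fn=20)
print({k: round(v, 3) for k, v in m.items()})
```

Because TE-free sequence usually dominates a genome, specificity and accuracy tend to be high for most programs, which is why precision, FDR and F1 are useful complements.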

2019
Author(s):
Shujun Ou
Weija Su
Yi Liao
Kapeel Chougule
Doreen Ware
...  

Abstract Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and allow for annotation of TEs. Numerous methods exist for each class of elements, but their relative performance is unknown. We benchmarked existing programs against a curated library of rice TEs. Using the most robust programs, we created a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a condensed TE library for annotation of structurally intact and fragmented elements. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.


2021
Author(s):
Matias Rodriguez
Wojciech Makałowski

Abstract Transposable elements (TEs) are major genomic components in most eukaryotic genomes and play an important role in genome evolution. However, despite their relevance, the identification of TEs is not an easy task, and a number of tools have been developed to tackle this problem. To better understand how they perform, we tested several widely used tools for de novo TE detection and compared their performance on both simulated data and well-curated genomic sequences. The results will be helpful for identifying common issues associated with TE annotation and for evaluating how comparable the results obtained with different tools are.
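
Simulated data give a benchmark in which the true TE coordinates are known exactly. A toy sketch, assuming a uniform random background and a single TE sequence inserted several times with optional point substitutions; real simulations are considerably richer, and this hypothetical function is not the authors' protocol:

```python
import random


def simulate_te_genome(length, te_seq, n_copies, mutation_rate=0.0, seed=0):
    """Toy benchmark: random background with TE copies inserted at known positions.

    Returns (genome, truth), where truth lists half-open (start, end)
    coordinates of every inserted copy in the final genome.
    """
    rng = random.Random(seed)
    bases = "ACGT"
    background = [rng.choice(bases) for _ in range(length)]
    # distinct insertion points in the background, processed left to right
    points = sorted(rng.sample(range(length + 1), n_copies))

    out, truth, prev = [], [], 0
    for p in points:
        out.extend(background[prev:p])
        start = len(out)
        for b in te_seq:
            # optionally substitute each base with one of the other three
            if rng.random() < mutation_rate:
                out.append(rng.choice(bases.replace(b, "")))
            else:
                out.append(b)
        truth.append((start, len(out)))
        prev = p
    out.extend(background[prev:])
    return "".join(out), truth


# Arbitrary 100-bp toy TE sequence, for illustration only
genome, truth = simulate_te_genome(10_000, "TGCA" * 25, n_copies=8, seed=1)
print(len(genome), truth[:3])
```

A detector's calls can then be scored against `truth` with the same confusion-matrix metrics used for curated genomes.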


2019
Author(s):
JR Bermúdez-Barrientos
O Ramírez-Sánchez
FWN Chow
AH Buck
C Abreu-Goodger

ABSTRACT Many organisms exchange small RNAs (sRNAs) during their interactions, and these RNAs can target or bolster defense strategies in host-pathogen systems. Current sRNA-Seq technology can determine the small RNAs present in any symbiotic system, but very few bioinformatic tools are available to interpret the results. We show that one of the biggest challenges comes from sequences that map equally well to the genomes of both interacting organisms. This arises from the small size of sRNAs relative to large genomes, and because many of the sRNAs produced come from genomic regions that encode highly conserved miRNAs, rRNAs or tRNAs. Here we present strategies to disentangle sRNA-Seq data from samples of communicating organisms, developed using diverse plant and animal species that are known to exchange RNA with their parasites. We show that sequence assembly, both de novo and genome-guided, can be used for sRNA-Seq data, greatly reducing the ambiguity of mapping reads. Even confidently mapped sequences can be misleading, so we further demonstrate the use of differential expression strategies to determine the true parasitic sRNAs within host cells. Finally, we validate our methods on new experiments designed to probe the nature of the extracellular vesicle sRNAs from the parasitic nematode H. bakeri that enter mouse epithelial cells.
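
The mapping ambiguity described above can be made concrete by comparing each read's best alignment score against the two genomes. A minimal sketch, assuming scores come from a separate aligner, higher scores mean better alignments, and `None` means no alignment; the `margin` parameter and labels are illustrative assumptions and are not the authors' method, which relies on assembly and differential expression:

```python
from collections import Counter
from typing import Optional


def assign_read(host_score: Optional[float],
                parasite_score: Optional[float],
                margin: float = 0.0) -> str:
    """Label one sRNA read from its best alignment score to each genome.

    A read is assigned only if one genome beats the other by more than `margin`.
    """
    if host_score is None and parasite_score is None:
        return "unmapped"
    if parasite_score is None:
        return "host"
    if host_score is None:
        return "parasite"
    if host_score - parasite_score > margin:
        return "host"
    if parasite_score - host_score > margin:
        return "parasite"
    return "ambiguous"  # maps (almost) equally well to both genomes


# Hypothetical per-read scores: (host, parasite)
scores = {"r1": (42, 42), "r2": (42, 30), "r3": (None, 38), "r4": (40, 39)}
print(Counter(assign_read(h, p, margin=2) for h, p in scores.values()))
```

Short, conserved sequences such as miRNAs, rRNA and tRNA fragments tend to fall into the "ambiguous" bin, which is the problem the abstract highlights.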


2020
Vol 48 (12)
pp. 6685–6698
Author(s):
Xinyan Zhang
Meixia Zhao
Donald R McCarty
Damon Lisch

Abstract Transposable elements (TEs) are ubiquitous DNA segments capable of moving from one site to another within host genomes. The extant distributions of TEs in eukaryotic genomes have been shaped both by bona fide TE integration preferences and by selection following integration. Here, we compare TE target-site distributions in host genomes using multiple de novo transposon insertion datasets from both plants and animals, and we compare them in the context of genome-wide transcriptional landscapes. We showcase two distinct types of transcription-associated TE targeting strategies that suggest a process of convergent evolution among eukaryotic TE families. The integration of two precision-targeting elements is specifically associated with the initiation of RNA Polymerase II transcription of highly expressed genes, suggesting the existence of novel mechanisms of precision TE targeting in addition to passive targeting of open chromatin. We also highlight two features that can facilitate TE survival and rapid proliferation: tissue-specific transposition and minimization of negative impacts on nearby gene function through precision targeting.
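
One simple way to relate insertion sites to transcription initiation is to measure each site's distance to the nearest transcription start site (TSS). A minimal sketch, assuming insertion sites and TSS coordinates lie on a single chromosome and ignoring strand; the window size and function names are illustrative assumptions, not the authors' analysis:

```python
import bisect
from typing import List


def nearest_tss_distance(site: int, tss_sorted: List[int]) -> int:
    """Distance in bp from an insertion site to the closest TSS.

    `tss_sorted` must be a non-empty, ascending list of TSS coordinates.
    """
    if not tss_sorted:
        raise ValueError("need at least one TSS")
    i = bisect.bisect_left(tss_sorted, site)
    candidates = []
    if i < len(tss_sorted):
        candidates.append(tss_sorted[i] - site)      # closest TSS at or after the site
    if i > 0:
        candidates.append(site - tss_sorted[i - 1])  # closest TSS before the site
    return min(candidates)


def fraction_near_tss(sites: List[int], tss: List[int], window: int = 1000) -> float:
    """Fraction of insertion sites lying within `window` bp of any TSS."""
    tss_sorted = sorted(tss)
    near = sum(1 for s in sites if nearest_tss_distance(s, tss_sorted) <= window)
    return near / len(sites)
```

Comparing this fraction with the value expected for randomly placed sites (or for sites near lowly expressed genes) is one way to ask whether insertions are enriched near active TSSs.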


2021
Author(s):
Matias Rodríguez
Wojciech Makalowski

Abstract Transposable elements (TEs) are major genomic components in most eukaryotic genomes and play an important role in genome evolution. However, despite their relevance, the identification of TEs is not an easy task, and a number of tools have been developed to tackle this problem. To better understand how they perform, we tested several widely used tools for de novo TE detection and compared their performance on both simulated data and well-curated genomic sequences. The results will be helpful for identifying common issues associated with TE annotation and for evaluating how comparable the results obtained with different tools are.


2019
Vol 11 (11)
pp. 3181–3193
Author(s):
Stefan Cerbin
Ching Man Wai
Robert VanBuren
Ning Jiang

Abstract Transposable elements represent the largest components of many eukaryotic genomes, and different genomes harbor different combinations of elements. Here, we discovered a novel DNA transposon in the genome of the clubmoss Selaginella lepidophylla. Further searches for sequences related to its conserved DDE region uncovered the presence of this superfamily of elements in fish, coral, sea anemone, and other animal species. Among plants, however, this element appears restricted to bryophytes and lycophytes. This transposon, named GingerRoot, is associated with a 6-bp (base pair) target site duplication and 100–150-bp terminal inverted repeats. Analysis of transposase sequences identified the DDE motif, a catalytic domain, which shows similarity to the integrase of Gypsy-like long terminal repeat retrotransposons, the most abundant component of plant genomes. A total of 77 intact and several hundred truncated copies of GingerRoot elements were identified in S. lepidophylla. Like Gypsy retrotransposons, GingerRoots show no insertion preference near genes, which contrasts with the compact genome size of about 100 Mb. Nevertheless, a considerable portion of GingerRoot elements was found to carry gene fragments, suggesting that their capacity to duplicate gene sequences is unlikely to be attributable to proximity to genes. Elements carrying gene fragments appear to be less methylated, more diverged, and more distal to genes than those without gene fragments, indicating that they are preferentially retained in gene-poor regions. This study has identified a broadly dispersed, novel DNA transposon, and the first plant DNA transposon with an integrase-related transposase, suggesting the possibility of de novo formation of Gypsy-like elements in plants.
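
The two structural hallmarks quoted above, a 6-bp target site duplication (TSD) and terminal inverted repeats (TIRs), can be checked for a candidate element with simple string comparisons. A minimal sketch, assuming the element boundaries are already known; the default TIR length follows the lower bound of the 100–150 bp range, while the mismatch tolerance and function names are illustrative assumptions:

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def check_tsd_and_tir(genome: str, start: int, end: int,
                      tsd_len: int = 6, tir_len: int = 100,
                      max_tir_mismatch: int = 10) -> tuple:
    """Test a candidate element genome[start:end] for a TSD and TIRs.

    TSD: the tsd_len bases just outside each end must be identical (direct repeat).
    TIR: the first tir_len bases must match the reverse complement of the last
    tir_len bases, allowing up to max_tir_mismatch mismatches.
    """
    has_tsd = (start >= tsd_len and end + tsd_len <= len(genome)
               and genome[start - tsd_len:start] == genome[end:end + tsd_len])
    element = genome[start:end]
    has_tir = (len(element) >= 2 * tir_len
               and mismatches(element[:tir_len], revcomp(element[-tir_len:]))
               <= max_tir_mismatch)
    return has_tsd, has_tir
```

The TSD is a direct repeat created by the host sequence at the insertion site, whereas the TIRs belong to the element itself, which is why one comparison is direct and the other uses the reverse complement.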


2020
Author(s):
Dafang Wang
Jianbo Zhang
Tao Zuo
Damon Lisch
Meixia Zhao
...  

Abstract Although transposable elements (TEs) comprise a major fraction of many higher eukaryotic genomes, most TEs are silenced by host defense mechanisms. The means by which otherwise active TEs are recognized and silenced remains poorly understood. Here we analyzed two independent cases of spontaneous silencing of the active maize Ac/Ds transposon system. This silencing was initiated by Alternative Transposition (AT), a type of aberrant transposition event that engages the termini of two nearby, separate TEs. AT during DNA replication can generate Composite Insertions (CIs) that contain inverted duplications of transposon sequences. We show that the inverted duplications of two CIs are transcribed to produce double-stranded RNAs (dsRNAs) that trigger the production of two distinct classes of siRNAs: a 24-nt class complementary to the TE terminal inverted repeats (TIRs) and non-coding subterminal regions, and a 21–22-nt class corresponding to the TE transcribed regions. Plants containing these siRNA-generating CIs exhibit decreased levels of Ac transcript and heritable repression of Ac/Ds transposition. This study documents the first case of TE silencing attributable to transposon self-initiated AT, a process that may represent a general initiating mechanism for silencing of DNA transposons.
Article summary: Transposable elements (TEs) are often silenced by their hosts, but how TEs are initially recognized for silencing remains unclear. Here we describe two independent loci that induce de novo heritable silencing of maize Ac/Ds transposons. Plants containing these loci produce dsRNA and Ac-homologous small interfering RNAs, and exhibit decreased levels of Ac transcript and heritable repression of Ac/Ds transposition. We show that these loci comprise inverted duplications of TE sequences generated by Alternative Transposition coupled with DNA re-replication. This study documents the first case of transposon silencing induced by AT, which may represent a general initiating mechanism for TE silencing.
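
The two siRNA classes described above are distinguished first by length. A minimal sketch of a size profile, assuming reads have already been adapter-trimmed and matched to Ac/Ds sequence; only the 24-nt and 21–22-nt bins come from the text, and the function names are hypothetical:

```python
from collections import Counter
from typing import Iterable


def sirna_size_class(seq: str) -> str:
    """Assign a small RNA to the size classes described for Ac/Ds siRNAs."""
    n = len(seq)
    if n == 24:
        return "24-nt"
    if n in (21, 22):
        return "21-22-nt"
    return "other"


def size_profile(reads: Iterable[str]) -> Counter:
    """Count reads per size class (reads assumed trimmed and TE-matching)."""
    return Counter(sirna_size_class(r) for r in reads)
```

In a fuller analysis each class would also be split by where it aligns on the element (TIRs and subterminal regions versus transcribed regions), since the abstract ties each size class to a different part of the TE.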


2021
Vol 11 (1)
Author(s):
Alexandre Perochon
Harriet R. Benbow
Katarzyna Ślęczka-Brady
Keshav B. Malla
Fiona M. Doohan

Abstract There is increasing evidence that some functionally related, co-expressed genes cluster within eukaryotic genomes. We present a novel pipeline that delineates such eukaryotic gene clusters. Using this tool for bread wheat, we uncovered 44 clusters of genes that are responsive to the fungal pathogen Fusarium graminearum. As expected, these Fusarium-responsive gene clusters (FRGCs) included metabolic gene clusters, many of which are associated with disease resistance but had not previously been described in wheat. However, the majority of the FRGCs are non-metabolic, and many of them contain clusters of paralogues, including those implicated in plant disease responses, such as glutathione transferases, MAP kinases, and germin-like proteins. Twenty of the FRGCs encode nonhomologous, non-metabolic genes (including defence-related genes). One of these clusters includes the characterised Fusarium resistance orphan gene, TaFROG. Eight of the FRGCs map within six Fusarium head blight (FHB) resistance loci. One small QTL on chromosome 7D (4.7 Mb) encodes eight Fusarium-responsive genes, five of which are within an FRGC. This study provides a new tool to identify genomic regions enriched in genes responsive to specific traits of interest; applied here, it highlighted gene families, genetic loci and biological pathways of importance in the response of wheat to disease.
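
The pipeline's actual clustering criteria are not given in this abstract. As an illustrative assumption only, a simple version groups responsive genes on the same chromosome into runs whose neighbours lie within a maximum distance, keeping runs above a minimum size. The gene tuple layout, `max_gap` and `min_size` are all hypothetical:

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical gene record: (gene_id, chromosome, start coordinate)
Gene = Tuple[str, str, int]


def find_responsive_clusters(genes: Iterable[Gene], responsive: Set[str],
                             max_gap: int = 50_000, min_size: int = 3
                             ) -> List[Tuple[str, List[str]]]:
    """Group responsive genes into runs where neighbours lie within max_gap bp."""
    by_chrom: Dict[str, List[Tuple[int, str]]] = {}
    for gene_id, chrom, start in genes:
        if gene_id in responsive:
            by_chrom.setdefault(chrom, []).append((start, gene_id))

    clusters = []
    for chrom, items in sorted(by_chrom.items()):
        items.sort()
        run = [items[0]]
        for prev, cur in zip(items, items[1:]):
            if cur[0] - prev[0] <= max_gap:
                run.append(cur)  # close enough: extend the current run
            else:
                if len(run) >= min_size:
                    clusters.append((chrom, [g for _, g in run]))
                run = [cur]  # start a new run
        if len(run) >= min_size:
            clusters.append((chrom, [g for _, g in run]))
    return clusters
```

This sketch ignores non-responsive genes lying between responsive ones; a stricter definition might require responsive genes to be adjacent or test for enrichment against a background.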


2021
Author(s):
Jakob M. Goldmann
Vladimir B. Seplyarskiy
Wendy S. W. Wong
Thierry Vilboux
Pieter B. Neerincx
...  

2021
Vol 22 (2)
pp. 602
Author(s):
Elisa Carotti
Federica Carducci
Adriana Canapa
Marco Barucca
Samuele Greco
...  

Transposable elements (TEs) represent a considerable fraction of eukaryotic genomes, contributing to genome size, chromosomal rearrangements, and the generation of new coding genes or regulatory elements. An increasing number of studies have reported a link between the genomic abundance of TEs and adaptation to specific environmental conditions. Diadromy, the migration of fish between marine and fresh waters for reproduction, is a fascinating feature of some species. In this work, we investigated the genomes of 24 fish species, including 15 teleosts with migratory behaviour. The expected higher relative abundance of DNA transposons in ray-finned fish compared with the other fish groups was not confirmed in the dataset analysed. The relative contribution of different TE types in migratory ray-finned species did not differ clearly between oceanodromous and potamodromous fish. In contrast, a remarkable relationship between migratory behaviour and the quantitative difference in short interspersed nuclear (retro)elements (SINEs) emerged from the comparison between anadromous and catadromous species, independently of their phylogenetic position. This pattern is likely due to the substantial environmental changes faced by diadromous species during their migrations.

