TEsorter: lineage-level classification of transposable elements using conserved protein domains

2019 ◽  
Author(s):  
Ren-Gang Zhang ◽  
Zhao-Xuan Wang ◽  
Shujun Ou ◽  
Guang-Yuan Li

Summary: Transposable elements (TEs) constitute an important part of eukaryotic genomes, but their classification, especially at the lineage or clade level, is still challenging. For this purpose, we propose TEsorter, which is based on conserved protein domains of TEs. It is easy to use, fast with multiprocessing, and sensitive and precise in classifying TEs, especially LTR retrotransposons (LTR-RTs). Its results can also directly reflect the phylogenetic relationships and diversity of the classified LTR-RTs. Availability: The code in Python is freely available at https://github.com/zhangrengang/TEsorter.

2011 ◽  
Vol 2011 ◽  
pp. 1-9 ◽  
Author(s):  
Christine Dubreuil-Tranchant ◽  
Romain Guyot ◽  
Amira Guellim ◽  
Caroline Duret ◽  
Marion de la Mare ◽  
...  

Miniature Inverted-repeat Transposable Elements (MITEs) are small nonautonomous class-II transposable elements distributed throughout eukaryotic genomes. We identified a novel family of MITEs (named Alex) in the Coffea canephora genome often associated with expressed sequences. The Alex-1 element is inserted in an intron of a gene at the CcEIN4 locus. Its mobility was demonstrated by sequencing the insertion site in C. canephora accessions and Coffea species. Analysis of the insertion polymorphism of Alex-1 at this locus in Coffea species and in C. canephora showed that there was no relationship between the geographical distribution of the species, their phylogenetic relationships, and insertion polymorphism. The intraspecific distribution of C. canephora revealed an original situation within the E diversity group. These results suggest possibly greater gene flow between species than previously thought. This MITE family will enable the study of the C. canephora genome evolution, phylogenetic relationships, and possible gene flows within the Coffea genus.


2012 ◽  
Vol 34 (8) ◽  
pp. 1009-1019
Author(s):  
Hong-En XU ◽  
Hua-Hao ZHANG ◽  
Min-Jin HAN ◽  
Yi-Hong SHEN ◽  
Xian-Zhi HUANG ◽  
...  

Author(s):  
Murilo Horacio Pereira da Cruz ◽  
Douglas Silva Domingues ◽  
Priscila Tiemi Maeda Saito ◽  
Alexandre Rossi Paschoal ◽  
Pedro Henrique Bugatti

Abstract: Transposable elements (TEs) are the most represented sequences occurring in eukaryotic genomes. Few methods classify these sequences into deeper levels, such as the superfamily level, which could provide useful and detailed information about them. Most methods that classify TE sequences use handcrafted features such as k-mers and homology-based search, which could be inefficient for classifying non-homologous sequences. Here we propose an approach, called transposable elements representation learner (TERL), that preprocesses and transforms one-dimensional sequences into two-dimensional space data (i.e., image-like data of the sequences) and applies them to deep convolutional neural networks. This classification method tries to learn the best representation of the input data to classify it correctly. We have conducted six experiments to test the performance of TERL against other methods. Our approach obtained macro mean accuracies and F1-scores of 96.4% and 85.8% for superfamilies and 95.7% and 91.5% for the order sequences from RepBase, respectively. We have also obtained macro mean accuracies and F1-scores of 95.0% and 70.6% for sequences from seven databases at the superfamily level and 89.3% and 73.9% at the order level, respectively. TERL surpassed the accuracy, recall, and specificity obtained by other methods in the experiment classifying order-level sequences from seven databases, and it was by far faster than any other method in all experiments. Therefore, TERL can learn how to predict any hierarchical level of the TE classification system and is about 20 times and three orders of magnitude faster than TEclass and PASTEC, respectively. Code: https://github.com/muriloHoracio/TERL. Contact: [email protected]
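The transformation of a one-dimensional sequence into image-like two-dimensional data can be illustrated with a minimal sketch. The abstract does not state which encoding is used, so the one-hot layout, the zero padding, and the names `ALPHABET` and `one_hot_2d` below are assumptions made for illustration only, not the authors' implementation.

```python
# Hypothetical sketch: a one-hot layout with zero padding is ASSUMED here;
# the abstract does not specify the actual encoding.
ALPHABET = "ACGT"  # one matrix row per nucleotide


def one_hot_2d(sequence, width):
    """Return a len(ALPHABET) x width matrix (list of rows) of 0/1 values.

    Sequences longer than `width` are truncated; shorter ones are
    right-padded with all-zero columns so every input has the same shape.
    """
    matrix = [[0] * width for _ in ALPHABET]
    for col, base in enumerate(sequence.upper()[:width]):
        row = ALPHABET.find(base)
        if row >= 0:  # ambiguous bases such as N stay as all-zero columns
            matrix[row][col] = 1
    return matrix


if __name__ == "__main__":
    # Print the matrix row by row for a short example sequence.
    for row in one_hot_2d("ACGTN", 6):
        print(row)
```

A fixed-shape matrix of this kind could then be fed to a convolutional network in the same way as a single-channel image.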


2009 ◽  
Vol 1 ◽  
pp. 205-220 ◽  
Author(s):  
Cédric Feschotte ◽  
Umeshkumar Keswani ◽  
Nirmal Ranganathan ◽  
Marcel L. Guibotsy ◽  
David Levine

Genome ◽  
2018 ◽  
Vol 61 (8) ◽  
pp. 587-594 ◽  
Author(s):  
Fei Hou ◽  
Bi Ma ◽  
Youchao Xin ◽  
Lulu Kuang ◽  
Ningjia He

Horizontal transposable element transfer (HTT) events have occurred among a large number of species and play important roles in the composition and evolution of eukaryotic genomes. HTTs are also regarded as effective forces in promoting genomic variation and biological innovation. In the present study, HTT events were identified and analyzed in seven sequenced species of Rosales using bioinformatics methods by comparing sequence conservation and Ka/Ks value of reverse transcriptase (RT) with 20 conserved genes, estimating the dating of HTTs, and analyzing the phylogenetic relationships. Seven HTT events involving long terminal repeat (LTR) retrotransposons, two HTTs between Morus notabilis and Ziziphus jujuba, and five between Malus domestica and Pyrus bretschneideri were identified. Further analysis revealed that these LTR retrotransposons had functional structures, and the copy insertion times were lower than the dating of HTTs, particularly in Mn.Zj.1 and Md.Pb.3. Altogether, the results demonstrate that LTR retrotransposons still have potential transposition activity in host genomes. These results indicate that HTT events are another strategy for exchanging genetic material among species and are important for the evolution of genomes.


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e8311 ◽  
Author(s):  
Simon Orozco-Arias ◽  
Gustavo Isaza ◽  
Romain Guyot ◽  
Reinel Tabares-Soto

Background: Transposable elements (TEs) constitute the most common repeated sequences in eukaryotic genomes. Recent studies demonstrated their deep impact on species diversity, adaptation to the environment and diseases. Although there are many conventional bioinformatics algorithms for detecting and classifying TEs, none have achieved reliable results on different types of TEs. Machine learning (ML) techniques can automatically extract hidden patterns and novel information from labeled or non-labeled data and have been applied to solving several scientific problems. Methodology: We followed the Systematic Literature Review (SLR) process, applying the six stages of the review protocol from it, but added a previous stage, which aims to detect the need for a review. Then search equations were formulated and executed in several literature databases. Relevant publications were scanned and used to extract evidence to answer research questions. Results: Several ML approaches have already been tested on other bioinformatics problems with promising results, yet there are few algorithms and architectures available in literature focused specifically on TEs, despite representing the majority of the nuclear DNA of many organisms. Only 35 articles were found and categorized as relevant in TE or related fields. Conclusions: ML is a powerful tool that can be used to address many problems. Although ML techniques have been used widely in other biological tasks, their utilization in TE analyses is still limited. Following the SLR, it was possible to notice that the use of ML for TE analyses (detection and classification) is an open problem, and this new field of research is growing in interest.

