TERL: classification of transposable elements by convolutional neural networks

Author(s):  
Murilo Horacio Pereira da Cruz ◽  
Douglas Silva Domingues ◽  
Priscila Tiemi Maeda Saito ◽  
Alexandre Rossi Paschoal ◽  
Pedro Henrique Bugatti

Abstract Transposable elements (TEs) are the most represented sequences occurring in eukaryotic genomes. Few methods classify these sequences into deeper levels, such as the superfamily level, which could provide useful and detailed information about them. Most methods that classify TE sequences use handcrafted features such as k-mers and homology-based search, which could be inefficient for classifying non-homologous sequences. Here we propose an approach, called transposable elements representation learner (TERL), that preprocesses one-dimensional sequences, transforms them into two-dimensional space data (i.e., image-like data of the sequences) and feeds this data to deep convolutional neural networks. This classification method tries to learn the best representation of the input data to classify it correctly. We have conducted six experiments to test the performance of TERL against other methods. Our approach obtained macro mean accuracy and F1-score of 96.4% and 85.8% for the superfamily sequences and 95.7% and 91.5% for the order sequences from RepBase, respectively. We have also obtained macro mean accuracy and F1-score of 95.0% and 70.6% when classifying sequences from seven databases at the superfamily level and 89.3% and 73.9% at the order level, respectively. We surpassed the accuracy, recall and specificity obtained by other methods in the experiment on order-level classification of sequences from seven databases, and TERL was by far the fastest method in all experiments. Therefore, TERL can learn how to predict any hierarchical level of the TE classification system and is about 20 times and three orders of magnitude faster than TEclass and PASTEC, respectively. https://github.com/muriloHoracio/TERL
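
The transformation of a one-dimensional sequence into image-like two-dimensional data can be illustrated with a minimal sketch. The abstract does not state the encoding, so one-hot encoding with zero padding to a fixed width is assumed here; the names `ALPHABET` and `one_hot_encode`, the row order, and the handling of unknown symbols are hypothetical and not taken from the TERL code.

```python
import numpy as np

# Assumed row order of the image; symbols outside it (e.g. N) become all-zero columns.
ALPHABET = "ACGT"

def one_hot_encode(sequence: str, width: int) -> np.ndarray:
    """Return a (4, width) matrix with one column per nucleotide.

    Sequences longer than `width` are truncated; shorter ones are
    right-padded with all-zero columns (an assumption of this sketch).
    """
    image = np.zeros((len(ALPHABET), width), dtype=np.float32)
    for col, base in enumerate(sequence.upper()[:width]):
        row = ALPHABET.find(base)
        if row >= 0:  # skip symbols that are not in ALPHABET
            image[row, col] = 1.0
    return image

encoded = one_hot_encode("ACGTN", width=8)
print(encoded.shape)
```

A matrix of this shape can be passed to a convolutional layer in the same way as a single-channel image, which is what makes the CNN applicable to sequence data.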

Author(s):  
Murilo Horacio Pereira da Cruz ◽  
Douglas Silva Domingues ◽  
Priscila Tiemi Maeda Saito ◽  
Alexandre Rossi Paschoal ◽  
Pedro Henrique Bugatti

Abstract Transposable elements (TEs) are the most represented sequences occurring in eukaryotic genomes. They can transpose and generate multiple copies of themselves throughout genomes. These sequences can produce a variety of effects on organisms, such as regulation of gene expression. There are several types of these elements, which are classified hierarchically into classes, subclasses, orders and superfamilies. Few methods classify these sequences into deeper levels, such as the superfamily level, which could provide useful and detailed information about them. Most methods that classify TE sequences use handcrafted features such as k-mers and homology-based search, which could be inefficient for classifying non-homologous sequences. Here we propose a pipeline, transposable elements representation learner (TERL), that uses four preprocessing steps and a transformation of one-dimensional nucleic acid sequences into two-dimensional space data (i.e., image-like data of the sequences), and feeds this data to deep convolutional neural networks (CNNs). A CNN is used to classify TE sequences because it is a very flexible classification method: it can be easily retrained to classify different categories and any other DNA sequences. This classification method tries to learn the best representation of the input data to classify it correctly. CNNs can also be accelerated via GPUs to provide fast results. We have conducted six experiments to test the performance of TERL against other methods. Our approach obtained macro mean accuracy and F1-score of 96.4% and 85.8% for the superfamily sequences from RepBase and 95.7% and 91.5% for the order sequences from RepBase, respectively. We have also obtained macro mean accuracy and F1-score of 95.0% and 70.6% when classifying sequences from seven databases at the superfamily level and 89.3% and 73.9% at the order level, respectively. We surpassed the accuracy, recall and specificity obtained by other methods in the experiment on order-level classification of sequences from seven databases, and TERL was by far the fastest method in all experiments. We also show a way to preprocess sequences and prepare train and test sets. Therefore, TERL can learn how to predict any hierarchical level of the TE classification system, is on average 162 times and four orders of magnitude faster than TEclass and PASTEC, respectively, and in a real-world scenario obtained better accuracy, recall and specificity than the other methods.
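
The macro mean accuracy and F1-score reported above can be illustrated with a short sketch. The abstract does not define the metrics, so the conventional one-vs-rest definitions are assumed: each class gets its own accuracy and F1-score from the confusion matrix, and the per-class values are averaged without weighting. The function name `macro_metrics` and the example matrix are hypothetical.

```python
from typing import Tuple

import numpy as np

def macro_metrics(confusion: np.ndarray) -> Tuple[float, float]:
    """Return (macro accuracy, macro F1) from a confusion matrix.

    Rows are true classes and columns are predicted classes
    (an assumed convention for this sketch).
    """
    confusion = confusion.astype(np.float64)
    total = confusion.sum()
    tp = np.diag(confusion)              # correctly predicted, per class
    fp = confusion.sum(axis=0) - tp      # predicted as the class but wrong
    fn = confusion.sum(axis=1) - tp      # members of the class that were missed
    tn = total - tp - fp - fn            # everything else, one-vs-rest
    accuracy = (tp + tn) / total
    denom = 2 * tp + fp + fn
    # Classes with no true or predicted members get an F1 of 0.
    f1 = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    return float(accuracy.mean()), float(f1.mean())

# Toy two-class example, not data from the paper.
example = np.array([[8, 2],
                    [1, 9]])
macro_accuracy, macro_f1 = macro_metrics(example)
print(macro_accuracy, macro_f1)
```

Under these definitions, true negatives count toward per-class accuracy but not toward F1, so with many classes the macro accuracy can sit well above the macro F1-score, which is consistent with the gap between the two figures in the abstract.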


2020 ◽  
Vol 2020 (10) ◽  
pp. 28-1-28-7 ◽  
Author(s):  
Kazuki Endo ◽  
Masayuki Tanaka ◽  
Masatoshi Okutomi

Classification of degraded images is very important in practice because images are usually degraded by compression, noise, blurring, etc. Nevertheless, most of the research in image classification only focuses on clean images without any degradation. Some papers have already proposed deep convolutional neural networks composed of an image restoration network and a classification network to classify degraded images. This paper proposes an alternative approach in which we use a degraded image and an additional degradation parameter for classification. The proposed classification network has two inputs which are the degraded image and the degradation parameter. The estimation network of degradation parameters is also incorporated if degradation parameters of degraded images are unknown. The experimental results showed that the proposed method outperforms a straightforward approach where the classification network is trained with degraded images only.

