Detecting operons in bacterial genomes via visual representation learning

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Rida Assaf ◽  
Fangfang Xia ◽  
Rick Stevens

Abstract: Contiguous genes in prokaryotes are often arranged into operons. Detecting operons plays a critical role in inferring gene functionality and regulatory networks. Human experts annotate operons by visually inspecting gene neighborhoods across pileups of related genomes. These visual representations capture inter-genic distance, strand direction, gene size, functional relatedness, and gene-neighborhood conservation, which are the most prominent operon features mentioned in the literature. By studying these features, an expert can decide whether a genomic region is part of an operon. We propose a deep-learning-based method named Operon Hunter that uses visual representations of genomic fragments to make operon predictions. Transfer learning and data augmentation allow us to leverage powerful neural networks trained on large image datasets by re-training them on a more limited dataset of extensively validated operons. Our method outperforms previously reported state-of-the-art tools, especially at predicting full operons and their boundaries accurately. Furthermore, our approach makes it possible to visually identify the features influencing the network's decisions, which can then be cross-checked by human experts.
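
The transfer-learning setup the abstract describes can be sketched roughly as follows: a network pre-trained on a large image corpus is re-trained on a small, augmented set of operon images. This is a minimal sketch in PyTorch; the ResNet-18 backbone, directory layout, class names, and hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Data augmentation to stretch a limited dataset of validated operons.
train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2),
    transforms.ToTensor(),
])

# Assumed layout: operon_images/train/{operon,non_operon}/*.png
train_set = datasets.ImageFolder("operon_images/train", transform=train_tf)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

# Pre-trained image network; only the new classification head is trained.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False               # freeze pre-trained features
model.fc = nn.Linear(model.fc.in_features, 2)  # operon vs. not

opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for images, labels in loader:             # one illustrative epoch
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()
```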


2017 ◽  
Vol 2017 ◽  
pp. 1-12 ◽  
Author(s):  
Mengyu Xu ◽  
Zhenmin Tang ◽  
Yazhou Yao ◽  
Lingxiang Yao ◽  
Huafeng Liu ◽  
...  

Due to variations in viewpoint, pose, and illumination, a given individual may appear considerably different across camera views. Tracking individuals across camera networks with non-overlapping fields of view remains a challenging problem. Previous works mainly address feature representation and metric learning separately, which tends to yield suboptimal solutions. To address this issue, we propose a novel framework that performs feature representation learning and metric learning jointly. Unlike previous works, we represent pairs of pedestrian images as a single resized input and use a linear Support Vector Machine in place of the softmax activation function for similarity learning. Dropout and data augmentation techniques are also employed in this model to prevent the network from overfitting. Extensive experiments on two publicly available datasets, VIPeR and CUHK01, demonstrate the effectiveness of the proposed approach.
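
The two distinctive ingredients, stacking an image pair into one input and swapping softmax for a linear SVM objective, can be sketched as below. This is a hedged sketch under assumed input sizes and layer shapes; the multi-class hinge loss stands in for the linear SVM, and the architecture is not the paper's exact network.

```python
import torch
import torch.nn as nn

class PairNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Dropout(0.5),                   # dropout against overfitting
        )
        self.classifier = nn.Linear(64 * 32 * 16, 2)  # same / different person

    def forward(self, a, b):
        x = torch.cat([a, b], dim=1)           # pair as one 6-channel input
        x = self.features(x).flatten(1)
        return self.classifier(x)              # raw scores, no softmax

net = PairNet()
# Multi-class hinge (linear SVM) loss instead of softmax cross-entropy.
svm_loss = nn.MultiMarginLoss()
a = torch.randn(4, 3, 128, 64)                 # two resized 128x64 crops
b = torch.randn(4, 3, 128, 64)
labels = torch.tensor([1, 0, 1, 0])            # 1 = same identity
loss = svm_loss(net(a, b), labels)
loss.backward()
```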


2021 ◽  
Author(s):  
Noureddine Kermiche

Using data augmentation techniques, unsupervised representation learning methods extract features from data by training artificial neural networks to recognize that different views of an object are merely different instances of the same object. We extend current unsupervised representation learning methods to networks that can self-organize data representations into two-dimensional (2D) maps. The proposed method combines ideas from Kohonen’s original self-organizing maps (SOM) with recent developments in unsupervised representation learning. A ResNet backbone with an added 2D Softmax output layer is used to organize the data representations. A new loss function with linear complexity is proposed to enforce the SOM requirements of winner-take-all (WTA) and competition between neurons while explicitly avoiding collapse into trivial solutions. We show that the SOM topological neighborhood requirement can be enforced by a fixed radial convolution at the 2D output layer, without resorting to the actual radial activation functions that prevented the original SOM algorithm from being extended to modern neural network architectures. We demonstrate that, when combined with data augmentation techniques, self-organization is a simple emergent property of the 2D output layer, arising from neighborhood recruitment combined with WTA competition between neurons. The proposed methodology is demonstrated on the SVHN and CIFAR10 datasets. The proposed algorithm is the first end-to-end unsupervised learning method that combines data self-organization and visualization as integral parts of unsupervised representation learning.
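
A minimal sketch of the 2D output-layer idea: logits are arranged on an m x m grid, and a fixed (non-trainable) Gaussian kernel plays the role of the radial convolution, smoothing the 2D softmax so that winning units recruit their grid neighbors. The grid size, kernel shape, and feature dimension are assumptions; the paper's loss function is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

m = 10  # 10 x 10 SOM-like map (size is an assumption)

def gaussian_kernel(size=5, sigma=1.0):
    ax = torch.arange(size, dtype=torch.float32) - size // 2
    xx, yy = torch.meshgrid(ax, ax, indexing="ij")
    k = torch.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return (k / k.sum()).view(1, 1, size, size)

class SomHead(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.proj = nn.Linear(feat_dim, m * m)
        # Fixed radial kernel enforcing the topological neighborhood:
        # winning units recruit their grid neighbors.
        self.register_buffer("kernel", gaussian_kernel())

    def forward(self, feats):
        logits = self.proj(feats)
        probs = F.softmax(logits, dim=1).view(-1, 1, m, m)  # 2D softmax map
        return F.conv2d(probs, self.kernel, padding=2).flatten(1)

head = SomHead()
feats = torch.randn(8, 512)  # e.g., features from a ResNet backbone
maps = head(feats)           # (8, 100) neighborhood-smoothed activations
```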


Energies ◽  
2021 ◽  
Vol 14 (20) ◽  
pp. 6816
Author(s):  
Jannis N. Kahlen ◽  
Michael Andres ◽  
Albert Moser

Machine-learning diagnostic systems are widely used to detect abnormal conditions in electrical equipment. Training robust and accurate diagnostic systems is challenging because only small databases of abnormal-condition data are available, yet the performance of these systems depends on the quantity and quality of the data. The training database can be augmented with data augmentation techniques that generate synthetic data to improve diagnostic performance. However, existing data augmentation techniques are generic methods that do not inject additional domain information into the synthetic data. In this paper, we develop a model-based data augmentation technique that integrates computer-implementable electromechanical models. Synthetic normal- and abnormal-condition data are generated with an electromechanical model and a stochastic parameter-value sampling method. The model-based data augmentation is showcased on the detection of an abnormal condition of a distribution transformer. First, the synthetic data are compared with measurements for verification. Then, machine-learning diagnostic systems are created using model-based data augmentation and compared with state-of-the-art diagnostic systems. We show that model-based data augmentation improves accuracy over state-of-the-art diagnostic systems, especially when only a small abnormal-condition database is available.
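
The general recipe, evaluating a physics-inspired model under stochastically sampled parameter values to produce labeled synthetic data, can be illustrated as follows. The toy signal model and the parameter distributions are stand-ins invented for this sketch; they are not the paper's transformer model.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(resonance_hz, damping, fault=False, n=1024, fs=8000.0):
    """Toy vibration-like response of a piece of equipment (a stand-in model)."""
    t = np.arange(n) / fs
    signal = np.exp(-damping * t) * np.sin(2 * np.pi * resonance_hz * t)
    if fault:                                   # abnormal condition adds a harmonic
        signal += 0.3 * np.sin(2 * np.pi * 2 * resonance_hz * t)
    return signal + 0.01 * rng.standard_normal(n)

def synth_batch(n_samples, fault):
    # Stochastic parameter sampling around assumed nominal values.
    res = rng.normal(120.0, 5.0, n_samples)     # resonance frequency, Hz
    damp = rng.uniform(2.0, 6.0, n_samples)     # damping constant, 1/s
    return np.stack([simulate(r, d, fault) for r, d in zip(res, damp)])

# Synthetic normal- and abnormal-condition data to augment the training set.
X = np.concatenate([synth_batch(500, fault=False), synth_batch(500, fault=True)])
y = np.concatenate([np.zeros(500), np.ones(500)])
```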


10.29007/j5hd ◽  
2020 ◽  
Author(s):  
Bartosz Piotrowski ◽  
Josef Urban

In this work we develop a new learning-based method for selecting facts (premises) when proving new goals over large formal libraries. Unlike previous methods, which choose sets of facts independently of each other by their rank, the new method uses a notion of state that is updated each time a fact is chosen. Our stateful architecture is based on recurrent neural networks, which have recently been very successful in stateful tasks such as language translation. The new method is combined with data augmentation techniques, evaluated in several ways on a standard large-theory benchmark, and compared to a state-of-the-art premise-selection approach based on gradient boosted trees. It is shown to perform significantly better and to solve many new problems.
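
One hedged reading of the stateful idea is a selection loop in which a recurrent cell updates a hidden state after each chosen fact, so each choice conditions the next, unlike rank-based selection where facts are scored independently. The embedding size, dot-product scoring, and greedy decoding below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class StatefulSelector(nn.Module):
    def __init__(self, n_facts=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(n_facts, dim)
        self.rnn = nn.GRUCell(dim, dim)

    def select(self, goal_vec, fact_ids, k=5):
        h = goal_vec                              # state initialized from the goal
        chosen = []
        candidates = fact_ids.clone()
        for _ in range(k):
            scores = self.embed(candidates) @ h   # score facts against the state
            best = candidates[scores.argmax()]
            chosen.append(int(best))
            # Update the state with the chosen fact; the next choice
            # depends on everything selected so far.
            h = self.rnn(self.embed(best).unsqueeze(0), h.unsqueeze(0))[0]
            candidates = candidates[candidates != best]
        return chosen

sel = StatefulSelector()
goal = torch.randn(64)                            # assumed goal encoding
print(sel.select(goal, torch.arange(1000)))
```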


2022 ◽  
Vol 31 (2) ◽  
pp. 1-34
Author(s):  
Patrick Keller ◽  
Abdoul Kader Kaboré ◽  
Laura Plein ◽  
Jacques Klein ◽  
Yves Le Traon ◽  
...  

Recent successes in training word embeddings for Natural Language Processing (NLP) tasks have encouraged a wave of research on representation learning for source code that builds on similar NLP methods. The overall objective is to produce code embeddings that capture as much program semantics as possible. State-of-the-art approaches invariably rely on a syntactic representation (i.e., raw lexical tokens, abstract syntax trees, or intermediate-representation tokens) to generate embeddings, which are criticized in the literature as non-robust or non-generalizable. In this work, we investigate a novel embedding approach based on the intuition that source code has visual patterns of semantics. We further use these patterns to address the outstanding challenge of identifying semantic code clones. We propose the WySiWiM (“What You See Is What It Means”) approach, where visual representations of source code are fed into powerful pre-trained image classification neural networks from the field of computer vision to benefit from the practical advantages of transfer learning. We evaluate the proposed embedding approach on the task of vulnerable code prediction and on two variations of the task of semantic code clone identification: code clone detection (a binary classification problem) and code classification (a multi-class classification problem). Experiments on BigCloneBench (Java) and Open Judge (C) show that, although simple, our WySiWiM approach performs as effectively as state-of-the-art approaches such as ASTNN or TBCNN. We also show with data from NVD and SARD that the WySiWiM representation can be used to learn a vulnerable code detector with reasonable performance (accuracy ∼90%). We further explore the influence of different steps in our approach, such as the choice of visual representation or classification algorithm, and discuss the promises and limitations of this research direction.
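
The core pipeline, rendering source text to an image and embedding it with a pre-trained vision network, can be sketched as below. The rendering details, the ResNet-18 choice, and the cosine-similarity comparison are assumptions for illustration; the paper evaluates several visual representations and classifiers.

```python
import torch
from PIL import Image, ImageDraw
from torchvision import models, transforms

def render(code: str, size=(224, 224)) -> Image.Image:
    """Draw a source snippet onto a blank canvas as its visual representation."""
    img = Image.new("RGB", size, "white")
    ImageDraw.Draw(img).multiline_text((4, 4), code, fill="black")
    return img

to_tensor = transforms.ToTensor()
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()          # keep the 512-d embedding
backbone.eval()

a = render("int max(int a, int b) { return a > b ? a : b; }")
b = render("int biggest(int x, int y) { return x > y ? x : y; }")
with torch.no_grad():
    ea = backbone(to_tensor(a).unsqueeze(0))
    eb = backbone(to_tensor(b).unsqueeze(0))
# Semantic clone candidates should score high under embedding similarity.
print(torch.nn.functional.cosine_similarity(ea, eb))
```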


2022 ◽  
Vol 2022 ◽  
pp. 1-16
Author(s):  
Nesrine Wagaa ◽  
Hichem Kallel ◽  
Nédra Mellouli

Handwritten character recognition is a challenging research topic. Many works have been presented to recognize letters of different languages, but the availability of Arabic handwritten character databases is limited. Motivated by this, we propose a convolutional neural network for the classification of Arabic handwritten letters. Seven optimization algorithms are evaluated, and the best one is reported. Faced with the scarcity of available Arabic handwritten datasets, various data augmentation techniques are implemented to give the convolutional neural network model the robustness it needs. The proposed model is further improved with dropout regularization to avoid overfitting. Moreover, suitable choices of optimization algorithm and data augmentation approach are presented to achieve good performance. The model has been trained on two Arabic handwritten character datasets, AHCD and Hijja, achieving high recognition accuracies of 98.48% and 91.24%, respectively, and outperforming other state-of-the-art models.
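
A minimal sketch of such a classifier: a small CNN with dropout, trained on augmented 32x32 grayscale character images. The layer sizes and augmentations are illustrative assumptions rather than the authors' architecture; AHCD covers the 28 Arabic letters, hence the output size.

```python
import torch
import torch.nn as nn
from torchvision import transforms

# Augmentation to compensate for the scarcity of Arabic handwriting data.
augment = transforms.Compose([
    transforms.RandomRotation(10),
    transforms.RandomAffine(0, translate=(0.1, 0.1)),
    transforms.ToTensor(),
])

model = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 32x32 -> 16x16
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 256), nn.ReLU(),
    nn.Dropout(0.5),                       # regularization against overfitting
    nn.Linear(256, 28),                    # 28 Arabic letters
)
# Any of the compared optimizers can be plugged in here, e.g. Adam:
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
```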


Author(s):  
Niccolò Marastoni ◽  
Roberto Giacobazzi ◽  
Mila Dalla Preda

Abstract: In the past few years, malware classification techniques have shifted from shallow traditional machine learning models to deeper neural network architectures. The main benefit of some of these is the ability to work with raw data, guaranteed by their automatic feature-extraction capabilities; this requires less technical expertise and fewer pre-processing resources when building the models. Nevertheless, this advantage comes with a drawback: deep learning models require huge quantities of data to generalize well, and the amount of data required to train a deep network without overfitting is often unobtainable for malware analysts. We take inspiration from image-based data augmentation techniques and apply a sequence of semantics-preserving syntactic code transformations (obfuscations) to a small dataset of programs to generate a larger dataset. We then design two learning models, a convolutional neural network and a bi-directional long short-term memory network, and train them on images extracted from compiled binaries of the newly generated dataset. Through transfer learning, we then take the features learned from the obfuscated binaries and train the models on two state-of-the-art malware datasets, each containing around 10 000 samples. Our models achieve up to 98.5% accuracy on the test set, on par with or better than the present state-of-the-art approaches, validating the approach.
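
The binary-to-image step common to this line of work can be sketched as follows: the raw bytes of a compiled program are reshaped into a grayscale image that a CNN can consume. The row width and tail-byte handling are common conventions assumed here, not taken from the paper; the augmentation itself happens upstream, by compiling each source program several times under different semantics-preserving obfuscations.

```python
import numpy as np
from PIL import Image

def binary_to_image(path: str, width: int = 256) -> Image.Image:
    """Interpret a compiled binary's bytes as a grayscale image."""
    data = np.fromfile(path, dtype=np.uint8)
    rows = len(data) // width
    data = data[: rows * width].reshape(rows, width)  # drop the tail bytes
    return Image.fromarray(data, mode="L")

# Each obfuscated variant of a program contributes one image to the
# training set, turning a small program corpus into a larger image dataset.
```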


Agronomy ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. 476
Author(s):  
Lin Wu ◽  
Jie Ma ◽  
Yuehua Zhao ◽  
Hong Liu

To enable an apple-picking robot to quickly and accurately detect apples against the complex backgrounds found in orchards, we propose an improved You Only Look Once version 4 (YOLOv4) model together with new data augmentation methods. Firstly, web crawling is used to collect pertinent apple images from the Internet for labeling. To address the shortage of image data caused by random occlusion between leaves, a leaf-illustration data augmentation method is proposed in this paper in addition to traditional data augmentation techniques. Secondly, because of the large size and computational cost of the YOLOv4 model, its backbone network Cross Stage Partial Darknet53 (CSPDarknet53) is replaced by EfficientNet, and a convolution layer (Conv2D) is added to each of the three outputs to further adjust and extract the features, making the model lighter and reducing its computational complexity. Finally, apple detection experiments are performed on 2670 expanded samples. The test results show that the proposed EfficientNet-B0-YOLOv4 model has better detection performance than YOLOv3, YOLOv4, and Faster R-CNN with ResNet, which are state-of-the-art apple detection models. The average Recall, Precision, and F1 reach 97.43%, 95.52%, and 96.54% respectively, and the average detection time is 0.338 s per frame, which shows that the proposed method can be well applied in the vision systems of picking robots in the apple industry.
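
The backbone swap can be sketched as below: EfficientNet-B0 replaces CSPDarknet53, and three intermediate feature maps each pass through an extra Conv2d before the detection heads. The tap points and output channel count are assumptions about torchvision's EfficientNet-B0 layout, chosen to roughly match YOLO's three detection strides (8, 16, 32); the paper's exact head configuration may differ.

```python
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0, EfficientNet_B0_Weights
from torchvision.models.feature_extraction import create_feature_extractor

backbone = efficientnet_b0(weights=EfficientNet_B0_Weights.IMAGENET1K_V1)
# Tap three scales of the EfficientNet feature pyramid.
taps = {"features.3": "p3", "features.5": "p4", "features.7": "p5"}
extractor = create_feature_extractor(backbone, return_nodes=taps)

x = torch.randn(1, 3, 416, 416)
feats = extractor(x)
# Extra Conv2d on each output to further adjust the extracted features.
heads = nn.ModuleDict({
    name: nn.Conv2d(f.shape[1], 256, kernel_size=1) for name, f in feats.items()
})
outs = {name: heads[name](f) for name, f in feats.items()}
for name, f in outs.items():
    print(name, tuple(f.shape))   # three scales feeding the YOLO heads
```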

