A MULTILAYER MODEL AND TRAINING METHOD FOR MALWARE TRAFFIC DETECTION BASED ON AN ENSEMBLE OF DECISION TREES

2020 ◽  
pp. 92-101
Author(s):  
В’ячеслав Васильович Москаленко ◽  
Микола Олександрович Зарецький ◽  
Альона Сергіївна Москаленко ◽  
Антон Михайлович Кудрявцев ◽  
Віктор Анатолійович Семашко

A model and training method for the multilayer feature extractor and decision rules of a malware traffic detector are proposed. The feature extractor model is based on a convolutional sparse coding network whose sparse encoder is approximated by a regression random forest according to the principles of knowledge distillation. A growing sparse coding neural gas algorithm has been developed for unsupervised training of the feature extractor with automatic determination of the required number of features at each layer. In the feature extractor, the greedy L1-regularized Orthogonal Matching Pursuit method is used to implement sparse coding at the training phase, and the L1-regularized Least Angle Regression (LARS) method is additionally used at the knowledge distillation phase. Due to the explaining-away effect, the extracted features are uncorrelated and robust to noise and adversarial attacks. The proposed feature extractor is trained without supervision to separate the explanatory factors, which makes it possible to use the usually abundant unlabeled training data with maximum efficiency. As the model of the decision rules, a binary encoder of input observations based on an ensemble of decision trees is proposed, combined with information-extreme closed hypersurfaces (containers) for class separation that are constructed in radial-basis form in binary Hamming space. Coding trees are added according to the boosting principle, and the radii of the class containers are optimized by direct search. The information-extreme classifier is characterized by low computational complexity and high generalization capacity for small sets of labeled training data. Verification results of the trained model on the open CTU test datasets confirm the suitability of the proposed algorithms for practical application, since the accuracy of malware traffic detection is 96.1%.
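As a rough illustration of the distillation step described above, the sketch below (not the authors' implementation; the data, dictionary, and hyperparameters are placeholders) encodes observations with greedy OMP against a fixed dictionary and then trains a random forest to regress the resulting sparse codes, so the forest can serve as a fast approximate encoder at inference time:

```python
# Hedged sketch: OMP sparse coding distilled into a random forest regressor.
import numpy as np
from sklearn.decomposition import sparse_encode
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))                  # stand-in traffic feature vectors
D = rng.normal(size=(32, 64))                   # dictionary atoms (assumed learned beforehand)
D /= np.linalg.norm(D, axis=1, keepdims=True)   # unit-norm atoms

# Sparse coding step: greedy OMP with a fixed number of non-zero coefficients.
codes = sparse_encode(X, D, algorithm='omp', n_nonzero_coefs=5)

# Knowledge distillation step: the forest learns to predict the sparse codes,
# replacing the iterative OMP encoder.
student = RandomForestRegressor(n_estimators=50, random_state=0)
student.fit(X, codes)
approx_codes = student.predict(X)               # fast approximate sparse codes
```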

Author(s):  
В’ячеслав Васильович Москаленко ◽  
Альона Сергіївна Москаленко ◽  
Микола Олександрович Зарецький

A model of a hierarchical convolutional extractor of malware traffic features is proposed. The model input is an image with a resolution of 28x28 pixels and 10 channels, formed from 10 successive network packet flows, which makes it possible to describe the spatio-temporal statistical characteristics of the traffic. The feature extractor consists of two convolutional layers with three-dimensional filters, sub-sampling layers, and activation layers based on the orthogonal matching pursuit algorithm and the ReLU function. A model of the decision rules of the malware traffic detector based on an information-extreme classifier is also proposed. It yields computationally simple decision rules and allows evaluating the informational efficiency of the feature extractor under a limited volume of relevant labeled training data. The classifier performs adaptive feature discretization and constructs radial-basis class containers in binary Hamming space that are optimal in the information sense. The information criterion of learning efficiency is a modification of S. Kullback's measure as a function of the error rates of the first and second kind. The growing neural gas algorithm for pretraining the feature extractor is improved by modifying the mechanisms for inserting and updating neurons, which makes it possible to utilize unlabeled training samples and to obtain an optimal distribution of neurons covering the training sample. The insertion mechanism is modified so that a new neuron is created when an error threshold is reached rather than at a fixed frequency, which improves the stability of the learning process and regulates the generalization ability of the model. The update mechanism for the neuron weights uses Oja's rule instead of Hebb's rule, which avoids uncontrolled growth of the neuron weights and adapts the convolutional filters for sparse coding of the input observations. A meta-heuristic simulated annealing search algorithm is proposed for training the decision rules and fine-tuning the high-level filters of the feature extractor. Simulation results on the CTU-Mixed and CTU-13 datasets confirm the effectiveness of the resulting decision rules for recognizing malware traffic in test samples.
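The substitution of Oja's rule for Hebb's rule can be shown with a minimal sketch (illustrative code, not the authors'): both rules strengthen a weight vector in proportion to the product of input and activation, but Oja's extra decay term keeps the weight norm bounded.

```python
# Hedged sketch: Hebbian vs. Oja weight updates for a single linear neuron.
import numpy as np

def hebb_update(w, x, lr=0.01):
    y = w @ x
    return w + lr * y * x                # norm of w grows without bound over time

def oja_update(w, x, lr=0.01):
    y = w @ x
    return w + lr * y * (x - y * w)      # extra -y^2 * w term keeps ||w|| near 1

rng = np.random.default_rng(0)
w = rng.normal(size=8)
for _ in range(1000):
    x = rng.normal(size=8)
    w = oja_update(w, x)
print(np.linalg.norm(w))                 # stays bounded, unlike the Hebbian update
```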


Author(s):  
Xin Liu ◽  
Kai Liu ◽  
Xiang Li ◽  
Jinsong Su ◽  
Yubin Ge ◽  
...  

The lack of sufficient training data in many domains poses a major challenge to the construction of domain-specific machine reading comprehension (MRC) models with satisfying performance. In this paper, we propose a novel iterative multi-source mutual knowledge transfer framework for MRC. As an extension of the conventional knowledge transfer with one-to-one correspondence, our framework focuses on many-to-many mutual transfer, which involves synchronous executions of multiple many-to-one transfers in an iterative manner. Specifically, to update a target-domain MRC model, we first consider the other domain-specific MRC models as individual teachers and employ knowledge distillation to train a multi-domain MRC model, which is differentially required to fit the training data and match the outputs of these individual models according to their domain-level similarities to the target domain. After being initialized by the multi-domain MRC model, the target-domain MRC model is fine-tuned to match both its training data and the output of its previous best model simultaneously via knowledge distillation. Compared with previous approaches, our framework can continuously enhance all domain-specific MRC models by enabling each model to iteratively and differentially absorb the domain-shared knowledge from the others. Experimental results and in-depth analyses on several benchmark datasets demonstrate the effectiveness of our framework.
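A hedged sketch of the many-to-one distillation objective described above is given below; the function name, temperature, and mixing weight alpha are illustrative, and the domain-level similarities are assumed to be precomputed:

```python
# Hedged sketch: student loss mixing hard-label cross-entropy with KL terms to
# several domain-specific teachers, weighted by domain similarity.
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, labels, teacher_logits_list,
                          domain_similarities, temperature=2.0, alpha=0.5):
    # Hard-label loss on the target-domain training data.
    ce = F.cross_entropy(student_logits, labels)

    # Soft-label losses against each teacher, weighted by similarity to the target domain.
    weights = torch.softmax(torch.tensor(domain_similarities), dim=0)
    kd = 0.0
    for w, t_logits in zip(weights, teacher_logits_list):
        kd = kd + w * F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(t_logits / temperature, dim=-1),
            reduction='batchmean') * temperature ** 2
    return (1 - alpha) * ce + alpha * kd
```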


2019 ◽  
Vol 5 (11) ◽  
pp. 85 ◽  
Author(s):  
Ayan Chatterjee ◽  
Peter W. T. Yuen

This paper proposes a simple yet effective method for improving the efficiency of sparse coding dictionary learning (DL), with an implication of enhancing the ultimate usefulness of compressive sensing (CS) technology for practical applications such as hyperspectral imaging (HSI) scene reconstruction. CS is a technique which allows sparse signals to be decomposed into a sparse representation "a" over a dictionary D_u. The goodness of the learnt dictionary has a direct impact on the quality of the end results, e.g., the HSI scene reconstructions. This paper proposes constructing a concise and comprehensive dictionary from the cluster centres of the input dataset, after which a greedy approach is adopted to learn all elements within this dictionary. The proposed method couples an unsupervised clustering algorithm (K-Means) with a greedy sparse coding dictionary (SCD) method, the orthogonal matching pursuit (OMP) algorithm, for the dictionary learning. The effectiveness of the proposed K-Means Sparse Coding Dictionary (KMSCD) is illustrated through the reconstruction of several publicly available HSI scenes. The results show that the proposed KMSCD achieves ~40% greater accuracy, 5 times faster convergence, and twice the robustness of the classic Sparse Coding Dictionary (C-SCD) method, which adopts random sampling of data for the dictionary learning. Over the five datasets employed in this study, the proposed KMSCD reconstructs these scenes with mean accuracies approximately 20–500% better than all competing algorithms adopted in this work. Furthermore, the reconstruction efficiency for trace materials in the scene has been assessed: the KMSCD recovers them ~12% better than the C-SCD. These results suggest that the proposed DL approach, which uses a simple clustering method for the construction of the dictionary, substantially enhances the scene reconstruction. When the proposed KMSCD is incorporated with the Fast Non-Negative Orthogonal Matching Pursuit (FNNOMP) to constrain the maximum number of materials coexisting in a pixel to four, experiments show that it performs approximately ten times better than the same constraint imposed by the widely employed TMM algorithm. This suggests that the proposed DL method combining KMSCD with FNNOMP will be more suitable as the material allocation module of HSI scene simulators such as the CameoSim package.
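The core KMSCD idea, cluster centres as dictionary atoms followed by greedy OMP coding, can be sketched as follows (an illustrative approximation with scikit-learn, not the authors' pipeline; shapes and parameters are placeholders):

```python
# Hedged sketch: K-Means cluster centres form the dictionary, OMP gives the sparse codes.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import sparse_encode
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 128))          # e.g. hyperspectral pixels as rows (stand-in data)

# Build the dictionary from K-Means cluster centres (the KMSCD idea).
kmeans = KMeans(n_clusters=64, n_init=10, random_state=0).fit(X)
D = normalize(kmeans.cluster_centers_)    # unit-norm atoms, shape (64, 128)

# Greedy sparse coding of each sample against the clustered dictionary.
A = sparse_encode(X, D, algorithm='omp', n_nonzero_coefs=4)
X_rec = A @ D                             # reconstruction from the sparse codes
print(np.mean((X - X_rec) ** 2))          # mean reconstruction error
```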


2016 ◽  
Vol 61 (4) ◽  
pp. 413-429 ◽  
Author(s):  
Saif Dawood Salman Al-Shaikhli ◽  
Michael Ying Yang ◽  
Bodo Rosenhahn

This paper presents a novel, fully automatic framework for multi-class brain tumor classification and segmentation using a sparse coding and dictionary learning method. The proposed framework consists of two steps: classification and segmentation. The classification of the brain tumors is based on brain topology and texture, while the segmentation is based on voxel values of the image data. Using K-SVD, two types of dictionaries are learned from the training data and their associated ground-truth segmentations: a feature dictionary and voxel-wise coupled dictionaries. The feature dictionary consists of global image features (topological and texture features). The coupled dictionaries consist of coupled information: grayscale voxel values of the training image data and the associated label voxel values of the ground-truth segmentations. For quantitative evaluation, the proposed framework is evaluated using different metrics. The segmentation results on the brain tumor segmentation (MICCAI-BraTS-2013) database are evaluated using five different metric scores, computed with the online evaluation tool provided by the BraTS-2013 challenge organizers. Experimental results demonstrate that the proposed approach achieves accurate brain tumor classification and segmentation and outperforms the state-of-the-art methods.
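A minimal sketch of the voxel-wise coupled-dictionary idea is shown below, using scikit-learn's DictionaryLearning as a stand-in for K-SVD; the patch sizes and data are placeholders, not the BraTS setup:

```python
# Hedged sketch: learn a joint dictionary over concatenated intensity and label patches,
# so each atom couples an intensity part with a label part.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
intensity_patches = rng.random((500, 27))                # stand-in 3x3x3 voxel intensity patches
label_patches = rng.integers(0, 2, (500, 27)).astype(float)  # corresponding ground-truth labels

joint = np.hstack([intensity_patches, label_patches])
dico = DictionaryLearning(n_components=40, transform_algorithm='omp',
                          transform_n_nonzero_coefs=3, random_state=0)
codes = dico.fit_transform(joint)
D_intensity, D_label = dico.components_[:, :27], dico.components_[:, 27:]
# At test time, codes computed against D_intensity could be combined with D_label
# to predict label patches (not shown).
```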


2014 ◽  
Vol 23 (03) ◽  
pp. 1460004 ◽  
Author(s):  
Jayaraman J. Thiagarajan ◽  
Karthikeyan Natesan Ramamurthy ◽  
Deepta Rajan ◽  
Andreas Spanias ◽  
Anup Puri ◽  
...  

In this paper, we propose sparse coding-based approaches for the segmentation of tumor regions from magnetic resonance (MR) images. Sparse coding with data-adapted dictionaries has been successfully employed in several image recovery and vision problems. The proposed approaches obtain sparse codes for each pixel in brain MR images, considering their intensity values and location information. Since it is trivial to obtain pixel-wise sparse codes, and combining multiple features in the sparse coding setup is not straightforward, we propose to perform sparse coding in a high-dimensional feature space where non-linear similarities can be effectively modeled. We use the training data from expert-segmented images to obtain kernel dictionaries with the kernel K-lines clustering procedure. For a test image, sparse codes are computed with these kernel dictionaries and used to identify the tumor regions. This approach is completely automated and does not require user intervention to initialize the tumor regions in a test image. Furthermore, a low-complexity segmentation approach based on kernel sparse codes, which allows the user to initialize the tumor region, is also presented. Results obtained with both proposed approaches are validated against manual segmentation by an expert radiologist, and it is shown that the proposed methods lead to accurate tumor identification.
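As an illustration of sparse coding in a kernel-induced feature space (not the kernel K-lines procedure used in the paper), the sketch below maps intensity-plus-location pixel features through an approximate RBF feature map and computes sparse codes there; all data and parameters are placeholders:

```python
# Hedged sketch: approximate kernel feature map followed by dictionary learning and sparse coding.
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
intensity = rng.random((1000, 1))               # stand-in MR intensities
location = rng.random((1000, 2))                # normalized (row, col) positions
pixels = np.hstack([intensity, location])

feature_map = Nystroem(kernel='rbf', gamma=1.0, n_components=100, random_state=0)
phi = feature_map.fit_transform(pixels)          # explicit (approximate) kernel feature space

dico = MiniBatchDictionaryLearning(n_components=30, transform_algorithm='omp',
                                   transform_n_nonzero_coefs=3, random_state=0)
codes = dico.fit(phi).transform(phi)             # kernel sparse codes per pixel
```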


2015 ◽  
Vol 24 (1) ◽  
pp. 135-143 ◽  
Author(s):  
Omer F. Alcin ◽  
Abdulkadir Sengur ◽  
Jiang Qian ◽  
Melih C. Ince

Extreme learning machine (ELM) is a recent scheme for single-hidden-layer feedforward networks (SLFNs). It has attracted much interest in the machine intelligence and pattern recognition fields, with numerous real-world applications. The ELM structure has several advantages, such as its adaptability to various problems with a rapid learning rate and low computational cost. However, it has shortcomings in the following aspects. First, it suffers from irrelevant variables in the input data set. Second, choosing the optimal number of neurons in the hidden layer is not well defined. If the number of hidden nodes exceeds the number of training samples, the ELM may encounter the singularity problem, and its solution may become unstable. To overcome these limitations, several methods have been proposed within the regularization framework. In this article, we consider a greedy method for sparse approximation of the output weight vector of the ELM network. More specifically, the orthogonal matching pursuit (OMP) algorithm is embedded into the ELM. This new technique is named OMP-ELM. OMP-ELM has several advantages over regularized ELM methods, such as lower complexity and immunity to the singularity problem. Experimental work on nine commonly used regression problems confirms these advantages. Moreover, OMP-ELM is compared with the ELM method, the regularized ELM scheme, and artificial neural networks.
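A minimal sketch of the OMP-ELM idea under illustrative parameters: the hidden layer uses random, untrained weights as in a standard ELM, and the output weights are obtained with greedy OMP instead of the usual pseudo-inverse solution.

```python
# Hedged sketch: random hidden layer (ELM) with OMP-selected sparse output weights.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                  # regression inputs (stand-in data)
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

n_hidden = 100
W = rng.normal(size=(10, n_hidden))             # random input weights (never trained)
b = rng.normal(size=n_hidden)
H = np.tanh(X @ W + b)                          # hidden layer activations

# Sparse output weight vector via OMP: only a few hidden neurons contribute.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=20).fit(H, y)
y_pred = omp.predict(H)
print(np.mean((y - y_pred) ** 2))               # training mean squared error
```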


2021 ◽  
Author(s):  
Tiantian Zhang ◽  
Xueqian Wang ◽  
Bin Liang ◽  
Bo Yuan

The powerful learning ability of deep neural networks enables reinforcement learning (RL) agents to learn competent control policies directly from high-dimensional and continuous environments. In theory, to achieve stable performance, neural networks assume i.i.d. inputs, which unfortunately does not hold in the general RL paradigm, where the training data is temporally correlated and non-stationary. This issue may lead to the phenomenon of "catastrophic interference" (a.k.a. "catastrophic forgetting") and a collapse in performance, as later training is likely to overwrite and interfere with previously learned good policies. In this paper, we introduce the concept of "context" into single-task RL and develop a novel scheme, termed Context Division and Knowledge Distillation (CDaKD) driven RL, to divide all states experienced during training into a series of contexts. Its motivation is to mitigate the aforementioned catastrophic interference in deep RL, thereby improving the stability and plasticity of RL models. At the heart of CDaKD is a value function, parameterized by a neural network feature extractor shared across all contexts, and a set of output heads, each specializing in an individual context. In CDaKD, we exploit online clustering to achieve context division, and interference is further alleviated by a knowledge distillation regularization term on the output layers for learned contexts. In addition, to effectively obtain the context division in high-dimensional state spaces (e.g., image inputs), we perform clustering in the lower-dimensional representation space of a randomly initialized convolutional encoder, which is fixed throughout training. Our results show that, with various replay memory capacities, CDaKD can consistently improve the performance of existing RL algorithms on classic OpenAI Gym tasks and the more complex high-dimensional Atari tasks, incurring only moderate computational overhead.
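The shared-extractor, multi-head value function at the heart of CDaKD can be sketched as follows (a hedged illustration, not the authors' code; the context index is assumed to come from online clustering of states in a fixed random encoder's representation space):

```python
# Hedged sketch: shared feature trunk with one Q-value head per context.
import torch
import torch.nn as nn

class ContextualQNetwork(nn.Module):
    def __init__(self, state_dim, n_actions, n_contexts, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, n_actions) for _ in range(n_contexts)])

    def forward(self, state, context_id):
        # Route the shared features through the head of the current context.
        return self.heads[context_id](self.trunk(state))

net = ContextualQNetwork(state_dim=8, n_actions=4, n_contexts=3)
q_values = net(torch.randn(32, 8), context_id=1)   # Q-values from context 1's head

# The distillation regularizer would additionally penalize divergence between the
# current outputs and snapshots of previously learned contexts' heads (not shown).
```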


2021 ◽  
Author(s):  
Mohamed Omar ◽  
Lotte Mulder ◽  
Tendai Coady ◽  
Claudio Zanettini ◽  
Eddie Luidy Imada ◽  
...  

Machine learning (ML) algorithms are used to build predictive models or classifiers for specific disease outcomes using transcriptomic data. However, some of these models show deteriorating performance when tested on unseen data, which undermines their clinical utility. In this study, we show the importance of directly embedding prior biological knowledge into the classifier decision rules to build simple and interpretable gene signatures. We tested this in two important classification examples: a) progression in non-muscle-invasive bladder cancer; and b) response to neoadjuvant chemotherapy (NACT) in triple-negative breast cancer (TNBC), using different ML algorithms. For each algorithm, we developed two sets of classifiers: agnostic, trained using either individual gene expression values or the corresponding pairwise ranks without biological consideration; and mechanistic, trained by restricting the search to a set of gene pairs capturing important biological relations. Both types were trained on the same training data, and their performance was evaluated on unseen testing data using different methodologies and multiple evaluation metrics. Our analysis shows that mechanistic models outperform their agnostic counterparts when tested on independent data and show performance more consistent with that achieved during training, along with enhanced interpretability. These findings suggest that using biological constraints in the training process can yield more robust and interpretable gene signatures with high translational potential.
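A hedged sketch of the mechanistic feature construction is shown below: each feature is a binary rank comparison within a biologically motivated gene pair rather than a raw expression value; the gene pairs and data here are placeholders, not the signatures from the study.

```python
# Hedged sketch: pairwise-rank features restricted to prior-knowledge gene pairs.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

genes = ['G1', 'G2', 'G3', 'G4']
gene_index = {g: i for i, g in enumerate(genes)}
mechanistic_pairs = [('G1', 'G2'), ('G3', 'G4')]       # assumed prior-knowledge pairs

rng = np.random.default_rng(0)
expr = rng.random((100, len(genes)))                   # samples x genes expression (stand-in)
labels = rng.integers(0, 2, size=100)                  # stand-in outcomes

# Binary pairwise-rank features: 1 if gene A is expressed above gene B in a sample.
pair_features = np.column_stack(
    [expr[:, gene_index[a]] > expr[:, gene_index[b]] for a, b in mechanistic_pairs])

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(pair_features, labels)
```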


2021 ◽  
Vol 11 (18) ◽  
pp. 8412
Author(s):  
Hyeong-Ju Na ◽  
Jeong-Sik Park

The performance of automatic speech recognition (ASR) may be degraded when accented speech is recognized because such speech has linguistic differences from standard speech. Conventional accented speech recognition studies have utilized the accent embedding method, in which accent embedding features are directly fed into the ASR network. Although this method improves the performance of accented speech recognition, it has some restrictions, such as increased computational costs. This study proposes an efficient method of training the ASR model for accented speech in a domain adversarial way based on the Domain Adversarial Neural Network (DANN). The DANN serves as a domain adaptation method for cases in which the training data and test data have different distributions. Thus, our approach is expected to construct a reliable ASR model for accented speech by reducing the distribution differences between accented speech and standard speech. The DANN has three sub-networks: the feature extractor, the domain classifier, and the label predictor. To adjust the DANN for accented speech recognition, we constructed these three sub-networks independently, considering the characteristics of accented speech. In particular, we used an end-to-end framework based on Connectionist Temporal Classification (CTC) to develop the label predictor, a very important module that directly affects ASR results. To verify the efficiency of the proposed approach, we conducted several experiments on accented speech recognition for four English accents: Australian, Canadian, British (England), and Indian. The experimental results showed that the proposed DANN-based model outperformed the baseline model for all accents, indicating that the end-to-end domain adversarial training effectively reduced the distribution differences between accented speech and standard speech.
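The domain adversarial component can be sketched with the standard gradient reversal layer used by DANN (an illustrative PyTorch fragment, not the authors' code; feature dimensions are placeholders):

```python
# Hedged sketch: gradient reversal layer between the feature extractor and the
# domain classifier, so the extractor learns accent-invariant features while the
# label predictor (e.g. a CTC head, not shown) is trained normally.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None   # reverse gradients flowing to the extractor

feature_extractor = nn.Sequential(nn.Linear(40, 128), nn.ReLU())
domain_classifier = nn.Linear(128, 2)            # accented vs. standard speech

x = torch.randn(16, 40)                          # stand-in acoustic features
features = feature_extractor(x)
domain_logits = domain_classifier(GradReverse.apply(features, 1.0))
```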


Author(s):  
Xiang Deng ◽  
Zhongfei Zhang

Knowledge distillation (KD) transfers knowledge from a teacher network to a student by forcing the student to mimic the outputs of the pretrained teacher on training data. However, data samples are not always accessible in many cases due to large data sizes, privacy, or confidentiality. Many efforts have been made to address this problem for convolutional neural networks (CNNs), whose inputs lie in a grid domain within a continuous space such as images and videos, but they largely overlook graph neural networks (GNNs), which handle non-grid data with different topology structures within a discrete space. The inherent differences between their inputs make these CNN-based approaches not applicable to GNNs. In this paper, we propose, to the best of our knowledge, the first dedicated approach to distilling knowledge from a GNN without graph data. The proposed graph-free KD (GFKD) learns graph topology structures for knowledge transfer by modeling them with a multinomial distribution. We then introduce a gradient estimator to optimize this framework. Essentially, the gradients w.r.t. graph structures are obtained using only GNN forward-propagation, without back-propagation, which means that GFKD is compatible with modern GNN libraries such as DGL and Geometric. Moreover, we provide strategies for handling different types of prior knowledge in the graph data or the GNNs. Extensive experiments demonstrate that GFKD achieves state-of-the-art performance for distilling knowledge from GNNs without training data.

