Prospects for Generative-Adversarial Networks in Network Traffic Classification Tasks

Abstract The paper presents an approach for enlarging the training sample and reducing class imbalance in traffic classification problems. The basic principles and architecture of generative adversarial networks are considered. A mathematical model of network traffic classification is described. The training sample used to solve the problem is analyzed. Data preprocessing is carried out and justified. An architecture for the generative adversarial network is constructed, and an algorithm for generating new features is developed. Machine learning models for the traffic classification problem were considered and built: logistic regression, k-nearest neighbors, decision tree, and random forest. A comparative analysis of the results of the machine learning models without and with the generation of new features is conducted. The obtained results can be applied both to network traffic classification tasks and to general cases of multiclass classification and exclusion of unbalanced features.
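As a rough illustration of what "reducing class imbalance" asks of a generator, the sketch below counts how many synthetic samples each class would need to match the largest class. The function name and the traffic labels are hypothetical and invented for this example; the paper's own balancing target may differ.

```python
from collections import Counter


def synthetic_samples_needed(labels):
    """Per class, count the synthetic samples needed to reach the majority class size."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


# Hypothetical traffic labels, invented for illustration only.
labels = ["web"] * 6 + ["dns"] * 3 + ["voip"] * 1
needed = synthetic_samples_needed(labels)
print(needed)
```

A generator trained per class (or conditioned on the class label) would then be asked for that many samples of each minority class.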


Counterfactual Examples for Data Augmentation: A Case Study

The International FLAIRS Conference Proceedings ◽

10.32473/flairs.v34i1.128503 ◽

2021 ◽

Vol 34 (1) ◽

Author(s):

Md Golam Moula Mehedi Hasan ◽

Douglas A. Talbert

Keyword(s):

Machine Learning ◽

Potential Application ◽

Data Augmentation ◽

Generative Adversarial Networks ◽

Application Area ◽

Learning Models ◽

Adversarial Networks ◽

Feature Values ◽

Machine Learning Models

Counterfactual explanations are gaining in popularity as a way of explaining machine learning models. Counterfactual examples are generally created to help interpret the decision of a model: if a model makes a certain decision for an instance, the counterfactual examples of that instance reverse that decision. Counterfactual examples can be created by carefully changing particular feature values of the instance. Although counterfactual examples are generated to explain the decisions of machine learning models, in this work we explore another potential application area, namely whether counterfactual examples are useful for data augmentation. We demonstrate the efficacy of this approach on the widely used “Adult-Income” dataset. We consider several scenarios in which we do not have enough data and use counterfactual examples to augment the dataset. We compare our approach with a Generative Adversarial Network approach to dataset augmentation. The experimental results show that our proposed approach can be an effective way to augment a dataset.
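The idea of changing feature values until the model's decision reverses can be sketched as follows. This is a minimal illustration, not the authors' generation method: the logistic model, its weights, and the single-feature search are all assumptions made for the example.

```python
import math

# Hypothetical logistic model over two features; weights are invented.
WEIGHTS = [1.5, -2.0]
BIAS = -0.25


def predict_proba(x):
    z = BIAS + sum(w * v for w, v in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))


def predict(x):
    return int(predict_proba(x) >= 0.5)


def counterfactual(x, feature, step=0.05, max_steps=1000):
    """Nudge one feature until the predicted class flips; return None if it never does."""
    original = predict(x)
    # Move the feature in the direction that pushes the score toward the other class.
    direction = 1.0 if (WEIGHTS[feature] > 0) == (original == 0) else -1.0
    candidate = list(x)
    for _ in range(max_steps):
        candidate[feature] += direction * step
        if predict(candidate) != original:
            return candidate
    return None


instance = [0.0, 0.5]
cf = counterfactual(instance, feature=0)
# For augmentation, the counterfactual would be added with the flipped label.
augmented = [(instance, predict(instance)), (cf, predict(cf))]
```

Only the chosen feature changes, so the counterfactual stays close to the original instance while carrying the opposite label.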


Stealing Machine Learning Models: Attacks and Countermeasures for Generative Adversarial Networks

10.1145/3485832.3485838 ◽

2021 ◽

Author(s):

Hailong Hu ◽

Jun Pang

Keyword(s):

Machine Learning ◽

Generative Adversarial Networks ◽

Learning Models ◽

Adversarial Networks ◽

Machine Learning Models


Inverse Airfoil Design Method for Generating Varieties of Smooth Airfoils Using Conditional WGAN-GP

10.21203/rs.3.rs-618399/v1 ◽

2021 ◽

Author(s):

Kazuo Yonekura ◽

Nozomu Miyamoto ◽

Katsuyuki Suzuki

Keyword(s):

Machine Learning ◽

Design Method ◽

Lift Coefficient ◽

Flow Analysis ◽

Generative Adversarial Networks ◽

Learning Models ◽

Smoothing Methods ◽

Adversarial Networks ◽

Proposed Model ◽

Machine Learning Models

Abstract Machine learning models have recently been used for airfoil shape generation. It is desirable to obtain airfoil shapes that satisfy a required lift coefficient. Generative adversarial networks (GANs) output reasonable airfoil shapes. However, shapes obtained from ordinary GAN models are not smooth and need smoothing before flow analysis. Therefore, the models need to be coupled with Bézier curves or other smoothing methods to obtain smooth shapes. Generating shapes without any smoothing method is challenging. In this study, we employed a conditional Wasserstein GAN with gradient penalty (CWGAN-GP) to generate airfoil shapes, and the obtained shapes are as smooth as those obtained using smoothing methods. With the proposed method, no additional smoothing method is needed to generate airfoils. Moreover, the proposed model outputs shapes that satisfy the lift coefficient requirements.
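For orientation, the losses of a conditional WGAN with gradient penalty are commonly written as below. This is the standard textbook form with the condition c added to both networks; the symbols are assumptions for this sketch, and the paper's exact notation and conditioning scheme may differ.

```latex
% x: real shape, c: condition (e.g. required lift coefficient), z: noise,
% G: generator, D: critic, \lambda: gradient-penalty weight.
\begin{aligned}
\tilde{x} &= G(z, c), \qquad
\hat{x} = \epsilon x + (1-\epsilon)\tilde{x}, \quad \epsilon \sim U[0,1], \\
L_D &= \mathbb{E}\big[D(\tilde{x}, c)\big] - \mathbb{E}\big[D(x, c)\big]
      + \lambda\, \mathbb{E}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}, c) \rVert_2 - 1\big)^2\Big], \\
L_G &= -\mathbb{E}\big[D(\tilde{x}, c)\big].
\end{aligned}
```

The penalty term pushes the critic's gradient norm toward 1 at points interpolated between real and generated samples.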


A New Integrated Approach for Landslide Data Balancing and Spatial Prediction Based on Generative Adversarial Networks (GAN)

Remote Sensing ◽

10.3390/rs13194011 ◽

2021 ◽

Vol 13 (19) ◽

pp. 4011

Author(s):

Husam A. H. Al-Najjar ◽

Biswajeet Pradhan ◽

Raju Sarkar ◽

Ghassan Beydoun ◽

Abdullah Alamri

Keyword(s):

Machine Learning ◽

Spatial Prediction ◽

Generative Models ◽

Generative Adversarial Networks ◽

Slope Aspect ◽

Support Vector ◽

Learning Models ◽

Adversarial Networks ◽

Landslide Data ◽

Machine Learning Models

Landslide susceptibility mapping has progressed significantly with improvements in machine learning techniques. However, the inventory / data imbalance (DI) problem remains one of the challenges in this domain. This problem exists because a good-quality landslide inventory map, including a complete record of historical data, is difficult or expensive to collect. As such, this can considerably affect one’s ability to obtain a sufficient inventory or representative samples. This research developed a new approach based on generative adversarial networks (GAN) to correct imbalanced landslide datasets. The proposed method was tested at Chukha Dzongkhag, Bhutan, one of the most landslide-prone areas in the Himalayan region. The proposed approach was then compared with standard methods such as the synthetic minority oversampling technique (SMOTE), dense imbalanced sampling, and sparse sampling (i.e., producing as many non-landslide samples as landslide samples). The comparisons were based on five machine learning models: artificial neural networks (ANN), random forests (RF), decision trees (DT), k-nearest neighbours (kNN), and the support vector machine (SVM). The model evaluation was based on overall accuracy (OA), Kappa Index, F1-score, and area under the receiver operating characteristic curve (AUROC). The spatial database was established with a total of 269 landslides and 10 conditioning factors: altitude, slope, aspect, total curvature, slope length, lithology, distance from the road, distance from the stream, topographic wetness index (TWI), and sediment transport index (STI). The findings of this study show that both the GAN and SMOTE data balancing approaches helped to improve the accuracy of machine learning models. According to AUROC, with default parameters the GAN method boosted the models to their maximum accuracies: ANN (0.918), RF (0.933), DT (0.927), kNN (0.878), and SVM (0.907). With the optimum parameters, all models performed best with GAN, reaching their highest accuracies of ANN (0.927), RF (0.943), DT (0.923), and kNN (0.889), except SVM, which obtained its highest accuracy (0.906) with SMOTE. Our findings suggest that RF balanced with GAN can provide the most reasonable criterion for landslide prediction. This research indicates that landslide data balancing may substantially affect the predictive capabilities of machine learning models. Therefore, the issue of DI in the spatial prediction of landslides should not be ignored. Future studies could explore other generative models for landslide data balancing. By using a state-of-the-art GAN, the proposed model can be considered in areas where the data are limited or imbalanced.
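Three of the evaluation measures named above (overall accuracy, F1-score, and the Kappa Index) can be computed directly from a binary confusion matrix. The sketch below uses the standard definitions; the function name and the counts are invented for illustration and are not taken from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Overall accuracy, F1-score, and Cohen's kappa from a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    oa = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Agreement expected by chance, from the row and column marginals.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    pe = p_yes + p_no
    kappa = (oa - pe) / (1 - pe)
    return {"oa": oa, "f1": f1, "kappa": kappa}


# Hypothetical counts (landslide = positive class), invented for illustration.
m = binary_metrics(tp=40, fp=10, fn=10, tn=40)
print(m)
```

On a balanced test set like this one, kappa is noticeably lower than overall accuracy because half of the agreement is expected by chance.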


ORGANIC (1).pdf

10.26434/chemrxiv.5309668.v1 ◽

2017 ◽

Author(s):

Benjamin Sanchez-Lengeling ◽

Carlos Outeiral ◽

Gabriel L. Guimaraes ◽

Alan Aspuru-Guzik

Keyword(s):

Machine Learning ◽

Learning Community ◽

Chemical Species ◽

Material Design ◽

Organic Photovoltaic ◽

Generative Adversarial Networks ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Adversarial Networks ◽

Photovoltaic Material

Molecular discovery seeks to generate chemical species tailored to very specific needs. In this paper, we present ORGANIC, a framework based on Objective-Reinforced Generative Adversarial Networks (ORGAN), capable of producing a distribution over molecular space that matches a certain set of desirable metrics. This methodology combines two successful techniques from the machine learning community: a Generative Adversarial Network (GAN), to create non-repetitive, sensible molecular species, and Reinforcement Learning (RL), to bias this generative distribution towards certain attributes. We explore several applications, from optimization of random physicochemical properties to candidates for drug discovery and organic photovoltaic material design.


Dynamics of Fourier Modes in Torus Generative Adversarial Networks

Mathematics ◽

10.3390/math9040325 ◽

2021 ◽

Vol 9 (4) ◽

pp. 325

Author(s):

Ángel González-Prieto ◽

Alberto Mozo ◽

Edgar Talavera ◽

Sandra Gómez-Canaval

Keyword(s):

Fourier Series ◽

Generative Adversarial Networks ◽

Learning Models ◽

Training Process ◽

Small Perturbations ◽

Adversarial Networks ◽

Novel Method ◽

Truncated Fourier Series ◽

Real Flow ◽

Machine Learning Models

Generative Adversarial Networks (GANs) are powerful machine learning models capable of generating fully synthetic samples of a desired phenomenon with a high resolution. Despite their success, the training process of a GAN is highly unstable, and it is typically necessary to add several accessory heuristics to the networks to reach acceptable convergence of the model. In this paper, we introduce a novel method to analyze the convergence and stability of the training of generative adversarial networks. For this purpose, we propose to decompose the objective function of the adversarial min–max game defining a periodic GAN into its Fourier series. By studying the dynamics of the truncated Fourier series for the continuous alternating gradient descent algorithm, we are able to approximate the real flow and to identify the main features of the convergence of GANs. This approach is confirmed empirically by studying the training flow in a 2-parametric GAN aiming to generate an unknown exponential distribution. As a by-product, we show that convergent orbits in GANs are small perturbations of periodic orbits, so the Nash equilibria are spiral attractors. This theoretically justifies the slow and unstable training observed in GANs.
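The claim that convergent orbits are small perturbations of periodic orbits can be illustrated on a toy two-parameter min–max game. The sketch below is not the paper's torus GAN or its Fourier analysis; the objective f(x, y) = x*y + (eps/2)*(x**2 - y**2), the step size, and the function name are assumptions chosen only to show spiralling toward an equilibrium.

```python
import math


def simulate(eps, h=0.01, steps=2000, x=1.0, y=0.0):
    """Euler-integrate dx/dt = -df/dx, dy/dt = +df/dy for
    f(x, y) = x*y + (eps/2)*(x**2 - y**2), where x minimizes and y maximizes."""
    radii = []
    crossed_axis = False
    for _ in range(steps):
        dx = -(y + eps * x)  # descent direction for the minimizing player
        dy = x - eps * y     # ascent direction for the maximizing player
        x, y = x + h * dx, y + h * dy
        radii.append(math.hypot(x, y))
        crossed_axis = crossed_axis or x < 0
    return radii, crossed_axis


# eps = 0: the continuous flow is a pure rotation (periodic orbits);
# the Euler steps add a slow outward drift.
radii_periodic, _ = simulate(eps=0.0)
# eps > 0: a small perturbation turns the equilibrium into a spiral attractor.
radii_spiral, rotated = simulate(eps=0.1)
print(radii_periodic[-1], radii_spiral[-1], rotated)
```

With eps = 0.1 the distance to the equilibrium shrinks at every step while the trajectory keeps rotating, which is the slow, oscillating convergence the abstract describes.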


MODC: A Pareto-Optimal Optimization Approach for Network Traffic Classification Based on the Divide and Conquer Strategy

Information ◽

10.3390/info9090233 ◽

2018 ◽

Vol 9 (9) ◽

pp. 233 ◽

Cited By ~ 1

Author(s):

Zuleika Nascimento ◽

Djamel Sadok

Keyword(s):

Machine Learning ◽

Network Traffic ◽

Machine Learning Algorithms ◽

Divide And Conquer ◽

Pareto Optimal ◽

Optimization Approach ◽

Traffic Classification ◽

Multi Objective ◽

Network Traffic Classification ◽

Changes Over Time

Network traffic classification aims to identify the traffic categories or applications of network packets or flows. It is an area that continues to gain attention from researchers because the composition of network traffic, which changes over time, must be understood to ensure network Quality of Service (QoS). Among the different methods of network traffic classification, the payload-based one (DPI) is the most accurate, but it has some drawbacks, such as the inability to classify encrypted data, concerns regarding users’ privacy, high computational costs, and ambiguity when multiple signatures might match. For that reason, machine learning methods have been proposed to overcome these issues. This work proposes a Multi-Objective Divide and Conquer (MODC) model for network traffic classification that combines supervised and unsupervised machine learning algorithms into a hybrid model based on the divide and conquer strategy. Additionally, it is a flexible model, since it allows network administrators to choose among a set of parameters (Pareto-optimal solutions) produced by a multi-objective optimization process, prioritizing either flow or byte accuracy. Our method achieved an average flow accuracy of 94.14% for the analyzed dataset, outperforming the six DPI-based tools investigated, including two commercial ones, and other machine learning-based methods.
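Choosing among Pareto-optimal solutions by flow or byte accuracy can be sketched as a simple dominance filter. This is an illustrative sketch only, not the MODC optimization process; the function name, configuration names, and accuracy values are invented.

```python
def pareto_front(candidates):
    """Keep candidates that no other candidate dominates on (flow_acc, byte_acc)."""
    front = []
    for name, flow, byte in candidates:
        dominated = any(
            f2 >= flow and b2 >= byte and (f2 > flow or b2 > byte)
            for _, f2, b2 in candidates
        )
        if not dominated:
            front.append((name, flow, byte))
    return front


# Hypothetical parameter sets: (name, flow accuracy, byte accuracy).
candidates = [
    ("A", 0.94, 0.80),
    ("B", 0.90, 0.88),
    ("C", 0.89, 0.85),
    ("D", 0.85, 0.91),
]
front = pareto_front(candidates)
# An administrator prioritizing one objective picks from the front accordingly.
best_for_flow = max(front, key=lambda c: c[1])
best_for_byte = max(front, key=lambda c: c[2])
print(front, best_for_flow, best_for_byte)
```

Configuration C is discarded because B is at least as good on both objectives; the remaining three represent different trade-offs rather than a single best answer.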
