Data Transformation Schemes for CNN-Based Network Traffic Analysis: A Survey

The enormous growth of services and data transmitted over the internet, the bloodstream of modern civilization, has caused a remarkable increase in cyber attack threats. This has driven the development of attack prevention methods, among which machine learning (ML) approaches play an important and steadily growing role. Convolutional neural networks (CNNs) are among the most popular ML techniques, thanks to the rapid growth of available computing power. It is therefore no surprise that they are now also applied to network traffic classification, and the number of scientific papers describing approaches to CNN-based traffic analysis keeps increasing. This paper surveys them, with particular emphasis on a crucial but often disregarded aspect of the topic: the data transformation schemes. Their importance follows from the fact that network traffic data and machine learning data have entirely different structures. The former is a time series of values, the consecutive bytes of the datastream. The latter, in turn, consists of one-, two- or even three-dimensional data samples of fixed lengths/sizes. In this paper, we introduce a taxonomy of data transformation schemes. Next, we use this categorization to describe various CNN-based analytical approaches found in the literature.
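
To make the structural mismatch described above concrete, the following minimal sketch turns a variable-length byte stream into a fixed-size two-dimensional sample by truncating or zero-padding it. It illustrates only one simple scheme and is not taken from the survey's taxonomy; the function name `bytes_to_grid`, the scaling to [0, 1], and the grid size are assumptions made for this example.

```python
def bytes_to_grid(payload: bytes, side: int = 28) -> list[list[float]]:
    """Map a byte stream to a side x side grid of values in [0, 1].

    Hypothetical helper: long payloads are truncated, short ones zero-padded.
    """
    size = side * side
    # Keep at most `size` bytes, then pad with zero bytes to a fixed length.
    fixed = payload[:size].ljust(size, b"\x00")
    # Scale each byte (0..255) to a float and cut the flat vector into rows.
    scaled = [b / 255.0 for b in fixed]
    return [scaled[r * side:(r + 1) * side] for r in range(side)]


# Example: five bytes placed into a 4 x 4 grid (the remaining cells are padding).
grid = bytes_to_grid(b"\x16\x03\x01\x02\x00", side=4)
```

Truncation and padding are the simplest way to obtain fixed-size samples; which bytes to keep (headers, payload, or both) is a separate design choice that this sketch does not address.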


MODC: A Pareto-Optimal Optimization Approach for Network Traffic Classification Based on the Divide and Conquer Strategy

Information ◽

10.3390/info9090233 ◽

2018 ◽

Vol 9 (9) ◽

pp. 233 ◽

Cited By ~ 1

Author(s):

Zuleika Nascimento ◽

Djamel Sadok

Keyword(s):

Machine Learning ◽

Network Traffic ◽

Machine Learning Algorithms ◽

Divide And Conquer ◽

Pareto Optimal ◽

Optimization Approach ◽

Traffic Classification ◽

Multi Objective ◽

Network Traffic Classification ◽

Changes Over Time

Network traffic classification aims to identify the traffic categories or applications of network packets or flows. It is an area that continues to gain attention from researchers because the composition of network traffic, which changes over time, must be understood to ensure the network Quality of Service (QoS). Among the different methods of network traffic classification, the payload-based one (deep packet inspection, DPI) is the most accurate, but it has some drawbacks, such as the inability to classify encrypted data, concerns regarding users’ privacy, high computational costs, and ambiguity when multiple signatures match. For that reason, machine learning methods have been proposed to overcome these issues. This work proposes a Multi-Objective Divide and Conquer (MODC) model for network traffic classification that combines supervised and unsupervised machine learning algorithms into a hybrid model based on the divide and conquer strategy. Additionally, it is a flexible model, since it allows network administrators to choose among a set of parameters (Pareto-optimal solutions) produced by a multi-objective optimization process, prioritizing either flow or byte accuracy. Our method achieved an average flow accuracy of 94.14% for the analyzed dataset, outperforming the six DPI-based tools investigated, including two commercial ones, and other machine learning-based methods.
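
The trade-off between flow and byte accuracy mentioned above can be sketched as follows. This is an illustrative example only, not the MODC optimization process: the function names, the flow records, and the candidate scores are all made up for the sketch. Flow accuracy counts correctly classified flows, byte accuracy weights each flow by its size, and a configuration is Pareto-optimal if no other configuration is at least as good on both measures.

```python
def flow_and_byte_accuracy(flows):
    """flows: iterable of (true_label, predicted_label, byte_count)."""
    flows = list(flows)
    correct = [f for f in flows if f[0] == f[1]]
    flow_acc = len(correct) / len(flows)
    byte_acc = sum(f[2] for f in correct) / sum(f[2] for f in flows)
    return flow_acc, byte_acc


def pareto_front(candidates):
    """Keep configurations not dominated on (flow_acc, byte_acc).

    candidates: dict mapping a configuration name to (flow_acc, byte_acc),
    where both values are to be maximized.
    """
    def dominates(a, b):
        # a dominates b if it is no worse on both objectives and differs.
        return a[0] >= b[0] and a[1] >= b[1] and a != b

    return {
        name: score
        for name, score in candidates.items()
        if not any(dominates(other, score) for other in candidates.values())
    }


# Made-up flows: one large misclassified flow hurts byte accuracy far more
# than flow accuracy.
flows = [("web", "web", 1000), ("p2p", "web", 9000),
         ("mail", "mail", 500), ("web", "web", 500)]
flow_acc, byte_acc = flow_and_byte_accuracy(flows)

# Made-up scores for four hypothetical parameter sets.
candidates = {"A": (0.95, 0.80), "B": (0.90, 0.92),
              "C": (0.89, 0.79), "D": (0.93, 0.85)}
front = pareto_front(candidates)
```

In this toy data, configuration C is dominated by A, while A, B and D remain as alternatives among which an administrator could choose depending on whether flow or byte accuracy matters more.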


Towards the Deployment of Machine Learning Solutions in Network Traffic Classification: A Systematic Survey

IEEE Communications Surveys & Tutorials ◽

10.1109/comst.2018.2883147 ◽

2019 ◽

Vol 21 (2) ◽

pp. 1988-2014 ◽

Cited By ~ 21

Author(s):

Fannia Pacheco ◽

Ernesto Exposito ◽

Mathieu Gineste ◽

Cedric Baudoin ◽

Jose Aguilar

Keyword(s):

Machine Learning ◽

Network Traffic ◽

Traffic Classification ◽

Systematic Survey ◽

Network Traffic Classification


Machine learning algorithms for accurate flow-based network traffic classification: Evaluation and comparison

Performance Evaluation ◽

10.1016/j.peva.2010.01.001 ◽

2010 ◽

Vol 67 (6) ◽

pp. 451-467 ◽

Cited By ~ 70

Author(s):

Murat Soysal ◽

Ece Guran Schmidt

Keyword(s):

Machine Learning ◽

Network Traffic ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Traffic Classification ◽

Network Traffic Classification ◽

Classification Evaluation


Study on Process of Network Traffic Classification Using Machine Learning

2010 Fifth Annual ChinaGrid Conference ◽

10.1109/chinagrid.2010.53 ◽

2010 ◽

Cited By ~ 1

Author(s):

Jian-Min Wang ◽

Cheng-Lu Qian ◽

Chun-Hui Che ◽

Hai-Tao He

Keyword(s):

Machine Learning ◽

Network Traffic ◽

Traffic Classification ◽

Network Traffic Classification


Network Traffic Classification Using Machine Learning for Software Defined Networks

Machine Learning for Networking - Lecture Notes in Computer Science ◽

10.1007/978-3-030-45778-5_3 ◽

2020 ◽

pp. 28-39

Author(s):

Menuka Perera Jayasuriya Kuranage ◽

Kandaraj Piamrat ◽

Salima Hamma

Keyword(s):

Machine Learning ◽

Network Traffic ◽

Traffic Classification ◽

Software Defined Networks ◽

Network Traffic Classification


A Survey on Finding Network Traffic Classification Methods based on C5.0 Machine Learning Algorithm

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i4.788791 ◽

2019 ◽

Vol 7 (4) ◽

pp. 788-791

Author(s):

Amit Kumar ◽

Daya Shankar Pandey ◽

Varsha Namdeo

Keyword(s):

Machine Learning ◽

Network Traffic ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Traffic Classification ◽

Classification Methods ◽

Network Traffic Classification


Problem of Network Traffic Classification in Multiprovider Cloud Infrastructures Based on Machine Learning Methods

2021 10th Mediterranean Conference on Embedded Computing (MECO) ◽

10.1109/meco52532.2021.9460171 ◽

2021 ◽

Author(s):

Dmitry Perepelkin ◽

Maria Ivanchikova

Keyword(s):

Machine Learning ◽

Network Traffic ◽

Traffic Classification ◽

Learning Methods ◽

Machine Learning Methods ◽

Network Traffic Classification ◽

Cloud Infrastructures


Effective Packet Number for 5G IM WeChat Application at Early Stage Traffic Classification

Mobile Information Systems ◽

10.1155/2017/3146868 ◽

2017 ◽

Vol 2017 ◽

pp. 1-22 ◽

Cited By ~ 4

Author(s):

Muhammad Shafiq ◽

Xiangzhan Yu

Keyword(s):

Machine Learning ◽

Mutual Information ◽

Network Traffic ◽

Early Stage ◽

Statistical Tests ◽

Internet Traffic ◽

Machine Learning Algorithms ◽

Experimental Results ◽

Traffic Classification ◽

Network Traffic Classification

Accurate network traffic classification at an early stage is very important for 5G network applications. During the last few years, researchers have worked hard to propose effective machine learning models for classifying Internet traffic applications at an early stage from only a few packets. Nevertheless, this essential problem still needs to be studied in depth to find the effective packet number as well as an effective machine learning (ML) model. In this paper, we try to solve this problem. For this purpose, five Internet traffic datasets are utilized. Initially, we extract the packet sizes of 20 packets and then carry out mutual information analysis to find the mutual information of each packet on flow type. Thereafter, we execute 10 well-known machine learning algorithms using the crossover classification method. Two statistical tests, the Friedman and Wilcoxon pairwise tests, are applied to the experimental results. Moreover, we also apply the statistical tests to the classifiers to find an effective ML classifier. Our experimental results show that 13–19 packets are the effective packet numbers for the 5G IM WeChat application in early-stage network traffic classification. We also find that Random Forest is an effective ML classifier for early-stage Internet traffic classification.
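
As a rough illustration of the mutual information analysis mentioned above, the sketch below computes the mutual information between a discretized packet-size feature and the flow type. It is not the authors' procedure: the function name, the size bins ("small"/"large"), and the flow labels are assumptions chosen for the example.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs;
    # with counts, the ratio simplifies to c * n / (count_x * count_y).
    return sum(
        (c / n) * log2((c * n) / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


# Made-up data: binned size of one packet position in four flows.
labels = ["chat", "chat", "video", "video"]
informative = ["small", "small", "large", "large"]    # determines the label
uninformative = ["small", "large", "small", "large"]  # independent of it

mi_informative = mutual_information(informative, labels)
mi_uninformative = mutual_information(uninformative, labels)
```

A packet position whose size distribution differs strongly between flow types yields high mutual information, while one whose sizes are unrelated to the flow type yields a value near zero; ranking packet positions this way is one plausible use of such a measure.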
